Read an Excel file without opening it with VBA

Solved
kenza -  
f894009 Posted messages 17417 Registration date   Status Membre Last intervention   -
Hello,

I want to extract data from a very large Excel file (read the file without opening it).

The data I want to extract contains the following information:

Cells from the code-techno field = '4GF'. --> [this field is located in column D]

Can you please help me find a solution with VBA?

Thank you in advance for your assistance.

Best regards,
Kenza

14 réponses

yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   Ambassadeur 1 588
 
Hello,
I don't really see how it's possible without opening it.
0
Kenza
 
The file is very large, it exceeds its capacity so it is not possible to open it; that's why I'm looking for a solution to extract this information using VBA code.
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > Kenza
 
It means "it exceeds its capacity."
What is the exact format of this file?
Your explanation is not very clear; I assume you want to start by examining all the values in column D?
0
Kenza > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
A file that has more than 1,000,000 lines cannot be opened when I try to open it, so I need to use a split to break it down. However, I do not want to use split, which is why I am looking for a way to process this file.

For the file format, it is a CSV extension.

Regarding your third question, yes, we need to examine the entire column D and search for the word '4GF', then extract the entire rows (copy them to another Excel file, for example).
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > Kenza
 
""I can't" isn't very factual. What happens when you try?
A CSV file can simply be opened as a text file. Have you tried that?
0
Kenza > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
In fact, what I want to know is whether we can process or extract data from a file without opening it; we can, for example, just mention the location and the name of the file.
0
f894009 Posted messages 17417 Registration date   Status Membre Last intervention   1 717
 
Hello to both of you,

yg_be, how's it going?

Kenza:
Which Excel do you have?
0
kenza
 
Hello,

the Excel version: 2002
the version number: 12527.21330
0
f894009 Posted messages 17417 Registration date   Status Membre Last intervention   1 717
 
Re,

Okay, I’m not familiar with that version. Have you already programmed in VBA with your Excel?
0
Kenza > f894009 Posted messages 17417 Registration date   Status Membre Last intervention  
 
Yes, I have programmed before.
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781
 
Hello,

We always have to open a file to access its content!
To claim that we access a closed file is a misuse of language.

We can access a file without needing to load it into an application (unlike Excel or a text editor), for example by directly accessing the content with Open pathname For ... or by opening a stream with ADODB.Stream.
We can then use line-by-line reading to extract only what we want.

Your request is rather vague; what exactly do you want to extract?
Only data from one field, multiple fields, complete records?
What are the criteria: equals, contains, ...?

That said, CSV is not a standardized format; it needs to be specified:
- the record (line) separator
- the field (cell) separator
- the text identifier (in case the text contains one of the above separators).
- the decimal separator (point or comma)
- the thousands separator if it is used.

Best regards
Patrice

No one can possess all knowledge; that’s why we share it.
0
Kenza
 
Yes, you are absolutely right, I just didn't express my problem clearly.

I want to extract data from an Excel file without having to load the workbook. I just want to process this file and retrieve data without opening it.

Regarding the data I want to extract:
There is a field called code_techno that is located in column D which contains several pieces of information; I want to apply a filter on this field and keep only the info "4GF" and then copy all the other columns and paste them into another Excel file.

Thank you for your help @patrice :D
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781 > Kenza
 
With a csv file, it's quite simple, but with an Excel file (especially with the old non-XML formats), it's much more complex!
0
Kenza > Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention  
 
How can we do it with a text file, please?
0
f894009 Posted messages 17417 Registration date   Status Membre Last intervention   1 717
 
Hello again everyone

Kenza:
data from an excel file
It's a csv (text) file or Excel because you switch between the two depending on the moment

Do you know VBA programming?
0
Kenza
 
RE

Not a text file --> CSV extension
In terms of VBA programming, I am a beginner
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781 > Kenza
 
Je ne peux pas répondre à cette demande. Veuillez fournir le texte à traduire.
0
Kenza > Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention  
 
All fields are separated by '"' ; '"
0
f894009 Posted messages 17417 Registration date   Status Membre Last intervention   1 717 > Kenza
 
Hello everyone,

Kenza:
You must answer all the questions from Patrice3340 plus one:
The "lines" in the file all have the same number of "fields" (delimited format)
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781 > Kenza
 
"All fields are separated by "' ; '""
This confirms that this "csv" file is not in the "standard" format; csv means in French, values separated by commas.
It is therefore important to answer all questions, including that of f894009.
In general, the rules are as follows:
- the line separator is the pair CR LF (13 10 or in hex 0D 0A) but it can simply be CR or LF.
- the field separator is the comma, but in French-speaking areas, it is often the semicolon, especially when the comma is used as a decimal separator.
- the text identifier is the double quote (") but sometimes it is the apostrophe ('). It is not necessarily used. It is mandatory when a field contains a line separator and/or a field separator; in this case, if the field contains a text identifier, it is doubled. Sometimes it is systematically placed on all fields.
- the decimal separator is the point, but when the field separator is the semicolon, the decimal separator is often the comma.
- the thousands separator is the apostrophe (') but it is not always used, and when the decimal separator is the comma, the thousands separator is a space.
- some csv files advance to the next record (to the next line) when all the following fields are empty, hence the question of f894009 regarding the number of fields in each line.

As you can see, there is no standard csv, hence the importance of knowing the structure of your file precisely. This requires analyzing the file with a text editor and hexadecimal code.

On the other hand, to extract the lines, you need to know the extraction criteria precisely.

Edit: CR stands for carriage return and LF for line feed, like on old typewriters.
0
kenza
 
OK, it's done.
You can find the file at this link:

https://mon-partage.fr/f/ZMFBRCBV/
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588
 
Would it help you to have a VBA code that creates another CSV file, containing only the rows with code_techno = 4GF?
0
Kenza > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
Yes, that's what I'm looking for!
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > Kenza
 
suggestion:
Option Explicit Sub kenza4gf() Dim source_file As String Dim dest_file As String Dim numin As Long, numout As Long, valss() As String, line As String source_file = ThisWorkbook.Path + "\file.csv" dest_file = ThisWorkbook.Path + "\file4gf.csv" numin = FreeFile 'open the file for reading Open source_file For Input As #numin numout = FreeFile Open dest_file For Output As #numout 'loop until the end of the file While Not EOF(numin) Line Input #numin, line valss = Split(line, ";", 5) If UBound(valss) > 2 Then If valss(3) = "4GF" Then Print #numout, line End If End If Wend Close #numin 'close the file Close #numout End Sub
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781
 
Hello

An example with a stream:
https://mon-partage.fr/f/6rAx7eFH/

The code:
'———————————————————————————————————————————————————————————————————————————————————————————————————————————————— ' Module: mImportCsvFiltre ' Object: Reading and filtering line by line of a large UTF8 csv file '———————————————————————————————————————————————————————————————————————————————————————————————————————————————— ' 12/21/2020 Patrice33740 V1-0-00 Creation ' ' Establish the reference to ADODB: Microsoft ActiveX Data Objects 6.1 Library ' Option Explicit ' ' Separators Const sepL$ = vbCrLf 'line separator Const sepV$ = ";" 'value separator Const idTxt$ = """" 'text identifier chr(34) ' Sub Read_csv_UTF8_filter() ' File choice and reading csv while filtering ' Dim wbk As Workbook Dim fullName As Variant, column As Long, filter As Variant On Error Resume Next column = Feuil1.Columns(Feuil1.Range("E4").Value).Column If Err <> 0 Then MsgBox "Incorrect column", vbCritical: Exit Sub On Error GoTo 0 If Feuil1.Range("E5").Text = "" Then MsgBox "Specify the value to filter", vbExclamation: Exit Sub filter = Feuil1.Range("E5").Value ' File choice fullName = ChooseFile(".csv", ThisWorkbook.Path & "\") If fullName = "" Then Exit Sub Set wbk = Read_Filter_csv_UTF8(fullName, column, filter) wbk.Saved = True End Sub Private Function Read_Filter_csv_UTF8(ByVal fullFileName As String, col As Long, filter As Variant) As Workbook ' Reading and filtering line by line of a [very large] csv file encoded in UTF8 (with or without BOM) ' Dim fUtf8 As ADODB.Stream Dim wbk As Excel.Workbook Dim cel As Range Dim txt As String Dim lgn As String Dim lgr As Long Application.ScreenUpdating = False Application.Calculation = xlCalculationManual Set wbk = Application.Workbooks.Add(xlWBATWorksheet) Set cel = wbk.Worksheets(1).Range("A1") Set fUtf8 = New Stream With fUtf8 .Charset = "utf-8" .Mode = adModeReadWrite .Type = adTypeText .LineSeparator = adCRLF .Open .LoadFromFile fullFileName Do Until .EOS lgn = .ReadText(-2) '-2 = one line If cel.Row = 1 Then Call WriteLineCSV(lgn, cel) Else Call FilterLineCSV(lgn, cel, col, filter) End If Loop .Close End With Set fUtf8 = Nothing wbk.Worksheets(1).Columns.AutoFit wbk.Worksheets(1).Rows.AutoFit Application.Calculation = xlCalculationAutomatic Application.ScreenUpdating = True Set Read_Filter_csv_UTF8 = wbk End Function Private Sub FilterLineCSV(lgn As String, cel As Range, col As Long, filter As Variant) ' Filtering and writing a line of a csv file ' Dim txt As String Dim frm As String Dim lgr As Long Dim nbC As Long Dim t As Variant Dim i As Long If lgn = "" Then Exit Sub t = Split(lgn, sepV) If t(col - 1) = filter Then For i = LBound(t) To UBound(t) cel.Offset(0, nbC).FormulaLocal = t(i) nbC = nbC + 1 Next i Set cel = cel.Offset(1) End If End Sub Private Sub WriteLineCSV(lgn As String, cel As Range) ' Writing a line of a csv file ' Dim txt As String Dim frm As String Dim lgr As Long Dim nbC As Long Dim t As Variant Dim i As Long If lgn = "" Then Exit Sub t = Split(lgn, sepV) For i = LBound(t) To UBound(t) cel.Offset(0, nbC).FormulaLocal = t(i) nbC = nbC + 1 Next i Set cel = cel.Offset(1) End Sub Private Function ChooseFile(ByVal strExtension As String, Optional ByVal strPath As String = "") As String ' File selection ' Dim dlgBrowse As FileDialog If strPath = "" Then strPath = ThisWorkbook.Path Set dlgBrowse = Application.FileDialog(msoFileDialogFilePicker) With dlgBrowse .InitialFileName = strPath .Title = "Select a file " & strExtension & " :" .AllowMultiSelect = False .InitialView = msoFileDialogViewDetails .ButtonName = "Select file" If .Filters.Count > 0 Then .Filters.Delete .Filters.Add "Files " & strExtension, "*" & strExtension, 1 If .Show = -1 Then ChooseFile = .SelectedItems(1) Else ChooseFile = "" End With Set dlgBrowse = Nothing End Function


--
Best regards
Patrice

No one can hold all the knowledge, that's why we share it.
0
kenza
 
Thank you very much for this code, it works but not for my file which contains about 30,000,000 lines. When I executed the macro with this file, it crashes, which I think is normal because it exceeds the capacity of an Excel file (it can fill up to a million lines and then the code stops)

Is there no way to save the result (the filter) directly without displaying it on the workbook?

Or add a condition (if it exceeds the max number of a sheet, it creates another sheet for example)?
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781
 
That's what the yg_be code does in the post #30
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588
 
What the code suggested in #30 does.
0
kenza > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
Thank you very much for your help, but the code suggested in #30 doesn't work for me.

Can I please ask you to upload it to the sharing site with my example? Maybe I made a mistake somewhere. Also, please let me know if it's possible to highlight the changes I need to make for the code to work on my end.

I apologize for all these questions; I'm really a beginner and I’m having trouble understanding this code.

Thanks once again for your responsiveness (@patrice and @yg_be) :D
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > kenza
 
The code must be in an Excel file located in the same folder as fichier.csv.
It creates a file named fichier4gf.csv, still in the same folder.
0
kenza > Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention  
 
@patrice, the code in post 30 doesn't work for me unfortunately.

your code works well, it's just that I can't figure out how to save the result without displaying it using your code.

can you please help me?
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781
 
Re,

Here is an adaptation of yg_be's code:
https://mon-partage.fr/f/io41NRNu/

--
Best regards
Patrice

No one can hold all knowledge, that's why we share it.
0
Kenza
 
Thank you very much @patrice

The same problem has been running for 2 hours
I don't know if there is a way to optimize the time
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781
 
The file needs to be split into smaller files.

At my place, it takes 12 seconds for 2 million lines.
15 x 12 = 3 minutes

--
Best regards
Patrice

No one can hold all the knowledge, that's why we share it.
0
kenza
 
He just finished the treatment after 4H
I got an error at the end:

runtime error 14
insufficient string space
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588
 
error on the split?
0
kenza > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
No, it's an error that appears after executing the VBA code from @patrice on my file that contains 120,000,000 lines.
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > kenza
 
On which line of code does the error occur?
0
Kenza > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
Normally it is on this line
Do while not EOF(n°F)
0
kenza
 
indeed on a 10 million file it works really well (I will just split my files into smaller files)

thank you all so much you are amazing.

problems solved :D
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781
 
Glad it’s working, hopefully it lasts...
1 million lines 3 days ago, 30 million yesterday, 120 million today...
Your file is growing very quickly!
0
kenza > Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention  
 
hhh yes

there has been a last-minute change, the code_techno field is now in column E and not in column D.

is it possible to adjust the program to take this change into account, please? I tried to do it but it doesn't work :(
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > kenza
 
What did you try?
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
I think we need to replace
colonne = Feuil1.Columns(Feuil1.Range("E4").Value).Column

with
colonne = 5
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781
 
On the launch sheet in E4, just write E instead of D!
0
kenza
 
Alright

Thank you very much
0
f894009 Posted messages 17417 Registration date   Status Membre Last intervention   1 717
 
Hello everyone,

Change of "column"
By replacing the Split with an InStr, it would do the trick

 Do While Not EOF(n°F) Line Input #n°F, line If Not data Then ' Titles Print #n°C, line data = True Else If InStr(line, filter) Then ' Filtered line Print #n°C, line End If End If Loop 
0
Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention   1 781 > f894009 Posted messages 17417 Registration date   Status Membre Last intervention  
 
It's much less rigorous:
- it's no longer a search on the complete field, it also returns a partial field
- the searched string can be found in one of the other fields ...
0
yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention   1 588 > Patrice33740 Posted messages 8400 Registration date   Status Membre Last intervention  
 
If we add the "";"" before and after the filter, it performs a search on the full field (in any column, indeed).
Could it be that the use of multiple splits leads to the error?
runtime error 14
string space insufficient
0
f894009 Posted messages 17417 Registration date   Status Membre Last intervention   1 717 > yg_be Posted messages 23437 Registration date   Status Contributeur Last intervention  
 
Re,
Thanks yg_be, it indeed locks things down better for this search.
0