Read an Excel file without opening it with VBA
Solved
kenza
-
f894009 Posted messages 17417 Registration date Status Membre Last intervention -
f894009 Posted messages 17417 Registration date Status Membre Last intervention -
Hello,
I want to extract data from a very large Excel file (read the file without opening it).
The data I want to extract contains the following information:
Cells from the code-techno field = '4GF'. --> [this field is located in column D]
Can you please help me find a solution with VBA?
Thank you in advance for your assistance.
Best regards,
Kenza
I want to extract data from a very large Excel file (read the file without opening it).
The data I want to extract contains the following information:
Cells from the code-techno field = '4GF'. --> [this field is located in column D]
Can you please help me find a solution with VBA?
Thank you in advance for your assistance.
Best regards,
Kenza
14 réponses
yg_be
Posted messages
23437
Registration date
Status
Contributeur
Last intervention
Ambassadeur
1 588
Hello,
I don't really see how it's possible without opening it.
I don't really see how it's possible without opening it.
Hello,
We always have to open a file to access its content!
To claim that we access a closed file is a misuse of language.
We can access a file without needing to load it into an application (unlike Excel or a text editor), for example by directly accessing the content with Open pathname For ... or by opening a stream with ADODB.Stream.
We can then use line-by-line reading to extract only what we want.
Your request is rather vague; what exactly do you want to extract?
Only data from one field, multiple fields, complete records?
What are the criteria: equals, contains, ...?
That said, CSV is not a standardized format; it needs to be specified:
- the record (line) separator
- the field (cell) separator
- the text identifier (in case the text contains one of the above separators).
- the decimal separator (point or comma)
- the thousands separator if it is used.
Best regards
Patrice
No one can possess all knowledge; that’s why we share it.
We always have to open a file to access its content!
To claim that we access a closed file is a misuse of language.
We can access a file without needing to load it into an application (unlike Excel or a text editor), for example by directly accessing the content with Open pathname For ... or by opening a stream with ADODB.Stream.
We can then use line-by-line reading to extract only what we want.
Your request is rather vague; what exactly do you want to extract?
Only data from one field, multiple fields, complete records?
What are the criteria: equals, contains, ...?
That said, CSV is not a standardized format; it needs to be specified:
- the record (line) separator
- the field (cell) separator
- the text identifier (in case the text contains one of the above separators).
- the decimal separator (point or comma)
- the thousands separator if it is used.
Best regards
Patrice
No one can possess all knowledge; that’s why we share it.
Yes, you are absolutely right, I just didn't express my problem clearly.
I want to extract data from an Excel file without having to load the workbook. I just want to process this file and retrieve data without opening it.
Regarding the data I want to extract:
There is a field called code_techno that is located in column D which contains several pieces of information; I want to apply a filter on this field and keep only the info "4GF" and then copy all the other columns and paste them into another Excel file.
Thank you for your help @patrice :D
I want to extract data from an Excel file without having to load the workbook. I just want to process this file and retrieve data without opening it.
Regarding the data I want to extract:
There is a field called code_techno that is located in column D which contains several pieces of information; I want to apply a filter on this field and keep only the info "4GF" and then copy all the other columns and paste them into another Excel file.
Thank you for your help @patrice :D
Hello again everyone
Kenza:
data from an excel file
It's a csv (text) file or Excel because you switch between the two depending on the moment
Do you know VBA programming?
Kenza:
data from an excel file
It's a csv (text) file or Excel because you switch between the two depending on the moment
Do you know VBA programming?
"All fields are separated by "' ; '""
This confirms that this "csv" file is not in the "standard" format; csv means in French, values separated by commas.
It is therefore important to answer all questions, including that of f894009.
In general, the rules are as follows:
- the line separator is the pair CR LF (13 10 or in hex 0D 0A) but it can simply be CR or LF.
- the field separator is the comma, but in French-speaking areas, it is often the semicolon, especially when the comma is used as a decimal separator.
- the text identifier is the double quote (") but sometimes it is the apostrophe ('). It is not necessarily used. It is mandatory when a field contains a line separator and/or a field separator; in this case, if the field contains a text identifier, it is doubled. Sometimes it is systematically placed on all fields.
- the decimal separator is the point, but when the field separator is the semicolon, the decimal separator is often the comma.
- the thousands separator is the apostrophe (') but it is not always used, and when the decimal separator is the comma, the thousands separator is a space.
- some csv files advance to the next record (to the next line) when all the following fields are empty, hence the question of f894009 regarding the number of fields in each line.
As you can see, there is no standard csv, hence the importance of knowing the structure of your file precisely. This requires analyzing the file with a text editor and hexadecimal code.
On the other hand, to extract the lines, you need to know the extraction criteria precisely.
Edit: CR stands for carriage return and LF for line feed, like on old typewriters.
This confirms that this "csv" file is not in the "standard" format; csv means in French, values separated by commas.
It is therefore important to answer all questions, including that of f894009.
In general, the rules are as follows:
- the line separator is the pair CR LF (13 10 or in hex 0D 0A) but it can simply be CR or LF.
- the field separator is the comma, but in French-speaking areas, it is often the semicolon, especially when the comma is used as a decimal separator.
- the text identifier is the double quote (") but sometimes it is the apostrophe ('). It is not necessarily used. It is mandatory when a field contains a line separator and/or a field separator; in this case, if the field contains a text identifier, it is doubled. Sometimes it is systematically placed on all fields.
- the decimal separator is the point, but when the field separator is the semicolon, the decimal separator is often the comma.
- the thousands separator is the apostrophe (') but it is not always used, and when the decimal separator is the comma, the thousands separator is a space.
- some csv files advance to the next record (to the next line) when all the following fields are empty, hence the question of f894009 regarding the number of fields in each line.
As you can see, there is no standard csv, hence the importance of knowing the structure of your file precisely. This requires analyzing the file with a text editor and hexadecimal code.
On the other hand, to extract the lines, you need to know the extraction criteria precisely.
Edit: CR stands for carriage return and LF for line feed, like on old typewriters.
suggestion:
Option Explicit Sub kenza4gf() Dim source_file As String Dim dest_file As String Dim numin As Long, numout As Long, valss() As String, line As String source_file = ThisWorkbook.Path + "\file.csv" dest_file = ThisWorkbook.Path + "\file4gf.csv" numin = FreeFile 'open the file for reading Open source_file For Input As #numin numout = FreeFile Open dest_file For Output As #numout 'loop until the end of the file While Not EOF(numin) Line Input #numin, line valss = Split(line, ";", 5) If UBound(valss) > 2 Then If valss(3) = "4GF" Then Print #numout, line End If End If Wend Close #numin 'close the file Close #numout End Sub
Hello
An example with a stream:
https://mon-partage.fr/f/6rAx7eFH/
The code:
--
Best regards
Patrice
No one can hold all the knowledge, that's why we share it.
An example with a stream:
https://mon-partage.fr/f/6rAx7eFH/
The code:
'———————————————————————————————————————————————————————————————————————————————————————————————————————————————— ' Module: mImportCsvFiltre ' Object: Reading and filtering line by line of a large UTF8 csv file '———————————————————————————————————————————————————————————————————————————————————————————————————————————————— ' 12/21/2020 Patrice33740 V1-0-00 Creation ' ' Establish the reference to ADODB: Microsoft ActiveX Data Objects 6.1 Library ' Option Explicit ' ' Separators Const sepL$ = vbCrLf 'line separator Const sepV$ = ";" 'value separator Const idTxt$ = """" 'text identifier chr(34) ' Sub Read_csv_UTF8_filter() ' File choice and reading csv while filtering ' Dim wbk As Workbook Dim fullName As Variant, column As Long, filter As Variant On Error Resume Next column = Feuil1.Columns(Feuil1.Range("E4").Value).Column If Err <> 0 Then MsgBox "Incorrect column", vbCritical: Exit Sub On Error GoTo 0 If Feuil1.Range("E5").Text = "" Then MsgBox "Specify the value to filter", vbExclamation: Exit Sub filter = Feuil1.Range("E5").Value ' File choice fullName = ChooseFile(".csv", ThisWorkbook.Path & "\") If fullName = "" Then Exit Sub Set wbk = Read_Filter_csv_UTF8(fullName, column, filter) wbk.Saved = True End Sub Private Function Read_Filter_csv_UTF8(ByVal fullFileName As String, col As Long, filter As Variant) As Workbook ' Reading and filtering line by line of a [very large] csv file encoded in UTF8 (with or without BOM) ' Dim fUtf8 As ADODB.Stream Dim wbk As Excel.Workbook Dim cel As Range Dim txt As String Dim lgn As String Dim lgr As Long Application.ScreenUpdating = False Application.Calculation = xlCalculationManual Set wbk = Application.Workbooks.Add(xlWBATWorksheet) Set cel = wbk.Worksheets(1).Range("A1") Set fUtf8 = New Stream With fUtf8 .Charset = "utf-8" .Mode = adModeReadWrite .Type = adTypeText .LineSeparator = adCRLF .Open .LoadFromFile fullFileName Do Until .EOS lgn = .ReadText(-2) '-2 = one line If cel.Row = 1 Then Call WriteLineCSV(lgn, cel) Else Call FilterLineCSV(lgn, cel, col, filter) End If Loop .Close End With Set fUtf8 = Nothing wbk.Worksheets(1).Columns.AutoFit wbk.Worksheets(1).Rows.AutoFit Application.Calculation = xlCalculationAutomatic Application.ScreenUpdating = True Set Read_Filter_csv_UTF8 = wbk End Function Private Sub FilterLineCSV(lgn As String, cel As Range, col As Long, filter As Variant) ' Filtering and writing a line of a csv file ' Dim txt As String Dim frm As String Dim lgr As Long Dim nbC As Long Dim t As Variant Dim i As Long If lgn = "" Then Exit Sub t = Split(lgn, sepV) If t(col - 1) = filter Then For i = LBound(t) To UBound(t) cel.Offset(0, nbC).FormulaLocal = t(i) nbC = nbC + 1 Next i Set cel = cel.Offset(1) End If End Sub Private Sub WriteLineCSV(lgn As String, cel As Range) ' Writing a line of a csv file ' Dim txt As String Dim frm As String Dim lgr As Long Dim nbC As Long Dim t As Variant Dim i As Long If lgn = "" Then Exit Sub t = Split(lgn, sepV) For i = LBound(t) To UBound(t) cel.Offset(0, nbC).FormulaLocal = t(i) nbC = nbC + 1 Next i Set cel = cel.Offset(1) End Sub Private Function ChooseFile(ByVal strExtension As String, Optional ByVal strPath As String = "") As String ' File selection ' Dim dlgBrowse As FileDialog If strPath = "" Then strPath = ThisWorkbook.Path Set dlgBrowse = Application.FileDialog(msoFileDialogFilePicker) With dlgBrowse .InitialFileName = strPath .Title = "Select a file " & strExtension & " :" .AllowMultiSelect = False .InitialView = msoFileDialogViewDetails .ButtonName = "Select file" If .Filters.Count > 0 Then .Filters.Delete .Filters.Add "Files " & strExtension, "*" & strExtension, 1 If .Show = -1 Then ChooseFile = .SelectedItems(1) Else ChooseFile = "" End With Set dlgBrowse = Nothing End Function --
Best regards
Patrice
No one can hold all the knowledge, that's why we share it.
Thank you very much for this code, it works but not for my file which contains about 30,000,000 lines. When I executed the macro with this file, it crashes, which I think is normal because it exceeds the capacity of an Excel file (it can fill up to a million lines and then the code stops)
Is there no way to save the result (the filter) directly without displaying it on the workbook?
Or add a condition (if it exceeds the max number of a sheet, it creates another sheet for example)?
Is there no way to save the result (the filter) directly without displaying it on the workbook?
Or add a condition (if it exceeds the max number of a sheet, it creates another sheet for example)?
Thank you very much for your help, but the code suggested in #30 doesn't work for me.
Can I please ask you to upload it to the sharing site with my example? Maybe I made a mistake somewhere. Also, please let me know if it's possible to highlight the changes I need to make for the code to work on my end.
I apologize for all these questions; I'm really a beginner and I’m having trouble understanding this code.
Thanks once again for your responsiveness (@patrice and @yg_be) :D
Can I please ask you to upload it to the sharing site with my example? Maybe I made a mistake somewhere. Also, please let me know if it's possible to highlight the changes I need to make for the code to work on my end.
I apologize for all these questions; I'm really a beginner and I’m having trouble understanding this code.
Thanks once again for your responsiveness (@patrice and @yg_be) :D
Re,
Here is an adaptation of yg_be's code:
https://mon-partage.fr/f/io41NRNu/
--
Best regards
Patrice
No one can hold all knowledge, that's why we share it.
Here is an adaptation of yg_be's code:
https://mon-partage.fr/f/io41NRNu/
--
Best regards
Patrice
No one can hold all knowledge, that's why we share it.
The file needs to be split into smaller files.
At my place, it takes 12 seconds for 2 million lines.
15 x 12 = 3 minutes
--
Best regards
Patrice
No one can hold all the knowledge, that's why we share it.
At my place, it takes 12 seconds for 2 million lines.
15 x 12 = 3 minutes
--
Best regards
Patrice
No one can hold all the knowledge, that's why we share it.
He just finished the treatment after 4H
I got an error at the end:
runtime error 14
insufficient string space
I got an error at the end:
runtime error 14
insufficient string space
indeed on a 10 million file it works really well (I will just split my files into smaller files)
thank you all so much you are amazing.
problems solved :D
thank you all so much you are amazing.
problems solved :D
What is the exact format of this file?
Your explanation is not very clear; I assume you want to start by examining all the values in column D?
For the file format, it is a CSV extension.
Regarding your third question, yes, we need to examine the entire column D and search for the word '4GF', then extract the entire rows (copy them to another Excel file, for example).
A CSV file can simply be opened as a text file. Have you tried that?