Open a large CSV file + make queries
Solved
Julien002
-
julien002 -
julien002 -
Good evening,
I have 3 CSV files that are too large (one CSV with over 25 million lines).
I need to open them to make a simple query to find out how many times certain companies appear in a column.
However, I don't know how to open these documents and perform this query.
Excel cannot open files that are too large, and I don't have knowledge of programming or databases.
Do you have a solution? Thank you in advance
Julien
I have 3 CSV files that are too large (one CSV with over 25 million lines).
I need to open them to make a simple query to find out how many times certain companies appear in a column.
However, I don't know how to open these documents and perform this query.
Excel cannot open files that are too large, and I don't have knowledge of programming or databases.
Do you have a solution? Thank you in advance
Julien
5 réponses
Hello,
With a recent version of "Excel," you can go to 'Data / From a Text/CSV file' to process it with Power Query.
Import your file, at the bottom click on 'Transform Data.'
Select your column and 'Group By,' then confirm.
Right-click on the columns of the table if you want to add any treatments; otherwise, click 'Close & Load' to retrieve your table in a sheet.
Eric
--
By continuously trying, we eventually succeed.
So the more it fails, the more chances we have that it works. (the Shadoks)
In addition to the thank you (yes, yes, it happens!!!), remember to mark it as resolved. Thank you.
With a recent version of "Excel," you can go to 'Data / From a Text/CSV file' to process it with Power Query.
Import your file, at the bottom click on 'Transform Data.'
Select your column and 'Group By,' then confirm.
Right-click on the columns of the table if you want to add any treatments; otherwise, click 'Close & Load' to retrieve your table in a sheet.
Eric
--
By continuously trying, we eventually succeed.
So the more it fails, the more chances we have that it works. (the Shadoks)
In addition to the thank you (yes, yes, it happens!!!), remember to mark it as resolved. Thank you.
Hello,
Excel is limited to just over a million rows.
Check with Access.
--
a stranger is a friend we haven't met yet.
Excel is limited to just over a million rows.
Check with Access.
--
a stranger is a friend we haven't met yet.
Hello Eric,
You will find below the link with dummy data:
https://www.cjoint.com/c/JFvnfS7RuAH
My request is to keep only the rows where there is "19" in the "date" column and to retrieve the count of occurrences of the values present in the "Company" column.
Thank you in advance
Julien
You will find below the link with dummy data:
https://www.cjoint.com/c/JFvnfS7RuAH
My request is to keep only the rows where there is "19" in the "date" column and to retrieve the count of occurrences of the values present in the "Company" column.
Thank you in advance
Julien
I wasn't familiar with Power Query in Excel. Like any tool, you need to learn the basics before diving in.
https://docs.microsoft.com/en-us/powerquery-m/
Here’s what I applied, following the instructions from eriiic, in Excel 2016: Data/New Query/From File/CSV File, provide the CSV file location.
Once the file is displayed click Edit.
On the first date, right-click/Date Filter/After or Equal, this generates a query in the right block "Filtered Rows", click on this query name, you can then select >=01/01/2019 and <01/01/2020, which gives you all of 2019.
In the menu bar "Group By"/Company/Count/Count.
Then close and load, we then have the result in Excel:
And if we go back to PQ, on "Advanced Editor", we obtain the query, where we find the selection and grouping. I think it would be wise to learn the language for more advanced needs that are not covered by guided queries.
By importing the CSV file again and applying the above code in "Advanced Editor", we get the result without going through the guided steps. The code is therefore portable.
https://docs.microsoft.com/en-us/powerquery-m/
Here’s what I applied, following the instructions from eriiic, in Excel 2016: Data/New Query/From File/CSV File, provide the CSV file location.
Once the file is displayed click Edit.
On the first date, right-click/Date Filter/After or Equal, this generates a query in the right block "Filtered Rows", click on this query name, you can then select >=01/01/2019 and <01/01/2020, which gives you all of 2019.
In the menu bar "Group By"/Company/Count/Count.
Then close and load, we then have the result in Excel:
Company Count Apple 2 Cisco 1 Microsoft 1
And if we go back to PQ, on "Advanced Editor", we obtain the query, where we find the selection and grouping. I think it would be wise to learn the language for more advanced needs that are not covered by guided queries.
let Source = Csv.Document(File.Contents("D:\JFvnfS7RuAH_test.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]), #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]), #"Modified Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Company", type text}, {"Date", type date}, {"City", type text}}), #"Filtered Rows" = Table.SelectRows(#"Modified Type", each [Date] >= #date(2019, 1, 1) and [Date] < #date(2020, 1, 1)), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Company"}, {{"Count", each Table.RowCount(_), type number}}) in #"Grouped Rows" By importing the CSV file again and applying the above code in "Advanced Editor", we get the result without going through the guided steps. The code is therefore portable.
I have one last request.
Thanks to Eric, I was able to retrieve the number of occurrences of the values present in one column.
I would like to do the same thing after applying a filter in another column. Basically, in another column, there are dates in the format dd/mm/YY. I would like to keep the values from 2019 and then repeat the process to retrieve the number of occurrences in one column.
Thank you in advance
Julien
if you want us to test, upload a csv with some dummy values.
cjoint.com and paste the provided link here.
eric