Discover which product pairs sell together
Son1
-
Son1 -
Son1 -
[Expert Level Question]
Hello everyone,
Be careful, this question seems quite complex and requires a very good level of Excel (or database management)
I have a database (Excel) of ordered items with the order number on the line, as well as the quantity, as follows:
order number | product number | qty
I am looking to identify the pairs (of two or more items) that sell the most together within the same order.
Example:
order number | product number | qty
-----------------------------
1 | A | 1
1 | B | 1
1 | C | 1
2 | C | 1
2 | A | 1
3 | A | 1
3 | C | 1
Results: The most sold pairs together:
1 - Pair AC: with 3 counts
2 - Pair ABC: with 1 count
I am looking to perform this exercise via automation (Excel formula or macro) because I have a database of over 100,000 rows.
What do you think? Is it possible with Excel? Am I forced to process it programmatically via SQL by injecting the data into a database?
Hello everyone,
Be careful, this question seems quite complex and requires a very good level of Excel (or database management)
I have a database (Excel) of ordered items with the order number on the line, as well as the quantity, as follows:
order number | product number | qty
I am looking to identify the pairs (of two or more items) that sell the most together within the same order.
Example:
order number | product number | qty
-----------------------------
1 | A | 1
1 | B | 1
1 | C | 1
2 | C | 1
2 | A | 1
3 | A | 1
3 | C | 1
Results: The most sold pairs together:
1 - Pair AC: with 3 counts
2 - Pair ABC: with 1 count
I am looking to perform this exercise via automation (Excel formula or macro) because I have a database of over 100,000 rows.
What do you think? Is it possible with Excel? Am I forced to process it programmatically via SQL by injecting the data into a database?
5 réponses
Hello Eric, JvDo
Thank you Eric for this info about the wild post-crossing
So, I'm not going to bother writing code for Son1
This is what I had planned: virtual tables and 2 dictionary objects
Here’s the idea;
after sorting
in the first table (memory of the sheet), dico1 goes through the orders and generates the items (separated by a space) for each order, which will give the list of items ((dico.items) as needed, taking into account the number of sold pairs - for example, CA quantity 3 would give: CA CA CA
with dico2: we go through the item-list with a split of each line so that, for each pair encountered, we increment in item. The list of keys obtained (pair) next to the list of items corresponds to the count of where the pair is sold
apparently, it’s a job but not so complicated (I didn’t see the need to use SQL)
I'm leaving this discussion
For Son1
First of all, I don't think the request is feasible in Excel
So why ask the question on Excel forums?
Michel
Thank you Eric for this info about the wild post-crossing
So, I'm not going to bother writing code for Son1
This is what I had planned: virtual tables and 2 dictionary objects
Here’s the idea;
after sorting
in the first table (memory of the sheet), dico1 goes through the orders and generates the items (separated by a space) for each order, which will give the list of items ((dico.items) as needed, taking into account the number of sold pairs - for example, CA quantity 3 would give: CA CA CA
with dico2: we go through the item-list with a split of each line so that, for each pair encountered, we increment in item. The list of keys obtained (pair) next to the list of items corresponds to the count of where the pair is sold
apparently, it’s a job but not so complicated (I didn’t see the need to use SQL)
I'm leaving this discussion
For Son1
First of all, I don't think the request is feasible in Excel
So why ask the question on Excel forums?
Michel
Hello michel_m,
Thank you for your response on the subject; it further strengthens my decision.
No, of course, I'm not asking for help with the code; I am capable of doing it myself.
As for the question about Excel, no, I didn't initially know that it wasn't feasible on Excel; it's the result of reflection and brainstorming with people skilled in Excel that led me to this conclusion.
Regarding the wild post-crossing, I do not share your view at all: just because I spread information across multiple channels doesn't discredit each of those channels. On the contrary, it's much better for those who follow, as it will be easier for them to find the answer to this issue (and in fact, the "solution" came much more quickly thanks to post-crossing). You are having a visceral reaction to the post-crossing simply out of unconditional love for your preferred communication channel.
Thank you for your response on the subject; it further strengthens my decision.
No, of course, I'm not asking for help with the code; I am capable of doing it myself.
As for the question about Excel, no, I didn't initially know that it wasn't feasible on Excel; it's the result of reflection and brainstorming with people skilled in Excel that led me to this conclusion.
Regarding the wild post-crossing, I do not share your view at all: just because I spread information across multiple channels doesn't discredit each of those channels. On the contrary, it's much better for those who follow, as it will be easier for them to find the answer to this issue (and in fact, the "solution" came much more quickly thanks to post-crossing). You are having a visceral reaction to the post-crossing simply out of unconditional love for your preferred communication channel.
Good evening,
You should provide us with a more substantial table (10 or 15,000 rows) of your data (of course anonymized), and close to your reality.
Your example is too simple for you to generalize a solution for your 100,000 rows.
How many different product numbers do you have, how many different product numbers at most in an order, are the quantities always 1?
If you have 5 A, 2 B, and 7 C in an order, do you consider that you have 2 triplets (A, B, C), 5 pairs (A, C), 2 (A, B), and 2 (B, C)?
Best regards.
You should provide us with a more substantial table (10 or 15,000 rows) of your data (of course anonymized), and close to your reality.
Your example is too simple for you to generalize a solution for your 100,000 rows.
How many different product numbers do you have, how many different product numbers at most in an order, are the quantities always 1?
If you have 5 A, 2 B, and 7 C in an order, do you consider that you have 2 triplets (A, B, C), 5 pairs (A, C), 2 (A, B), and 2 (B, C)?
Best regards.
I do not agree; with this example, we can determine a process that will work over an infinite number of lines.
After reflection on the subject, the quantity of articles is not useful for the study.
The number of lines will only affect the performance of the calculation and the choice of technology: I have 1 million lines, and that makes me think that it will not be feasible in Excel...!
After reflection on the subject, the quantity of articles is not useful for the study.
The number of lines will only affect the performance of the calculation and the choice of technology: I have 1 million lines, and that makes me think that it will not be feasible in Excel...!
Hello,
it's a substantial task you're asking for.
Could you point out the other requests so we don't search for nothing if they've already been found elsewhere?
https://forum.excel-pratique.com/viewtopic.php?forum_uri=excel&t=101297&start=
eric
--
By continuously trying, we eventually succeed.
So the more it fails, the more chances we have that it works. (the Shadoks)
In addition to the thank you (yes, yes, it happens!!!), remember to mark as resolved. Thank you
it's a substantial task you're asking for.
Could you point out the other requests so we don't search for nothing if they've already been found elsewhere?
https://forum.excel-pratique.com/viewtopic.php?forum_uri=excel&t=101297&start=
eric
--
By continuously trying, we eventually succeed.
So the more it fails, the more chances we have that it works. (the Shadoks)
In addition to the thank you (yes, yes, it happens!!!), remember to mark as resolved. Thank you
Small answer to my own question:
- First of all, I think the request is not feasible in Excel, as it would require about 500,000 searches in a million rows: this would likely crash Excel (to be verified?).
- The solution I had already thought of and which has been confirmed to me on another forum is the following: Create an algorithm that sorts the orders and identifies the x pairs from each order (we insert them into a key-value dictionary), and for each of these x pairs, search in the million rows of orders to see if there is a presence of that pair and count it. (In the case where an already existing pair is identified in another order: do not overwrite the pair but increment it). I intend to do this in PL/SQL.
Well, this is certainly not the ideal solution, especially since it won't be done in Excel... If anyone has a better solution, I am of course open to suggestions!
- First of all, I think the request is not feasible in Excel, as it would require about 500,000 searches in a million rows: this would likely crash Excel (to be verified?).
- The solution I had already thought of and which has been confirmed to me on another forum is the following: Create an algorithm that sorts the orders and identifies the x pairs from each order (we insert them into a key-value dictionary), and for each of these x pairs, search in the million rows of orders to see if there is a presence of that pair and count it. (In the case where an already existing pair is identified in another order: do not overwrite the pair but increment it). I intend to do this in PL/SQL.
Well, this is certainly not the ideal solution, especially since it won't be done in Excel... If anyone has a better solution, I am of course open to suggestions!
The sorting is fine, however, I would like to point out that I have both product references as integers and others as varchar (so I'm not sorting the numbers in ascending order but in alphabetical order).