Compare two CSV files with Python

Solved
Lisana -  
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   -

Hello,

I’m trying to compare 2 CSV files to extract the similarities into another file, but at the moment the program outputs the information from the 2 files.

Here is the program:

import csv
with open('Recherche.csv', 'r',encoding='utf-8') as t1, open('TravailSFE.csv', 'r',encoding='utf-8') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()
with open('update.csv', 'w',encoding='utf-8') as outFile:
    for line in filetwo:
        if line in fileone:
            outFile.write(line)

To explain, the Recherche file contains in the first column the company addresses and in the second column the company SIRENs.

And the Travail file contains just the addresses, and I would therefore like the update file to return the addresses that are similar so as to extract the SIRENs (I’m not sure if that’s clear enough ;))

If you can help me, I’d appreciate it.

13 answers

Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 
Hello,

If I understand correctly, you have two CSV files: one contains company addresses with their SIREN, and the other contains only addresses. You want to create an output file with the common addresses and the corresponding SIREN.

If that's right, first check that both CSV files are properly formatted and that the addresses are written in the same format in both files, to avoid errors during comparison. Then adapt your program as follows:

Read the CSV files with the csv module so that the data is handled in a structured way.
Create a dictionary from the first CSV file that maps each address to its SIREN.
Compare the addresses from the second file with those in the dictionary.
Write the common addresses and their SIREN to the output file.

A small program like this:
import csv

# Read Recherche.csv and build a dictionary mapping addresses to SIREN
adresses_siren = {}
with open('Recherche.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file)
    next(reader)  # Skip the header line; remove this call if your file has no header
    for row in reader:
        adresse = row[0]
        siren = row[1]
        adresses_siren[adresse] = siren

# Read TravailSFE.csv and compare its addresses with those in the dictionary
with open('TravailSFE.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file)
    writer = csv.writer(update_file)

    # Write the header to the output file if necessary
    writer.writerow(['Adresse', 'SIREN'])

    for row in reader:
        adresse = row[0]
        if adresse in adresses_siren:
            writer.writerow([adresse, adresses_siren[adresse]])

print("The comparison is finished. The results have been written to 'update.csv'.")
We use csv.reader to read the files line by line. The header is skipped with next(reader) if your file has one. The adresses_siren dictionary maps each address to its SIREN, which makes lookups fast. For each address in TravailSFE.csv, we check whether it is present in the dictionary. If it is, we write the address and the corresponding SIREN to update.csv.
1
Lisana_69 Posted messages 20 Status Member
 

Thank you very much for your reply

0
Lisana_69 Posted messages 20 Status Member
 

I followed your program but it shows an error:

Traceback (most recent call last):
  File "C:\Users\l.rupert\PycharmProjects\Saleforce\TEST 47.py", line 9, in <module>
    siren = row[1]
            ~~~^^^
IndexError: list index out of range

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 

Well, I added the extra check in the code.

import csv

# Read the file Recherche.csv and create a dictionary for addresses and SIREN
adresses_siren = {}
with open('Recherche.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file)
    next(reader)  # If your file has a header line, otherwise remove this line
    for row in reader:
        # Check that the line has at least two columns (address and SIREN)
        if len(row) < 2:
            print(f"Row ignored (missing column) : {row}")
            continue  # Move to the next row if the line does not contain 2 columns
        adresse = row[0]
        siren = row[1]
        adresses_siren[adresse] = siren

# Read the file TravailSFE.csv and compare the addresses with those in the dictionary
with open('TravailSFE.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file)
    writer = csv.writer(update_file)

    # Write the header to the output file if necessary
    writer.writerow(['Adresse', 'SIREN'])

    for row in reader:
        if len(row) == 0:  # Check if the row is empty
            print(f"Empty row ignored : {row}")
            continue
        adresse = row[0]
        if adresse in adresses_siren:
            writer.writerow([adresse, adresses_siren[adresse]])

print("The comparison is finished. The results have been written to 'update.csv'.")

1
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 

OK

The message you obtained shows that the address in the file contains unexpected characters. The ASCII code [32, 59, 59, 59, 59] represents the following characters:

32: a space ( ),
59: a semicolon (;).


This indicates that some lines in your CSV files contain unexpected or badly formatted characters, such as consecutive semicolons (;;;;). These characters may result from incorrect formatting in the original file or mishandling of the CSV file.
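
To illustrate this diagnosis, here is a minimal sketch of how a line such as " ;;;;" can end up as a single "address". It assumes a semicolon-separated file read with the default comma delimiter; the sample content is made up for the demonstration and is not taken from the real files.

```python
import csv
import io

# Hypothetical sample: a semicolon-separated file whose second line is " ;;;;"
sample = "adresse;siren;code;x;y\n ;;;;\n"

# Read with the default delimiter (comma): each line comes back as ONE field
rows_comma = list(csv.reader(io.StringIO(sample)))

# Read with delimiter=';': the same line is split into five fields
rows_semicolon = list(csv.reader(io.StringIO(sample), delimiter=';'))

first_field = rows_comma[1][0]
codes = [ord(c) for c in first_field]

print(codes)                                        # character codes of the "address"
print(len(rows_comma[1]), len(rows_semicolon[1]))   # number of fields per reading
```

If the field count only looks right with delimiter=';', the file is most likely semicolon-separated.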

I will prepare a new script for you and send it later.


1
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 

IndexError: list index out of range means that the program is trying to access an element in a list (here row[1] for the SIREN), but the line in question does not contain enough elements (columns) to access that index.

This can happen for several reasons:

Some lines in your CSV file do not contain two columns.
There may be empty lines.


There may be formatting issues in the file (such as newline characters or incorrect delimiters).

Check that each line has exactly two columns in the file Recherche.csv.

Add error checks to handle empty or badly formatted lines.
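
As a rough sketch of such a check, the snippet below lists the rows that have too few columns together with their line number, so they can be fixed in the file. The helper name find_short_rows and the sample data are hypothetical; only csv.reader and its line_num attribute come from the standard library.

```python
import csv
import io

def find_short_rows(lines, min_columns=2, delimiter=','):
    """Return (line_number, row) pairs for rows with fewer than min_columns fields."""
    short = []
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if len(row) < min_columns:
            # reader.line_num is the number of source lines read so far
            short.append((reader.line_num, row))
    return short

# Made-up sample: line 3 is empty, line 4 has no SIREN column
sample = io.StringIO("adresse,siren\n12 rue A,111111111\n\n34 rue B\n")
problems = find_short_rows(sample)
for line_number, row in problems:
    print(f"Line {line_number}: {row}")
```

With a real file, you would pass the open file object instead of the io.StringIO sample.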

I have kept your code; I will come back with a version that adds an extra check to ensure that each line contains at least two columns before attempting to access the indices.


0
Lisana_69 Posted messages 20 Status Member
 

Thank you very much for the help you are giving me.

0
Lisana_69 Posted messages 20 Status Member
 
import csv

adresse_siren = {}
with open('BDS.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file)
    next(reader)
    for row in reader:
        adresse = row[0]
        siren = row[1]
        adresse_siren[adresse] = siren

with open('BDT.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file)
    writer = csv.writer(update_file)
    writer.writerow(['adresse', 'siren'])
    for row in reader:
        adresse = row[0]
        if adresse in adresse_siren:
            writer.writerow([adresse, adresse_siren[adresse]])

print("The comparison is finished. The results have been written to 'update.csv'.")

The first image is the BDS CSV (formerly Recherche) and the second is the BDT CSV (Travail SFE).

I have also sent you the program again; perhaps it will help you identify my error more easily.

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 


Your program is very close to the desired result, but it could encounter some of the same issues previously mentioned.

You need to check for empty or poorly formatted lines: some CSV files may contain empty lines or lines with fewer columns than expected. It is always good to check before accessing column indices. That is why I modified the code; please try this:

import csv

adresse_siren = {}

# Read the BDS.csv file and create a dictionary for addresses and SIREN
with open('BDS.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file)
    next(reader)  # Skip header row if it exists
    for row in reader:
        if len(row) < 2:  # Check that the line contains at least 2 columns (address, siren)
            print(f"Line ignored (missing or badly formatted column) : {row}")
            continue  # Skip incorrect lines
        adresse = row[0].strip()  # Remove extra spaces
        siren = row[1].strip()  # Remove extra spaces
        adresse_siren[adresse] = siren

# Read the BDT.csv file and compare addresses with those in the dictionary
with open('BDT.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file)
    writer = csv.writer(update_file)
    writer.writerow(['adresse', 'siren'])  # Write header to the output file
    for row in reader:
        if len(row) == 0:  # Check if the line is empty
            print(f"Empty line ignored : {row}")
            continue
        adresse = row[0].strip()  # Clean up the address
        if adresse in adresse_siren:
            writer.writerow([adresse, adresse_siren[adresse]])

print("The comparison is finished. The results have been written to 'update.csv'.")
0
Lisana_69 Posted messages 20 Status Member
 

I just tried the new program:

it displays the ignored lines in the Python console, but the update file is empty.

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 


If the update.csv file is empty, it means that the program did not find any matches between the addresses in the BDT.csv file and those in the BDS.csv file. This can be due to several reasons, namely:

Inconsistency in the address format (e.g., extra spaces, uppercase/lowercase letters, accents).


Case sensitivity issue (uppercase/lowercase): the addresses might be written differently in the two files, making comparison impossible.


Subtle differences in addresses (such as commas or different abbreviations, for example, "Rue" instead of "R.").

I will add a normalize_string() function to the code that converts addresses to lowercase and removes accents (é, è, à, etc.) and leading or trailing spaces. This will make the addresses in the two files comparable even if they differ slightly in case or accents.

If an address from the BDT.csv file does not match any address in BDS.csv, it will be displayed in the console. This will help you identify any potential differences.
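
As a rough illustration of what such a normalize_string() helper can and cannot do, here is a minimal sketch based on the unicodedata module. The sample addresses are made up; note that abbreviations such as "R." versus "Rue" are not handled by this kind of normalization.

```python
import unicodedata

def normalize_string(s):
    """Lowercase, trim surrounding whitespace and drop accents."""
    s = s.strip().lower()
    # NFKD splits accented letters into base letter + combining mark
    s = unicodedata.normalize('NFKD', s)
    # Encoding to ASCII with 'ignore' drops the combining marks
    s = s.encode('ascii', 'ignore').decode('ascii')
    return s

a = normalize_string("  12 Rue de l'Église ")
b = normalize_string("12 rue de l'eglise")
print(a == b)  # case, accents and outer spaces no longer matter

c = normalize_string("12 R. Victor Hugo")
d = normalize_string("12 Rue Victor Hugo")
print(c == d)  # abbreviations still differ
```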

I will modify the code and send it to you. We will get there, don't despair; as far as I can tell it is only a formatting issue.


0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 

Here, try this one now.

import csv
import unicodedata

# Function to normalize addresses (lowercase, remove accents)
def normalize_string(s):
    s = s.strip().lower()  # Remove spaces and convert to lowercase
    s = unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore').decode('ASCII')  # Remove accents
    return s

adresse_siren = {}

# Read the BDS.csv file and create a dictionary for addresses and SIREN
with open('BDS.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file)
    next(reader)  # Skip header row if it exists
    for row in reader:
        if len(row) < 2:  # Check that the row has at least 2 columns (address, siren)
            print(f"Line ignored (missing or badly formatted columns) : {row}")
            continue  # Skip incorrect rows
        adresse = normalize_string(row[0])  # Normalize the address
        siren = row[1].strip()  # Remove extra spaces for the siren
        adresse_siren[adresse] = siren

# Read the BDT.csv file and compare addresses with those in the dictionary
with open('BDT.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file)
    writer = csv.writer(update_file)
    writer.writerow(['adresse', 'siren'])  # Write the header to the output file
    for row in reader:
        if len(row) == 0:  # Check if the line is empty
            print(f"Empty line ignored : {row}")
            continue
        adresse = normalize_string(row[0])  # Normalize the address in BDT.csv
        if adresse in adresse_siren:
            writer.writerow([row[0], adresse_siren[adresse]])  # Use the original address in the output file
        else:
            print(f"Address not found : {row[0]}")  # Print addresses that do not match

print("The comparison is finished. The results have been written to 'update.csv'.")
0
Lisana_69 Posted messages 20 Status Member > Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention  
 

I’m going to try to get this working in any case, thanks for your help

0
Lisana_69 Posted messages 20 Status Member
 

The update file remains empty even after modifying the program.

I don't understand why, because I have checked that the addresses of the BDS file match those of the BDT.

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 


If the update.csv file remains empty even after modifications and you are sure that the addresses in the BDS.csv and BDT.csv files match, this may indicate a more subtle issue, such as an imperceptible difference in the data (invisible spaces, formatting differences, encoding, etc.).

We will diagnose and resolve this problem step by step.

Try this new code:

import csv
import unicodedata

# Function to normalize addresses (lowercase, remove accents)
def normalize_string(s):
    s = s.strip().lower()  # Remove spaces and convert to lowercase
    s = unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore').decode('ASCII')  # Remove accents
    return s

adresse_siren = {}

# Read BDS.csv and create a dictionary for addresses and SIREN
with open('BDS.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file)
    next(reader)  # Skip header line if it exists
    for row in reader:
        if len(row) < 2:  # Check that the line has at least 2 columns (address, siren)
            print(f"Line ignored (missing or badly formatted column) : {row}")
            continue  # Skip incorrect lines
        adresse = normalize_string(row[0])  # Normalize the address
        siren = row[1].strip()  # Trim extra spaces for the siren
        print(f"Added to dictionary : {row[0]} -> {siren} (normalized : {adresse})")
        adresse_siren[adresse] = siren

# Read BDT.csv and compare addresses with those in the dictionary
with open('BDT.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file)
    writer = csv.writer(update_file)
    writer.writerow(['adresse', 'siren'])  # Write header in the output file
    for row in reader:
        if len(row) == 0:  # Check if the line is empty
            print(f"Empty line ignored : {row}")
            continue
        adresse = normalize_string(row[0])  # Normalize the address in BDT.csv
        print(f"Comparing : {row[0]} (normalized : {adresse})")
        if adresse in adresse_siren:
            print(f"Matching address found : {row[0]} -> {adresse_siren[adresse]}")
            writer.writerow([row[0], adresse_siren[adresse]])  # Use the original address in the output file
        else:
            print(f"Address not found : {row[0]} (normalized : {adresse})")

print("The comparison is finished. The results have been written to 'update.csv'.")

Run this script and look at the lines printed in the console.

If the addresses still do not match despite the normalization, check that BDS.csv and BDT.csv really use the same encoding (UTF-8, for example).


If necessary, try forcing the encoding when opening the files with 'utf-8-sig', by changing the open line to:

with open('BDS.csv', 'r', encoding='utf-8-sig') as recherche_file: 
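
To show why 'utf-8-sig' can matter, here is a small sketch under the assumption that the file was saved as UTF-8 with a byte order mark (BOM), as some spreadsheet exports do. The sample bytes are built in memory for the demonstration.

```python
import csv
import io

# Bytes as a spreadsheet might save them: UTF-8 with a leading BOM (assumed sample)
raw = "adresse;siren\n12 rue A;111111111\n".encode("utf-8-sig")

# Decoded as plain utf-8, the BOM stays glued to the first field of the first row
header_plain = next(csv.reader(io.StringIO(raw.decode("utf-8")), delimiter=";"))

# Decoded as utf-8-sig, the BOM is removed
header_sig = next(csv.reader(io.StringIO(raw.decode("utf-8-sig")), delimiter=";"))

print(repr(header_plain[0]))
print(repr(header_sig[0]))
```

The BOM only affects the very first field of the file, so it matters most when the first row is data rather than a header.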

You could also add this line to reveal any invisible characters:

print(f"Original address (ASCII code): {[ord(c) for c in row[0]]}")
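
As a variation on the same idea, the sketch below also prints the Unicode name of each character, which can make an invisible character easier to spot than a bare number. The helper describe_chars and the sample string are hypothetical.

```python
import unicodedata

def describe_chars(text):
    """Return (character code, Unicode name) for every character in text."""
    return [(ord(c), unicodedata.name(c, "UNKNOWN")) for c in text]

# Made-up address containing a no-break space instead of a normal space
suspect = "12\u00a0rue A"
for code, name in describe_chars(suspect):
    print(code, name)
```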

Once you have identified the problem, we can adjust the program accordingly.

0
Lisana_69 Posted messages 20 Status Member > Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention  
 

After all the changes, the problem remains the same and the console shows me this message:

Original address (ASCII code): [32, 59, 59, 59, 59]

0
Lisana_69 Posted messages 20 Status Member
 

Okay I understand

No problem see you later and thank you very much.

0
mamiemando Posted messages 33228 Registration date   Status Moderator Last intervention   7 940
 

Hello,

So that everyone can test the proposed programs, would it be possible to share the CSV files in question?

Have you considered pandas? Besides making it easy to load CSV files (see pd.read_csv), pandas provides many very efficient primitives for data manipulation. If I understand correctly, the goal here is to join the two files on the siren column (if that's the case, you can use DataFrame.join or pd.merge). To export a dataframe, use its to_csv method.

Example :

fichier1.csv

nom,prenom,siren
solo,han,1111
skywalker,luke,1111
the hutt,jabba,0
vador,dark,333

fichier2.csv

siren,cause
1111,rebellion
333,empire

toto.py

#!/usr/bin/env python3

import pandas as pd

df1 = pd.read_csv("fichier1.csv")
print(df1)
print("-" * 50)
df2 = pd.read_csv("fichier2.csv")
print(df2)
print("-" * 50)
df = df1.set_index("siren").join(df2.set_index("siren"))
print(df)
print("-" * 50)
print(df.to_csv())

Result :

         nom prenom  siren
0       solo    han   1111
1  skywalker   luke   1111
2   the hutt  jabba      0
3      vador   dark    333
--------------------------------------------------
   siren      cause
0   1111  rebellion
1    333     empire
--------------------------------------------------
             nom prenom      cause
siren
1111        solo    han  rebellion
1111   skywalker   luke  rebellion
0       the hutt  jabba        NaN
333        vador   dark     empire
--------------------------------------------------
siren,nom,prenom,cause
1111,solo,han,rebellion
1111,skywalker,luke,rebellion
0,the hutt,jabba,
333,vador,dark,empire

Good luck

0
Lisana_69 Posted messages 20 Status Member
 

Thank you for your response but unfortunately I cannot share the files because they are business contacts.

But I will try to think about it with the pandas function

Thank you very much

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 

Semicolons may be used as the separator instead of commas, which would mean the CSV files are being read with the wrong delimiter.

Here is the code modified to use a semicolon as the separator, in case that is what your files use:

import csv
import unicodedata

# Function to normalize addresses (lowercase, remove accents)
def normalize_string(s):
    s = s.strip().lower()  # Remove spaces and convert to lowercase
    s = unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore').decode('ASCII')  # Remove accents
    return s

adresse_siren = {}

# Read the BDS.csv file with semicolon separator
with open('BDS.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file, delimiter=';')  # Specify the separator
    next(reader)  # Skip header row if present
    for row in reader:
        if len(row) < 2:  # Check that the line contains at least 2 columns (address, siren)
            print(f"Line ignored (missing or badly formatted column) : {row}")
            continue  # Skip incorrect rows
        adresse = normalize_string(row[0])  # Normalize the address
        siren = row[1].strip()  # Remove extra spaces for the siren
        print(f"Added to dictionary : {row[0]} -> {siren} (normalized : {adresse})")
        adresse_siren[adresse] = siren

# Read the BDT.csv file with semicolon separator and compare addresses
with open('BDT.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file, delimiter=';')  # Specify the separator
    writer = csv.writer(update_file)
    writer.writerow(['adresse', 'siren'])  # Write header in the output file
    for row in reader:
        if len(row) == 0:  # Check if the line is empty
            print(f"Empty line ignored : {row}")
            continue
        adresse = normalize_string(row[0])  # Normalize the address in BDT.csv
        print(f"Comparing : {row[0]} (normalized : {adresse})")
        if adresse in adresse_siren:
            print(f"Matching address found : {row[0]} -> {adresse_siren[adresse]}")
            writer.writerow([row[0], adresse_siren[adresse]])  # Use the original address in the output file
        else:
            print(f"Address not found : {row[0]} (normalized : {adresse})")

print("The comparison is finished. The results have been written to 'update.csv'.")

If your files use a semicolon as the separator (which seems to be the case given the ;;;; characters), this lets the program read the columns of the files correctly.

If this solution does not work, it may be worth checking the CSV files manually to make sure the columns really are separated by commas or semicolons.
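
That manual check can also be approximated in a few lines. The sketch below is a simple heuristic, not a robust detector: the hypothetical helper guess_delimiter just counts candidate separators in the header line and would be fooled by a header that itself contains commas or semicolons as text.

```python
def guess_delimiter(header_line, candidates=(';', ',', '\t')):
    """Return the candidate separator that occurs most often, or None if none occurs."""
    counts = {d: header_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

# Made-up header lines for illustration
print(repr(guess_delimiter("adresse;siren;nic")))
print(repr(guess_delimiter("adresse,siren")))
print(repr(guess_delimiter("adresse")))
```

With a real file, you could pass the result of readline() on the opened file and then hand the detected separator to csv.reader through its delimiter argument.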


If your files are not well structured, try re-exporting them with a tool such as Excel or a text editor to make sure they follow a correct CSV format (with clear separators).

Handling CSV files is always tricky, and errors are often related to file formatting. It is often wiser to use spreadsheet tools such as Excel or LibreOffice Calc (which is free and open source) to manipulate CSV files. If you cannot manage it with Python, I recommend using one of these spreadsheets.

You can copy and paste the addresses from one file into a new sheet, then use a formula such as RECHERCHEV (VLOOKUP in English versions of Excel) to associate the SIRENs with the matching addresses.

Example of a RECHERCHEV formula:

=RECHERCHEV(A2;BDS!A:B;2;FAUX)
 

I don't know what other solution to offer you.


0
Lisana_69 Posted messages 20 Status Member
 

The program worked, thank you for your help.

I have one last request: I would like the client code to be displayed together with the associated SIREN (I'm not sure if that's very clear).

0
Lisana_69 Posted messages 20 Status Member
 

With the NIC code as well

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170
 

To add the client code and NIC code to your results, you can integrate them into the input CSV file structure (BDS.csv or BDT.csv, depending on the data source). Here is how you can add them to your results file.

The BDS.csv file contains, in addition to addresses and SIREN, the columns client_code and NIC_code.


You need to extract these two new columns and add them to the update.csv file.

import csv
import unicodedata

# Function to normalize addresses (lowercase, remove accents)
def normalize_string(s):
    s = s.strip().lower()  # Remove spaces and lowercase
    s = unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore').decode('ASCII')  # Remove accents
    return s

adresse_siren = {}

# Read the BDS.csv file with semicolon separator
with open('BDS.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file, delimiter=';')  # Specify the separator, modify if necessary
    next(reader)  # Skip header line if it exists
    for row in reader:
        if len(row) < 4:  # Check that the row has at least 4 columns (address, siren, client_code, NIC_code)
            print(f"Line ignored (missing or badly formatted column) : {row}")
            continue  # Skip incorrect rows
        adresse = normalize_string(row[0])  # Normalize address
        siren = row[1].strip()  # SIREN
        code_client = row[2].strip()  # Client code
        code_nic = row[3].strip()  # NIC code
        print(f"Added to dictionary : {row[0]} -> SIREN: {siren}, Client code: {code_client}, NIC code: {code_nic} (normalized : {adresse})")
        adresse_siren[adresse] = (siren, code_client, code_nic)  # Add a tuple with SIREN, client code, and NIC code

# Read the BDT.csv file and compare addresses
with open('BDT.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file, delimiter=';')  # Specify the separator
    writer = csv.writer(update_file)

    # Write header to output file
    writer.writerow(['adresse', 'siren', 'code_client', 'code_nic'])

    for row in reader:
        if len(row) == 0:  # Check for empty line
            print(f"Empty line ignored : {row}")
            continue
        adresse = normalize_string(row[0])  # Normalize address in BDT.csv
        print(f"Comparing : {row[0]} (normalized : {adresse})")
        if adresse in adresse_siren:
            siren, code_client, code_nic = adresse_siren[adresse]
            print(f"Matching address found : {row[0]} -> SIREN: {siren}, Client code: {code_client}, NIC code: {code_nic}")
            writer.writerow([row[0], siren, code_client, code_nic])  # Use the original address and add the other columns
        else:
            print(f"Address not found : {row[0]} (normalized : {adresse})")

print("The comparison is finished. The results have been written to 'update.csv'.")

The BDS.csv file should be organized like this:

adresse;siren;code_client;code_nic
Adresse 1;123456789;CL12345;NIC001
Adresse 2;987654321;CL54321;NIC002

And the BDT.csv file should have at least the address column:

adresse
Adresse 1
Adresse 2

The update.csv file will contain the corresponding addresses, with the associated SIREN, client code, and NIC code:

adresse,siren,code_client,code_nic
Adresse 1,123456789,CL12345,NIC001
Adresse 2,987654321,CL54321,NIC002

This code now allows you to compare addresses while extracting the associated information such as the SIREN, the client code, and the NIC code into the output file.
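
As an alternative sketch of the same lookup, csv.DictReader lets the columns be referenced by header name rather than by position, which can make the code less sensitive to column order. It assumes the files have the header names shown in the samples above (adresse, siren, code_client, code_nic); the in-memory data and the simplified strip/lower normalization are for illustration only.

```python
import csv
import io

# In-memory stand-ins for BDS.csv and BDT.csv (sample data, semicolon-separated)
bds = io.StringIO("adresse;siren;code_client;code_nic\n"
                  "Adresse 1;123456789;CL12345;NIC001\n"
                  "Adresse 2;987654321;CL54321;NIC002\n")
bdt = io.StringIO("adresse\nAdresse 1\nAdresse 3\n")

# Map each (simplified) normalized address to its full row
lookup = {row["adresse"].strip().lower(): row
          for row in csv.DictReader(bds, delimiter=";")}

matches = []
for row in csv.DictReader(bdt, delimiter=";"):
    found = lookup.get(row["adresse"].strip().lower())
    if found is not None:
        matches.append([row["adresse"], found["siren"],
                        found["code_client"], found["code_nic"]])

print(matches)
```

DictReader consumes the header line itself, so there is no need to call next(reader) to skip it.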

0
Lisana_69 Posted messages 20 Status Member > Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention  
 

Actually, the BDT file contains the address and the customer code, and the BDS file contains the address, the SIREN and the NIC.

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170 > Lisana_69 Posted messages 20 Status Member
 

Thanks for the clarification, so here it is based on these elements:

import csv
import unicodedata

# Function to normalize addresses (lowercase, remove accents)
def normalize_string(s):
    s = s.strip().lower()  # Remove spaces and convert to lowercase
    s = unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore').decode('ASCII')  # Remove accents
    return s

adresse_siren_nic = {}

# Read BDS.csv file (with address, siren, nic)
with open('BDS.csv', 'r', encoding='utf-8') as recherche_file:
    reader = csv.reader(recherche_file, delimiter=';')  # Specify the separator, modify it if needed
    next(reader)  # Skip the header line if it exists
    for row in reader:
        if len(row) < 3:  # Check that the row has at least 3 columns (address, siren, nic)
            print(f"Line ignored (missing column or malformed) : {row}")
            continue  # Skip incorrect lines
        adresse = normalize_string(row[0])  # Normalize the address
        siren = row[1].strip()  # SIREN
        nic = row[2].strip()  # NIC
        print(f"Added to dictionary : {row[0]} -> SIREN: {siren}, NIC: {nic} (normalized: {adresse})")
        adresse_siren_nic[adresse] = (siren, nic)  # Store SIREN and NIC in the dictionary

# Read BDT.csv file (with address, client code) and compare addresses
with open('BDT.csv', 'r', encoding='utf-8') as travail_file, \
     open('update.csv', 'w', encoding='utf-8', newline='') as update_file:
    reader = csv.reader(travail_file, delimiter=';')  # Specify the separator
    writer = csv.writer(update_file)

    # Write the header in the output file
    writer.writerow(['adresse', 'code_client', 'siren', 'nic'])

    for row in reader:
        if len(row) < 2:  # Check that the row has at least 2 columns (address, client code)
            print(f"Line ignored (missing column or malformed) : {row}")
            continue
        adresse = normalize_string(row[0])  # Normalize the address in BDT.csv
        code_client = row[1].strip()  # Client code
        print(f"Comparing: {row[0]} (normalized: {adresse}) with client code: {code_client}")
        if adresse in adresse_siren_nic:  # If the address matches
            siren, nic = adresse_siren_nic[adresse]
            print(f"Matching address found: {row[0]} -> SIREN: {siren}, NIC: {nic}")
            writer.writerow([row[0], code_client, siren, nic])  # Use the original address and add the other columns
        else:
            print(f"Address not found: {row[0]} (normalized: {adresse})")

print("The comparison is complete. The results have been written to 'update.csv'.")
1
Lisana_69 Posted messages 20 Status Member > Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention  
 

Thank you for the time you took to help me with creating this program.

Everything is functional; I just need to clean the databases to find more SIREN numbers.

Have a great day, and perhaps we will have the opportunity to talk again about other Python programs.

0
Bruno83200_6929 Posted messages 724 Registration date   Status Member Last intervention   170 > Lisana_69 Posted messages 20 Status Member
 

It was with great pleasure!!!

1