Batch: change accented characters

MarcusDom Posted messages 53 Status Membre
Hello,

Configuration: Windows XP / Internet Explorer 8.0

I would like to create a scheduled task that calls a .bat file to replace every accented character with its unaccented equivalent (é -> e, for example).
After going through batch commands, I must admit I'm a bit lost when it comes to performing this kind of task. Removing a line, copying a file, or launching a command is fine, but this is a bit complicated. If someone could help me with this problem, it would be really nice; this seems to be the last step of my project.

The context is that I am importing an LDIF file into an OpenLDAP directory and it does not handle accents well, especially after the modifications I have already made (é becomes ‚).

Thank you.

6 replies

nirG95 Posted messages 319 Status Membre 32
 
Have a look at sed ;)
0
nirG95 Posted messages 319 Status Membre 32
 
Otherwise, you can do something like this:

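rem Read fichier1.txt line by line (for /f skips empty lines)
for /f "delims=" %%a in ('type fichier1.txt') do call :commande "%%a"
goto :eof

:commande
rem %1 is the quoted line passed by call; the next line strips the quotes
set ligne=%1
set ligne=%ligne:"=%
rem Replace each accented character with its unaccented equivalent
set ligne=%ligne:é=e%
set ligne=%ligne:è=e%
set ligne=%ligne:à=a%
@echo %ligne% >>fichier2.txt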


Fichier1.txt: I went to the beach.
Fichier2.txt: I went to the beach.

Then it's up to you to modify it as you see fit!

Best regards.
0
nirG95 Posted messages 319 Status Membre 32
 
To keep the accents, have you tried replacing é with ‚?

for /f "delims=" %%a in ('type lol.txt') do call :commande "%%a"
goto :eof

:commande
set ligne=%1
set ligne=%ligne:"=%
set ligne=%ligne:é=‚%
set ligne=%ligne:à=…%
@echo %ligne% >>fichier2.txt


Since the site does not accept special characters, I am attaching the file directly.

http://www.cijoint.fr/cjlink.php?file=cj201008/cijIXP2wlV.zip

File1.txt: I went to the beach.
File2.txt: This morning I went, ... to the beach.

@CMD

C:\>type fichier2.txt
This morning I went to the beach.


PS: The site does not accept these characters. To find the characters that will replace the accents, launch CMD and run edit (for example: C:\> edit Fichier1.txt), type your accented characters (for example: é à è ù, etc.), save the file, then open it with Notepad and you will see the replacement characters.

Best regards.
0
MarcusDom
 
Thank you for your quick responses, but I'm having a lot of problems.
For sed, I took a look but apparently it doesn't work.
Then for your first script, it changes my é to ‚

Can you tell me what set ligne=%1
and set ligne=%ligne:"=% mean?

Also, it removes my empty lines.
I'm working with LDIF files encoded in UTF-16LE, and as soon as I use the script, the output file is saved as ANSI.
Apart from that they behave like text files; the biggest constraint is that I have to do everything in batch for a scheduled task.

I'm now going to take a look at your second solution.

Thank you for your contribution :)
0
nirG95 Posted messages 319 Status Membre 32
 
I gave you two scripts! The first one replaces é with e; the second one replaces é with ‚ (which is the é in batch).

The line set ligne=%ligne:"=% removes the double quotes (") around the line.

But my first script works.


for /f "delims=" %%a in ('type fichier1.txt') do call :commande "%%a"
goto :eof

:commande
set ligne=%1
set ligne=%ligne:"=%
set ligne=%ligne:é=e%
set ligne=%ligne:è=e%
set ligne=%ligne:à=a%
@echo %ligne% >>fichier2.txt


Best regards.
0
MarcusDom
 
Re

The first script works but only if I convert my file to ANSI before using your script.
And it removes my empty lines between my entries, which prevents me from adding to the directory.
0
MarcusDom Posted messages 53 Status Membre
 
(Excuse me if I'm posting the same message multiple times, but they are not showing up in the discussion)

Your first script works but only if my file is converted to ANSI before using your script. And it removes my empty lines from the file, which is essential for distinguishing between two entries.

Is there a batch command that converts a document to ANSI?
And also a command to add an empty line after a certain number of lines? For example, a modulo 10 so that a line is inserted after every entry.

Thank you
0
MarcusDom
 
Re

I tried your code but it only works if my file is converted to ANSI. Perhaps there is a way to convert a UTF-16LE file to ANSI via command line.
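One possible answer, sketched under assumptions: if a port of iconv is installed next to sed (iconv is not part of Windows itself), it can convert the file on the command line before the substitutions run. The file names below are hypothetical and the syntax is POSIX shell; in cmd the sed expression would need double quotes instead of single quotes. Converting to UTF-8 rather than ANSI is a deliberate swap: provided every non-ASCII character is covered by a substitution, the result is plain ASCII, which reads the same in both encodings.

```shell
# Hypothetical sample: build a small UTF-16LE file to stand in for the real export.
printf 'dn: cn=André\n\ngivenName: André\n' | iconv -f UTF-8 -t UTF-16LE > entree.ldif

# Decode UTF-16LE to UTF-8, then strip the accent in the same pipeline.
# Empty lines pass through iconv and sed untouched.
iconv -f UTF-16LE -t UTF-8 entree.ldif | sed -e 's/é/e/g' > sortie.ldif

cat sortie.ldif
```

Two caveats: the accented characters in the sed expression have to be in the same encoding as the data sed receives (UTF-8 here), and if the real file starts with a byte order mark it will probably come through as an invisible character at the start of the first line.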

And the other issue is that it removes all empty lines.
So can we:
- Add a line after each entry? The last line of each entry ends with givenName: entry name
- Add two empty lines at the end of the file, it's a standard for ldif files.
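A minimal sketch of both points with sed, assuming (as described above) that every entry ends with a givenName line and that the file is already in a single-byte or UTF-8 encoding. The file names are hypothetical and the syntax is POSIX shell; in cmd the sed expression would take double quotes.

```shell
# Hypothetical input: two entries whose blank separator lines have been lost.
printf 'dn: cn=a\ncn: a\ngivenName: Anne\ndn: cn=b\ncn: b\ngivenName: Bob\n' > sans_vides.ldif

# G appends the (empty) hold space after each matching line, i.e. one blank line.
sed '/^givenName:/G' sans_vides.ldif > avec_vides.ldif

# The last givenName line already received one blank line; add one more
# so that the file ends with two empty lines.
printf '\n' >> avec_vides.ldif

cat avec_vides.ldif
```

This avoids counting lines (the modulo idea), so entries with a different number of attributes are still separated correctly.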

The second code does not work, importing into the program generates errors, but just one solution is enough for me :)
0
MarcusDom Posted messages 53 Status Membre
 
Something new, I used the sed command for standardizing characters and it works pretty well :)

I recommend it, there are a few subtleties regarding the Linux syntax, here's how I used it:

sed s/é/e/ fichier.ldf > temp.ldf
sed s/è/e/ temp.ldf > temp2.ldf
sed s/ô/o/ temp2.ldf > temp3.ldf
sed s/ê/e/ temp3.ldf > temp4.ldf
sed s/î/i/ temp4.ldf > temp5.ldf
sed s/û/u/ temp5.ldf > fichier.ldf
del temp.ldf
del temp2.ldf
del temp3.ldf
del temp4.ldf
del temp5.ldf

I don't know if it's optimal, but at least it works, and you have to repeat some lines for words that might have the same character multiple times.
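For what it's worth, the chain of temporary files can probably be collapsed into one sed call with several -e expressions. The sketch below assumes a POSIX shell (in cmd, use double quotes around each expression and move /y instead of mv) and uses a made-up sample in place of the real fichier.ldf.

```shell
# Hypothetical sample file standing in for fichier.ldf.
printf 'givenName: Hélène\ndescription: été, forêt, sûr\n' > fichier.ldf

# One sed call, one -e per substitution; the trailing g handles
# characters that appear several times on the same line.
sed -e 's/é/e/g' -e 's/è/e/g' -e 's/ô/o/g' \
    -e 's/ê/e/g' -e 's/î/i/g' -e 's/û/u/g' \
    fichier.ldf > temp.ldf && mv temp.ldf fichier.ldf

cat fichier.ldf
```

Only one temporary file is left to manage, and the original is replaced only if sed exits without error.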

Thank you and good luck.
0
dubcek Posted messages 18814 Status Contributeur 5 655
 
You can put the sed commands in a file and call sed with -f filename.txt.
Add a g to change every occurrence on the line:
s/é/e/g
s/è/e/g
etc.
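A short sketch of that suggestion, with hypothetical file names (accents.sed, fichier_clean.ldf) and a made-up sample input. The here-document is POSIX shell syntax; on Windows the script file can simply be created in a text editor, as long as it is saved in the same encoding as the data.

```shell
# One substitution per line; the trailing g replaces every occurrence on a line.
cat > accents.sed <<'EOF'
s/é/e/g
s/è/e/g
s/ê/e/g
s/à/a/g
s/ô/o/g
s/î/i/g
s/û/u/g
EOF

# Hypothetical sample input, including an empty line (sed leaves it in place).
printf 'cn: Hélène Côté\n\nsn: Benoît\n' > fichier.ldf

# Apply all substitutions in a single pass, with no intermediate files.
sed -f accents.sed fichier.ldf > fichier_clean.ldf

cat fichier_clean.ldf
```

Adding a new character later only means adding one line to accents.sed; the command that the scheduled task runs stays the same.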
0
MarcusDom Posted messages 53 Status Membre
 
Thanks dubcek, I had already put the commands in a batch file, but I didn't know about the /g option.

I will improve that.
0