Select next line CSH foreach

Solved
visiteurr
Hello,

I work in the csh shell. In a file-processing task, I want to compare the first-column element of line N with the first-column element of line N+1, and I'm struggling to do this... :/
How can I retrieve an element at line N+1?

Basically, I have a file like this:

simon ok x
simon ok x
simon ok x
simon ok x
fabien ok x
fabien ok x
seb ok x
yoann ok x
yoann ok x
yoann ok x
yoann ok x
yoann ok x
yoann ok x

and I want to obtain this:

simon ok 4
simon ok 4
simon ok 4
simon ok 4
fabien ok 2
fabien ok 2
seb ok 1
yoann ok 6
yoann ok 6
yoann ok 6
yoann ok 6
yoann ok 6
yoann ok 6

The file to process is very large, though, so I'm trying to do something lightweight...

So far I haven't gotten anything to work; I keep testing but nothing works.

example:
#! /bin/csh -f

foreach line ( "`cat tttt`" )
set argv = ( $line )
set name1 = $1
set name2 = $3
if ( $1 == $1 + 1) then
echo " $1 and '$1+1' test true "
else
echo " $1 and 'expr $1 + 1' test false "
endif

end

So if anyone has an idea for implementing such a script in csh... (I'm not good at csh, but unfortunately I didn't get to choose my work environment :/ )

thank you in advance

11 replies

zipe31 Posted messages 34620 Registration date   Status Contributor Last intervention   6 501
 
OK, this is for a bash shell, so you'll need to adapt the syntax for csh...

$ cat plop
simon ok x febgeg
simon ok x rhedg
simon ok x erze
simon ok x srg e
fabien ok x nrteth
fabien ok x tehhet
seb ok x et ee
yoann ok x eth
yoann ok x et he
yoann ok x ehe
yoann ok x egr
yoann ok x ereh
yoann ok x ete

$ cat foo.sh
#! /bin/bash
#set -xv
# Each line read is "name count"; rewrite "ok x" on the lines matching that name.
while read line
do
    sed -i "/${line% *}/s/ok x/ok ${line#* }/" plop
done < <(awk '{ print $1 }' plop | uniq -c | awk '{ print $2,$1 }')

$ ./foo.sh

$ cat plop
simon ok 4 febgeg
simon ok 4 rhedg
simon ok 4 erze
simon ok 4 srg e
fabien ok 2 nrteth
fabien ok 2 tehhet
seb ok 1 et ee
yoann ok 6 eth
yoann ok 6 et he
yoann ok 6 ehe
yoann ok 6 egr
yoann ok 6 ereh
yoann ok 6 ete
$

;-))

--
Zen my nuggets ;-)
Do something for the environment, close your windows and adopt a penguin.

visiteurr
Thank you very much, that's exactly what I want to do!

However, I can't implement it in CSH :/ since the -i option of sed doesn't work in CSH :/

If anyone knows CSH and can help me? (thanks again zipe31)

zipe31
$ cat plop
simon ok x febgeg
simon ok x rhedg
simon ok x erze
simon ok x srg e
fabien ok x nrteth
fabien ok x tehhet
seb ok x et ee
yoann ok x eth
yoann ok x et he
yoann ok x ehe
yoann ok x egr
yoann ok x ereh
yoann ok x ete

$ cat csh_foo.csh
#! /bin/csh
foreach line ( `awk '{ print $1 }' plop | uniq -c | awk '{ printf "%s|%s\n",$2,$1 }'` )
    set line = "$line:gas/|/ /"
    set argv = ( $line )
    sed "/$1/{s/ok x/ok $2/}" plop > blop
    mv blop plop
end

$ ./csh_foo.csh

$ cat plop
simon ok 4 febgeg
simon ok 4 rhedg
simon ok 4 erze
simon ok 4 srg e
fabien ok 2 nrteth
fabien ok 2 tehhet
seb ok 1 et ee
yoann ok 6 eth
yoann ok 6 et he
yoann ok 6 ehe
yoann ok 6 egr
yoann ok 6 ereh
yoann ok 6 ete
$

;-))

visiteurr
Re: thank you ^^
It gives me this: "Missing }"
Nothing else, it's weird... I tried modifying it (replacing the braces, etc.) but nothing works; it always gives errors.
But I don't understand: you did this in a csh console, so why doesn't it work for me? Grrr, why do I never have any luck with computers? :/

zipe31
Re-

It gives me this: "Missing }"
Nothing else, it's weird... I tried modifying it (replacing the braces, etc.) but nothing works; it always gives errors.

Try to proceed step by step (that's what I did yesterday, since csh is not my cup of tea and I got errors just like you, including the famous "Missing }" ;-((

So to start:

#! /bin/csh
foreach line ( `awk '{ print $1 }' plop | uniq -c | awk '{ printf "%s|%s\n",$2,$1 }'` )
    echo $line
end

And see if you still have errors.

However, be careful: the forum software renders back quotes " ` " (Alt Gr + 7, the 7 in the top row of the keyboard, above the Y and U keys) very poorly, so in the expression:

foreach line ( `awk '{ print $1 }' plop | uniq -c | awk '{ printf "%s|%s\n",$2,$1 }'` )

there is indeed a back quote before the 1st awk and before the last closing parenthesis, right?



But I don't understand, you did this on a CSH console, so why isn't it working for me? rrrr why do I never have any luck with computers? :/
I installed it yesterday on Mandriva 2010, it is no longer provided by default, and as such it is tcsh that got installed, maybe that's why ;-\

$ ls -l /bin/csh
lrwxrwxrwx 1 root root 4 2011-02-02 15:19 /bin/csh -> tcsh*

$ urpmq -fi tcsh
Name        : tcsh
Version     : 6.15
Release     : 6.4mdv2010.0
Group       : Shells
Size        : 616513
Architecture: i586
Source RPM  : tcsh-6.15-6.4mdv2010.0.src.rpm
URL         : http://www.tcsh.org/
Summary     : An enhanced version of csh, the C shell
Description :
Tcsh is an enhanced but completely compatible version of csh, the C shell.
Tcsh is a command language interpreter which can be used both as an
interactive login shell and as a shell script command processor. Tcsh
includes a command line editor, programmable word completion, spelling
correction, a history mechanism, job control and a C language like syntax.

visiteurr
OK, thanks again. I had the quotes reversed :/ I left the first one as a normal quote and made the inner ones back quotes :/ but I did notice there was a problem ^^

I'll test this and let you know.

visiteurr
#! /bin/csh

foreach line ( `awk '{ print $1 }' plop | uniq -c | awk '{ printf "%s|%s\n",$2,$1 }'` )
echo $line
end


it works very well!

On the other hand, when I put everything back (with the correct quotes ^^) it says: Variable syntax
There's only one variable, I think: $line, and I don't see why!

As you've noticed, I'm really bad at csh :/ sorry

zipe31


sed "/$1/s/ok x/ok $2/" plop > blop

zipe31
Hello,

In case you didn't know... there are ready-to-use tools available on GNU/Linux...

$ cat plop
simon ok x
simon ok x
simon ok x
simon ok x
fabien ok x
fabien ok x
seb ok x
yoann ok x
yoann ok x
yoann ok x
yoann ok x
yoann ok x
yoann ok x

$ uniq -c plop
      4 simon ok x
      2 fabien ok x
      1 seb ok x
      6 yoann ok x
$

;-))

visiteurr
Thank you for your reply. However, it compares the lines and I only want to compare the first column.
Because, for example, with
simon ok x febgeg
simon ok x rhedg
simon ok x erze
simon ok x srg e
fabien ok x nrteth
fabien ok x tehhet
seb ok x et ee
yoann ok x eth
yoann ok x et he
yoann ok x ehe
yoann ok x egr
yoann ok x ereh
yoann ok x ete

it doesn't work :/

and it's a huge file, so the best would be to modify it directly:
Knowing that the lines with the same first column are necessarily consecutive (phew)
Count the number of occurrences of the first term (e.g.: simon) and then put this number in the third column.

In essence:
L1 simon -> count = 1
L2 simon -> count = 2
L3 simon -> count = 3
L4 simon -> count = 4
L5 fabien -> count = 4 and then we put 4 in the third column of simon and reset count to 0

that's roughly what I want to do...
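
The counting idea described above can be sketched with awk rather than csh. This is only an illustration under assumptions not stated in the thread: fields are separated by blanks, each line has at least three fields, and the function name fill_group_counts is hypothetical. It keeps a single group of consecutive lines in memory at a time, and rebuilds each line with single spaces between fields.

```shell
#!/bin/sh
# Hypothetical helper: write the size of each run of consecutive lines
# sharing the same first column into the third column of those lines.
fill_group_counts() {
    awk '
        # Print the buffered group with its size written into field 3.
        function flush(    i, k, m, parts, line) {
            for (i = 1; i <= n; i++) {
                m = split(buf[i], parts, " ")
                parts[3] = n
                line = parts[1]
                for (k = 2; k <= m; k++)
                    line = line " " parts[k]
                print line
            }
            n = 0
        }
        # A different first column closes the previous group.
        NR > 1 && $1 != key { flush() }
        # Remember the current key and buffer the current line.
        { key = $1; buf[++n] = $0 }
        # The last group is still buffered at end of input.
        END { flush() }
    ' "$@"
}
```

It could be called as fill_group_counts plop > blop, then mv blop plop, in the same way as the sed-based scripts earlier in the thread.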

but otherwise, I would like to know: How do I retrieve an element at N+1? (cf first question)

Great thanks
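
On the N+1 question: a line-by-line loop cannot look ahead, but it can remember the previous line, which gives the same comparison one step later. A minimal sketch in awk, where compare_with_next is a hypothetical name and the output wording is only illustrative:

```shell
#!/bin/sh
# Hypothetical helper: compare the first column of each line (N)
# with the first column of the following line (N+1).
compare_with_next() {
    awk '
        NR > 1 {
            # prev is the first column of line N, $1 that of line N+1
            if (prev == $1)
                print prev " and " $1 ": same"
            else
                print prev " and " $1 ": different"
        }
        # Remember the first column for the next iteration.
        { prev = $1 }
    ' "$@"
}
```

For a file of K lines this prints K-1 comparisons, one per pair of neighbouring lines.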

lami20j Posted messages 21506 Registration date   Status Moderator, Security contributor Last intervention   3 570
 
Hello,

However, it compares the lines and I only want to compare the first column.
Well, not for me.
From what I see, you want to compare the lines but only on a single criterion: the first word of the line.

lami20j
For example, this is what I see

:~$ cat visiteurr
simon ok x febgeg
simon ok x rhedg
simon ok x erze
simon ok x srg e
fabien ok x nrteth
fabien ok x tehhet
seb ok x et ee
yoann ok x eth
yoann ok x et he
yoann ok x ehe
yoann ok x egr
yoann ok x ereh
yoann ok x ete

:~$ perl -ne 's/^(.*?)\s.*/$1. "-> count " . ($h{$1}++ + 1)/e;print' visiteurr
simon-> count 1
simon-> count 2
simon-> count 3
simon-> count 4
fabien-> count 1
fabien-> count 2
seb-> count 1
yoann-> count 1
yoann-> count 2
yoann-> count 3
yoann-> count 4
yoann-> count 5
yoann-> count 6

lami20j
:~$ cat visiteurr
simon ok x febgeg
simon ok x rhedg
simon ok x erze
simon ok x srg e
fabien ok x nrteth
fabien ok x tehhet
seb ok x et ee
yoann ok x eth
yoann ok x et he
yoann ok x ehe
yoann ok x egr
yoann ok x ereh
yoann ok x ete

:~$ perl -ne 's/^(.*?)\s(.*?)\sx(.*)/"$1 $2 " .($h{$1}++ + 1) . "$3"/e;print' visiteurr
simon ok 1 febgeg
simon ok 2 rhedg
simon ok 3 erze
simon ok 4 srg e
fabien ok 1 nrteth
fabien ok 2 tehhet
seb ok 1 et ee
yoann ok 1 eth
yoann ok 2 et he
yoann ok 3 ehe
yoann ok 4 egr
yoann ok 5 ereh
yoann ok 6 ete

lami20j
Re,

Well, there must be something simpler, but I'm thinking of something like this:
- we get the number of occurrences and store them in a temp file

:~$ perl -ane '$h{$F[0]}++;END{print "$_:$h{$_}\n" for keys %h}' visiteurr > visiteurr.occ
lami20j@debian-acer:~$ cat visiteurr.occ
seb:1
yoann:6
simon:4
fabien:2


- we use the temp file and insert the number of occurrences into the file
:~$ cat visiteurr
simon ok x febgeg
simon ok x rhedg
simon ok x erze
simon ok x srg e
fabien ok x nrteth
fabien ok x tehhet
seb ok x et ee
yoann ok x eth
yoann ok x et he
yoann ok x ehe
yoann ok x egr
yoann ok x ereh
yoann ok x ete

:~$ perl -ne '$h{$1}=$2 if /(.*):(.*)/;s/^(.*?)\s(.*?)\sx(.*)/$1 $2 $h{$1} $3/ and print' visiteurr.occ visiteurr
simon ok 4 febgeg
simon ok 4 rhedg
simon ok 4 erze
simon ok 4 srg e
fabien ok 2 nrteth
fabien ok 2 tehhet
seb ok 1 et ee
yoann ok 6 eth
yoann ok 6 et he
yoann ok 6 ehe
yoann ok 6 egr
yoann ok 6 ereh
yoann ok 6 ete


--
GNU/Linux: Linux is Not Ubuntu! Choosing which Linux to use does not mean your favorite Distribution,
106485010510997108

visiteurr
Hello,

Thanks for your reply lami20j! Uh, by the way, I don't know (and don't understand) the code you are using... Is it compatible with CSH?
Do you think the processing is lighter with this command?

zipe31
Re-

Perl is usually installed by default on all GNU/Linux systems and does not depend on the login shell.

It is also supposed to be better suited for this kind of processing ;-))


PS. I'm replying on his behalf knowing that he won't log in until this evening, unless he wants to prove me wrong ;-))

visiteurr
Alright, thank you both then ^^ I will try and keep you updated:

(I can't try it right away: stolen vehicle = police station -> insurance, etc... :/ )

zipe31
Good luck, we sympathize ;-\

visiteurr
Oh la la, lol, I haven't changed anything except the file names and it seems to be working... I don't understand, lol

Where do we select the column to modify? Can you explain this command to me?
:~$ perl -ne '$h{$1}=$2 if /(.*):(.*)/;s/^(.*?)\s(.*?)\sx(.*)/$1 $2 $h{$1} $3/ and print' visiteurr.occ visiteurr

lami20j
Hello,

1st command

perl -ane '$h{$F[0]}++;END{print "$_:$h{$_}\n" for keys %h}' visiteurr > visiteurr.occ

The role of this command is to count the number of occurrences of the word at the beginning of the lines in the file.

For this, I use a data structure called hash or associative array.
This data structure allows accessing array elements by a key (which is a string).
Each key corresponds to a value (which can be a string, a number, an array, a hash, a reference, basically anything ;-)
This results in the following presentation

%hash = (
    "key1" => "value",
    "key2" => "another value",
    ....
    "keyN" => "and yet another value",
);


Note that the key is unique.

In your example, the command will go through each line of the file.
Since we are looking for the number of occurrences of the first word of each line, we just need to use the first word as the key; since a key is unique, each occurrence simply increments the value stored under it.

Here’s what happens under the hood.

Processing the first line
the key is simon and the value will be 1

Processing the second line
the key is simon and the value will be 2 (the value is incremented with each occurrence)

All this for all the simon, regardless of the line number in the file (so the lines starting with simon don’t need to be grouped)

When we reach fabien, that’s a new key, and similarly to the simon key, the value will be incremented and so on until the last line of the file.

In the end, the hash looks something like this. The order is internal to Perl, so it appears arbitrary rather than following the order in which the keys were created (the hash could be sorted, but that is not needed in this case).

%h = (
    "seb" => 1,
    "yoann" => 6,
    "simon" => 4,
    "fabien" => 2,
);


At this point, the hash is in memory and needs to be saved somewhere; I chose a file.
The block END{} ensures that once it reaches the end of the file, the hash is displayed.
To write to the file, I used simple redirection of STDOUT (standard output, the screen) to a file.

That’s how the 1st command works.
As for the options used: -n loops over each input line, and -a splits each line into words stored in the array @F; I then use $F[0], the 1st element (simon, seb, fabien, yoann).
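
For readers more at home with awk than Perl, the same counting idea can be sketched with an awk associative array. This is not the command used above, only an analogous sketch; count_first_words is a hypothetical name, and the output is sorted purely to make it reproducible.

```shell
#!/bin/sh
# Hypothetical helper: print "word:count" for the first word of every
# non-empty line of the file given as $1.
count_first_words() {
    awk '
        # count[] is an associative array keyed by the first word.
        NF { count[$1]++ }
        # Like the Perl END{} block: runs once, after the last line.
        END { for (w in count) print w ":" count[w] }
    ' "$1" | LC_ALL=C sort
}
```

As in the Perl version, lines with the same first word do not need to be adjacent for the count to be correct.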

The 2nd command

perl -ne '$h{$1}=$2 if /(.*):(.*)/;s/^(.*?)\s(.*?)\sx(.*)/$1 $2 $h{$1} $3/ and print' visiteurr.occ visiteurr

This command reads both files:
- the one created by the 1st command which contains the number of occurrences
- the original file.

The command consists of two statements separated by a semicolon
$h{$1}=$2 if /(.*):(.*)/
and
s/^(.*?)\s(.*?)\sx(.*)/$1 $2 $h{$1} $3/ and print

The statement $h{$1}=$2 if /(.*):(.*)/ rebuilds the hash while the 1st file is being read.
This time the separator is no longer a space but a colon.
(.*):(.*) is a regular expression that can be read as follows:

. means any character
* is a quantifier that allows finding 0, 1, or any number of characters
() the parentheses are for capturing the found pattern
: is the literal character

The captures are numbered from 1 to .... and the corresponding variables are $1, $2 .....

Note that the hash is filled only if the line contains a colon (if the original file also contained colons, its lines would be loaded into the hash too, using memory for nothing).
This could be improved by using start and end anchors (^ for the start, $ for the end).
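
To illustrate what anchoring buys, here is a small sketch using grep -E instead of Perl. It assumes, beyond what the thread states, that keys contain neither whitespace nor colons and that counts are plain digits; valid_occ_lines is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines that are exactly "word:count".
# ^ and $ force the whole line to match, so a data line that merely
# contains a colon somewhere is rejected.
valid_occ_lines() {
    grep -E '^[^:[:space:]]+:[0-9]+$' "$@"
}
```

With this pattern a line such as "simon ok x: y" is rejected, whereas the unanchored (.*):(.*) would accept it.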

You might wonder why we didn’t do it all at once instead of creating a temporary file.
If the file is large (let’s say millions of lines) then just imagine how much RAM + swap we would need to store all that.
Well, the worst case would be if the original file contained one key per line, but in that case it would not be necessary to count the number of occurrences, and in such a case adding 1 in the column would suffice

So $h{$1}=$2 if /(.*):(.*)/ briefly says: fill the hash with key => value only if the line read from the file contains :

At the end of reading the 1st file, the hash is filled and the reading of the original file begins.

s/^(.*?)\s(.*?)\sx(.*)/$1 $2 $h{$1} $3/ and print

Knowing that the separator is a space, it is enough to split the words and then replace the x with the corresponding value found in the hash.

s/PATTERN/REPLACEMENT/ is the substitution operator, which replaces what the left-hand side matches with what is on the right.

The PATTERN part
s/^(.*?)\s(.*?)\sx(.*)/

s/
^ - start anchor
( - start 1st capture - $1
.*? - any character 0, 1 or any number of times but avoid greediness
) - end 1st capture
\s - looks for a space
( - start 2nd capture - $2
.*? - any character 0, 1 or any number of times but avoid greediness
) - end 2nd capture
\sx - a space followed by x, the field to be changed
( - the 3rd capture - $3
.* - any character 0, 1 or any number of times, greedy this time
) - end of the 3rd capture

Be careful, if the modified column does not contain x then the regex should be changed

The REPLACEMENT part

/$1 $2 $h{$1} $3/ and print

/
$1 - the 1st capture
$2 - a space, then the 2nd capture
$h{$1} - a space, then the value found in the hash (the number of occurrences)
$3 - a space, then the 3rd capture
/ and print - end of the replacement, then print the line

number of occurrences $h{$1}
The 1st capture is the first word of the line.
$h{$1} for example, when the word is simon we have:

$h{"simon"} and in the hash we saw that the value of simon is the number of occurrences found by the 1st command, so 4

This substitution is applied for each line.
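
The same two-step logic (load the counts, then rewrite the data file) can also be sketched in awk. This is an analogous illustration, not the Perl command explained above; it assumes blank-separated fields with the value to replace in column 3, and apply_counts is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper.
# $1: occurrences file with "word:count" lines
# $2: data file whose 3rd column receives the count of its 1st column
apply_counts() {
    awk '
        # While reading the first file, rebuild the word -> count table.
        FILENAME == ARGV[1] {
            split($0, kv, ":")
            count[kv[1]] = kv[2]
            next
        }
        # While reading the data file, overwrite column 3 when the
        # first word is known, then print the line.
        ($1 in count) { $3 = count[$1] }
        { print }
    ' "$1" "$2"
}
```

Here the column is chosen by its number ($3) rather than by matching a literal x, so the sketch does not depend on what the column contains beforehand. Assigning to $3 makes awk rebuild the line with single spaces.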

There you go, I hope it’s a bit clearer.

He is the Perl specialist ;-))
Not a liar regarding the connection, but for the rest yes ;-))

lami20j
Hi,

To be honest, my test is based on an example that doesn't seem to match your file.
For that, I might need your file.
Can you send it to me by email?

One small clarification ... what's the difference between ".*?" and just ".*"?

Here's an example to see the difference.
You can see that when I use .*, $1 is xigenc: .* has consumed everything up to the last e, i.e. the longest string.
On the other hand, when I use .*?, then $1 is xig - .*? has consumed up to the 1st e, so the minimal string.

:~$ echo exigence
exigence
:~$ echo exigence | perl -ne '/e(.*)e/ ; print "$1\n"'
xigenc
:~$ echo exigence | perl -ne '/e(.*?)e/ ; print "$1\n"'
xig
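
POSIX tools such as sed and awk have no .*? quantifier. A common substitute for the minimal match is a negated character class. The sketch below reproduces the exigence example with sed; the function names greedy and minimal are hypothetical.

```shell
#!/bin/sh
# Greedy: .* runs up to the LAST e on the line.
greedy() {
    sed 's/^[^e]*e\(.*\)e.*$/\1/'
}

# Minimal, emulated: [^e]* cannot cross an e, so the capture stops
# at the FIRST e that follows.
minimal() {
    sed 's/^[^e]*e\([^e]*\)e.*$/\1/'
}
```

Running echo exigence through greedy should give xigenc and through minimal xig, matching the two Perl results.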

zipe31
Hi,

For that, I might need your file.
Can you send it to me by email?

Already asked, but that's not possible, however the original lines look like this;-\

lami20j
Hello,

Already asked, but it's not possible, however the original lines are similar

Well, that's exactly what bothers me: the lines are said to be "similar", but nobody says what lies behind the non-printable characters (space, tab, or I don't know what else ;-)

I will try to generalize.

zipe31
Gotta deal with it... but that's when you really see the true beasts in the end ;-))

lami20j
Re,

I don't understand how to "simulate" columns...

Do you know?


So let's try to figure out the structure of your file.
With this command, every non-whitespace character is replaced by A, and every whitespace character (space, tab, ...) by its ASCII code.

perl -ne 'while(/(.)/g){my $x=$1;($x=~/\s/)?(print " ", ord($x), " "):print "A"};print "\n"' visiteurr > visit.struct

Then you put the file visit.struct on cjoint.com

As proof, here's what it displays on my end

~$ cat visiteurr
simon ok x febgeg
simon ok x rhedg
simon ok x erze
simon ok x srg e
fabien ok x nrteth
fabien ok x tehhet
seb ok x et ee
yoann ok x eth
yoann ok x et he
yoann ok x ehe
yoann ok x egr
yoann ok x ereh
yoann ok x ete

~$ perl -ne 'while(/(.)/g){my $x=$1;($x=~/\s/)?(print " ", ord($x), " "):print "A"};print "\n"' visiteurr > visit.struct
lami20j@debian-acer:~$ cat visit.struct
AAAAA 32 AA 32 A 32 AAAAAA
AAAAA 32 AA 32 A 32 AAAAA
AAAAA 32 AA 32 A 32 AAAA
AAAAA 32 AA 32 A 32 AAA 32 A
AAAAAA 32 AA 32 A 32 AAAAAA
AAAAAA 32 AA 32 A 32 AAAAAA
AAA 32 AA 32 A 32 AA 32 AA
AAAAA 32 AA 32 A 32 AAA
AAAAA 32 AA 32 A 32 AA 32 AA
AAAAA 32 AA 32 A 32 AAA
AAAAA 32 AA 32 A 32 AAAA
AAAAA 32 AA 32 A 32 AAA 32
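
If only the kind of separator matters (space or tab), a quicker check is possible with sed's l command, which prints tabs as \t and marks each end of line with $. Unlike the Perl command above, this sketch does not mask the content, so it is meant for looking at the file locally rather than for sharing it; show_whitespace is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print lines in an unambiguous form, with tabs
# shown as \t and a $ at each end of line. Reads the files given as
# arguments, or standard input when there are none.
show_whitespace() {
    sed -n 'l' "$@"
}
```

Long lines are folded by sed, so for a very wide file it may be easier to look at just the first line, for example with head -n 1 file | show_whitespace.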


visiteurr
RE ^^ so after having extensively modified the code to try to adapt it ^^ here's what I got:

perl -ne '$h{$1}=$2 if /(.*):(.*)/;s/^(.*?)\tmodification\t(.*)/$1\t$h{$1}\t$2/ and print' texte.txt.occ texte.tmp.3_2 > texte.txt.tmp.3_3

with lines of this type:

simon modification 9999.00 test 999.00 tes2 test3 pierre 99.00 test4 yoann 99.00 99.00 grande_phrase 9999999.00 99.00 99.00 9.00 99.00 didier

and it works great

Unfortunately, I am not allowed to take even a modified document out of my company... Besides, the file has 16,000 very, very long lines ^^ You might say that's not a big deal, lol, I have another one with over 4 million lines =) (more than 200 MB).
In my example above, each word is in a different column and the result should be:

simon 8 9999.00 test 999.00 tes2 test3 pierre 99.00 test4 yoann 99.00 99.00 grande_phrase 9999999.00 99.00 99.00 9.00 99.00 didier

if Simon appears 8 times in the first position.

What do you think of my modifications? Is it normal that it works or is it a stroke of luck and it won't work all the time?

Thanks guys, that's really nice

lami20j
Hi,

is it normal that it works or is it just a stroke of luck and it won't work all the time?

What I need is not the content but the structure of your file.
Well, if stating that there is a word separated by a space or tab is a security breach, then they are paranoid ;-)


What I'm interested in is the field separator and which column you want to replace.

and it works super well
I am not convinced.
Your command and your file line are two different things.
I see a colon in your regex between the 1st and 2nd capture, but not in your file line.

visiteurr
re,

I see in your regex between the 1st and 2nd capture a colon but not in the line of your file.

I think that the if /(.*):(.*)/ is regarding the .occ file, right?

Indeed, they are paranoid ^^ it's the problem with big multinationals :/

What I need is not the content but the structure of your file.

The file consists of lines all with the same structure ... 20 columns each containing a word (e.g., test_53_du_01012011 <- that's ONE word),
the column to modify is the 2nd one.

When I open it with "more", I get the impression that the column separator is a tab.

I hope I answered your question ... if you need more info, don't hesitate, I'll try to answer as best as I can...


However, regarding the ":", I've never had a ":" in my file to process.

thx ;)

lami20j
Re,

I think that the if /(.*):(.*)/ is related to the .occ file, right?
You're right. I'm at work too ;-)
Forget what I said ;-)

I'll take a look tonight.

visiteurr
No problem ^^ it means that I'm starting to understand the code a bit lol

lami20j
Hi,

The line you modified is correct, and it works with the types of lines you showed.

Actually, in my example, I was doing the replacement on the 3rd column while you want it on the 2nd.

Normally, \s is a character class that includes whitespace, so \t as well.

You also wanted an explanation for .*?.

I gave you an example here https://forums.commentcamarche.net/forum/affich-20734055-selection-ligne-suivante-csh-foreach#50

Be careful, however, about what you will use as regex instead of modification.
Test this

perl -ne '$h{$1}=$2 if /(.*):(.*)/;s/^(.*?)\s.*?\s(.*)/$1\t$h{$1}\t$2/ and print' texte.txt.occ texte.tmp.3_2 > resultat



To see what it does with .* instead of .*?, you can test with

perl -ne '$h{$1}=$2 if /(.*):(.*)/;s/^(.*?)\s.*\s(.*)/$1\t$h{$1}\t$2/ and print;print "\n"' texte.txt.occ texte.tmp.3_2 > resultat2
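
For comparison, the whole job on a tab-separated file can be sketched as a single awk call that reads the file twice, so no .occ file is needed. This is an alternative sketch, not the method used in this thread; it assumes tab-separated columns with the count going into column 2, and fill_second_column is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper.
# $1: tab-separated file; column 2 is replaced by the number of lines
# that share the same column 1.
fill_second_column() {
    awk -F '\t' -v OFS='\t' '
        # First pass: only count the first-column values.
        pass == 1 { count[$1]++; next }
        # Second pass: write the count into column 2 and print.
        { $2 = count[$1]; print }
    ' pass=1 "$1" pass=2 "$1"
}
```

The pass=1 and pass=2 operands are ordinary awk command-line assignments, applied just before the file that follows them is read. The result goes to standard output, so it can be redirected to a new file as in the commands above.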