Duplicate Removal - PERL

bibi
Hi everyone,

I wanted to get your opinion on the best way to remove duplicates from a list of strings...

This is what I'm doing for now (it's not great and that's why I'm posting ;)

1) I'm storing them in a simple array (not associative)

push(@stock, $elem);

2) I sort them

@stock=sort(@stock);

3) I go through all the elements and check if the next one is identical. If identical, I assign it the value ""


for (my $i = 0; $i < @stock - 1; $i++)
{
    # blank out an element when the next one is identical
    $stock[$i] = '' if $stock[$i] eq $stock[$i+1];
}


4) I sort again (I warned you... my algo is really not great) :(

@stock=sort(@stock);

5) Then I go through the array once more, removing elements from the front for as long as I find an empty string; as soon as I find anything else, I stop...

# after the second sort, the empty strings are all at the front
while (@stock && $stock[0] eq '')
{
    shift(@stock);
}


... all this to get rid of duplicates.... :(
So if anyone sees a more efficient way than mine, either a built-in Perl function along the lines of eradikerdoublons() (I've looked, and this super function doesn't exist in Perl ;) or an improvement to my algo, it would be nice if you could explain it to me ;)

6 answers

nenecg
 
Here is a solution

my %saw;   # string => number of times it has been seen
# !$saw{$_}++ is true only the first time a string is seen
my @out = sort(grep(!$saw{$_}++, @stock));

The @out array then contains the sorted list without duplicates.
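
To unpack why this works: $saw{$_}++ returns the old count (undefined, so false, the first time a string shows up) and only then increments it, so grep lets each string through exactly once. Here is a minimal sketch, assuming some made-up sample data in @stock and a hypothetical result array @unique. Leaving out the sort keeps the elements in their original order:

```perl
use strict;
use warnings;

# Sample data, made up for illustration
my @stock = ('pear', 'apple', 'pear', 'fig', 'apple');

my %saw;   # string => number of times seen so far

# grep keeps an element only when its previous count was zero,
# i.e. on its first occurrence
my @unique = grep { !$saw{$_}++ } @stock;

print "$_\n" for @unique;
```

If the order does not matter, or you want it sorted anyway, putting sort in front of grep as above does the job.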
10
Bobinours
 
I had the same idea as sebsauvage: use a hash table.


my %h_senders;
foreach my $elem (@stock) {   # @stock is the list from the question; adapt the loop to your data
    $h_senders{$elem}++;      # hash elements use curly braces, not square brackets
}


By the way, this allows you to know the number of occurrences of the element (thanks to the increment).

So if you display:

print $h_senders{'je.suppose@que.cesont.des.email'};   # single quotes, so that @que is not interpolated

It will display the number of times the email "je.suppose@que.cesont.des.email" is found.
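
Put together as something you can run, this could look like the sketch below. The sample addresses are made up, and I am assuming the list lives in @stock as in the question. Note that Perl accesses hash elements with curly braces, and that an address containing @ should go in single quotes:

```perl
use strict;
use warnings;

# Sample data, made up for illustration
my @stock = ('a@example.org', 'b@example.org', 'a@example.org', 'a@example.org');

my %h_senders;                 # element => number of occurrences
$h_senders{$_}++ for @stock;

# Number of occurrences of one particular element
print $h_senders{'a@example.org'}, "\n";

# The keys are the distinct elements; sort them for a stable order
for my $elem (sort keys %h_senders) {
    print "$elem: $h_senders{$elem}\n";
}
```

The list without duplicates is simply keys %h_senders; the counts come for free.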
--
-= Bobinours - http://bobin.underlands.org =-
1
bheadman
 
I hope he has found his solution by now ^^,
but basically the idea (in Perl) is to use hashes (associative arrays).
You store the values you want as keys and count the occurrences of each key (the number of times it appears in the initial list). In Perl, a hash can hold any given key only once (so no duplicates, in a way), which solves the problem.

There you go, I think that's what is proposed in the last post (it's been a while since I've touched Perl) but that's the avenue to explore.
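
If the counts are not needed, the "values as keys" idea can be sketched even more compactly with a hash slice. This is only an illustration with made-up sample data; %seen and @unique are names I picked for the example:

```perl
use strict;
use warnings;

# Sample data, made up for illustration
my @stock = ('pear', 'apple', 'pear', 'fig', 'apple');

my %seen;
@seen{@stock} = ();             # hash slice: every string becomes a key, duplicates collapse

my @unique = sort keys %seen;   # keys come back in no particular order, so sort them
print "@unique\n";
```

The values stay undefined here; only the keys matter.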

Best regards.
Nicolas
1
bibi
 
Hey!!!!

You're not going to tell me that my code is super optimized, are you???? ;-)

If you don't know PERL, it's okay, just write to me in the language you know best or in pseudocode, or simply in French, what you would do to improve my code (I'll take care of transforming it into PERL)

I'm eagerly awaiting your suggestions, thanks :)
0
sebsauvage
 
And why not place your strings in a hash table?
It makes it very easy to eliminate duplicates.

(I can't remember the syntax in Perl.)
0
sebsauvage
 
To give the example in Python, a program that takes a file A.txt, removes duplicates, sorts, and writes the result to B.txt (I'll keep it compact):

lines = open('A.txt').read().split('\n')
items = sorted(dict.fromkeys(lines))  # dict keys are unique, so duplicates vanish (Python 3)
open('B.txt', 'w').write('\n'.join(items))


Isn't it beautiful, huh? Three little lines of code?
(Okay, I agree it's not great for readability if you don't know Python :)
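
For comparison, roughly the same program can be sketched in Perl with the hash idiom from nenecg's answer. Take it as an illustration rather than a polished tool: dedup_file is a hypothetical helper name, not a built-in, and the file names are simply the ones from the Python example.

```perl
use strict;
use warnings;

# Read $in, drop duplicate lines, sort, and write the result to $out.
# Returns the number of distinct lines written.
sub dedup_file {
    my ($in, $out) = @_;

    open(my $fh_in, '<', $in) or die "Cannot read $in: $!";
    chomp(my @lines = readline($fh_in));   # all lines, newlines stripped
    close($fh_in);

    my %seen;
    my @items = sort grep { !$seen{$_}++ } @lines;

    open(my $fh_out, '>', $out) or die "Cannot write $out: $!";
    print {$fh_out} join("\n", @items);
    close($fh_out) or die "Cannot close $out: $!";

    return scalar @items;
}

# Usage, assuming A.txt exists in the current directory:
# dedup_file('A.txt', 'B.txt');
```

It is longer than three lines, mostly because of the explicit error handling around open.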
0