Posted By

noah on 07/03/07


Tagged

regex list file text filter write create work useful tools automation productivity jag 2005 essential forqblog



Remove duplicate lines from a text file


Published in: Perl 


URL: http://answers.google.com/answers/threadview?id=25196

Sometimes I get a big list in which some items occur multiple times. A good example is a newline-delimited list of files, where the same file path might be listed 4 or 5 times. This script removes those duplicate lines: it keeps the first occurrence of each line, drops blank lines, and writes the result to a new file named no_dupes_ followed by the input file name.

Found at Google Answers.

  #!/usr/bin/perl
  use strict;
  use warnings;

  # The input file is the first command-line argument.
  my $origfile = shift;
  die "Usage: $0 FILE\n" unless defined $origfile;
  my $outfile = "no_dupes_" . $origfile;
  my %seen;    # lines that have already been written

  open(my $in,  '<', $origfile) or die "Couldn't open input file: $!";
  open(my $out, '>', $outfile)  or die "Couldn't open output file: $!";

  while (my $line = <$in>) {
      next if $line =~ m/^\s*$/;    # remove empty and whitespace-only lines
      # Without this check, the first empty line would be kept and later ones removed as duplicates.
      print {$out} $line unless $seen{$line}++;    # write only the first occurrence
  }
  close $out or die "Couldn't close output file: $!";
  close $in;

  # This script was found at
  # http://answers.google.com/answers/threadview?id=25196
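
The core of the script is the hash-counting test in the loop: a line is printed only the first time its count in the hash is zero. As a rough illustration, the same idea can be sketched on an in-memory list. The helper name unique_lines and the sample paths below are made up for this example and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the non-blank
# lines of a list with later duplicates removed, preserving input order.
sub unique_lines {
    my @lines = @_;
    my %seen;    # counts how many times each line has appeared so far
    return grep { !/^\s*$/ && !$seen{$_}++ } @lines;
}

# Illustrative input: a newline-delimited list of file paths with repeats.
my @input = (
    "/tmp/a.txt\n",
    "/tmp/b.txt\n",
    "/tmp/a.txt\n",
    "\n",
    "/tmp/c.txt\n",
    "/tmp/b.txt\n",
);

my @output = unique_lines(@input);
print @output;    # prints /tmp/a.txt, /tmp/b.txt, /tmp/c.txt, one per line
```

Because the whole line, including its trailing newline, is the hash key, two lines count as duplicates only if they match exactly, trailing whitespace included. A final line that has no newline will therefore not match an earlier copy that does.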
