Posted by ksaver on 08/03/11


Tagged: duplicate, find



Find duplicate files using a SHA-1 hash


Published in: Bash
 

URL: http://www.commandlinefu.com/commands/view/8958/find-duplicate-files-using-sha1-hash

Output Example:

d65bfef64a5fc9f7dbf9d35d80a2e1ed218c75d2 ./tmp1/12414.txt

d65bfef64a5fc9f7dbf9d35d80a2e1ed218c75d2 ./tmp2/2012.txt

d65bfef64a5fc9f7dbf9d35d80a2e1ed218c75d2 ./tmp1/3153.txt

dd07cec149e7c5929d6e9a0618de7114d50b34b0 ./tmp1/10064.txt

dd07cec149e7c5929d6e9a0618de7114d50b34b0 ./tmp2/30901.txt

d9bc21587f94d7a138bddf41cfa4e92a04cf9c54 ./tmp1/36.txt

d9bc21587f94d7a138bddf41cfa4e92a04cf9c54 ./tmp1/83.txt

[...]

# Bash one-liner to find duplicate files
# ksaver, Aug 2011
# http://www.commandlinefu.com/commands/view/8958/find-duplicate-files-using-sha1-hash
# Public Domain Code
# Updated with some nice changes; now it is smaller and faster... :-)

for i in $(find . -type f -exec sha1 -r {} + |tee .hashes.tmp |awk '{print $1}' |sort |uniq -d); do grep "$i" .hashes.tmp; echo; done
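Note that `sha1 -r` is a BSD utility; on GNU/Linux the equivalent is `sha1sum`, whose default "hash  filename" output already matches the format grepped above. Below is a minimal self-contained sketch under that assumption — the directory layout and file names are invented purely to demonstrate the command:

```shell
# Hedged sketch: the same pipeline on GNU/Linux, where `sha1` is not
# available and `sha1sum` is the rough equivalent. The temp directory
# and file names are made up for illustration only.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/a" "$tmpdir/b"
printf 'same\n'  > "$tmpdir/a/one.txt"    # duplicate content
printf 'same\n'  > "$tmpdir/b/two.txt"    # duplicate content
printf 'other\n' > "$tmpdir/a/three.txt"  # unique content

cd "$tmpdir"
hashes=$(mktemp)        # keep the hash list outside the tree being scanned
find . -type f -exec sha1sum {} + > "$hashes"
# uniq -d keeps only hashes that occur more than once in the sorted list
dupes=$(for i in $(awk '{print $1}' "$hashes" | sort | uniq -d); do
    grep "$i" "$hashes"
    echo
done)
echo "$dupes"
rm "$hashes"
```

Writing the hash list to a `mktemp` file outside the search root also avoids the original's small wart of `.hashes.tmp` being picked up by `find` on a second run.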


Comments

Posted By: ksaver on August 4, 2011

First version:

$ time for i in $(find . -type f -exec sha1 -r {} \; |tee .hashes.tmp |awk '{print $1}' |sort |uniq -c |awk '{print $1, $2}'|grep -v "^1"|sort -rn |awk '{print $2}'); do grep $i .hashes.tmp; echo; done;

[...]

real 0m2.336s

user 0m0.600s

sys 0m1.820s

New version:

$ time for i in $(find . -type f -exec sha1 -r {} + |tee .hashes.tmp |awk '{print $1}' |sort |uniq -d); do grep $i .hashes.tmp; echo; done;

[...]

real 0m0.256s

user 0m0.104s

sys 0m0.167s
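The speedup has two sources: `-exec … {} +` batches many filenames into a single hash-program invocation instead of forking once per file as `\;` does, and `uniq -d`, which prints one copy of each repeated line, replaces the whole `uniq -c | awk | grep -v "^1" | sort -rn | awk` chain. A quick illustration of `uniq -d` on sorted input:

```shell
# uniq -d emits each line that appears more than once in a row, exactly once
printf 'a\na\nb\nc\nc\nc\n' | uniq -d   # prints: a, then c
```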
