bug-findutils

Re: file archive


From: James Youngman
Subject: Re: file archive
Date: Thu, 11 Jan 2007 16:26:26 +0000

On 12/5/06, Helmut Messerer <address@hidden> wrote:
> I would need a file-archive tool, like a modified "locate" version,
> which would store for each file an MD5 checksum, which then could be
> searched in the database as well... this would enable us to find
> identical files easily.
>
> Is that possible with findutils?

Sure:-

$ cat example.sh
#! /bin/sh


# make an example file tree
set -e
cd "$HOME"
mkdir -p tmp
cd tmp
WORKDIR=$(pwd)
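cp -ar /usr/share/doc/gcc* .
set +e

# Checksum every regular file and encode the "checksum  filename" lines
# into a locate database with frcode.  The redirection creates md5sum.db
# inside the tree while find is still running, so skip it explicitly.
# frcode's location varies between systems (e.g. /usr/libexec/frcode).
find "$WORKDIR" -type f ! -name md5sum.db -exec md5sum {} + |
    /usr/lib/locate/frcode > "$WORKDIR/md5sum.db"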

$ time sh  example.sh

real    0m0.815s
user    0m0.032s
sys     0m0.080s
$ locate -d ./md5sum.db a71b89a32c72accd00daf10cb5e41d56
a71b89a32c72accd00daf10cb5e41d56  /home/youngman/tmp/gcc-3.3-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56  /home/youngman/tmp/gcc-3.4-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56  /home/youngman/tmp/gcc-4.0-base/README.Bugs
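
To look up copies of a file you already have, you first need its
checksum and can then search for it as above. A minimal sketch follows;
find_copies is a made-up helper name, not part of findutils, and it
assumes GNU md5sum and a database built by example.sh. The file itself
is listed too if it lies inside the indexed tree.

```sh
#! /bin/sh
# find_copies DB FILE  (hypothetical helper, not a findutils command)
# Print every database entry whose checksum matches FILE's MD5 sum.
find_copies() {
  db=$1
  file=$2
  # md5sum reading stdin prints "<checksum>  -"; keep only the checksum.
  sum=$(md5sum < "$file" | cut -d' ' -f1)
  # An unreadable FILE leaves sum empty; do not search for "".
  [ -n "$sum" ] || return 1
  # A pattern without glob characters is matched as a substring.
  locate -d "$db" "$sum"
}
```

For example: find_copies ./md5sum.db gcc-3.3-base/README.Bugs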

$ locate -d ./md5sum.db . | awk '
{
 # $1 is the checksum, $2 the file name (up to the first blank)
 instances[$1] = instances[$1] " " $2;
 ++count[$1];
}

END {
 for (i in count) {
   if (count[i] > 1)
     printf("md5sum %s is shared by %d files\n", i, count[i]);
 }
};'
md5sum 63b818f22d81e2a0a0c7f3875a431128 is shared by 2 files
md5sum cf2eccc0a1d4cf7596a23cde61b9b0e2 is shared by 2 files
md5sum 1f3c7181ad7c9def4d79824256e3765d is shared by 2 files
md5sum a71b89a32c72accd00daf10cb5e41d56 is shared by 3 files
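
The awk program above collects file names in "instances" but only prints
the counts. If you also want to see which files are identical, something
along these lines should work. report_dups is a hypothetical name, and
it assumes the two-character separator GNU md5sum puts between checksum
and file name (names md5sum escapes with a leading backslash are not
handled).

```sh
#! /bin/sh
# report_dups  (hypothetical helper)
# Read "checksum  filename" lines on stdin and print each checksum shared
# by more than one file, followed by the names of those files.
report_dups() {
  awk '
  {
    sum = $1
    # Everything after the checksum and the two separator characters
    # is the file name; this keeps names containing spaces intact.
    name = substr($0, length(sum) + 3)
    count[sum]++
    files[sum] = files[sum] "\n  " name
  }
  END {
    for (s in count)
      if (count[s] > 1)
        printf("md5sum %s is shared by %d files:%s\n", s, count[s], files[s])
  }'
}
```

For example: locate -d ./md5sum.db . | report_dups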



