Re: file archive
From: James Youngman
Subject: Re: file archive
Date: Thu, 11 Jan 2007 16:26:26 +0000
On 12/5/06, Helmut Messerer <address@hidden> wrote:
> I would need a file-archive tool, like a modified "locate" version,
> which would store for each file an MD5 checksum, which then could be
> searched in the database as well... this would enable us to find
> identical files easily.
> is that possible with findutils?
Sure:-
$ cat example.sh
#! /bin/sh
# make an example file tree
set -e
cd "$HOME"
mkdir -p tmp
cd tmp
WORKDIR=$(pwd)
cp -ar /usr/share/doc/gcc* .
set +e
find "$WORKDIR" -type f -exec md5sum {} \+ | /usr/lib/locate/frcode > "$WORKDIR/md5sum.db"
$ time sh example.sh
real 0m0.815s
user 0m0.032s
sys 0m0.080s
$ locate -d ./md5sum.db a71b89a32c72accd00daf10cb5e41d56
a71b89a32c72accd00daf10cb5e41d56 /home/youngman/tmp/gcc-3.3-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56 /home/youngman/tmp/gcc-3.4-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56 /home/youngman/tmp/gcc-4.0-base/README.Bugs
$ locate -d ./md5sum.db . | awk '
{
    # Collect the file names seen for each checksum, and count them.
    instances[$1] = instances[$1] " " $2;
    ++count[$1];
}
END {
    for (i in count) {
        if (count[i] > 1)
            printf("md5sum %20s is shared by %d files\n", i, count[i]);
    }
}'
md5sum 63b818f22d81e2a0a0c7f3875a431128 is shared by 2 files
md5sum cf2eccc0a1d4cf7596a23cde61b9b0e2 is shared by 2 files
md5sum 1f3c7181ad7c9def4d79824256e3765d is shared by 2 files
md5sum a71b89a32c72accd00daf10cb5e41d56 is shared by 3 files
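The report above only counts the duplicates. If the file names are wanted too, a variation along these lines could print them; again just a sketch, with list_duplicates being a name invented here. It assumes the "checksum, blanks, path" line format that md5sum produces, and takes the path to be everything after the first field so that names containing spaces survive.

```sh
#! /bin/sh
# Hypothetical helper: read "checksum  path" lines on stdin and print each
# group of files that share a checksum.
list_duplicates() {
    awk '
    {
        sum = $1
        # Drop the checksum and the blanks after it; the rest is the path.
        path = $0
        sub(/^[^ ]+ +/, "", path)
        files[sum] = files[sum] "  " path "\n"
        ++count[sum]
    }
    END {
        for (sum in count) {
            if (count[sum] > 1)
                printf("%s (%d files)\n%s", sum, count[sum], files[sum])
        }
    }'
}
```

It would be used as locate -d ./md5sum.db . | list_duplicates. The order in which the groups come out is whatever awk's "for (sum in count)" happens to give, so pipe the result through sort if a stable order matters.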