findutils: some possible enhancements
From: Wolfgang Friebel
Subject: findutils: some possible enhancements
Date: Thu, 16 Aug 2001 16:31:46 +0200 (MET DST)
Some time ago I made some proposals to enhance the findutils package.
Looking at the latest release, 4.1.7, I see that at least one of the
ideas (sorting in find) is still on the TODO list.
Because I believe that the changes I made to findutils 4.1 are at
least worth considering, I would like to list my ideas once more.
If any of them sounds interesting to you, I can try to provide patches
against the current release.
My proposals were (context diffs against 4.1 are in
ftp://ftp.ifh.de/pub/unix/gnu/findutils-4.1.enhanced.tar.gz):
locate: The -d option of locate should have a default database path
(a compiled-in default that can be overridden by an environment
variable). Otherwise this option is of little use to the average user,
because the database path is generally not known and takes a long time
to type. (We have a convention of one database per machine, and
locate -d hostname ... lets you search for files on a given machine.)
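The default-path idea can be sketched as a small lookup function. This is a minimal sketch, not the code from the patches: it assumes that LOCATE_PREFIX (the variable named in the change list below) holds a directory, that a -d argument containing a '/' is an explicit path, and that DEFAULT_DB_PREFIX is a hypothetical compiled-in default.

```python
import os

# Hypothetical compiled-in default; a real build would choose this at configure time.
DEFAULT_DB_PREFIX = "/var/lib/locate"

def resolve_database(name, environ=None):
    """Map a -d argument to a database path (sketch of the proposed behaviour).

    Assumption: a name containing '/' is an explicit path and is used as-is;
    a bare name such as a host name is looked up in the directory named by
    LOCATE_PREFIX, falling back on the compiled-in default.
    """
    env = os.environ if environ is None else environ
    if "/" in name:
        return name
    prefix = env.get("LOCATE_PREFIX") or DEFAULT_DB_PREFIX
    return os.path.join(prefix, name)
```

With one database per machine, resolve_database("host1") would then point at the database for host1 without the user having to know where the databases live.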
bigram/code (old method):
Most of the code of the bigram program is already contained in the
code utility. The idea is to have code recalculate the bigram
frequencies on the fly while it encodes the database with the old
bigram table, and then to replace the old bigram file with the new one.
This is justified by the fact that the database contents tend to be
rather stable. Even an empty table can serve as a starting point; the
coding step then simply has to be repeated, or else the database stays
somewhat (40%) larger. This would make the bigram utility obsolete and
the coding process more transparent, and it removes sort as a source
of failures in the calculation of bigrams.
The sorted output of find can thus be piped directly into the code
command without creating huge temporary file lists, a further source
of possible failures.
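A rough sketch of the single-pass idea: front-compress the sorted paths, code them with the old bigram table, and count the bigrams that are actually coded, so that the next table falls out of the same run. This is a simplified model, not the real byte layout of the old database format: each record is stored as (shared prefix length, token list), and, like the old format, it assumes 7-bit ASCII path names because codes 128..255 stand for bigrams. Names such as encode and decode are illustrative only.

```python
from collections import Counter

TABLE_SIZE = 128  # codes 128..255 stand for bigrams, so paths must be 7-bit ASCII

def common_prefix_len(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def encode(paths, old_table):
    """Front-compress sorted paths and bigram-code them with old_table.

    The bigrams seen while coding are counted, and the TABLE_SIZE most
    frequent ones are returned as the table for the next run.
    """
    index = {bigram: i for i, bigram in enumerate(old_table)}
    counts = Counter()
    records, prev = [], ""
    for path in paths:
        shared = common_prefix_len(prev, path)
        suffix, tokens = path[shared:], []
        for i in range(0, len(suffix), 2):   # pairs aligned at the suffix start
            pair = suffix[i:i + 2]
            if len(pair) == 1:               # odd trailing character
                tokens.append(ord(pair))
                continue
            counts[pair] += 1                # statistics for the next table
            if pair in index:
                tokens.append(TABLE_SIZE + index[pair])
            else:
                tokens.extend(ord(c) for c in pair)
        records.append((shared, tokens))
        prev = path
    new_table = [bigram for bigram, _ in counts.most_common(TABLE_SIZE)]
    return records, new_table

def decode(records, table):
    """Invert encode(): expand bigram codes and restore shared prefixes."""
    paths, prev = [], ""
    for shared, tokens in records:
        text = "".join(table[t - TABLE_SIZE] if t >= TABLE_SIZE else chr(t)
                       for t in tokens)
        prev = prev[:shared] + text
        paths.append(prev)
    return paths
```

Starting from an empty table, a first run produces only literal characters plus a table; a second run over the same list with that table codes every counted pair as a single token, which mirrors the "repeat the coding step" remark above.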
find: The last remaining weak point in the updatedb shell script is its
use of sort to get a sorted file list. On large file servers in
particular, sort runs "out of sort space". The simplest way to avoid
this step is to have find generate an already sorted list of files: on
request, find sorts the entries of each directory and then descends the
tree in sorted order. To control this behaviour I introduced two new
find directives, -sort and -isort (case-insensitive sort), and added
sorting code to find. This further simplifies the updatedb script.
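The sorted descent can be illustrated with a short recursive walk. This is only a sketch of the behaviour described for -sort and -isort, not the C code in the patches; the function name sorted_walk is hypothetical, and symbolic links to directories are deliberately not followed.

```python
import os

def sorted_walk(top, ignore_case=False):
    """Yield top and everything below it, sorting each directory's
    entries before descending (ignore_case=True mimics -isort)."""
    yield top
    # Case-insensitive key, with the original name as a tie-breaker.
    key = (lambda n: (n.lower(), n)) if ignore_case else None
    try:
        names = sorted(os.listdir(top), key=key)
    except OSError:
        return                          # unreadable or not a directory
    for name in names:
        path = os.path.join(top, name)
        if os.path.isdir(path) and not os.path.islink(path):
            yield from sorted_walk(path, ignore_case)
        else:
            yield path
```

Note that this order is not always byte-for-byte what sort would produce, because sort treats '/' like any other character: a sibling named a-b sorts before a/x globally but comes after the whole a/ subtree here. Consecutive paths still share long prefixes, which is what the front-compressed database relies on.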
With AFS it is useful to have an option that tells find to stay
within a single AFS volume. I have added an option -xvol, analogous
to -xdev (stay on one device).
For some tasks it was useful to have sorted find output in which,
within a given directory, files are printed first and subdirectories
only afterwards. While this might sound esoteric, I nevertheless added
an option -dirslast.
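The -dirslast ordering can be sketched as a variant of a sorted walk that partitions each directory's entries before emitting them. This is a hypothetical illustration (the name walk_dirslast is not from the patches): within each directory, all non-directories are yielded first, then each subdirectory is visited in sorted order.

```python
import os

def walk_dirslast(top):
    """Yield a sorted listing in which, inside each directory, plain files
    come before subdirectories (sketch of the proposed -dirslast)."""
    yield top
    try:
        names = sorted(os.listdir(top))
    except OSError:
        return                           # unreadable or not a directory
    files, subdirs = [], []
    for name in names:
        path = os.path.join(top, name)
        is_dir = os.path.isdir(path) and not os.path.islink(path)
        (subdirs if is_dir else files).append(path)
    yield from files                     # all non-directories of this level first
    for d in subdirs:                    # then descend into each subdirectory
        yield from walk_dirslast(d)
```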
Major changes in my modified findutils version 4.1:
* find:
** New options -sort and -isort to get sorted output.
* locate:
** updatedb takes advantage of the find options -sort and -print0.
** Optional compression of the database with gzip.
** locate understands gzipped databases.
** A default search path for databases can be specified with the
   environment variable LOCATE_PREFIX.
** Added code to perform consistency checks on the old database format.
** Option -0 (--null) introduced for compatibility with xargs --null.
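Two of the locate changes can be sketched briefly: recognising a gzipped database by its magic number rather than by its file name, and terminating output with NUL bytes so it can be fed to xargs --null. This is an illustrative sketch, not the patched locate; open_database and format_matches are hypothetical names, and the only fact relied on is that every gzip stream begins with the bytes 0x1f 0x8b (RFC 1952).

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip member (RFC 1952)

def open_database(path):
    """Open a locate database for binary reading, decompressing it on
    the fly if it starts with the gzip magic number."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")

def format_matches(matches, null=False):
    """Terminate each match with NUL (like -0/--null) or with a newline."""
    sep = b"\0" if null else b"\n"
    return b"".join(m + sep for m in matches)
```

Checking the content rather than a .gz suffix lets the same database path work whether or not updatedb was told to compress it.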
Best regards
--
Wolfgang Friebel
Deutsches Elektronen-Synchrotron DESY | Phone: +49 33762 77372 |
Platanenallee 6 | Fax: +49 33762 77216 |
D-15738 Zeuthen Germany | E-Mail: address@hidden |