[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: uniq for unsorted input
From: |
Bruno Haible |
Subject: |
Re: uniq for unsorted input |
Date: |
Wed, 27 Mar 2024 00:25:50 +0100 |
Pádraig Brady wrote:
> A documentation patch would be very useful.
> If you search for "Decorate Sort Undecorate" in coreutils.texi
> you can see an existing example of this DSU pattern.
> I would also mention DSU if adding another example.
OK, I'll provide doc input for this (post 9.5).
> Re adding new --keep-unsorted options I'm a bit reluctant
> as the scalability is not improved by bringing within the command.
The advantage of having this functionality in a coreutils
C program (as opposed to some shell script) is that it can
be optimized for both ends of the scalability spectrum:
- An all-in-memory implementation, like the existing 'uniq',
for inputs that easily fit in RAM (similar criteria as 'sort'
uses, based on physmem()).
- A scalable implementation, that does 3 or 4 fork+exec but
is O(N log N), for bigger inputs.
Since I don't see 'uniq' options that could not be supported with
--keep-unsorted[-first] or --keep-unsorted-last, it would make
sense to extend the 'uniq' program, rather than to create another
program.
Bruno