[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sort suggestion : --head/tail n option

From: Paul Eggert
Subject: Re: sort suggestion : --head/tail n option
Date: Fri, 23 Apr 2004 00:52:05 -0700
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

"Philippe De Muyter" <address@hidden> writes:

> Do you think that this new option would fit easily in the current `sort'
> implementation, and how ?

It could be done and would I think be useful, though there would be a
few issues.

1.  You need to handle the case of "ties" correctly.  A tie occurs if
two lines compare equal: this can happen even if their byte-for-byte
contents differ.  In this case, you want the output of "sort --tail
1000" to be equivalent to "sort | tail -n 1000".  As you're sorting if
a new "tie" comes in for the first line in the tail, you need to
discard that first line and insert the "tie" in the correct position
somewhere in the "tail".  Ties for "sort --head" are easier, as you
can simply discard new ties for the last line in the head.

2.  Have you looked at the literature for the best data structure to
solve your problem?  For example, the people doing simulations have
worked hard on algorithms to implement priority queues (where the most
important operations are "delete min" and "insert"), and this seems to
be close to what you need.  Splay heaps seem to be one of the current
champions in that area.

3 (optional).  There should be no arbitrary limits, so ideally the
performance should be good even if N is quite large, so large that the
number of saved lines doesn't fit into memory.  However, this might
require adding major structural complexity to your algorithm so
perhaps it isn't worth doing.

Anyway, thanks for looking into the problem.

reply via email to

[Prev in Thread] Current Thread [Next in Thread]