bug-coreutils


Re: fix du and wc wrt --files0-from=F


From: Jim Meyering
Subject: Re: fix du and wc wrt --files0-from=F
Date: Tue, 02 Dec 2008 08:16:07 +0100

Pádraig Brady <address@hidden> wrote:
> Jim Meyering wrote:
>> From 622f12ed8fd799ad83546233a60655a6a0d2a4b6 Mon Sep 17 00:00:00 2001
>> From: Jim Meyering <address@hidden>
>> Date: Tue, 25 Nov 2008 18:38:26 +0100
>> Subject: [PATCH 2/2] wc: read and process --files0-from= input a name at a time,
>>
>> when the file name list is not too large.  Before, wc would always
>> reading the entire file name list into RAM and *then* process each
>       ^^^

Thanks!

>> file name.
>> * src/wc.c: Include "argv-iter.h".
>> (main): Rewrite to use argv-iter when the input file name list
>> is known to be too large.
>> * NEWS (Bug fixes): Mention it.
>
> I'd also mention why wc tries to read the whole list into RAM
> if there's place (so that it can auto align the numbers outputted).

I adjusted the log:

    wc: read and process --files0-from= input a name at a time,

    when the file name list is too large.  Before, wc would always read
    the entire file name list into memory and *then* process each file name.
    wc does read the list into memory when the list is known not to be too
    large; this is done in order to be able to align the output numbers,
    as it does with arguments specified on the command-line.
    * src/wc.c: Include "argv-iter.h".
    (main): Rewrite to use argv-iter when the input file name list
    is known to be too large.
    * NEWS (Bug fixes): Mention it.
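
To make the two code paths concrete, here is a rough sketch of the idea,
not the code in src/wc.c.  It assumes POSIX getdelim and fstat; the names
list_fits_in_memory, for_each_name0, add_length and the 10 MiB limit are
hypothetical, chosen only for illustration.  A list whose size is known and
small can be slurped whole; anything else is consumed one NUL-terminated
name at a time, so memory use stays bounded by the longest name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical limit for illustration only. */
enum { LIST_SIZE_LIMIT = 10 * 1024 * 1024 };

/* Return 1 if STREAM is a regular file whose size is known and small
   enough that reading the whole name list into memory is reasonable.  */
int
list_fits_in_memory (FILE *stream)
{
  struct stat st;
  return (fstat (fileno (stream), &st) == 0
          && S_ISREG (st.st_mode)
          && st.st_size <= LIST_SIZE_LIMIT);
}

/* Call FN on each NUL-terminated name read from STREAM, keeping only
   one name in memory at a time.  Return the number of names processed,
   or -1 on a read error.  */
long
for_each_name0 (FILE *stream, void (*fn) (const char *name, void *ctx),
                void *ctx)
{
  assert (fn != NULL);
  char *buf = NULL;
  size_t cap = 0;
  long n = 0;

  /* getdelim reuses (and grows) BUF, so each name replaces the last.  */
  while (getdelim (&buf, &cap, '\0', stream) != -1)
    {
      fn (buf, ctx);
      n++;
    }

  int err = ferror (stream);
  free (buf);
  return err ? -1 : n;
}

/* Example callback: accumulate the total length of all names.  */
void
add_length (const char *name, void *ctx)
{
  *(size_t *) ctx += strlen (name);
}
```

A caller would test list_fits_in_memory first and fall back to
for_each_name0 when it returns 0, e.g. when the list arrives on a pipe.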

> I'm not sure number alignment is worth this extra code/complexity,
> but it's written now :)

I could go either way, and vacillated as I wrote.  I'm not particularly
attached to wc's alignment feature, dislike the added stat-related
complexity, and yet, am reluctant to remove it.
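
For anyone wondering why alignment forces the whole list to be known up
front: the column width has to be chosen before the first line is printed.
The following is only a sketch of that reasoning, not what wc.c does
verbatim.  It assumes the sizes come from stat and are accurate, so that no
count (lines, words, bytes) can exceed the total byte size; digits_needed
and column_width are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of decimal digits needed to print N.  */
int
digits_needed (uintmax_t n)
{
  int width = 1;
  while (n >= 10)
    {
      n /= 10;
      width++;
    }
  return width;
}

/* Column width wide enough for any per-file or total count, given the
   byte sizes of all N input files.  The sum saturates at UINTMAX_MAX
   instead of wrapping around.  */
int
column_width (const uintmax_t *sizes, size_t n)
{
  assert (sizes != NULL || n == 0);
  uintmax_t total = 0;
  for (size_t i = 0; i < n; i++)
    total = (total > UINTMAX_MAX - sizes[i]) ? UINTMAX_MAX : total + sizes[i];
  return digits_needed (total);
}
```

Note that every size must be available before any output is produced, which
is exactly what a name-at-a-time pass cannot provide.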

> I presume you're not forgetting sort --files0-from as
> you've previously mentioned it.

There are a few reasons not to change sort.  Here's the main one:
sort is different from wc and du in that typical usage won't usually
involve more than 10s or 100s of files, and when there are many, the
combined sizes of their names should easily fit in a negligibly small
fraction of virtual memory.

Plus, changing sort would be more disruptive:
too many interfaces require a list of input file names.

What if someone is merging millions of files?
That might be a plausible use case.
For now, however, I'll leave sort.c alone.



