bug-coreutils

From: Pádraig Brady
Subject: bug#22624: [bug-coreutils] coreutils-8.25: big success, but problem on GNU/Hurd
Date: Fri, 12 Feb 2016 21:07:06 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 12/02/16 10:18, Paul Eggert wrote:
> On 02/11/2016 08:13 PM, Pádraig Brady wrote:
>> The changes look good, except for this:
>>
>>    $ seq 1000 | split -n4
>>    $ seq 100000 | split -n4
>>    split: -: cannot determine file size: Illegal seek
>>
>> I.e. it would be better to indicate immediately
>> if there is an issue determining the file size,
>> because it's a gotcha that may hit users as data increases,
>> and -n is complex enough anyway, that it's better to
>> do as much checking up front as possible.
>> I'd still disallow this case even for -n1 in case the
>> number was parameterized to number of CPUs or whatever.
> 
> Hmm, well, I already spent too much time on this so I think I'll check 
> in what I have (since it fixes the GNU/Hurd problem) and let it 
> percolate a bit first.
> 
> I have some qualms about the approach suggested above, as it would cause 
> 'split' to give up on files that it currently handles (e.g., typical 
> files in /proc), on the theory that we don't want to spoil users into 
> thinking that 'split' can handle larger files.

I've attached a patch that keeps support for /proc (seekable) files,
while immediately failing for pipes.  Also it fixes a regression
for the -n r/... case, where it again exits immediately
when all --filters have exited.
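For illustration only, here is a minimal sketch of the kind of up-front check described above. It is not the attached patch or the split.c code, and `probe_input_size` / `report_input_size` are hypothetical names. The sketch assumes the probe runs once before any output is written. A non-seekable input (a pipe) fails immediately with ESPIPE regardless of the chunk count. A regular file uses st_size. A seekable file that reports st_size 0 (such as a /proc file) is read ahead into a bounded buffer, and the probe gives up with EOVERFLOW if that buffer fills, as it would for /dev/zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the coreutils function): return the number of
   bytes remaining from FD's current offset, or -1 with errno set.
   Any bytes read ahead are left in BUF, with their count in *N_BUFFERED,
   so the caller can still process them.  */
off_t
probe_input_size (int fd, char *buf, size_t bufsize, size_t *n_buffered)
{
  struct stat st;
  *n_buffered = 0;
  if (fstat (fd, &st) != 0)
    return -1;

  /* Pipes and FIFOs are not seekable (ESPIPE).  Fail here, whatever the
     chunk count, instead of only once the data outgrows BUFSIZE.  */
  off_t cur = lseek (fd, 0, SEEK_CUR);
  if (cur < 0)
    return -1;

  /* A regular file with a nonzero st_size can be trusted.  */
  if (S_ISREG (st.st_mode) && st.st_size > 0)
    return st.st_size > cur ? st.st_size - cur : 0;

  /* Seekable, but st_size is no help (GNU/Linux /proc files report 0):
     read ahead to find the real size.  */
  size_t size = 0;
  while (size < bufsize)
    {
      ssize_t n = read (fd, buf + size, bufsize - size);
      if (n < 0 && errno == EINTR)
        continue;
      if (n < 0)
        return -1;
      if (n == 0)
        {
          *n_buffered = size;
          return (off_t) size;
        }
      size += (size_t) n;
    }

  /* The buffer filled without reaching EOF, as with /dev/zero:
     give up rather than read forever.  */
  *n_buffered = size;
  errno = EOVERFLOW;
  return -1;
}

/* Print a split-style diagnostic, or the size if it is known.  */
void
report_input_size (int fd, char const *name)
{
  char buf[4096];
  size_t n_buffered;
  off_t size = probe_input_size (fd, buf, sizeof buf, &n_buffered);
  if (size < 0)
    fprintf (stderr, "%s: cannot determine file size: %s\n",
             name, strerror (errno));
  else
    printf ("%s: %lld bytes\n", name, (long long) size);
}
```

With this shape, `seq 1000 | split -n4` and `seq 100000 | split -n4` would both fail in the same way, because the pipe is rejected before any data is read.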

> It'd be better to fix 
> 'split' to handle the larger files. It could do this for a troublesome 
> case (e.g., a large /proc file) by copying the file's data into the 
> first output file F1, then doing a split-in-place from F1 to the 
> remaining output files F2 ... Fn (this would be done by copying to F2 
> ... Fn and then truncating F1).
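The split-in-place idea can be sketched roughly as follows. The sketch assumes F1 already holds the whole input and that the caller supplies the output names. `split_file_in_place` is a hypothetical helper. It walks from the last chunk back to F2, copying each chunk out of F1 and then truncating F1 before moving on. F1 therefore shrinks as the other files grow, and the extra disk space needed stays at about one chunk. The chunk boundaries here (total/n, with any remainder going to the last chunk) are an assumption of the sketch, not necessarily what split -n computes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy LEN bytes starting at offset OFF in SRC to offset 0 of DST.  */
static int
copy_range (int src, off_t off, off_t len, int dst)
{
  char buf[65536];
  off_t done = 0;
  while (done < len)
    {
      size_t want = len - done < (off_t) sizeof buf
                    ? (size_t) (len - done) : sizeof buf;
      ssize_t n = pread (src, buf, want, off + done);
      if (n < 0 && errno == EINTR)
        continue;
      if (n <= 0)
        {
          if (n == 0)
            errno = EIO;   /* F1 was shorter than expected.  */
          return -1;
        }
      for (ssize_t w = 0; w < n; )
        {
          ssize_t m = pwrite (dst, buf + w, (size_t) (n - w), done + w);
          if (m < 0 && errno == EINTR)
            continue;
          if (m < 0)
            return -1;
          w += m;
        }
      done += n;
    }
  return 0;
}

/* Hypothetical helper: NAMES[0] (F1) holds the whole input.  Move chunk
   K of it into NAMES[K] for K = N-1 down to 1, truncating F1 after each
   copy, so that F1 finally holds only chunk 0.  */
int
split_file_in_place (char const *const names[], int n)
{
  if (n < 1)
    {
      errno = EINVAL;
      return -1;
    }
  int f1 = open (names[0], O_RDWR);
  if (f1 < 0)
    return -1;

  struct stat st;
  if (fstat (f1, &st) != 0)
    goto fail;
  off_t total = st.st_size;
  off_t chunk = total / n;

  for (int k = n - 1; k >= 1; k--)
    {
      off_t start = chunk * k;
      /* The last chunk takes any remainder.  */
      off_t end = k == n - 1 ? total : start + chunk;
      int fk = open (names[k], O_WRONLY | O_CREAT | O_TRUNC, 0666);
      if (fk < 0)
        goto fail;
      int ok = copy_range (f1, start, end - start, fk) == 0;
      if (close (fk) != 0)
        ok = 0;
      /* Only drop the tail of F1 once it is safely in Fk.  */
      if (!ok || ftruncate (f1, start) != 0)
        goto fail;
    }
  return close (f1);

 fail:;
  int saved_errno = errno;
  close (f1);
  errno = saved_errno;
  return -1;
}
```

Truncating after each copy, rather than once at the end, is what keeps peak disk usage close to the input size plus one chunk.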

Clever. Theoretically that could support pipes as input too!
That also got me thinking that split(1) could be made very efficient
with an existing regular file, where reflink(range) is supported,
by reflinking the new files to the existing parts of the data.
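On GNU/Linux, the reflink(range) idea corresponds to the FICLONERANGE ioctl declared in <linux/fs.h>. Below is a rough sketch in which `reflink_chunk` is a hypothetical helper. It first tries to share the source file's extents with the new file. If the kernel or filesystem refuses, it falls back to an ordinary copy. Refusals happen when the filesystem lacks reflink support, or when the range is not aligned to the filesystem block size, which arbitrary -n boundaries usually won't be. FICLONERANGE treats a src_length of 0 as "to end of file", so the sketch handles empty chunks separately.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
# include <linux/fs.h>   /* FICLONERANGE, struct file_clone_range */
#endif

/* Ordinary copy of LEN bytes at offset OFF of SRC to the start of DST.  */
static int
copy_bytes (int src, off_t off, off_t len, int dst)
{
  char buf[65536];
  off_t done = 0;
  while (done < len)
    {
      size_t want = len - done < (off_t) sizeof buf
                    ? (size_t) (len - done) : sizeof buf;
      ssize_t n = pread (src, buf, want, off + done);
      if (n < 0 && errno == EINTR)
        continue;
      if (n <= 0)
        {
          if (n == 0)
            errno = EIO;
          return -1;
        }
      for (ssize_t w = 0; w < n; )
        {
          ssize_t m = pwrite (dst, buf + w, (size_t) (n - w), done + w);
          if (m < 0 && errno == EINTR)
            continue;
          if (m < 0)
            return -1;
          w += m;
        }
      done += n;
    }
  return 0;
}

/* Hypothetical helper: create DST_PATH holding bytes [OFF, OFF+LEN) of
   SRC_PATH.  Try to share the source's extents first; fall back to a
   plain copy if the ioctl is unavailable or refused.  */
int
reflink_chunk (char const *src_path, off_t off, off_t len,
               char const *dst_path)
{
  int src = open (src_path, O_RDONLY);
  if (src < 0)
    return -1;
  int dst = open (dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (dst < 0)
    {
      int e = errno;
      close (src);
      errno = e;
      return -1;
    }

  int rc = 0;
  if (len > 0)   /* src_length == 0 would mean "to end of file".  */
    {
      rc = -1;
#ifdef FICLONERANGE
      struct file_clone_range r = {
        .src_fd = src,
        .src_offset = (unsigned long long) off,
        .src_length = (unsigned long long) len,
        .dest_offset = 0,
      };
      /* May fail with e.g. EOPNOTSUPP (no reflink support), EXDEV
         (different filesystems) or EINVAL (range not block aligned).  */
      if (ioctl (dst, FICLONERANGE, &r) == 0)
        rc = 0;
#endif
      if (rc != 0)
        rc = copy_bytes (src, off, len, dst);
    }

  int e = errno;
  if (close (dst) != 0 && rc == 0)
    {
      rc = -1;
      e = errno;
    }
  close (src);
  errno = e;
  return rc;
}
```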

> If the input file is /dev/zero, though, 
> 'split' should just give up right away as it does now, as there's no 
> point in copying forever.

+1

thanks,
Pádraig.

Attachment: split-n-fixes.patch
Description: Text Data

