Re: split behavior

From: Pádraig Brady
Subject: Re: split behavior
Date: Mon, 14 Sep 2009 09:59:54 +0100
User-agent: Thunderbird (X11/20071008)

Roger McNichols wrote:
> Thanks for the feedback.
>> Do you mean select the appropriate suffix length based on size,
>> or do you mean the zzaa, zzab scheme? The former wouldn't
>> help when processing a pipe for example so I'd probably
>> stick with the latter method for consistency.
> Currently, split (at least 5.2.1) DOES pick the suffix size based on the file 
> size when used as "split -<#> file" and the file size is known.

I checked the repo and can't see code supporting that.
Perhaps you've got a locally modified `split`?

> But as you 
> point out, if the file is a pipe you may still run out of suffixes if the 
> file size
> changes after invocation of split, or if split is used in the "split -<#> -" 
> (reads stdin) mode, a 2-letter suffix is all you get unless you specify a 
> length.
> Now I suppose that maybe the discussion went something like:
>   >> what if an unknown-sized input stream is the input?
>   >> well then just use -a 100  and you will never* run out...
>      (*note 26^100 is pretty big)
> Anyway, I propose to develop a new commandline option that would invoke the 
> 'old'
> suffix formation behavior.  And even though aa ... zaa ... zzaa ... instead 
> of 
> aa .. zzaa ... zzzzaa (as well as many other schemes) would work just as well,

Bzzt. zaa would sort before zb.
In general one needs to append 'z'*suffix_len which would default to 2 if not 
One would need to consider this behaviour with digit suffixes also.
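
To make the ordering point concrete, here is a rough sketch in Python (not split's C code). The names `encode`, `suffix` and the `pad` parameter are hypothetical, made up for illustration, and the comparison assumes plain byte-wise sorting as with LC_ALL=C. It builds suffixes by prepending a run of 'z' each time the 2-letter space is exhausted, then checks whether creation order matches sorted order.

```python
import string

ALPHABET = string.ascii_lowercase  # letter suffixes only; digits not covered here


def encode(pos, width):
    """Return pos as a fixed-width base-26 string, e.g. encode(27, 2) -> 'bb'."""
    chars = []
    for _ in range(width):
        pos, rem = divmod(pos, 26)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def suffix(index, suffix_len=2, pad=None):
    """Suffix for the index-th output file (0-based).

    Each time the suffix_len-letter space runs out, 'pad' more 'z'
    characters are prepended. pad defaults to suffix_len (the zzaa scheme);
    pad=1 gives the aa ... zaa ... zzaa variant.
    """
    if pad is None:
        pad = suffix_len
    block, pos = divmod(index, 26 ** suffix_len)
    return "z" * (pad * block) + encode(pos, suffix_len)


def is_sorted(names):
    """True if the names already appear in byte-wise sorted order."""
    return names == sorted(names)


N = 3 * 26 ** 2  # enough files to roll over twice
full = [suffix(i) for i in range(N)]           # aa .. zz, zzaa .. zzzz, zzzzaa ..
short = [suffix(i, pad=1) for i in range(N)]   # aa .. zz, zaa .. zzz, zzaa ..

print(is_sorted(full))    # creation order == sorted order
print(is_sorted(short))   # breaks: 'zaa' is created after 'zb' but sorts before it
```

With pad equal to the suffix length, every name in a later block starts with the largest name of the previous block, so a plain sort keeps the files in creation order. With a single 'z' that no longer holds.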

> I propose to utilize the 'old' one for the added advantage of reverse 
> compatibility.

OK. While I like the scheme it would be really nice to see what we're being 
with. I.E. it would be great if you found where the old split you used came 

> That way any code that relied on the old scheme for counting would be able to 
> be
> re-functionalized with a simple addition of a commandline argument.
>> if the suffix len is specified and is too small.
>> Otherwise we use the zzaa, zzab method as described before.
> This is also a good idea, but it might override the users intention which 
> could 
> be to use split to detect a file that was more than 676*N lines long or to 
> use it 
> with the -1 option and only write out the first 676 lines of the input 

That's exceedingly unlikely. It would be great to have the "unlimited" behaviour
by default I think. As mentioned before we could have the "limited" behaviour

> (who knows why, but we're fixing a fix that broke something else, right?)

I can't see the code for the old behaviour so I wouldn't assume that.

