Re: split behavior
Mon, 14 Sep 2009 09:59:54 +0100
Thunderbird 126.96.36.199 (X11/20071008)
Roger McNichols wrote:
> Thanks for the feedback.
>> Do you mean select the appropriate suffix length based on size,
>> or do you mean the zzaa, zzab scheme? The former wouldn't
>> help when processing a pipe for example so I'd probably
>> stick with the latter method for consistency.
> Currently, split (at least 5.2.1) DOES pick the suffix size based on the file
> size when used as "split -<#> file" and the file size is known.
I checked the repo and can't see code supporting that.
Perhaps you've got a locally modified `split` ?
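For reference, the "former" approach (picking a suffix length from a known input size) could look roughly like the sketch below. This is only an illustration of the idea, not code from split: the function name min_suffix_len and its parameters are made up here, and it assumes lowercase a-z suffixes with a minimum length of 2.

```python
def min_suffix_len(n_files, alphabet_size=26, minimum=2):
    """Smallest suffix length that can name n_files outputs.

    Hypothetical helper for illustration only; split itself is not
    known (from this thread) to contain anything like it.
    """
    length = minimum
    # A suffix of `length` letters can name alphabet_size**length files.
    while alphabet_size ** length < n_files:
        length += 1
    return length


# Two letters cover at most 26**2 = 676 output files.
two_letter_limit = 26 ** 2
needed_for_one_more = min_suffix_len(two_letter_limit + 1)
```

Note that this needs the number of output files up front, which is exactly what is unavailable when reading from a pipe or stdin.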
> But as you
> point out, if the file is a pipe you may still run out of suffixes if the
> file size
> changes after invocation of split, or if split is used in the "split -<#> -"
> (reads stdin) mode, a 2-letter suffix is all you get unless you specify a
> Now I suppose that maybe the discussion went something like:
> >> what if an unknown-sized input stream is the input?
> >> well then just use -a 100 and you will never* run out...
> (*note 26^100 is pretty big)
> Anyway, I propose to develop a new commandline option that would invoke the
> suffix formation behavior. And even though aa ... zaa ... zzaa ... instead of
> aa .. zzaa ... zzzzaa (as well as many other schemes) would work just as well,
Bzzt. zaa would sort before zb
In general one needs to append 'z'*suffix_len which would default to 2 if not
specified.
One would need to consider this behaviour with digit suffixes also.
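A quick way to check the sort-order point is the sketch below. It is one reading of the aa .. zz, zzaa .. zzzz, zzzzaa .. scheme discussed in this thread, not split's actual code; the generator name suffixes is hypothetical, and plain Python string comparison stands in for a C-locale sort.

```python
from itertools import count, islice, product
from string import ascii_lowercase


def suffixes(suffix_len=2):
    """Yield aa..zz, zzaa..zzzz, zzzzaa.. (for suffix_len=2).

    Hypothetical helper: each time the letters are exhausted, the
    prefix grows by 'z' * suffix_len, as described in the thread.
    """
    for block in count():
        prefix = "z" * (suffix_len * block)
        for letters in product(ascii_lowercase, repeat=suffix_len):
            yield prefix + "".join(letters)


# Take enough names to cross two block boundaries.
names = list(islice(suffixes(2), 26 ** 2 * 2 + 5))

# Generation order and sorted order agree for this scheme.
in_order = names == sorted(names)

# With a single 'z' prefix instead, "zaa" would land before "zb".
single_z_breaks_order = "zaa" < "zb"
```

With suffix_len=2 the name after "zz" is "zzaa", and the whole sequence stays sorted, whereas the single-'z' variant does not.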
> I propose to utilize the 'old' one for the added advantage of reverse
OK. While I like the scheme it would be really nice to see what we're being
compatible with. I.e. it would be great if you found where the old split you
used came from.
> That way any code that relied on the old scheme for counting would be able to be
> re-functionalized with a simple addition of a commandline argument.
>> if the suffix len is specified and is too small.
>> Otherwise we use the zzaa, zzab method as described before.
> This is also a good idea, but it might override the user's intention, which
> might be to use split to detect a file that was more than 676*N lines long or to
> use it
> with the -1 option and only write out the first 676 lines of the input
That's exceedingly unlikely. It would be great to have the "unlimited" behaviour
by default I think. As mentioned before we could have the "limited" behaviour
if POSIXLY_CORRECT is set.
> (who knows why, but we're fixing a fix that broke something else, right?)
I can't see the code for the old behaviour so I wouldn't assume that.
Re: split behavior, Pádraig Brady, 2009/09/11