[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: split: support unlimited number of split files
From: |
Jérémy Compostella |
Subject: |
Re: split: support unlimited number of split files |
Date: |
Sat, 10 Mar 2012 12:13:02 +0100 |
2012/3/10 Pádraig Brady <address@hidden>:
> On 03/01/2012 07:44 PM, Jérémy Compostella wrote:
>> Hi Pádraig,
>>
>> Finally, I found time to work again on this patch. I provided it as
>> attachment.
>>
>> I'm not completely satisfied with the documentation part. I've tried to
>> be more specific but it becomes quickly complicated. So I get back the
>> original explanation which does not completely satisfied me since it
>> does not explain the following point:
>> - When -a is not specified:
>> - output file names are considered exhausted when the first suffix
>> character will become 'z'.
>> -'z[a-z]' suffixes are never used
>> - the fact that suffix length is increased each time the suffix are
>> "exhausted".
>> But maybe it's not relevant to be so specific.
>>
>> What is your opinion about this point ?
>
> Well documentation can easily include too much detail.
> But one should try to clarify any ambiguities
> a user might be considering.
>
> I adjusted that documentation as follows:
>
> @@ -2990,11 +2990,15 @@ The output files' names consist of @var{prefix}
> (@samp{>
> followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by
> default), such that concatenating the output files in traditional
> sorted order by file name produces the original input file (except
> -@option{-nr/@var{n}}). If more than 676 output files are required and
> -@samp{-a} is not specified, @command{split} uses @samp{zaaa},
> -@samp{zaab}, etc. If the @option{-a} is specified and the output file
> -names are exhausted, @command{split} reports an error without deleting
> -the output files that it did create.
> +@option{-nr/@var{n}}). By default split will initially create files
> +with two generated suffix characters, and will increase this width by two
> +when the next most significant position reaches the last character.
> +(@samp{yz}, @samp{zaaa}, @dots{}). In this way an arbitrary number
> +of output files are supported which sort as described above,
> +even in the presence of an @option{--additional-suffix} option.
> +If the @option{-a} option is specified and the output file names are
> exhausted,
> +@command{split} reports an error without deleting the output files
> +that it did create.
I like it.
> I made a few adjustments to the code also.
>
> Disable this feature if --numeric=FROM is specified,
> because these suffixes are not all consecutive
> and hence we can't support arbitrary FROM values.
>
> Disable this feature if a specific number of
> files is specified with the -n option.
> We already auto select the width appropriately.
>
> I could find no logic flaws in the implementation in new_file_name().
> Well there was one nit with:
>
> sufindex = xrealloc (sufindex, suffix_length * sizeof *sufindex);
>
> That should be as follows to avoid overflow issues:
>
> sufindex = xnrealloc (sufindex, suffix_length, sizeof *sufindex);
Fair point.
> While looking at that I noticed a fair bit of overlap
> from the existing fixed width suffix initialization,
> so I merged the two with a goto. That brought the
> new implementation down from 15 to 8 significant lines,
> and those 8 are now mostly simple integer width adjustments.
> Also by doing this we get to take advantage of the _PC_NAME_MAX
> check in this section.
I've been formatted by "Please, never use goto keyword in your C
program" at university and work too. Indeed, I've thought to a solution
like that but I didn't find a really smart solution without the goto
keyword.
> I also added a test.
>
> Full patch is attached.
> I hope to apply this sometime tomorrow.
BTW, reviewing your changes I discovered that when a user specifies a
suffix length zero the old behavior was to use the default length of 2
and the new is, of course, the suffix auto length way. However, IMHO,
it's dangerous to not raise an error. If a user for example make a
script which use split with a shell variable for the suffix_length
option it could have the feeling that his script works fine although it
provides a suffix length zero.
I attached an independent patch since there is no semantic relationship
with the current one. Feel free to merge, review or refuse it.
Cheers,
Jérémy
0001-split-suffix-length-zero-is-not-permitted.patch
Description: Text Data
- Re: split: support unlimited number of split files, (continued)
- Re: split: support unlimited number of split files, Pádraig Brady, 2012/03/01
- Re: split: support unlimited number of split files, Jérémy Compostella, 2012/03/02
- Re: split: support unlimited number of split files, Pádraig Brady, 2012/03/02
- Re: split: support unlimited number of split files, Jérémy Compostella, 2012/03/02
- Re: split: support unlimited number of split files, Pádraig Brady, 2012/03/02
- Re: split: support unlimited number of split files, Jérémy Compostella, 2012/03/02
- Re: split: support unlimited number of split files, Pádraig Brady, 2012/03/02
- Re: split: support unlimited number of split files, Jérémy Compostella, 2012/03/02
- Re: split: support unlimited number of split files, Jérémy Compostella, 2012/03/09
Re: split: support unlimited number of split files, Pádraig Brady, 2012/03/09
- Re: split: support unlimited number of split files,
Jérémy Compostella <=