bug-coreutils

bug#25832: split (v 8.25) with numeric suffixes beyond 89


From: Assaf Gordon
Subject: bug#25832: split (v 8.25) with numeric suffixes beyond 89
Date: Tue, 21 Feb 2017 21:40:39 -0500

Hello,

> On Feb 21, 2017, at 19:55, Holger Wolff <address@hidden> wrote:
> 
> Incorrect numeric suffixes are sometimes produced when going beyond number 89:
> Assume a file "test.txt" with 1000 lines, and the command
> 
> $ split -d -l 10 test.txt test_
> 
> I expect files test_00 through test_99, but what I get are test_00 through 
> test_89 and test_9000 through test_9009.

Thank you for the bug report.

I can confirm this is reproducible in the latest revision.

The immediate cause is that, when no starting value is given,
coreutils' split automatically 'widens' the filename suffix,
but the widening logic was written for alphabetic suffixes
and doesn't work well for numeric ones.

That is, when not using numeric suffixes,
the suffix that follows 'yz' is widened to 'zaaa':

     $ seq 1000 | split -l 1 - foo_

will result in:

     ...
     foo_yy
     foo_yz
     foo_zaaa
     foo_zaab
     ...

You are seeing the same logic applied to digits:
the suffix that follows '89' is widened to '9000'.
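
To make the pattern concrete, here is a rough shell model of the numeric
case. It is only a sketch inferred from the output shown in this thread,
not the code in split.c, and the helper name 'numeric_suffix' is made up
for illustration. It assumes each widening step prepends one '9' and adds
one digit to the counter that follows:

```shell
# Hypothetical model: print the suffix for the N-th output file (0-based)
# when split -d is used without a start value.
numeric_suffix() {
  n=$1
  prefix=""   # one '9' is added per widening step
  width=2     # default suffix length
  cap=90      # names available at this width: 00..89
  while [ "$n" -ge "$cap" ]; do
    n=$((n - cap))
    prefix="${prefix}9"
    width=$((width + 1))
    cap=$((cap * 10))
  done
  printf "%s%0${width}d\n" "$prefix" "$n"
}

# Prints 88, 89, 9000, 9001, 9009 (one per line).
for i in 88 89 90 91 99; do
  numeric_suffix "$i"
done
```

Under this model the 100th file (index 99) gets '9009', which matches the
test_9009 in the report.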


Technically, if 'numeric_suffix_start'
is left as 'null' in the parsing of --numeric-suffixes:
 http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n1455

then the widening logic behaves as if those were letters, not digits
in 'split.c:next_file_name()':
 http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/split.c#n403



An immediate band-aid of defaulting to numeric_suffix_start=0
would have an unintended consequence (a regression, perhaps):
an explicit numeric start value prevents filename widening, so split
fails if more files need to be created than the suffix length allows
(this wouldn't affect your example, because 1000 lines fit
in 100 files of 10 lines):

    # Works, filenames will be widened to 9010.
    $ seq 1001 | split -l 10 --numeric-suffix - foo_

    # Widening is disabled (suffix stays at the default 2 digits), split fails:
    $ seq 1001 | split -l 10 --numeric-suffix=0 - foo_
    split: output file suffixes exhausted
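
The arithmetic behind that failure can be sketched in shell. This is only
an illustration of the counting, not how split itself performs the check,
and 'fits_fixed_width' is a hypothetical helper: 1001 lines at 10 lines
per file need 101 files, while two decimal digits give only 100 names.

```shell
# Hypothetical helper: succeed if TOTAL_LINES, split PER_FILE lines at a
# time, fit into fixed-width decimal suffixes of WIDTH digits.
fits_fixed_width() {
  total_lines=$1
  per_file=$2
  width=$3
  # Number of output files, rounding up for a partial last file.
  needed=$(( (total_lines + per_file - 1) / per_file ))
  # Number of distinct suffixes: 10^width.
  capacity=1
  i=0
  while [ "$i" -lt "$width" ]; do
    capacity=$((capacity * 10))
    i=$((i + 1))
  done
  [ "$needed" -le "$capacity" ]
}

if fits_fixed_width 1000 10 2; then
  echo "1000 lines: fits in 2 digits"
fi
if ! fits_fixed_width 1001 10 2; then
  echo "1001 lines: 2-digit suffixes exhausted"
fi
```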


What do others think: default to no-widening for numeric suffixes,
or add code to 'next_file_name()' for numeric widening?

-assaf





