coreutils

split: support unlimited number of split files


From: Jérémy Compostella
Subject: split: support unlimited number of split files
Date: Fri, 24 Feb 2012 23:08:18 +0100

All,

I'm interested in implementing this feature. In fact, I have already
written a quick implementation to experiment with.

I refer to the original thread, "split behavior":
http://lists.gnu.org/archive/html/bug-coreutils/2009-09/msg00217.html

To summarise briefly: in the past, the split command provided an
unlimited number of split files as its default behavior. But this did
not conform to POSIX, so it was removed (see
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=65cbf7d1).

This old behavior was:
$ cat /var/log/messages | split -2 - /tmp/x.
x.aa
x.ab
...
x.yz
x.zaaa
x.zaab
...
x.zyzz
x.zzaaaa
x.zzaaab
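
The pattern in the listing above can be sketched in a few lines of
Python. This is only an illustration inferred from the listing, not code
taken from split itself; the function name old_style_suffixes and its
start_length parameter are hypothetical. Each stage adds one leading "z"
and one extra letter, and the first letter after the "z" prefix never
uses "z", so no suffix is ever a prefix of a later one.

```python
from itertools import count, islice, product
from string import ascii_lowercase


def old_style_suffixes(start_length=2):
    """Yield suffixes following the pattern aa..yz, zaaa..zyzz, zzaaaa..."""
    for stage in count():
        prefix = "z" * stage                # one more leading 'z' per stage
        body_length = start_length + stage  # the body grows by one letter per stage
        for first in ascii_lowercase[:-1]:  # 'a'..'y': 'z' is reserved for the prefix
            for rest in product(ascii_lowercase, repeat=body_length - 1):
                yield prefix + first + "".join(rest)


names = list(islice(old_style_suffixes(2), 652))
print(names[:2], names[648:652])
# ['aa', 'ab'] ['yy', 'yz', 'zaaa', 'zaab']
```

Because "z" is reserved, the sequence stays in creation order when
sorted, however many stages are generated.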

However, others in the "split behavior" thread proposed something like:
x.aa
...
x.zz
x.zzaa
...
x.zzzz
x.zzzzaa

Both schemes serve the same goal: the split files, once sorted
alphabetically, are in the correct order.

However, the second scheme does not satisfy me, because it breaks this
ordering as soon as the --additional-suffix option is used:
$ cat /var/log/messages | split --additional-suffix=.txt -2 - /tmp/x. && ls /tmp/x.* | sort
x.aa.txt
...
x.zy.txt
x.zzaa.txt
...
x.zztw.txt
x.zz.txt      <---- :(
x.zztx.txt
...
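
The misplaced file can be reproduced with a short Python sketch. It
assumes a locale whose collation skips punctuation when comparing names,
which is approximated here by removing the dots before comparing; the
helper punctuation_blind is hypothetical and is not how sort(1) is
implemented. Plain byte-wise comparison would not show the problem,
since "." sorts before any letter.

```python
from itertools import product
from string import ascii_lowercase

letters = ascii_lowercase
pairs = ["".join(t) for t in product(letters, repeat=2)]

# Second scheme, in creation order: aa..zz, then zzaa..zzzz.
second = pairs + ["zz" + p for p in pairs]
# Old scheme, in creation order: aa..yz, then zaaa..zyzz.
old = [a + b for a in letters[:-1] for b in letters] + [
    "z" + a + p for a in letters[:-1] for p in pairs
]

second_files = [f"x.{s}.txt" for s in second]
old_files = [f"x.{s}.txt" for s in old]


def punctuation_blind(name):
    # Assumption: stand-in for a collation that ignores '.' characters.
    return name.replace(".", "")


ordered = sorted(second_files, key=punctuation_blind)
i = ordered.index("x.zz.txt")
print(ordered[i - 1 : i + 2])
# ['x.zztw.txt', 'x.zz.txt', 'x.zztx.txt']

# The old scheme keeps its creation order under the same comparison.
print(sorted(old_files, key=punctuation_blind) == old_files)
# True
```

The old scheme is unaffected because none of its suffixes is a prefix of
another, so appending ".txt" cannot change the relative order.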

Therefore, in my opinion, the old behavior is better suited to the
current set of split options.

In the "split behavior" thread it was proposed to check the
POSIXLY_CORRECT environment variable to decide whether to enable the
unlimited split files behavior. But I think that is dangerous. Indeed, it
changes the usual file list: x.aa ... x.zz ... vs. x.aa ... x.yz x.zaaa ...
(the x.zz file no longer exists). Users may be surprised and older
scripts may fail.

Maybe adding a new option or a new argument value would be fine. I was
thinking of the following:
* --unlimited-suffixes
* --suffix-length=unlimited or --suffix-length=auto

With this new option (or argument), users would keep the ability to
select the starting suffix length. For example:
$ cat /var/log/messages | split --suffix-length=auto --suffix-length 3 -2 - /tmp/x.
x.aaa  <--- start with suffix length = 3
x.aab
...
x.yzz
x.zaaaa
x.zaaab
...
x.zyzzz
x.zzaaaaa
x.zzaaaab

Cheers,

Jérémy


