Re: split overwriting already existing files
From: Pádraig Brady
Subject: Re: split overwriting already existing files
Date: Thu, 03 Jul 2014 09:24:01 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 07/03/2014 07:12 AM, Bernhard Voelker wrote:
> Analyzing bug#17904, I came across the idea that split(1) could
> possibly do something weird, i.e. delete the "aa" file, when
> an output file already exists. Well split(1) doesn't delete it,
> but rather overwrites it:
>
> $ wc -l file
> 25000 file
>
> $ cp -p file file-newaa
>
> $ ls -log file*
> total 5864
> -rw-r--r-- 1 2999930 Jul 3 07:47 file
> -rw-r--r-- 1 2999930 Jul 3 07:47 file-newaa
>
> $ find . -size +1000 -exec ~/coreutils/src/split --verbose -l 10000 {} {}-new \;
> creating file ‘./file-newaa-newaa’
> creating file ‘./file-newaa-newab’
> creating file ‘./file-newaa-newac’
> creating file ‘./file-newaa’
> creating file ‘./file-newab’
> creating file ‘./file-newac’
>
> find(1) was obviously passing "file-newaa" first to split(1).
> But the second split(1) run has silently overwritten the
> already existing "file-newaa"!
>
> $ ls -log
> total 8796
> -rw-r--r-- 1 2999930 Jul 3 07:47 file
> -rw-r--r-- 1 1194980 Jul 3 07:48 file-newaa
> -rw-r--r-- 1 1194980 Jul 3 07:48 file-newaa-newaa
> -rw-r--r-- 1 1203284 Jul 3 07:48 file-newaa-newab
> -rw-r--r-- 1 601666 Jul 3 07:48 file-newaa-newac
> -rw-r--r-- 1 1203284 Jul 3 07:48 file-newab
> -rw-r--r-- 1 601666 Jul 3 07:48 file-newac
>
> There's nothing explicitly about overwriting in the Texinfo manual,
> but as it always says "the output file is created", I would assume
> that O_CREAT is used.
>
> This is what POSIX [1] says about the output files:
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/split.html
>
> The output files contain portions of the original input file;
> otherwise, unchanged.
>
> I'm not sure whether the latter mandates the use of O_CREAT, but I
> think failing here would be better than losing data.
>
> Before looking into the code, do you think we should change this?

I would say no, because you would often want split
to overwrite the existing output set.
A related protection was added recently
so that split does not overwrite its input files:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=ae584644
Beyond that, I can't think of other protections we could provide,
apart from adding a --no-clobber option, and I'm not sure that's warranted.
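
For anyone who wants that protection today, a caller-side guard is one
option. The following is a minimal sketch, assuming a POSIX shell (sh or bash)
and that refusing whenever any file named PREFIX* exists is acceptable.
That is stricter than necessary, since split would only write PREFIXaa,
PREFIXab, and so on. The function name split_noclobber is hypothetical
and not part of coreutils:

```shell
# Hypothetical wrapper, not part of coreutils:
# refuse to run split if any file named PREFIX* already exists.
# usage: split_noclobber INPUT PREFIX [split options...]
split_noclobber() {
    input=$1
    prefix=$2
    shift 2
    # If nothing matches, sh/bash leave the pattern as a literal string,
    # which fails the -e test, so the loop falls through to split.
    for f in "$prefix"*; do
        if [ -e "$f" ]; then
            echo "split_noclobber: '$f' exists, not overwriting" >&2
            return 1
        fi
    done
    split "$@" -- "$input" "$prefix"
}
```

With the files from the example above, "split_noclobber file file-new -l 10000"
would refuse to run, because file-newaa already exists. The check and the
split are not atomic, so this does not guard against files created
concurrently.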
thanks,
Pádraig.