Re: [Bug-tar] Detection of sparse files is broken on btrfs


From: Joerg Schilling
Subject: Re: [Bug-tar] Detection of sparse files is broken on btrfs
Date: Mon, 8 Jan 2018 12:23:22 +0100
User-agent: Heirloom mailx 12.5 7/5/10

Mark H Weaver <address@hidden> wrote:

> I just got bitten by the same problem reported back in July 2016:
>
>   https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00000.html
>
> At the time, Joerg Schilling unilaterally refused to fix the bug,
> claiming that Btrfs was broken and violated POSIX, although when asked
> for a reference to back that up he never provided one.  Everyone else in
> the thread disagreed with him, but the bug never got fixed.

Of course I provided that reference by pointing to the POSIX standard.

To make sure that every constraint is stated correctly, let me refine my 
statement:

In theory, a filesystem could put data for a tiny file into some kind of "free 
space" in the meta-data-storage (sometimes called "inode") and thus legally 
report st_blocks == 0. But this would not be allowed to change as a result of 
just a "sync()" operation.

But note that a file needs to have a minimum size of DEV_BSIZE in order to be 
"sparse" at all, while known implementations do not store more than 64 bytes 
in that location.


> Paul Eggert argued that there's no guarantee that st_blocks must be zero
> for a file with nonzero data.  As an example, he pointed out that if all
> of the file's data fits within the inode, it would be reasonable to
> report st_blocks == 0 for a file with nonzero data.

See above. BTW: There is a related comment in star/hole.c that explains that 
NetApp puts file data of up to 64 bytes completely into the meta-data storage. 
The method used by star avoids calling a file with st_blocks == 0 sparse as 
long as the file follows POSIX semantics.



> Others pointed out that in Linux's /proc filesystem, all files have
> st_blocks == 0.  That is also the case on my system running
> linux-libre-4.14.12.  Joerg claimed that his /proc filesystem reported
> nonzero st_blocks, but he was the only one in the thread who did so.

This is incorrect: I pointed out that the *original* /proc filesystem 
implementation always returns st_blocks != 0 if st_size != 0. If you encounter 
a /proc filesystem that reports st_blocks == 0, this must be a buggy unofficial 
clone implementation.


> It was also pointed out that with the advent of SEEK_HOLE and SEEK_DATA,
> the st_blocks hack is no longer needed for efficiency on modern systems.
>
> I see from the GNU maintainers file that Paul Eggert is a maintainer for
> GNU tar, and Joerg Schilling is not, so I don't see why we should let
> Joerg continue to prevent us from fixing this bug.

Given that http://austingroupbugs.net/view.php?id=415#c862 already defines 
SEEK_HOLE and SEEK_DATA, and given that most operating systems already 
implement them, the best way would be to just follow the accepted standard.
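
For illustration, a minimal sketch of hole detection with SEEK_HOLE could look 
like the following. This is not code from GNU tar or star: has_hole() is a 
hypothetical helper, and the _GNU_SOURCE define is an assumption for glibc, 
where the SEEK_HOLE macro is otherwise hidden.

```c
#define _GNU_SOURCE             /* assumption: needed on glibc for SEEK_HOLE */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: does the file behind fd contain a hole before
 * offset `size` (normally st_size from fstat())?
 * Returns 1 if a hole was found, 0 if not, -1 on error or if SEEK_HOLE
 * is not available at compile time.
 */
static int has_hole(int fd, off_t size)
{
    if (size == 0)
        return 0;               /* an empty file cannot contain a hole */
#ifdef SEEK_HOLE
    /*
     * Ask for the first hole at or after offset 0. Every file has a
     * virtual hole at end of file, so a result equal to `size` means
     * that no real hole exists.
     */
    off_t hole = lseek(fd, 0, SEEK_HOLE);

    if (hole == (off_t)-1)
        return -1;              /* e.g. EBADF, or lseek() rejects SEEK_HOLE */
    return hole < size ? 1 : 0;
#else
    (void)fd;
    return -1;                  /* caller has to fall back to another method */
#endif
}
```

Note that lseek() moves the file offset, so a caller would seek back to 0 
before reading the data. A filesystem without real hole support may simply 
report the virtual hole at end of file, in which case the sketch returns 0.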

BTW: I am in the group of core POSIX maintainers.

> I propose that we revisit this bug and fix it.  We clearly cannot assume
> that st_blocks == 0 implies that the file contains only zeroes.  This
> bug is fairly serious for anyone using btrfs and possibly other
> filesystems, as it has the potential to lose user data.

I cannot speak for GNU tar, but star does not call a file "sparse" as long as 
this file follows POSIX semantics. This is implemented by requiring the size of 
the file (st_size) to be at least DEV_BSIZE larger than the size computed from 
st_blocks in order to be treated as "sparse".
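
The size comparison described above can be sketched roughly as follows. This 
is an illustration only, not the code from star/hole.c: looks_sparse() is a 
hypothetical name, and the fallback value of 512 for DEV_BSIZE is an 
assumption for systems where <sys/param.h> does not define it.

```c
#include <stdbool.h>
#include <sys/param.h>          /* defines DEV_BSIZE on many systems */
#include <sys/types.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512           /* assumption: fallback block unit */
#endif

/*
 * Hypothetical helper: treat a file as sparse only if st_size is at
 * least DEV_BSIZE larger than the size computed from st_blocks.
 */
static bool looks_sparse(off_t st_size, long long st_blocks)
{
    off_t allocated = (off_t)st_blocks * DEV_BSIZE;

    return st_size >= allocated + DEV_BSIZE;
}
```

With this rule, a 64 byte file that reports st_blocks == 0 is not treated as 
sparse, while a 4096 byte file that reports st_blocks == 0 is.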


Conclusion: If btrfs returns st_blocks == 0 for larger (non-sparse) files, this 
is a case of POSIX non-compliance that needs to be fixed.

Jörg

-- 
 EMail:address@hidden                    (home) Jörg Schilling D-13353 Berlin
    address@hidden (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/


