Re: [Bug-tar] Detection of sparse files is broken on btrfs


From: Joerg Schilling
Subject: Re: [Bug-tar] Detection of sparse files is broken on btrfs
Date: Wed, 10 Jan 2018 11:12:23 +0100
User-agent: Heirloom mailx 12.5 7/5/10

Tim Kientzle <address@hidden> wrote:

> What is the most efficient (preferably portable) way for an archiving program 
> (such as tar) to determine whether it should archive a particular file as 
> sparse or non-sparse?

IIRC, an lseek() call takes approx. 2 microseconds. I did some research in 2005 
when I implemented support for SEEK_HOLE. IIRC, SEEK_HOLE was implemented by Sun 
in spring 2005 after I discussed methods for a useful and performant interface 
with Jeff Bonwick from the Sun ZFS team. At that time, Sun told us that we 
should first use fpathconf(f, _PC_MIN_HOLE_SIZE) to find whether a specific 
filesystem supports SEEK_HOLE, and I believed that another syscall would be bad 
for the performance. It turned out that this is not the case and I finally 
started to use fpathconf(f, _PC_MIN_HOLE_SIZE) in 2006.

The current method I use is to call:

        lseek(f, (off_t)0, SEEK_HOLE);

If this fails with EINVAL, the OS does not support SEEK_HOLE (I use private 
#defines for SEEK_HOLE == 4 and SEEK_DATA == 3 so that I can check this).

If it fails with ENOTSUP, the specific filesystem does not support SEEK_HOLE.

If the return value is >= st_size, the file is not sparse, as there is only the 
virtual hole past the end of the file.
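
The checks above could be sketched in C roughly as follows. This is only an
illustration, not star's actual code: the names probe_holes() and
enum hole_status are hypothetical, and the special case for empty files
(where some systems report ENXIO) is an added assumption. The private
SEEK_HOLE define mirrors the value mentioned above.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Private fallback so the probe compiles even if the headers lack it. */
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/* Hypothetical result codes for this sketch. */
enum hole_status {
    HOLE_UNSUPPORTED_OS,  /* lseek() failed with EINVAL */
    HOLE_UNSUPPORTED_FS,  /* lseek() failed with ENOTSUP */
    HOLE_NOT_SPARSE,      /* only the virtual hole at end of file */
    HOLE_SPARSE,          /* a real hole starts before st_size */
    HOLE_ERROR            /* any other failure */
};

static enum hole_status probe_holes(int fd)
{
    struct stat sb;
    off_t pos;

    if (fstat(fd, &sb) < 0)
        return HOLE_ERROR;
    if (sb.st_size == 0)          /* assumption: empty files are never sparse */
        return HOLE_NOT_SPARSE;

    pos = lseek(fd, (off_t)0, SEEK_HOLE);
    if (pos == (off_t)-1) {
        if (errno == EINVAL)
            return HOLE_UNSUPPORTED_OS;
        if (errno == ENOTSUP)
            return HOLE_UNSUPPORTED_FS;
        return HOLE_ERROR;
    }

    /* A successful lseek() moved the file offset; put it back. */
    if (lseek(fd, (off_t)0, SEEK_SET) == (off_t)-1)
        return HOLE_ERROR;

    return pos >= sb.st_size ? HOLE_NOT_SPARSE : HOLE_SPARSE;
}
```

A caller would fall back to the block-count heuristic only for the two
"unsupported" results.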

> Historically, we've compared st_nblocks to st_size to quickly determine if a 
> file is sparse in order to avoid the SEEK_HOLE scan in most cases.  Some 
> programs even consider st_nblocks == 0 as an indication that a file is 
> entirely sparse.  Based on the claims I've read here, it sounds like 
> st_nblocks can no longer be trusted for these purposes.
>
> So is there some other way to quickly identify sparse files so we can avoid 
> the SEEK_HOLE scan for non-sparse files?

Star uses this method only if SEEK_HOLE is not supported.
In addition, in October 2013 I changed my algorithm regarding st_blocks == 0 and 
the assumption that the file consists only of a single hole, after I discovered 
that NetApp stores files of up to 64 bytes in the inode.

Otherwise the fallback algorithm for sparse files on a dumb OS is:

        st_size > (st_blocks * DEV_BSIZE) + DEV_BSIZE
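
As a minimal sketch, the comparison above could be wrapped in a small C
helper. The function name looks_sparse_by_blocks() is hypothetical, and the
512-byte fallback for DEV_BSIZE is an assumption for systems whose headers do
not define it.

```c
#include <sys/types.h>
#include <sys/param.h>

/* Assumption: 512-byte units if the system headers do not say otherwise. */
#ifndef DEV_BSIZE
#define DEV_BSIZE 512
#endif

/*
 * Fallback heuristic: treat the file as sparse if its size exceeds the
 * space accounted for by st_blocks plus one extra block of slack.
 * Returns 1 for "probably sparse", 0 otherwise.
 */
static int looks_sparse_by_blocks(off_t st_size, blkcnt_t st_blocks)
{
    long long allocated = (long long)st_blocks * DEV_BSIZE;

    return (long long)st_size > allocated + DEV_BSIZE;
}
```

With the values from a struct stat this would be called as
looks_sparse_by_blocks(sb.st_size, sb.st_blocks). Because of the extra
DEV_BSIZE of slack, a 64-byte file reported with st_blocks == 0 is not
flagged as sparse.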

Jörg

-- 
 EMail:address@hidden                    (home) Jörg Schilling D-13353 Berlin
    address@hidden (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'


