bug-tar
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-tar] [PATCH] Remove nonportable check for files containing only


From: Andreas Dilger
Subject: Re: [Bug-tar] [PATCH] Remove nonportable check for files containing only zeroes
Date: Wed, 17 Jan 2018 15:38:52 -0700

On Jan 10, 2018, at 4:50 AM, Pavel Raiskup <address@hidden> wrote:
> 
> On Wednesday, January 10, 2018 3:42:52 AM CET Mark H Weaver wrote:
>> From da922703282b0d3b8837a99a9c7fdd32f1d20d49 Mon Sep 17 00:00:00 2001
>> From: Mark H Weaver <address@hidden>
>> Date: Tue, 9 Jan 2018 20:16:14 -0500
>> Subject: [PATCH] Remove nonportable check for files containing only zeroes.
>> 
>> This check benefitted only one unlikely case (large files containing
>> only zeroes, on systems that do not support SEEK_HOLE)
> 
> It drops the optimization even for situations when SEEK_HOLE is not
> available, which is not 100% necessary.  I'm not proposing doing otherwise
> (I actually proposed this in [1]), but I'm rather CCing Andreas once more,
> as that's the original requester, the use-cases with lustre were
> definitely not unlikely and the question is whether SEEK_HOLE covers them
> nowadays.
> 
> [1] https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00017.html

Sorry for the late reply on this thread.

It should be noted that the real problem was not related to backing
up files at the Lustre level, but rather backing up files directly from
the backing ext4 filesystem of the metadata target, for example if migrating
to new hardware, or for backup/restore of only that target.  The MDT stored
the actual file size in the metadata inode (could be GB or TB in size), but
the file data was stored on data servers on other nodes.

This meant that using the old tar versions to do the MDT backup might take
days at 100% CPU just to write sparse files to the tarball.  If tar is now
using SEEK_HOLE to determine sparseness, then the ext4-level backups with
newer systems should work OK (SEEK_HOLE was added to ext4 in the 3.0 kernel,
and was improved in 3.7, though a data consistency bug with unwritten data
was just fixed in 4.12).

That means SEEK_HOLE is NOT available in RHEL 6.x kernels, which are still
in fairly widespread (though declining) use.  I'd prefer that the heuristic
for sparse files without SEEK_HOLE not be removed completely, but I do think
that it needs to be fixed for the small inline file and file in cache cases.

>> and was based on an assumption about file system behavior that is not
>> mandated by POSIX and no longer holds in practice, namely that for
>> sufficiently large files, (st_blocks == 0) implies that the file
>> contains only zeroes.  Examples of file systems that violate this
>> assumption include Linux's /proc file system and Btrfs.

Is that comment correct, namely that btrfs has "large" files that report
st_blocks == 0 even though they contain data?  Or is this only for small
files that have inline data, or recently written files that are only in
cache?

>> * src/sparse.c (sparse_scan_file_wholesparse): Remove this function.
>> (sparse_scan_file_seek): Remove the initial check for files containing
>> only zeroes.
>> ---
>> src/sparse.c | 24 ------------------------
>> 1 file changed, 24 deletions(-)
>> 
>> diff --git a/src/sparse.c b/src/sparse.c
>> index d41c0ea..3de6560 100644
>> --- a/src/sparse.c
>> +++ b/src/sparse.c
>> @@ -261,26 +261,6 @@ sparse_scan_file_raw (struct tar_sparse_file *file)
>>   return tar_sparse_scan (file, scan_end, NULL);
>> }
>> 
>> -static bool
>> -sparse_scan_file_wholesparse (struct tar_sparse_file *file)
>> -{
>> -  struct tar_stat_info *st = file->stat_info;
>> -  struct sp_array sp = {0, 0};
>> -
>> -  /* Note that this function is called only for truly sparse files of size 
>> >= 1
>> -     block size (checked via ST_IS_SPARSE before).  See the thread
>> -     http://www.mail-archive.com/address@hidden/msg04209.html for more info 
>> */
>> -  if (ST_NBLOCKS (st->stat) == 0)
>> -    {
>> -      st->archive_file_size = 0;
>> -      sp.offset = st->stat.st_size;
>> -      sparse_add_map (st, &sp);
>> -      return true;
>> -    }
>> -
>> -  return false;
>> -}
>> -
>> #ifdef SEEK_HOLE
>> /* Try to engage SEEK_HOLE/SEEK_DATA feature. */
>> static bool
>> @@ -343,10 +323,6 @@ sparse_scan_file_seek (struct tar_sparse_file *file)
>> static bool
>> sparse_scan_file (struct tar_sparse_file *file)
>> {
>> -  /* always check for completely sparse files */
>> -  if (sparse_scan_file_wholesparse (file))
>> -    return true;
>> -
>>   switch (hole_detection)
>>     {
>>     case HOLE_DETECTION_DEFAULT:
>> 
> 
> 
> 
> 


Cheers, Andreas





Attachment: signature.asc
Description: Message signed with OpenPGP


reply via email to

[Prev in Thread] Current Thread [Next in Thread]