From: | Eric Blake |
Subject: | [coreutils] Re: [Bug-tar] [PATCH] improved sparse file detection |
Date: | Tue, 24 Aug 2010 16:45:19 -0600 |
User-agent: | Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100806 Fedora/3.1.2-1.fc13 Mnenhy/0.8.3 Thunderbird/3.1.2 |
[adding coreutils]
On 08/24/2010 09:17 AM, Bernd Schubert wrote:
> Hi all, for improved stat() performance the Lustre filesystem uses entirely empty sparse files on its metadata target (MDT). Now, with hundreds of millions of sparse files of huge sizes, creating a backup of the MDT using vanilla gnu-tar is basically impossible, as it needs far too much time to detect sparse files.
Coreutils cp(1) has recently started using code to efficiently iterate over the locations of all holes within sparse files, with the goal of eventually being able to target both Linux ioctls and Solaris SEEK_HOLE directives. I think that could also be leveraged rather nicely for tar's detection of sparse files, by stopping the iteration after the first hole has been found; in particular, it would rapidly detect files that are not completely sparse (whereas the description of your patch implies that you only address the subset of quickly detecting a completely sparse file, but offer no speedup on partially sparse files). Thus, coreutils' sparse file management is a great candidate for migrating into gnulib and sharing among several projects.
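To illustrate the "stop after the first hole" idea, here is a minimal sketch in Python rather than in tar's or coreutils' C. It is not the coreutils code; the function name `first_hole_offset` is hypothetical, and it assumes a platform where `os.SEEK_HOLE` is available and the filesystem reports holes through lseek().

```python
import os


def first_hole_offset(path):
    """Return the offset of the first hole before EOF, or None.

    None means "no hole reported": the file is empty, the platform lacks
    SEEK_HOLE, or the filesystem treats the whole file as data.
    Hypothetical helper for illustration only.
    """
    seek_hole = getattr(os, "SEEK_HOLE", None)
    if seek_hole is None:
        return None  # assumption: no SEEK_HOLE means we cannot tell

    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return None  # seeking at EOF would fail with ENXIO
        try:
            # One call: the kernel returns the first hole at or after
            # offset 0, or the file size if there is only the implicit
            # hole at end of file.
            offset = os.lseek(fd, 0, seek_hole)
        except OSError:
            return None
        return offset if offset < size else None
    finally:
        os.close(fd)
```

A caller could treat any non-None result as "this file needs sparse handling" without scanning the file contents; a None result only means no hole was reported, not that the file is guaranteed dense.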
Meanwhile, if you are indeed correct that there are easy ways to detect completely sparse files, even when the ioctl or SEEK_HOLE directives are not present, then the coreutils cp(1) hole iteration routine should probably be taught that corner case to recognize an entirely sparse file as a single hole.
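As a rough sketch of what such a corner case might look like (again in Python, and not taken from the patch, which I have not read): try SEEK_DATA first, and fall back to a block-count heuristic. The `st_blocks == 0` fallback is purely my assumption about what an "easy way" could be, and it is only a heuristic, since some filesystems may report zero blocks for files that do hold data (inline data, or data not yet allocated). The name `is_entirely_sparse` is hypothetical.

```python
import errno
import os


def is_entirely_sparse(path):
    """Guess whether a non-empty file consists of a single hole.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstat(fd)
        if st.st_size == 0:
            return False  # an empty file has no hole to speak of

        seek_data = getattr(os, "SEEK_DATA", None)
        if seek_data is not None:
            try:
                os.lseek(fd, 0, seek_data)
                return False  # some data exists at or after offset 0
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    return True  # no data anywhere before EOF
                # Any other error: fall through to the heuristic.

        # Assumed fallback: a file with nonzero size and no allocated
        # blocks is treated as one big hole. st_blocks is not available
        # on every platform, hence the getattr.
        return getattr(st, "st_blocks", None) == 0
    finally:
        os.close(fd)
```

If something along these lines turns out to be reliable, the hole iterator could report the range from 0 to st_size as a single hole and skip reading the file altogether.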
> PS: I'm used to linux-style indentation and I'm not sure if I did it the right way. If it is wrong, please complain and I will try to reformat it.
Thanks for taking the time to contribute a patch. However, the diffstat says that your patch is large enough to fall outside the bounds of trivial submissions, so I quit reading it to avoid any copyright issues. Would you be willing to assign copyright to the FSF? If so, we can start the paperwork process off-list.
-- Eric Blake address@hidden +1-801-349-2682 Libvirt virtualization library http://libvirt.org