[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SY

From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-block] [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Thu, 15 Feb 2018 19:40:25 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

Hi all.

Two years later, is there are any news on the topic?

I can't understand the following thing:

 - FIEMAP without FLAG_SYNC is unsafe
 - FIEMAP with FLAG_SYNC is safe but slow
 - so, we've dropped FIEMAP and use only lseek. So, it means that lseek is safe _and_ fast (at least, faster than FIEMAP with FLAG_SYNC)...

but how is it possible, if lseek and fiemap share same code in the kernel?

Do your code with

    /* Found an extent, and we're inside it.  */
    *next = f.fe.fe_logical + f.fe.fe_length;
    if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
    } else {
        return BDRV_BLOCK_DATA;

provide safe block_status based on FIEMAP without FLAG_SYNC?

may help: link to discussion start:

22.07.2016 13:41, Paolo Bonzini wrote:
i.e. the use of fiemap to duplicate the exact layout of a file
from userspace is only posisble if you can /guarantee/ the source
file has not changed in any way during the copy operation at the
pointin time you finalise the destination data copy.
We don't do exactly that, exactly because it's messy when you have
concurrent accesses (which shouldn't be done but you never know).
Which means you *cannot make the assumption it won't happen*.
FIEMAP is not guaranteed to tell you exactly where the data in the
file is that you need to copy is and that nothing you can do from
userspace changes that. I can't say it any clearer than that.
You've said it very clearly.  But I'm not saying "fix the damn FIEMAP", I'm
saying "this is what we need, lseek doesn't provide it, FIEMAP comes
close but really doesn't".  If the solution is to fix FIEMAP, if it's
possible at all, that'd be great.  But any other solution is okay.

Do you at least agree that it's possible to use the kind of information
in struct fiemap_extent (extent start, length and flags) in a way that is
not racy, or at least not any different from SEEK_DATA and SEEK_HOLE's
raciness?  I'm not saying that you'd get that information from FIEMAP.
It's just the kind of information that I'd like to get.

(BTW, the documentation of FIEMAP is horrible.  It does not say at all
that FIEMAP_FLAG_SYNC is needed to return extents that match what
SEEK_HOLE/SEEK_DATA would return.  The obvious reading is that
FIEMAP_FLAG_SYNC would avoid returning FIEMAP_EXTENT_DELALLOC extents,
and that in turn would not be a problem if you don't need fe_physical.
Perhaps it would help if fiemap.txt said started with *why* would one
use FIEMAP, not just what it does).

When doing a copy, we use(d to use) FIEMAP the same way as you'd use lseek,
querying one extent at a time.  If you proceed this way, all of these
can cause the same races:

- pread(ofs=10MB, len=10MB) returns all zeroes, so the 10MB..20MB is
not copied

- pread(ofs=10MB, len=10MB) returns non-zero data, so the 10MB..20MB is

- lseek(SEEK_DATA, 10MB) returns 20MB, so the 10MB..20MB area is not

- lseek(SEEK_HOLE, 10MB) returns 20MB, so the 10MB..20MB area is

- ioctl(FIEMAP at 10MB) returns an extent starting at 20MB, so
the 10MB..20MB area is not copied
No, FIEMAP is not guaranteed to behave like this. what is returned
is filesystem dependent. Fielsystems that don't support holes will
return data extents. Filesystems that support compression might
return a compressed data extent rather than a hole. Encrypted files
might not expose holes at all, so people can't easily find known
plain text regions in the encrypted data. Filesystems could report
holes as deduplicated data, etc.  What do you do when FIEMAP returns
"OFFLINE" to indicate that the data is located elsewhere and will
need to be retrieved by the HSM operating on top of the filesystem
before layout can be determined?
lseek(SEEK_DATA) might also say you're not on a hole because the file
is compressed/encrypted/deduplicated/offline/whatnot.  So lseek is
also filesystem dependent, isn't it?  It also doesn't work on block
devices, so it's really file descriptor dependent.  That's not news.

Of course read, lseek and FIEMAP will not return exactly the same
information.  The point is that they're subject to exactly the same
races, and that struct fiemap_extent *can* be parsed conservatively.
The code I attached to the previous message does that, if it finds any
extent kind other than an unwritten extent it just treats it as data.

Based on this it would be nice to understand the reason why FIEMAP needs
FIEMAP_FLAG_SYNC to return meaningful values (the meaningful values
are there, they're just what lseek or read use), or to have a more
powerful function than just lseek(SEEK_DATA/SEEK_HOLE).  All we want is
to be able to distinguish between the three fallocate modes.

The assumptions being made about FIEMAP behaviour will only lead to
user data corruption, as they already have several times in the past.
Indeed, FIEMAP as is ranks just above gets() in usability.  But there's
no reason why it has to be that way.


Best regards,

reply via email to

[Prev in Thread] Current Thread [Next in Thread]