What prevents discarding a cluster during rewrite?
From: Vladimir Sementsov-Ogievskiy
Subject: What prevents discarding a cluster during rewrite?
Date: Tue, 23 Feb 2021 00:30:53 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1
Hi all!
While thinking about how to prevent a host cluster's refcount from dropping to
zero (and the cluster being discarded) during the flush of the compressed cache
(which I'm working on now), I came up with a question: what prevents this for
normal writes?
A simple interactive qemu-io session on master branch:
./qemu-img create -f qcow2 x 1M
[root@kvm build]# ./qemu-io blkdebug::x
do initial write:
qemu-io> write -P 1 0 64K
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.12 sec (556.453 KiB/sec and 8.6946 ops/sec)
rewrite, and break before the write (assume the write takes long in the fs or
hardware for some reason):
qemu-io> break write_aio A
qemu-io> aio_write -P 2 0 64K
blkdebug: Suspended request 'A'
OK, we stopped before the write. Everything was already allocated by the initial
write, and the mutex is now released. And suddenly we do a discard:
qemu-io> discard 0 64K
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (146.034 MiB/sec and 2336.5414 ops/sec)
Now start another write, to a different offset. But it will allocate the same
host cluster!
qemu-io> write -P 3 128K 64K
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.08 sec (787.122 KiB/sec and 12.2988 ops/sec)
Check it:
qemu-io> read -P 3 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (188.238 MiB/sec and 3011.8033 ops/sec)
resume our old write:
qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:05:07.10 (213.400382 bytes/sec and 0.0033 ops/sec)
of course it doesn't affect the first cluster, as that was discarded:
qemu-io> read -P 2 0 64K
Pattern verification failed at offset 0, 65536 bytes
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (726.246 MiB/sec and 11619.9352 ops/sec)
qemu-io> read -P 0 0 64K
read 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.00 sec (632.348 MiB/sec and 10117.5661 ops/sec)
But the data in the 3rd cluster is corrupted now:
qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (163.922 MiB/sec and 2622.7444 ops/sec)
qemu-io> read -P 2 128K 64K
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (257.058 MiB/sec and 4112.9245 ops/sec)
So, that's a classic use-after-free. To the user it looks like a racy
write/discard to one cluster may corrupt another cluster. It may be even
worse if the use-after-free corrupts metadata.
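To make the sequence above concrete, here is a toy Python model of it. It is
only a sketch and not QEMU code: ToyImage, its fields, and the first-fit
allocation policy are all invented for illustration, and each host cluster is
reduced to a single pattern byte. It replays the same steps as the session:
initial write, suspended rewrite, discard, write to 128K, resume.

```python
# Toy model of the rewrite/discard race. NOT QEMU code: every name here
# (ToyImage, begin_rewrite, ...) is hypothetical and exists only to
# illustrate the ordering of events.

class ToyImage:
    def __init__(self, n_host_clusters=8):
        self.refcount = [0] * n_host_clusters  # host cluster -> refcount
        self.l2 = {}                           # guest cluster -> host cluster
        self.data = [0] * n_host_clusters      # one pattern byte per host cluster

    def alloc(self):
        # Assumed policy: reuse the lowest free host cluster (first fit).
        host = self.refcount.index(0)
        self.refcount[host] = 1
        return host

    def write(self, guest, pattern):
        # Complete, non-suspended write: allocate if needed, then store data.
        host = self.l2.get(guest)
        if host is None:
            host = self.alloc()
            self.l2[guest] = host
        self.data[host] = pattern

    def begin_rewrite(self, guest):
        # Resolve the mapping, then "suspend" before the data write.
        return self.l2[guest]

    def finish_rewrite(self, host, pattern):
        # The resumed write still targets the host cluster resolved earlier.
        self.data[host] = pattern

    def discard(self, guest):
        # Drop the mapping and the reference; the host cluster becomes free.
        host = self.l2.pop(guest)
        self.refcount[host] -= 1

    def read(self, guest):
        # Unmapped guest clusters read as zeroes.
        host = self.l2.get(guest)
        return 0 if host is None else self.data[host]


def run_scenario():
    img = ToyImage()
    img.write(0, 1)                  # write -P 1 0 64K
    pending = img.begin_rewrite(0)   # aio_write -P 2 0 64K, suspended
    img.discard(0)                   # discard 0 64K
    img.write(2, 3)                  # write -P 3 128K 64K (guest cluster 2)
    before_resume = img.read(2)      # read -P 3 128K 64K
    img.finish_rewrite(pending, 2)   # resume A
    return before_resume, img.read(0), img.read(2)


if __name__ == "__main__":
    print(run_scenario())
```

In this model the third value is 2 rather than 3: the resumed write lands in the
host cluster that now belongs to guest cluster 2, which is the same effect as in
the session above.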
Note that the initial write is significant: when we allocate a cluster, we
write the L2 entry after the data write (as I understand it), so the race
doesn't happen. But compressed writes allocate everything before the write.
Let's check:
[root@kvm build]# ./qemu-img create -f qcow2 x 1M; ./qemu-io blkdebug::x
Formatting 'x', fmt=qcow2 cluster_size=65536 extended_l2=off
compression_type=zlib size=1048576 lazy_refcounts=off refcount_bits=16
qemu-io> break write_compressed A
qemu-io> aio_write -c -P 1 0 64K
qemu-io> compressed: 327680 79
blkdebug: Suspended request 'A'
qemu-io> discard 0 64K
discarded: 327680
discard 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.01 sec (7.102 MiB/sec and 113.6297 ops/sec)
qemu-io> write -P 3 128K 64K
normal cluster alloc: 327680
wrote 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.06 sec (1.005 MiB/sec and 16.0774 ops/sec)
qemu-io> resume A
blkdebug: Resuming request 'A'
qemu-io> wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0:00:15.90 (4.026 KiB/sec and 0.0629 ops/sec)
qemu-io> read -P 3 128K 64K
Pattern verification failed at offset 131072, 65536 bytes
read 65536/65536 bytes at offset 131072
64 KiB, 1 ops; 00.00 sec (237.791 MiB/sec and 3804.6539 ops/sec)
(Strangely, it seemed not to fail the first several times I tried, but now it
has failed several times. Anyway, none of this is good.)
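The ordering argument (L2 entry written after the data for a normal allocating
write, versus everything allocated before the write for a compressed one) can
be sketched with another toy Python model. Again, this is not QEMU code: the
function name, the link_before_data flag, and the first-fit allocator are
assumptions made for illustration, and the "L2 after data" ordering is only my
understanding as stated above.

```python
# Toy comparison of two orderings for an allocating write that is suspended
# while a discard and an unrelated write run. NOT QEMU code; all names are
# hypothetical.

def allocating_write_race(link_before_data):
    refcount = [0] * 4   # host cluster -> refcount
    l2 = {}              # guest cluster -> host cluster
    data = [0] * 4       # one pattern byte per host cluster

    def alloc():
        # Assumed policy: reuse the lowest free host cluster (first fit).
        host = refcount.index(0)
        refcount[host] = 1
        return host

    # Allocating write of pattern 1 to guest cluster 0, suspended before I/O.
    pending = alloc()
    if link_before_data:
        l2[0] = pending  # mapping already visible (compressed-like ordering)

    # Concurrent discard of guest cluster 0: frees only what L2 points at.
    if 0 in l2:
        refcount[l2.pop(0)] -= 1

    # Unrelated write of pattern 3 to guest cluster 2.
    host = alloc()
    l2[2] = host
    data[host] = 3

    # Resume the suspended write.
    data[pending] = 1
    if not link_before_data:
        l2[0] = pending  # mapping published only after the data write

    # What does guest cluster 2 contain now?
    return data[l2[2]]


if __name__ == "__main__":
    print("L2 after data: ", allocating_write_race(False))
    print("L2 before data:", allocating_write_race(True))
```

With the mapping published after the data write, the discard finds nothing to
free and guest cluster 2 keeps its pattern; with the mapping published first,
the freed host cluster is reused and then overwritten by the resumed write.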
--
Best regards,
Vladimir