[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts o
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image |
Date: |
Mon, 13 May 2019 13:59:56 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
Xiang Zheng <address@hidden> writes:
> On 2019/5/10 23:16, Markus Armbruster wrote:
>> Xiang Zheng <address@hidden> writes:
>>
>>> On 2019/5/9 19:59, Markus Armbruster wrote:
>>>> Xiang Zheng <address@hidden> writes:
>>>>
>>>>> On 2019/5/8 21:20, Markus Armbruster wrote:
>>>>>> Laszlo Ersek <address@hidden> writes:
>>>>>>
>>>>>>> Hi Markus,
>>>>>>>
>>>>>>> On 05/07/19 20:01, Markus Armbruster wrote:
>>>>>>>> The subject is slightly misleading. Holes read as zero. So do
>>>>>>>> non-holes full of zeroes. The patch avoids reading the former, but
>>>>>>>> still reads the latter.
>>>>>>>>
>>>>>>>> Xiang Zheng <address@hidden> writes:
>>>>>>>>
>>>>>>>>> Currently we fill the memory space with two 64MB NOR images when
>>>>>>>>> using persistent UEFI variables on virt board. Actually we only use
>>>>>>>>> a very small(non-zero) part of the memory while the rest significant
>>>>>>>>> large(zero) part of memory is wasted.
>>>>>>>>
>>>>>>>> Neglects to mention that the "virt board" is ARM.
>>>>>>>>
>>>>>>>>> So this patch checks the block status and only writes the non-zero
>>>>>>>>> part
>>>>>>>>> into memory. This requires pflash devices to use sparse files for
>>>>>>>>> backends.
>>>>>>>>
>>>>>>>> I started to draft an improved commit message, but then I realized this
>>>>>>>> patch can't work.
>>>>>>>>
>>>>>>>> The pflash_cfi01 device allocates its device memory like this:
>>>>>>>>
>>>>>>>> memory_region_init_rom_device(
>>>>>>>> &pfl->mem, OBJECT(dev),
>>>>>>>> &pflash_cfi01_ops,
>>>>>>>> pfl,
>>>>>>>> pfl->name, total_len, &local_err);
>>>>>>>>
>>>>>>>> pflash_cfi02 is similar.
>>>>>>>>
>>>>>>>> memory_region_init_rom_device() calls
>>>>>>>> memory_region_init_rom_device_nomigrate() calls qemu_ram_alloc() calls
>>>>>>>> qemu_ram_alloc_internal() calls g_malloc0(). Thus, all the device
>>>>>>>> memory gets written to even with this patch.
>>>>>>>
>>>>>>> As far as I can see, qemu_ram_alloc_internal() calls g_malloc0() only to
>>>>>>> allocate the the new RAMBlock object called "new_block". The actual
>>>>>>> guest RAM allocation occurs inside ram_block_add(), which is also called
>>>>>>> by qemu_ram_alloc_internal().
>>>>>>
>>>>>> You're right. I should've read more attentively.
>>>>>>
>>>>>>> One frame outwards the stack, qemu_ram_alloc() passes NULL to
>>>>>>> qemu_ram_alloc_internal(), for the 4th ("host") parameter. Therefore, in
>>>>>>> qemu_ram_alloc_internal(), we set "new_block->host" to NULL as well.
>>>>>>>
>>>>>>> Then in ram_block_add(), we take the (!new_block->host) branch, and call
>>>>>>> phys_mem_alloc().
>>>>>>>
>>>>>>> Unfortunately, "phys_mem_alloc" is a function pointer, set with
>>>>>>> phys_mem_set_alloc(). The phys_mem_set_alloc() function is called from
>>>>>>> "target/s390x/kvm.c" (setting the function pointer to
>>>>>>> legacy_s390_alloc()), so it doesn't apply in this case. Therefore we end
>>>>>>> up calling the default qemu_anon_ram_alloc() function, through the
>>>>>>> funcptr. (I think anyway.)
>>>>>>>
>>>>>>> And qemu_anon_ram_alloc() boils down to mmap() + MAP_ANONYMOUS, in
>>>>>>> qemu_ram_mmap(). (Even on PPC64 hosts, because qemu_anon_ram_alloc()
>>>>>>> passes (-1) for "fd".)
>>>>>>>
>>>>>>> I may have missed something, of course -- I obviously didn't test it,
>>>>>>> just speculated from the source.
>>>>>>
>>>>>> Thanks for your sleuthing!
>>>>>>
>>>>>>>> I'm afraid you neglected to test.
>>>>>>
>>>>>> Accusation actually unsupported. I apologize, and replace it by a
>>>>>> question: have you observed the improvement you're trying to achieve,
>>>>>> and if yes, how?
>>>>>>
>>>>>
>>>>> Yes, we need to create sparse files as the backing images for pflash
>>>>> device.
>>>>> To create sparse files like:
>>>>>
>>>>> dd of="QEMU_EFI-pflash.raw" if="/dev/zero" bs=1M seek=64 count=0
>>>>> dd of="QEMU_EFI-pflash.raw" if="QEMU_EFI.fd" conv=notrunc
>>>>
>>>> This creates a copy of firmware binary QEMU_EFI.fd padded with a hole to
>>>> 64MiB.
>>>>
>>>>> dd of="empty_VARS.fd" if="/dev/zero" bs=1M seek=64 count=0
>>>>
>>>> This creates the varstore as a 64MiB hole. As far as I know (very
>>>> little), you should use the varstore template that comes with the
>>>> firmware binary.
>>>>
>>>> I use
>>>>
>>>> cp --sparse=always bld/pc-bios/edk2-arm-vars.fd .
>>>> cp --sparse=always bld/pc-bios/edk2-aarch64-code.fd .
>>>>
>>>> These guys are already zero-padded, and I use cp to sparsify.
>>>>
>>>>> Start a VM with below commandline:
>>>>>
>>>>> -drive
>>>>> file=/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw,if=pflash,format=raw,unit=0,readonly=on\
>>>>> -drive
>>>>> file=/usr/share/edk2/aarch64/empty_VARS.fd,if=pflash,format=raw,unit=1 \
>>>>>
>>>>> Then observe the memory usage of the qemu process (THP is on).
>>>>>
>>>>> 1) Without this patch:
>>>>> # cat /proc/`pidof qemu-system-aarch64`/smaps | grep AnonHugePages: |
>>>>> grep -v ' 0 kB'
>>>>> AnonHugePages: 706560 kB
>>>>> AnonHugePages: 2048 kB
>>>>> AnonHugePages: 65536 kB // pflash memory device
>>>>> AnonHugePages: 65536 kB // pflash memory device
>>>>> AnonHugePages: 2048 kB
>>>>>
>>>>> # ps aux | grep qemu-system-aarch64
>>>>> RSS: 879684
>>>>>
>>>>> 2) After applying this patch:
>>>>> # cat /proc/`pidof qemu-system-aarch64`/smaps | grep AnonHugePages: |
>>>>> grep -v ' 0 kB'
>>>>> AnonHugePages: 700416 kB
>>>>> AnonHugePages: 2048 kB
>>>>> AnonHugePages: 2048 kB // pflash memory device
>>>>> AnonHugePages: 2048 kB // pflash memory device
>>>>> AnonHugePages: 2048 kB
>>>>>
>>>>> # ps aux | grep qemu-system-aarch64
>>>>> RSS: 744380
>>>>
>>>> Okay, this demonstrates the patch succeeds at mapping parts of the
>>>> pflash memory as holes.
>>>>
>>>> Do the guests in these QEMU processes run?
>>>
>>> Yes.
>>
>> Good to know, thanks.
>>
>>>>> Obviously, there are at least 100MiB memory saved for each guest.
>>>>
>>>> For a definition of "memory".
>>>>
>>>> Next question: what impact on system performance do you observe?
>>>>
>>>> Let me explain.
>>>>
>>>> Virtual memory holes get filled in by demand paging on access. In other
>>>> words, they remain holes only as long as nothing accesses the memory.
>>>>
>>>> Without your patch, we allocate pages at image read time and fill them
>>>> with zeroes. If we don't access them again, the kernel will eventually
>>>> page them out (assuming you're running with swap). So the steady state
>>>> is "we waste some swap space", not "we waste some physical RAM".
>>>>
>>>
>>> Not everybody wants to run with swap because it may cause low performance.
>>
>> Someone running without swap because he heard someone say someone said
>> swap may be slow is probably throwing away performance.
>>
>> But I assume you mean people running without swap because they measured
>> their workload and found it more performant without swap. Legitimate.
>
> Yes, and I had ever suffered from the high IO waits with swap.:)
>
>>
>>>> Your patch lets us map pflash memory pages containing only zeros as
>>>> holes.
>>>>
>>>> For pages that never get accessed, your patch avoids page allocation,
>>>> filling with zeroes, writing to swap (all one-time costs), and saves
>>>> some swap space (not commonly an issue).
>>>>
>>>> For pflash memory that gets accessed, your patch merely delays page
>>>> allocation from image read time to first access.
>>>>
>>>> I wonder how these savings and delays affect actual system performance.
>>>> Without an observable change in system performance, all we'd accomplish
>>>> is changing a bunch of numers in /proc/$pid/.
>>>>
>>>> What improvement(s) can you observe?
>>>
>>> We only use pflash device for UEFI, and we hardly care about the
>>> performance.
>>> I think the bottleneck of the performance is the MMIO emulation, even this
>>> patch would delay page allocation at the first access.
>>
>> I wasn't inquiring about the performance of the pflash device. I was
>> inquiring about *system* performance. But let me rephrase my question.
>>
>> Doing work to save resources is only worthwhile if something valuable
>> gets better in a measurable way. I'm asking you
>>
>> (1) to explain what exactly you value, and
>>
>> (2) to provide measurements that show improvement.
>>
>
> What we exactly value is the cost of memory resources and it is the only
> thing that this patch aims to resolve.
Then measure this cost!
> I am confused that why you think it will impact the system performance? Did I
> neglect something?
If the patch does not impact how the system as a whole performs, then
it's useless.
Since you find it useful, it must have some valuable[*] observable
effect for you. Tell us about it!
I keep asking not to torment you, but to guide you towards building a
compelling justification for your patch. However, I can only show you
the path; the walking you'll have to do yourself.
>>>> I guess the best case for your patch is many guests with relatively
>>>> small RAM sizes.
>>
>> .
>>
[*] Changing a bunch of numbers in /proc is not valuable.
- Re: [Qemu-block] [PATCH] pflash: Only read non-zero parts of backend image, (continued)
- Re: [Qemu-block] [PATCH] pflash: Only read non-zero parts of backend image, Peter Maydell, 2019/05/05
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Markus Armbruster, 2019/05/07
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Laszlo Ersek, 2019/05/07
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Markus Armbruster, 2019/05/08
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Xiang Zheng, 2019/05/09
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Markus Armbruster, 2019/05/09
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Xiang Zheng, 2019/05/10
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Markus Armbruster, 2019/05/10
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Xiang Zheng, 2019/05/11
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image,
Markus Armbruster <=
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Kevin Wolf, 2019/05/13
- Re: [Qemu-block] [Qemu-devel] [PATCH] pflash: Only read non-zero parts of backend image, Markus Armbruster, 2019/05/13