Re: [PATCH RFC 2/5] s390x: implement diag260

From: David Hildenbrand
Subject: Re: [PATCH RFC 2/5] s390x: implement diag260
Date: Wed, 15 Jul 2020 19:38:49 +0200

On 15.07.20 18:14, Heiko Carstens wrote:
> On Wed, Jul 15, 2020 at 01:42:02PM +0200, David Hildenbrand wrote:
>>> So, are you saying that even at IPL time there might already be memory
>>> devices attached to the system? And the kernel should _not_ treat them
>>> as normal memory?
>> Sorry if that was unclear. Yes, we can have such devices (including
>> memory areas) on a cold boot/reboot/kexec. In addition, they might pop
>> up at runtime (e.g., hotplugging a virtio-mem device). The device is in
>> charge of exposing that area and deciding what to do with it.
>> The kernel should never treat them as normal memory (IOW, system RAM).
>> Not during a cold boot, not during a reboot. The device driver is
>> responsible for deciding how to use that memory (e.g., add it as system
>> RAM), and which parts of that memory are actually valid to be used (even
>> if a tprot might succeed it might not be valid to use just yet - I guess
>> somewhat similar to doing a tprot on a dcss area - AFAIK, you also don't
>> want to use it like normal memory).
>> E.g., on x86-64, memory exposed via virtio-mem or virtio-pmem is never
>> exposed via the e820 map. The only trace that there might be *something*
>> now/in the future is indicated via ACPI SRAT tables. This currently takes
>> care of indicating the maximum possible PFN.
> Ok, but all of this needs to be documented somewhere. This raises a
> couple of questions for me:

I assume this mostly targets virtio-mem, because the semantics of
virtio-mem-provided memory are extra-weird (in contrast to the rather
static virtio-pmem, which is essentially just an emulated NVDIMM - a
disk mapped into physical memory).

Regarding documentation (some linked in the cover letter), so far I have

1. https://virtio-mem.gitlab.io/
2. virtio spec proposal [1]
3. QEMU 910b25766b33 ("virtio-mem: Paravirtualized memory hot(un)plug")
4. Linux 5f1f79bbc9 ("virtio-mem: Paravirtualized memory hotplug")
5. Linux cover letter [2]
6. KVM forum talk [3] [4]

Since your questions go into quite technical detail, and I don't feel
like rewriting the doc here :), I suggest looking at [2] and items 1 and
5 above.

> What happens on

I'll stick to virtio-mem when answering regarding "special memory". As I
noted, there might be more in the future.

> - IPL Clear with this special memory? Will it be detached/away afterwards?

A diag308(0x3) - load clear - will usually* zap all virtio-mem provided
memory (discard backing storage in the hypervisor) and logically turn
the state of all virtio-mem memory inside the device-assigned memory
region to "unplugged" - just as during a cold boot. The semantics of
"unplugged" blocks depend on the "usable region" (see the virtio-spec if
you're curious - the memory might still be accessible). Starting "fresh"
with all memory logically unplugged is part of the way virtio-mem works.

* there are corner cases while a VM is being migrated, where we cannot
perform this (similar to us not being able to clear ordinary memory
during a load clear in QEMU while migrating). In this case, the memory
is left untouched.
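To make those reset semantics concrete, here is a minimal sketch of the
block-state model (a hypothetical Python illustration, not the actual
QEMU or Linux implementation; `VirtioMemDevice` and its methods are
invented names):

```python
# Hypothetical model of virtio-mem block state (illustration only).
# The device-assigned memory region is divided into equally sized
# blocks; each block is logically either plugged or unplugged.

class VirtioMemDevice:
    def __init__(self, region_size, block_size):
        assert region_size % block_size == 0
        self.block_size = block_size
        self.nblocks = region_size // block_size
        self.plugged = set()  # indices of currently plugged blocks

    def plug(self, idx):
        self.plugged.add(idx)

    def unplug(self, idx):
        self.plugged.discard(idx)

    def plugged_size(self):
        return len(self.plugged) * self.block_size

    def reset(self):
        # diag308(0x3) / load clear: backing storage is discarded and
        # all blocks turn logically unplugged - just as on a cold boot.
        self.plugged.clear()

dev = VirtioMemDevice(region_size=1024, block_size=4)
dev.plug(0); dev.plug(1)
assert dev.plugged_size() == 8
dev.reset()
assert dev.plugged_size() == 0  # fresh start: everything unplugged
```

The point of the sketch is only the last two lines: after a load clear,
the device starts over with all blocks unplugged, regardless of what was
plugged before.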

> - IPL Normal? "Obviously" it must stay otherwise kdump would never see
>   that memory.

Only diag308(0x3) will mess with virtio-mem memory. For the other types
of resets, it's left untouched. So yes, "obviously" is correct :)

> And when you write it's up to the device driver what to with that
> memory: is there any documentation available what all of this is good
> for? I would assume _most likely_ this extra memory is going to be
> added to ZONE_MOVABLE _somehow_ so that it can be taken away also. But
> since it is not normal memory, like you say, I'm wondering how that is
> supposed to work.

For now

1. virtio-mem adds all (possible) aligned memory via add_memory() to Linux
2. User space is required to online the memory blocks / configure a zone.

For 2., only ZONE_NORMAL really works right now and is the recommended
choice. As you correctly note, that does not give you any guarantees
about how much memory you can unplug again (e.g., fragmentation with
unmovable data), but it is good enough for the first version (with focus
on memory hotplug, not unplug). ZONE_MOVABLE support is in the works.
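For step 2, userspace typically just writes a policy into each offline
memory block's sysfs `state` file (what a udev rule or onlining daemon
does). A rough sketch of that logic - a hypothetical helper, with the
sysfs root made configurable so it can be exercised against a fake tree:

```python
# Sketch of the userspace onlining step: for every memory block device
# that is still "offline", write the requested onlining policy into its
# "state" file (e.g., "online_kernel" -> ZONE_NORMAL,
# "online_movable" -> ZONE_MOVABLE).
import os

def online_memory_blocks(sysfs_root="/sys/devices/system/memory",
                         policy="online_kernel"):
    onlined = []
    for entry in sorted(os.listdir(sysfs_root)):
        if not entry.startswith("memory"):
            continue
        state_path = os.path.join(sysfs_root, entry, "state")
        with open(state_path) as f:
            if f.read().strip() != "offline":
                continue
        with open(state_path, "w") as f:
            f.write(policy)
        onlined.append(entry)
    return onlined
```

Against the real sysfs this needs root; in practice a udev rule reacting
to memory-block add events does the same thing per block.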

However, we cannot blindly expose all memory to ZONE_MOVABLE (zone
imbalances leading to crashes), and sometimes we also don't want to
(e.g., gigantic pages). Without spoiling too much, a mixture would be
nice.

> As far as I can tell there would be a lot of inconsistencies in
> userspace interfaces which provide memory / zone information. Or I'm
> not getting the point of all of this at all.

All memory/zone stats are properly fixed up (similar to ballooning). The
only visible inconsistency that *might* happen when unplugging /
hotplugging memory in units smaller than 256 MB (the memory block size
on s390x) is that the number of memory block devices
(/sys/devices/system/memory/...) might indicate more memory than is
actually available (e.g., via lsmem).
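That discrepancy is just arithmetic; a tiny illustration (made-up
numbers, purely hypothetical):

```python
# Illustration only (made-up numbers): memory block devices always
# cover whole 256 MiB blocks, while virtio-mem can plug/unplug in
# finer granularity, so summing block devices overestimates memory.
MiB = 1024 * 1024
block_size = 256 * MiB        # memory block size on s390x
plugged = 192 * MiB           # actually plugged by virtio-mem

# One whole block device must exist to cover the plugged range ...
blocks = -(-plugged // block_size)   # ceiling division -> 1 block
apparent = blocks * block_size       # what counting block devices shows
assert apparent == 256 * MiB
assert apparent - plugged == 64 * MiB   # overestimation
```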

[2] https://lore.kernel.org/kvm/20200311171422.10484-1-david@redhat.com/
[4] https://www.youtube.com/watch?v=H65FDUDPu9s


David / dhildenb
