qemu-devel




From: David Hildenbrand
Subject: Re: [PATCH] softmmu/physmem: Improve guest memory allocation failure error message
Date: Tue, 24 Aug 2021 10:53:56 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

On 24.08.21 10:37, Dr. David Alan Gilbert wrote:
* David Hildenbrand (david@redhat.com) wrote:
On 23.08.21 12:34, Philippe Mathieu-Daudé wrote:
On 8/23/21 12:24 PM, David Hildenbrand wrote:
On 23.08.21 12:12, Philippe Mathieu-Daudé wrote:
On 8/23/21 11:29 AM, David Hildenbrand wrote:
On 23.08.21 11:23, Peter Maydell wrote:
On Mon, 23 Aug 2021 at 09:40, David Hildenbrand <david@redhat.com>
wrote:
Not opposed to printing the size, although I doubt that it will really
stop similar questions/problems getting raised.

The case that triggered this was somebody thinking
-m took a byte count, so very likely that an error message
saying "you tried to allocate 38TB" would have made their
mistake clear in a way that just "allocation failed" did not.
It also means that if a future user asks us for help then
we can look at the error message and immediately tell them
the problem, rather than going "hmm, what are all the possible
ways that allocation might have failed" and going off down
rabbitholes like VM overcommit settings...

We've had similar issues recently where Linux memory overcommit handling
rejected the allocation -- and the user was well aware of the actual
size. You won't be able to catch such reports, because people don't
understand how Linux memory overcommit handling works or how it was configured.

"I have 3 GiB of free memory, why can't I create a 3 GiB VM". "I have 3
GiB of RAM, why can't I create a 3 GiB VM even if it won't make use of
all 3 GiB of memory".

Thus my comment: it will only stop very basic usage issues. And I agree
that looking at the error *might* help. It didn't help for the cases I
just described, because we need much more system information to
guess what the user error actually is.

Is it possible to get the maximal overcommitable amount on Linux?

Not reliably I think.

In the "always" mode, there is none.

In the "guess"/"estimate" mode, the kernel takes a guess (currently
implemented as checking if the mmap size <= total RAM + total SWAP).
      Committable = MemTotal + SwapTotal

In the "never" mode:
      Committable = CommitLimit - Committed_AS
However, the value gets further reduced for !root applications by
/proc/sys/vm/admin_reserve_kbytes.

Replicating these calculations in user space would be suboptimal IMHO.
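
For illustration only, the three cases described above could be approximated in user space roughly as follows. This is a minimal sketch, not something QEMU does: the helper names parse_meminfo() and committable_kb() are hypothetical, the mode numbers are the values of /proc/sys/vm/overcommit_memory (0 = guess, 1 = always, 2 = never), and the admin_reserve_kbytes reduction for !root applications is deliberately ignored, which is one reason such a replica stays a rough estimate.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def committable_kb(mode, meminfo):
    """Rough estimate (in kB) following the description in this thread.

    mode is the value of /proc/sys/vm/overcommit_memory.
    Returns None when there is no limit ("always" mode).
    Assumption: admin_reserve_kbytes is NOT taken into account.
    """
    if mode == 1:  # "always": no limit
        return None
    if mode == 0:  # "guess": mmap size <= total RAM + total swap
        return meminfo["MemTotal"] + meminfo["SwapTotal"]
    if mode == 2:  # "never": CommitLimit - Committed_AS
        return max(meminfo["CommitLimit"] - meminfo["Committed_AS"], 0)
    raise ValueError("unknown overcommit mode: %r" % (mode,))


def read_live_estimate():
    """Read the current values from /proc (Linux only)."""
    with open("/proc/sys/vm/overcommit_memory") as f:
        mode = int(f.read().strip())
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    return mode, committable_kb(mode, info)
```

On a Linux host, read_live_estimate() returns the current mode together with the estimate. The numbers can change between the read and the actual mmap(), so the result is a hint at best.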

What about simply giving a hint about memory overcommit and displaying
a link to documentation with a longer description of how to check
and figure out this issue?

That would be highly OS-specific -- for example, there is no memory
overcommit under Windows. Sure, we could add a Linux-specific hint
pointing to documentation. But I'm not sure if most end users stumbling into
such an error+hint would be able to make sense of memory overcommit details
(or would even know what memory overcommit is) :)

You can run into memory allocation issues with many applications. Let me
give you a simple example:

t480s: ~  $ dd if=/dev/zero of=/dev/null ibs=100G
dd: memory exhausted by input buffer of size 107374182400 bytes (100 GiB)

So indicating the size of the failing allocation might be just good enough.
For the other parts it's usually just "the way the OS was configured, it
does not think it can allow this allocation".
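
As a rough sketch of what "indicating the size" could look like, the following formats a failing allocation size in the same style as the dd message above (exact byte count plus a binary-unit value). The function names and the message wording are hypothetical and are not taken from the patch under discussion.

```python
def describe_size(nbytes):
    """Return e.g. '107374182400 bytes (100 GiB)' for a byte count."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(nbytes)
    idx = 0
    # Scale down by 1024 until the value fits the current unit.
    while value >= 1024 and idx < len(units) - 1:
        value /= 1024
        idx += 1
    if idx == 0:
        return "%d bytes" % nbytes
    # Two decimals at most, with trailing zeros dropped.
    scaled = ("%.2f" % value).rstrip("0").rstrip(".")
    return "%d bytes (%s %s)" % (nbytes, scaled, units[idx])


def allocation_error(nbytes):
    """Hypothetical error text that includes the requested size."""
    return "cannot allocate guest memory: requested " + describe_size(nbytes)
```

For example, describe_size(107374182400) yields "107374182400 bytes (100 GiB)". A user who passed a byte count where a MiB count was expected would see a multi-TiB figure in the message, which makes that particular mistake easy to spot.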

Does it also get complicated by the use of cgroups?

Not in terms of memory overcommit AFAIU. cgroups only control actual memory consumption, not mmap() creation.


Dave


--
Thanks,

David / dhildenb



