qemu-devel


From: David Hildenbrand
Subject: Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK
Date: Wed, 13 Apr 2022 18:30:47 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.6.2

> 
> So this is another situation where the actual backend (TDX, SEV, pKVM, pure 
> software) makes a difference -- depending on exactly what backend we're 
> using, the memory may not be unmoveable.  It might even be swappable (in the 
> potentially distant future).

Right. And on a system without swap we don't particularly care about
mlock, but we might (in most cases) care about fragmentation with
unmovable memory.

> 
> Anyway, here's a concrete proposal, with a bit of handwaving:

Thanks for investing some brainpower.

> 
> We add new cgroup limits:
> 
> memory.unmoveable
> memory.locked
> 
> These can be set to an actual number or they can be set to the special value 
> ROOT_CAP.  If they're set to ROOT_CAP, then anyone in the cgroup with 
> capable(CAP_SYS_RESOURCE) (i.e. the global capability) can allocate unmovable
> or locked memory with this new API (and potentially others).  If it's 0, then
> they can't.  If it's another value, then the memory can be allocated, charged 
> to the cgroup, up to the limit, with no particular capability needed.  The 
> default at boot is ROOT_CAP.  Anyone who wants to configure it differently is 
> free to do so.  This avoids introducing a DoS, makes it easy to run tests 
> without configuring cgroup, and lets serious users set up their cgroups.
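
To make the three cases concrete, here is a minimal userspace C sketch of
the limit semantics quoted above. It is only an illustration of the
proposal, not kernel code: struct cg_limit, try_charge() and the encoding
of ROOT_CAP as UINT64_MAX are hypothetical names and choices made for this
example, and the capability check is reduced to a boolean parameter. The
proposal does not say whether ROOT_CAP allocations are accounted, so the
sketch does not account them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the special value; not an existing kernel constant. */
#define ROOT_CAP UINT64_MAX

/* Hypothetical per-cgroup state for one limit (memory.unmoveable or memory.locked). */
struct cg_limit {
    uint64_t limit;   /* ROOT_CAP, 0, or a byte limit */
    uint64_t charged; /* bytes currently charged; always <= limit for numeric limits */
};

/*
 * Decide whether an allocation of `bytes` is allowed and charge it if so.
 * `has_cap` stands in for capable(CAP_SYS_RESOURCE).
 */
static bool try_charge(struct cg_limit *l, uint64_t bytes, bool has_cap)
{
    if (l->limit == ROOT_CAP)
        return has_cap;               /* boot default: gated on the capability only */
    if (l->limit == 0)
        return false;                 /* explicitly forbidden, capability or not */
    if (bytes > l->limit - l->charged)
        return false;                 /* would exceed the configured limit */
    l->charged += bytes;              /* numeric limit: no capability needed */
    return true;
}
```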

I wonder what the implications are for existing user space.

Assume we want to move page pinning (rdma, vfio, io_uring, ...) to the
new model. How can we be sure that

a) we don't break existing user space, and
b) we don't, unnoticed by the admin, open the door to going crazy on
   unmovable memory?

Any ideas?

> 
> Nothing is charged per mm.
> 
> To make this fully sensible, we need to know what the backend is for the 
> private memory before allocating any so that we can charge it accordingly.

Right, the support for migration and/or swap defines how to account.
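
As a rough illustration of that dependency, the following C sketch maps
what a backend supports to the limit an allocation would be charged
against. All names here (struct backend_caps, enum charge_class,
classify()) are hypothetical, and picking a single bucket is an assumption
of the sketch; whether unmovable memory should also count as locked is
left open.

```c
#include <stdbool.h>

/* Hypothetical description of what a private-memory backend supports. */
struct backend_caps {
    bool can_migrate; /* pages can be moved */
    bool can_swap;    /* pages can be swapped out */
};

/* Hypothetical accounting buckets matching the proposed cgroup limits. */
enum charge_class {
    CHARGE_NONE,      /* movable and swappable: neither new limit applies */
    CHARGE_LOCKED,    /* movable but not swappable: memory.locked */
    CHARGE_UNMOVABLE  /* not movable: memory.unmoveable */
};

/* Pick the bucket before any memory is allocated for the backend. */
static enum charge_class classify(struct backend_caps caps)
{
    if (!caps.can_migrate)
        return CHARGE_UNMOVABLE;
    if (!caps.can_swap)
        return CHARGE_LOCKED;
    return CHARGE_NONE;
}
```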

-- 
Thanks,

David / dhildenb



