qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] linux-user: Reserve space for brk


From: Laurent Vivier
Subject: Re: [PATCH] linux-user: Reserve space for brk
Date: Wed, 22 Jan 2020 15:01:14 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.1

Le 18/01/2020 à 00:02, Richard Henderson a écrit :
> With bad luck, we can wind up with no space at all for brk,
> which will generally cause the guest malloc to fail.
> 
> This bad luck is easier to come by with ET_DYN (PIE) binaries,
> where either the stack or the interpreter (ld.so) gets placed
> immediately after the main executable.
> 
> But there's nothing preventing this same thing from happening
> with ET_EXEC (normal) binaries, during probe_guest_base().
> 
> In both cases, reserve some extra space via mmap and release
> it back to the system after loading the interpreter and
> allocating the stack.
> 
> The choice of 16MB is somewhat arbitrary.  It's enough for libc
> to get going, but without being so large that 32-bit guests or
> 32-bit hosts are in danger of running out of virtual address space.
> It is expected that libc will be able to fall back to mmap arenas
> after the limited brk space is exhausted.
> 
> Launchpad: https://bugs.launchpad.net/qemu/+bug/1749393
> Signed-off-by: Richard Henderson <address@hidden>
> ---
> 
> Note that the LP comments mention the fix for this in the kernel,
> and about there being a "guaranteed 128MB gap" for x86_64.  As far
> as I can see, this "gap" is part of the unmapped_area() algorithm.
> For qemu, this would correspond to mmap_find_vma(), except that,
> except when we fall back to mmap_find_vma_reserved(), we are not
> 100% in control over the allocation.
> 
> 
> r~
> 
> ---
>  linux-user/qemu.h    |  1 +
>  linux-user/elfload.c | 73 +++++++++++++++++++++++++++++++++-----------
>  2 files changed, 57 insertions(+), 17 deletions(-)
> 
> diff --git a/linux-user/qemu.h b/linux-user/qemu.h
> index f6f5fe5fbb..560a68090e 100644
> --- a/linux-user/qemu.h
> +++ b/linux-user/qemu.h
> @@ -35,6 +35,7 @@ struct image_info {
>          abi_ulong       end_data;
>          abi_ulong       start_brk;
>          abi_ulong       brk;
> +        abi_ulong       reserve_brk;
>          abi_ulong       start_mmap;
>          abi_ulong       start_stack;
>          abi_ulong       stack_limit;
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index 07b16cc0f4..2edb5d5b31 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -10,6 +10,7 @@
>  #include "qemu/path.h"
>  #include "qemu/queue.h"
>  #include "qemu/guest-random.h"
> +#include "qemu/units.h"
>  
>  #ifdef _ARCH_PPC64
>  #undef ARCH_DLINFO
> @@ -2364,24 +2365,51 @@ static void load_elf_image(const char *image_name, 
> int image_fd,
>          }
>      }
>  
> -    load_addr = loaddr;
> -    if (ehdr->e_type == ET_DYN) {
> -        /* The image indicates that it can be loaded anywhere.  Find a
> -           location that can hold the memory space required.  If the
> -           image is pre-linked, LOADDR will be non-zero.  Since we do
> -           not supply MAP_FIXED here we'll use that address if and
> -           only if it remains available.  */
> -        load_addr = target_mmap(loaddr, hiaddr - loaddr, PROT_NONE,
> -                                MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,
> -                                -1, 0);
> -        if (load_addr == -1) {
> -            goto exit_perror;
> +    if (pinterp_name != NULL) {
> +        /*
> +         * This is the main executable.
> +         *
> +         * Reserve extra space for brk.
> +         * We hold on to this space while placing the interpreter
> +         * and the stack, lest they be placed immediately after
> +         * the data segment and block allocation from the brk.
> +         *
> +         * 16MB is chosen as "large enough" without being so large
> +         * as to allow the result to not fit with a 32-bit guest on
> +         * a 32-bit host.
> +         */
> +        info->reserve_brk = 16 * MiB;
> +        hiaddr += info->reserve_brk;
> +
> +        if (ehdr->e_type == ET_EXEC) {
> +            /*
> +             * Make sure that the low address does not conflict with
> +             * MMAP_MIN_ADDR or the QEMU application itself.
> +             */
> +            probe_guest_base(image_name, loaddr, hiaddr);
>          }
> -    } else if (pinterp_name != NULL) {
> -        /* This is the main executable.  Make sure that the low
> -           address does not conflict with MMAP_MIN_ADDR or the
> -           QEMU application itself.  */
> -        probe_guest_base(image_name, loaddr, hiaddr);
> +    }
> +
> +    /*
> +     * Reserve address space for all of this.
> +     *
> +     * In the case of ET_EXEC, we supply MAP_FIXED so that we get
> +     * exactly the address range that is required.
> +     *
> +     * Otherwise this is ET_DYN, and we are searching for a location
> +     * that can hold the memory space required.  If the image is
> +     * pre-linked, LOADDR will be non-zero, and the kernel should
> +     * honor that address if it happens to be free.
> +     *
> +     * In both cases, we will overwrite pages in this range with mappings
> +     * from the executable.
> +     */
> +    load_addr = target_mmap(loaddr, hiaddr - loaddr, PROT_NONE,
> +                            MAP_PRIVATE | MAP_ANON | MAP_NORESERVE |
> +                            (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
> +                            -1, 0);
> +    if (load_addr == -1) {
> +        goto exit_perror;
>      }
>      load_bias = load_addr - loaddr;
>  
> @@ -2860,6 +2888,17 @@ int load_elf_binary(struct linux_binprm *bprm, struct 
> image_info *info)
>      bprm->core_dump = &elf_core_dump;
>  #endif
>  
> +    /*
> +     * If we reserved extra space for brk, release it now.
> +     * The implementation of do_brk in syscalls.c expects to be able
> +     * to mmap pages in this space.
> +     */
> +    if (info->reserve_brk) {
> +        abi_ulong start_brk = HOST_PAGE_ALIGN(info->brk);
> +        abi_ulong end_brk = HOST_PAGE_ALIGN(info->brk + info->reserve_brk);
> +        target_munmap(start_brk, end_brk - start_brk);
> +    }
> +
>      return 0;
>  }
>  
> 

Applied to my linux-user branch.

Thanks,
Laurent



reply via email to

[Prev in Thread] Current Thread [Next in Thread]