Re: [Qemu-devel] [PATCH v2] configure: enable --s390-pgste linker option
From: Christian Borntraeger
Subject: Re: [Qemu-devel] [PATCH v2] configure: enable --s390-pgste linker option
Date: Wed, 23 Aug 2017 10:48:09 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0
On 08/23/2017 09:28 AM, Christian Ehrhardt wrote:
>
>
> On Wed, Aug 23, 2017 at 8:53 AM, Christian Borntraeger <address@hidden> wrote:
>
> KVM guests on s390 need a different page table layout than normal
> processes (2kb page table + 2kb page status extensions vs 2kb page table
> only). As of today this has to be enabled via the vm.allocate_pgste
> sysctl.
>
> Newer kernels (>= 4.12) on s390 check for an S390_PGSTE program header
> and enable the pgste page table extensions in that case. This makes the
> vm.allocate_pgste sysctl unnecessary. We enable this program header for
> the s390 system emulation (qemu-system-s390x) if
> - we build on s390 for the s390 system emulation
> - the linker supports --s390-pgste (binutils >= 2.29)
> - KVM is enabled
>
> This will allow distributions to disable the global vm.allocate_pgste
> sysctl, which will improve the page table allocation for non-KVM
> processes as only 2kb chunks are necessary.
>
>
> Hi Christian,
> it is great to see context pgste come to life.
> Currently vm.allocate_pgste defaults to 0 in the kernel but, as you stated,
> it is mostly enabled for KVM support in distros.
> So when someone wants to disable it, they have to drop the enabling (e.g.
> /etc/sysctl.d/10-arch-specific.conf for us).
>
> I want to be sure about the proper phasing of this - we can drop the
> "enabling" of global pgste once, for a given release, we:
> - do not expect/support a kernel <4.12 to run there
> - will have only qemu versions >= the one carrying this change (and have it
> properly enabled)
> - have binutils >= 2.29 to get the linking right
Yes. So I guess that for the Ubuntu case you could remove the sysctl setting
for 18.04, assuming that this will hit qemu 2.11 and 18.04 will use 2.11.
>
> But furthermore if we have a qemu with this enabled, there is no drawback and
> we could still run it in:
> - former releases with older kernels
Yes.
> - former releases with older build environments
Yes.
> That program header would just be ignored and we would just have to keep the
> sysctl enabled there, right?
Yes.
>
> Also for the time we want to check on the proper header, you surely have a
> one liner you can share that you run against the binary to check if it was
> generated correctly?
> Maybe even one that you can run against a pid if the status is correct?
Run readelf -l on the binary:
$ readelf -l REPOS/qemu/build/s390x-softmmu/qemu-system-s390x
Elf file type is EXEC (Executable file)
Entry point 0x101f758
There are 11 program headers, starting at offset 64
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000001000040 0x0000000001000040
                 0x0000000000000268 0x0000000000000268  R E    0x8
  INTERP         0x00000000000002a8 0x00000000010002a8 0x00000000010002a8
                 0x000000000000000f 0x000000000000000f  R      0x1
      [Requesting program interpreter: /lib/ld64.so.1]
  LOAD           0x0000000000000000 0x0000000001000000 0x0000000001000000
                 0x00000000004852f0 0x00000000004852f0  R E    0x1000
  LOAD           0x0000000000485450 0x0000000001486450 0x0000000001486450
                 0x000000000003dcc8 0x0000000000485840  RW     0x1000
  DYNAMIC        0x0000000000485b80 0x0000000001486b80 0x0000000001486b80
                 0x0000000000000480 0x0000000000000480  RW     0x8
  NOTE           0x00000000000002b8 0x00000000010002b8 0x00000000010002b8
                 0x0000000000000044 0x0000000000000044  R      0x4
  TLS            0x0000000000485450 0x0000000001486450 0x0000000001486450
                 0x0000000000000000 0x0000000000000230  R      0x8
  GNU_EH_FRAME   0x00000000003dc638 0x00000000013dc638 0x00000000013dc638
                 0x0000000000017a74 0x0000000000017a74  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000485450 0x0000000001486450 0x0000000001486450
                 0x0000000000000bb0 0x0000000000000bb0  R      0x1
  S390_PGSTE     0x0000000000000000 0x0000000000000000 0x0000000000000000   <----
                 0x0000000000000000 0x0000000000000000         0x8          <----
[...]
Older binutils will report something like
  LOPROC+0       0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000         8
instead of S390_PGSTE.
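
If you would rather script the check than eyeball the readelf output, the following is a minimal sketch that reads the ELF64 program header table directly and looks for the type value that readelf prints as S390_PGSTE (or LOPROC+0 with older binutils). The function names are made up for illustration, and the constant 0x70000000 relies on PT_S390_PGSTE being PT_LOPROC + 0, which is what the LOPROC+0 output above suggests. It only handles ELF64 and does not tell you whether the kernel actually enabled pgste for a process.

```python
import struct

# Assumption: PT_S390_PGSTE is PT_LOPROC + 0, matching the "LOPROC+0"
# that older readelf versions print for this header.
PT_S390_PGSTE = 0x70000000


def program_header_types(data: bytes) -> list:
    """Return the p_type value of every program header in an ELF64 image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("only ELF64 is handled in this sketch")
    # EI_DATA: 2 means big-endian (s390x), 1 means little-endian
    endian = ">" if data[5] == 2 else "<"
    fields = struct.unpack_from(endian + "16sHHIQQQIHHHHHH", data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    types = []
    for i in range(e_phnum):
        # p_type is the first 32-bit word of each program header entry
        (p_type,) = struct.unpack_from(endian + "I", data, e_phoff + i * e_phentsize)
        types.append(p_type)
    return types


def has_pgste_header(data: bytes) -> bool:
    """True if the ELF64 image carries a program header of type PT_S390_PGSTE."""
    return PT_S390_PGSTE in program_header_types(data)
```

To use it, read the binary with open(path, "rb").read() and pass the bytes to has_pgste_header(). Pointing the same read at /proc/<pid>/exe inspects the executable behind a running process, which again only confirms the header is present in that binary.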