[VERY RFC PATCH 2/2] hurd: Make it possible to call memcpy very early

From: Sergey Bugaev
Subject: [VERY RFC PATCH 2/2] hurd: Make it possible to call memcpy very early
Date: Thu, 20 Apr 2023 21:42:20 +0300

Normally, in static builds, the first code that runs is _start, in e.g.
sysdeps/x86_64/start.S, which quickly calls __libc_start_main, passing
it the argv etc. Early on, __libc_start_main initializes the tunables
(based on env), then the CPU features, and then calls
_dl_relocate_static_pie (). Specifically, that call runs the ifunc
resolvers to pick, based on the CPU features discovered earlier, the
most suitable implementation of "string" functions such as memcpy.

Before that point, calling memcpy (or other ifunc-resolved functions)
will not work.

In the Hurd port, things are more complex. In order to get argv/env for
our process, glibc normally needs to do an RPC to the exec server,
unless our args/env are already located on the stack (which is what
happens to bootstrap processes spawned by GNU Mach). Fetching our
argv/env from the exec server has to be done before the call to
__libc_start_main, since we need to know what our argv/env are to pass
them to __libc_start_main.

On the other hand, the implementation of the RPC (and of the other
initial setup needed on the Hurd before __libc_start_main can be run) is
far from trivial. In particular, it may (and on x86_64, will) use
memcpy. But as described above, calling memcpy before __libc_start_main
cannot work, since the GOT entry for it is not yet initialized at that
point.

Work around this by pre-filling the GOT entry with the baseline version
of memcpy, __memcpy_sse2_unaligned. This makes it possible for early
calls to memcpy to just work. Once _dl_relocate_static_pie () is called,
the baseline version will get replaced with the most suitable one, and
that's what subsequent calls of memcpy are going to call.
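
To illustrate the idea in portable C: the following is only a model, not
glibc code. A plain function pointer stands in for the GOT entry, and
the names copy_slot, prefill_slot, resolve_copy and the
cpu_has_fast_path flag are all made up for this sketch. The real change
is the assembly in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Baseline implementation: works on any CPU, needs no feature detection. */
static void *copy_baseline(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Stand-in for a CPU-specific implementation picked by a resolver. */
static void *copy_optimized(void *dst, const void *src, size_t n)
{
    return memmove(dst, src, n);
}

/* Models the GOT entry: empty until something fills it in. */
static copy_fn copy_slot;

/* Models the very first instructions of _start: point the slot at the
   baseline so that early callers have something valid to jump to. */
void prefill_slot(void)
{
    copy_slot = copy_baseline;
}

/* Models the later ifunc resolution: overwrite the slot with the best
   implementation once CPU features are known (hypothetical flag). */
void resolve_copy(int cpu_has_fast_path)
{
    copy_slot = cpu_has_fast_path ? copy_optimized : copy_baseline;
}

/* Every call goes through the slot, like a call through the GOT. */
void *slot_copy(void *dst, const void *src, size_t n)
{
    assert(copy_slot != NULL); /* an unfilled slot would crash here */
    return copy_slot(dst, src, n);
}

/* Helper for inspecting which implementation is currently installed. */
int slot_is_baseline(void)
{
    return copy_slot == copy_baseline;
}
```

Calling prefill_slot () first makes slot_copy () usable right away;
calling resolve_copy () later swaps the implementation without the
callers noticing.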

Also, apply the same treatment to __stpncpy, which can also be used by
the RPCs (see mig_strncpy.c), and is an ifunc-resolved function on both
x86_64 and i386.
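
For context on why a stpncpy-style function is reachable from RPC code:
a bounded string copy that also reports the copied length falls out
naturally from stpncpy. The following is a sketch of such a helper using
the public POSIX stpncpy; bounded_strcpy is a hypothetical name, and
this is not claimed to be the contents of mig_strncpy.c.

```c
#define _POSIX_C_SOURCE 200809L /* for stpncpy */
#include <stddef.h>
#include <string.h>

/* Copy at most len - 1 bytes of src into dst, always NUL-terminate,
   and return the number of bytes copied (not counting the NUL).
   A len of 0 means there is no room at all, so nothing is written. */
size_t bounded_strcpy(char *dst, const char *src, size_t len)
{
    if (len == 0)
        return 0;

    /* stpncpy returns a pointer to the terminating NUL it wrote, or
       dst + (len - 1) if src did not fit; either way it is in bounds. */
    char *end = stpncpy(dst, src, len - 1);
    *end = '\0';
    return (size_t)(end - dst);
}
```

Any caller of such a helper ends up calling the ifunc-resolved symbol,
which is why the GOT entry has to be valid before the first RPC.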

Tested on x86_64-gnu (!).

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>

Please tell me:

* if the approach is at all sane
* if there's a better way to do this without hardcoding
* are the GOT entries for indirect functions supposed to be statically
  initialized to anything (in the binary)? if yes, why? if not, why is
* should there be a !PIC version as well? does the GOT exist under
  !PIC (to access indirect functions), and if it does then how do I
  access it? it would seem gcc just generates a direct $function even
  for indirect functions in this case.

 sysdeps/mach/hurd/i386/static-start.S   | 7 +++++++
 sysdeps/mach/hurd/x86_64/static-start.S | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/sysdeps/mach/hurd/i386/static-start.S 
index c5d12645..1b1ae559 100644
--- a/sysdeps/mach/hurd/i386/static-start.S
+++ b/sysdeps/mach/hurd/i386/static-start.S
@@ -19,6 +19,13 @@
        .globl _start
+#ifdef PIC
+       call __x86.get_pc_thunk.bx
+       addl $_GLOBAL_OFFSET_TABLE_, %ebx
+       leal __stpncpy_ia32@GOTOFF(%ebx), %eax
+       movl %eax, __stpncpy@GOT(%ebx)
        call _hurd_stack_setup
        xorl %edx, %edx
        jmp _start1
diff --git a/sysdeps/mach/hurd/x86_64/static-start.S 
index 982d3d52..81b3c0ac 100644
--- a/sysdeps/mach/hurd/x86_64/static-start.S
+++ b/sysdeps/mach/hurd/x86_64/static-start.S
@@ -19,6 +19,14 @@
        .globl _start
+#ifdef PIC
+       leaq __memcpy_sse2_unaligned(%rip), %rax
+       movq %rax, memcpy@GOTPCREL(%rip)
+       leaq __stpncpy_sse2_unaligned(%rip), %rax
+       movq %rax, __stpncpy@GOTPCREL(%rip)
        call _hurd_stack_setup
        xorq %rdx, %rdx
        jmp _start1
