aarch64-gnu (and Happy New Year!)

From: Sergey Bugaev
Subject: aarch64-gnu (and Happy New Year!)
Date: Sun, 31 Dec 2023 22:53:26 +0300

Hello, and happy holidays!

Every now and then, I hear someone mention potential ports of gnumach
to new architectures. I think I have heard RISC-V and (64-bit?) ARM
mentioned somewhere recently as potential new port targets. Being
involved in the x86_64 port last spring was a really fun and
interesting experience, and I learned a lot; so I, for one, have
always thought doing more ports would be a great idea, and that I
would be glad to be a part of such an effort again.

Among the architectures, AArch64 and RISC-V indeed seem most
attractive (not that I know much about either). Among those two,
RISC-V is certainly newer and more exciting, but AArch64 is certainly
more widespread and established. (Wouldn't it be super cool if we
could run GNU/Hurd everywhere from tiny ARM boards, to Raspberry Pi's,
to common smartphones, to, now, ARM-based laptops and desktops?) Also,
I have had some experience with ARM in the past, so I already knew a
tiny bit of ARM assembly.

So I thought, what would it take to port the Hurd to AArch64, a
completely non-x86 architecture, one that I knew very little about?
There is no AArch64 gnumach (that I know of) yet, but I could try to
hack on glibc even without one, I'd only need some headers, right?
There's also no compiler toolchain, but those patches to add the
x86_64-gnu target looked pretty understandable, so — how hard could it
be?
Well, I did more than think about it :)

I read up on AArch64 registers / assembly / architecture / calling
convention, added the aarch64-gnu target to binutils and GCC, added
basic versions of mach/aarch64/ headers to gnumach (but no actual
code), and made a mostly complete port of glibc. I haven't spent much
effort on Hurd proper, but I have tried running the build, and the
core Hurd servers (ext2fs, proc, exec, auth) do get built.

I will be posting the patches soon. For now, here's just a little teaser:

glibc/build $ file libc.so elf/ld.so
libc.so: ELF 64-bit LSB shared object, ARM aarch64, version 1
(GNU/Linux), dynamically linked, interpreter /lib/ld-aarch64.so.1, for
GNU/Hurd 0.0.0, with debug_info, not stripped
elf/ld.so: ELF 64-bit LSB shared object, ARM aarch64, version 1
(SYSV), dynamically linked, with debug_info, not stripped

hurd/build $ file ext2fs/ext2fs.static proc/proc
ext2fs/ext2fs.static: ELF 64-bit LSB executable, ARM aarch64, version
1 (GNU/Linux), statically linked, for GNU/Hurd 0.0.0, with debug_info,
not stripped
proc/proc: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV),
dynamically linked, interpreter /lib/ld-aarch64.so.1, for GNU/Hurd
0.0.0, with debug_info, not stripped

glibc/build $ aarch64-gnu-objdump --disassemble=__mig_get_reply_port libc.so
libc.so:     file format elf64-littleaarch64
Disassembly of section .plt:
Disassembly of section .text:
000000000002b8e0 <__mig_get_reply_port>:
   2b8e0: a9be7bfd stp x29, x30, [sp, #-32]!
   2b8e4: 910003fd mov x29, sp
   2b8e8: f9000bf3 str x19, [sp, #16]
   2b8ec: d53bd053 mrs x19, tpidr_el0
   2b8f0: b85f8260 ldur w0, [x19, #-8]
   2b8f4: 34000080 cbz w0, 2b904 <__mig_get_reply_port+0x24>
   2b8f8: f9400bf3 ldr x19, [sp, #16]
   2b8fc: a8c27bfd ldp x29, x30, [sp], #32
   2b900: d65f03c0 ret
   2b904: 97fffbef bl 2a8c0 <__mach_reply_port>
   2b908: b81f8260 stur w0, [x19, #-8]
   2b90c: f9400bf3 ldr x19, [sp, #16]
   2b910: a8c27bfd ldp x29, x30, [sp], #32
   2b914: d65f03c0 ret

So it compiles and links, but does it work? — well, we can't know
that, not until someone ports gnumach, right?

Well actually we can :) I've done the same thing as last time, when
working on the x86_64 port: run a statically linked hello world
executable on Linux, under GDB, carefully skipping over and emulating
syscalls and RPCs. This did uncover a number of bugs, both in my port
of glibc and in how the toolchain was set up (the first issue was that
static-init.S was not even getting linked in, the second issue was
that static-init.S was crashing even prior to the _hurd_stack_setup
call, and so on). But, I fixed all of those, and got the test
executable working! — as in, successfully running all the glibc
initialization (no small feat; this includes TLS setup, hwcaps /
cpu-features, and ifuncs), reaching main (), successfully doing puts
(), and shutting down. So it totally works, and is only missing an
AArch64 gnumach to run on.

The really unexpected part is how easy this actually was: it took me
like 3 days from "ok, guess I'm doing this, let's add a new target to
binutils and gcc" to glibc building successfully, and a couple more
days to get hello world to work (single-stepping under GDB is just
that time-consuming). Either I'm getting good at this, or (perhaps
more realistically) it was just easy all along, and it was my
inexperience with glibc internals that slowed me down the last time.
Also, we have worked out a lot of 64-bit issues with the x86_64 port,
so this is something I didn't have to deal with this time.

Now to some of the more technical things:

* The TLS implementation is basically complete and working. We're using
  tpidr_el0 for the thread pointer (as can be seen in the listing above),
  like GNU/Linux and unlike Windows (which uses x18, apparently) and
  macOS (which uses tpidrro_el0). We're using "Variant I" layout, as
  described in "ELF Handling for Thread-Local Storage", again same as
  GNU/Linux, and unlike what we do on both x86 targets. This actually
  ends up being simpler than what we had for x86! The other cool thing is
  that we can do "msr tpidr_el0, x0" from userspace without any gnumach
  involvement, so that part of the implementation is quite a bit simpler

* Conversely, while on x86 it is possible to perform "cpuid" and identify
  CPU features entirely in user space, on AArch64 this requires access
  to some EL1-only registers. On Linux and the BSDs, the kernel exposes
  info about the CPU features via AT_HWCAP (and more recently, AT_HWCAP2)
  auxval entries. Moreover, Linux allows userland to read some otherwise
  EL1-only registers (notably for us, midr_el1) by catching the trap that
  results from the EL0 code trying to do that, and emulating its effect.
  Also, Linux exposes midr_el1 and revidr_el1 values through procfs.

  The Hurd does not use auxval, nor is gnumach involved in execve anyway.
  So I thought the natural way to expose this info would be with an RPC,
  and so in mach_aarch64.defs I have an aarch64_get_hwcaps routine that
  returns the two hwcaps values (using the same bits as AT_HWCAP{,2}) and
  the values of midr_el1/revidr_el1. This is hooked to init_cpu_features
  in glibc, and used to initialize GLRO(dl_hwcap) / GLRO(dl_hwcap2) and
  eventually to pick the appropriate ifunc implementations.
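  As a sketch, such a routine could look roughly like this in
  mach_aarch64.defs (a hypothetical signature for illustration; the
  actual routine in the patches may use different types or names):

```
/* Hypothetical sketch; the real mach_aarch64.defs may differ. */
routine aarch64_get_hwcaps(
        host            : host_t;
    out hwcaps          : uint64_t;     /* same bits as AT_HWCAP  */
    out hwcaps2         : uint64_t;     /* same bits as AT_HWCAP2 */
    out midr_el1        : uint64_t;
    out revidr_el1      : uint64_t);
```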

* The page size (or rather, paging granularity) is notoriously not
  necessarily 4096 on ARM, and the best practice is for userland not to
  assume any specific page size and always query it dynamically. GNU Mach
  will (probably) have to be built with support for some specific page
  size, but I've cleaned up a few places in glibc where things were
  relying on a statically defined page size.

* There are a number of hardware hardening features available on AArch64
  (PAC, BTI, MTE — why do people keep adding more and more workarounds,
  including hardware ones, instead of rewriting software in a properly
  memory-safe language...). Those are not really supported right now; all
  of them would require some support from the gnumach side; we'll probably
  need new protection flags (VM_PROT_BTI, VM_PROT_MTE), for one thing.

  We would need to come up with a design for how we want these to work
  Hurd-wide. For example, I imagine it's the userland that will be
  generating PAC keys (and setting them for a newly exec'ed task), since
  gnumach does not contain the functionality to generate random values
  (nor should it); but this leaves open the question of what should
  happen to early bootstrap tasks and whether they can start using PAC
  after initial startup.

* Unlike on x86, I believe it is not possible to fully restore execution
  context (the values of all registers, including pc and cpsr) purely in
  userland; one reason being that we can apparently no longer load from
  memory straight into pc, as was possible in previous ARM revisions. On
  Linux, sigreturn () is therefore a syscall that takes a struct
  sigcontext and writes it over the saved thread state. Sound familiar?
  — of course, that's almost exactly thread_set_state () in Mach-speak.
  The difference is that thread_set_state () explicitly disallows setting
  the calling thread's state, which makes it impossible to use for
  implementing sigreturn (). So I'm thinking we should lift that
  restriction; there's no reason why thread_set_state () cannot be made
  to work on the calling thread; it only requires some careful coding to
  make sure the return register (%eax/%rax/x0) is *not* rewritten with
  mach_msg_trap's return code, as it normally would be.

  But other than that, I do have AArch64 versions of trampoline.c and
  intr-msg.h (complete with SYSCALL_EXAMINE & MSG_EXAMINE). Whether they
  work, we'll only learn once we have enough of the Hurd running to have
  the proc server.

Anyways, enjoy! As said, I will be posting the patches some time soon.
I of course don't expect to get any reviews during the holidays. And —
any volunteers for a gnumach port? :)


P.S. Believe it or not, this is not the announcement that I was going
to make at Joshua's Christmas party; I only started hacking on this
later, after that email exchange. That other thing is still to be
announced :)
