Re: aarch64-gnu (and Happy New Year!)

From: Luca
Subject: Re: aarch64-gnu (and Happy New Year!)
Date: Mon, 1 Jan 2024 14:02:20 +0100

Hi Sergey,

On 31/12/23 20:53, Sergey Bugaev wrote:
Hello, and happy holidays!

Every now and then, I hear someone mention potential ports of gnumach
to new architectures. I think I have heard RISC-V and (64-bit?) ARM
mentioned somewhere recently as potential new port targets. Being
involved in the x86_64 port last spring was a really fun and
interesting experience, and I learned a lot; so I, for one, have
always thought doing more ports would be a great idea, and that I
would be glad to be a part of such an effort again.

Among the architectures, AArch64 and RISC-V indeed seem most
attractive (not that I know much about either). Among those two,
RISC-V is certainly newer and more exciting, but AArch64 is certainly
more widespread and established. (Wouldn't it be super cool if we
could run GNU/Hurd everywhere from tiny ARM boards, to Raspberry Pis,
to common smartphones, to, now, ARM-based laptops and desktops?) Also I
have had some experience with ARM in the past, so I knew a tiny bit of
ARM assembly.

So I thought, what would it take to port the Hurd to AArch64, a
completely non-x86 architecture, one that I knew very little about?
There is no AArch64 gnumach (that I know of) yet, but I could try to
hack on glibc even without one, I'd only need some headers, right?
There's also no compiler toolchain, but those patches to add the
x86_64-gnu target looked pretty understandable, so... how hard could it be?

Well, I did more than think about it :)

I read up on AArch64 registers / assembly / architecture / calling
convention, added the aarch64-gnu target to binutils and GCC, added
basic versions of mach/aarch64/ headers to gnumach (but no actual
code), and made a mostly complete port of glibc. I haven't spent much
effort on Hurd proper, but I have tried running the build, and the
core Hurd servers (ext2fs, proc, exec, auth) do get built.

I will be posting the patches soon. For now, here's just a little teaser:

glibc/build $ file libc.so elf/ld.so
libc.so: ELF 64-bit LSB shared object, ARM aarch64, version 1
(GNU/Linux), dynamically linked, interpreter /lib/ld-aarch64.so.1, for
GNU/Hurd 0.0.0, with debug_info, not stripped
elf/ld.so: ELF 64-bit LSB shared object, ARM aarch64, version 1
(SYSV), dynamically linked, with debug_info, not stripped

hurd/build $ file ext2fs/ext2fs.static proc/proc
ext2fs/ext2fs.static: ELF 64-bit LSB executable, ARM aarch64, version
1 (GNU/Linux), statically linked, for GNU/Hurd 0.0.0, with debug_info,
not stripped
proc/proc: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV),
dynamically linked, interpreter /lib/ld-aarch64.so.1, for GNU/Hurd
0.0.0, with debug_info, not stripped

glibc/build $ aarch64-gnu-objdump --disassemble=__mig_get_reply_port libc.so
libc.so:     file format elf64-littleaarch64
Disassembly of section .plt:
Disassembly of section .text:
000000000002b8e0 <__mig_get_reply_port>:
    2b8e0: a9be7bfd stp x29, x30, [sp, #-32]!
    2b8e4: 910003fd mov x29, sp
    2b8e8: f9000bf3 str x19, [sp, #16]
    2b8ec: d53bd053 mrs x19, tpidr_el0
    2b8f0: b85f8260 ldur w0, [x19, #-8]
    2b8f4: 34000080 cbz w0, 2b904 <__mig_get_reply_port+0x24>
    2b8f8: f9400bf3 ldr x19, [sp, #16]
    2b8fc: a8c27bfd ldp x29, x30, [sp], #32
    2b900: d65f03c0 ret
    2b904: 97fffbef bl 2a8c0 <__mach_reply_port>
    2b908: b81f8260 stur w0, [x19, #-8]
    2b90c: f9400bf3 ldr x19, [sp, #16]
    2b910: a8c27bfd ldp x29, x30, [sp], #32
    2b914: d65f03c0 ret
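
Read back into C, the listing above corresponds roughly to the sketch below: look at a per-thread slot, and only call into Mach when the slot is still empty. The names port_name_t, fake_mach_reply_port and get_reply_port are made up for illustration, the stub merely hands out increasing numbers instead of trapping into a kernel, and a plain _Thread_local variable stands in for the slot that the real code reaches at tpidr_el0 - 8.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for mach_port_t: a 32-bit name, matching the 32-bit (w0)
   loads and stores in the listing.  Zero plays the role of MACH_PORT_NULL. */
typedef uint32_t port_name_t;

/* Hypothetical stub.  The real code calls __mach_reply_port (). */
static port_name_t next_fake_port = 100;
static int allocations = 0;

static port_name_t fake_mach_reply_port(void)
{
    allocations++;
    return next_fake_port++;
}

/* Per-thread cache slot (assumption: the real slot lives in the thread
   descriptor just below the thread pointer, not in a TLS variable). */
static _Thread_local port_name_t cached_reply_port;

port_name_t get_reply_port(void)
{
    /* Fast path: ldur w0, [x19, #-8]; cbz w0, <slow path>; ret */
    if (cached_reply_port == 0) {
        /* Slow path: bl __mach_reply_port; stur w0, [x19, #-8] */
        cached_reply_port = fake_mach_reply_port();
    }
    return cached_reply_port;
}
```

Calling get_reply_port () twice on the same thread should hit the stub only once; the second call takes the short path that ends at the first ret.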

So it compiles and links, but does it work? — well, we can't know
that, not until someone ports gnumach, right?

Well actually we can :) I've done the same thing as last time, when
working on the x86_64 port: run a statically linked hello world
executable on Linux, under GDB, carefully skipping over and emulating
syscalls and RPCs. This did uncover a number of bugs, both in my port
of glibc and in how the toolchain was set up (the first issue was that
static-init.S was not even getting linked in, the second issue was
that static-init.S was crashing even prior to the _hurd_stack_setup
call, and so on). But, I fixed all of those, and got the test
executable working! — as in, successfully running all the glibc
initialization (no small feat; this includes TLS setup, hwcaps /
cpu-features, and ifuncs), reaching main (), successfully doing puts
(), and shutting down. So it totally works, and is only missing an
AArch64 gnumach to run on.

The really unexpected part is how easy this actually was: it took me
like 3 days from "ok, guess I'm doing this, let's add a new target to
binutils and gcc" to glibc building successfully, and a couple more
days to get hello world to work (single-stepping under GDB is just
that time-consuming). Either I'm getting good at this..., or (perhaps
more realistically) maybe it was just easy all along, and it was my
inexperience with glibc internals that slowed me down the last time.
Also, we have worked out a lot of 64-bit issues with the x86_64 port,
so this is something I didn't have to deal with this time.

Now to some of the more technical things:

* The TLS implementation is basically complete and working. We're using
   tpidr_el0 for the thread pointer (as can be seen in the listing above),
   like GNU/Linux and unlike Windows (which uses x18, apparently) and
   macOS (which uses tpidrro_el0). We're using "Variant I" layout, as
   described in "ELF Handling for Thread-Local Storage", again same as
   GNU/Linux, and unlike what we do on both x86 targets. This actually
   ends up being simpler than what we had for x86! The other cool thing is
   that we can do "msr tpidr_el0, x0" from userspace without any gnumach
   involvement, so that part of the implementation is quite a bit simpler.
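
To make the Variant I versus Variant II difference concrete, here is a small arithmetic sketch of where a variable from the executable's own TLS segment ends up relative to the thread pointer. It assumes the usual 16-byte AArch64 TCB and a TLS segment whose start is aligned to its own alignment; the function names are made up for illustration and are not glibc or linker API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: two 8-byte words are reserved at the thread pointer. */
#define AARCH64_TCB_SIZE 16

/* Round value up to a power-of-two alignment. */
static uint64_t round_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Variant I (AArch64): the TLS block sits above the thread pointer,
   right after the TCB, so offsets from tp are positive. */
int64_t variant1_tp_offset(uint64_t sym_offset, uint64_t seg_align)
{
    return (int64_t)(round_up(AARCH64_TCB_SIZE, seg_align) + sym_offset);
}

/* Variant II (x86): the TLS block ends at the thread pointer,
   so offsets from tp are negative. */
int64_t variant2_tp_offset(uint64_t sym_offset, uint64_t seg_size,
                           uint64_t seg_align)
{
    return (int64_t)sym_offset - (int64_t)round_up(seg_size, seg_align);
}
```

With an 8-byte-aligned segment, the first variable lands at tp + 16 under Variant I, while under Variant II a 24-byte segment puts it at tp - 24.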

* Conversely, while on x86 it is possible to perform "cpuid" and identify
   CPU features entirely in user space, on AArch64 this requires access
   to some EL1-only registers. On Linux and the BSDs, the kernel exposes
   info about the CPU features via AT_HWCAP (and more recently, AT_HWCAP2)
   auxval entries. Moreover, Linux allows userland to read some otherwise
   EL1-only registers (notably for us, midr_el1) by catching the trap that
   results from the EL0 code trying to do that, and emulating its effect.
   Also, Linux exposes midr_el1 and revidr_el1 values through procfs.

   The Hurd does not use auxval, nor is gnumach involved in execve anyway.
   So I thought the natural way to expose this info would be with an RPC,
   and so in mach_aarch64.defs I have an aarch64_get_hwcaps routine that
   returns the two hwcaps values (using the same bits as AT_HWCAP{,2}) and
   the values of midr_el1/revidr_el1. This is hooked to init_cpu_features
   in glibc, and used to initialize GLRO(dl_hwcap) / GLRO(dl_hwcap2) and
   eventually to pick the appropriate ifunc implementations.
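
As a rough illustration of what the consumer side does with these values, the sketch below decodes the fields of a midr_el1 value and picks an implementation from a hwcap word. The three HWCAP_* values are the ones Linux uses for AT_HWCAP on AArch64; struct midr_fields, decode_midr and pick_atomics_impl are hypothetical names, and nothing here reflects the actual signature of aarch64_get_hwcaps.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as used by Linux for AT_HWCAP on AArch64. */
#define HWCAP_FP      (1UL << 0)
#define HWCAP_ASIMD   (1UL << 1)
#define HWCAP_ATOMICS (1UL << 8)   /* LSE atomics */

/* Fields of MIDR_EL1, as laid out by the architecture. */
struct midr_fields {
    uint32_t implementer;   /* bits 31:24 */
    uint32_t variant;       /* bits 23:20 */
    uint32_t architecture;  /* bits 19:16 */
    uint32_t partnum;       /* bits 15:4  */
    uint32_t revision;      /* bits 3:0   */
};

struct midr_fields decode_midr(uint64_t midr)
{
    struct midr_fields f;
    f.implementer  = (uint32_t)((midr >> 24) & 0xff);
    f.variant      = (uint32_t)((midr >> 20) & 0xf);
    f.architecture = (uint32_t)((midr >> 16) & 0xf);
    f.partnum      = (uint32_t)((midr >> 4) & 0xfff);
    f.revision     = (uint32_t)(midr & 0xf);
    return f;
}

/* Hypothetical selector in the spirit of an ifunc resolver. */
enum atomics_impl { ATOMICS_GENERIC, ATOMICS_LSE };

enum atomics_impl pick_atomics_impl(unsigned long hwcap)
{
    return (hwcap & HWCAP_ATOMICS) ? ATOMICS_LSE : ATOMICS_GENERIC;
}
```

For example, 0x410fd083 (a Cortex-A72) decodes to implementer 0x41 (Arm) and part number 0xd08.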

* The page size (or rather, paging granularity) is notoriously not
   necessarily 4096 on ARM, and the best practice is for userland not to
   assume any specific page size and always query it dynamically. GNU Mach
   will (probably) have to be built with support for some specific page size,
   but I've cleaned up a few places in glibc where things were relying on
   a statically defined page size.
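
A minimal sketch of the "query it dynamically" practice, using the POSIX sysconf () interface rather than a compile-time constant. The helper names runtime_page_size and round_to_pages are made up for illustration; this is not the code that was changed in glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Ask the system for the page size instead of hard-coding 4096. */
size_t runtime_page_size(void)
{
    long size = sysconf(_SC_PAGESIZE);
    assert(size > 0);
    return (size_t)size;
}

/* Round a length up to a whole number of pages, e.g. before mapping. */
size_t round_to_pages(size_t length)
{
    size_t page = runtime_page_size();
    return (length + page - 1) / page * page;
}
```

The same binary then behaves correctly whether the kernel was built for 4 KiB, 16 KiB or 64 KiB pages.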

* There are a number of hardware hardening features available on AArch64
   (PAC, BTI, MTE — why do people keep adding more and more workarounds,
   including hardware ones, instead of rewriting software in a properly
   memory-safe language...). Those are not really supported right now; all
   of them would require some support from the gnumach side; we'll probably
   need new protection flags (VM_PROT_BTI, VM_PROT_MTE), for one thing.

   We would need to come up with a design for how we want these to work
   Hurd-wide. For example I imagine it's the userland that will be
   generating PAC keys (and setting them for a newly exec'ed task), since
   gnumach does not contain the functionality to generate random values
   (nor should it); but this leaves open the question of what should happen to
   early bootstrap tasks and whether they can start using PAC after
   initial startup.

* Unlike on x86, I believe it is not possible to fully restore execution
   context (the values of all registers, including pc and cpsr) purely in
   userland; one of the reasons for that being that we can apparently no
   longer do a load from memory straight into pc, like it was possible in
   previous ARM revisions. So on Linux, sigreturn () is of course a
   syscall that picks up the saved struct sigcontext (from the signal
   frame on the user stack) and writes it over the saved thread state.
   Sounds familiar to you? Of course, that's almost exactly like
   thread_set_state () in Mach-speak.
   The difference being that thread_set_state () explicitly disallows you
   to set the calling thread's state, which makes it impossible to use for
   implementing sigreturn (). So I'm thinking we should lift that
   restriction; there's no reason why thread_set_state () cannot be made
   to work on the calling thread; it only requires some careful coding to
   make sure the return register (%eax/%rax/x0) is *not* rewritten with
   mach_msg_trap's return code, unlike normally.

   But other than that, I do have AArch64 versions of trampoline.c and
   intr-msg.h (complete with SYSCALL_EXAMINE & MSG_EXAMINE). Whether they
   work, we'll only learn once we have enough of the Hurd running to have
   the proc server.
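
The thread_set_state () point above can be shown with a toy model: when a thread replaces its own saved state, the return path has to leave x0 alone rather than store the call's return code into it. This is purely illustrative; struct toy_thread, toy_set_state and toy_return_to_user are invented names and do not correspond to anything in gnumach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in gnumach. */
struct toy_thread {
    uint64_t x0;          /* register that normally receives the return code */
    uint64_t pc;          /* where the thread resumes in user space */
    bool keep_registers;  /* set when the thread replaced its own state */
};

/* Overwrite the saved user registers of target, as requested by caller. */
void toy_set_state(struct toy_thread *target, struct toy_thread *caller,
                   uint64_t new_x0, uint64_t new_pc)
{
    target->x0 = new_x0;
    target->pc = new_pc;
    if (target == caller)
        caller->keep_registers = true;  /* do not clobber x0 on the way out */
}

/* Return path: store the call's result in x0, unless the caller has
   just replaced its own state. */
void toy_return_to_user(struct toy_thread *caller, uint64_t return_code)
{
    if (caller->keep_registers) {
        caller->keep_registers = false;
        return;
    }
    caller->x0 = return_code;
}
```

In this model a sigreturn-style call on the current thread resumes with exactly the x0 and pc it asked for, while a call that targets another thread still sees the ordinary return code in its own x0.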

Really great work! To work on gnumach we just need MIG and any armv8 compiler (one targeting GNU/Linux would do as well), and it seems MIG works fine without adjustments? There may still be some issues once it actually runs somewhere, e.g. alignment issues.

Anyways, enjoy! As said, I will be posting the patches some time soon.
I of course don't expect to get any reviews during the holidays. And —
any volunteers for a gnumach port? :)

Another issue with ARM in general is that the hardware support is much less streamlined than on x86, although with v8 there should be some alignment on basic stuff like IRQs and UEFI. Probably even the serial console needs a platform-specific driver (I'm not sure, I'm more familiar with older and more embedded variants like Cortex-M).

To bootstrap gnumach, the first thing we'd need would probably be the console, then setting up the virtual memory, then thread states, context switch, IRQs and userspace entry points (list by no means exhaustive).

I actually have an armv8 server that would be handy for some development, so I might be able to help with something in the future.

