Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
From: Mattias Engdegård
Subject: Re: master def6fa4246 2/2: Speed up string-lessp for multibyte strings
Date: Sun, 9 Oct 2022 10:42:36 +0200
On 8 Oct 2022, at 19:40, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> I ended up using `memcpy` which the compiler
> helpfully turns into plain word-sized loads. So we get code without
> alignment or architecture assumptions and efficient code (even on
> architectures that don't allow unaligned loads since the compiler can
> still produce more efficient code than a byte-by-byte loop).
Yes, I considered memcpy but was worried that compilers would generate poor
code (maybe a library call) on some platforms, making a mockery of what was
intended as an optimisation. (memcpy scores fractionally better on the C
undefined-behaviour scale but I'm not overly worried.)
I may yet change my mind.
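For reference, the memcpy approach under discussion can be sketched roughly as below. This is only an illustration, not the code in the commit: the names `load_word` and `first_mismatch` are made up here, and the claim that the fixed-size memcpy becomes a single load depends on the compiler and optimisation level, which is exactly the concern raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load one 64-bit word from a possibly unaligned
   address.  A memcpy with a small constant size is usually compiled to
   a plain load, but that is up to the compiler.  */
static inline uint64_t
load_word (const unsigned char *p)
{
  uint64_t w;
  memcpy (&w, p, sizeof w);
  return w;
}

/* Hypothetical helper: return the offset of the first byte at which
   A and B differ, or N if their first N bytes are equal.  */
static size_t
first_mismatch (const unsigned char *a, const unsigned char *b, size_t n)
{
  assert (a != NULL && b != NULL);
  size_t i = 0;
  /* Skip over whole words that compare equal.  */
  while (i + sizeof (uint64_t) <= n && load_word (a + i) == load_word (b + i))
    i += sizeof (uint64_t);
  /* Locate the exact byte inside the differing word, or in the tail.  */
  while (i < n && a[i] == b[i])
    i++;
  return i;
}
```

Comparing only words for equality and then finishing byte by byte keeps the sketch independent of byte order; no alignment of either buffer is assumed.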
> [ Over on comp.arch the general mood is that not supporting unaligned
> loads natively is a ridiculous mistake because it's so cheap to
> implement (and the software workarounds are much more costly in
> comparison). ]
There's lots of merit in that, especially for code parsing network protocols
where packets in nested layers appear inside and next to each other so that
it's impossible to avoid at least one of them being unaligned no matter how
well your frames are laid out. Systems people tend to like unaligned-friendly
circuitry.
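A concrete case of the nesting problem, as a sketch: an Ethernet header is 14 bytes long, so if the frame starts on a 4-byte boundary, the IPv4 source address (12 bytes into the IP header) lands at offset 26, which is not a multiple of 4. The helper name `read_be32` is invented for this illustration; it reads the field portably without assuming alignment or host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit big-endian field at OFFSET in BUF,
   which holds LEN bytes.  The bytes are copied out first, so the field
   may sit at any alignment.  */
static uint32_t
read_be32 (const unsigned char *buf, size_t len, size_t offset)
{
  assert (offset <= len && len - offset >= 4);
  unsigned char b[4];
  memcpy (b, buf + offset, sizeof b);
  return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16)
         | ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}
```

With a frame buffer holding the bytes 192, 168, 1, 2 at offsets 26..29, `read_be32 (frame, sizeof frame, 26)` yields 0xC0A80102.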
Naturally the standard-bearers of ultra-modern architecture, x86[-64] and
s390[x], allow unaligned access!