phcoder and everyone else on the list, hello.
As many of you know, the builtin LUKS decryption in GRUB is a major feature
that enables many advanced setups, such as coreboot-based Full Disk Encryption.
However, it has been reported [1] that cryptomount is extremely slow.
On my box, with the default (large) number of iterations, GNU/Linux takes
2 seconds to derive the LUKS master key, while GRUB takes about 40 seconds.
This seriously hurts the usability of LUKS in GRUB. If the user chooses
a large number of iterations, GRUB takes at least 40 seconds to unlock an
encrypted partition, and a typo in the passphrase means waiting all over
again. This pushes many users toward a smaller number of iterations, which
makes the passphrase more vulnerable to brute-force attacks from modern CPUs
and GPUs with their ever-increasing computational power, and is therefore
discouraged by the LUKS developers. The performance issue must be solved.
I've investigated the cause of the issue, and the culprit is the C
implementation of the SHA-512 hash function, which is essential for a 256-bit
encryption setup. Since SHA-512 operates on 64-bit integers, its performance
is very poor on 32-bit x86, where every 64-bit operation has to be split
across pairs of 32-bit registers.
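To illustrate why 64-bit arithmetic hurts here, the following is a minimal
sketch (not GRUB's or libgcrypt's actual code) of the SHA-512 "big sigma"
functions as defined in FIPS 180-4, together with a hand-written 32-bit
version of a single rotation. The type u64_pair and the helper rotr64_pair
are hypothetical names introduced only for this illustration; they show
roughly what a compiler has to emit when no 64-bit registers are available.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (valid for 0 < n < 64). */
static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* SHA-512 round functions (FIPS 180-4): three 64-bit rotations each. */
static inline uint64_t sha512_Sigma0(uint64_t x)
{
    return rotr64(x, 28) ^ rotr64(x, 34) ^ rotr64(x, 39);
}

static inline uint64_t sha512_Sigma1(uint64_t x)
{
    return rotr64(x, 14) ^ rotr64(x, 18) ^ rotr64(x, 41);
}

/* Hypothetical illustration: a 64-bit value held as two 32-bit halves,
 * the way it lives in registers on a 32-bit CPU. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64_pair;

/* The same rotation done with 32-bit operations only (valid for 0 < n < 32).
 * One 64-bit rotate becomes four shifts and two ORs. */
static inline u64_pair rotr64_pair(u64_pair x, unsigned n)
{
    u64_pair r;
    r.lo = (x.lo >> n) | (x.hi << (32 - n));
    r.hi = (x.hi >> n) | (x.lo << (32 - n));
    return r;
}
```

Each of the 80 rounds of the SHA-512 compression function evaluates both
sigma functions plus several 64-bit additions, so the per-operation overhead
adds up quickly when everything has to be done in 32-bit halves.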
I'm now working on integrating an SSE2-optimized implementation of the
SHA-512 hash function into GRUB on x86. It would boost the performance of key
derivation by 400%. I've already added the implementation to libgcrypt-grub,
and it will be selected automatically based on CPUID, the same way upstream
libgcrypt does it.
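As a rough sketch of what CPUID-based selection can look like (this is not
the code from libgcrypt or from my patch), the example below checks the SSE2
feature flag, which is bit 26 of EDX for CPUID leaf 1, and picks a transform
function accordingly. It assumes GCC or Clang, whose <cpuid.h> header
provides __get_cpuid(). The names cpu_has_sse2, sha512_transform_generic,
sha512_transform_sse2 and select_sha512_transform are hypothetical, and the
two transform bodies are empty placeholders.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header for the CPUID instruction */
#endif

#define CPUID_LEAF1_EDX_SSE2 (1u << 26)

/* Return 1 if the CPU reports SSE2 support, 0 otherwise. */
static int cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx & CPUID_LEAF1_EDX_SSE2) != 0;
#else
    return 0;        /* not an x86 CPU: always use the generic code */
#endif
}

typedef void (*sha512_transform_fn)(uint64_t *state,
                                    const unsigned char *block);

/* Placeholder for the portable C compression function (hypothetical). */
static void sha512_transform_generic(uint64_t *state,
                                     const unsigned char *block)
{
    (void)state;
    (void)block;
}

/* Placeholder for the SSE2 compression function (hypothetical). */
static void sha512_transform_sse2(uint64_t *state,
                                  const unsigned char *block)
{
    (void)state;
    (void)block;
}

/* Choose the implementation once, based on what the CPU reports. */
static sha512_transform_fn select_sha512_transform(void)
{
    return cpu_has_sse2() ? sha512_transform_sse2
                          : sha512_transform_generic;
}
```

Doing the check once and caching the function pointer keeps the per-block
cost of the selection at zero.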
In GRUB, SSE registers are disabled. If you want to use SSE, you need to make
sure that you enable them and that they are disabled again before kernel
handoff.
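For reference, a minimal sketch of the control-register bits involved,
assuming the usual x86 rules: SSE instructions need CR0.EM (bit 2) cleared,
CR0.MP (bit 1) set, and CR4.OSFXSR (bit 9) set. The helpers below only
compute the new register values; actually reading and writing CR0/CR4 needs
privileged "mov" instructions and is left out. The function names are
hypothetical and this is not existing GRUB code.

```c
#include <stdint.h>

#define CR0_MP      (1u << 1)   /* monitor coprocessor */
#define CR0_EM      (1u << 2)   /* x87 emulation; must be 0 for SSE */
#define CR4_OSFXSR  (1u << 9)   /* OS supports FXSAVE/FXRSTOR and SSE */

/* CR0 value that allows SSE instructions: clear EM, set MP. */
static uint32_t cr0_with_sse(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}

/* CR4 value that allows SSE instructions: set OSFXSR. */
static uint32_t cr4_with_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR;
}

/*
 * Intended use (privileged code only, shown as pseudocode):
 *
 *   saved_cr0 = read_cr0();   saved_cr4 = read_cr4();
 *   write_cr0(cr0_with_sse(saved_cr0));
 *   write_cr4(cr4_with_sse(saved_cr4));
 *   ... run the SSE2 code ...
 *   write_cr4(saved_cr4);     write_cr0(saved_cr0);
 *
 * Restoring the saved values puts the CPU back into the state it had
 * before, which covers the "disabled again before handoff" requirement.
 */
```

Saving and restoring the original values, rather than clearing bits
unconditionally, avoids assumptions about the state the firmware left the
registers in.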