When both "dest" and "src" are aligned, copying the data in grub_addr_t
sized chunks is more efficient than a byte-by-byte copy.
Also tweak __aeabi_memcpy(), __aeabi_memcpy4(), and __aeabi_memcpy8(),
since grub_memcpy() is not inline anymore.
Optimization for unaligned buffers was omitted to maintain code
simplicity and readability. The current chunk-copy optimization
for aligned buffers already provides a noticeable performance
improvement (*) for Argon2 keyslot decryption.
(*) On my system, for a LUKS2 keyslot configured with a 1 GB Argon2
memory requirement, this patch reduces the decryption time from
22 seconds to 12 seconds.
Signed-off-by: Gary Lin <glin@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
libgcc for boot environment isn't always present and compatible.
libgcc is often absent if endianness or bit-size at boot is different
from running OS.
libgcc may use optimised opcodes that aren't available on boot time.
So instead of relying on libgcc shipped with the compiler, supply
the functions in GRUB directly.
Tests are present to ensure that those replacement functions behave the
way compiler expects them to.