980 Commits

Author SHA1 Message Date
Xiaotian Wu
d33cbf2d8f loongarch: Add auxiliary files
Add support for manipulating architectural cache and timers, and EFI
memory maps.

Signed-off-by: Zhou Yang <zhouyang@loongson.cn>
Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-05-17 13:21:43 +02:00
Xiaotian Wu
0b4693e32c loongarch: Add support for ELF psABI v2.00 relocations
A new set of relocation types was added in the LoongArch ELF psABI v2.00
spec [1], [2] to replace the stack-based scheme in v1.00. Toolchain
support is available from binutils 2.40 and gcc 13 onwards.

This patch adds support for the new relocation types, that are simpler
to handle (in particular, stack operations are gone). Support for the
v1.00 relocs are kept for now, for compatibility with older toolchains.

[1] https://github.com/loongson/LoongArch-Documentation/pull/57
[2] https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html#_appendix_revision_history

Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-05-17 13:18:36 +02:00
Xiaotian Wu
b264f098be loongarch: Add support for ELF psABI v1.00 relocations
This patch adds support of the stack-based LoongArch relocations
throughout GRUB, including tools, dynamic linkage, and support for
conversion of ELF relocations into PE ones. A stack machine is required
to handle these per the spec [1] (see the R_LARCH_SOP types), of which
a simple implementation is included.

These relocations are produced by binutils 2.38 and 2.39, while the newer
v2.00 relocs require more recent toolchain (binutils 2.40+ & gcc 13+, or
LLVM 16+). GCC 13 has not been officially released as of early 2023, so
support for v1.00 relocs are expected to stay relevant for a while.

[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html#_relocations

Signed-off-by: Zhou Yang <zhouyang@loongson.cn>
Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-05-17 12:59:08 +02:00
Xiaotian Wu
6a42dd9e25 loongarch: Add early startup code
On entry, we need to save the system table pointer as well as our image
handle. Add an early startup file that saves them and then brings us
into our main function.

Signed-off-by: Zhou Yang <zhouyang@loongson.cn>
Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-05-17 12:53:08 +02:00
Mukesh Kumar Chaurasiya
b5b7fe64d6 disk: Replace transform_sector() function with grub_disk_to_native_sector()
The transform_sector() function is not very clear in what it's doing
and confusing. The GRUB already has a function which is doing the same
thing in a very self explanatory way, i.e., grub_disk_to_native_sector().
So, it's much better to use self explanatory one than transform_sector().

Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.vnet.ibm.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-04-13 15:04:33 +02:00
Avnish Chouhan
98d0df0351 kern/ieee1275/init: Extended support in Vec5
This patch enables multiple options in Vec5 which are required and
solves the boot issues seen on some machines which are looking for
these specific options.

1. LPAR: Client program supports logical partitioning and
   associated hcall()s.
2. SPLPAR: Client program supports the Shared
   Processor LPAR Option.
3. DYN_RCON_MEM: Client program supports the
   “ibm,dynamic-reconfiguration-memory” property and it may be
   presented in the device tree.
4. LARGE_PAGES: Client supports pages larger than 4 KB.
5. DONATE_DCPU_CLS: Client supports donating dedicated processor cycles.
6. PCI_EXP: Client supports PCI Express implementations
   utilizing Message Signaled Interrupts (MSIs).

7. CMOC: Enables the Cooperative Memory Over-commitment Option.
8. EXT_CMO: Enables the Extended Cooperative Memory Over-commit Option.

9. ASSOC_REF: Enables “ibm,associativity” and
   “ibm,associativity-reference-points” properties.
10. AFFINITY: Enables Platform Resource Reassignment Notification.
11. NUMA: Supports NUMA Distance Lookup Table Option.

12. HOTPLUG_INTRPT: Supports Hotplug Interrupts.
13. HPT_RESIZE: Enable Hash Page Table Resize Option.

14. MAX_CPU: Defines maximum number of CPUs supported.

15. PFO_HWRNG: Supports Random Number Generator.
16. PFO_HW_COMP: Supports Compression Engine.
17. PFO_ENCRYPT: Supports Encryption Engine.

18. SUB_PROCESSORS: Supports Sub-Processors.

19. DY_MEM_V2: Client program supports the “ibm,dynamic-memory-v2” property in the
    “ibm,dynamic-reconfiguration-memory” node and it may be presented in the device tree.
20. DRC_INFO: Client program supports the “ibm,drc-info” property definition and it may be
    presented in the device tree.

Signed-off-by: Avnish Chouhan <avnish@linux.vnet.ibm.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-03-29 20:35:06 +02:00
Avnish Chouhan
8406cfe477 kern/ieee1275/init: Convert plain numbers to constants in Vec5
This patch converts the plain numbers used in Vec5 properties to constants.

1. LPAR: Client program supports logical partitioning and
   associated hcall()s.
2. SPLPAR: Client program supports the Shared
   Processor LPAR Option.
3. CMO: Enables the Cooperative Memory Over-commitment Option.
4. MAX_CPU: Defines maximum number of CPUs supported.

Signed-off-by: Avnish Chouhan <avnish@linux.vnet.ibm.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-03-29 20:35:05 +02:00
Daniel Axtens
d8953d0793 commands/memtools: Add memtool module with memory allocation stress-test
When working on memory, it's nice to be able to test your work.

Add a memtest module. When compiled with --enable-mm-debug, it exposes
3 commands:

 * lsmem - print all allocations and free space in all regions
 * lsfreemem - print free space in all regions

 * stress_big_allocs - stress test large allocations:
  - how much memory can we allocate in one chunk?
  - how many 1MB chunks can we allocate?
  - check that gap-filling works with a 1MB aligned 900kB alloc + a
     100kB alloc.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Robbie Harwood <rharwood@redhat.com>
2023-03-07 15:26:36 +01:00
Diego Domingos
1b4d91185b ieee1275: Implement vec5 for cas negotiation
As a legacy support, if the vector 5 is not implemented, Power Hypervisor will
consider the max CPUs as 64 instead 256 currently supported during
client-architecture-support negotiation.

This patch implements the vector 5 and set the MAX CPUs to 256 while setting the
others values to 0 (default).

Signed-off-by: Diego Domingos <diegodo@linux.vnet.ibm.com>
Acked-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Avnish Chouhan <avnish@linux.vnet.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-03-07 15:24:14 +01:00
Daniel Axtens
2e645b46e8 ieee1275: Support runtime memory claiming
On powerpc-ieee1275, we are running out of memory trying to verify
anything. This is because:

 - we have to load an entire file into memory to verify it. This is
   difficult to change with appended signatures.
 - We only have 32MB of heap.
 - Distro kernels are now often around 30MB.

So we want to be able to claim more memory from OpenFirmware for our heap
at runtime.

There are some complications:

 - The grub mm code isn't the only thing that will make claims on
   memory from OpenFirmware:

    * PFW/SLOF will have claimed some for their own use.

    * The ieee1275 loader will try to find other bits of memory that we
      haven't claimed to place the kernel and initrd when we go to boot.

    * Once we load Linux, it will also try to claim memory. It claims
      memory without any reference to /memory/available, it just starts
      at min(top of RMO, 768MB) and works down. So we need to avoid this
      area. See arch/powerpc/kernel/prom_init.c as of v5.11.

 - The smallest amount of memory a ppc64 KVM guest can have is 256MB.
   It doesn't work with distro kernels but can work with custom kernels.
   We should maintain support for that. (ppc32 can boot with even less,
   and we shouldn't break that either.)

 - Even if a VM has more memory, the memory OpenFirmware makes available
   as Real Memory Area can be restricted. Even with our CAS work, an LPAR
   on a PowerVM box is likely to have only 512MB available to OpenFirmware
   even if it has many gigabytes of memory allocated.

What should we do?

We don't know in advance how big the kernel and initrd are going to be,
which makes figuring out how much memory we can take a bit tricky.

To figure out how much memory we should leave unused, I looked at:

 - an Ubuntu 20.04.1 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~50MB

 - a RHEL8.2 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~30MB

So to give us a little wriggle room, I think we want to leave at least
128MB for the loader to put vmlinux and initrd in memory and leave Linux
with space to satisfy its early allocations.

Allow other space to be allocated at runtime.

Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-03-07 15:20:53 +01:00
Daniel Axtens
81b0b1cea4 ieee1275: Drop len -= 1 quirk in heap_init
This was apparently "required by some firmware": commit dc9468500919
(2007-02-12  Hollis Blanchard  <hollis@penguinppc.org>).

It's not clear what firmware that was, and what platform from 14 years ago
which exhibited the bug then is still both in use and buggy now.

It doesn't cause issues on qemu (mac99 or pseries) or under PFW for Power8.

I don't have access to old Mac hardware, but if anyone feels especially
strongly we can put it under some feature flag. I really want to disable
it under pseries because it will mess with region merging.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Robbie Harwood <rharwood@redhat.com>
2023-03-07 15:18:47 +01:00
Daniel Axtens
b5fd45a50f ieee1275: Request memory with ibm, client-architecture-support
On PowerVM, the first time we boot a Linux partition, we may only get
256MB of real memory area, even if the partition has more memory.

This isn't enough to reliably verify a kernel. Fortunately, the Power
Architecture Platform Reference (PAPR) defines a method we can call to ask
for more memory: the broad and powerful ibm,client-architecture-support
(CAS) method.

CAS can do an enormous amount of things on a PAPR platform: as well as
asking for memory, you can set the supported processor level, the interrupt
controller, hash vs radix mmu, and so on.

If:

 - we are running under what we think is PowerVM (compatible property of /
   begins with "IBM"), and

 - the full amount of RMA is less than 512MB (as determined by the reg
   property of /memory)

then call CAS as follows: (refer to the Linux on Power Architecture
Reference, LoPAR, which is public, at B.5.2.3):

 - Use the "any" PVR value and supply 2 option vectors.

 - Set option vector 1 (PowerPC Server Processor Architecture Level)
   to "ignore".

 - Set option vector 2 with default or Linux-like options, including a
   min-rma-size of 512MB.

 - Set option vector 3 to request Floating Point, VMX and Decimal Floating
   point, but don't abort the boot if we can't get them.

 - Set option vector 4 to request a minimum VP percentage to 1%, which is
   what Linux requests, and is below the default of 10%. Without this,
   some systems with very large or very small configurations fail to boot.

This will cause a CAS reboot and the partition will restart with 512MB
of RMA. Importantly, grub will notice the 512MB and not call CAS again.

Notes about the choices of parameters:

 - A partition can be configured with only 256MB of memory, which would
   mean this request couldn't be satisfied, but PFW refuses to load with
   only 256MB of memory, so it's a bit moot. SLOF will run fine with 256MB,
   but we will never call CAS under qemu/SLOF because /compatible won't
   begin with "IBM".)

 - unspecified CAS vectors take on default values. Some of these values
   might restrict the ability of certain hardware configurations to boot.
   This is why we need to specify the VP percentage in vector 4, which is
   in turn why we need to specify vector 3.

Finally, we should have enough memory to verify a kernel, and we will
reach Linux. One of the first things Linux does while still running under
OpenFirmware is to call CAS with a much fuller set of options (including
asking for 512MB of memory). Linux includes a much more restrictive set of
PVR values and processor support levels, and this CAS invocation will likely
induce another reboot. On this reboot grub will again notice the higher RMA,
and not call CAS. We will get to Linux again, Linux will call CAS again, but
because the values are now set for Linux this will not induce another CAS
reboot and we will finally boot all the way to userspace.

On all subsequent boots, everything will be configured with 512MB of RMA,
so there will be no further CAS reboots from grub. (phyp is super sticky
with the RMA size - it persists even on cold boots. So if you've ever booted
Linux in a partition, you'll probably never have grub call CAS. It'll only
ever fire the first time a partition loads grub, or if you deliberately lower
the amount of memory your partition has below 512MB.)

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Reviewed-by: Robbie Harwood <rharwood@redhat.com>
2023-03-07 15:14:29 +01:00
Khem Raj
403d6540cd RISC-V: Handle R_RISCV_CALL_PLT reloc
GNU assembler starting 2.40 release always generates R_RISCV_CALL_PLT
reloc for call in assembler [1], similarly LLVM does not make
distinction between R_RISCV_CALL_PLT and R_RISCV_CALL [2].

Fixes "grub-mkimage: error: relocation 0x13 is not implemented yet.".

[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=70f35d72ef04cd23771875c1661c9975044a749c
[2] https://reviews.llvm.org/D132530

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-02-28 14:04:11 +01:00
Zhang Boyang
65bc459630 mm: Avoid complex heap growth math in hot path
We do a lot of math about heap growth in hot path of grub_memalign().
However, the result is only used if out of memory is encountered, which
is seldom.

This patch moves these calculations away from hot path. These
calculations are now only done if out of memory is encountered. This
change can also help compiler to optimize integer overflow checks away.

Signed-off-by: Zhang Boyang <zhangboyang.id@gmail.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-02-02 19:44:56 +01:00
Zhang Boyang
21869baec1 mm: Preallocate some space when adding new regions
When grub_memalign() encounters out-of-memory, it will try
grub_mm_add_region_fn() to request more memory from system firmware.
However, it doesn't preallocate memory space for future allocation
requests. In extreme cases, it requires one call to
grub_mm_add_region_fn() for each memory allocation request. This can
be very slow.

This patch introduces GRUB_MM_HEAP_GROW_EXTRA, the minimal heap growth
granularity. The new region size is now set to the bigger one of its
original value and GRUB_MM_HEAP_GROW_EXTRA. Thus, it will result in some
memory space preallocated if current allocations request is small.

The value of GRUB_MM_HEAP_GROW_EXTRA is set to 1MB. If this value is
smaller, the cost of small memory allocations will be higher. If this
value is larger, more memory will be wasted and it might cause
out-of-memory on machines with small amount of RAM.

Signed-off-by: Zhang Boyang <zhangboyang.id@gmail.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-02-02 19:44:56 +01:00
Zhang Boyang
2282cbfe5a mm: Adjust new region size to take management overhead into account
When grub_memalign() encounters out-of-memory, it will try
grub_mm_add_region_fn() to request more memory from system firmware.
However, the size passed to it doesn't take region management overhead
into account. Adding a memory area of "size" bytes may result in a heap
region of less than "size" bytes really available. Thus, the new region
may not be adequate for current allocation request, confusing
out-of-memory handling code.

This patch introduces GRUB_MM_MGMT_OVERHEAD to address the region
management overhead (e.g. metadata, padding). The value of this new
constant must be large enough to make sure grub_memalign(align, size)
always succeeds after a successful call to
  grub_mm_init_region(addr, size + align + GRUB_MM_MGMT_OVERHEAD),
for any given addr and size (assuming no integer overflow).

The size passed to grub_mm_add_region_fn() is now correctly adjusted,
thus if grub_mm_add_region_fn() succeeded, current allocation request
can always succeed.

Signed-off-by: Zhang Boyang <zhangboyang.id@gmail.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-02-02 19:44:56 +01:00
Benjamin Herrenschmidt
cff78b3b61 kern/acpi: Export a generic grub_acpi_find_table()
And convert grub_acpi_find_fadt() to use it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-01-18 23:07:06 +01:00
Maxim Fomin
1a241e0506 kern/fs: Fix possible integer overflow in i386-pc mode with large partitions
The i386-pc mode supports MBR partition scheme where maximum partition
size is 2 TiB. In case of large partitions left shift expression with
unsigned long int "length" object may cause integer overflow making
calculated partition size less than true value. This issue is fixed by
increasing the size of "length" integer type.

Signed-off-by: Maxim Fomin <maxim@fomin.one>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2023-01-10 16:37:11 +01:00
Steve McIntyre
e375394fb9 kern/file: Fix error handling in grub_file_open()
grub_file_open() calls grub_file_get_device_name(), but doesn't check
the return. Instead, it checks if grub_errno is set.

However, nothing initialises grub_errno here when grub_file_open()
starts. This means that trying to open one file that doesn't exist and
then trying to open another file that does will (incorrectly) also
fail to open that second file.

Let's fix that.

Signed-off-by: Steve McIntyre <steve@einval.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-12-07 23:38:26 +01:00
Zhang Boyang
93a786a001 kern/efi/sb: Enforce verification of font files
As a mitigation and hardening measure enforce verification of font
files. Then only trusted font files can be load. This will reduce the
attack surface at cost of losing the ability of end-users to customize
fonts if e.g. UEFI Secure Boot is enabled. Vendors can always customize
fonts because they have ability to pack fonts into their GRUB bundles.

This goal is achieved by:

  * Removing GRUB_FILE_TYPE_FONT from shim lock verifier's
    skip-verification list.

  * Adding GRUB_FILE_TYPE_FONT to lockdown verifier's defer-auth list,
    so font files must be verified by a verifier before they can be loaded.

Suggested-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Zhang Boyang <zhangboyang.id@gmail.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-11-14 20:24:39 +01:00
Robbie Harwood
b20192f22c kern/env: Add function for retrieving variables as booleans
Signed-off-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-11-14 17:21:53 +01:00
Raymund Will
e364307f6a loader: Add support for grub-emu to kexec Linux menu entries
The GRUB emulator is used as a debugging utility but it could also be
used as a user-space bootloader if there is support to boot an operating
system.

The Linux kernel is already able to (re)boot another kernel via the
kexec boot mechanism. So the grub-emu tool could rely on this feature
and have linux and initrd commands that are used to pass a kernel,
initramfs image and command line parameters to kexec for booting
a selected menu entry.

By default the systemctl kexec option is used so systemd can shutdown
all of the running services before doing a reboot using kexec. But if
this is not present, it can fall back to executing the kexec user-space
tool directly. The ability to force a kexec-reboot when systemctl kexec
fails must only be used in controlled environments to avoid possible
filesystem corruption and data loss.

Signed-off-by: Raymund Will <rw@suse.com>
Signed-off-by: John Jolly <jjolly@suse.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-11-14 17:13:24 +01:00
Ard Biesheuvel
6d7bb89efa efi: Move MS-DOS stub out of generic PE header definition
The PE/COFF spec permits the COFF signature and file header to appear
anywhere in the file, and the actual offset is recorded in 4 byte
little endian field at offset 0x3c of the image.

When GRUB is emitted as a PE/COFF binary, we reuse the 128 byte MS-DOS
stub (even for non-x86 architectures), putting the COFF signature and
file header at offset 0x80. However, other PE/COFF images may use
different values, and non-x86 Linux kernels use an offset of 0x40
instead.

So let's get rid of the grub_pe32_header struct from pe32.h, given that
it does not represent anything defined by the PE/COFF spec. Instead,
introduce a minimal struct grub_msdos_image_header type based on the
PE/COFF spec's description of the image header, and use the offset
recorded at file position 0x3c to discover the actual location of the PE
signature and the COFF image header.

The remaining fields are moved into a struct grub_pe_image_header,
which we will use later to access COFF header fields of arbitrary
images (and which may therefore appear at different offsets)

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-10-27 16:53:01 +02:00
Jagannathan Raman
82ff9faa5b kern/buffer: Handle NULL input pointer in grub_buffer_free()
The grub_buffer_free() should handle NULL input pointer, similar to
grub_free(). If the pointer is not referencing any memory location,
grub_buffer_free() need not perform any function.

Fixes: CID 396931

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-10-27 16:51:21 +02:00
Darren Kenny
8505f73003 gnulib: Provide abort() implementation for gnulib
The recent gnulib updates require an implementation of abort(), but the
current macro provided by changeset:

  cd37d3d3916c gnulib: Drop no-abort.patch

to config.h.in does not work with the clang compiler since it doesn't
provide a __builtin_trap() implementation, so this element of the
changeset needs to be reverted, and replaced.

After some discussion with Vladimir 'phcoder' Serbinenko and Daniel Kiper
it was suggested to bring back in the change from the changeset:

  db7337a3d353 * grub-core/gnulib/regcomp.c (regerror): ...

Which implements abort() as an inline call to grub_abort(), but since
that was made static by changeset:

  a8f15bceeafe * grub-core/kern/misc.c (grub_abort): Make static

it is also necessary to revert the specific part that makes it a static
function too.

Another implementation of abort() was found in grub-core/kern/compiler-rt.c
which needs to also be removed to be consistent.

Signed-off-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-10-27 16:14:14 +02:00
Zhang Boyang
17975d10a8 mm: Try invalidate disk caches last when out of memory
Every heap grow will cause all disk caches invalidated which decreases
performance severely. This patch moves disk cache invalidation code to
the last of memory squeezing measures. So, disk caches are released only
when there are no other ways to get free memory.

Signed-off-by: Zhang Boyang <zhangboyang.id@gmail.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
2022-10-27 15:49:48 +02:00
Glenn Washburn
42a424c9d5 kern/corecmd: Quote variable values when displayed by the set command
Variable values may contain spaces at the end or newlines. However, when
displayed without quotes this is not obvious and can lead to confusion as
to the actual contents of variables. Also for some variables grub_env_get()
returns a NULL pointer instead of a pointer to an empty string and
previously would be printed as "var=(null)". Now such variables will be
displayed as "var=''".

Signed-off-by: Glenn Washburn <development@efficientek.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-10-11 14:28:06 +02:00
Tuan Phan
3403774703 kern/compiler-rt: Fix __clzsi2() logic
Fix the incorrect return value of __clzsi2() function.

Fixes: e795b90 (RISC-V: Add libgcc helpers for clz)

Signed-off-by: Tuan Phan <tphan@ventanamicro.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-10-04 18:10:18 +02:00
Daniel Axtens
75e38e86e7 efi: Increase default memory allocation to 32 MiB
We have multiple reports of things being slower with a 1 MiB initial static
allocation, and a report (more difficult to nail down) of a boot failure
as a result of the smaller initial allocation.

Make the initial memory allocation 32 MiB.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-10-04 17:06:25 +02:00
Robbie Harwood
dbc641ac92 efi: Make all grub_efi_guid_t variables static
This is believed to result in smaller code.

Signed-off-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-08-20 00:23:11 +02:00
Alec Brown
ddb6c1bafb elf: Validate number of elf program header table entries
In bsdXX.c and multiboot_elfxx.c, e_phnum is used to obtain the number of
program header table entries, but it wasn't being checked if the value was
there.

According to the elf(5) manual page,
"If the number of entries in the program header table is larger than or equal to
PN_XNUM (0xffff), this member holds PN_XNUM (0xffff) and the real number of
entries in the program header table is held in the sh_info member of the
initial entry in section header table.  Otherwise, the sh_info member of the
initial entry contains the value zero."

Since this check wasn't being made, grub_elfXX_get_phnum() is being added to
elfXX.c to make this check and use e_phnum if it doesn't have PN_XNUM as a
value, else use sh_info. We also need to make sure e_phnum isn't greater than
PN_XNUM and sh_info isn't less than PN_XNUM.

Note that even though elf.c and elfXX.c are located in grub-core/kern, they are
compiled as modules and don't need the EXPORT_FUNC() macro to define the functions
in elf.h.

Also, changed casts of phnum to match variables being set as well as dropped
casts when unnecessary.

Signed-off-by: Alec Brown <alec.r.brown@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-08-19 22:27:51 +02:00
Alec Brown
385d906007 elf: Validate elf section header table index for section name string table
In multiboot_elfxx.c, e_shstrndx is used to obtain the section header table
index of the section name string table, but it wasn't being checked if the value
was there.

According to the elf(5) manual page,
"If the index of section name string table section is larger than or equal to
SHN_LORESERVE (0xff00), this member holds SHN_XINDEX (0xffff) and the real
index of the section name string table section is held in the sh_link member of
the initial entry in section header table. Otherwise, the sh_link member of the
initial entry in section header table contains the value zero."

Since this check wasn't being made, grub_elfXX_get_shstrndx() is being added to
elfXX.c to make this check and use e_shstrndx if it doesn't have SHN_XINDEX as a
value, else use sh_link. We also need to make sure e_shstrndx isn't greater than
or equal to SHN_LORESERVE and sh_link isn't less than SHN_LORESERVE.

Note that even though elf.c and elfXX.c are located in grub-core/kern, they are
compiled as modules and don't need the EXPORT_FUNC() macro to define the functions
in elf.h.

Signed-off-by: Alec Brown <alec.r.brown@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-08-19 22:26:53 +02:00
Alec Brown
27a14a8ae2 elf: Validate number of elf section header table entries
In bsdXX.c and multiboot_elfxx.c, e_shnum is used to obtain the number of
section header table entries, but it wasn't being checked if the value was
there.

According to the elf(5) manual page,
"If the number of entries in the section header table is larger than or equal to
SHN_LORESERVE (0xff00), e_shnum holds the value zero and the real number of
entries in the section header table is held in the sh_size member of the initial
entry in section header table. Otherwise, the sh_size member of the initial
entry in the section header table holds the value zero."

Since this check wasn't being made, grub_elfXX_get_shnum() is being added to
elfXX.c to make this check and use whichever member doesn't have a value of
zero. If both are zero, then we must return an error. We also need to make sure
that e_shnum doesn't have a value greater than or equal to SHN_LORESERVE and
sh_size isn't less than SHN_LORESERVE.

In order to get this function to work, the type ElfXX_Shnum is being added where
Elf32_Shnum defines Elf32_Word and Elf64_Shnum defines Elf64_Xword. This new
type is needed because if shnum obtains a value from sh_size, sh_size could be
of type El32_Word for Elf32_Shdr structures or Elf64_Xword for Elf64_Shdr
structures.

Note that even though elf.c and elfXX.c are located in grub-core/kern, they are
compiled as modules and don't need the EXPORT_FUNC() macro to define the functions
in elf.h.

For a few smaller changes, changed casts of shnum to match variables being set
as well as dropped casts when unnecessary and fixed spacing errors in bsdXX.c.
Also, shnum is an unsigned integer and is compared to int i in multiboot_elfxx.c,
it should be unsigned to match shnum.

Signed-off-by: Alec Brown <alec.r.brown@oracle.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-08-19 22:26:35 +02:00
Peter Jones
8812755d97 kern/i386/tsc_pmtimer: Make pmtimer tsc calibration not take 51 seconds to fail
On my laptop running at 2.4GHz, if I run a VM where tsc calibration
using pmtimer will fail presuming a broken pmtimer, it takes ~51 seconds
to do so (as measured with the stopwatch on my phone), with a tsc delta
of 0x1cd1c85300, or around 125 billion cycles.

If instead of trying to wait for 5-200ms to show up on the pmtimer, we
try to wait for 5-200us, it decides it's broken in ~0x2626aa0 TSCs, aka
~2.4 million cycles, or more or less instantly.

Additionally, this reading the pmtimer was returning 0xffffffff anyway,
and that's obviously an invalid return. I've added a check for that and
0 so we don't bother waiting for the test if what we're seeing is dead
pins with no response at all.

If "debug" includes "pmtimer", you will see one of the following three
outcomes. If pmtimer gives all 0 or all 1 bits, you will see:

  pmtimer: 0xffffff bad_reads: 1
  pmtimer: 0xffffff bad_reads: 2
  pmtimer: 0xffffff bad_reads: 3
  pmtimer: 0xffffff bad_reads: 4
  pmtimer: 0xffffff bad_reads: 5
  pmtimer: 0xffffff bad_reads: 6
  pmtimer: 0xffffff bad_reads: 7
  pmtimer: 0xffffff bad_reads: 8
  pmtimer: 0xffffff bad_reads: 9
  pmtimer: 0xffffff bad_reads: 10
  timer is broken; giving up.

This outcome was tested using qemu+kvm with UEFI (OVMF) firmware and
these options: -machine pc-q35-2.10 -cpu Broadwell-noTSX

If pmtimer gives any other bit patterns but is not actually marching
forward fast enough to use for clock calibration, you will see:

  pmtimer delta is 0x0 (1904 iterations)
  tsc delta is implausible: 0x2626aa0

This outcome was tested using GRUB patched to not ignore bad reads using
qemu+kvm with UEFI (OVMF) firmware, and these options:
-machine pc-q35-2.10 -cpu Broadwell-noTSX

If pmtimer actually works, you'll see something like:

  pmtimer delta is 0xdff
  tsc delta is 0x278756

This outcome was tested using qemu+kvm with UEFI (OVMF) firmware, and
these options: -machine pc-i440fx-2.4 -cpu Broadwell-noTSX

I've also tested this outcome on a real Intel Xeon E3-1275v3 on an Intel
Server Board S1200V3RPS using the SDV.RP.B8 "Release" build here:
https://www.intel.com/content/www/us/en/download/674448/firmware-update-for-the-intel-server-board-s1200rp-uefi-development-kit-release-vb8.html

Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-08-10 14:24:46 +02:00
Robbie Harwood
92005be6d8 kern/fs: The grub_fs_probe() should dprint errors from filesystems
When filesystem detection fails, all that's currently debug-logged is
a series of messages like:

    grub-core/kern/fs.c:56:fs: Detecting ntfs...
    grub-core/kern/fs.c:76:fs: ntfs detection failed.

repeated for each filesystem. Any messages provided to grub_error() by
the filesystem are lost, and one has to break out gdb to figure out what
went wrong.

With this change, one instead sees:

    grub-core/kern/fs.c:56:fs: Detecting fat...
    grub-core/osdep/hostdisk.c:357:hostdisk: reusing open device
    `/path/to/device'
    grub-core/kern/fs.c:77:fs: error: invalid modification timestamp for /.
    grub-core/kern/fs.c:79:fs: fat detection failed.

in the debug prints.

Signed-off-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-07-27 19:20:53 +02:00
Glenn Washburn
f5a92e6040 disk: Allow read hook callback to take read buffer to potentially modify it
It will be desirable in the future to allow having the read hook modify the
data passed back from a read function call on a disk or file. This adds that
infrastructure and has no impact on code flow for existing uses of the read
hook. Also changed is that now when the read hook callback is called it can
also indicate what error code should be sent back to the read caller.

Signed-off-by: Glenn Washburn <development@efficientek.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-07-04 14:43:25 +02:00
Patrick Steinhardt
1df2934822 kern/efi/mm: Implement runtime addition of pages
Adjust the interface of grub_efi_mm_add_regions() to take a set of
GRUB_MM_ADD_REGION_* flags, which most notably is currently only the
GRUB_MM_ADD_REGION_CONSECUTIVE flag. This allows us to set the function
up as callback for the memory subsystem and have it call out to us in
case there's not enough pages available in the current heap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:43:17 +02:00
Patrick Steinhardt
15a0156989 kern/efi/mm: Pass up errors from add_memory_regions()
The function add_memory_regions() is currently only called on system
initialization to allocate a fixed amount of pages. As such, it didn't
need to return any errors: in case it failed, we cannot proceed anyway.
This will change with the upcoming support for requesting more memory
from the firmware at runtime, where it doesn't make sense anymore to
fail hard.

Refactor the function to return an error to prepare for this. Note that
this does not change the behaviour when initializing the memory system
because grub_efi_mm_init() knows to call grub_fatal() in case
grub_efi_mm_add_regions() returns an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:38:57 +02:00
Patrick Steinhardt
96a7ea29e3 kern/efi/mm: Extract function to add memory regions
In preparation of support for runtime-allocating additional memory
region, this patch extracts the function to retrieve the EFI memory
map and add a subset of it to GRUB's own memory regions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:36:15 +02:00
Patrick Steinhardt
938c3760b8 kern/efi/mm: Always request a fixed number of pages on init
When initializing the EFI memory subsystem, we will by default request
a quarter of the available memory, bounded by a minimum/maximum value.
Given that we're about to extend the EFI memory system to dynamically
request additional pages from the firmware as required, this scaling of
requested memory based on available memory will not make a lot of sense
anymore.

Remove this logic as a preparatory patch such that we'll instead defer
to the runtime memory allocator. Note that ideally, we'd want to change
this after dynamic requesting of pages has been implemented for the EFI
platform. But because we'll need to split up initialization of the
memory subsystem and the request of pages from the firmware, we'd have
to duplicate quite some logic at first only to remove it afterwards
again. This seems quite pointless, so we instead have patches slightly
out of order.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:25:41 +02:00
Patrick Steinhardt
887f98f0db mm: Allow dynamically requesting additional memory regions
Currently, all platforms will set up their heap on initialization of the
platform code. While this works mostly fine, it poses some limitations
on memory management on us. Most notably, allocating big chunks of
memory in the gigabyte range would require us to pre-request this many
bytes from the firmware and add it to the heap from the beginning on
some platforms like EFI. As this isn't needed for most configurations,
it is inefficient and may even negatively impact some usecases when,
e.g., chainloading. Nonetheless, allocating big chunks of memory is
required sometimes, where one example is the upcoming support for the
Argon2 key derival function in LUKS2.

In order to avoid pre-allocating big chunks of memory, this commit
implements a runtime mechanism to add more pages to the system. When
a given allocation cannot be currently satisfied, we'll call a given
callback set up by the platform's own memory management subsystem,
asking it to add a memory area with at least "n" bytes. If this
succeeds, we retry searching for a valid memory region, which should
now succeed.

If this fails, we try asking for "n" bytes, possibly spread across
multiple regions, in hopes that region merging means that we end up
with enough memory for things to work out.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:25:41 +02:00
Patrick Steinhardt
139fd9b134 mm: Drop unused unloading of modules on OOM
In grub_memalign(), there's a commented section which would allow for
unloading of unneeded modules in case where there is not enough free
memory available to satisfy a request. Given that this code is never
compiled in, let's remove it together with grub_dl_unload_unneeded().

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:25:41 +02:00
Daniel Axtens
8afa5ef45b mm: Debug support for region operations
This is handy for debugging. Enable with "set debug=regions".

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:25:29 +02:00
Daniel Axtens
052e6068be mm: When adding a region, merge with region after as well as before
On x86_64-efi (at least) regions seem to be added from top down. The mm
code will merge a new region with an existing region that comes
immediately before the new region. This allows larger allocations to be
satisfied that would otherwise be the case.

On powerpc-ieee1275, however, regions are added from bottom up. So if
we add 3x 32MB regions, we can still only satisfy a 32MB allocation,
rather than the 96MB allocation we might otherwise be able to satisfy.

  * Define 'post_size' as being bytes lost to the end of an allocation
    due to being given weird sizes from firmware that are not multiples
    of GRUB_MM_ALIGN.

  * Allow merging of regions immediately _after_ existing regions, not
    just before. As with the other approach, we create an allocated
    block to represent the new space and the pass it to grub_free() to
    get the metadata right.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Tested-by: Patrick Steinhardt <ps@pks.im>
2022-07-04 14:13:56 +02:00
Daniel Axtens
f1ce0e15e7 kern/file: Do not leak device_name on error in grub_file_open()
If we have an error in grub_file_open() before we free device_name, we
will leak it.

Free device_name in the error path and null out the pointer in the good
path once we free it there.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-06-07 16:39:32 +02:00
Julian Andres Klode
6fe755c5c0 kern/efi/sb: Reject non-kernel files in the shim_lock verifier
We must not allow other verifiers to pass things like the GRUB modules.
Instead of maintaining a blocklist, maintain an allowlist of things
that we do not care about.

This allowlist really should be made reusable, and shared by the
lockdown verifier, but this is the minimal patch addressing
security concerns where the TPM verifier was able to mark modules
as verified (or the OpenPGP verifier for that matter), when it
should not do so on shim-powered secure boot systems.

Fixes: CVE-2022-28735

Signed-off-by: Julian Andres Klode <julian.klode@canonical.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-06-07 16:39:31 +02:00
Michael Chang
acffb81485 build: Fix -Werror=array-bounds array subscript 0 is outside array bounds
The GRUB is failing to build with GCC-12 in many places like this:

  In function 'init_cbfsdisk',
      inlined from 'grub_mod_init' at ../../grub-core/fs/cbfs.c:391:3:
  ../../grub-core/fs/cbfs.c:345:7: error: array subscript 0 is outside array bounds of 'grub_uint32_t[0]' {aka 'unsigned int[]'} [-Werror=array-bounds]
    345 |   ptr = *(grub_uint32_t *) 0xfffffffc;
        |   ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is caused by GCC regression in 11/12 [1]. In a nut shell, the
warning is about detected invalid accesses at non-zero offsets to NULL
pointers. Since hardwired constant address is treated as NULL plus an
offset in the same underlying code, the warning is therefore triggered.

Instead of inserting #pragma all over the places where literal pointers
are accessed to avoid diagnosing array-bounds, we can try to borrow the
idea from Linux kernel that the absolute_pointer() macro [2][3] is used
to disconnect a pointer using literal address from it's original object,
hence GCC won't be able to make assumptions on the boundary while doing
pointer arithmetic. With that we can greatly reduce the code we have to
cover up by making initial literal pointer assignment to use the new
wrapper but not having to track everywhere literal pointers are
accessed. This also makes code looks cleaner.

Please note the grub_absolute_pointer() macro requires to be invoked in
a function as long as it is compound expression. Some global variables
with literal pointers has been changed to local ones in order to use
grub_absolute_pointer() to initialize it. The shuffling is basically done
in a selective and careful way that the variable's scope doesn't matter
being local or global, for example, the global variable must not get
modified at run time throughout. For the record, here's the list of
global variables got shuffled in this patch:

  grub-core/commands/i386/pc/drivemap.c:int13slot
  grub-core/term/i386/pc/console.c:bios_data_area
  grub-core/term/ns8250.c:serial_hw_io_addr

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
[2] https://elixir.bootlin.com/linux/v5.16.14/source/include/linux/compiler.h#L180
[3] https://elixir.bootlin.com/linux/v5.16.14/source/include/linux/compiler-gcc.h#L31

Signed-off-by: Michael Chang <mchang@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-04-20 18:27:52 +02:00
Chad Kimes
c143056a34 kern/efi/efi: Print VLAN info in EFI device path
Signed-off-by: Chad Kimes <chkimes@github.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-04-20 13:58:13 +02:00
Chris Coulson
37ddd9457f kern/efi/init: Log a console error during a stack check failure
The initial implementation of the stack protector just busy looped
in __stack_chk_fail in order to reduce the amount of code being
executed after the stack has been compromised because of a lack of
firmware memory protections. With future firmware implementations
incorporating memory protections such as W^X, call in to boot services
when an error occurs in order to log a message to the console before
automatically rebooting the machine.

Signed-off-by: Chris Coulson <chris.coulson@canonical.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-04-04 22:26:31 +02:00
Hans de Goede
3e4cbbeca0 kern/main: Suppress the "Welcome to GRUB!" message in EFI builds
GRUB EFI builds are now often used in combination with flicker-free
boot, but this breaks with upstream GRUB because the "Welcome to GRUB!"
message will kick the EFI fb into text mode and show the msg, breaking
the flicker-free experience.

EFI systems are so fast, that when the menu or the countdown are
enabled the message will be immediately overwritten, so in these cases
not printing the message does not matter.

And in case when the timeout_style is set to TIMEOUT_STYLE_HIDDEN,
the user has asked GRUB to be quiet (for example to allow flickfree
boot) and thus the message should not be printed.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Robbie Harwood <rharwood@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
2022-04-04 17:23:16 +02:00