Xuantie/T-Head processors such as the C910 (as used in the Sipeed
Lichee Pi 4A) use the high bits of the PTE in a very non-standard way
that is incompatible with the RISC-V specification.
As per the "Memory Attribute Extension (XTheadMae)", bits 62 and 61
represent cacheability and "bufferability" (write-back cacheability)
respectively. If we do not enable these bits, then the processor gets
incredibly confused at the point that paging is enabled. The symptom
is that cache lines will occasionally fail to fill, and so reads from
any address may return unrelated data from a previously read cache
line for a different address.
Work around these hardware flaws by detecting T-Head CPUs (via the
"get machine vendor ID" SBI call), then reading the vendor-specific
SXSTATUS register to determine whether or not the vendor-specific
Memory Attribute Extension has been enabled by the M-mode firmware.
If it has, then set bits 61 and 62 in each page table entry that is
used to access normal memory.
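A minimal sketch of the page table entry adjustment, using the bit
positions given above. The macro and function names are hypothetical
and are not taken from the iPXE source; the caller is assumed to have
already determined whether the extension is enabled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit positions as described in the commit message above */
#define PTE_THEAD_CACHEABLE	( 1ULL << 62 )
#define PTE_THEAD_BUFFERABLE	( 1ULL << 61 )

/* Hypothetical helper: add the T-Head attribute bits to a page table
 * entry that maps normal memory, but only if the Memory Attribute
 * Extension is known to have been enabled.
 */
static uint64_t pte_apply_thead_mae ( uint64_t pte, bool mae_enabled ) {

	if ( mae_enabled )
		pte |= ( PTE_THEAD_CACHEABLE | PTE_THEAD_BUFFERABLE );
	return pte;
}
```

Entries used to map I/O devices would be left unmodified, since only
normal memory is to be marked as cacheable.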
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add a fence between the write to the UART transmit register and the
subsequent read from the transmit status register, to ensure that the
status correctly reflects the occurrence of the write.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RISC-V specification states that "if SATP is written with an
unsupported mode, the entire write has no effect; no fields in SATP
are modified". We currently rely on this specified behaviour when
calculating the early UART base address: if SATP has a non-zero value
then we assume that paging must be enabled.
The XuanTie C910 CPU (as used in the Lichee Pi 4A) does not conform to
this specified behaviour. Writing SATP with an unsupported mode will
leave SATP.MODE as zero (i.e. bare physical addressing) but the write
to SATP.PPN will still take effect, leaving SATP with an illegal
non-zero value.
Work around this misbehaviour by explicitly writing zero to SATP if we
detect that the mode change has not taken effect (e.g. because the CPU
does not support the requested paging mode).
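The workaround can be sketched off-target as follows. The fake_satp
variable and fake_satp_write() function are a hypothetical model of
the misbehaviour described above (standing in for the real CSR
accesses), and satp_enable() is an illustrative name rather than the
actual iPXE routine.

```c
#include <stdint.h>

#define SATP_MODE_SHIFT	60	/* RV64: SATP.MODE occupies bits 63:60 */
#define SATP_MODE_SV39	8
#define SATP_MODE_SV48	9

/* Stand-in for the real CSR, so that the logic can run off-target */
static uint64_t fake_satp;

/* Model of the behaviour described above: an unsupported mode leaves
 * MODE as zero, but the remaining fields are still written.  (This
 * model pretends that only Sv39 is supported.)
 */
static void fake_satp_write ( uint64_t value ) {
	unsigned int mode = ( value >> SATP_MODE_SHIFT );

	if ( ( mode != 0 ) && ( mode != SATP_MODE_SV39 ) )
		value &= ( ( 1ULL << SATP_MODE_SHIFT ) - 1 );
	fake_satp = value;
}

/* Attempt to enable a paging mode, returning the mode now in effect */
static unsigned int satp_enable ( unsigned int mode, uint64_t ppn ) {

	fake_satp_write ( ( ( uint64_t ) mode << SATP_MODE_SHIFT ) | ppn );
	if ( ( fake_satp >> SATP_MODE_SHIFT ) == mode )
		return mode;

	/* Mode change did not take effect: clear any stray PPN */
	fake_satp_write ( 0 );
	return 0;
}
```

After a failed attempt, SATP reads back as zero, and so the existing
"non-zero means paging is enabled" assumption continues to hold.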
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently rely on the recursive nature of devicetree bus probing to
obtain the region cell size specification (the "#address-cells" and
"#size-cells" properties) from the parent device.
This blocks the possibility of creating a standalone console device
based on /chosen/stdout-path before probing the whole bus.
Fix by using fdt_parent() to locate the parent device at the point of
use within dt_ioremap().
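For illustration, the parent's cell sizes determine how a "reg"
property is decoded. The sketch below is not the dt_ioremap()
implementation: the function names are hypothetical, and the cell
counts are assumed to have already been read from the parent node.

```c
#include <stdint.h>
#include <stddef.h>

/* Read a big-endian value spanning "cells" 32-bit cells.  Values
 * wider than 64 bits (more than two cells) are silently truncated.
 */
static uint64_t dt_read_cells ( const uint8_t *data, unsigned int cells ) {
	uint64_t value = 0;
	size_t i;

	for ( i = 0 ; i < ( cells * 4 ) ; i++ )
		value = ( ( value << 8 ) | data[i] );
	return value;
}

/* Decode the first (base, length) pair from a "reg" property, using
 * the cell sizes specified by the parent device.
 */
static void dt_decode_reg ( const uint8_t *reg, unsigned int address_cells,
			    unsigned int size_cells, uint64_t *base,
			    uint64_t *len ) {

	*base = dt_read_cells ( reg, address_cells );
	*len = dt_read_cells ( ( reg + ( address_cells * 4 ) ), size_cells );
}
```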
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some platforms (such as the Sipeed Lichee Pi 4A) choose to make early
debugging entertainingly cumbersome for the programmer. These
platforms not only fail to provide a functional SBI debug console, but
also choose to place the UART at a physical address that cannot be
identity-mapped under the only paging model supported by the CPU.
Support such platforms by creating a virtual address mapping for the
early UART (in the 2MB megapage immediately below iPXE itself), and
using this as the UART base address whenever paging is enabled.
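The address arithmetic can be sketched as follows, assuming that the
offset of the UART within its 2MB physical megapage is preserved in
the virtual mapping. The function name and the addresses used in any
example are hypothetical.

```c
#include <stdint.h>

#define MEGAPAGE_SIZE ( 2ULL * 1024 * 1024 )

/* Calculate a virtual address for the early UART, placed within the
 * megapage immediately below the megapage containing iPXE itself.
 */
static uint64_t early_uart_virt ( uint64_t ipxe_virt, uint64_t uart_phys ) {
	uint64_t page;

	/* Round down to the start of iPXE's megapage, then step back */
	page = ( ( ipxe_virt & ~( MEGAPAGE_SIZE - 1 ) ) - MEGAPAGE_SIZE );

	/* Preserve the UART's offset within its physical megapage */
	return ( page | ( uart_phys & ( MEGAPAGE_SIZE - 1 ) ) );
}
```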
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some platforms (such as the Sipeed Lichee Pi 4A) do not provide a
functional SBI debug console. We can obtain early debug messages on
these systems by writing directly to the UART used by the vendor
firmware.
There is no viable way to parse the UART address from the device tree,
since the prefix debug messages occur extremely early, before the C
runtime environment is available and therefore before any information
has been parsed from the device tree. The early UART model and
register addresses must be configured by editing config/serial.h if
needed. (This is an acceptable limitation, since prefix debugging is
an extremely specialised use case.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Abstract out the SBI debug console calls into macros that can be
shared between print_message and print_hex_value.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The riscv,isa devicetree property appears not to be fully populated on
some real-world systems. For example, the Sipeed Lichee Pi 4A
(running the vendor U-Boot) reports itself as "rv64imafdcvsu", which
does not include the "zicntr" extension even though the time CSR is
present and functional.
Ignore the riscv,isa property and rely solely on CSR testing to
determine whether or not extensions are present.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
With the 64-bit paging schemes (Sv39, Sv48, and Sv57), we identity-map
as much of the physical address space as is possible. Experimentation
shows that this is not sufficient to provide access to all I/O
devices. For example: the Sipeed Lichee Pi 4A includes a CPU that
supports only Sv39, but places I/O devices at the top of a 40-bit
address space.
Add support for creating I/O page table entries on demand to map I/O
devices, based on the existing design used for x86_64 BIOS.
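The limitation can be illustrated with a simple check. An identity
mapping requires the virtual address to equal the physical address,
and a physical address can only be a valid virtual address if it lies
within the lower half of the virtual address space. This is a sketch
of the upper bound only: the function name is hypothetical, and the
portion of the address space that iPXE actually identity-maps may be
smaller.

```c
#include <stdint.h>
#include <stdbool.h>

/* Check whether a physical address could be used unchanged as a
 * virtual address under a paging scheme providing "va_bits" bits of
 * virtual address (39 for Sv39, 48 for Sv48, 57 for Sv57).
 */
static bool can_identity_map ( uint64_t phys, unsigned int va_bits ) {

	/* Only the lower half of the virtual address space has zero
	 * upper bits, and so can coincide with a physical address.
	 */
	return ( phys < ( 1ULL << ( va_bits - 1 ) ) );
}
```

An illustrative address near the top of a 40-bit address space, such
as 0xffe7014000, fails this check for Sv39 but would pass for Sv48.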
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Support debug consoles that do not automatically convert LF to CRLF by
including the CR character within the debug message strings.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide DBGC_MEMMAP() as a replacement for memmap_dump(), allowing the
output colour to match that of other messages within the same message
group.
Retain a dedicated colour for output from memmap_dump_all(), on the
basis that it is generally most useful to visually compare full memory
dumps against previous full memory dumps.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the terminology "min" and "max" for addresses covered by a memory
region descriptor, since this is sufficiently intuitive to generally
not require further explanation.
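A sketch of a descriptor using this terminology. It assumes that
"max" is the last address covered (i.e. that the range is inclusive),
which allows a region to extend to the very end of the address space
without any end address wrapping to zero. The structure and function
names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* A region covering all addresses from "min" to "max" inclusive */
struct region {
	uint64_t min;
	uint64_t max;
};

/* Check whether an address is covered by a region */
static bool region_contains ( const struct region *region, uint64_t addr ) {
	return ( ( addr >= region->min ) && ( addr <= region->max ) );
}

/* Check whether two regions have any address in common */
static bool region_overlaps ( const struct region *a,
			      const struct region *b ) {
	return ( ( a->min <= b->max ) && ( b->min <= a->max ) );
}
```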
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the shared initrd reshuffling and CPIO header construction code
for RISC-V bare-metal kernels. This allows for files to be injected
into the constructed ("magic") initrd image in exactly the same way as
is done for bzImage and UEFI kernels.
We append a dummy image encompassing the FDT to the end of the
reshuffle list, so that it ends up directly following the constructed
initrd in memory (but excluded from the initrd length, which was
recorded before constructing the FDT).
We also temporarily prepend the kernel binary itself to the reshuffle
list. This is guaranteed to be safe (since reshuffling is designed to
be unable to fail), and avoids the requirement for the kernel segment
to be available before reshuffling. This is useful since current
RISC-V bare-metal kernels tend to be distributed as EFI zboot images,
which require large temporary allocations from the external heap for
the intermediate images created during archive extraction.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Any initrd images that are not within the external heap (e.g. embedded
images) do not need to be copied to the external heap for reshuffling,
and can just be left in their original locations.
Ignore any images that are not already within the external heap (or,
more precisely, that are wholly outside of the reshuffle region within
the external heap) when squashing and swapping images.
This reduces the maximum additional storage required by squashing and
swapping to zero, and so ensures that the reshuffling step is
guaranteed to succeed under all circumstances. (This is unrelated to
the post-reshuffle load region check, which is still required.)
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Provide a reusable function initrd_load_all() to load all initrds
(including any constructed CPIO headers) into a contiguous memory
region, and support functions to find the constructed total length and
permissible post-reshuffling load address range.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
It is hypothetically possible for external heap memory allocated
during driver startup to have been freed before an image was
downloaded, which could therefore leave an image straddling the
address recorded as the top of the reshuffle region.
Allow for this possibility by skipping squashing for any images
already straddling (or touching) the top of the reshuffle region.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Alignment of initrd lengths is applicable to all Linux kernels, not
just those in the x86 bzImage format.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Eliminate the requirement for free space when reshuffling initrds by
swapping adjacent initrds using an in-place triple reversal.
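The general triple reversal technique can be sketched as below:
reverse each of the two adjacent blocks individually, then reverse
the combined block. This is a bytewise illustration of the technique
with hypothetical function names, not the iPXE implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse a block of bytes in place */
static void reverse ( uint8_t *data, size_t len ) {
	uint8_t tmp;
	size_t i;
	size_t j;

	if ( len < 2 )
		return;
	for ( i = 0, j = ( len - 1 ) ; i < j ; i++, j-- ) {
		tmp = data[i];
		data[i] = data[j];
		data[j] = tmp;
	}
}

/* Swap adjacent blocks A (len_a bytes) and B (len_b bytes) in place,
 * so that B ends up preceding A.  No temporary buffer is required.
 */
static void swap_adjacent ( uint8_t *data, size_t len_a, size_t len_b ) {

	reverse ( data, len_a );
	reverse ( ( data + len_a ), len_b );
	reverse ( data, ( len_a + len_b ) );
}
```

The blocks may be of different lengths, and the only extra storage
used is the single temporary byte within reverse().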
Signed-off-by: Michael Brown <mcb30@ipxe.org>
We currently rely on implicit detection of the external heap region.
The INT 15 memory map mangler relies on examining the corresponding
in-use memory region, and the initrd reshuffler relies on performing a
separate detection of the largest free memory block after startup has
completed.
Replace these with explicit public symbols to describe the external
heap region.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
If the external heap ends up at the top of the system memory map then
leave a gap after the heap to ensure that no block ends up being
allocated with either a start or end address of zero, since this is
frequently confusing to both code and humans.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow for relocation to a region at the very end of the physical
address space (where the next address wraps to zero).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the word-at-a-time variable-length memcpy() implementation when
performing an overlapping copy in the forwards direction, since this
is guaranteed to be safe and likely to be substantially faster than
the existing bytewise copy.
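A portable sketch of why the forwards case is safe: when the
destination lies below the source, each word is read in full before
it is written, and every write lands below the next word to be read.
The function names are hypothetical, and the word copy here is a
stand-in for the architecture-specific memcpy() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy in ascending address order, a word at a time.  Safe for
 * overlapping buffers only when dest is not above src.
 */
static void * copy_forwards ( void *dest, const void *src, size_t len ) {
	unsigned char *d = dest;
	const unsigned char *s = src;
	unsigned long word;

	while ( len >= sizeof ( word ) ) {
		memcpy ( &word, s, sizeof ( word ) ); /* read whole word */
		memcpy ( d, &word, sizeof ( word ) ); /* then write it */
		d += sizeof ( word );
		s += sizeof ( word );
		len -= sizeof ( word );
	}
	while ( len-- )
		*(d++) = *(s++);
	return dest;
}

/* Overlap-safe copy: forwards by words, or backwards by bytes */
static void * sketch_memmove ( void *dest, const void *src, size_t len ) {
	unsigned char *d = dest;
	const unsigned char *s = src;

	if ( ( uintptr_t ) dest <= ( uintptr_t ) src )
		return copy_forwards ( dest, src, len );
	d += len;
	s += len;
	while ( len-- )
		*(--d) = *(--s);
	return dest;
}
```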
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow a single initrd image to be passed verbatim to the booted RISC-V
kernel, as a proof of concept.
We do not yet support reshuffling to make optimal use of available
memory, or dynamic construction of CPIO headers, but this is
sufficient to allow iPXE to start up the Fedora 42 kernel with its
matching initrd image.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Allow an initrd location to be specified in our constructed device
tree via the "linux,initrd-start" and "linux,initrd-end" properties.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There is nothing x86-specific in initrd.c, and a variant of the
reshuffling logic will be required for executing bare-metal kernels on
RISC-V and AArch64.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use image_replace() to transfer execution to the extracted image,
rather than calling image_exec() directly. This allows the original
archive image to be freed immediately if it was marked as an
automatically freeable image (e.g. via "chain --autofree").
In particular, this ensures that in the case of an archive image
containing another archive image (such as an EFI zboot kernel wrapper
image containing a gzip-compressed kernel image), the intermediate
extracted image will be freed as early as possible, since extracted
images are always marked as automatically freeable.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Current RISC-V and AArch64 kernels found in the wild tend not to be in
the documented kernel format, but are instead "EFI zboot" kernels
comprising a small EFI executable that decompresses and executes the
inner payload (which is a kernel in the expected format).
The EFI zboot header includes a recognisable magic value "zimg" along
with two fields describing the offset and length of the compressed
payload. We can therefore treat this as an archive image format,
extracting the payload as-is and then relying on our existing ability
to execute compressed images.
This is sufficient to allow iPXE to execute the Fedora 42 RISC-V
kernel binary as currently published.
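A sketch of locating the payload from the header fields described
above. The field offsets used here (magic at offset 4, followed by
little-endian 32-bit payload offset and length at offsets 8 and 12)
are an assumption about the header layout and should be checked
against the Linux EFI zboot header definition. The function and macro
names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Assumed field offsets within the EFI zboot header */
#define ZBOOT_MAGIC_OFFSET	4
#define ZBOOT_PAYLOAD_OFFSET	8
#define ZBOOT_PAYLOAD_LEN	12
#define ZBOOT_MIN_LEN		16

/* Read a little-endian 32-bit value */
static uint32_t le32 ( const uint8_t *data ) {
	return ( ( ( uint32_t ) data[0] << 0 ) |
		 ( ( uint32_t ) data[1] << 8 ) |
		 ( ( uint32_t ) data[2] << 16 ) |
		 ( ( uint32_t ) data[3] << 24 ) );
}

/* Locate the compressed payload within a candidate zboot image.
 * Returns 0 on success, or -1 if the image is not recognised or the
 * payload does not lie wholly within the image.
 */
static int zboot_payload ( const uint8_t *image, size_t image_len,
			   size_t *offset, size_t *len ) {
	size_t payload_offset;
	size_t payload_len;

	if ( image_len < ZBOOT_MIN_LEN )
		return -1;
	if ( memcmp ( ( image + ZBOOT_MAGIC_OFFSET ), "zimg", 4 ) != 0 )
		return -1;
	payload_offset = le32 ( image + ZBOOT_PAYLOAD_OFFSET );
	payload_len = le32 ( image + ZBOOT_PAYLOAD_LEN );
	if ( payload_offset > image_len )
		return -1;
	if ( payload_len > ( image_len - payload_offset ) )
		return -1;
	*offset = payload_offset;
	*len = payload_len;
	return 0;
}
```

The extracted payload would then be handed on unmodified, leaving
decompression to the existing compressed image support.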
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The RISC-V and AArch64 bare-metal kernel images share a common header
format, and require essentially the same execution environment: loaded
close to the start of RAM, entered with paging disabled, and passed a
pointer to a flattened device tree that describes the hardware and any
boot arguments.
Implement basic support for executing bare-metal RISC-V and AArch64
kernel images. The (trivial) AArch64-specific code path is untested
since we do not yet have the ability to build for any bare-metal
AArch64 platforms. Constructing and passing an initramfs image is not
yet supported.
Rename the IMAGE_BZIMAGE build configuration option to IMAGE_LKRN,
since "bzImage" is specific to x86. To retain backwards compatibility
with existing local build configurations, we leave IMAGE_BZIMAGE as
the enabled option in config/default/pcbios.h and treat IMAGE_LKRN as
a synonym for IMAGE_BZIMAGE when building for x86 BIOS.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Add an implementation of umalloc() using the generalised model of a
heap, placing the external heap in the largest usable region obtained
from the system memory map.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Size-tracked pointers allocated via umalloc() have historically been
aligned to a page boundary, as have the edges of the hidden memory
region covering the external heap.
Allow the block and size-tracked pointer alignments to be specified as
heap configuration parameters.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Create a generic model of a heap as a list of free blocks with
optional methods for growing and shrinking the heap.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
All memory map users have been updated to use the new system memory
map API. Remove get_memmap() and its associated definitions.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
There are several places where get_memmap() is called solely to
produce debug output. Replace these with calls to memmap_dump_all()
(which will be a no-op unless debugging is enabled).
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Use the concept of an in-use memory region defined as part of the
system memory map API to describe the umalloc() heap.
Signed-off-by: Michael Brown <mcb30@ipxe.org>