The machine with no floor

When you write a normal C++ program, you take an enormous amount of infrastructure for granted. Your compiler assumes there is an OS that will load your binary, set up your stack, initialize your heap, and hand you argc and argv. Your standard library assumes it can make system calls to allocate memory, write to files, and exit gracefully. Your linker assumes it is building something that fits into a well-defined executable format understood by a loader.

When you write an operating system, none of this exists. You are the thing everything else depends on.

This is the fundamental distinction between a hosted environment and a freestanding environment:

  • Hosted: The program runs on top of an OS. It has access to the full standard library, a runtime that calls main() after initialization, and system calls for I/O, memory, and process management.
  • Freestanding: The program runs directly on the hardware (or on a bootloader’s handoff). There is no standard library, no runtime initialization, no system calls. The language standard guarantees almost nothing - only a handful of headers like <stddef.h>, <stdint.h>, <float.h>, and <limits.h> are required to be available.
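To make the constraint concrete, here is a sketch of the kind of code a freestanding translation unit can contain: only the self-sufficient headers, no OS services, no library calls (the function itself is illustrative, not from any particular kernel):

```cpp
#include <stddef.h>  // size_t - one of the few headers a freestanding
#include <stdint.h>  // implementation must provide (fixed-width integers)

// Sum a buffer of bytes into a 32-bit checksum. No heap, no I/O,
// no standard library - nothing here asks anything of an OS.
uint32_t checksum(const uint8_t* data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```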

The first step in building an operating system is accepting this, and building a toolchain that works within these constraints.

Why you need a cross-compiler

You might think you can just use the compiler installed on your Linux or macOS system. After all, g++ compiles C++ - what is the problem?

The problem is that your system compiler is configured for your host platform. It assumes:

  • A specific target triple (e.g., x86_64-linux-gnu), which encodes the OS the binary will run on
  • That the generated code will be linked against your system’s libc (glibc, musl, etc.)
  • That system headers from /usr/include are valid
  • That the binary will be loaded by a dynamic linker (ld-linux.so)

None of these assumptions hold for kernel code. If you try to build a freestanding kernel binary with your system compiler, you will get link errors from glibc dependencies, incorrect ABI assumptions, and subtle bugs from compiler-generated code that relies on OS features (like thread-local storage) that do not exist yet.

The solution is a cross-compiler: a compiler built from source, configured to target a generic, OS-independent architecture. For x86-64 OS development, the standard target triple is x86_64-elf - no OS, no libc, just raw ELF binaries for the x86-64 instruction set.

Building the toolchain

A complete cross-compilation toolchain consists of three components:

  1. Binutils - the assembler (as), linker (ld), and binary utilities (objdump, readelf, nm)
  2. GCC - the cross-compiler itself, configured for the x86_64-elf target
  3. GDB (optional) - a cross-debugger for stepping through kernel code in QEMU

The OSDev Wiki’s GCC Cross-Compiler guide covers the build process in detail. The key configuration flags when building GCC are:

../gcc-source/configure \
    --target=x86_64-elf \
    --prefix="$PREFIX" \
    --disable-nls \
    --enable-languages=c,c++ \
    --without-headers

The critical flags:

  • --target=x86_64-elf tells GCC it is generating code for a bare-metal x86-64 target
  • --without-headers builds the compiler (and its support library, libgcc) without assuming any C library headers exist - there is no OS to provide them
  • --disable-nls skips native language support (irrelevant for our purposes)

After building, you will have x86_64-elf-gcc, x86_64-elf-g++, x86_64-elf-ld, and friends. These are the only compilers you should use for kernel code.

Note: You will eventually need a second toolchain for userspace - one that does know about your OS and links against your custom libc. We will cover that when we get to the userspace chapters. For now, the freestanding toolchain is all you need.

The build system

Once you have a cross-compiler, you need a way to orchestrate compilation. An OS kernel is not a single source file. Even a minimal one involves:

  • Assembly files for the boot entry point and interrupt stubs
  • C/C++ source for the kernel logic
  • Linker scripts that control the memory layout of the final binary
  • Configuration flags that vary between debug and release builds
  • Third-party libraries (like uACPI) that have their own build requirements
  • Eventually, userspace programs with a completely different compilation model

Managing this with raw Makefiles works for small projects but becomes unsustainable as complexity grows. We use CMake as a meta-build system - it generates the actual build files (Makefiles or Ninja files) from a higher-level description.

Toolchain files

CMake supports cross-compilation through toolchain files. These are small CMake scripts that tell the build system to use your cross-compiler instead of the host compiler:

set(CMAKE_SYSTEM_NAME Generic)
set(CMAKE_SYSTEM_PROCESSOR x86_64)

set(CMAKE_C_COMPILER   ${TOOLCHAIN_PATH}/bin/x86_64-elf-gcc)
set(CMAKE_CXX_COMPILER ${TOOLCHAIN_PATH}/bin/x86_64-elf-g++)
set(CMAKE_ASM_COMPILER ${TOOLCHAIN_PATH}/bin/x86_64-elf-gcc)

set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

Two things matter here:

  1. CMAKE_SYSTEM_NAME Generic - this tells CMake the target is not a known OS. Without this, CMake will try to find host system libraries and headers, which will either fail or produce silently broken builds.
  2. The FIND_ROOT_PATH modes - these prevent CMake from searching the host system for libraries and headers when resolving dependencies.

This architecture is powerful. If you later want to target AArch64, you swap the toolchain file. The rest of your CMakeLists.txt files stay the same.

Compiler flags for kernel code

Compiling kernel code requires a specific set of flags that most application developers never encounter:

set(KERNEL_CXX_FLAGS
    -ffreestanding          # No hosted standard library
    -fno-exceptions         # C++ exceptions require runtime support we don't have
    -fno-rtti               # Runtime type information requires a runtime
    -fno-stack-protector    # Stack canaries require __stack_chk_fail from libc
    -mno-red-zone           # Critical on x86-64 - see below
    -mcmodel=kernel         # Use the kernel code model for higher-half addressing
    -mno-sse                # Disable SSE unless explicitly managed
    -mno-mmx                # Same for MMX
    -mno-avx                # Same for AVX
    -nostdlib               # Don't link against the standard library
)

Most of these are self-explanatory, but two deserve deeper explanation:

-mno-red-zone: On x86-64, the System V ABI defines a 128-byte “red zone” below the stack pointer. Leaf functions (functions that don’t call other functions) can use this space without adjusting the stack pointer - it is an optimization that avoids the overhead of sub rsp, N / add rsp, N for small local variables.

The problem is that in kernel mode, interrupts can fire at any time. When an interrupt occurs, the CPU pushes the interrupt frame onto the current stack, starting at the current stack pointer. If a function was using the red zone, the interrupt frame overwrites it, corrupting local variables. The resulting bugs are extraordinarily difficult to diagnose - they manifest as rare, seemingly random memory corruption that depends on exact interrupt timing. Disable the red zone. Always.
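To see what kind of code is at risk, consider a leaf function like the one below. It is an illustration only: whether the compiler actually places the scratch buffer in the red zone is a codegen decision (at -O2 under the System V ABI it may), which you can verify by inspecting the disassembly with objdump.

```cpp
#include <stdint.h>

// Converts a value to decimal in `out` (caller provides >= 21 bytes).
// A leaf function: it calls nothing, and its small scratch buffer is
// exactly the kind of local an optimizing compiler may keep below the
// stack pointer. Without -mno-red-zone, an interrupt frame pushed at
// the wrong moment would overwrite `tmp`.
int u64_to_decimal(uint64_t value, char* out) {
    char tmp[20];  // scratch space - a classic red-zone candidate
    int n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    for (int i = 0; i < n; ++i) {
        out[i] = tmp[n - 1 - i];  // digits were produced in reverse
    }
    out[n] = '\0';
    return n;
}
```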

-mcmodel=kernel: By default, GCC generates code that assumes all symbols are within the lower 2 GiB of the virtual address space (the small code model). A higher-half kernel is linked far above that - the kernel image typically lives in the top 2 GiB of the 64-bit address space, at addresses like 0xFFFFFFFF_80000000. The kernel code model tells GCC that symbols reside in that upper 2 GiB, so it can keep using efficient sign-extended 32-bit addressing instead of full 64-bit absolute addresses.
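The higher-half arrangement also shows up constantly in address arithmetic, for example when translating between physical addresses and the kernel's direct mapping of physical memory. A sketch, with the base constant set to the canonical start of the upper half of the 48-bit address space (the names are ours; the exact base is a per-kernel design choice):

```cpp
#include <stdint.h>

// Base of a higher-half direct mapping of physical memory.
// 0xFFFF800000000000 is the first canonical address in the upper half
// of a 48-bit virtual address space.
constexpr uint64_t kHigherHalfBase = 0xFFFF'8000'0000'0000ULL;

// Translate a physical address to its direct-mapped virtual address.
constexpr uint64_t phys_to_virt(uint64_t phys) {
    return phys + kHigherHalfBase;
}

// And back again.
constexpr uint64_t virt_to_phys(uint64_t virt) {
    return virt - kHigherHalfBase;
}

static_assert(virt_to_phys(phys_to_virt(0x100000)) == 0x100000,
              "round-trip must be the identity");
```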

Configuration management

As your kernel grows, you will accumulate configuration options: debug logging levels, feature flags for experimental subsystems, scheduler policy selections, memory allocator parameters. If these are scattered across #defines in random header files, you will eventually lose track.

We use a YAML configuration file as the single source of truth. A generation step reads this file and produces three synchronized artifacts:

  1. A C++ header with compile-time constants
  2. CMake variables that control which source files and features are included in the build
  3. Shell script variables for automation and deployment scripts

This pattern - define once, generate everywhere - prevents the drift that inevitably occurs when the same flag is defined in multiple places.
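As an example of what the generated C++ artifact might look like: a header of compile-time constants emitted from the YAML source. The option names and values below are illustrative, not AlkOS's actual configuration:

```cpp
// kernel_config.hpp - GENERATED FILE, DO NOT EDIT.
// Produced from the YAML configuration by the build's generation step.
#include <stdint.h>

namespace config {
// Examples of the kinds of options a kernel configuration carries.
inline constexpr int      kLogLevel        = 2;     // 0 = off .. 4 = trace
inline constexpr bool     kEnableSmp       = true;  // experimental feature flag
inline constexpr uint64_t kKernelHeapBytes = 16ULL * 1024 * 1024;
}  // namespace config
```

Because the constants are constexpr, disabled features can be pruned at compile time with `if constexpr`, with no runtime cost and no preprocessor soup.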

Build profiles

You need at least two build profiles:

  • Debug: Optimizations disabled (-O0 or -Og), debug symbols enabled (-g), sanitizers and assertions active. The generated code closely matches your source, making GDB stepping predictable.
  • Release: Optimizations enabled (-O2), but with care. Some optimizations that are perfectly safe for userspace code can cause problems in kernel mode. Automatic vectorization (-ftree-vectorize) can generate SSE/AVX instructions in code paths where the FPU state has not been saved, leading to corruption. Be explicit about what you enable.

Emulation

You need a fast feedback loop. Flashing a kernel image to a USB drive, rebooting a physical machine, waiting for POST, and observing the result is a valid testing strategy, but doing it for every code change will destroy your productivity.

QEMU is the standard tool for OS development. It emulates a complete x86-64 machine, including a BIOS/UEFI firmware, multiple CPU cores, RAM, disk controllers, a PCI bus, and peripherals. You can boot your kernel image in QEMU the same way it would boot on real hardware, but the cycle time drops from minutes to seconds.

A typical invocation:

qemu-system-x86_64 \
    -kernel build/alkos.elf \
    -m 256M \
    -serial stdio \
    -no-reboot \
    -d int,cpu_reset \
    -D qemu.log

Key flags:

  • -serial stdio redirects the emulated serial port to your terminal - essential for kernel debug output before you have a working framebuffer driver
  • -no-reboot makes QEMU exit on triple fault instead of rebooting, so you can see what went wrong
  • -d int,cpu_reset logs interrupts and CPU resets to a file, invaluable for debugging early boot issues

Debugging with GDB

QEMU has built-in GDB stub support. Start QEMU with -s -S (listen for GDB on port 1234, pause CPU at start) and connect from your cross-debugger:

# Terminal 1
qemu-system-x86_64 -s -S -kernel build/alkos.elf ...

# Terminal 2
x86_64-elf-gdb build/alkos.elf
(gdb) target remote :1234
(gdb) break kernel_main
(gdb) continue

You can now set breakpoints, inspect memory, step through assembly, and examine register state. This is the closest thing to running your kernel under a debugger that bare-metal development offers, and it is remarkably effective.

Tip: Write a qemu-debug.sh script early. You will use it hundreds of times. Automate everything: building the kernel, creating the disk image, launching QEMU with the right flags, and optionally connecting GDB.

Your own standard library

Here is something that catches most beginners off guard: your kernel needs a C standard library. Not the full thing - not printf to stdout or fopen or malloc (those all require an OS) - but the subset that does not depend on the OS. String manipulation (memcpy, memset, strlen), formatted output to a buffer (snprintf), character classification (isdigit, isalpha), and type-safe integer limits.

You have two options:

Use an existing library

Projects like Newlib and mlibc are designed to be portable across operating systems. You implement a small set of backend functions (syscall stubs for write, sbrk, etc.), and the library provides everything else. This gets you up and running fast.

Build your own

We chose this path for AlkOS. It is more work, but it gives you complete control over memory usage, performance, and behavior. Every function does exactly what you expect, because you wrote it. When something goes wrong in snprintf, you can read your implementation instead of reverse-engineering a third-party library.
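These functions are also small enough that writing them is genuinely feasible. A sketch of two of them in the straightforward byte-at-a-time style you would start with (shown with a k prefix to avoid colliding with a host library; real implementations usually add word-sized fast paths later):

```cpp
#include <stddef.h>

// Fill `count` bytes at `dest` with `value`. Byte-at-a-time: simple
// and obviously correct - optimize with wider stores only once it
// shows up in a profile.
void* kmemset(void* dest, int value, size_t count) {
    unsigned char* d = static_cast<unsigned char*>(dest);
    for (size_t i = 0; i < count; ++i) {
        d[i] = static_cast<unsigned char>(value);
    }
    return dest;
}

// Length of a NUL-terminated string.
size_t kstrlen(const char* str) {
    size_t len = 0;
    while (str[len] != '\0') {
        ++len;
    }
    return len;
}
```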

The critical insight is that you actually need two variants of the standard library:

  • libk - the kernel variant. Compiled for the freestanding environment. No system calls, no dynamic memory allocation (until the heap is initialized). Functions like kprintf talk directly to the serial port or framebuffer instead of going through a file descriptor abstraction.
  • libc - the userspace variant. Compiled for the hosted environment. System calls for I/O and memory. A proper main() entry point with CRT initialization.

This split prevents kernel code from accidentally calling functions that require an OS - a class of bug that is subtle, dangerous, and easy to introduce.
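One way to keep the kernel variant free of file-descriptor machinery is to route all output through a pluggable sink: a function pointer that early boot points at the serial port and later boot can repoint at a framebuffer console. A sketch of that shape - the names and design are ours, one of several reasonable choices, not AlkOS's exact API:

```cpp
// The active character sink. Early boot installs a serial-port
// writer; a framebuffer console can replace it later without any
// callers changing.
using PutCharFn = void (*)(char);
static PutCharFn g_putchar = nullptr;

void set_output_sink(PutCharFn fn) { g_putchar = fn; }

// Minimal kernel-side string output: no file descriptors, no
// buffering - just whatever sink is currently installed.
void kputs(const char* s) {
    if (g_putchar == nullptr) return;  // output before a sink exists is dropped
    while (*s != '\0') {
        g_putchar(*s++);
    }
}
```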

The CRT initialization sequence

User programs do not start at main(). They start at _start, which is provided by the C runtime startup code (crt0.o). The initialization sequence looks like this:

  1. _start (from crt0.o) - aligns the stack, sets up stack protection, prepares argc/argv
  2. .init section prologue (from crti.o) - begins the initialization frame
  3. Global constructors (from crtbegin.o, provided by GCC) - calls C++ global constructors
  4. main() - your code
  5. exit() - triggers cleanup
  6. Global destructors (from crtend.o) - calls C++ global destructors
  7. .fini section epilogue (from crtn.o) - ends the finalization frame
  8. Process termination syscall - tells the kernel to clean up

If you get the link order wrong, global constructors silently don’t run, and any C++ code using static initialization will fail in baffling ways. This is one of those problems that can cost you days if you don’t know what to look for.
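In a kernel, step 3 is usually something you do yourself: your linker script exports the bounds of the constructor table (on modern ELF toolchains, the .init_array section) and early init walks it before calling kernel_main. A self-contained sketch of that walk - here the linker-provided symbols are replaced by a local table so the snippet runs on its own; in the real thing the bounds would be extern symbols such as `__init_array_start` / `__init_array_end` defined in your linker script:

```cpp
// Each entry in .init_array is a pointer to a no-argument
// constructor function.
using CtorFn = void (*)();

static int g_initialized = 0;
static void ctor_a() { g_initialized += 1; }
static void ctor_b() { g_initialized += 10; }

// Stand-in for the section the linker would populate.
static CtorFn fake_init_array[] = {ctor_a, ctor_b};

// Walk the constructor table in order - the moral equivalent of what
// crtbegin.o arranges for a hosted program.
void run_global_constructors(CtorFn* start, CtorFn* end) {
    for (CtorFn* fn = start; fn != end; ++fn) {
        (*fn)();
    }
}
```

If this walk never runs, every global object with a non-trivial constructor is silently zero-initialized garbage - the "baffling failures" described above.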

The boot chain

With the toolchain, build system, and standard library in place, you have everything you need to start writing actual kernel code. But before you do, it is worth understanding what happens before your code even executes - the boot chain that brings a powered-off machine to the point where it can run your kernel.

Power-on

When a modern x86-64 machine powers on, the CPU does not immediately start executing your code. The process involves multiple stages:

  1. Power sequencing: Voltage rails are brought up in a precise order. The CPU is literally non-functional until its power planes stabilize.
  2. Platform initialization: A co-processor (Intel Management Engine or AMD Platform Security Processor) initializes critical hardware - DRAM training, clock stabilization, PCI bus enumeration - before the main CPU executes its first instruction.
  3. Firmware: The CPU wakes at the reset vector and executes system firmware (UEFI or legacy BIOS). The firmware provides basic hardware abstraction and a mechanism to load a bootloader from disk.
  4. Bootloader: A program like GRUB or Limine loads your kernel file from the disk into RAM and transfers control to it. We use Limine - it provides a clean, well-documented boot protocol that gives the kernel a predictable initial state.
  5. OS trampoline: Your own architecture-specific initialization code. This is the first code you control. It takes the machine from the bootloader’s state to the state your kernel expects - enabling paging, setting up a higher-half mapping, enabling required CPU features.
  6. Kernel main: Architecture-agnostic C++ code. From here on, you are in your operating system.

The key insight is that by the time your kernel starts, it is the fifth or sixth program to run on the machine. Every layer in this chain trusts the one before it and normalizes the state for the one after it.

Freestanding vs. hosted, again

This chain illustrates why the freestanding/hosted distinction matters so deeply. Your kernel’s trampoline code runs in an environment where:

  • There is no heap - new and malloc do not exist
  • There is no stack beyond what the bootloader set up - and you probably want to set up your own
  • Interrupts may or may not be enabled, depending on the bootloader
  • The CPU may be in 32-bit Protected Mode (if using Multiboot) or 64-bit Long Mode (if using Limine)
  • Virtual memory may or may not be active

Your first job is to bring order to this chaos. Set up your own stack. Map the kernel to its intended virtual addresses. Initialize the GDT and IDT. Enable the features you need. Only then can you call a C++ function with any confidence that it will behave correctly.

Putting it together

Let’s summarize what you need before writing kernel code:

| Component          | Purpose                                   | Tool                  |
|--------------------|-------------------------------------------|-----------------------|
| Cross-compiler     | Compile code for bare-metal x86-64        | x86_64-elf-gcc / g++  |
| Assembler & linker | Assemble boot stubs, link kernel binary   | x86_64-elf-as / ld    |
| Meta-build system  | Orchestrate the build                     | CMake                 |
| Toolchain file     | Tell CMake to use the cross-compiler      | .cmake config         |
| Emulator           | Fast testing cycle                        | QEMU                  |
| Debugger           | Step through kernel code                  | GDB + QEMU stub       |
| Standard library   | String ops, formatted output, type limits | Custom libk           |
| Bootloader         | Load kernel from disk, normalize state    | Limine                |

This might seem like a lot of setup before you write any actual OS code. It is. And it is worth every minute. The alternative is discovering, three months in, that your compiler was silently generating code that uses the red zone, and that is why your kernel crashes under load once every thousand interrupts. Or that your build system cannot handle a second architecture. Or that you have no way to debug a page fault handler because you never set up GDB integration.

The foundation matters. Get it right, and everything that follows is hard but tractable. Get it wrong, and you will fight your tools instead of solving interesting problems.


Next up: We enter the kernel itself. We will implement a bootloader trampoline, set up the GDT and IDT, and get our first characters onto the screen. The machine is ours.