Projects

How to Build a UNIX-like Operating System

Parker Carlson | 10 Feb. 2023

For OSU's CS 444: Operating Systems II class, I built a basic Unix-like operating system. This follows MIT's JOS curriculum, with some tweaks inspiried by UW and Georgia Tech. In this post, I'll break down the core aspects of this operating system, as well as key code snippets.

To future OS II students: As I wrote this article, I realized that it has become a sort of "everything that I wish I knew while taking OS II."" If you find this page as a future OS II student, welcome! I hope this information is useful, but please do not copy the code.


The Bootloader

Note: Code for the bootloader and the kernel's boot functionality was provided. In this section I provide a brief technical overview of the bootloading process.

When they first boot, x86 CPUs load 512 bytes from 0x7c00 to 0x7dff into memory, then jmp to 0x7c00, transferring control to the user. In these 512 bytes, we switch into 32-bit protected mode (allowing access to more than 1MB of RAM via paging), call a C function which loads the kernel into memory from an ELF binary, and then transfers execution to the kernel.

Once in the kernel, we begin execution in Assembly again. First, we load a static page directory and page table with two pages: one mapping from kernel addresses to physical addresses ([KERNBASE, KERNBASE+4MB) -> [0,4MB)), as well as an identity page mapping ([0,4MB) -> [0,4MB)). This identity mapping allows us to continue running Assembly instructions after loading the page directory/page table until we can jmp to C code, which is linked to run at KERNBASE (0xf0000000). Kernels are often designed to run at high virtual memory addresses to avoid potential memory collisions with user environments.

Finally, we are in the kernel and running in C. We finish loading the ELF binary by setting the BSS memory to zero, and initialize the console. For the first lab, I implemented mon_backtrace(), a console command which provides a stack backtrace. This begins with the currently running function (again, mon_backtrace()) and prints the stack frame of every function call until the stack base pointer %EBP is 0, representing the end of the stack.

Key Files:

boot/boot.S     // initial entry point, switch to protected mode, call boot/main.c
boot/main.c     // load the kernel from ELF, transfer control to kern/entry.S
kern/entry.S    // load static pagedir, enable virtual memory and transfer control to kern/init.c
kern/init.c     // finish ELF loading, initialize the console, handle errors
kern/monitor.c  // various console functions, including mon_backtrace() at line 72

[Code]


Memory Management

In this section, I implement the operating system's memory management model. First, I develop tools to allocate and free 4 kB sections of physical memory called pages. Then, I develop the kernel's virtual memory system, which allows the kernel to translate user memory addresses to physical memory addresses, using the x86 processor's Memory Management Unit (MMU).

Physical Memory Management

The code in this section is fairly simple, but sets up an important foundation upon which we build the OS' virtual memory system. First, I implement boot_alloc(), a function which allocates physical memory above the kernel's code (technically, above .bss). This allows me to allocate memory while we are setting up the page directory and page tables. It is straightforward; it returns a pointer to the next 4kB chunk of free physical memory, tracked by a static pointer.

Then, I begin implementing the physical page management system. In page_init(), I create a list of all free pages, and mark reserved pages (kernel, memory mapped IO, and boot_alloc()'d pages) as in-use by incrementing the page reference counter. page_alloc() returns the next empty physical page, and optionally sets all of the memory to 0. Finally, page_free() returns unused pages to the list of unused pages.

Key Functions (in kern/pmap.c):

 87 | boot_alloc()  // allocate memory until virtual memory system is fully set up
264 | page_init()   // initialize physical page tracking system
316 | page_alloc()  // allocate a new physical page
342 | page_free()   // free a physical page
Virtual Memory Management

Virtual memory is divided into multiple layers. Recall, a page represents 4kB of physical memory. A page table is a look-up table that stores the addresses of pages. A page table is stored on one physical page. Given that x86 addresses are 32 bits, a page table stores the addresses of 1024 pages. One level higher, a page directory is a page table of page tables. Also stored on a page, a page directory stores the addresses of (up to) 1024 page tables, giving a page directory access to 1024^2 pages, or 4 kB * 1024^2 = 4 GB of memory, the maximum accessible with a 32-bit address system. The motivation for using paging versus other memory management models is beyond this blog post (See here).

Page Directory Entries (PDEs) and Page Table Entries (PTEs) are both 4 bytes. These include 20 bits of an address, and 10 bits for managing permissions. Because pages (and thus page tables) are 4 kB, and are aligned to 4 kB sections of memory, we don't need the log2(4096) = 12 lower bits of an address. Thus, 20 bits is a sufficient address for page tables and pages. Likewise, because PDEs and PTEs are 4 bytes, we only need 10 bits to access every PDE or PTE in a page directory (or table), because the last two bits will always be zero. Finally, we have enough information to implement our paging system

Given a 32-bit virtual address (VA), we divide it into three parts. The 10 highest bits access the page directory, the next 10 bits access the page table, and the final 12 bits access the physical page. The location of the page directory is stored in the CR3 register, and our first 10 VA bits are used as an offset to select a PDE. From that PDE, we take its 20 bit page table address, and combine it with the next 10 bits of our VA to select a PTE. Finally, we take the 20 bit page address and combine it with the final 12 bits of our VA to access our desired page with byte-level granularity. This process is called address translation.

Above: Diagram of the x86 address translation process.

Once you understand the details of address translation, implementing it is fairly easy. Unfortunately, nobody truly understands the details of address translation until they have implemented it themselves. This means that your first time implementing address translation will probably be excruciatingly painful, like it was for me. Because all of the address translation functions call each other, an error in one breaks all of them, leading to bugs which are very challenging to find and fix.

Q: Is address translation handled by the OS, or by the CPU's MMU (Memory Management Unit)?

A: The MMU, kinda*. After paging is enabled by the bootloader, every address the CPU encounters is ran through the MMU. The MMU checks the Translation Lookaside Buffer (TLB), a cache of previously translated addresses. If the address is found, the CPU accesses the physical address and completes the instruction. If the address is not in the TLB, the MMU starts a "page table walk", where it completes address translation as specified above.

*Q: If the MMU does address translation, then why do we need to implement it in the OS?

A: After a TLB miss, the MMU accesses the current page directory stored in the CR3 register. If the bits in the page directory are random, the translation will be meaningless. It is the OS' job to set up and maintain an accurate page directory in CR3, which is why we need to initialize the page directory and page tables. Please read this to understand fully.

Key Functions (in kern/pmap.c):

138 | mem_init()         // create and initialize page directory; initialize kernel memory mappings
387 | pgdir_walk()       // search the page directory for a PTE given a virtual address
435 | boot_map_region()  // allocate and initialize static kernel memory mappings
470 | page_insert()      // insert a new mapping from a virtual address to physical memory
504 | page_lookup()      // lookup a physical page from a given virtual address
538 | page_remove()      // unmap the physical page from a given virtual address. Invalidate the TLB if needed.

[Code]


User Environments & Exception Handling

In this section, I add code to enable loading and running user programs through the operating system. This involves creating protected control transfers between the user environment and operating system, as well as handling exceptions such as page faults.

User Environments

To run a user program, we setup a virtual environment for the program to run in and then tranfer control to this program. First, I create a high level function to create an environment (process) for a user program, env_create(). This allocates memory for the virtual environment, sets it up, then loads in the user program from the ELF binary. These tasks are also further broken down into env_alloc(), env_setup_vm() and lode_icode() respectively. Recall that a trap frame stores the CPU's current registers and is created by the operating system when it recieves an interrupt. By modifying registers in a trap frame, we can set up processes from the operating system so they are ready to run.

In env_alloc(), I allocate memory for the environment, generate a process ID, set up various utility variables (parent process, permissions, ect.), and set up the initial register values via the environment's trapframe. Then, in env_setup_vm(), I initialize the new process' page directory, copy the kernel's page directory mappings, and set the process' CR3 register to point to this page directory, through the trap frame again. Finally, in load_icode(), I load the compiled ELF binary, verify that it's correct, allocate a stack, and then store the program's entry point in the process' EIP register in the trap frame. Finally, we are ready to run the environment with env_run(), which marks the process as running, load its page directory, then loads its registers from the trap frame, which also sets the instruction pointer to the process' entry point, officially starting the process.

Some small functions were omitted from the above overview, such as code to clean up an environment once it has finished running.

Key Functions (in kern/env.c):

164 | env_setup_vm()     // create and initialize process page directory
217 | env_alloc()        // allocate memory for environment, generate process ID, initialize trap frame
332 | load_icode()       // load an ELF binary, set trap frame EIP to program entry point
416 | env_create()       // allocate and set up environment, then load program into environment
527 | env_run()          // mark process as running and transfer control to process
Exception Handling

Technical details coming soon...

[Code]


Multitasking & Scheduling

Technical details coming soon...

[Code]