Study of Firecracker MicroVM
Madhur Jain
Khoury College of Computer Sciences, Northeastern University
Boston, MA
[email protected]
Abstract—Firecracker is a virtualization technology that makes use of the Kernel-based Virtual Machine (KVM). Firecracker belongs to a new virtualization class named micro-virtual machines (MicroVMs). Using Firecracker, we can launch lightweight MicroVMs in non-virtualized environments in a fraction of a second, at the same time offering the security and workload isolation provided by traditional VMs along with the resource efficiency of containers [1]. Firecracker aims to provide a slimmed-down MicroVM, comprising approximately 50K lines of Rust code and a reduced attack surface for guest VMs. This report examines the internals of Firecracker and explains why Firecracker is the next big thing going forward in virtualization and cloud computing.
Index Terms—Firecracker, MicroVM, Rust, VMM, QEMU, KVM
I. INTRODUCTION
Firecracker is a new open-source Virtual Machine Monitor (VMM) developed by AWS for serverless workloads. It is a virtualization technology that makes use of KVM, meaning it can only run on KVM-supported and enabled hosts with a Linux kernel v4.14 or newer. With the recommended Linux guest kernel configuration, Firecracker claims to offer a memory overhead of less than 5 MB per container, boots to application code within 125 ms, and allows the creation of up to 150 MicroVMs per second. The number of Firecracker MicroVMs running simultaneously on a single host is limited only by the availability of hardware resources. [1]

Firecracker provides security and resource isolation by running the Firecracker userspace process inside a jailer process. The jailer sets up system resources that require elevated permissions (e.g., cgroup, chroot), drops privileges, and then exec()s into the Firecracker binary, which then runs as an unprivileged process. Past this point, Firecracker can only access resources to which a privileged third party grants access (e.g., by copying a file into the chroot, or passing a file descriptor). [13]

Seccomp filters limit the system calls that the Firecracker process can use. There are three possible levels of seccomp filtering, configurable by passing a command-line argument to the jailer: 0 (disabled), 1 (whitelists a set of trusted system calls by their identifiers), and 2 (whitelists a set of trusted system calls with trusted parameter values), the latter being the most restrictive and the recommended one. The filters are loaded into the Firecracker process immediately before execution of the untrusted guest code starts. [13]

Firecracker was developed to handle serverless workloads and has been running in production for AWS Lambda and Fargate since 2018. Serverless workloads require isolation and security, while at the same time benefiting from container-like fast boot times.
A lot of research has been done on the cold-start and warm-start behavior of VM instances for serverless workloads. [17] The idea behind developing Firecracker was to make use of the KVM module loaded in the Linux kernel while getting rid of the legacy devices that other virtualization technologies like Xen and VMware offer. This way, Firecracker can create a VMM with a smaller memory footprint and also provide improved performance.

Section 2 explains the high-level architecture of the Firecracker MicroVM, Section 3 dives deep into Firecracker's boot sequence, Section 4 describes the device model emulation provided, and Section 5 sheds some light on the conclusions garnered through this study.

II. ARCHITECTURE OF FIRECRACKER
KVM is an enabler of the hardware extensions provided by vendors such as Intel and AMD through their virtualization extensions, VMX and SVM respectively. These extensions allow KVM to directly execute guest code on the host CPU. There are three sets of ioctls that make up the KVM API and are issued to control the various aspects of a virtual machine. The three classes that the ioctls belong to are [7]:

• System ioctls: These query and set global attributes, which affect the whole KVM subsystem. In addition, a system ioctl is used to create virtual machines.

• VM ioctls: These query and set attributes that affect an entire virtual machine, for example, its memory layout. In addition, a VM ioctl is used to create virtual CPUs (vCPUs). VM ioctls are run from the same userspace process (address space) that was used to create the VM.

• vCPU ioctls: These query and set attributes that control the operation of a single virtual CPU. vCPU ioctls are run from the same thread that was used to create the vCPU.

QEMU is an open-source machine emulator and virtualizer that can run KVM-enabled virtual machines. Similar to QEMU, Firecracker uses a multithreaded, event-driven architecture. Each Firecracker process runs one and only one MicroVM. The process consists of three main thread types: API, VMM, and vCPU.

Fig. 1. Firecracker Design - Threads

The API server thread is an HTTP server created to accept requests and perform actions on those requests. The API server thread structure comprises a socket connection to listen for requests on the port, an epoll fd to listen for events on the socket, and a hashmap of connections. The hashmap consists of token and connection pairs corresponding to the file descriptor of the stream, in this case the socket.

Traditionally, QEMU has made use of the select or poll system calls to maintain an event loop of file descriptors on which to listen for new events.
[16] The select and poll system calls require a list of all open file descriptors to be maintained in the VMM structure; the VMM then iterates over each file descriptor to determine which ones have new events. This takes O(N) time, where N is the number of file descriptors being monitored. [14]

Firecracker takes the epoll approach, in which the host kernel maintains the list of file descriptors for the VM process and notifies the VMM whenever a new event occurs on any of them. This is an edge-triggered ("push") mechanism, in which the kernel pushes readiness notifications to the VMM, whereas select/poll are level-triggered ("pull") mechanisms, in which the VMM repeatedly pulls the state of every descriptor. The epoll fd created by Firecracker has the close-on-exec flag set, which means that if a process forks the Firecracker process, the file descriptors are not shared.

The API server exposes a number of requests as REST APIs which can be used to configure the MicroVM. Once the MicroVM has been started, i.e., once the server receives the "InstanceStart" command, the API server thread simply blocks on the epoll file descriptor until new events come in. Firecracker creates a channel (see Rust channels) to enable communication between the API server thread and the VMM thread; for comparison, Rust channels are similar to Unix pipes.

The VMM thread manages the entire MicroVM. Once the VMM thread is created, it runs an event loop which takes the parsed requests one by one from the API server thread and dispatches them to the appropriate handlers. The handlers are defined according to the dispatch table set up by the event loop. For now, Firecracker supports the following handlers: Exit, Stdin, DeviceHandler, VMMActionRequest, and WriteMetrics. The dispatch table is managed via the epoll fd; it maintains a map of the file descriptors to be monitored and the kinds of events to monitor for. When the vCPU thread creation request is received by the VMM thread, the VMM spawns the required number of vCPUs using the KVM_CREATE_VCPU ioctl.
A vCPU thread is created for each virtual CPU; these are ordinary POSIX threads spawned by the VMM. To run guest code, a vCPU thread executes an ioctl with KVM_RUN as its argument.

Software in root mode is the hypervisor. The hypervisor, or VMM, forms a new plane that runs in root mode while the VMs run in non-root mode. KVM uses the hardware virtualization extensions to provide these different modes on the host CPUs. In the case of Intel CPUs, VT-x provides CPU virtualization and VT-d provides IO virtualization. For vCPUs, VT-x provides two modes of guest code execution: root and non-root. Whenever a VM attempts to execute an instruction that is prohibited in non-root mode, the vCPU immediately switches to root mode in a trap-like way. This is called a VMEXIT. The hypervisor handles the reason for the VMEXIT and then executes VMRESUME to re-enter non-root mode for that VM instance. This interaction between root and non-root mode is the essence of hardware virtualization.

III. FIRECRACKER BOOT SEQUENCE
Traditional PCs boot Linux with a BIOS and a bootloader. The primary responsibilities of the BIOS include booting the CPU in real mode and performing a Power-On Self Test (POST) before loading the bootloader. The BIOS determines the candidates for boot devices, and once a bootable device is found, the bootloader is loaded into RAM and executed. Different systems have different numbers of bootloader stages: LILO, for example, has a two-stage bootloader, while GRUB uses three stages. Multiple bootloader stages exist because of the system limitations of some of the older devices that were used to boot Linux.

The Linux kernel does not actually require a BIOS and a bootloader. Instead, Firecracker uses what is known as the Linux Boot Protocol. [15] Multiple versions of the Linux Boot Protocol standard exist; Firecracker follows the 64-bit Linux Boot Protocol. Thus, Firecracker can directly load the Linux kernel and mount the corresponding root file system.

A standard Linux kernel image is a bzImage ("big zImage", usually larger than 512 KB), which consists of real-mode kernel code and compressed protected-mode kernel code; the real-mode memory layout it assumes is shown in Fig. 2. Instead of booting into the entry point defined for real mode, Firecracker directly boots an uncompressed kernel through the 64-bit entry point located at offset 0x200 in the protected-mode kernel code. Firecracker loads the uncompressed Linux kernel as well as the init process, thereby saving approximately 20 to 30 ms otherwise spent decompressing the kernel.

        ~                        ~
        | Protected-mode kernel  |
100000  +------------------------+
        | I/O memory hole        |
0A0000  +------------------------+
        | Reserved for BIOS      |  Leave as much as possible unused
        ~                        ~
        | Command line           |  (Can also be below the X+10000 mark)
X+10000 +------------------------+
        | Stack/heap             |  For use by the kernel real-mode code.
X+08000 +------------------------+
        | Kernel setup           |  The kernel real-mode code.
        | Kernel boot sector     |  The kernel legacy boot sector.
X       +------------------------+
        | Boot loader            |  <- Boot sector entry point 0000:7C00
001000  +------------------------+
        | Reserved for MBR/BIOS  |
000800  +------------------------+
        | Typically used by MBR  |
000600  +------------------------+
        | BIOS use only          |
000000  +------------------------+

Fig. 2. Linux bzImage Memory Layout

The Linux kernel also contains another component, namely the initramfs [5]. There are four primary reasons to have an initramfs in the LFS environment: loading the rootfs from a network, loading it from an LVM logical volume, having an encrypted rootfs where a password is required, or the convenience of specifying the rootfs as a LABEL or UUID. Anything else usually means that the kernel was not configured properly. Since Firecracker needs none of the above stated reasons for loading the initramfs before mounting the root file system, it is recommended to avoid loading the initramfs at boot time, thereby further reducing the overall boot time and the memory footprint of the kernel. So, if no initramfs is configured externally, then at boot time Firecracker replaces the initramfs with a default empty, 134-byte initramfs.

IV. FIRECRACKER DEVICE MODEL
Until this section, we have discussed architecture and execution flow that Firecracker and QEMU share. So what is different between QEMU and Firecracker? One of the main differences is device emulation. There are only five device emulations available in Firecracker: network, block devices, sockets, serial console, and a minimal keyboard controller, as shown in Figure 3. Firecracker does not provide support for device emulations like USB, GPU, and the 9P filesystem, in order to provide increased security compared to other virtualization technologies like QEMU; QEMU, on the other hand, makes most device model emulations available in the VMM. The careful reader will notice that Firecracker does not use the vhost implementation in the host kernel, which would provide more efficient IO performance by avoiding VMEXITs.

An open specification for emulating device models in virtualization has been developed, named Virtio. Virtio is defined as a straightforward, efficient, standard mechanism that allows a guest OS to talk to the virtual device driver in a similar way to how the host OS would call the actual hardware device driver. It takes advantage of the fact that the guest can share memory with the host for IO.

The general flow of the virtio specification [8] includes a front-end driver representing the virtual device in the guest, with the corresponding device exposed by the hypervisor or VMM. A transport layer enables communication between the host and the guest. For the transport layer, Virtio employs a ring-buffer virtqueue structure. A virtqueue is a queue of guest-allocated buffers that the host interacts with, either by reading or writing to them. Each device can have zero or more virtqueues. A back-end driver present in the host kernel, to which the virtqueues are connected, completes the communication flow.

Firecracker's device model architecture using Virtio is shown in Figure 3.
The following list describes the devices available within Firecracker:

• virtio-net: implementation of the network driver (tun/tap devices)

• virtio-blk: implementation of the block devices

• virtio-vsock: implementation of VM sockets providing N:1 serial communication

• serial console: implementation of the legacy console devices for serial communication with a terminal

• keyboard controller: implementation of the keyboard device, though only one function is implemented: Ctrl+Alt+Del to reboot/shut down the system.

V. SCOPE FOR IMPROVEMENTS
Fig. 3. Firecracker Device Model

Even with all the excellent features providing near-native performance of the guest code using KVM, as well as faster boot times and a lower memory footprint thanks to the reduced set of device emulations, there still exist some areas for improvement that could make Firecracker suitable for general use cases and not just serverless workloads.

• Support for virtio-fs: virtio-fs is an interface for efficient file sharing between the host and guest filesystems that avoids context switches (VMEXITs), thereby providing more performance. virtio-fs is an upgrade over the existing virtio-9p interface for the same purpose, though more research on its security implications is required before including it in Firecracker.

• Increased IO performance: The results of tests performed across Firecracker, QEMU, and Cloud Hypervisor show limitations in Firecracker's virtio implementation and serial execution. [1] [2]

• Larger number of device emulations: Currently, Firecracker can emulate only 10 devices, since each device gets its own IRQ. [10]

• Support for attaching devices at runtime: Firecracker only allows specifying devices at boot time; devices can only be attached while the MicroVM is shut down.

• Hotplugging support: For any workload, it is beneficial to allow guest memory/CPU hotplugging within a VM at runtime in order to avoid interfering with the workload. Firecracker oversubscribes the memory allocated for the guest, but there is no way to expand the allocated memory afterwards.

• Memory ballooning support: At present, Firecracker has no support for reclaiming unused memory from the guest, since no communication channel exists between the host and the guest for this purpose. This, along with the hotplugging feature, would make it very easy to dynamically add or remove memory/CPU at runtime, thereby providing elasticity to the MicroVM. [9]

VI. CONCLUSION/THOUGHTS
This paper reviews the implementation of a minimalist and modular VMM in the form of the Firecracker MicroVM. It also identifies how Firecracker provides resource isolation and security through the use of seccomp filters and the jailer process, and how it achieves faster boot times and a lower memory footprint through KVM and minimal device model emulation.

One other thing to note is that Firecracker embodies a modular design in the development of the hypervisor. This modular design approach has also led to the development of community-driven, high-quality rust-vmm crates, which provide the core modules required for the implementation of a hypervisor [11]. rust-vmm [12] is a community effort initiated by AWS; Amazon, along with Intel, Red Hat, and Google, is trying to provide a platform for building a hypervisor from scratch by consuming only the required modules from the rust-vmm crates. This approach also enables the development of a plug-and-play architecture in hypervisors, which we have not seen so far.