[GSoC 2024] Hardware Virtualization: Final Report

Blog post by dalme on Wed, 2024-08-21 00:00

Project overview

QEMU is a machine emulator and virtualizer that allows running an operating system inside another one. While a Haiku port already exists, it does not support any hardware-assisted acceleration (Intel VT-x or AMD SVM), which makes it too slow for many uses. This project aimed to bring hardware virtualization to Haiku by porting NVMM, a hypervisor that QEMU already supports, from DragonFlyBSD to Haiku. The project goals (as stated in the proposal) were:

Project goals

  • NVMM driver ported to Haiku (VMX backend only)
  • QEMU capable of accelerating virtual machines through NVMM

The project goals didn’t change much during the project (there were plans to get the SVM backend working too, but I never had time to get into it), and they have been completed to some extent: the driver’s VMX backend is almost completely ported (only a few details are missing), and QEMU is capable of accelerating virtual machines, although several bugs remain.

Now, the details:

Completed objectives

  • NVMM frontend ported
  • NVMM VMX backend ported (with a couple issues remaining)
  • libnvmm (& test suite) ported
  • EPT support added to the kernel
  • QEMU patched to support NVMM on Haiku and working

Unresolved issues

  • SVM backend not ported
  • Some OSes don’t work properly on QEMU
  • Different behavior on real hardware
  • QEMU crashes when SMP is enabled
  • EPT translations not flushed from TLB on time

These are just the major issues; for a full list of unfinished work, refer to GitHub (43 issues were still open at the time of writing). Here we can see QEMU virtualizing KolibriOS through NVMM (on nested virtualization):

KolibriOS running on QEMU+NVMM (nested virtualization)

Code

The code is available on GitHub (commits dated May 13 and later in the master branch). None of it has been merged yet, since the driver isn’t fully working and the kernel changes I made are only needed for NVMM.

Completed objectives: Technical details

Below are some details about each thing I got done, in chronological order, with some of the more relevant commits included. Some early commits contain more changes than their commit messages describe: at the beginning I had to change many different parts of the project at once, and it was hard to split the work into separate commits.

Importing missing BSD headers

While the NVMM code from DragonFlyBSD was more OS-independent than the NetBSD version, it still assumed that many BSD macros were available. Many of them weren’t available on Haiku, so we brought over some headers from DragonFlyBSD:

Making Haiku headers C compatible

NVMM is written in C, while large parts of Haiku are written in C++. We decided to keep NVMM in C (that is, compile it as C with gcc) and put C++ code only into the nvmm_haiku.cpp file, which holds all OS-dependent logic for the Haiku port, just like nvmm_netbsd.c and nvmm_dragonflybsd.c do for the other ports. However, NVMM’s C code needed to include some Haiku headers. Those headers were supposed to provide a C API, but apparently their x86_64 versions didn’t, so I had to make them work with C:
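For readers unfamiliar with the problem, here is a minimal sketch of the usual pattern for a header that must compile both as C (NVMM sources) and as C++ (the Haiku kernel). The header guard and all vm_example identifiers are hypothetical and not taken from Haiku; the actual changes made to Haiku’s headers may look different.

```c
/* Hypothetical header contents, shown inline with a definition so the
 * example is self-contained. */
#ifndef VM_EXAMPLE_H
#define VM_EXAMPLE_H

#include <stddef.h>
#include <stdint.h>

#ifndef __cplusplus
#include <stdbool.h>	/* C++ has bool built in; C99 needs this header */
#endif

#ifdef __cplusplus
extern "C" {		/* disable C++ name mangling for these symbols */
#endif

/* The typedef lets C code use the bare type name, as C++ code does. */
typedef struct vm_example_range {
	uint64_t	base;
	size_t		size;
	bool		writable;
} vm_example_range;

/* No references or default arguments: C only understands pointers. */
bool vm_example_contains(const vm_example_range *range, uint64_t address);

#ifdef __cplusplus
}
#endif

#endif /* VM_EXAMPLE_H */

/* Definition (would normally live in a .c or .cpp file). */
bool
vm_example_contains(const vm_example_range *range, uint64_t address)
{
	return address >= range->base && address - range->base < range->size;
}
```

The extern "C" block keeps the symbol names identical whichever language includes the header, so C objects and C++ objects can link against each other.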

SVM backend

Just for the sake of completeness, these are the only commits involving the SVM backend:

Porting libnvmm

Porting libnvmm was straightforward and painless, although it required removing some non-standard error codes.

VMX backend

This was the longest-running part of the project (I worked on it from May 13 to around July 16, and fixed bugs after that). It gained functionality gradually, with a new feature added each day, until the calc example finally became the first VM to be virtualized by NVMM on Haiku.

The code here is very straightforward; the hard part was getting the OS-specific functions these IOCTLs depend on to work.

calc-vm virtualized by NVMM

Haiku specific driver

Implemented the Haiku legacy driver API in the NVMM driver.

Memory management

Arguably the hardest and longest part of the project, since it was a prerequisite for almost everything else (the VMX backend and parts of the NVMM frontend). It required me to understand a big part of Haiku’s virtual memory management code, how NVMM handles memory, and how QEMU maps it. In the end, guest memory management is integrated into Haiku’s virtual memory subsystem, as shown by the guest page fault handler, which simply calls Haiku’s page fault handler.
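Conceptually, every guest physical page is backed by memory that QEMU allocated in its own (host virtual) address space, so a guest page fault boils down to finding that backing memory. The following is a minimal sketch of that lookup, assuming guest memory is described as a list of segments; the gpa_segment type and gpa_to_hva function are hypothetical. The Haiku port delegates this work to the kernel’s own page fault handler on the guest address space rather than using an explicit table like this one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one guest physical segment: a range of the
 * VMM's own (host virtual) memory mapped into guest physical space. */
typedef struct gpa_segment {
	uint64_t gpa;	/* guest physical start address */
	uint64_t hva;	/* host virtual address backing it */
	uint64_t size;	/* length in bytes */
} gpa_segment;

/* Resolve a faulting guest physical address to the host virtual address
 * that backs it. Returns false if no segment covers the GPA, in which
 * case the access would be reported to the VMM (for example as MMIO).
 * A real handler would then fault in the backing page through the
 * kernel's normal page fault path. */
bool
gpa_to_hva(const gpa_segment *segs, size_t count, uint64_t gpa,
	uint64_t *hva)
{
	for (size_t i = 0; i < count; i++) {
		if (gpa >= segs[i].gpa && gpa - segs[i].gpa < segs[i].size) {
			*hva = segs[i].hva + (gpa - segs[i].gpa);
			return true;
		}
	}
	return false;
}
```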

EPT (Extended Page Tables)

These tables describe the guest physical memory, so the CPU translates guest memory accesses in hardware and the guest OS can access memory at nearly native speed. This meant adding X86GPAtoHPATranslationMap to the kernel: a new type of translation map that handles GPA (Guest Physical Address) to HPA (Host Physical Address) translations. I also added VMVirtualAdressSpace, a new kind of address space that handles the guest physical address space. It doesn’t have any Haiku kernel pages mapped and uses X86GPAtoHPATranslationMap as its translation map. Finally, EPTPagingMethod, which is called from X86GPAtoHPATranslationMap, creates, destroys and populates the EPT tables.
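To make the GPA to HPA translation concrete, here is a minimal software model of the 4-level EPT walk that the CPU performs in hardware, assuming 4 KiB pages only. The frames array is a toy stand-in for host physical memory, and the function names are hypothetical; this is not how EPTPagingMethod is written, only an illustration of the table layout it has to populate.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPT_LEVELS	4
#define EPT_ENTRIES	512
#define EPT_READ	(1ULL << 0)
#define EPT_WRITE	(1ULL << 1)
#define EPT_EXECUTE	(1ULL << 2)
#define EPT_ADDR_MASK	0x000FFFFFFFFFF000ULL	/* bits 51:12 */

/* One 4 KiB table of 512 eight-byte entries. */
typedef uint64_t ept_table[EPT_ENTRIES];

/* 9-bit index into the table at 'level' (3 = PML4, 2 = PDPT, 1 = PD,
 * 0 = PT), taken from GPA bits 47:39, 38:30, 29:21 and 20:12. */
static unsigned
ept_index(uint64_t gpa, int level)
{
	return (unsigned)((gpa >> (12 + 9 * level)) & 0x1FF);
}

/* Software walk of a 4-level EPT hierarchy. 'frames' is a toy stand-in
 * for host physical memory: frames[hpa >> 12] is the table at that HPA.
 * Large pages (bit 7) are not handled. */
bool
ept_translate(ept_table *frames, uint64_t root_hpa, uint64_t gpa,
	uint64_t *hpa)
{
	uint64_t next = root_hpa & EPT_ADDR_MASK;

	for (int level = EPT_LEVELS - 1; level >= 0; level--) {
		uint64_t entry = frames[next >> 12][ept_index(gpa, level)];
		/* An entry with R, W and X all clear is not present. */
		if ((entry & (EPT_READ | EPT_WRITE | EPT_EXECUTE)) == 0)
			return false;	/* the CPU would raise an EPT violation */
		next = entry & EPT_ADDR_MASK;
	}
	*hpa = next | (gpa & 0xFFF);
	return true;
}
```

The structure mirrors ordinary x86-64 paging (same 9-bit indices, same 4 KiB tables); what differs is the meaning of the low bits in each entry.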

QEMU

Getting QEMU to work required a new patch, since QEMU’s build system only looks for NVMM on NetBSD. In addition, there were a few other issues:

  • Non-standard error codes were used.
  • QEMU couldn’t find NVMM or libnvmm headers.

Since NVMM isn’t ready to be merged yet, the QEMU recipe and patch aren’t merged either. I’ve pushed them to GitHub.

Unresolved issues: Technical details

Some OSes don’t work properly on QEMU

Haiku virtualized on Haiku

As you can see, it doesn’t manage to virtualize Haiku correctly; it looks as if some memory is copied incorrectly at some point. Xubuntu fails very early too. Sortix appears to work fine (on nested virtualization), but that’s likely just because it’s a smaller OS and its live ISO loads almost everything through GRUB (which uses BIOS calls to do so). Other toy OSes appear to work fine. I never found the time to debug this, so I can’t provide any more information.

Different behavior on real hardware

One of the biggest bugs I debugged during GSoC was very poor virtualization performance on real hardware. On nested virtualization (that is, running QEMU+NVMM inside a VM running Haiku), things ran slowly, as expected for nested virtualization, but with performance similar to QEMU+NVMM on NetBSD. On real hardware, however, performance was far worse: loading GRUB alone took a couple of minutes, if not more. There were a couple of reasons for this, but I didn’t find the “big” one until yesterday. To keep it short: for some reason I was convinced that the EPT table format was the same as the x86 page table format, but it’s not (although it’s similar enough to work). This was preventing the TLB from caching any EPT entries, causing a huge performance loss.
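One concrete difference between the two formats is easy to show. In an EPT leaf entry, bits 2:0 mean read, write and execute, and bits 5:3 hold the memory type (6 is write-back, 0 is uncacheable); in an x86 page table entry the same positions hold present, writable, user, PWT, PCD and the accessed flag. The minimal sketch below, with hypothetical helper names, builds both kinds of entry and reads the memory type field the way the CPU would under EPT.

```c
#include <stdint.h>

/* x86-64 4 KiB page table entry bits. */
#define PTE_PRESENT		(1ULL << 0)
#define PTE_WRITABLE		(1ULL << 1)
#define PTE_USER		(1ULL << 2)
#define PTE_ACCESSED		(1ULL << 5)

/* EPT 4 KiB leaf entry bits. */
#define EPT_READ		(1ULL << 0)
#define EPT_WRITE		(1ULL << 1)
#define EPT_EXECUTE		(1ULL << 2)
#define EPT_MEMTYPE_SHIFT	3			/* bits 5:3 */
#define EPT_MEMTYPE_MASK	(7ULL << EPT_MEMTYPE_SHIFT)
#define EPT_MEMTYPE_UC		0ULL			/* uncacheable */
#define EPT_MEMTYPE_WB		6ULL			/* write-back */
#define PAGE_ADDR_MASK		0x000FFFFFFFFFF000ULL

/* EPT leaf for ordinary RAM: readable, writable, executable, write-back. */
static inline uint64_t
ept_make_ram_entry(uint64_t hpa)
{
	return (hpa & PAGE_ADDR_MASK) | EPT_READ | EPT_WRITE | EPT_EXECUTE
		| (EPT_MEMTYPE_WB << EPT_MEMTYPE_SHIFT);
}

/* Regular x86 page table entry for the same page. */
static inline uint64_t
x86_make_pte(uint64_t pa)
{
	return (pa & PAGE_ADDR_MASK) | PTE_PRESENT | PTE_WRITABLE | PTE_USER;
}

/* Memory type the CPU reads from an entry interpreted as an EPT leaf. */
static inline unsigned
ept_memtype(uint64_t entry)
{
	return (unsigned)((entry & EPT_MEMTYPE_MASK) >> EPT_MEMTYPE_SHIFT);
}
```

An x86-style entry with present, writable and user set does grant read, write and execute when interpreted as EPT, which is why it “works”, but its memory type field reads as uncacheable (or write-through, 4, once the accessed bit is set) instead of write-back.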

I’ve implemented the correct EPT table format in the kernel, and it finally runs fast. The problem (yes, there is always a new problem) is that several new bugs have appeared. I’ve managed to fix a few of them, but others remain:

  • The Space Invaders boot sector game I showed on the forum still works on nested virtualization but no longer works on real hardware: the BIOS just hangs trying to load it.
  • Sortix doesn’t work anymore. It reboots while loading the GUI (probably a triple fault), but seems to work fine if the GUI is disabled in GRUB.

Since I only fixed the performance issue yesterday, I haven’t had enough time to fix these yet.

QEMU crashes when SMP is enabled

When running on a multiprocessor machine, QEMU sometimes receives an invalid VM exit from NVMM, which causes it to abort. This happens when a VM’s virtual CPU is migrated from one physical CPU to another, but only sometimes. I prioritized fixing the performance issues and getting QEMU+NVMM working correctly on a single CPU before trying to fix this. I took a look at the code I wrote in May to handle VM migrations and didn’t see any obvious bug.
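The cause was not tracked down, so the following does not explain the crash. It is only a user-space model, assuming the Intel SDM rules for moving a VMCS between CPUs, of the sequence that migration code has to follow: VMCLEAR on the old CPU, VMPTRLD on the new one, and VMLAUNCH (not VMRESUME) for the first entry afterwards. No VMX instructions are executed, and all names are hypothetical rather than NVMM’s.

```c
#include <stdbool.h>

#define NO_CPU	(-1)

/* Hypothetical model of the hardware-visible VMCS state. */
typedef struct vmcs_model {
	int	active_cpu;	/* CPU on which the VMCS is active, or NO_CPU */
	bool	hw_launched;	/* hardware launch state (false = "clear") */
} vmcs_model;

/* Hypothetical VCPU: the VMCS plus the hypervisor's own launch flag. */
typedef struct vcpu_model {
	vmcs_model	vmcs;
	bool		launched;
} vcpu_model;

/* Models VMCLEAR: flushes the VMCS and resets its launch state. */
static void
model_vmclear(vmcs_model *v)
{
	v->active_cpu = NO_CPU;
	v->hw_launched = false;
}

/* Models VMPTRLD on 'cpu'; a VMCS still active elsewhere is refused. */
static bool
model_vmptrld(vmcs_model *v, int cpu)
{
	if (v->active_cpu != NO_CPU && v->active_cpu != cpu)
		return false;
	v->active_cpu = cpu;
	return true;
}

/* Models VM entry: VMLAUNCH needs a clear VMCS, VMRESUME a launched one. */
static bool
model_vm_entry(vmcs_model *v, bool use_resume)
{
	if (use_resume != v->hw_launched)
		return false;	/* real hardware reports VMfailValid */
	v->hw_launched = true;
	return true;
}

/* Enter the guest on 'cpu', migrating the VMCS first if needed. */
bool
vcpu_run_on(vcpu_model *vcpu, int cpu)
{
	if (vcpu->vmcs.active_cpu != NO_CPU && vcpu->vmcs.active_cpu != cpu) {
		/* Real code must execute VMCLEAR on the old CPU itself. */
		model_vmclear(&vcpu->vmcs);
		/* Forgetting this makes the next entry a failing VMRESUME. */
		vcpu->launched = false;
	}
	if (!model_vmptrld(&vcpu->vmcs, cpu))
		return false;
	if (!model_vm_entry(&vcpu->vmcs, vcpu->launched))
		return false;
	vcpu->launched = true;
	return true;
}
```

Both failure paths in the model (loading a VMCS still active on another CPU, and resuming a VMCS whose launch state was cleared) are the kind of mistake that only shows up when the scheduler actually moves the thread, which fits a bug that happens only sometimes.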

EPT translations not flushed from TLB on time

This is one of the few missing pieces of the VMX backend. Cached EPT translations (in the host TLB) need to be flushed every time the EPT tables change. On NetBSD, this is done by installing a callback into the translation map. On DragonFlyBSD, the translation map increments a counter every time a flush is needed. On Haiku, we don’t handle this in any way yet. This isn’t a problem yet because we flush the whole TLB (by reloading CR3) on every context switch, but it will become one for VMs with more than one CPU, since remote TLBs need to be invalidated too. Both the NetBSD and the DragonFlyBSD solutions should be easy to add to the brand-new X86GPAtoHPATranslationMap.
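A minimal sketch of the two approaches described above, side by side on a hypothetical simplified translation map; a real port would likely pick one. The structure and function names are made up for illustration, INVEPT is only modelled as a counter, and a real implementation would also have to reach other physical CPUs running the guest (for example with IPIs).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified GPA->HPA translation map. */
typedef struct gpa_map {
	/* NetBSD-style: callback invoked whenever mappings change. */
	void		(*tlb_flush_cb)(struct gpa_map *map, void *arg);
	void		*cb_arg;
	/* DragonFlyBSD-style: generation bumped whenever a flush is needed. */
	uint64_t	flush_gen;
} gpa_map;

/* Called by the map code after unmapping a guest page or changing its
 * protection. */
void
gpa_map_changed(gpa_map *map)
{
	map->flush_gen++;
	if (map->tlb_flush_cb != NULL)
		map->tlb_flush_cb(map, map->cb_arg);
}

/* Hypothetical per-VCPU state checked right before VM entry. */
typedef struct vcpu_tlb_state {
	uint64_t	seen_gen;	/* map generation at the last flush */
	bool		want_flush;	/* set by the callback */
	unsigned	invept_count;	/* flushes performed (for illustration) */
} vcpu_tlb_state;

/* NetBSD-style callback: just mark the VCPU as needing a flush. */
void
vcpu_flush_cb(gpa_map *map, void *arg)
{
	(void)map;
	((vcpu_tlb_state *)arg)->want_flush = true;
}

/* Before VM entry: flush if either mechanism says the EPT changed. */
void
vcpu_before_entry(vcpu_tlb_state *v, const gpa_map *map)
{
	if (v->want_flush || v->seen_gen != map->flush_gen) {
		v->invept_count++;	/* stands in for INVEPT */
		v->seen_gen = map->flush_gen;
		v->want_flush = false;
	}
}
```

The callback pushes the information to the VCPU as soon as the map changes, while the generation counter lets each VCPU pull it lazily at its next VM entry; either way, the flush happens before the guest can use a stale translation.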

Conclusion

Throughout this project I’ve learned a lot about VMX and virtualization in general. I’ve also learned about paging and gained some hands-on experience doing kernel development on a well-established project.

I have also learned more about assembly and debugging (GDB scripts).

Acknowledgements

This awesome summer wouldn’t have been possible without my awesome mentors: waddlesplash and scottmc. Thank you!

I also want to thank Pulkomandy for reviewing my proposal draft back in March, and trungnt2910 for fixing his GDB port so that it could debug QEMU. And of course, thanks to all the people in the community who have shown interest in this project on the forum and IRC.

I can’t believe it’s over. Time flies when you’re having fun!