Using the KVM API
Many developers, users, and entire industries rely on virtualization, as provided by software like Xen, QEMU/KVM, or kvmtool. While QEMU can run a software-based virtual machine, and Xen can run cooperating paravirtualized OSes without hardware support, most current uses and deployments of virtualization rely on hardware-accelerated virtualization, as provided on many modern hardware platforms. Linux supports hardware virtualization via the Kernel Virtual Machine (KVM) API. In this article, we'll take a closer look at the KVM API, using it to directly set up a virtual machine without using any existing virtual machine implementation.
A virtual machine using KVM need not run a complete operating system or emulate a full suite of hardware devices. Using the KVM API, a program can run code inside a sandbox and provide arbitrary virtual hardware interfaces to that sandbox. If you want to emulate anything other than standard hardware, or run anything other than a standard operating system, you'll need to work with the KVM API used by virtual machine implementations. As a demonstration that KVM can run more (or less) than just a complete operating system, we'll instead run a small handful of instructions that simply compute 2+2 and print the result to an emulated serial port.
The KVM API provides an abstraction over the hardware-virtualization features of various platforms. However, any software making use of the KVM API still needs to handle certain machine-specific details, such as processor registers and expected hardware devices. For the purposes of this article, we'll set up an x86 virtual machine using Intel VT. For another platform, you'd need to handle different registers, different virtual hardware, and different expectations about memory layout and initial state.
The Linux kernel includes documentation of the KVM API in Documentation/virtual/kvm/api.txt and other files in the Documentation/virtual/kvm/ directory.
This article includes snippets of sample code from a fully functional sample program (MIT licensed). The program makes extensive use of the err() and errx() functions for error handling; however, the snippets quoted in the article only include non-trivial error handling.
Definition of the sample virtual machine
A full virtual machine using KVM typically emulates a variety of virtual hardware devices and firmware functionality, as well as a potentially complex initial state and initial memory contents. For our sample virtual machine, we'll run the following 16-bit x86 code:
mov $0x3f8, %dx
add %bl, %al
add $'0', %al
out %al, (%dx)
mov $'\n', %al
out %al, (%dx)
hlt
These instructions will add the initial contents of the al and bl registers (which we will pre-initialize to 2), convert the resulting sum (4) to ASCII by adding '0', output it to a serial port at 0x3f8 followed by a newline, and then halt.
Rather than reading code from an object file or executable, we'll pre-assemble these instructions (via gcc and objdump) into machine code stored in a static array:
const uint8_t code[] = {
    0xba, 0xf8, 0x03, /* mov $0x3f8, %dx */
    0x00, 0xd8,       /* add %bl, %al */
    0x04, '0',        /* add $'0', %al */
    0xee,             /* out %al, (%dx) */
    0xb0, '\n',       /* mov $'\n', %al */
    0xee,             /* out %al, (%dx) */
    0xf4,             /* hlt */
};
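To make the byte encoding concrete, here is a minimal sketch of a software interpreter for exactly these six opcodes. It is not how KVM executes the code (the hardware runs it directly), and toy_run() and its parameters are hypothetical names introduced only for this illustration. Instead of printing, it captures the bytes written to port 0x3f8 into a caller-supplied buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy interpreter for the six opcodes used by the sample code.
 * al and bl are the initial register values. Bytes written to port
 * 0x3f8 are stored in out[]. Returns the number of bytes captured
 * when hlt is reached, or -1 on anything this sketch does not handle.
 */
int toy_run(const uint8_t *prog, size_t len, uint8_t al, uint8_t bl,
            char *out, size_t out_len)
{
    uint16_t dx = 0;
    size_t ip = 0;   /* index into prog, not a guest address */
    size_t n = 0;    /* bytes captured so far */

    while (ip < len) {
        uint8_t op = prog[ip++];

        switch (op) {
        case 0xba: /* mov $imm16, %dx; the immediate is little-endian */
            if (len - ip < 2)
                return -1;
            dx = (uint16_t)(prog[ip] | (prog[ip + 1] << 8));
            ip += 2;
            break;
        case 0x00: /* add r8 to r/m8; only ModRM 0xd8 (add %bl, %al) */
            if (ip >= len || prog[ip++] != 0xd8)
                return -1;
            al = (uint8_t)(al + bl);
            break;
        case 0x04: /* add $imm8, %al */
            if (ip >= len)
                return -1;
            al = (uint8_t)(al + prog[ip++]);
            break;
        case 0xb0: /* mov $imm8, %al */
            if (ip >= len)
                return -1;
            al = prog[ip++];
            break;
        case 0xee: /* out %al, (%dx); only the serial port is emulated */
            if (dx != 0x3f8 || n >= out_len)
                return -1;
            out[n++] = (char)al;
            break;
        case 0xf4: /* hlt */
            return (int)n;
        default:
            return -1;
        }
    }
    return -1; /* ran off the end without reaching hlt */
}
```

Called on the array above with al and bl both set to 2, this sketch captures the two bytes '4' and '\n', matching what the real VM is expected to write to the serial port.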
For our initial state, we will preload this code into the second page of guest "physical" memory (to avoid conflicting with a non-existent real-mode interrupt descriptor table at address 0). al and bl will contain 2, the code segment (cs) will have a base of 0, and the instruction pointer (ip) will point to the start of the second page at 0x1000.
Rather than the extensive set of virtual hardware typically provided by a virtual machine, we'll emulate only a trivial serial port on port 0x3f8.
Finally, note that running 16-bit real-mode code with hardware VT support requires a processor with "unrestricted guest" support. The original VT implementations only supported protected mode with paging enabled; emulators like QEMU thus had to handle virtualization in software until reaching a paged protected mode (typically after OS boot), then feed the virtual system state into KVM to start doing hardware emulation. However, processors from the "Westmere" generation and newer support "unrestricted guest" mode, which adds hardware support for emulating 16-bit real mode, "big real mode", and protected mode without paging. The Linux KVM subsystem has supported the "unrestricted guest" feature since Linux 2.6.32 in June 2009.
Building a virtual machine
First, we'll need to open /dev/kvm:
kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
We need read-write access to the device to set up a virtual machine, and all opens not explicitly intended for inheritance across exec should use O_CLOEXEC.
Depending on your system, you likely have access to /dev/kvm either via a group named "kvm" or via an access control list (ACL) granting access to users logged in at the console.
Before you use the KVM API, you should make sure you have a version you can work with. Early versions of KVM had an unstable API with an increasing version number, but the KVM_API_VERSION last changed to 12 with Linux 2.6.22 in April 2007, and got locked to that as a stable interface in 2.6.24; since then, KVM API changes occur only via backward-compatible extensions (like all other kernel APIs). So, your application should first confirm that it has version 12, via the KVM_GET_API_VERSION ioctl():
ret = ioctl(kvm, KVM_GET_API_VERSION, NULL);
if (ret == -1)
    err(1, "KVM_GET_API_VERSION");
if (ret != 12)
    errx(1, "KVM_GET_API_VERSION %d, expected 12", ret);
After checking the version, you may want to check for any extensions you use, using the KVM_CHECK_EXTENSION ioctl(). However, for extensions that add new ioctl() calls, you can generally just call the ioctl(), which will fail with an error (ENOTTY) if it does not exist.
If we wanted to check for the one extension we use in this sample program, KVM_CAP_USER_MEMORY (required to set up guest memory via the KVM_SET_USER_MEMORY_REGION ioctl()), that check would look like this:
ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY);
if (ret == -1)
    err(1, "KVM_CHECK_EXTENSION");
if (!ret)
    errx(1, "Required extension KVM_CAP_USER_MEMORY not available");
Next, we need to create a virtual machine (VM), which represents everything associated with one emulated system, including memory and one or more CPUs. KVM gives us a handle to this VM in the form of a file descriptor:
vmfd = ioctl(kvm, KVM_CREATE_VM, (unsigned long)0);
The VM will need some memory, which we provide in pages. This corresponds to the "physical" address space as seen by the VM. For performance, we wouldn't want to trap every memory access and emulate it by returning the corresponding data; instead, when a virtual CPU attempts to access memory, the hardware virtualization for that CPU will first try to satisfy that access via the memory pages we've configured. If that fails (due to the VM accessing a "physical" address without memory mapped to it), the kernel will then let the user of the KVM API handle the access, such as by emulating a memory-mapped I/O device or generating a fault.
For our simple example, we'll allocate a single page of memory to hold our code, using mmap() directly to obtain page-aligned zero-initialized memory:
mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
We then need to copy our machine code into it:
memcpy(mem, code, sizeof(code));
And finally tell the KVM virtual machine about its spacious new 4096-byte memory:
struct kvm_userspace_memory_region region = {
    .slot = 0,
    .guest_phys_addr = 0x1000,
    .memory_size = 0x1000,
    .userspace_addr = (uint64_t)mem,
};
ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);
The slot field provides an integer index identifying each region of memory we hand to KVM; calling KVM_SET_USER_MEMORY_REGION again with the same slot will replace this mapping, while calling it with a new slot will create a separate mapping. guest_phys_addr specifies the base "physical" address as seen from the guest, and userspace_addr points to the backing memory in our process that we allocated with mmap(); note that these always use 64-bit values, even on 32-bit platforms. memory_size specifies how much memory to map: one page, 0x1000 bytes.
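The relationship between guest_phys_addr, memory_size, and userspace_addr amounts to a simple offset translation. The following sketch shows that arithmetic in user space; struct mem_region and gpa_to_host() are hypothetical names that only mirror the fields described above and are not part of the KVM API. A NULL result corresponds to the case where the guest touches a "physical" address with no memory behind it, which KVM hands back to user space rather than satisfying from our mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields passed to KVM_SET_USER_MEMORY_REGION. */
struct mem_region {
    uint64_t guest_phys_addr; /* base address as seen by the guest */
    uint64_t memory_size;     /* size of the region in bytes */
    void *host_base;          /* the pointer we passed as userspace_addr */
};

/*
 * Return the host pointer backing guest-physical range [gpa, gpa + len),
 * or NULL if any part of that range falls outside the region.
 */
void *gpa_to_host(const struct mem_region *r, uint64_t gpa, uint64_t len)
{
    uint64_t off;

    if (gpa < r->guest_phys_addr)
        return NULL;
    off = gpa - r->guest_phys_addr;
    /* Written this way to avoid overflow in off + len. */
    if (off > r->memory_size || len > r->memory_size - off)
        return NULL;
    return (uint8_t *)r->host_base + off;
}
```

With the region from this article (base 0x1000, size 0x1000), guest address 0x1000 maps to the first byte of mem and 0x1fff to the last, while 0x0 and 0x2000 have no backing memory.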
Now that we have a VM, with memory containing code to run, we need to create a virtual CPU to run that code. A KVM virtual CPU represents the state of one emulated CPU, including processor registers and other execution state. Again, KVM gives us a handle to this VCPU in the form of a file descriptor:
vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, (unsigned long)0);
The 0 here represents a sequential virtual CPU index. A VM with multiple CPUs would assign a series of small identifiers here, from 0 to a system-specific limit (obtainable by checking the KVM_CAP_MAX_VCPUS capability with KVM_CHECK_EXTENSION).
Each virtual CPU has an associated struct kvm_run data structure, used to communicate information about the CPU between the kernel and user space. In particular, whenever hardware virtualization stops (called a "vmexit"), such as to emulate some virtual hardware, the kvm_run structure will contain information about why it stopped. We map this structure into user space using mmap(), but first, we need to know how much memory to map, which KVM tells us with the KVM_GET_VCPU_MMAP_SIZE ioctl():
mmap_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
Note that the mmap size typically exceeds that of the kvm_run structure, as the kernel will also use that space to store other transient structures that kvm_run may point to.
Now that we have the size, we can mmap() the kvm_run structure:
run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpufd, 0);
The VCPU also includes the processor's register state, broken into two sets of registers: standard registers and "special" registers. These correspond to two architecture-specific data structures: struct kvm_regs and struct kvm_sregs, respectively. On x86, the standard registers include general-purpose registers, as well as the instruction pointer and flags; the "special" registers primarily include segment registers and control registers.
Before we can run code, we need to set up the initial states of these sets of registers. Of the "special" registers, we only need to change the code segment (cs); its default state (along with the initial instruction pointer) points to the reset vector at 16 bytes below the top of memory, but we want cs to point to 0 instead. Each segment in kvm_sregs includes a full segment descriptor; we don't need to change the various flags or the limit, but we zero the base and selector fields which together determine what address in memory the segment points to. To avoid changing any of the other initial "special" register states, we read them out, change cs, and write them back:
ioctl(vcpufd, KVM_GET_SREGS, &sregs);
sregs.cs.base = 0;
sregs.cs.selector = 0;
ioctl(vcpufd, KVM_SET_SREGS, &sregs);
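As a worked example of why cs matters, the address the CPU fetches from is the segment base plus the instruction pointer. The sketch below assumes the architectural x86 reset values (cs base 0xffff0000, ip 0xfff0), which the article describes only as "16 bytes below the top of memory"; the function names are hypothetical and exist just to spell out the arithmetic.

```c
#include <stdint.h>

/* Address the CPU fetches from: segment base plus instruction pointer. */
uint64_t fetch_address(uint64_t cs_base, uint64_t ip)
{
    return cs_base + ip;
}

/* In plain real mode, loading a selector sets the base to selector * 16. */
uint64_t real_mode_base(uint16_t selector)
{
    return (uint64_t)selector << 4;
}
```

With the assumed reset state, fetch_address(0xffff0000, 0xfff0) gives 0xfffffff0, which is 16 bytes below 4GB and has no memory behind it in our VM. With cs zeroed and ip set to 0x1000, the result is 0x1000, the start of our one page of code. Zeroing the selector along with the base keeps the two consistent, since real_mode_base(0) is 0.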
For the standard registers, we set most of them to 0, other than our initial instruction pointer (pointing to our code at 0x1000, relative to cs at 0), our addends (2 and 2), and the initial state of the flags (specified as 0x2 by the x86 architecture; starting the VM will fail with this not set):
struct kvm_regs regs = {
    .rip = 0x1000,
    .rax = 2,
    .rbx = 2,
    .rflags = 0x2,
};
ioctl(vcpufd, KVM_SET_REGS, &regs);
With our VM and VCPU created, our memory mapped and initialized, and our initial register states set, we can now start running instructions with the VCPU, using the KVM_RUN ioctl(). That will return successfully each time virtualization stops, such as for us to emulate hardware, so we'll run it in a loop:
while (1) {
    ioctl(vcpufd, KVM_RUN, NULL);
    switch (run->exit_reason) {
    /* Handle exit */
    }
}
Note that KVM_RUN runs the VM in the context of the current thread and doesn't return until emulation stops. To run a multi-CPU VM, the user-space process must spawn multiple threads, and call KVM_RUN for different virtual CPUs in different threads.
To handle the exit, we check run->exit_reason to see why we exited. This can contain any of several dozen exit reasons, which correspond to different branches of the union in kvm_run. For this simple VM, we'll just handle a few of them, and treat any other exit_reason as an error.
We treat a hlt instruction as a sign that we're done, since we have nothing to ever wake us back up:
case KVM_EXIT_HLT:
    puts("KVM_EXIT_HLT");
    return 0;
To let the virtualized code output its result, we emulate a serial port on I/O port 0x3f8. Fields in run->io indicate the direction (input or output), the size (1, 2, or 4), the port, and the number of values. To pass the actual data, the kernel uses a buffer mapped after the kvm_run structure, and run->io.data_offset provides the offset from the start of that mapping.
case KVM_EXIT_IO:
    if (run->io.direction == KVM_EXIT_IO_OUT &&
            run->io.size == 1 &&
            run->io.port == 0x3f8 &&
            run->io.count == 1)
        putchar(*(((char *)run) + run->io.data_offset));
    else
        errx(1, "unhandled KVM_EXIT_IO");
    break;
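The handler above rejects any exit with a count other than 1. A slightly more general sketch, assuming that the size * count bytes of an output exit sit contiguously at data_offset within the mapped area, could copy them out with a bounds check against the mapping size. The copy_io_out() helper and its parameters are hypothetical and take plain values rather than a struct kvm_run pointer so that the arithmetic stands alone.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the data of one output exit into dst.
 * run_base: start of the mmap()ed kvm_run area; map_len: its length
 * (the value returned by KVM_GET_VCPU_MMAP_SIZE).
 * Returns the number of bytes copied, or -1 if the data would fall
 * outside the mapping or would not fit in dst.
 */
long copy_io_out(const void *run_base, size_t map_len, uint64_t data_offset,
                 uint8_t size, uint32_t count, uint8_t *dst, size_t dst_len)
{
    uint64_t total = (uint64_t)size * count;

    if (data_offset > map_len || total > map_len - data_offset)
        return -1;
    if (total > dst_len)
        return -1;
    memcpy(dst, (const uint8_t *)run_base + data_offset, total);
    return (long)total;
}
```

In the article's loop, the arguments would come from run, mmap_size, and the run->io fields; the single-byte case then reduces to the putchar() call shown above.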
To make it easier to debug the process of setting up and running the VM, we handle a few common kinds of errors. KVM_EXIT_FAIL_ENTRY, in particular, shows up often when changing the initial conditions of the VM; it indicates that the underlying hardware virtualization mechanism (VT in this case) can't start the VM because the initial conditions don't match its requirements. (Among other reasons, this error will occur if the flags register does not have bit 0x2 set, or if the initial values of the segment or task-switching registers fail various setup criteria.) The hardware_entry_failure_reason does not actually distinguish many of those cases, so an error of this type typically requires a careful read through the hardware documentation.
case KVM_EXIT_FAIL_ENTRY:
    errx(1, "KVM_EXIT_FAIL_ENTRY: hardware_entry_failure_reason = 0x%llx",
         (unsigned long long)run->fail_entry.hardware_entry_failure_reason);
KVM_EXIT_INTERNAL_ERROR indicates an error from the Linux KVM subsystem rather than from the hardware. In particular, under various circumstances, the KVM subsystem will emulate one or more instructions in the kernel rather than via hardware, such as for performance reasons (to coalesce a series of vmexits for I/O). The run->internal.suberror value KVM_INTERNAL_ERROR_EMULATION indicates that the VM encountered an instruction it doesn't know how to emulate, which most commonly indicates an invalid instruction.
case KVM_EXIT_INTERNAL_ERROR:
    errx(1, "KVM_EXIT_INTERNAL_ERROR: suberror = 0x%x",
         run->internal.suberror);
When we put all of this together into the sample code, build it, and run it, we get the following:
$ ./kvmtest
4
KVM_EXIT_HLT
Success! We ran our machine code, which added 2+2, turned it into an ASCII 4, and wrote it to port 0x3f8. This caused the KVM_RUN ioctl() to stop with KVM_EXIT_IO, which we emulated by printing the 4. We then looped and re-entered KVM_RUN, which stops with KVM_EXIT_IO again for the \n. On the third and final loop, KVM_RUN stops with KVM_EXIT_HLT, so we print a message and quit.
Additional KVM API features
This sample virtual machine demonstrates the core of the KVM API, but ignores several other major areas that many non-trivial virtual machines will care about.
Prospective implementers of memory-mapped I/O devices will want to look at the exit_reason KVM_EXIT_MMIO, as well as the KVM_CAP_COALESCED_MMIO extension to reduce vmexits, and the ioeventfd mechanism to process I/O asynchronously without a vmexit.
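As a rough sketch of what handling KVM_EXIT_MMIO involves, the code below emulates a hypothetical 8-byte scratch register at an arbitrarily chosen guest-physical address. struct mmio_exit is a stand-in that mirrors the phys_addr, data, len, and is_write fields of run->mmio; SCRATCH_BASE, the device itself, and handle_mmio() are assumptions made for illustration only.

```c
#include <stdint.h>
#include <string.h>

#define SCRATCH_BASE 0x10000000ULL /* hypothetical device address */

/* Stand-in mirroring the fields of run->mmio. */
struct mmio_exit {
    uint64_t phys_addr; /* guest-physical address accessed */
    uint8_t data[8];    /* data written, or buffer to fill for a read */
    uint32_t len;       /* access width in bytes */
    uint8_t is_write;   /* nonzero for a write */
};

static uint8_t scratch[8]; /* backing store of the toy device */

/* Returns 0 if the access hit the device, -1 if it is not ours. */
int handle_mmio(struct mmio_exit *m)
{
    uint64_t off;

    if (m->phys_addr < SCRATCH_BASE || m->len > sizeof(scratch))
        return -1;
    off = m->phys_addr - SCRATCH_BASE;
    if (off > sizeof(scratch) - m->len)
        return -1;

    if (m->is_write)
        memcpy(scratch + off, m->data, m->len);
    else
        memcpy(m->data, scratch + off, m->len); /* guest sees this on re-entry */
    return 0;
}
```

For a read, user space fills in the data buffer and the guest receives the value when KVM_RUN is called again. An access the handler does not recognize would normally be treated as an error or ignored, depending on the machine being emulated.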
For hardware interrupts, see the irqfd mechanism, using the KVM_CAP_IRQFD extension capability. This provides a file descriptor that can inject a hardware interrupt into the KVM virtual machine without stopping it first. A virtual machine may thus write to this from a separate event loop or device-handling thread, and threads running KVM_RUN for a virtual CPU will process that interrupt at the next available opportunity.
x86 virtual machines will likely want to support CPUID and model-specific registers (MSRs), both of which have architecture-specific ioctl()s that minimize vmexits.
Applications of the KVM API
Other than learning, debugging a virtual machine implementation, or as a party trick, why use /dev/kvm directly?
Virtual machines like qemu-kvm or kvmtool typically emulate the standard hardware of the target architecture; for instance, a standard x86 PC. While they can support other devices and virtio hardware, if you want to emulate a completely different type of system that shares little more than the instruction set architecture, you might want to implement a new VM instead. And even within an existing virtual machine implementation, authors of a new class of virtio hardware device will want a clear understanding of the KVM API.
Efforts like novm and kvmtool use the KVM API to construct a lightweight VM, dedicated to running Linux rather than an arbitrary OS. More recently, the Clear Containers project uses kvmtool to run containers using hardware virtualization.
Alternatively, a VM need not run an OS at all. A KVM-based VM could instead implement a hardware-assisted sandbox with no virtual hardware devices and no OS, providing arbitrary virtual "hardware" devices as the API between the sandbox and the sandboxing VM.
While running a full virtual machine remains the primary use case for hardware virtualization, we've seen many innovative uses of the KVM API recently, and we can certainly expect more in the future.
Index entries for this article | |
---|---|
GuestArticles | Triplett, Josh |
Posted Sep 29, 2015 18:30 UTC (Tue)
by flewellyn (subscriber, #5047)
[Link] (2 responses)
The kernel API looks pretty low-level. I can see why something like libvirt would be needed to abstract it a bit.
Posted Sep 29, 2015 18:38 UTC (Tue)
by luto (subscriber, #39314)
[Link] (1 responses)
Posted Sep 29, 2015 18:45 UTC (Tue)
by josh (subscriber, #17465)
[Link]
Posted Sep 29, 2015 19:15 UTC (Tue)
by eru (subscriber, #2753)
[Link] (23 responses)
Posted Sep 29, 2015 19:40 UTC (Tue)
by josh (subscriber, #17465)
[Link] (22 responses)
Posted Sep 29, 2015 21:18 UTC (Tue)
by pbonzini (subscriber, #60935)
[Link] (21 responses)
But KVM is not really needed in this case: on one hand you don't need near bare-metal performance that KVM provides, because dosemu/dosbox only need to emulate a 100 MHz machine or so, and a simple interpreter or a JIT compiler like QEMU's can handle it (QEMU is known as slowish for a JIT translator, but there's some work being done on that side as well). On the other hand KVM's performance comes with some fine print, which you cannot really afford in the case of dosemu/dosbox. A KVM_EXIT_IO exit is very slow, on the order of a few thousand cycles on the newest processors. By comparison, QEMU can dispatch a single memory-mapped I/O operation in about 100 clock cycles, so 60-150 times faster than KVM. Hence running demos like Unreal (https://www.youtube.com/watch?v=VjYQeMExIwk#t=7m) doesn't work too well on QEMU with KVM because they do an insane number of such exits.
To play old games (man, I should send those Jazz Jackrabbit patches upstream...) I typically use QEMU without KVM.
Posted Sep 29, 2015 21:26 UTC (Tue)
by josh (subscriber, #17465)
[Link] (5 responses)
True. Out of curiosity, does any means exist to turn that *off*? I have some interest in compiling out most or all of the in-kernel instruction emulation, to reduce attack surface area.
> A KVM_EXIT_IO exit is very slow, on the order of a few thousand cycles on the newest processors. By comparison, QEMU can dispatch a single memory-mapped I/O operation in about 100 clock cycles, so 60-150 times faster than KVM.
What about with coalesced or fd-ed I/O?
Posted Sep 30, 2015 13:45 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link] (4 responses)
With unrestricted_guest=1 you only exit to the emulator for a few privileged instructions (where for simplicity KVM emulates them instead of having a mini-interpreter in vmx.c/svm.c) and for I/O. But unfortunately, thanks to the x86 ISA's read-modify-write instructions that's still a _lot_ of different instructions that you can emulate.
So there's not much that you can compile out. You could simply modify KVM to refuse loading if unrestricted_guest=0, but you can still trigger any bit of emulator code by setting up a race between two VCPUs. One triggers I/O continuously, the other races against the emulator changing the opcodes of the I/O instruction into something else. This actually used to be a vulnerability, but it's been patched for several years and the emulator is now considered a security sensitive component.
> > A KVM_EXIT_IO exit is very slow, on the order of a few thousand cycles on the
Still around 1500-2000 cycles. For ioeventfd you have to add the latency of waking up the I/O thread if it's sleeping (but if the fd is really busy, e.g. running fio in the guest, it won't have time to go to sleep).
Posted Sep 30, 2015 17:53 UTC (Wed)
by josh (subscriber, #17465)
[Link] (3 responses)
How much *minimum* latency comes from the vmexit, and how much gets added by the path from the in-kernel vmexit handling and whatever mechanism it uses to contact the I/O thread? If much of it comes from the latter, perhaps we could find a way to accelerate that via another (latency-optimized) interface.
Posted Oct 1, 2015 7:17 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link] (2 responses)
Posted Oct 1, 2015 15:50 UTC (Thu)
by josh (subscriber, #17465)
[Link] (1 responses)
Posted Oct 1, 2015 16:02 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
Posted Sep 30, 2015 3:42 UTC (Wed)
by voltagex (subscriber, #86296)
[Link] (1 responses)
Yes, yes you should. By the way, can you still buy that game? That and OMF 2097 are my favourites of all time.
Posted Sep 30, 2015 13:46 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link]
Posted Sep 30, 2015 6:17 UTC (Wed)
by eru (subscriber, #2753)
[Link] (7 responses)
Probably true for old games, but the situation I am thinking of involves using ancient cross-compilers to compile large masses of legacy code for a weird environment that still has to be maintained. One would think (and I did think) this is an I/O-bound operation, but it turned out the speed difference between dosemu with VM86 and dosemu on x86_64 with emulation is very noticeable (order of magnitude for large inputs). On the other hand, dosemu also has advantages, because it can run "headless", can easily access native files, and starts up quickly. These are important features, because the ancient compilers are wrapped in layers that hide their MS-DOS internals, so from the Linux user's point of view they act like normal command-line tools.
Posted Sep 30, 2015 13:46 UTC (Wed)
by pbonzini (subscriber, #60935)
[Link] (6 responses)
Posted Sep 30, 2015 15:28 UTC (Wed)
by eru (subscriber, #2753)
[Link] (5 responses)
Posted Sep 30, 2015 17:15 UTC (Wed)
by felix.s (guest, #104710)
[Link] (4 responses)
Also, funnily enough, some time ago I was working on a DOS/BIOS ABI layer based on KVM (I tried to make backends interchangeable, but I'm not sure how well I've succeeded), and I think it would be ideal for the use case you describe. I even managed to include a simplistic packet driver, so I can use the FDNPKG package manager to download programs to test. However, the code is currently such a mess that I'm too embarrassed to publish it. Maybe some day...
Posted Sep 30, 2015 17:34 UTC (Wed)
by josh (subscriber, #17465)
[Link] (3 responses)
Posted Sep 30, 2015 17:49 UTC (Wed)
by kvaneesh (subscriber, #45646)
[Link] (2 responses)
Posted Oct 1, 2015 6:29 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (1 responses)
Now, I get that it's probably a configuration issue since it clearly wasn't caching anything, but I found it really hard to find documentation about qemu that explained this behaviour. On top of that I'm managing them via libvirt, so even if I find a command-line option to deal with something, if libvirt doesn't support it I'm still SOL.
Overall, it hasn't been a great experience; next time I'll probably do what other people do and use VirtualBox or VMware.
But back to the article, it's a pretty nice interface actually. Hopefully I'll find some reason to use it sometime :)
Posted Oct 8, 2015 16:48 UTC (Thu)
by LightDot (guest, #73140)
[Link]
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
The VNC option doesn't need to be presented as a command line, I just left it as an example.
Posted Sep 30, 2015 21:16 UTC (Wed)
by luto (subscriber, #39314)
[Link] (1 responses)
Please tell me that this is at least *guest* vm86 mode and not host vm86 mode.
Also, why does it care how the guest->host physical mappings are set up?
Posted Oct 1, 2015 7:21 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
> Also, why does it care how the guest->host physical mappings are set up?
Can you explain your question better? Who is the subject?
Posted Sep 30, 2015 23:04 UTC (Wed)
by josh (subscriber, #17465)
[Link] (2 responses)
Posted Oct 1, 2015 8:21 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link] (1 responses)
If you don't call KVM_SET_TSS_ADDR you actually get a complaint in dmesg, and the TR stays at 0. I am not really sure what kind of bad things can happen with unrestricted_guest=0, probably you just get a VM Entry failure. The TSS takes 3 pages of memory. An interesting point is that you actually don't need to set the TR selector to a valid value (as you would do when running in "normal" vm86 mode), you can simply set the base and limit registers that are hidden in the processor, and generally inaccessible except through VMREAD/VMWRITE or system management mode. So KVM needs to set up a TSS but not a GDT.
For paging, instead, 1 page is enough because we have only 4GB of memory to address. KVM disables CR4.PAE (physical address extension, aka 8-byte entries in each page directory or page table) and enables CR4.PSE (page size extensions, aka 4MB huge pages support with 4-byte page directory entries). One page then fits 1024 4-byte page directory entries, each for a 4MB huge page, totaling exactly 4GB. Here if you don't set it the page table is at address 0xFFFBC000. QEMU changes it to 0xFEFFC000 so that the BIOS can be up to 16MB in size (the default only allows 256k between 0xFFFC0000 and 0xFFFFFFFF).
The different handling, where only the page table has a default, is unfortunate, but so goes life...
Posted Oct 1, 2015 15:54 UTC (Thu)
by josh (subscriber, #17465)
[Link]
Ah, I see.
> If you don't call KVM_SET_TSS_ADDR you actually get a complaint in dmesg, and the TR stays at 0.
While I saw the mention of that message in a few places, I don't actually get that message at any point. Presumably that only happens with unrestricted_guest=0?
Please consider documenting the use of these two ioctls and the data they point to, as well as what circumstances require them; the current KVM documentation doesn't mention any of that.
Posted Sep 29, 2015 19:16 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Sep 29, 2015 20:17 UTC (Tue)
by drag (guest, #31333)
[Link]
Previously you had kernel mode and then you had usermode ways to execute code. Now you have a third way, the kvm way. There isn't much to emulate because it's not using emulation at all. It's just executing the code directly on your processor. The only really magic/complicated stuff that happens is to deal with memory addressing for the guest OS.
Qemu does all the heavy lifting as far as 'system emulator' goes.
Posted Oct 1, 2015 0:22 UTC (Thu)
by rahvin (guest, #16953)
[Link]
Posted Oct 1, 2015 15:39 UTC (Thu)
by sjj (guest, #2020)
[Link] (2 responses)
I generally don't run non-Linux VMs, so these would be nice. Every time I see a cloud image pull in some firmware packages I cry a little inside.
Posted Oct 1, 2015 15:58 UTC (Thu)
by josh (subscriber, #17465)
[Link] (1 responses)
True, novm appears somewhat stale, and I don't know if it sees active use. I mentioned it only as an example of an interesting project using KVM directly.
> kvmtool that I found doesn't seem all that active either.
It does most of what people using it want; it tends to grow new features as the KVM API does, grow new architectures when KVM supports them and someone wants to run kvmtool on them, and occasionally gain new drivers. I'd recommend trying kvmtool in its current state; it works quite well.
Posted Oct 4, 2015 4:10 UTC (Sun)
by sjj (guest, #2020)
[Link]
Posted Oct 2, 2015 10:13 UTC (Fri)
by lkundrak (subscriber, #43452)
[Link]
Posted Oct 2, 2015 14:36 UTC (Fri)
by kjp (guest, #39639)
[Link] (1 responses)
Posted Oct 2, 2015 18:54 UTC (Fri)
by josh (subscriber, #17465)
[Link]
Nothing better than lsof or equivalent.
> Is there a limit as to how many can be created on the system total,
As far as I know, no limit on the number of VMs other than available memory, but at the moment some limits exist on the number of VCPUs per VM. Those limits seem fixable, not inherent, though.
> and how to enumerate all of the existing ones?
Not as far as I know.
Posted Oct 10, 2015 21:00 UTC (Sat)
by alison (subscriber, #63752)
[Link] (2 responses)
Is anyone among the readers familiar enough with ARM to comment on how similar an implementation of a simple VM there would be? Obviously an ARM version would not employ Intel VT, and ARM ISA has its own notion of privilege levels, to begin with.
> Other than learning, debugging a virtual machine implementation, or as a party trick, why use
Please invite me to your parties.
Posted Oct 29, 2015 2:52 UTC (Thu)
by yehuday (guest, #93707)
[Link] (1 responses)
There is a great article from Christoffer Dall et al. explaining how KVM was ported to ARM
Posted Nov 19, 2015 23:00 UTC (Thu)
by marcH (subscriber, #57642)
[Link]
1972, 2005, 2014,...
In some respects computers can be quite depressing: mostly re-inventing the old https://en.wikipedia.org/wiki/Hardware-assisted_virtualiz...
Sure, they're now much faster... to crash.
(on a more positive note the article is really good)
Posted Oct 4, 2016 2:24 UTC (Tue)
by sahil (guest, #100553)
[Link] (4 responses)
Posted Oct 4, 2016 4:15 UTC (Tue)
by zlynx (guest, #2285)
[Link] (3 responses)
See COM on Wikipedia.
Posted Oct 4, 2016 6:28 UTC (Tue)
by sahil (guest, #100553)
[Link] (2 responses)
Posted Oct 4, 2016 7:01 UTC (Tue)
by jem (subscriber, #24231)
[Link] (1 responses)
You should search the IBM PC reference manual instead.
Posted Oct 4, 2016 16:51 UTC (Tue)
by sahil (guest, #100553)
[Link]
Posted Nov 19, 2019 11:00 UTC (Tue)
by dizz (guest, #134779)
[Link] (1 responses)
[1] https://github.com/rust-vmm
Posted Feb 18, 2020 16:31 UTC (Tue)
by josh (subscriber, #17465)
[Link]
I gave a talk on the /dev/kvm API, Rust, and rust-vmm, at BangBangCon: https://www.youtube.com/watch?v=A_diEEpAfpM
> in compiling out most or all of the in-kernel instruction emulation, to reduce attack
> surface area.
> > newest processors. By comparison, QEMU can dispatch a single memory-mapped I/O
> > operation in about 100 clock cycles, so 60-150 times faster than KVM.
>
> What about with coalesced or fd-ed I/O?
on one hand you don't need near bare-metal performance that KVM provides, because dosemu/dosbox only need to emulate a 100 MHz machine or so, and a simple interpreter or a JIT compiler like QEMU's can handle it
I probably should look into qemu again some day. One problem is file system access. As noted, I want the MS-DOS compilers to transparently compile sources in the Linux file system and write the objects there, and preferably without having to install any network support in the emulated MS-DOS or FreeDOS, so to leave maximum "real" memory for the compilers. Both dosemu and dosbox handle this requirement.
...
<qemu:commandline>
<qemu:arg value='-vnc'/>
<qemu:arg value=':30,tls'/>
<qemu:arg value='-k'/>
<qemu:arg value='fr'/>
<qemu:arg value='-no-fd-bootchk'/>
</qemu:commandline>
</domain>
novm / kvmtool still alive?
Using the KVM API with ARM?
> different registers, different virtual hardware, and different expectations about memory layout
> and initial state.
> /dev/kvm directly?
> and ARM ISA has its own notion of privilege levels, to begin with.
ARM added support for virtualization as an extension to the ARMv7 architecture, which later became part of ARMv8.
Hardware support for virtualization on ARM is achieved by:
1. Introducing a new CPU privilege level for hypervisors called Hyp or Exception Level 2 (EL2).
2. Adding 2nd stage translation to the CPU's MMU
3. Adding support for virtual interrupts to the generic interrupt controller (GIC)
4. Adding support for a virtual timer in the ARM architected timer
http://systems.cs.columbia.edu/files/wpid-asplos2014-kvm.pdf
Is this related to the guest physical address range, in which we have mapped the memory?
As far back as I can remember that was the serial port address. It's historical.
[2] https://github.com/intel/cloud-hypervisor
[3] https://github.com/intel/cloud-hypervisor/blob/master/vmm...