|
| 1 | +Paravirt_ops on IA64 |
| 2 | +==================== |
| 3 | + 21 May 2008, Isaku Yamahata <yamahata@valinux.co.jp> |
| 4 | + |
| 5 | + |
| 6 | +Introduction |
| 7 | +------------ |
| 8 | +The aim of this documentation is to help with maintainability and/or to |
| 9 | +encourage people to use paravirt_ops/IA64. |
| 10 | + |
| 11 | +paravirt_ops (pv_ops for short) is a mechanism for virtualization support |
| 12 | +in the Linux kernel on x86. Several approaches to virtualization support |
| 13 | +were proposed, and paravirt_ops is the one that was adopted. |
| 14 | +There are now also several IA64 virtualization technologies, such as |
| 15 | +kvm/IA64, xen/IA64 and many academic IA64 hypervisors, so it makes |
| 16 | +sense to add a generic virtualization infrastructure to |
| 17 | +Linux/IA64. |
| 18 | + |
| 19 | + |
| 20 | +What is paravirt_ops? |
| 21 | +--------------------- |
| 22 | +It has been developed on x86 as virtualization support via API, not ABI. |
| 23 | +It allows each hypervisor to override the operations which are important |
| 24 | +to it at the API level, and it allows a single kernel binary to run on |
| 25 | +all supported execution environments, including the native machine. |
| 26 | +Essentially paravirt_ops is a set of function pointers which represent |
| 27 | +operations corresponding to low level sensitive instructions and high |
| 28 | +level functionality in various areas. One significant difference |
| 29 | +from a usual function pointer table is that it allows optimization by |
| 30 | +binary patching, because some of these operations are very |
| 31 | +performance sensitive and the indirect call overhead is not negligible. |
| 32 | +With binary patching, an indirect C function call can be transformed into |
| 33 | +a direct C function call or in-place execution to eliminate the overhead. |
| 34 | + |
| 35 | +Thus, operations of paravirt_ops are classified into three categories: |
| 36 | +- simple indirect call |
| 37 | + These operations correspond to high level functionality, so the |
| 38 | + overhead of an indirect call isn't very important. |
| 39 | + |
| 40 | +- indirect call which allows optimization with binary patching |
| 41 | + Usually these operations correspond to low level instructions. They |
| 42 | + are called frequently and are performance critical, so the overhead |
| 43 | + matters a great deal. |
| 44 | + |
| 45 | +- a set of macros for hand written assembly code |
| 46 | + Hand written assembly code (.S files) also needs paravirtualization |
| 47 | + because it includes sensitive instructions or because some of its |
| 48 | + code paths are very performance critical. |
| 49 | + |
| 50 | + |
| 51 | +The relation to the IA64 machine vector |
| 52 | +--------------------------------------- |
| 53 | +Linux/IA64 has the IA64 machine vector functionality which allows the |
| 54 | +kernel to switch implementations (e.g. initialization, ipi, dma api...) |
| 55 | +depending on the platform it is running on. |
| 56 | +Some implementations can be replaced very easily by defining a new machine |
| 57 | +vector, so another approach to virtualization support would be |
| 58 | +to enhance the machine vector functionality. |
| 59 | +But the paravirt_ops approach was taken because: |
| 60 | +- virtualization support needs wider coverage than the machine vector |
| 61 | + provides, e.g. low level instruction paravirtualization. It must be |
| 62 | + initialized very early, before platform detection. |
| 63 | + |
| 64 | +- virtualization support needs more functionality, such as binary patching. |
| 65 | + The calling overhead is probably not very large compared to the |
| 66 | + emulation overhead of virtualization. In the native case, however, the |
| 67 | + overhead should be eliminated completely. |
| 68 | + A single kernel binary should run in every environment, including native, |
| 69 | + and the overhead of paravirt_ops in the native environment should be as |
| 70 | + small as possible. |
| 71 | + |
| 72 | +- for full virtualization technology, e.g. KVM/IA64 or |
| 73 | + Xen/IA64 HVM domain, the result would be |
| 74 | + (the emulated platform machine vector, probably dig) + (pv_ops). |
| 75 | + This means that the virtualization support layer should be under |
| 76 | + the machine vector layer. |
| 77 | + |
| 78 | +It might be better to move some function pointers from |
| 79 | +paravirt_ops to the machine vector. In fact, the Xen domU case uses both |
| 80 | +pv_ops and the machine vector. |
| 81 | + |
| 82 | + |
| 83 | +IA64 paravirt_ops |
| 84 | +----------------- |
| 85 | +This section discusses the concrete paravirt_ops. |
| 86 | +Because of the architectural differences between ia64 and x86, the |
| 87 | +resulting set of functions is very different from x86 pv_ops. |
| 88 | + |
| 89 | +- C function pointer tables |
| 90 | +These are not very performance critical, so a simple C indirect |
| 91 | +function call is acceptable. The following structures are defined at |
| 92 | +the moment. For details see linux/include/asm-ia64/paravirt.h |
| 93 | + - struct pv_info |
| 94 | + This structure describes the execution environment. |
| 95 | + - struct pv_init_ops |
| 96 | + This structure describes the various initialization hooks. |
| 97 | + - struct pv_iosapic_ops |
| 98 | + This structure describes hooks to iosapic operations. |
| 99 | + - struct pv_irq_ops |
| 100 | + This structure describes hooks to irq related operations. |
| 101 | + - struct pv_time_op |
| 102 | + This structure describes hooks to steal time accounting. |
| 103 | + |
| 104 | +- a set of indirect calls which need optimization |
| 105 | +Currently this class of functions corresponds to a subset of IA64 |
| 106 | +intrinsics. The optimization with binary patching isn't |
| 107 | +implemented yet. |
| 108 | +struct pv_cpu_op is defined for them. For details see |
| 109 | +linux/include/asm-ia64/paravirt_privop.h |
| 110 | +Mostly they correspond to ia64 intrinsics 1-to-1. |
| 111 | +Caveat: they are currently defined as C indirect function pointers, but in |
| 112 | +order to support binary patch optimization, they will be reimplemented |
| 113 | +using GCC extended inline assembly code. |
| 114 | + |
| 115 | +- a set of macros for hand written assembly code (.S files) |
| 116 | +For maintainability, the approach taken for .S files is to keep a single |
| 117 | +source file and compile it multiple times with different macro definitions. |
| 118 | +Each pv_ops instance must define those macros in order to compile. |
| 119 | +The important point here is that sensitive but non-privileged |
| 120 | +instructions must be paravirtualized, and that some privileged |
| 121 | +instructions also need paravirtualization for reasonable performance. |
| 122 | +Developers who modify .S files must be aware of this. At the moment |
| 123 | +a simple checker is implemented to detect paravirtualization breakage, |
| 124 | +but it doesn't cover all cases. |
| 125 | + |
| 126 | +Sometimes this set of macros is called pv_cpu_asm_op, but there is no |
| 127 | +corresponding structure in the source code. |
| 128 | +Those macros mostly correspond 1:1 to a subset of privileged |
| 129 | +instructions. See linux/include/asm-ia64/native/inst.h. |
| 130 | +Some functions written in assembly also need to be overridden, so |
| 131 | +each pv_ops instance has to define some macros for them. Again see |
| 132 | +linux/include/asm-ia64/native/inst.h. |
| 133 | + |
| 134 | + |
| 135 | +Those structures must be initialized very early, before start_kernel. |
| 136 | +They will probably be initialized in head.S using multiple entry points or some other trick. |
| 137 | +For the native case implementation see linux/arch/ia64/kernel/paravirt.c. |
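The "simple indirect call" category described in the document can be illustrated with a small user-space sketch. This is not the kernel's implementation: the names demo_pv_ops, demo_hv_ops, demo_pv_override, demo_read_flag and demo_write_flag are hypothetical, and a plain variable stands in for a sensitive processor register. It only shows the general shape, assuming a table that defaults to native handlers and is overwritten once, early, by hypervisor-specific setup code.

```c
/*
 * Hypothetical pv_ops-style hook table. The layout and names are
 * illustrative only and do not match struct pv_cpu_op or any other
 * structure in the kernel sources.
 */
struct demo_pv_ops {
	const char *name;                       /* execution environment */
	unsigned long (*read_flag)(void);       /* stands in for a sensitive read */
	void (*write_flag)(unsigned long val);  /* stands in for a sensitive write */
};

/* Native variant: touches the "register" directly. */
static unsigned long native_flag;
static unsigned long native_read_flag(void) { return native_flag; }
static void native_write_flag(unsigned long val) { native_flag = val; }

/* Hypervisor variant: uses a shared area instead and counts writes. */
static unsigned long hv_shared_flag;
static unsigned int hv_write_count;
static unsigned long hv_read_flag(void) { return hv_shared_flag; }
static void hv_write_flag(unsigned long val)
{
	hv_shared_flag = val;
	hv_write_count++;
}

/* The live table starts out native, so bare metal needs no setup. */
static struct demo_pv_ops demo_pv_ops = {
	.name       = "native",
	.read_flag  = native_read_flag,
	.write_flag = native_write_flag,
};

static const struct demo_pv_ops demo_hv_ops = {
	.name       = "hypervisor",
	.read_flag  = hv_read_flag,
	.write_flag = hv_write_flag,
};

/* Assumed to be called once, very early, by hypervisor setup code. */
static void demo_pv_override(const struct demo_pv_ops *ops)
{
	demo_pv_ops = *ops;
}

/* Wrappers that the rest of the code calls; each is an indirect call. */
static inline unsigned long demo_read_flag(void)
{
	return demo_pv_ops.read_flag();
}

static inline void demo_write_flag(unsigned long val)
{
	demo_pv_ops.write_flag(val);
}
```

Callers use only demo_read_flag() and demo_write_flag(), so the same binary works in both environments. Every call in this sketch goes through a function pointer, which is exactly the overhead that binary patching is meant to remove for the performance critical operations.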