Commit 636deed

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
 "ARM:
   - some cleanups
   - direct physical timer assignment
   - cache sanitization for 32-bit guests

  s390:
   - interrupt cleanup
   - introduction of the Guest Information Block
   - preparation for processor subfunctions in cpu models

  PPC:
   - bug fixes and improvements, especially related to machine checks
     and protection keys

  x86:
   - many, many cleanups, including removing a bunch of MMU code for
     unnecessary optimizations
   - AVIC fixes

  Generic:
   - memcg accounting"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
  kvm: vmx: fix formatting of a comment
  KVM: doc: Document the life cycle of a VM and its resources
  MAINTAINERS: Add KVM selftests to existing KVM entry
  Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
  KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
  KVM: PPC: Fix compilation when KVM is not enabled
  KVM: Minor cleanups for kvm_main.c
  KVM: s390: add debug logging for cpu model subfunctions
  KVM: s390: implement subfunction processor calls
  arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
  KVM: arm/arm64: Remove unused timer variable
  KVM: PPC: Book3S: Improve KVM reference counting
  KVM: PPC: Book3S HV: Fix build failure without IOMMU support
  Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
  x86: kvmguest: use TSC clocksource if invariant TSC is exposed
  KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
  KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
  KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
  KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
  KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
  ...
2 parents aa2e3ac + 4a605bc commit 636deed

93 files changed: +2623 -1199 lines changed

Documentation/virtual/kvm/api.txt

Lines changed: 17 additions & 0 deletions
@@ -45,6 +45,23 @@ the API. The only supported use is one virtual machine per process,
 and one vcpu per thread.


+It is important to note that although VM ioctls may only be issued from
+the process that created the VM, a VM's lifecycle is associated with its
+file descriptor, not its creator (process). In other words, the VM and
+its resources, *including the associated address space*, are not freed
+until the last reference to the VM's file descriptor has been released.
+For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will
+not be freed until both the parent (original) process and its child have
+put their references to the VM's file descriptor.
+
+Because a VM's resources are not freed until the last reference to its
+file descriptor is released, creating additional references to a VM
+via fork(), dup(), etc... without careful consideration is strongly
+discouraged and may have unwanted side effects, e.g. memory allocated
+by and on behalf of the VM's process may not be freed/unaccounted when
+the VM is shut down.
+
+
 3. Extensions
 -------------


Documentation/virtual/kvm/halt-polling.txt

Lines changed: 23 additions & 14 deletions
@@ -53,7 +53,8 @@ the global max polling interval then the polling interval can be increased in
 the hope that next time during the longer polling interval the wake up source
 will be received while the host is polling and the latency benefits will be
 received. The polling interval is grown in the function grow_halt_poll_ns() and
-is multiplied by the module parameter halt_poll_ns_grow.
+is multiplied by the module parameters halt_poll_ns_grow and
+halt_poll_ns_grow_start.

 In the event that the total block time was greater than the global max polling
 interval then the host will never poll for long enough (limited by the global
@@ -80,22 +81,30 @@ shrunk. These variables are defined in include/linux/kvm_host.h and as module
 parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
 powerpc kvm-hv case.

-Module Parameter    | Description                     | Default Value
+Module Parameter        | Description               | Default Value
 --------------------------------------------------------------------------------
-halt_poll_ns        | The global max polling interval | KVM_HALT_POLL_NS_DEFAULT
-                    | which defines the ceiling value |
-                    | of the polling interval for     | (per arch value)
-                    | each vcpu.                      |
+halt_poll_ns            | The global max polling    | KVM_HALT_POLL_NS_DEFAULT
+                        | interval which defines    |
+                        | the ceiling value of the  |
+                        | polling interval for      | (per arch value)
+                        | each vcpu.                |
 --------------------------------------------------------------------------------
-halt_poll_ns_grow   | The value by which the halt     | 2
-                    | polling interval is multiplied  |
-                    | in the grow_halt_poll_ns()      |
-                    | function.                       |
+halt_poll_ns_grow       | The value by which the    | 2
+                        | halt polling interval is  |
+                        | multiplied in the         |
+                        | grow_halt_poll_ns()       |
+                        | function.                 |
 --------------------------------------------------------------------------------
-halt_poll_ns_shrink | The value by which the halt     | 0
-                    | polling interval is divided in  |
-                    | the shrink_halt_poll_ns()       |
-                    | function.                       |
+halt_poll_ns_grow_start | The initial value to grow | 10000
+                        | to from zero in the       |
+                        | grow_halt_poll_ns()       |
+                        | function.                 |
+--------------------------------------------------------------------------------
+halt_poll_ns_shrink     | The value by which the    | 0
+                        | halt polling interval is  |
+                        | divided in the            |
+                        | shrink_halt_poll_ns()     |
+                        | function.                 |
 --------------------------------------------------------------------------------

 These module parameters can be set from the debugfs files in:
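
The interaction between halt_poll_ns_grow and halt_poll_ns_grow_start described in the table can be sketched as a pure function. This is a simplified model under stated assumptions, not the kernel's grow_halt_poll_ns(): the name sketch_grow_poll_ns() and its parameters are invented here, a grow factor of zero is assumed to leave the interval unchanged, and the halt_poll_ns ceiling is left out.

```c
/*
 * Simplified model of one grow step (hypothetical helper).
 * cur_ns:     current per-vcpu polling interval
 * grow:       multiplier, as in halt_poll_ns_grow
 * grow_start: floor for a grown interval, as in halt_poll_ns_grow_start
 */
static unsigned int sketch_grow_poll_ns(unsigned int cur_ns,
					unsigned int grow,
					unsigned int grow_start)
{
	unsigned int val = cur_ns;

	/* Assumption: a zero multiplier means "do not grow". */
	if (grow == 0)
		return cur_ns;

	val *= grow;

	/* Growing from zero (or a tiny value) jumps straight to grow_start. */
	if (val < grow_start)
		val = grow_start;

	return val;
}
```

With the defaults from the table (grow = 2, grow_start = 10000), an interval of 0 grows to 10000 ns on the first step and doubles on each step after that.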

Documentation/virtual/kvm/mmu.txt

Lines changed: 9 additions & 32 deletions
@@ -224,10 +224,6 @@ Shadow pages contain the following information:
     A bitmap indicating which sptes in spt point (directly or indirectly) at
     pages that may be unsynchronized. Used to quickly locate all unsynchronized
     pages reachable from a given page.
-  mmu_valid_gen:
-    Generation number of the page. It is compared with kvm->arch.mmu_valid_gen
-    during hash table lookup, and used to skip invalidated shadow pages (see
-    "Zapping all pages" below.)
   clear_spte_count:
     Only present on 32-bit hosts, where a 64-bit spte cannot be written
     atomically. The reader uses this while running out of the MMU lock
@@ -402,27 +398,6 @@ causes its disallow_lpage to be incremented, thus preventing instantiation of
 a large spte. The frames at the end of an unaligned memory slot have
 artificially inflated ->disallow_lpages so they can never be instantiated.

-Zapping all pages (page generation count)
-=========================================
-
-For the large memory guests, walking and zapping all pages is really slow
-(because there are a lot of pages), and also blocks memory accesses of
-all VCPUs because it needs to hold the MMU lock.
-
-To make it be more scalable, kvm maintains a global generation number
-which is stored in kvm->arch.mmu_valid_gen. Every shadow page stores
-the current global generation-number into sp->mmu_valid_gen when it
-is created. Pages with a mismatching generation number are "obsolete".
-
-When KVM need zap all shadow pages sptes, it just simply increases the global
-generation-number then reload root shadow pages on all vcpus. As the VCPUs
-create new shadow page tables, the old pages are not used because of the
-mismatching generation number.
-
-KVM then walks through all pages and zaps obsolete pages. While the zap
-operation needs to take the MMU lock, the lock can be released periodically
-so that the VCPUs can make progress.
-
 Fast invalidation of MMIO sptes
 ===============================

@@ -435,8 +410,7 @@ shadow pages, and is made more scalable with a similar technique.
 MMIO sptes have a few spare bits, which are used to store a
 generation number. The global generation number is stored in
 kvm_memslots(kvm)->generation, and increased whenever guest memory info
-changes. This generation number is distinct from the one described in
-the previous section.
+changes.

 When KVM finds an MMIO spte, it checks the generation number of the spte.
 If the generation number of the spte does not equal the global generation
@@ -452,13 +426,16 @@ stored into the MMIO spte. Thus, the MMIO spte might be created based on
 out-of-date information, but with an up-to-date generation number.

 To avoid this, the generation number is incremented again after synchronize_srcu
-returns; thus, the low bit of kvm_memslots(kvm)->generation is only 1 during a
+returns; thus, bit 63 of kvm_memslots(kvm)->generation is set to 1 only during a
 memslot update, while some SRCU readers might be using the old copy. We do not
 want to use an MMIO spte created with an odd generation number, and we can do
-this without losing a bit in the MMIO spte. The low bit of the generation
-is not stored in MMIO spte, and presumed zero when it is extracted out of the
-spte. If KVM is unlucky and creates an MMIO spte while the low bit is 1,
-the next access to the spte will always be a cache miss.
+this without losing a bit in the MMIO spte. The "update in-progress" bit of the
+generation is not stored in MMIO spte, and so is implicitly zero when the
+generation is extracted out of the spte. If KVM is unlucky and creates an MMIO
+spte while an update is in-progress, the next access to the spte will always be
+a cache miss. For example, a subsequent access during the update window will
+miss due to the in-progress flag diverging, while an access after the update
+window closes will have a higher generation number (as compared to the spte).


 Further reading
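
The two miss cases described in the updated mmu.txt text can be modeled in a few lines. This is an illustrative sketch only: the macro and function names are invented, the 19-bit width of the generation stored in the spte is an assumption chosen for the example, and the real kernel check may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 flags a memslot update in progress (as stated in mmu.txt). */
#define SKETCH_GEN_UPDATE_IN_PROGRESS	(UINT64_C(1) << 63)
/* Assumed width of the generation kept in an MMIO spte (illustrative). */
#define SKETCH_SPTE_GEN_MASK		((UINT64_C(1) << 19) - 1)

/* Generation as stored in a newly created MMIO spte: the flag bit is dropped. */
static uint64_t sketch_spte_gen(uint64_t memslots_gen)
{
	return memslots_gen & SKETCH_SPTE_GEN_MASK;
}

/* True if a cached MMIO spte may still be used for the current generation. */
static bool sketch_mmio_spte_valid(uint64_t spte_gen, uint64_t memslots_gen)
{
	/* During the update window the in-progress flag never matches. */
	if (memslots_gen & SKETCH_GEN_UPDATE_IN_PROGRESS)
		return false;

	return spte_gen == (memslots_gen & SKETCH_SPTE_GEN_MASK);
}
```

An spte created while the global generation is 4 with the in-progress bit set stores 4. It misses during the window because of the flag, and misses afterwards because the global generation has moved on to 5.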

MAINTAINERS

Lines changed: 8 additions & 11 deletions
@@ -8461,6 +8461,7 @@ F:	include/linux/kvm*
 F:	include/kvm/iodev.h
 F:	virt/kvm/*
 F:	tools/kvm/
+F:	tools/testing/selftests/kvm/

 KERNEL VIRTUAL MACHINE FOR AMD-V (KVM/amd)
 M:	Joerg Roedel <joro@8bytes.org>
@@ -8470,29 +8471,25 @@ S:	Maintained
 F:	arch/x86/include/asm/svm.h
 F:	arch/x86/kvm/svm.c

-KERNEL VIRTUAL MACHINE FOR ARM (KVM/arm)
+KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64)
 M:	Christoffer Dall <christoffer.dall@arm.com>
 M:	Marc Zyngier <marc.zyngier@arm.com>
+R:	James Morse <james.morse@arm.com>
+R:	Julien Thierry <julien.thierry@arm.com>
+R:	Suzuki K Pouloze <suzuki.poulose@arm.com>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 L:	kvmarm@lists.cs.columbia.edu
 W:	http://systems.cs.columbia.edu/projects/kvm-arm
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
-S:	Supported
+S:	Maintained
 F:	arch/arm/include/uapi/asm/kvm*
 F:	arch/arm/include/asm/kvm*
 F:	arch/arm/kvm/
-F:	virt/kvm/arm/
-F:	include/kvm/arm_*
-
-KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)
-M:	Christoffer Dall <christoffer.dall@arm.com>
-M:	Marc Zyngier <marc.zyngier@arm.com>
-L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-L:	kvmarm@lists.cs.columbia.edu
-S:	Maintained
 F:	arch/arm64/include/uapi/asm/kvm*
 F:	arch/arm64/include/asm/kvm*
 F:	arch/arm64/kvm/
+F:	virt/kvm/arm/
+F:	include/kvm/arm_*

 KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)
 M:	James Hogan <jhogan@kernel.org>

arch/arm/include/asm/arch_gicv3.h

Lines changed: 2 additions & 2 deletions
@@ -55,7 +55,7 @@
 #define ICH_VTR     __ACCESS_CP15(c12, 4, c11, 1)
 #define ICH_MISR    __ACCESS_CP15(c12, 4, c11, 2)
 #define ICH_EISR    __ACCESS_CP15(c12, 4, c11, 3)
-#define ICH_ELSR    __ACCESS_CP15(c12, 4, c11, 5)
+#define ICH_ELRSR   __ACCESS_CP15(c12, 4, c11, 5)
 #define ICH_VMCR    __ACCESS_CP15(c12, 4, c11, 7)

 #define __LR0(x)    __ACCESS_CP15(c12, 4, c12, x)
@@ -152,7 +152,7 @@ CPUIF_MAP(ICH_HCR, ICH_HCR_EL2)
 CPUIF_MAP(ICH_VTR, ICH_VTR_EL2)
 CPUIF_MAP(ICH_MISR, ICH_MISR_EL2)
 CPUIF_MAP(ICH_EISR, ICH_EISR_EL2)
-CPUIF_MAP(ICH_ELSR, ICH_ELSR_EL2)
+CPUIF_MAP(ICH_ELRSR, ICH_ELRSR_EL2)
 CPUIF_MAP(ICH_VMCR, ICH_VMCR_EL2)
 CPUIF_MAP(ICH_AP0R3, ICH_AP0R3_EL2)
 CPUIF_MAP(ICH_AP0R2, ICH_AP0R2_EL2)

arch/arm/include/asm/kvm_emulate.h

Lines changed: 8 additions & 0 deletions
@@ -265,6 +265,14 @@ static inline bool kvm_vcpu_dabt_isextabt(struct kvm_vcpu *vcpu)
 	}
 }

+static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+{
+	if (kvm_vcpu_trap_is_iabt(vcpu))
+		return false;
+
+	return kvm_vcpu_dabt_iswrite(vcpu);
+}
+
 static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_get_hsr(vcpu) & HSR_HVC_IMM_MASK;

arch/arm/include/asm/kvm_host.h

Lines changed: 46 additions & 7 deletions
@@ -26,6 +26,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmio.h>
 #include <asm/fpstate.h>
+#include <asm/smp_plat.h>
 #include <kvm/arm_arch_timer.h>

 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -57,10 +58,13 @@ int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);

-struct kvm_arch {
-	/* VTTBR value associated with below pgd and vmid */
-	u64 vttbr;
+struct kvm_vmid {
+	/* The VMID generation used for the virt. memory system */
+	u64 vmid_gen;
+	u32 vmid;
+};

+struct kvm_arch {
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;

@@ -70,11 +74,11 @@ struct kvm_arch {
 	 */

 	/* The VMID generation used for the virt. memory system */
-	u64 vmid_gen;
-	u32 vmid;
+	struct kvm_vmid vmid;

 	/* Stage-2 page table */
 	pgd_t *pgd;
+	phys_addr_t pgd_phys;

 	/* Interrupt controller */
 	struct vgic_dist vgic;
@@ -148,6 +152,13 @@ struct kvm_cpu_context {

 typedef struct kvm_cpu_context kvm_cpu_context_t;

+static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt,
+					     int cpu)
+{
+	/* The host's MPIDR is immutable, so let's set it up at boot time */
+	cpu_ctxt->cp15[c0_MPIDR] = cpu_logical_map(cpu);
+}
+
 struct vcpu_reset_state {
 	unsigned long pc;
 	unsigned long r0;
@@ -224,7 +235,35 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
-unsigned long kvm_call_hyp(void *hypfn, ...);
+
+unsigned long __kvm_call_hyp(void *hypfn, ...);
+
+/*
+ * The has_vhe() part doesn't get emitted, but is used for type-checking.
+ */
+#define kvm_call_hyp(f, ...) \
+	do { \
+		if (has_vhe()) { \
+			f(__VA_ARGS__); \
+		} else { \
+			__kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__); \
+		} \
+	} while(0)
+
+#define kvm_call_hyp_ret(f, ...) \
+	({ \
+		typeof(f(__VA_ARGS__)) ret; \
+		\
+		if (has_vhe()) { \
+			ret = f(__VA_ARGS__); \
+		} else { \
+			ret = __kvm_call_hyp(kvm_ksym_ref(f), \
+					     ##__VA_ARGS__); \
+		} \
+		\
+		ret; \
+	})
+
 void force_vm_exit(const cpumask_t *mask);
 int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
 			      struct kvm_vcpu_events *events);
@@ -275,7 +314,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
 	 * compliant with the PCS!).
 	 */

-	kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
+	__kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }

 static inline void __cpu_init_stage2(void)

arch/arm/include/asm/kvm_hyp.h

Lines changed: 4 additions & 0 deletions
@@ -40,6 +40,7 @@
 #define TTBR1       __ACCESS_CP15_64(1, c2)
 #define VTTBR       __ACCESS_CP15_64(6, c2)
 #define PAR         __ACCESS_CP15_64(0, c7)
+#define CNTP_CVAL   __ACCESS_CP15_64(2, c14)
 #define CNTV_CVAL   __ACCESS_CP15_64(3, c14)
 #define CNTVOFF     __ACCESS_CP15_64(4, c14)

@@ -85,6 +86,7 @@
 #define TID_PRIV    __ACCESS_CP15(c13, 0, c0, 4)
 #define HTPIDR      __ACCESS_CP15(c13, 4, c0, 2)
 #define CNTKCTL     __ACCESS_CP15(c14, 0, c1, 0)
+#define CNTP_CTL    __ACCESS_CP15(c14, 0, c2, 1)
 #define CNTV_CTL    __ACCESS_CP15(c14, 0, c3, 1)
 #define CNTHCTL     __ACCESS_CP15(c14, 4, c1, 0)

@@ -94,6 +96,8 @@
 #define read_sysreg_el0(r)      read_sysreg(r##_el0)
 #define write_sysreg_el0(v, r)  write_sysreg(v, r##_el0)

+#define cntp_ctl_el0            CNTP_CTL
+#define cntp_cval_el0           CNTP_CVAL
 #define cntv_ctl_el0            CNTV_CTL
 #define cntv_cval_el0           CNTV_CVAL
 #define cntvoff_el2             CNTVOFF

arch/arm/include/asm/kvm_mmu.h

Lines changed: 7 additions & 2 deletions
@@ -421,9 +421,14 @@ static inline int hyp_map_aux_data(void)

 static inline void kvm_set_ipa_limit(void) {}

-static inline bool kvm_cpu_has_cnp(void)
+static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
 {
-	return false;
+	struct kvm_vmid *vmid = &kvm->arch.vmid;
+	u64 vmid_field, baddr;
+
+	baddr = kvm->arch.pgd_phys;
+	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
+	return kvm_phys_to_vttbr(baddr) | vmid_field;
 }

 #endif /* !__ASSEMBLY__ */
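
The VTTBR value is no longer cached in struct kvm_arch; kvm_get_vttbr() recomputes it from the stage-2 page table base and the VMID. A standalone sketch of that composition follows. The names are invented for illustration, the shift of 48 is an assumption standing in for VTTBR_VMID_SHIFT, and kvm_phys_to_vttbr() is assumed to pass the address through unchanged.

```c
#include <stdint.h>

/* Assumed VMID position; the kernel takes this from VTTBR_VMID_SHIFT. */
#define SKETCH_VTTBR_VMID_SHIFT	48

/* Combine a page table base address and a VMID into one 64-bit value. */
static uint64_t sketch_get_vttbr(uint64_t pgd_phys, uint32_t vmid)
{
	uint64_t vmid_field = (uint64_t)vmid << SKETCH_VTTBR_VMID_SHIFT;

	/* Assumption: the base address needs no further transformation. */
	return pgd_phys | vmid_field;
}
```

Under these assumptions, a page table at physical address 0x80000000 with VMID 5 yields 0x0005000080000000.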

arch/arm/kvm/Makefile

Lines changed: 2 additions & 3 deletions
@@ -8,9 +8,8 @@ ifeq ($(plus_virt),+virt)
 	plus_virt_def := -DREQUIRES_VIRT=1
 endif

-ccflags-y += -Iarch/arm/kvm -Ivirt/kvm/arm/vgic
-CFLAGS_arm.o := -I. $(plus_virt_def)
-CFLAGS_mmu.o := -I.
+ccflags-y += -I $(srctree)/$(src) -I $(srctree)/virt/kvm/arm/vgic
+CFLAGS_arm.o := $(plus_virt_def)

 AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
