Commit 80fac0f

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:

 - ARM bugfix and MSI injection support
 - x86 nested virt tweak and OOPS fix
 - Simplify pvclock code (vdso bits acked by Andy Lutomirski).

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  nvmx: mark ept single context invalidation as supported
  nvmx: remove comment about missing nested vpid support
  KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off
  KVM: documentation: fix KVM_CAP_X2APIC_API information
  x86: vdso: use __pvclock_read_cycles
  pvclock: introduce seqcount-like API
  arm64: KVM: Set cpsr before spsr on fault injection
  KVM: arm: vgic-irqfd: Workaround changing kvm_set_routing_entry prototype
  KVM: arm/arm64: Enable MSI routing
  KVM: arm/arm64: Enable irqchip routing
  KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header
  KVM: irqchip: Convey devid to kvm_set_msi
  KVM: Add devid in kvm_kernel_irq_routing_entry
  KVM: api: Pass the devid in the msi routing entry
2 parents 4305f42 + 45e1181 commit 80fac0f

21 files changed: 273 additions & 117 deletions

Documentation/virtual/kvm/api.txt

Lines changed: 37 additions & 16 deletions
@@ -1433,13 +1433,16 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.
 4.52 KVM_SET_GSI_ROUTING
 
 Capability: KVM_CAP_IRQ_ROUTING
-Architectures: x86 s390
+Architectures: x86 s390 arm arm64
 Type: vm ioctl
 Parameters: struct kvm_irq_routing (in)
 Returns: 0 on success, -1 on error
 
 Sets the GSI routing table entries, overwriting any previously set entries.
 
+On arm/arm64, GSI routing has the following limitation:
+- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
+
 struct kvm_irq_routing {
 	__u32 nr;
 	__u32 flags;
@@ -1468,7 +1471,13 @@ struct kvm_irq_routing_entry {
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
 #define KVM_IRQ_ROUTING_HV_SINT 4
 
-No flags are specified so far, the corresponding field must be set to zero.
+flags:
+- KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry
+  type, specifies that the devid field contains a valid value. The per-VM
+  KVM_CAP_MSI_DEVID capability advertises the requirement to provide
+  the device ID. If this capability is not available, userspace should
+  never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
+- zero otherwise
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1479,9 +1488,16 @@ struct kvm_irq_routing_msi {
 	__u32 address_lo;
 	__u32 address_hi;
 	__u32 data;
-	__u32 pad;
+	union {
+		__u32 pad;
+		__u32 devid;
+	};
 };
 
+If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
+for the device that wrote the MSI message. For PCI, this is usually a
+BFD identifier in the lower 16 bits.
+
 On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
 feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
 address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
@@ -2199,18 +2215,19 @@ struct kvm_msi {
 	__u8 pad[12];
 };
 
-flags: KVM_MSI_VALID_DEVID: devid contains a valid value
-devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
-for the device that wrote the MSI message.
-For PCI, this is usually a BFD identifier in the lower 16 bits.
+flags: KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM
+KVM_CAP_MSI_DEVID capability advertises the requirement to provide
+the device ID. If this capability is not available, userspace
+should never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail.
 
-The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
-the device ID. If this capability is not set, userland cannot rely on
-the kernel to allow the KVM_MSI_VALID_DEVID flag being set.
+If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier
+for the device that wrote the MSI message. For PCI, this is usually a
+BFD identifier in the lower 16 bits.
 
-On x86, address_hi is ignored unless the KVM_CAP_X2APIC_API capability is
-enabled. If it is enabled, address_hi bits 31-8 provide bits 31-8 of the
-destination id. Bits 7-0 of address_hi must be zero.
+On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
+feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled,
+address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
+address_hi must be zero.
 
 
 4.71 KVM_CREATE_PIT2
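
To illustrate the devid documentation in the hunks above, here is a minimal sketch of how userspace might fill an MSI routing entry. It is not part of the commit. The demo_* structures are simplified local mirrors of the layouts quoted above, demo_pci_devid() and demo_make_msi_route() are hypothetical helpers, the constant values are assumed to match <linux/kvm.h>, and the bus/device/function packing is the usual PCI one rather than something the text spells out. The flag is set only when the caller has already confirmed KVM_CAP_MSI_DEVID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match the values in the Linux UAPI header <linux/kvm.h>. */
#define DEMO_KVM_IRQ_ROUTING_MSI 2
#define DEMO_KVM_MSI_VALID_DEVID (1U << 0)

/* Local mirror of struct kvm_irq_routing_msi as quoted in the diff. */
struct demo_irq_routing_msi {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	union {
		uint32_t pad;
		uint32_t devid;
	};
};

/* Simplified stand-in for struct kvm_irq_routing_entry (MSI case only). */
struct demo_irq_routing_entry {
	uint32_t gsi;
	uint32_t type;
	uint32_t flags;
	uint32_t pad;
	struct demo_irq_routing_msi msi;
};

/* Hypothetical helper: pack PCI bus/device/function into the lower 16 bits. */
static uint32_t demo_pci_devid(unsigned bus, unsigned dev, unsigned fn)
{
	assert(bus < 256 && dev < 32 && fn < 8);
	return (bus << 8) | (dev << 3) | fn;
}

/*
 * Hypothetical helper: build an MSI routing entry. The devid and its flag
 * are filled in only when the caller reports that KVM_CAP_MSI_DEVID is
 * available; otherwise flags and the pad/devid union stay zero.
 */
static struct demo_irq_routing_entry
demo_make_msi_route(uint32_t gsi, uint64_t addr, uint32_t data,
		    int have_devid_cap, uint32_t devid)
{
	struct demo_irq_routing_entry e;

	memset(&e, 0, sizeof(e));
	e.gsi = gsi;
	e.type = DEMO_KVM_IRQ_ROUTING_MSI;
	e.msi.address_lo = (uint32_t)addr;
	e.msi.address_hi = (uint32_t)(addr >> 32);
	e.msi.data = data;
	if (have_devid_cap) {
		e.flags = DEMO_KVM_MSI_VALID_DEVID;
		e.msi.devid = devid;
	}
	return e;
}
```

A real caller would copy such an entry into the entries array of struct kvm_irq_routing and pass it to KVM_SET_GSI_ROUTING; that step is left out here because it needs a live VM file descriptor.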
@@ -2383,9 +2400,13 @@ Note that closing the resamplefd is not sufficient to disable the
 irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
 and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
 
-On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared
-Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
-given by gsi + 32.
+On arm/arm64, gsi routing being supported, the following can happen:
+- in case no routing entry is associated to this gsi, injection fails
+- in case the gsi is associated to an irqchip routing entry,
+  irqchip.pin + 32 corresponds to the injected SPI ID.
+- in case the gsi is associated to an MSI routing entry, the MSI
+  message and device ID are translated into an LPI (support restricted
+  to GICv3 ITS in-kernel emulation).
 
 4.76 KVM_PPC_ALLOCATE_HTAB
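
The three arm/arm64 irqfd outcomes listed in the hunk above can be sketched as a small lookup. This is a toy model, not kernel code: every demo_* name is hypothetical, the routing table is a plain array indexed by gsi, and the MSI case only reports that an LPI would result, since the actual LPI number depends on the guest's ITS tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry kinds, mirroring the three documented cases. */
enum demo_route_type { DEMO_ROUTE_NONE, DEMO_ROUTE_IRQCHIP, DEMO_ROUTE_MSI };

struct demo_route {
	enum demo_route_type type;
	uint32_t pin;   /* used by irqchip routes */
	uint32_t devid; /* used by MSI routes */
	uint32_t data;  /* MSI payload */
};

enum demo_outcome { DEMO_INJECT_FAILS, DEMO_INJECT_SPI, DEMO_INJECT_LPI };

struct demo_result {
	enum demo_outcome outcome;
	uint32_t spi_id; /* meaningful only for DEMO_INJECT_SPI */
};

/* Resolve what an irqfd signal on `gsi` would do under the documented rules. */
static struct demo_result
demo_irqfd_resolve(const struct demo_route *table, size_t nr, uint32_t gsi)
{
	struct demo_result r = { DEMO_INJECT_FAILS, 0 };

	assert(table != NULL || nr == 0);
	if (gsi >= nr)
		return r; /* no routing entry: injection fails */

	switch (table[gsi].type) {
	case DEMO_ROUTE_IRQCHIP:
		r.outcome = DEMO_INJECT_SPI;
		r.spi_id = table[gsi].pin + 32; /* irqchip.pin + 32 is the SPI ID */
		break;
	case DEMO_ROUTE_MSI:
		/* MSI message and device ID would be translated by the ITS. */
		r.outcome = DEMO_INJECT_LPI;
		break;
	case DEMO_ROUTE_NONE:
	default:
		break; /* no routing entry: injection fails */
	}
	return r;
}
```

Before this change the SPI ID was always gsi + 32; with routing it is derived from the routed pin, so an identity table (pin equal to gsi) reproduces the old behaviour.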

arch/arm/kvm/Kconfig

Lines changed: 2 additions & 0 deletions
@@ -32,6 +32,8 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select HAVE_KVM_IRQCHIP
+	select HAVE_KVM_IRQ_ROUTING
 	depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
 	---help---
 	  Support hosting virtualized guest machines.

arch/arm/kvm/Makefile

Lines changed: 1 addition & 0 deletions
@@ -29,4 +29,5 @@ obj-y += $(KVM)/arm/vgic/vgic-v2.o
 obj-y += $(KVM)/arm/vgic/vgic-mmio.o
 obj-y += $(KVM)/arm/vgic/vgic-mmio-v2.o
 obj-y += $(KVM)/arm/vgic/vgic-kvm-device.o
+obj-y += $(KVM)/irqchip.o
 obj-y += $(KVM)/arm/arch_timer.o

arch/arm/kvm/irq.h

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+/*
+ * irq.h: in kernel interrupt controller related definitions
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This header is included by irqchip.c. However, on ARM, interrupt
+ * controller declarations are located in include/kvm/arm_vgic.h since
+ * they are mostly shared between arm and arm64.
+ */
+
+#ifndef __IRQ_H
+#define __IRQ_H
+
+#include <kvm/arm_vgic.h>
+
+#endif

arch/arm64/kvm/Kconfig

Lines changed: 2 additions & 0 deletions
@@ -37,6 +37,8 @@ config KVM
 	select KVM_ARM_VGIC_V3
 	select KVM_ARM_PMU if HW_PERF_EVENTS
 	select HAVE_KVM_MSI
+	select HAVE_KVM_IRQCHIP
+	select HAVE_KVM_IRQ_ROUTING
 	---help---
 	  Support hosting virtualized guest machines.
 	  We don't support KVM with 16K page tables yet, due to the multiple

arch/arm64/kvm/Makefile

Lines changed: 1 addition & 0 deletions
@@ -30,5 +30,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v2.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-mmio-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-kvm-device.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o

arch/arm64/kvm/inject_fault.c

Lines changed: 5 additions & 7 deletions
@@ -132,16 +132,14 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
 static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
 {
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
-	bool is_aarch32;
+	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	is_aarch32 = vcpu_mode_is_32bit(vcpu);
-
-	*vcpu_spsr(vcpu) = cpsr;
 	*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
-
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
+
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
+	*vcpu_spsr(vcpu) = cpsr;
 
 	vcpu_sys_reg(vcpu, FAR_EL1) = addr;
 
@@ -172,11 +170,11 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	*vcpu_spsr(vcpu) = cpsr;
 	*vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
-
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
+
 	*vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
+	*vcpu_spsr(vcpu) = cpsr;
 
 	/*
 	 * Build an unknown exception, depending on the instruction
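
The hunks above show the reordering but not the reason for it. One plausible reading, stated here as an assumption rather than something the diff says, is that vcpu_spsr() picks which saved-status register to write from the mode currently held in cpsr, so the target mode has to be installed first. The toy model below demonstrates that effect. All demo_* names and the two constants are hypothetical stand-ins, and the single spsr_banked32 slot is a simplification of the banked AArch32 registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the 32-bit mode bit and PSTATE_FAULT_BITS_64. */
#define DEMO_MODE_32BIT    0x10UL
#define DEMO_FAULT_BITS_64 0x3c5UL

struct demo_vcpu {
	unsigned long cpsr;
	unsigned long spsr_el1;      /* where a 64-bit EL1 handler looks */
	unsigned long spsr_banked32; /* simplified AArch32 banked SPSR */
};

/* Mode-dependent selection: the property the reordering relies on. */
static unsigned long *demo_spsr(struct demo_vcpu *vcpu)
{
	assert(vcpu != NULL);
	if (vcpu->cpsr & DEMO_MODE_32BIT)
		return &vcpu->spsr_banked32;
	return &vcpu->spsr_el1;
}

/* Old order: spsr is chosen while cpsr still holds the interrupted mode. */
static void demo_inject_spsr_first(struct demo_vcpu *vcpu)
{
	unsigned long cpsr = vcpu->cpsr;

	*demo_spsr(vcpu) = cpsr;
	vcpu->cpsr = DEMO_FAULT_BITS_64;
}

/* New order: cpsr is switched to the target mode before spsr is chosen. */
static void demo_inject_cpsr_first(struct demo_vcpu *vcpu)
{
	unsigned long cpsr = vcpu->cpsr;

	vcpu->cpsr = DEMO_FAULT_BITS_64;
	*demo_spsr(vcpu) = cpsr;
}
```

With a vcpu interrupted in a 32-bit mode, the first variant stores the old state in the wrong slot, while the second stores it where the 64-bit exception handler would read it.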

arch/arm64/kvm/irq.h

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+/*
+ * irq.h: in kernel interrupt controller related definitions
+ * Copyright (c) 2016 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This header is included by irqchip.c. However, on ARM, interrupt
+ * controller declarations are located in include/kvm/arm_vgic.h since
+ * they are mostly shared between arm and arm64.
+ */
+
+#ifndef __IRQ_H
+#define __IRQ_H
+
+#include <kvm/arm_vgic.h>
+
+#endif

arch/x86/entry/vdso/vclock_gettime.c

Lines changed: 5 additions & 20 deletions
@@ -96,9 +96,8 @@ static notrace cycle_t vread_pvclock(int *mode)
 {
 	const struct pvclock_vcpu_time_info *pvti = &get_pvti0()->pvti;
 	cycle_t ret;
-	u64 tsc, pvti_tsc;
-	u64 last, delta, pvti_system_time;
-	u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
+	u64 last;
+	u32 version;
 
 	/*
 	 * Note: The kernel and hypervisor must guarantee that cpu ID
@@ -123,29 +122,15 @@ static notrace cycle_t vread_pvclock(int *mode)
 	 */
 
 	do {
-		version = pvti->version;
-
-		smp_rmb();
+		version = pvclock_read_begin(pvti);
 
 		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
 			*mode = VCLOCK_NONE;
 			return 0;
 		}
 
-		tsc = rdtsc_ordered();
-		pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
-		pvti_tsc_shift = pvti->tsc_shift;
-		pvti_system_time = pvti->system_time;
-		pvti_tsc = pvti->tsc_timestamp;
-
-		/* Make sure that the version double-check is last. */
-		smp_rmb();
-	} while (unlikely((version & 1) || version != pvti->version));
-
-	delta = tsc - pvti_tsc;
-	ret = pvti_system_time +
-		pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
-				    pvti_tsc_shift);
+		ret = __pvclock_read_cycles(pvti);
+	} while (pvclock_read_retry(pvti, version));
 
 	/* refer to vread_tsc() comment for rationale */
 	last = gtod->cycle_last;

arch/x86/include/asm/pvclock.h

Lines changed: 23 additions & 16 deletions
@@ -25,6 +25,24 @@ void pvclock_resume(void);
 
 void pvclock_touch_watchdogs(void);
 
+static __always_inline
+unsigned pvclock_read_begin(const struct pvclock_vcpu_time_info *src)
+{
+	unsigned version = src->version & ~1;
+	/* Make sure that the version is read before the data. */
+	virt_rmb();
+	return version;
+}
+
+static __always_inline
+bool pvclock_read_retry(const struct pvclock_vcpu_time_info *src,
+			unsigned version)
+{
+	/* Make sure that the version is re-read after the data. */
+	virt_rmb();
+	return unlikely(version != src->version);
+}
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
@@ -69,23 +87,12 @@ static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift)
 }
 
 static __always_inline
-unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
-			       cycle_t *cycles, u8 *flags)
+cycle_t __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src)
 {
-	unsigned version;
-	cycle_t offset;
-	u64 delta;
-
-	version = src->version;
-	/* Make the latest version visible */
-	smp_rmb();
-
-	delta = rdtsc_ordered() - src->tsc_timestamp;
-	offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
-				     src->tsc_shift);
-	*cycles = src->system_time + offset;
-	*flags = src->flags;
-	return version;
+	u64 delta = rdtsc_ordered() - src->tsc_timestamp;
+	cycle_t offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
+					     src->tsc_shift);
+	return src->system_time + offset;
 }
 
 struct pvclock_vsyscall_time_info {
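
The begin/retry pair introduced above follows the usual sequence-counter reader pattern. The user-space sketch below mirrors it with C11 atomics so the control flow can be tried outside the kernel. It is an approximation under stated assumptions: the demo_* names are hypothetical, acquire fences stand in for virt_rmb(), the writer helpers are invented for the demonstration (in the real interface the hypervisor is the writer), and the payload is a single plain field, which a truly concurrent version would need to read race-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_time_info {
	_Atomic uint32_t version; /* odd while an update is in progress */
	uint64_t system_time;     /* payload protected by the version */
};

/* Snapshot the version with the low bit cleared, then order later reads. */
static uint32_t demo_read_begin(struct demo_time_info *src)
{
	uint32_t version =
		atomic_load_explicit(&src->version, memory_order_relaxed) & ~1u;

	atomic_thread_fence(memory_order_acquire);
	return version;
}

/* Order earlier reads, then report whether the version has moved. */
static bool demo_read_retry(struct demo_time_info *src, uint32_t version)
{
	atomic_thread_fence(memory_order_acquire);
	return version !=
		atomic_load_explicit(&src->version, memory_order_relaxed);
}

/* Hypothetical writer side: make the version odd, update, make it even. */
static void demo_write_begin(struct demo_time_info *dst)
{
	atomic_fetch_add_explicit(&dst->version, 1, memory_order_release);
}

static void demo_write_end(struct demo_time_info *dst)
{
	assert(atomic_load_explicit(&dst->version, memory_order_relaxed) & 1u);
	atomic_fetch_add_explicit(&dst->version, 1, memory_order_release);
}

/* Reader loop in the same shape as the converted kernel callers. */
static uint64_t demo_read_time(struct demo_time_info *src)
{
	uint32_t version;
	uint64_t ret;

	do {
		version = demo_read_begin(src);
		ret = src->system_time;
	} while (demo_read_retry(src, version));
	return ret;
}
```

Clearing the low bit in the begin helper is what lets the callers drop the separate `(version & 1)` test: a snapshot taken while the version is odd is recorded as the preceding even value, which can never equal the version re-read at the end, so the loop retries either way.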

arch/x86/kernel/pvclock.c

Lines changed: 6 additions & 11 deletions
@@ -64,14 +64,9 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 	u8 flags;
 
 	do {
-		version = src->version;
-		/* Make the latest version visible */
-		smp_rmb();
-
+		version = pvclock_read_begin(src);
 		flags = src->flags;
-		/* Make sure that the version double-check is last. */
-		smp_rmb();
-	} while ((src->version & 1) || version != src->version);
+	} while (pvclock_read_retry(src, version));
 
 	return flags & valid_flags;
 }
@@ -84,10 +79,10 @@ cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 	u8 flags;
 
 	do {
-		version = __pvclock_read_cycles(src, &ret, &flags);
-		/* Make sure that the version double-check is last. */
-		smp_rmb();
-	} while ((src->version & 1) || version != src->version);
+		version = pvclock_read_begin(src);
+		ret = __pvclock_read_cycles(src);
+		flags = src->flags;
+	} while (pvclock_read_retry(src, version));
 
 	if (unlikely((flags & PVCLOCK_GUEST_STOPPED) != 0)) {
 		src->flags &= ~PVCLOCK_GUEST_STOPPED;

arch/x86/kvm/irq.h

Lines changed: 3 additions & 0 deletions
@@ -120,4 +120,7 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
 
 int apic_has_pending_timer(struct kvm_vcpu *vcpu);
 
+int kvm_setup_default_irq_routing(struct kvm *kvm);
+int kvm_setup_empty_irq_routing(struct kvm *kvm);
+
 #endif

arch/x86/kvm/lapic.c

Lines changed: 3 additions & 0 deletions
@@ -1349,6 +1349,9 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
 {
+	if (!lapic_in_kernel(vcpu))
+		return false;
+
 	return vcpu->arch.apic->lapic_timer.hv_timer_in_use;
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);

arch/x86/kvm/vmx.c

Lines changed: 7 additions & 8 deletions
@@ -2809,12 +2809,8 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
 			vmx->nested.nested_vmx_ept_caps |=
 				VMX_EPT_EXECUTE_ONLY_BIT;
 		vmx->nested.nested_vmx_ept_caps &= vmx_capability.ept;
-		/*
-		 * For nested guests, we don't do anything specific
-		 * for single context invalidation. Hence, only advertise
-		 * support for global context invalidation.
-		 */
-		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT;
+		vmx->nested.nested_vmx_ept_caps |= VMX_EPT_EXTENT_GLOBAL_BIT |
+			VMX_EPT_EXTENT_CONTEXT_BIT;
 	} else
 		vmx->nested.nested_vmx_ept_caps = 0;
 
@@ -2945,7 +2941,6 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
 			vmx->nested.nested_vmx_secondary_ctls_high);
 		break;
 	case MSR_IA32_VMX_EPT_VPID_CAP:
-		/* Currently, no nested vpid support */
 		*pdata = vmx->nested.nested_vmx_ept_caps |
 			((u64)vmx->nested.nested_vmx_vpid_caps << 32);
 		break;
@@ -7609,12 +7604,16 @@ static int handle_invept(struct kvm_vcpu *vcpu)
 
 	switch (type) {
 	case VMX_EPT_EXTENT_GLOBAL:
+	/*
+	 * TODO: track mappings and invalidate
+	 * single context requests appropriately
+	 */
+	case VMX_EPT_EXTENT_CONTEXT:
 		kvm_mmu_sync_roots(vcpu);
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 		nested_vmx_succeed(vcpu);
 		break;
 	default:
-		/* Trap single context invalidation invept calls */
 		BUG_ON(1);
 		break;
 	}
