
Commit 6794ad5

christofferdall-arm authored and Marc Zyngier committed
KVM: arm/arm64: Fix unintended stage 2 PMD mappings
There are two things we need to take care of when we create block mappings in the stage 2 page tables:

(1) The alignment within a PMD between the host address range and the guest IPA range must be the same, since otherwise we end up mapping pages with the wrong offset.

(2) The head and tail of a memory slot may not cover a full block size, and we have to take care not to map those with block descriptors, since we could expose memory to the guest that the host did not intend to expose.

So far, we have been taking care of (1), but not (2), and our commentary describing (1) was somewhat confusing.

This commit factors both checks out into a common function, and if the check fails, we won't attempt any PMD mappings for either hugetlbfs or THP.

Note that we used to check the alignment only for THP, not for hugetlbfs, but as far as I can tell the check needs to be applied to both scenarios.

Cc: Ralph Palutke <ralph.palutke@fau.de>
Cc: Lukas Braun <koomi@moshbit.net>
Reported-by: Lukas Braun <koomi@moshbit.net>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
1 parent 107352a · commit 6794ad5
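To make condition (1) from the commit message concrete, here is a minimal, self-contained userspace sketch; it is not part of the patch and does not use kernel APIs. It assumes 2 MiB PMD blocks, and the uaddr and gpa values are invented for illustration: they have different offsets within a 2 MiB region, so a shared block mapping would translate to the wrong host pages.

/* Illustration only: offset-within-PMD check, hypothetical addresses. */
#include <stdint.h>
#include <stdio.h>

#define PMD_SIZE (2ULL * 1024 * 1024)   /* assume 2 MiB blocks */
#define PMD_MASK (~(PMD_SIZE - 1))

int main(void)
{
	uint64_t uaddr = 0x7f0000201000ULL;  /* hypothetical host userspace address */
	uint64_t gpa   = 0x80000000ULL;      /* hypothetical guest IPA */

	uint64_t uaddr_off = uaddr & ~PMD_MASK;  /* offset within its 2 MiB region */
	uint64_t gpa_off   = gpa & ~PMD_MASK;

	if (uaddr_off != gpa_off)
		printf("offsets differ (0x%llx vs 0x%llx): a 2 MiB block mapping "
		       "would map pages at the wrong offset\n",
		       (unsigned long long)uaddr_off, (unsigned long long)gpa_off);
	else
		printf("same offset within the PMD: condition (1) is satisfied\n");
	return 0;
}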

File tree: 1 file changed (+64 −22 lines)


virt/kvm/arm/mmu.c

Lines changed: 64 additions & 22 deletions
@@ -1595,6 +1595,63 @@ static void kvm_send_hwpoison_signal(unsigned long address,
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
 }
 
+static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot,
+					       unsigned long hva)
+{
+	gpa_t gpa_start, gpa_end;
+	hva_t uaddr_start, uaddr_end;
+	size_t size;
+
+	size = memslot->npages * PAGE_SIZE;
+
+	gpa_start = memslot->base_gfn << PAGE_SHIFT;
+	gpa_end = gpa_start + size;
+
+	uaddr_start = memslot->userspace_addr;
+	uaddr_end = uaddr_start + size;
+
+	/*
+	 * Pages belonging to memslots that don't have the same alignment
+	 * within a PMD for userspace and IPA cannot be mapped with stage-2
+	 * PMD entries, because we'll end up mapping the wrong pages.
+	 *
+	 * Consider a layout like the following:
+	 *
+	 *    memslot->userspace_addr:
+	 *    +-----+--------------------+--------------------+---+
+	 *    |abcde|fgh  Stage-1 PMD    |    Stage-1 PMD   tv|xyz|
+	 *    +-----+--------------------+--------------------+---+
+	 *
+	 *    memslot->base_gfn << PAGE_SHIFT:
+	 *      +---+--------------------+--------------------+-----+
+	 *      |abc|def  Stage-2 PMD    |    Stage-2 PMD     |tvxyz|
+	 *      +---+--------------------+--------------------+-----+
+	 *
+	 * If we create those stage-2 PMDs, we'll end up with this incorrect
+	 * mapping:
+	 *   d -> f
+	 *   e -> g
+	 *   f -> h
+	 */
+	if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK))
+		return false;
+
+	/*
+	 * Next, let's make sure we're not trying to map anything not covered
+	 * by the memslot. This means we have to prohibit PMD size mappings
+	 * for the beginning and end of a non-PMD aligned and non-PMD sized
+	 * memory slot (illustrated by the head and tail parts of the
+	 * userspace view above containing pages 'abcde' and 'xyz',
+	 * respectively).
+	 *
+	 * Note that it doesn't matter if we do the check using the
+	 * userspace_addr or the base_gfn, as both are equally aligned (per
+	 * the check above) and equally sized.
+	 */
+	return (hva & S2_PMD_MASK) >= uaddr_start &&
+	       (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1621,6 +1678,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
+	if (!fault_supports_stage2_pmd_mappings(memslot, hva))
+		force_pte = true;
+
+	if (logging_active)
+		force_pte = true;
+
 	/* Let's check if we will get back a huge page backed by hugetlbfs */
 	down_read(&current->mm->mmap_sem);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
@@ -1637,28 +1700,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if ((vma_pagesize == PMD_SIZE ||
 	     (vma_pagesize == PUD_SIZE && kvm_stage2_has_pud(kvm))) &&
-	    !logging_active) {
+	    !force_pte) {
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
-	} else {
-		/*
-		 * Fallback to PTE if it's not one of the Stage 2
-		 * supported hugepage sizes or the corresponding level
-		 * doesn't exist
-		 */
-		vma_pagesize = PAGE_SIZE;
-
-		/*
-		 * Pages belonging to memslots that don't have the same
-		 * alignment for userspace and IPA cannot be mapped using
-		 * block descriptors even if the pages belong to a THP for
-		 * the process, because the stage-2 block descriptor will
-		 * cover more than a single THP and we loose atomicity for
-		 * unmapping, updates, and splits of the THP or other pages
-		 * in the stage-2 block range.
-		 */
-		if ((memslot->userspace_addr & ~PMD_MASK) !=
-		    ((memslot->base_gfn << PAGE_SHIFT) & ~PMD_MASK))
-			force_pte = true;
 	}
 	up_read(&current->mm->mmap_sem);
 
@@ -1697,7 +1740,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * should not be mapped with huge pages (it introduces churn
 	 * and performance degradation), so force a pte mapping.
 	 */
-	force_pte = true;
 	flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
 
 	/*
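For readers who want to experiment with the new check outside the kernel, the following is a rough userspace mirror of fault_supports_stage2_pmd_mappings(), not the kernel code itself. It assumes 4 KiB pages and 2 MiB stage-2 PMDs, and the memslot layout and fault addresses are hypothetical. It shows how a fault in the unaligned head of a slot is forced down to PTE granularity (condition (2)), while a fault in the interior may still use a PMD block.

/* Illustration only: userspace mirror of the stage-2 PMD suitability check. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (1ULL << PAGE_SHIFT)       /* assume 4 KiB pages */
#define S2_PMD_SIZE  (2ULL * 1024 * 1024)       /* assume 2 MiB stage-2 blocks */
#define S2_PMD_MASK  (~(S2_PMD_SIZE - 1))

struct memslot {
	uint64_t base_gfn;        /* first guest frame number */
	uint64_t userspace_addr;  /* host VA backing the slot */
	uint64_t npages;
};

static bool supports_pmd(const struct memslot *slot, uint64_t hva)
{
	uint64_t size = slot->npages * PAGE_SIZE;
	uint64_t gpa_start = slot->base_gfn << PAGE_SHIFT;
	uint64_t uaddr_start = slot->userspace_addr;
	uint64_t uaddr_end = uaddr_start + size;

	/* condition (1): same offset within a PMD for host VA and guest IPA */
	if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK))
		return false;

	/* condition (2): the PMD covering hva must lie wholly inside the slot */
	return (hva & S2_PMD_MASK) >= uaddr_start &&
	       (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end;
}

int main(void)
{
	/* hypothetical slot: identically aligned, but starting 16 KiB into a PMD */
	struct memslot slot = {
		.base_gfn       = 0x80004000ULL >> PAGE_SHIFT,
		.userspace_addr = 0x7f4000004000ULL,
		.npages         = 1024,              /* 4 MiB slot */
	};

	uint64_t head_fault = slot.userspace_addr;              /* head page */
	uint64_t mid_fault  = slot.userspace_addr + 0x200000ULL; /* interior page */

	printf("head fault -> PMD allowed? %d\n", supports_pmd(&slot, head_fault));
	printf("mid fault  -> PMD allowed? %d\n", supports_pmd(&slot, mid_fault));
	return 0;
}

Under these assumptions the sketch prints 0 for the head fault and 1 for the interior one, matching the head/tail exclusion described in the comments of the patched function.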
