
Commit e4f7a94

lsgunth authored and bjorn-helgaas committed
PCI/P2PDMA: Add P2P DMA driver writer's documentation
Add a restructured text file describing how to write drivers with support
for P2P DMA transactions. The document describes how to use the APIs that
were added in the previous few commits.

Also adds an index for the PCI documentation tree even though this is the
only PCI document that has been converted to restructured text at this time.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
1 parent fcc78f9 commit e4f7a94

2 files changed, +146 -0 lines changed

Documentation/driver-api/pci/index.rst

Lines changed: 1 addition & 0 deletions

@@ -12,6 +12,7 @@ The Linux PCI driver implementer's API guide
   :maxdepth: 2

   pci
+  p2pdma

.. only:: subproject and html
Documentation/driver-api/pci/p2pdma.rst

Lines changed: 145 additions & 0 deletions

@@ -0,0 +1,145 @@
.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.

One of the biggest issues is that PCI doesn't require forwarding
transactions between hierarchy domains, and in PCIe, each Root Port
defines a separate hierarchy domain. To make things worse, there is no
simple way to determine if a given Root Complex supports this or not.
(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
only supports doing P2P when the endpoints involved are all behind the
same PCI bridge, as such devices are all in the same PCI hierarchy
domain, and the spec guarantees that all transactions within the
hierarchy will be routable, but it does not require routing
between hierarchies.

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent, so there are
a few corner-case gotchas with these pages and developers need to
be careful about what they do with them.

Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (i.e.,
it may be typical for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is both a client, provider and orchestrator
  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client) and it can also make use of the CMB as
  submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.

Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.
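
A minimal sketch of a provider's setup, assuming a hypothetical device
that exposes 1 MB of side-effect-free memory in BAR 4 (the function name,
BAR number and size are illustrative only)::

    static int foo_setup_p2pmem(struct pci_dev *pdev)
    {
            int ret;

            /* Back 1 MB of BAR 4 with struct pages usable for P2P DMA */
            ret = pci_p2pdma_add_resource(pdev, 4, SZ_1M, 0);
            if (ret)
                    return ret;

            /* Publish the memory so orchestrators can find and allocate it */
            pci_p2pmem_publish(pdev, true);

            return 0;
    }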

Client Drivers
--------------

A client driver typically only has to conditionally change its DMA map
routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
way does not need to be unmapped.

The client may also, optionally, make use of
:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
functions and when to use the regular mapping functions. In some
situations, it may be more appropriate to use a flag to indicate a
given request is P2P memory and map appropriately. It is important to
ensure that struct pages that back P2P memory stay out of code that
does not have support for them as other code may treat the pages as
regular memory which may not be appropriate.
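
A rough sketch of such a conditional mapping path; the wrapper name and
the assumption that all pages in a request's scatterlist are of the same
kind are illustrative, not taken from any real driver::

    static int foo_map_data(struct device *dev, struct scatterlist *sg,
                            int nents, enum dma_data_direction dir)
    {
            /*
             * This sketch assumes the scatterlist is homogeneous, so
             * inspecting the first page is enough to pick a mapping path.
             */
            if (is_pci_p2pdma_page(sg_page(sg)))
                    return pci_p2pdma_map_sg(dev, sg, nents, dir);

            return dma_map_sg(dev, sg, nents, dir);
    }

As noted above, the P2P path needs no matching unmap, while the
:c:func:`dma_map_sg()` path still requires the usual
:c:func:`dma_unmap_sg()`.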

Orchestrator Drivers
--------------------

The first task an orchestrator driver must do is compile a list of
all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider to use it may check compatibility using
:c:func:`pci_p2pdma_distance()`; otherwise it may find a memory provider
that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one provider is supported, the one nearest to all the clients
will be chosen first. If more than one provider is an equal distance away,
the one returned will be chosen at random (it is truly random, not merely
arbitrary). This function returns the PCI device to use for the provider
with a reference taken, so when it is no longer needed it should be
returned with pci_dev_put().

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
allocating scatter-gather lists with P2P memory.
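
A rough sketch of the allocation side of an orchestrator, assuming a
provider has already been obtained (for example from
:c:func:`pci_p2pmem_find()`, whose client-list handling is not shown);
the function names and buffer handling are illustrative::

    static void *foo_alloc_p2p_buffer(struct pci_dev *provider, size_t len)
    {
            /* Allocate P2P memory from the chosen provider's BAR */
            return pci_alloc_p2pmem(provider, len);
    }

    static void foo_free_p2p_buffer(struct pci_dev *provider, void *buf,
                                    size_t len)
    {
            pci_free_p2pmem(provider, buf, len);

            /* Drop the reference taken when the provider was found */
            pci_dev_put(provider);
    }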

Struct Page Caveats
-------------------

Driver writers should be very careful about not passing these special
struct pages to code that isn't prepared for it. At this time, the kernel
interfaces do not have any checks for ensuring this. This obviously
precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it. Thus, the order of loads and stores should not be important
and ioreadX(), iowriteX() and friends should not be necessary.
However, as the memory is not cache coherent, if access ever needs to
be protected by a spinlock then :c:func:`mmiowb()` must be used before
unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
Documentation/memory-barriers.txt)
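
For the spinlock case, a minimal sketch (the lock and the stores are
placeholders)::

    spin_lock(&foo_lock);

    /* ... stores to the non-cache-coherent P2P memory ... */

    /* Order the stores before another CPU can take the lock */
    mmiowb();
    spin_unlock(&foo_lock);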

P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export:
