.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.
| 11 | + |
| 12 | +One of the biggest issues is that PCI doesn't require forwarding |
| 13 | +transactions between hierarchy domains, and in PCIe, each Root Port |
| 14 | +defines a separate hierarchy domain. To make things worse, there is no |
| 15 | +simple way to determine if a given Root Complex supports this or not. |
| 16 | +(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel |
| 17 | +only supports doing P2P when the endpoints involved are all behind the |
| 18 | +same PCI bridge, as such devices are all in the same PCI hierarchy |
| 19 | +domain, and the spec guarantees that all transactions within the |
| 20 | +hierarchy will be routable, but it does not require routing |
| 21 | +between hierarchies. |

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent, so there are
a few corner-case gotchas with these pages and developers need to
be careful about what they do with them.


Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (i.e.,
it may be typical for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is a client, provider and orchestrator
  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client), and it can also make use of the CMB as
  submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.


Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.

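As a sketch, a provider's probe routine might use these calls as
follows (the driver and function names are hypothetical, the BAR index
is illustrative, and error handling is trimmed):

.. code-block:: c

    /* Hypothetical provider probe: expose BAR 4 as P2P memory. */
    static int foo_probe(struct pci_dev *pdev,
                         const struct pci_device_id *id)
    {
        int rc;

        /* Create struct pages covering the whole of BAR 4
         * (a size of zero means the full BAR, from offset 0).
         */
        rc = pci_p2pdma_add_resource(pdev, 4, 0, 0);
        if (rc)
            return rc;

        /* Allow orchestrators to find this memory via
         * pci_p2pmem_find().
         */
        pci_p2pmem_publish(pdev, true);

        return 0;
    }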

Client Drivers
--------------

A client driver typically only has to conditionally change its DMA map
routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
way does not need to be unmapped.

The client may also, optionally, make use of
:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
functions and when to use the regular mapping functions. In some
situations, it may be more appropriate to use a flag to indicate that a
given request is P2P memory and map appropriately. It is important to
ensure that struct pages that back P2P memory stay out of code that
does not have support for them, as other code may treat the pages as
regular memory, which may not be appropriate.

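A client's mapping path might be sketched as below (the helper name is
hypothetical, and it assumes the scatterlist is homogeneously backed by
either P2P or regular pages):

.. code-block:: c

    /* Hypothetical helper: choose the P2P mapping routine when the
     * scatterlist is backed by P2P struct pages.
     */
    static int foo_map_data(struct device *dev, struct scatterlist *sg,
                            int nents, enum dma_data_direction dir)
    {
        if (is_pci_p2pdma_page(sg_page(sg)))
            return pci_p2pdma_map_sg(dev, sg, nents, dir);

        return dma_map_sg(dev, sg, nents, dir);
    }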

Orchestrator Drivers
--------------------

The first task an orchestrator driver must do is compile a list of
all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider to use, it may check compatibility using
:c:func:`pci_p2pdma_distance()`; otherwise it may find a memory provider
that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one provider is supported, the one nearest to all the clients
will be chosen first. If more than one provider is an equal distance away,
the one returned will be chosen at random (the choice is arbitrary, not
truly random). This function returns the PCI device to use for the provider
with a reference taken; therefore, when it is no longer needed, it should be
returned with :c:func:`pci_dev_put()`.
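
For instance, an orchestrator might select a provider roughly as
follows (the two client device pointers are illustrative):

.. code-block:: c

    /* Hypothetical provider selection for two client devices. */
    struct device *clients[] = { &rnic_pdev->dev, &nvme_pdev->dev };
    struct pci_dev *p2p_dev;

    p2p_dev = pci_p2pmem_find(clients, ARRAY_SIZE(clients));
    if (!p2p_dev)
        return -ENODEV;    /* no compatible P2P memory found */

    /* ... allocate and use the provider's memory ... */

    pci_dev_put(p2p_dev); /* drop the reference when done */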

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
allocating scatter-gather lists with P2P memory.

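A minimal allocation sketch, assuming ``p2p_dev`` is the provider
returned by :c:func:`pci_p2pmem_find()` and ``len`` is an illustrative
transfer size:

.. code-block:: c

    /* Hypothetical buffer allocation from the chosen provider. */
    void *buf = pci_alloc_p2pmem(p2p_dev, len);

    if (!buf)
        return -ENOMEM;

    /* ... DMA to and from buf ... */

    pci_free_p2pmem(p2p_dev, buf, len);
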
Struct Page Caveats
-------------------

Driver writers should be very careful about not passing these special
struct pages to code that isn't prepared for them. At this time, the kernel
interfaces do not have any checks for ensuring this. This obviously
precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it. Thus, the order of loads and stores should not be important
and ioreadX(), iowriteX() and friends should not be necessary.
However, as the memory is not cache coherent, if access ever needs to
be protected by a spinlock then :c:func:`mmiowb()` must be used before
unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
Documentation/memory-barriers.txt)

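The spinlock case might be sketched as follows (``p2p_lock`` is a
hypothetical lock assumed to protect stores to the P2P buffer):

.. code-block:: c

    spin_lock(&p2p_lock);
    /* ... plain stores to the P2P memory ... */
    mmiowb();           /* order the stores before releasing the lock */
    spin_unlock(&p2p_lock);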

P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export: