Commit bd6bf7c

Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:

 - Fix ASPM link_state teardown on removal (Lukas Wunner)
 - Fix misleading _OSC ASPM message (Sinan Kaya)
 - Make _OSC optional for PCI (Sinan Kaya)
 - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set (Patrick Talbert)
 - Remove x86 and arm64 node-local allocation for host bridge structures (Punit Agrawal)
 - Pay attention to device-specific _PXM node values (Jonathan Cameron)
 - Support new Immediate Readiness bit (Felipe Balbi)
 - Differentiate between pciehp surprise and safe removal (Lukas Wunner)
 - Remove unnecessary pciehp includes (Lukas Wunner)
 - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)
 - Tolerate PCIe Slot Presence Detect being hardwired to zero to workaround broken hardware, e.g., the Wilocity switch/wireless device (Lukas Wunner)
 - Unify pciehp controller & slot structs (Lukas Wunner)
 - Constify hotplug_slot_ops (Lukas Wunner)
 - Drop hotplug_slot_info (Lukas Wunner)
 - Embed hotplug_slot struct into users instead of allocating it separately (Lukas Wunner)
 - Initialize PCIe port service drivers directly instead of relying on initcall ordering (Keith Busch)
 - Restore PCI config state after a slot reset (Keith Busch)
 - Save/restore DPC config state along with other PCI config state (Keith Busch)
 - Reference count devices during AER handling to avoid race issue with concurrent hot removal (Keith Busch)
 - If an Upstream Port reports ERR_FATAL, don't try to read the Port's config space because it is probably unreachable (Keith Busch)
 - During error handling, use slot-specific reset instead of secondary bus reset to avoid link up/down issues on hotplug ports (Keith Busch)
 - Restore previous AER/DPC handling that does not remove and re-enumerate devices on ERR_FATAL (Keith Busch)
 - Notify all drivers that may be affected by error recovery resets (Keith Busch)
 - Always generate error recovery uevents, even if a driver doesn't have error callbacks (Keith Busch)
 - Make PCIe link active reporting detection generic (Keith Busch)
 - Support D3cold in PCIe hierarchies during system sleep and runtime, including hotplug and Thunderbolt ports (Mika Westerberg)
 - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots are empty or occupied (Jon Derrick)
 - Remove duplicated include from pci/pcie/err.c and unused variable from cpqphp (YueHaibing)
 - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza Pawandeep)
 - Uninline PCI bus accessors for better ftracing (Keith Busch)
 - Remove unused AER Root Port .error_resume method (Keith Busch)
 - Use kfifo in AER instead of a local version (Keith Busch)
 - Use threaded IRQ in AER bottom half (Keith Busch)
 - Use managed resources in AER core (Keith Busch)
 - Reuse pcie_port_find_device() for AER injection (Keith Busch)
 - Abstract AER interrupt handling to disconnect error injection (Keith Busch)
 - Refactor AER injection callbacks to simplify future improvements (Keith Busch)
 - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)
 - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)
 - Add switch fall-through annotations (Gustavo A. R. Silva)
 - Remove unused Switchtec quirk variable (Joshua Abraham)
 - Fix pci.c kernel-doc warning (Randy Dunlap)
 - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)
 - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)
 - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe)
 - Update Switchtec NTB documentation (Wesley Yung)
 - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)
 - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)
 - Add PCI support for peer-to-peer DMA (Logan Gunthorpe)
 - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe)
 - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe)
 - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe)
 - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe)
 - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe)
 - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe)
 - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe)
 - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe)
 - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe)
 - Cache VF config space size to optimize enumeration of many VFs (KarimAllah Ahmed)
 - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)
 - Fix VMD AERSID quirk Device ID matching (Jon Derrick)
 - Fix Cadence PHY handling during probe (Alan Douglas)
 - Signal Cadence Endpoint interrupts via AXI region 0 instead of last region (Alan Douglas)
 - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan Douglas)
 - Remove redundant controller tests for "device_type == pci" (Rob Herring)
 - Document R-Car E3 (R8A77990) bindings (Tho Vu)
 - Add device tree support for R-Car r8a7744 (Biju Das)
 - Drop unused mvebu PCIe capability code (Thomas Petazzoni)
 - Add shared PCI bridge emulation code (Thomas Petazzoni)
 - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)
 - Add aardvark Root Port emulation (Thomas Petazzoni)
 - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)
 - Add initial power management for i.MX7 (Leonard Crestez)
 - Add PME_Turn_Off support for i.MX7 (Leonard Crestez)
 - Fix qcom runtime power management error handling (Bjorn Andersson)
 - Update TI dra7xx unaligned access errata workaround for host mode as well as endpoint mode (Vignesh R)
 - Fix kirin section mismatch warning (Nathan Chancellor)
 - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)
 - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)
 - Update Keystone to use MRRS quirk for host bridge instead of open coding (Kishon Vijay Abraham I)
 - Refactor Keystone link establishment (Kishon Vijay Abraham I)
 - Simplify and speed up Keystone link training (Kishon Vijay Abraham I)
 - Remove unused Keystone host_init argument (Kishon Vijay Abraham I)
 - Merge Keystone driver files into one (Kishon Vijay Abraham I)
 - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay Abraham I)
 - Rename Keystone functions for uniformity (Kishon Vijay Abraham I)
 - Add Keystone device control module DT binding (Kishon Vijay Abraham I)
 - Use SYSCON API to get Keystone control module device IDs (Kishon Vijay Abraham I)
 - Clean up Keystone PHY handling (Kishon Vijay Abraham I)
 - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)
 - Clean up Keystone config space access checks (Kishon Vijay Abraham I)
 - Get Keystone outbound window count from DT (Kishon Vijay Abraham I)
 - Clean up Keystone outbound window configuration (Kishon Vijay Abraham I)
 - Clean up Keystone DBI setup (Kishon Vijay Abraham I)
 - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)
 - Fix Keystone IRQ status checking (Kishon Vijay Abraham I)
 - Add debug messages for all Keystone errors (Kishon Vijay Abraham I)
 - Clean up Keystone includes and macros (Kishon Vijay Abraham I)
 - Fix Mediatek unchecked return value from devm_pci_remap_iospace() (Gustavo A. R. Silva)
 - Fix Mediatek endpoint/port matching logic (Honghui Zhang)
 - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui Zhang)
 - Remove redundant Mediatek PM domain check (Honghui Zhang)
 - Convert Mediatek to pci_host_probe() (Honghui Zhang)
 - Fix Mediatek MSI enablement (Honghui Zhang)
 - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)
 - Add Mediatek loadable module support (Honghui Zhang)
 - Detach VMD resources after stopping root bus to prevent orphan resources (Jon Derrick)
 - Convert pcitest build process to that used by other tools (iio, perf, etc) (Gustavo Pimentel)

* tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI/AER: Refactor error injection fallbacks
  PCI/AER: Abstract AER interrupt handling
  PCI/AER: Reuse existing pcie_port_find_device() interface
  PCI/AER: Use managed resource allocations
  PCI: pcie: Remove redundant 'default n' from Kconfig
  PCI: aardvark: Implement emulated root PCI bridge config space
  PCI: mvebu: Convert to PCI emulated bridge config space
  PCI: mvebu: Drop unused PCI express capability code
  PCI: Introduce PCI bridge emulated config space common logic
  PCI: vmd: Detach resources after stopping root bus
  nvmet: Optionally use PCI P2P memory
  nvmet: Introduce helper functions to allocate and free request SGLs
  nvme-pci: Add support for P2P memory in requests
  nvme-pci: Use PCI p2pmem subsystem to manage the CMB
  IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
  block: Add PCI P2P flag for request queue
  PCI/P2PDMA: Add P2P DMA driver writer's documentation
  docs-rst: Add a new directory for PCI documentation
  PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
  PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
  ...

2 parents a41efc2 + 663569d commit bd6bf7c

162 files changed, +5004 -3114 lines changed

Documentation/ABI/testing/sysfs-bus-pci

Lines changed: 24 additions & 0 deletions
@@ -323,3 +323,27 @@ Description:
 
 This is similar to /sys/bus/pci/drivers_autoprobe, but
 affects only the VFs associated with a specific PF.
+
+What: /sys/bus/pci/devices/.../p2pmem/size
+Date: November 2017
+Contact: Logan Gunthorpe <logang@deltatee.com>
+Description:
+If the device has any Peer-to-Peer memory registered, this
+file contains the total amount of memory that the device
+provides (in decimal).
+
+What: /sys/bus/pci/devices/.../p2pmem/available
+Date: November 2017
+Contact: Logan Gunthorpe <logang@deltatee.com>
+Description:
+If the device has any Peer-to-Peer memory registered, this
+file contains the amount of memory that has not been
+allocated (in decimal).
+
+What: /sys/bus/pci/devices/.../p2pmem/published
+Date: November 2017
+Contact: Logan Gunthorpe <logang@deltatee.com>
+Description:
+If the device has any Peer-to-Peer memory registered, this
+file contains a '1' if the memory has been published for
+use outside the driver that owns the device.
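
Illustration (not part of the commit): the p2pmem attributes described in the hunk above are plain text files, and size and available hold decimal numbers. A minimal userspace sketch of reading one such value might look like the following. The helper name read_decimal_attr() is hypothetical, and the caller is assumed to supply the full path to the attribute; the device part of the path is elided in the ABI text and is not filled in here.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one unsigned decimal value from a
 * sysfs-style attribute file such as .../p2pmem/size.
 *
 * Returns 0 and stores the value in *out on success. Returns -1 if the
 * file cannot be opened, is empty, or does not start with a digit.
 */
static int read_decimal_attr(const char *path, unsigned long long *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Reject signs and whitespace that strtoull() would otherwise accept. */
	if (!isdigit((unsigned char)buf[0]))
		return -1;

	errno = 0;
	*out = strtoull(buf, &end, 10);
	if (errno || end == buf)
		return -1;
	return 0;
}
```

Under the same assumptions, the amount of P2P memory currently allocated could be estimated as the value read from size minus the value read from available.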

Documentation/PCI/endpoint/pci-test-howto.txt

Lines changed: 11 additions & 8 deletions
@@ -99,17 +99,20 @@ Note that the devices listed here correspond to the value populated in 1.4 above
 2.2 Using Endpoint Test function Device
 
 pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
-tests. Before pcitest.sh can be used pcitest.c should be compiled using the
-following commands.
+tests. To compile this tool the following commands should be used:
 
-cd <kernel-dir>
-make headers_install ARCH=arm
-arm-linux-gnueabihf-gcc -Iusr/include tools/pci/pcitest.c -o pcitest
-cp pcitest <rootfs>/usr/sbin/
-cp tools/pci/pcitest.sh <rootfs>
+# cd <kernel-dir>
+# make -C tools/pci
+
+or if you desire to compile and install in your system:
+
+# cd <kernel-dir>
+# make -C tools/pci install
+
+The tool and script will be located in <rootfs>/usr/bin/
 
 2.2.1 pcitest.sh Output
-# ./pcitest.sh
+# pcitest.sh
 BAR tests
 
 BAR0: OKAY

Documentation/PCI/pci-error-recovery.txt

Lines changed: 10 additions & 25 deletions
@@ -110,7 +110,7 @@ The actual steps taken by a platform to recover from a PCI error
 event will be platform-dependent, but will follow the general
 sequence described below.
 
-STEP 0: Error Event: ERR_NONFATAL
+STEP 0: Error Event
 -------------------
 A PCI bus error is detected by the PCI hardware. On powerpc, the slot
 is isolated, in that all I/O is blocked: all reads return 0xffffffff,
@@ -228,7 +228,13 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
 If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
 proceeds to STEP 4 (Slot Reset)
 
-STEP 3: Slot Reset
+STEP 3: Link Reset
+------------------
+The platform resets the link. This is a PCI-Express specific step
+and is done whenever a fatal error has been detected that can be
+"solved" by resetting the link.
+
+STEP 4: Slot Reset
 ------------------
 
 In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
@@ -314,7 +320,7 @@ Failure).
 >>> However, it probably should.
 
 
-STEP 4: Resume Operations
+STEP 5: Resume Operations
 -------------------------
 The platform will call the resume() callback on all affected device
 drivers if all drivers on the segment have returned
@@ -326,7 +332,7 @@ a result code.
 At this point, if a new error happens, the platform will restart
 a new error recovery sequence.
 
-STEP 5: Permanent Failure
+STEP 6: Permanent Failure
 -------------------------
 A "permanent failure" has occurred, and the platform cannot recover
 the device. The platform will call error_detected() with a
@@ -349,27 +355,6 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
 for additional detail on real-life experience of the causes of
 software errors.
 
-STEP 0: Error Event: ERR_FATAL
--------------------
-PCI bus error is detected by the PCI hardware. On powerpc, the slot is
-isolated, in that all I/O is blocked: all reads return 0xffffffff, all
-writes are ignored.
-
-STEP 1: Remove devices
---------------------
-Platform removes the devices depending on the error agent, it could be
-this port for all subordinates or upstream component (likely downstream
-port)
-
-STEP 2: Reset link
---------------------
-The platform resets the link. This is a PCI-Express specific step and is
-done whenever a fatal error has been detected that can be "solved" by
-resetting the link.
-
-STEP 3: Re-enumerate the devices
---------------------
-Initiates the re-enumeration.
 
 Conclusion; General Remarks
 ---------------------------

Documentation/devicetree/bindings/pci/fsl,imx6q-pcie.txt

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ Additional required properties for imx7d-pcie:
 - reset-names: Must contain the following entries:
 - "pciephy"
 - "apps"
+- "turnoff"
 
 Example:

Documentation/devicetree/bindings/pci/pci-keystone.txt

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,9 @@ pcie_msi_intc : Interrupt controller device node for MSI IRQ chip
 interrupt-cells: should be set to 1
 interrupts: GIC interrupt lines connected to PCI MSI interrupt lines
 
+ti,syscon-pcie-id : phandle to the device control module required to set device
+id and vendor id.
+
 Example:
 pcie_msi_intc: msi-interrupt-controller {
 interrupt-controller;

Documentation/devicetree/bindings/pci/pci-rcar-gen2.txt

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ OHCI and EHCI controllers.
 
 Required properties:
 - compatible: "renesas,pci-r8a7743" for the R8A7743 SoC;
+"renesas,pci-r8a7744" for the R8A7744 SoC;
 "renesas,pci-r8a7745" for the R8A7745 SoC;
 "renesas,pci-r8a7790" for the R8A7790 SoC;
 "renesas,pci-r8a7791" for the R8A7791 SoC;

Documentation/devicetree/bindings/pci/rcar-pci.txt

Lines changed: 2 additions & 0 deletions
@@ -2,13 +2,15 @@
 
 Required properties:
 compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC;
+"renesas,pcie-r8a7744" for the R8A7744 SoC;
 "renesas,pcie-r8a7779" for the R8A7779 SoC;
 "renesas,pcie-r8a7790" for the R8A7790 SoC;
 "renesas,pcie-r8a7791" for the R8A7791 SoC;
 "renesas,pcie-r8a7793" for the R8A7793 SoC;
 "renesas,pcie-r8a7795" for the R8A7795 SoC;
 "renesas,pcie-r8a7796" for the R8A7796 SoC;
 "renesas,pcie-r8a77980" for the R8A77980 SoC;
+"renesas,pcie-r8a77990" for the R8A77990 SoC;
 "renesas,pcie-rcar-gen2" for a generic R-Car Gen2 or
 RZ/G1 compatible device.
 "renesas,pcie-rcar-gen3" for a generic R-Car Gen3 compatible device.

Documentation/devicetree/bindings/pci/ti-pci.txt

Lines changed: 5 additions & 0 deletions
@@ -26,6 +26,11 @@ HOST MODE
 ranges,
 interrupt-map-mask,
 interrupt-map : as specified in ../designware-pcie.txt
+- ti,syscon-unaligned-access: phandle to the syscon DT node. The 1st argument
+should contain the register offset within syscon
+and the 2nd argument should contain the bit field
+for setting the bit to enable unaligned
+access.
 
 DEVICE MODE
 ===========

Documentation/driver-api/index.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,7 @@ available subsections can be seen below.
3030
input
3131
usb/index
3232
firewire
33-
pci
33+
pci/index
3434
spi
3535
i2c
3636
hsi
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+The Linux PCI driver implementer's API guide
+============================================
+
+.. class:: toc-title
+
+   Table of contents
+
+.. toctree::
+   :maxdepth: 2
+
+   pci
+   p2pdma
+
+.. only:: subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+PCI Peer-to-Peer DMA Support
+============================
+
+The PCI bus has pretty decent support for performing DMA transfers
+between two devices on the bus. This type of transaction is henceforth
+called Peer-to-Peer (or P2P). However, there are a number of issues that
+make P2P transactions tricky to do in a perfectly safe way.
+
+One of the biggest issues is that PCI doesn't require forwarding
+transactions between hierarchy domains, and in PCIe, each Root Port
+defines a separate hierarchy domain. To make things worse, there is no
+simple way to determine if a given Root Complex supports this or not.
+(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
+only supports doing P2P when the endpoints involved are all behind the
+same PCI bridge, as such devices are all in the same PCI hierarchy
+domain, and the spec guarantees that all transactions within the
+hierarchy will be routable, but it does not require routing
+between hierarchies.
+
+The second issue is that to make use of existing interfaces in Linux,
+memory that is used for P2P transactions needs to be backed by struct
+pages. However, PCI BARs are not typically cache coherent so there are
+a few corner case gotchas with these pages so developers need to
+be careful about what they do with them.
+
+
+Driver Writer's Guide
+=====================
+
+In a given P2P implementation there may be three or more different
+types of kernel drivers in play:
+
+* Provider - A driver which provides or publishes P2P resources like
+  memory or doorbell registers to other drivers.
+* Client - A driver which makes use of a resource by setting up a
+  DMA transaction to or from it.
+* Orchestrator - A driver which orchestrates the flow of data between
+  clients and providers.
+
+In many cases there could be overlap between these three types (i.e.,
+it may be typical for a driver to be both a provider and a client).
+
+For example, in the NVMe Target Copy Offload implementation:
+
+* The NVMe PCI driver is a client, provider and orchestrator
+  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
+  resource (provider), it accepts P2P memory pages as buffers in requests
+  to be used directly (client) and it can also make use of the CMB as
+  submission queue entries (orchestrator).
+* The RDMA driver is a client in this arrangement so that an RNIC
+  can DMA directly to the memory exposed by the NVMe device.
+* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
+  to the P2P memory (CMB) and then to the NVMe device (and vice versa).
+
+This is currently the only arrangement supported by the kernel but
+one could imagine slight tweaks to this that would allow for the same
+functionality. For example, if a specific RNIC added a BAR with some
+memory behind it, its driver could add support as a P2P provider and
+then the NVMe Target could use the RNIC's memory instead of the CMB
+in cases where the NVMe cards in use do not have CMB support.
+
+
+Provider Drivers
+----------------
+
+A provider simply needs to register a BAR (or a portion of a BAR)
+as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
+This will register struct pages for all the specified memory.
+
+After that it may optionally publish all of its resources as
+P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
+any orchestrator drivers to find and use the memory. When marked in
+this way, the resource must be regular memory with no side effects.
+
+For the time being this is fairly rudimentary in that all resources
+are typically going to be P2P memory. Future work will likely expand
+this to include other types of resources like doorbells.
+
+
+Client Drivers
+--------------
+
+A client driver typically only has to conditionally change its DMA map
+routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
+of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
+way does not need to be unmapped.
+
+The client may also, optionally, make use of
+:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
+functions and when to use the regular mapping functions. In some
+situations, it may be more appropriate to use a flag to indicate a
+given request is P2P memory and map appropriately. It is important to
+ensure that struct pages that back P2P memory stay out of code that
+does not have support for them as other code may treat the pages as
+regular memory which may not be appropriate.
+
+
+Orchestrator Drivers
+--------------------
+
+The first task an orchestrator driver must do is compile a list of
+all client devices that will be involved in a given transaction. For
+example, the NVMe Target driver creates a list including the namespace
+block device and the RNIC in use. If the orchestrator has access to
+a specific P2P provider to use it may check compatibility using
+:c:func:`pci_p2pdma_distance()` otherwise it may find a memory provider
+that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
+If more than one provider is supported, the one nearest to all the clients will
+be chosen first. If more than one provider is an equal distance away, the
+one returned will be chosen at random (it is not arbitrary but
+truly random). This function returns the PCI device to use for the provider
+with a reference taken and therefore when it's no longer needed it should be
+returned with pci_dev_put().
+
+Once a provider is selected, the orchestrator can then use
+:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
+allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
+and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
+allocating scatter-gather lists with P2P memory.
+
+Struct Page Caveats
+-------------------
+
+Driver writers should be very careful about not passing these special
+struct pages to code that isn't prepared for it. At this time, the kernel
+interfaces do not have any checks for ensuring this. This obviously
+precludes passing these pages to userspace.
+
+P2P memory is also technically IO memory but should never have any side
+effects behind it. Thus, the order of loads and stores should not be important
+and ioreadX(), iowriteX() and friends should not be necessary.
+However, as the memory is not cache coherent, if access ever needs to
+be protected by a spinlock then :c:func:`mmiowb()` must be used before
+unlocking the lock. (See ACQUIRES VS I/O ACCESSES in
+Documentation/memory-barriers.txt)
+
+
+P2P DMA Support Library
+=======================
+
+.. kernel-doc:: drivers/pci/p2pdma.c
+   :export:
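
Illustration (not part of the commit): the Orchestrator Drivers text in the hunk above says that the nearest compatible provider is preferred and that ties between equally distant providers are broken at random. The following is a small userspace model of that selection rule only, not the kernel's implementation of pci_p2pmem_find(). The function name pick_provider(), the use of a plain int array for distances, and the convention that a negative distance marks an incompatible provider are all assumptions made for this sketch.

```c
#include <stdlib.h>

/*
 * Userspace model (hypothetical) of "nearest provider wins, ties are
 * broken at random".
 *
 * distance[i] is the distance of candidate provider i from the set of
 * clients; a negative value is assumed to mean "not usable by every
 * client". Returns the index of the chosen provider, or -1 if no
 * candidate is compatible.
 */
static int pick_provider(const int *distance, int n)
{
	int best = -1;    /* smallest non-negative distance seen so far */
	int ties = 0;     /* number of candidates found at that distance */
	int chosen = -1;  /* index of the current pick */
	int i;

	for (i = 0; i < n; i++) {
		if (distance[i] < 0)
			continue;  /* incompatible with at least one client */

		if (best < 0 || distance[i] < best) {
			/* Strictly closer: forget any earlier ties. */
			best = distance[i];
			ties = 1;
			chosen = i;
		} else if (distance[i] == best) {
			/*
			 * Equal distance: reservoir sampling keeps each of the
			 * tied candidates with probability 1/ties.
			 */
			ties++;
			if (rand() % ties == 0)
				chosen = i;
		}
	}
	return chosen;
}
```

In this model the caller would seed the generator once (for example with srand()) before selecting; the kernel uses its own random source and reference counting, neither of which is modelled here.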
