Commit 2870f6c

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:

 - three fixes tagged for -stable including a crash fix, simple
   performance tweak, and an invalid i/o error.

 - build regression fix for the nvdimm unit tests

 - nvdimm documentation update

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: fix __dax_pmd_fault crash
  libnvdimm: documentation clarifications
  libnvdimm, pmem: fix size trim in pmem_direct_access()
  libnvdimm, e820: fix numa node for e820-type-12 pmem ranges
  tools/testing/nvdimm, acpica: fix flag rename build breakage
2 parents 934f98d + 152d7bd commit 2870f6c

5 files changed: 52 additions, 36 deletions

Documentation/nvdimm/nvdimm.txt

Lines changed: 28 additions & 21 deletions
@@ -62,6 +62,12 @@ DAX: File system extensions to bypass the page cache and block layer to
 mmap persistent memory, from a PMEM block device, directly into a
 process address space.
 
+DSM: Device Specific Method: ACPI method to to control specific
+device - in this case the firmware.
+
+DCR: NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
+It defines a vendor-id, device-id, and interface format for a given DIMM.
+
 BTT: Block Translation Table: Persistent memory is byte addressable.
 Existing software may have an expectation that the power-fail-atomicity
 of writes is at least one sector, 512 bytes. The BTT is an indirection
@@ -133,16 +139,16 @@ device driver:
 registered, can be immediately attached to nd_pmem.
 
 2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
-defined apertures. A set of apertures will all access just one DIMM.
-Multiple windows allow multiple concurrent accesses, much like
+defined apertures. A set of apertures will access just one DIMM.
+Multiple windows (apertures) allow multiple concurrent accesses, much like
 tagged-command-queuing, and would likely be used by different threads or
 different CPUs.
 
 The NFIT specification defines a standard format for a BLK-aperture, but
 the spec also allows for vendor specific layouts, and non-NFIT BLK
-implementations may other designs for BLK I/O. For this reason "nd_blk"
-calls back into platform-specific code to perform the I/O. One such
-implementation is defined in the "Driver Writer's Guide" and "DSM
+implementations may have other designs for BLK I/O. For this reason
+"nd_blk" calls back into platform-specific code to perform the I/O.
+One such implementation is defined in the "Driver Writer's Guide" and "DSM
 Interface Example".
 
 
@@ -152,7 +158,7 @@ Why BLK?
 While PMEM provides direct byte-addressable CPU-load/store access to
 NVDIMM storage, it does not provide the best system RAS (recovery,
 availability, and serviceability) model. An access to a corrupted
-system-physical-address address causes a cpu exception while an access
+system-physical-address address causes a CPU exception while an access
 to a corrupted address through an BLK-aperture causes that block window
 to raise an error status in a register. The latter is more aligned with
 the standard error model that host-bus-adapter attached disks present.
@@ -162,7 +168,7 @@ data could be interleaved in an opaque hardware specific manner across
 several DIMMs.
 
 PMEM vs BLK
-BLK-apertures solve this RAS problem, but their presence is also the
+BLK-apertures solve these RAS problems, but their presence is also the
 major contributing factor to the complexity of the ND subsystem. They
 complicate the implementation because PMEM and BLK alias in DPA space.
 Any given DIMM's DPA-range may contribute to one or more
@@ -220,8 +226,8 @@ socket. Each unique interface (BLK or PMEM) to DPA space is identified
 by a region device with a dynamically assigned id (REGION0 - REGION5).
 
 1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
-single PMEM namespace is created in the REGION0-SPA-range that spans
-DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
+single PMEM namespace is created in the REGION0-SPA-range that spans most
+of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
 interleaved system-physical-address range is reclaimed as BLK-aperture
 accessed space starting at DPA-offset (a) into each DIMM. In that
 reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
@@ -230,13 +236,13 @@ by a region device with a dynamically assigned id (REGION0 - REGION5).
 
 2. In the last portion of DIMM0 and DIMM1 we have an interleaved
 system-physical-address range, REGION1, that spans those two DIMMs as
-well as DIMM2 and DIMM3. Some of REGION1 allocated to a PMEM namespace
-named "pm1.0" the rest is reclaimed in 4 BLK-aperture namespaces (for
+well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace
+named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
 each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
 "blk5.0".
 
 3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
-interleaved system-physical-address range (i.e. the DPA address below
+interleaved system-physical-address range (i.e. the DPA address past
 offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
 Note, that this example shows that BLK-aperture namespaces don't need to
 be contiguous in DPA-space.
@@ -252,15 +258,15 @@ LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
 
 What follows is a description of the LIBNVDIMM sysfs layout and a
 corresponding object hierarchy diagram as viewed through the LIBNDCTL
-api. The example sysfs paths and diagrams are relative to the Example
+API. The example sysfs paths and diagrams are relative to the Example
 NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
 test.
 
 LIBNDCTL: Context
-Every api call in the LIBNDCTL library requires a context that holds the
+Every API call in the LIBNDCTL library requires a context that holds the
 logging parameters and other library instance state. The library is
 based on the libabc template:
-https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git/
+https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
 
 LIBNDCTL: instantiate a new library context example
 
@@ -409,7 +415,7 @@ Bit 31:28 Reserved
 LIBNVDIMM/LIBNDCTL: Region
 ----------------------
 
-A generic REGION device is registered for each PMEM range orBLK-aperture
+A generic REGION device is registered for each PMEM range or BLK-aperture
 set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
 sets on the "nfit_test.0" bus. The primary role of regions are to be a
 container of "mappings". A mapping is a tuple of <DIMM,
@@ -509,7 +515,7 @@ At first glance it seems since NFIT defines just PMEM and BLK interface
 types that we should simply name REGION devices with something derived
 from those type names. However, the ND subsystem explicitly keeps the
 REGION name generic and expects userspace to always consider the
-region-attributes for 4 reasons:
+region-attributes for four reasons:
 
 1. There are already more than two REGION and "namespace" types. For
 PMEM there are two subtypes. As mentioned previously we have PMEM where
@@ -698,8 +704,8 @@ static int configure_namespace(struct ndctl_region *region,
 
 Why the Term "namespace"?
 
-1. Why not "volume" for instance? "volume" ran the risk of confusing ND
-as a volume manager like device-mapper.
+1. Why not "volume" for instance? "volume" ran the risk of confusing
+ND (libnvdimm subsystem) to a volume manager like device-mapper.
 
 2. The term originated to describe the sub-devices that can be created
 within a NVME controller (see the nvme specification:
@@ -774,13 +780,14 @@ block" needs to be destroyed. Note, that to destroy a BTT the media
 needs to be written in raw mode. By default, the kernel will autodetect
 the presence of a BTT and disable raw mode. This autodetect behavior
 can be suppressed by enabling raw mode for the namespace via the
-ndctl_namespace_set_raw_mode() api.
+ndctl_namespace_set_raw_mode() API.
 
 
 Summary LIBNDCTL Diagram
 ------------------------
 
-For the given example above, here is the view of the objects as seen by the LIBNDCTL api:
+For the given example above, here is the view of the objects as seen by the
+LIBNDCTL API:
 +---+
 |CTX| +---------+ +--------------+ +---------------+
 +-+-+ +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |

drivers/nvdimm/e820.c

Lines changed: 14 additions & 1 deletion
@@ -3,6 +3,7 @@
  * Copyright (c) 2015, Intel Corporation.
  */
 #include <linux/platform_device.h>
+#include <linux/memory_hotplug.h>
 #include <linux/libnvdimm.h>
 #include <linux/module.h>
 
@@ -25,6 +26,18 @@ static int e820_pmem_remove(struct platform_device *pdev)
 	return 0;
 }
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+static int e820_range_to_nid(resource_size_t addr)
+{
+	return memory_add_physaddr_to_nid(addr);
+}
+#else
+static int e820_range_to_nid(resource_size_t addr)
+{
+	return NUMA_NO_NODE;
+}
+#endif
+
 static int e820_pmem_probe(struct platform_device *pdev)
 {
 	static struct nvdimm_bus_descriptor nd_desc;
@@ -48,7 +61,7 @@ static int e820_pmem_probe(struct platform_device *pdev)
 		memset(&ndr_desc, 0, sizeof(ndr_desc));
 		ndr_desc.res = p;
 		ndr_desc.attr_groups = e820_pmem_region_attribute_groups;
-		ndr_desc.numa_node = NUMA_NO_NODE;
+		ndr_desc.numa_node = e820_range_to_nid(p->start);
 		set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
 		if (!nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc))
 			goto err;

drivers/nvdimm/pmem.c

Lines changed: 2 additions & 13 deletions
@@ -105,22 +105,11 @@ static long pmem_direct_access(struct block_device *bdev, sector_t sector,
 {
 	struct pmem_device *pmem = bdev->bd_disk->private_data;
 	resource_size_t offset = sector * 512 + pmem->data_offset;
-	resource_size_t size;
-
-	if (pmem->data_offset) {
-		/*
-		 * Limit the direct_access() size to what is covered by
-		 * the memmap
-		 */
-		size = (pmem->size - offset) & ~ND_PFN_MASK;
-	} else
-		size = pmem->size - offset;
-
-	/* FIXME convert DAX to comprehend that this mapping has a lifetime */
+
 	*kaddr = pmem->virt_addr + offset;
 	*pfn = (pmem->phys_addr + offset) >> PAGE_SHIFT;
 
-	return size;
+	return pmem->size - offset;
 }
 
 static const struct block_device_operations pmem_fops = {

fs/dax.c

Lines changed: 7 additions & 0 deletions
@@ -629,6 +629,13 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
 		if ((length < PMD_SIZE) || (pfn & PG_PMD_COLOUR))
 			goto fallback;
 
+		/*
+		 * TODO: teach vmf_insert_pfn_pmd() to support
+		 * 'pte_special' for pmds
+		 */
+		if (pfn_valid(pfn))
+			goto fallback;
+
 		if (buffer_unwritten(&bh) || buffer_new(&bh)) {
 			int i;
 			for (i = 0; i < PTRS_PER_PMD; i++)

tools/testing/nvdimm/test/nfit.c

Lines changed: 1 addition & 1 deletion
@@ -1135,7 +1135,7 @@ static void nfit_test1_setup(struct nfit_test *t)
 	memdev->interleave_ways = 1;
 	memdev->flags = ACPI_NFIT_MEM_SAVE_FAILED | ACPI_NFIT_MEM_RESTORE_FAILED
 		| ACPI_NFIT_MEM_FLUSH_FAILED | ACPI_NFIT_MEM_HEALTH_OBSERVED
-		| ACPI_NFIT_MEM_ARMED;
+		| ACPI_NFIT_MEM_NOT_ARMED;
 
 	offset += sizeof(*memdev);
 	/* dcr-descriptor0 */
