Commit 493b0e9

Daniel Colascione authored and torvalds committed
mm: add /proc/pid/smaps_rollup
/proc/pid/smaps_rollup is a new proc file that improves the performance of user programs that determine aggregate memory statistics (e.g., total PSS) of a process.

Android regularly "samples" the memory usage of various processes in order to balance its memory pool sizes.  This sampling process involves opening /proc/pid/smaps and summing certain fields.  For very large processes, sampling memory use this way can take several hundred milliseconds, due mostly to the overhead of the seq_printf calls in task_mmu.c.

smaps_rollup improves the situation.  It contains most of the fields of /proc/pid/smaps, but instead of a set of fields for each VMA, smaps_rollup contains one synthetic smaps-format entry representing the whole process.  In the single smaps_rollup synthetic entry, each field is the summation of the corresponding field in all of the real-smaps VMAs.  Using a common format for smaps_rollup and smaps allows userspace to repurpose parsers meant for use with non-rollup smaps, and it allows userspace to switch between smaps_rollup and smaps at runtime (say, based on the availability of smaps_rollup in a given kernel) with minimal fuss.

By using smaps_rollup instead of smaps, a caller can avoid the significant overhead of formatting, reading, and parsing each of a large process's potentially very numerous memory mappings.  For sampling system_server's PSS in Android, we measured a 12x speedup, representing a savings of several hundred milliseconds.

One alternative to a new per-process proc file would have been including PSS information in /proc/pid/status.  We considered this option but thought that PSS would be too expensive (by a few orders of magnitude) to collect relative to what's already emitted as part of /proc/pid/status, and slowing every user of /proc/pid/status for the sake of readers that happen to want PSS feels wrong.
The code itself works by reusing the existing VMA-walking framework we use for regular smaps generation, and keeping the mem_size_stats structure around between VMA walks instead of using a fresh one for each VMA.  In this way, summation happens automatically.  We let seq_file walk over the VMAs just as it does for regular smaps and emit nothing to the seq_file until we hit the last VMA.

Benchmarks:

    using smaps:
    iterations:1000 pid:1163 pss:220023808
    0m29.46s real 0m08.28s user 0m20.98s system

    using smaps_rollup:
    iterations:1000 pid:1163 pss:220702720
    0m04.39s real 0m00.03s user 0m04.31s system

We're using the PSS samples we collect asynchronously for system-management tasks like fine-tuning oom_adj_score, memory-use tracking for debugging, application-level memory-use attribution, and deciding whether we want to kill large processes during system idle maintenance windows.  Android has been using PSS for these purposes for a long time; as the average process VMA count has increased and devices have become more efficiency-conscious, PSS-collection inefficiency has started to matter more.  IMHO, it'd be a lot safer to optimize the existing PSS-collection model, which has been fine-tuned over the years, instead of changing the memory-tracking approach entirely to work around smaps-generation inefficiency.

Tim said:

: There are two main reasons why Android gathers PSS information:
:
: 1. Android devices can show the user the amount of memory used per
: application via the settings app.  This is a less important use case.
:
: 2. We log PSS to help identify leaks in applications.  We have found
: an enormous number of bugs (in the Android platform, in Google's own
: apps, and in third-party applications) using this data.
:
: To do this, system_server (the main process in Android userspace) will
: sample the PSS of a process three seconds after it changes state (for
: example, app is launched and becomes the foreground application) and about
: every ten minutes after that.
: The net result is that PSS collection is regularly running on at least
: one process in the system (usually a few times a minute while the screen
: is on, less when the screen is off due to suspend).  PSS of a process is
: an incredibly useful stat to track, and we aren't going to get rid of it.
: We've looked at some very hacky approaches using RSS ("take the RSS of
: the target process, subtract the RSS of the zygote process that is the
: parent of all Android apps") to reduce the accounting time, but it
: regularly overestimated the memory used by 20+ percent.  Accordingly, I
: don't think that there's a good alternative to using PSS.
:
: We started looking into PSS collection performance after we noticed
: random frequency spikes while a phone's screen was off; occasionally, one
: of the CPU clusters would ramp to a high frequency because there was
: 200-300ms of constant CPU work from a single thread in the main Android
: userspace process.  The work causing the spike (which is reasonable
: governor behavior given the amount of CPU time needed) was always PSS
: collection.  As a result, Android is burning more power than we should
: be on PSS collection.
:
: The other issue (and why I'm less sure about improving smaps as a
: long-term solution) is that the number of VMAs per process has increased
: significantly from release to release.  After trying to figure out why
: we were seeing these 200-300ms PSS collection times on Android O but had
: not noticed it in previous versions, we found that the number of VMAs in
: the main system process increased by 50% from Android N to Android O
: (from ~1800 to ~2700), with varying increases in every userspace process.
: Android M to N also had an increase in the number of VMAs, although not
: as much.  I'm not sure why this is increasing so much over time, but
: thinking about ASLR and ways to make ASLR better, I expect that this
: will continue to increase going forward.
: I would not be surprised if we hit 5000 VMAs on the main Android process
: (system_server) by 2020.
:
: If we assume that the number of VMAs is going to increase over time, then
: doing anything we can to reduce the overhead of each VMA during PSS
: collection seems like the right way to go, and that means outputting an
: aggregate statistic (to avoid whatever overhead there is per line in
: writing smaps and in reading each line from userspace).

Link: http://lkml.kernel.org/r/20170812022148.178293-1-dancol@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent c79b57e commit 493b0e9

File tree

4 files changed: +170 additions, -62 deletions
Documentation/ABI/testing/procfs-smaps_rollup

Lines changed: 31 additions & 0 deletions

@@ -0,0 +1,31 @@
+What:		/proc/pid/smaps_rollup
+Date:		August 2017
+Contact:	Daniel Colascione <dancol@google.com>
+Description:
+		This file provides pre-summed memory information for a
+		process.  The format is identical to /proc/pid/smaps,
+		except instead of an entry for each VMA in a process,
+		smaps_rollup has a single entry (tagged "[rollup]")
+		for which each field is the sum of the corresponding
+		fields from all the maps in /proc/pid/smaps.
+		For more details, see the procfs man page.
+
+		Typical output looks like this:
+
+		00100000-ff709000 ---p 00000000 00:00 0    [rollup]
+		Rss:                 884 kB
+		Pss:                 385 kB
+		Shared_Clean:        696 kB
+		Shared_Dirty:          0 kB
+		Private_Clean:       120 kB
+		Private_Dirty:        68 kB
+		Referenced:          884 kB
+		Anonymous:            68 kB
+		LazyFree:              0 kB
+		AnonHugePages:         0 kB
+		ShmemPmdMapped:        0 kB
+		Shared_Hugetlb:        0 kB
+		Private_Hugetlb:       0 kB
+		Swap:                  0 kB
+		SwapPss:               0 kB
+		Locked:              385 kB
fs/proc/base.c

Lines changed: 2 additions & 0 deletions

@@ -2931,6 +2931,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_PROC_PAGE_MONITOR
 	REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
 	REG("smaps", S_IRUGO, proc_pid_smaps_operations),
+	REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations),
 	REG("pagemap", S_IRUSR, proc_pagemap_operations),
 #endif
 #ifdef CONFIG_SECURITY
@@ -3324,6 +3325,7 @@ static const struct pid_entry tid_base_stuff[] = {
 #ifdef CONFIG_PROC_PAGE_MONITOR
 	REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
 	REG("smaps", S_IRUGO, proc_tid_smaps_operations),
+	REG("smaps_rollup", S_IRUGO, proc_pid_smaps_rollup_operations),
 	REG("pagemap", S_IRUSR, proc_pagemap_operations),
 #endif
 #ifdef CONFIG_SECURITY

fs/proc/internal.h

Lines changed: 3 additions & 0 deletions

@@ -269,10 +269,12 @@ extern int proc_remount(struct super_block *, int *, char *);
 /*
  * task_[no]mmu.c
  */
+struct mem_size_stats;
 struct proc_maps_private {
 	struct inode *inode;
 	struct task_struct *task;
 	struct mm_struct *mm;
+	struct mem_size_stats *rollup;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
 #endif
@@ -288,6 +290,7 @@ extern const struct file_operations proc_tid_maps_operations;
 extern const struct file_operations proc_pid_numa_maps_operations;
 extern const struct file_operations proc_tid_numa_maps_operations;
 extern const struct file_operations proc_pid_smaps_operations;
+extern const struct file_operations proc_pid_smaps_rollup_operations;
 extern const struct file_operations proc_tid_smaps_operations;
 extern const struct file_operations proc_clear_refs_operations;
 extern const struct file_operations proc_pagemap_operations;

fs/proc/task_mmu.c

Lines changed: 134 additions & 62 deletions

@@ -253,6 +253,7 @@ static int proc_map_release(struct inode *inode, struct file *file)
 	if (priv->mm)
 		mmdrop(priv->mm);
 
+	kfree(priv->rollup);
 	return seq_release_private(inode, file);
 }
 
@@ -279,6 +280,23 @@ static int is_stack(struct proc_maps_private *priv,
 		vma->vm_end >= vma->vm_mm->start_stack;
 }
 
+static void show_vma_header_prefix(struct seq_file *m,
+				   unsigned long start, unsigned long end,
+				   vm_flags_t flags, unsigned long long pgoff,
+				   dev_t dev, unsigned long ino)
+{
+	seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
+	seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
+		   start,
+		   end,
+		   flags & VM_READ ? 'r' : '-',
+		   flags & VM_WRITE ? 'w' : '-',
+		   flags & VM_EXEC ? 'x' : '-',
+		   flags & VM_MAYSHARE ? 's' : 'p',
+		   pgoff,
+		   MAJOR(dev), MINOR(dev), ino);
+}
+
 static void
 show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 {
@@ -301,17 +319,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 
 	start = vma->vm_start;
 	end = vma->vm_end;
-
-	seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
-	seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
-		   start,
-		   end,
-		   flags & VM_READ ? 'r' : '-',
-		   flags & VM_WRITE ? 'w' : '-',
-		   flags & VM_EXEC ? 'x' : '-',
-		   flags & VM_MAYSHARE ? 's' : 'p',
-		   pgoff,
-		   MAJOR(dev), MINOR(dev), ino);
+	show_vma_header_prefix(m, start, end, flags, pgoff, dev, ino);
 
 	/*
 	 * Print the dentry name for named mappings, and a
@@ -430,6 +438,7 @@ const struct file_operations proc_tid_maps_operations = {
 
 #ifdef CONFIG_PROC_PAGE_MONITOR
 struct mem_size_stats {
+	bool first;
 	unsigned long resident;
 	unsigned long shared_clean;
 	unsigned long shared_dirty;
@@ -443,7 +452,9 @@ struct mem_size_stats {
 	unsigned long swap;
 	unsigned long shared_hugetlb;
 	unsigned long private_hugetlb;
+	unsigned long first_vma_start;
 	u64 pss;
+	u64 pss_locked;
 	u64 swap_pss;
 	bool check_shmem_swap;
 };
@@ -719,18 +730,36 @@ void __weak arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
 
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
+	struct proc_maps_private *priv = m->private;
 	struct vm_area_struct *vma = v;
-	struct mem_size_stats mss;
+	struct mem_size_stats mss_stack;
+	struct mem_size_stats *mss;
 	struct mm_walk smaps_walk = {
 		.pmd_entry = smaps_pte_range,
 #ifdef CONFIG_HUGETLB_PAGE
 		.hugetlb_entry = smaps_hugetlb_range,
 #endif
 		.mm = vma->vm_mm,
-		.private = &mss,
 	};
+	int ret = 0;
+	bool rollup_mode;
+	bool last_vma;
+
+	if (priv->rollup) {
+		rollup_mode = true;
+		mss = priv->rollup;
+		if (mss->first) {
+			mss->first_vma_start = vma->vm_start;
+			mss->first = false;
+		}
+		last_vma = !m_next_vma(priv, vma);
+	} else {
+		rollup_mode = false;
+		memset(&mss_stack, 0, sizeof(mss_stack));
+		mss = &mss_stack;
+	}
 
-	memset(&mss, 0, sizeof mss);
+	smaps_walk.private = mss;
 
 #ifdef CONFIG_SHMEM
 	if (vma->vm_file && shmem_mapping(vma->vm_file->f_mapping)) {
@@ -748,64 +777,81 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 
 		if (!shmem_swapped || (vma->vm_flags & VM_SHARED) ||
 		    !(vma->vm_flags & VM_WRITE)) {
-			mss.swap = shmem_swapped;
+			mss->swap = shmem_swapped;
 		} else {
-			mss.check_shmem_swap = true;
+			mss->check_shmem_swap = true;
 			smaps_walk.pte_hole = smaps_pte_hole;
 		}
 	}
 #endif
 
 	/* mmap_sem is held in m_start */
 	walk_page_vma(vma, &smaps_walk);
+	if (vma->vm_flags & VM_LOCKED)
+		mss->pss_locked += mss->pss;
+
+	if (!rollup_mode) {
+		show_map_vma(m, vma, is_pid);
+	} else if (last_vma) {
+		show_vma_header_prefix(
+			m, mss->first_vma_start, vma->vm_end, 0, 0, 0, 0);
+		seq_pad(m, ' ');
+		seq_puts(m, "[rollup]\n");
+	} else {
+		ret = SEQ_SKIP;
+	}
 
-	show_map_vma(m, vma, is_pid);
-
-	seq_printf(m,
-		   "Size:           %8lu kB\n"
-		   "Rss:            %8lu kB\n"
-		   "Pss:            %8lu kB\n"
-		   "Shared_Clean:   %8lu kB\n"
-		   "Shared_Dirty:   %8lu kB\n"
-		   "Private_Clean:  %8lu kB\n"
-		   "Private_Dirty:  %8lu kB\n"
-		   "Referenced:     %8lu kB\n"
-		   "Anonymous:      %8lu kB\n"
-		   "LazyFree:       %8lu kB\n"
-		   "AnonHugePages:  %8lu kB\n"
-		   "ShmemPmdMapped: %8lu kB\n"
-		   "Shared_Hugetlb: %8lu kB\n"
-		   "Private_Hugetlb: %7lu kB\n"
-		   "Swap:           %8lu kB\n"
-		   "SwapPss:        %8lu kB\n"
-		   "KernelPageSize: %8lu kB\n"
-		   "MMUPageSize:    %8lu kB\n"
-		   "Locked:         %8lu kB\n",
-		   (vma->vm_end - vma->vm_start) >> 10,
-		   mss.resident >> 10,
-		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
-		   mss.shared_clean >> 10,
-		   mss.shared_dirty >> 10,
-		   mss.private_clean >> 10,
-		   mss.private_dirty >> 10,
-		   mss.referenced >> 10,
-		   mss.anonymous >> 10,
-		   mss.lazyfree >> 10,
-		   mss.anonymous_thp >> 10,
-		   mss.shmem_thp >> 10,
-		   mss.shared_hugetlb >> 10,
-		   mss.private_hugetlb >> 10,
-		   mss.swap >> 10,
-		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
-		   vma_kernel_pagesize(vma) >> 10,
-		   vma_mmu_pagesize(vma) >> 10,
-		   (vma->vm_flags & VM_LOCKED) ?
-			(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
-
-	arch_show_smap(m, vma);
-	show_smap_vma_flags(m, vma);
+	if (!rollup_mode)
+		seq_printf(m,
+			   "Size:           %8lu kB\n"
+			   "KernelPageSize: %8lu kB\n"
+			   "MMUPageSize:    %8lu kB\n",
+			   (vma->vm_end - vma->vm_start) >> 10,
+			   vma_kernel_pagesize(vma) >> 10,
+			   vma_mmu_pagesize(vma) >> 10);
+
+
+	if (!rollup_mode || last_vma)
+		seq_printf(m,
+			   "Rss:            %8lu kB\n"
+			   "Pss:            %8lu kB\n"
+			   "Shared_Clean:   %8lu kB\n"
+			   "Shared_Dirty:   %8lu kB\n"
+			   "Private_Clean:  %8lu kB\n"
+			   "Private_Dirty:  %8lu kB\n"
+			   "Referenced:     %8lu kB\n"
+			   "Anonymous:      %8lu kB\n"
+			   "LazyFree:       %8lu kB\n"
+			   "AnonHugePages:  %8lu kB\n"
+			   "ShmemPmdMapped: %8lu kB\n"
+			   "Shared_Hugetlb: %8lu kB\n"
+			   "Private_Hugetlb: %7lu kB\n"
+			   "Swap:           %8lu kB\n"
+			   "SwapPss:        %8lu kB\n"
+			   "Locked:         %8lu kB\n",
+			   mss->resident >> 10,
+			   (unsigned long)(mss->pss >> (10 + PSS_SHIFT)),
+			   mss->shared_clean >> 10,
+			   mss->shared_dirty >> 10,
+			   mss->private_clean >> 10,
+			   mss->private_dirty >> 10,
+			   mss->referenced >> 10,
+			   mss->anonymous >> 10,
+			   mss->lazyfree >> 10,
+			   mss->anonymous_thp >> 10,
+			   mss->shmem_thp >> 10,
+			   mss->shared_hugetlb >> 10,
+			   mss->private_hugetlb >> 10,
+			   mss->swap >> 10,
+			   (unsigned long)(mss->swap_pss >> (10 + PSS_SHIFT)),
+			   (unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
+
+	if (!rollup_mode) {
+		arch_show_smap(m, vma);
+		show_smap_vma_flags(m, vma);
+	}
 	m_cache_vma(m, vma);
-	return 0;
+	return ret;
 }
 
 static int show_pid_smap(struct seq_file *m, void *v)
@@ -837,6 +883,25 @@ static int pid_smaps_open(struct inode *inode, struct file *file)
 	return do_maps_open(inode, file, &proc_pid_smaps_op);
 }
 
+static int pid_smaps_rollup_open(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq;
+	struct proc_maps_private *priv;
+	int ret = do_maps_open(inode, file, &proc_pid_smaps_op);
+
+	if (ret < 0)
+		return ret;
+	seq = file->private_data;
+	priv = seq->private;
+	priv->rollup = kzalloc(sizeof(*priv->rollup), GFP_KERNEL);
+	if (!priv->rollup) {
+		proc_map_release(inode, file);
+		return -ENOMEM;
+	}
+	priv->rollup->first = true;
+	return 0;
+}
+
 static int tid_smaps_open(struct inode *inode, struct file *file)
 {
 	return do_maps_open(inode, file, &proc_tid_smaps_op);
@@ -849,6 +914,13 @@ const struct file_operations proc_pid_smaps_operations = {
 	.release	= proc_map_release,
 };
 
+const struct file_operations proc_pid_smaps_rollup_operations = {
+	.open		= pid_smaps_rollup_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= proc_map_release,
+};
+
 const struct file_operations proc_tid_smaps_operations = {
 	.open		= tid_smaps_open,
 	.read		= seq_read,