Commit 8c7c6e3

hkamezawa authored and torvalds committed
memcg: mem+swap controller core
This patch implements a per-cgroup limit for usage of memory+swap. SwapCache is handled so that double counting of a swap-cache page and its swap entry is avoided.

Mem+Swap controller works as follows.
 - memory usage is limited by memory.limit_in_bytes.
 - memory + swap usage is limited by memory.memsw.limit_in_bytes.

This has the following benefits.
 - A user can limit the total resource usage of mem+swap. Without this, because the memory resource controller doesn't take care of swap usage, a process can exhaust all of swap (by a memory leak). We can avoid that case.

   Also, swap is a shared resource, but it cannot be reclaimed (it does not go back to memory) until it is used. This characteristic can be trouble when memory is divided into parts by cpuset or memcg. Assume groups A and B. After some applications execute, the system can end up as:

   Group A -- very large free memory space, but occupies 99% of swap.
   Group B -- under memory shortage, but cannot use swap... it's nearly full.

   The ability to set an appropriate swap limit for each group is required.

Maybe someone wonders: "why mem+swap rather than just swap?"
 - The global LRU (kswapd) can swap out arbitrary pages. Swap-out means moving an account from memory to swap... there is no change in the usage of mem+swap. In other words, when we want to limit the usage of swap without affecting the global LRU, a mem+swap limit is better than just a swap limit.

Accounting target information is stored in swap_cgroup, a per-swap-entry record.

Charging is done as follows.
 map
  - charge page and memsw.
 unmap
  - uncharge page/memsw if not SwapCache.
 swap-out (__delete_from_swap_cache)
  - uncharge page.
  - record mem_cgroup information to swap_cgroup.
 swap-in (do_swap_page)
  - charged as page and memsw; the record in swap_cgroup is cleared and the memsw accounting is decremented.
 swap-free (swap_free())
  - if the swap entry is freed, memsw is uncharged by PAGE_SIZE.

Some people work in never-swap environments and consider swap something bad; for them, this mem+swap controller extension is just overhead. That overhead can be avoided by a config or boot option. (See Kconfig; details are not in this patch.)

TODO:
 - Maybe more optimization can be done in the swap-in path (but it is not very safe). We just do simple accounting at this stage.

[nishimura@mxp.nes.nec.co.jp: make resize limit hold mutex]
[hugh@veritas.com: memswap controller core swapcache fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 27a7faa commit 8c7c6e3
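
The five transitions above can be made concrete with a small standalone model. This is an illustrative sketch, not the kernel code: the model_* names and the flat owner array are invented stand-ins for swap_cgroup and the res_counter machinery in mm/memcontrol.c, which also handles locking and SwapCache corner cases.

/*
 * Illustrative model of the mem+swap transitions described above.
 * NOT the kernel code: model_* and the flat owner array are invented
 * stand-ins for swap_cgroup and the real accounting in mm/memcontrol.c.
 */
#include <stdio.h>

#define PAGE_SIZE 4096
#define NR_SWAP_ENTRIES 8

struct model_memcg {
	long mem;	/* plays memory.usage_in_bytes */
	long memsw;	/* plays memory.memsw.usage_in_bytes */
};

/* per-swap-entry owner record, playing the role of swap_cgroup */
static struct model_memcg *swap_record[NR_SWAP_ENTRIES];

static void model_map(struct model_memcg *cg)
{
	cg->mem += PAGE_SIZE;		/* map: charge page and memsw */
	cg->memsw += PAGE_SIZE;
}

static void model_swap_out(struct model_memcg *cg, int ent)
{
	cg->mem -= PAGE_SIZE;		/* page leaves memory... */
	swap_record[ent] = cg;		/* ...account moves to the swap entry;
					 * memsw untouched: mem+swap unchanged */
}

static void model_swap_in(struct model_memcg *cg, int ent)
{
	cg->mem += PAGE_SIZE;		/* charged as page and memsw */
	cg->memsw += PAGE_SIZE;
	swap_record[ent] = NULL;	/* record cleared... */
	cg->memsw -= PAGE_SIZE;		/* ...memsw decremented: no double count */
}

static void model_swap_free(int ent)
{
	struct model_memcg *cg = swap_record[ent];

	if (cg) {			/* slot freed while still owned */
		swap_record[ent] = NULL;
		cg->memsw -= PAGE_SIZE;	/* give back PAGE_SIZE of mem+swap */
	}
}

int main(void)
{
	struct model_memcg a = { 0, 0 };

	model_map(&a);		/* mem=4096 memsw=4096 */
	model_swap_out(&a, 0);	/* mem=0    memsw=4096 (swap still held) */
	model_swap_in(&a, 0);	/* mem=4096 memsw=4096 */
	model_swap_free(0);	/* no-op: record already cleared at swap-in */
	printf("mem=%ld memsw=%ld\n", a.mem, a.memsw);
	return 0;
}

Note the invariant the model preserves: swap-out and swap-in each leave memsw unchanged, so only freeing the swap entry ever returns mem+swap quota.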

File tree

8 files changed: +440 -54 lines changed


Documentation/controllers/memory.txt

Lines changed: 27 additions & 2 deletions
@@ -137,12 +137,32 @@ behind this approach is that a cgroup that aggressively uses a shared
 page will eventually get charged for it (once it is uncharged from
 the cgroup that brought it in -- this will happen on memory pressure).
 
-Exception: When you do swapoff and make swapped-out pages of shmem(tmpfs) to
+Exception: If CONFIG_CGROUP_MEM_RES_CTLR_SWAP is not used..
+When you do swapoff and make swapped-out pages of shmem(tmpfs) to
 be backed into memory in force, charges for pages are accounted against the
 caller of swapoff rather than the users of shmem.
 
-2.4 Reclaim
+2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
+Swap Extension allows you to record charge for swap. A swapped-in page is
+charged back to original page allocator if possible.
+
+When swap is accounted, following files are added.
+ - memory.memsw.usage_in_bytes.
+ - memory.memsw.limit_in_bytes.
+
+usage of mem+swap is limited by memsw.limit_in_bytes.
+
+Note: why 'mem+swap' rather than swap.
+The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
+to move account from memory to swap...there is no change in usage of
+mem+swap.
+
+In other words, when we want to limit the usage of swap without affecting
+global LRU, mem+swap limit is better than just limiting swap from OS point
+of view.
+
+2.5 Reclaim
 
 Each cgroup maintains a per cgroup LRU that consists of an active
 and inactive list. When a cgroup goes over its limit, we first try

@@ -246,6 +266,11 @@ Such charges are freed(at default) or moved to its parent. When moved,
 both of RSS and CACHES are moved to parent.
 If both of them are busy, rmdir() returns -EBUSY. See 5.1 Also.
 
+Charges recorded in swap information is not updated at removal of cgroup.
+Recorded information is discarded and a cgroup which uses swap (swapcache)
+will be charged as a new owner of it.
+
+
 5. Misc. interfaces.
 
 5.1 force_empty
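
Taken together, section 2.4 above says every charge must fit under two bounds at once. Below is a hedged sketch of that ordering; counter_charge(), try_charge() and reclaim() are invented names for this example (the real loop lives in mm/memcontrol.c, which this page does not show, and the reclaim() stand-in plays the role of try_to_free_mem_cgroup_pages() with its new noswap flag). The key point: when the memsw counter is the one that fails, swapping pages out cannot help, since swap-out leaves mem+swap usage unchanged.

/*
 * Sketch of the double-bounded charge implied by 2.4 above. The names
 * here are illustrative, not the kernel API; reclaim is stubbed out.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE 4096

struct counter { long usage, limit; };

static bool counter_charge(struct counter *c, long sz)
{
	if (c->usage + sz > c->limit)
		return false;
	c->usage += sz;
	return true;
}

/* stand-in for try_to_free_mem_cgroup_pages(mem, gfp_mask, noswap) */
static bool reclaim(bool noswap)
{
	/* noswap == true: swapping out cannot help, because moving a
	 * page to swap leaves mem+swap usage unchanged. */
	(void)noswap;
	return false;	/* stub: pretend nothing could be reclaimed */
}

static bool try_charge(struct counter *res, struct counter *memsw)
{
	for (int retries = 3; retries > 0; retries--) {
		bool noswap = false;

		if (counter_charge(res, PAGE_SIZE)) {
			if (counter_charge(memsw, PAGE_SIZE))
				return true;	 /* fits under both limits */
			res->usage -= PAGE_SIZE; /* back out the mem charge */
			noswap = true;		 /* memsw is the tight bound */
		}
		if (!reclaim(noswap))
			break;
	}
	return false;
}

int main(void)
{
	struct counter res   = { 0,         2 * PAGE_SIZE };
	struct counter memsw = { PAGE_SIZE, 2 * PAGE_SIZE };

	printf("first charge:  %s\n", try_charge(&res, &memsw) ? "ok" : "fail");
	printf("second charge: %s\n", try_charge(&res, &memsw) ? "ok" : "fail");
	return 0;
}

Running this, the first charge succeeds and the second fails against the memsw bound, even though the plain memory limit still has room.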

include/linux/memcontrol.h

Lines changed: 9 additions & 2 deletions
@@ -32,6 +32,8 @@ extern int mem_cgroup_newpage_charge(struct page *page, struct mm_struct *mm,
 /* for swap handling */
 extern int mem_cgroup_try_charge(struct mm_struct *mm,
 		gfp_t gfp_mask, struct mem_cgroup **ptr);
+extern int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
+		struct page *page, gfp_t mask, struct mem_cgroup **ptr);
 extern void mem_cgroup_commit_charge_swapin(struct page *page,
 	struct mem_cgroup *ptr);
 extern void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *ptr);

@@ -80,7 +82,6 @@ extern long mem_cgroup_calc_reclaim(struct mem_cgroup *mem, struct zone *zone,
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 extern int do_swap_account;
 #endif
-
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
@@ -97,7 +98,13 @@ static inline int mem_cgroup_cache_charge(struct page *page,
 }
 
 static inline int mem_cgroup_try_charge(struct mm_struct *mm,
-			gfp_t gfp_mask, struct mem_cgroup **ptr)
+			gfp_t gfp_mask, struct mem_cgroup **ptr)
+{
+	return 0;
+}
+
+static inline int mem_cgroup_try_charge_swapin(struct mm_struct *mm,
+		struct page *page, gfp_t gfp_mask, struct mem_cgroup **ptr)
 {
 	return 0;
 }
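
The newly declared mem_cgroup_try_charge_swapin() pairs with the existing commit/cancel declarations to form a two-phase protocol for the swap-in path. The sketch below shows the expected caller shape; it is condensed and illustrative, not the literal do_swap_page() from mm/memory.c, and map_page_somehow() is a hypothetical stand-in for the pte setup between the two phases.

/*
 * Condensed, illustrative shape of the swap-in charge protocol.
 * map_page_somehow() is hypothetical; only the three mem_cgroup_*
 * calls are the API declared in this patch.
 */
#include <linux/memcontrol.h>
#include <linux/gfp.h>
#include <linux/errno.h>

static int map_page_somehow(struct page *page);	/* hypothetical helper */

static int swapin_charge_sketch(struct mm_struct *mm, struct page *page)
{
	struct mem_cgroup *ptr = NULL;

	/* Phase 1: reserve the charge (page + memsw) up front. */
	if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr))
		return -ENOMEM;

	if (map_page_somehow(page)) {
		/* Failure path: hand the reservation back. */
		mem_cgroup_cancel_charge_swapin(ptr);
		return -EFAULT;
	}

	/*
	 * Success: bind the charge to the page. Per the commit message,
	 * this is also where the swap_cgroup record is cleared and the
	 * duplicate memsw charge dropped.
	 */
	mem_cgroup_commit_charge_swapin(page, ptr);
	return 0;
}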

include/linux/swap.h

Lines changed: 11 additions & 3 deletions
@@ -214,7 +214,7 @@ static inline void lru_cache_add_active_file(struct page *page)
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem,
-						gfp_t gfp_mask);
+						gfp_t gfp_mask, bool noswap);
 extern int __isolate_lru_page(struct page *page, int mode, int file);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
 extern int vm_swappiness;

@@ -336,15 +336,23 @@ static inline void disable_swap_token(void)
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 extern int mem_cgroup_cache_charge_swapin(struct page *page,
 			struct mm_struct *mm, gfp_t mask, bool locked);
-extern void mem_cgroup_uncharge_swapcache(struct page *page);
+extern void mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent);
 #else
 static inline
 int mem_cgroup_cache_charge_swapin(struct page *page,
 			struct mm_struct *mm, gfp_t mask, bool locked)
 {
 	return 0;
 }
-static inline void mem_cgroup_uncharge_swapcache(struct page *page)
+static inline void
+mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
+{
+}
+#endif
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
+extern void mem_cgroup_uncharge_swap(swp_entry_t ent);
+#else
+static inline void mem_cgroup_uncharge_swap(swp_entry_t ent)
 {
 }
 #endif
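
The extra swp_entry_t argument to mem_cgroup_uncharge_swapcache() follows from the flow in the commit message: at swap-out the ownership has to be parked on the swap entry itself, so the function must know which entry that is, and mem_cgroup_uncharge_swap() is the counterpart for when the entry dies. A condensed, illustrative sketch of the two call sites follows; the *_sketch names are invented, and per the commit message the real hooks are __delete_from_swap_cache() and swap_free().

/* Illustrative (not literal) call sites for the declarations above. */
#include <linux/swap.h>

/* swap-out: the page leaves the swap cache, so the page charge is
 * dropped, and `ent` identifies where the owning mem_cgroup gets
 * recorded in swap_cgroup. */
static void delete_from_swap_cache_sketch(struct page *page, swp_entry_t ent)
{
	mem_cgroup_uncharge_swapcache(page, ent);
}

/* swap slot finally freed: the recorded owner gives back PAGE_SIZE of
 * memsw (a no-op unless CONFIG_CGROUP_MEM_RES_CTLR_SWAP is enabled). */
static void swap_free_sketch(swp_entry_t ent)
{
	mem_cgroup_uncharge_swap(ent);
}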
