• KAMEZAWA Hiroyuki's avatar
    memcg: use new logic for page stat accounting · 89c06bd5
    KAMEZAWA Hiroyuki authored
    Now, page-stat-per-memcg is recorded into per page_cgroup flag by
    duplicating page's status into the flag.  The reason is that memcg has a
    feature to move a page from a group to another group and we have race
    between "move" and "page stat accounting",
    Under current logic, assume CPU-A and CPU-B.  CPU-A does "move" and CPU-B
    does "page stat accounting".
    When CPU-A goes 1st,
                CPU-A                           CPU-B
                                        update "struct page" info.
        see pc->flags
        copy page stat to new group
        overwrite pc->mem_cgroup.
                                        set pc->flags
                                        update page stat accounting
    stat accounting is guarded by move_lock_mem_cgroup() and "move" logic
    (CPU-A) doesn't see changes in "struct page" information.
    But it's costly to have the same information both in 'struct page' and
    'struct page_cgroup'.  And, there is a potential problem.
    For example, assume we have PG_dirty accounting in memcg.
    PG_..is a flag for struct page.
    PCG_ is a flag for struct page_cgroup.
    (This is just an example. The same problem can be found in any
     kind of page stat accounting.)
    	  CPU-A                               CPU-B
          TestSet PG_dirty
          (delay)                        TestClear PG_dirty
                                         if (TestClear(PCG_dirty))
          if (TestSet(PCG_dirty))
    Here, memcg->nr_dirty = +1, this is wrong.  This race was reported by Greg
    Thelen <gthelen@google.com>.  Now, only FILE_MAPPED is supported but
    fortunately, it's serialized by page table lock and this is not real bug,
    If this potential problem is caused by having duplicated information in
    struct page and struct page_cgroup, we may be able to fix this by using
    original 'struct page' information.  But we'll have a problem in "move
    Assume we use only PG_dirty.
             CPU-A                   CPU-B
        TestSet PG_dirty
        (delay)                    move_lock_mem_cgroup()
                                   if (PageDirty(page))
                                   pc->mem_cgroup = new_memcg;
        memcg = pc->mem_cgroup
    accounting information may be double-counted.  This was original reason to
    have PCG_xxx flags but it seems PCG_xxx has another problem.
    I think we need a bigger lock as
         update page stats (without any checks)
    This fixes both of problems and we don't have to duplicate page flag into
    page_cgroup.  Please note: move_lock_mem_cgroup() is held only when there
    are possibility of "account move" under the system.  So, in most path,
    status update will go without atomic locks.
    This patch introduces mem_cgroup_begin_update_page_stat() and
    mem_cgroup_end_update_page_stat() both should be called at modifying
    'struct page' information if memcg takes care of it.  as
         modify page information
         => never check any 'struct page' info, just update counters.
    This patch is slow because we need to call begin_update_page_stat()/
    end_update_page_stat() regardless of accounted will be changed or not.  A
    following patch adds an easy optimization and reduces the cost.
    [akpm@linux-foundation.org: s/lock/locked/]
    [hughd@google.com: fix deadlock by avoiding stat lock when anon]
    Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Greg Thelen <gthelen@google.com>
    Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Ying Han <yinghan@google.com>
    Signed-off-by: default avatarHugh Dickins <hughd@google.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
rmap.c 52.6 KB