    mm: munlock: remove unnecessary call to lru_add_drain() · 586a32ac
    Vlastimil Babka authored
    In munlock_vma_range(), lru_add_drain() is currently called in a loop
    before each munlock_vma_page() call.
    
    This is suboptimal for performance when munlocking many pages.  The
    benefit of the per-cpu pagevec for batching the LRU putback is lost,
    since the pagevec holds at most one page from the previous loop
    iteration.
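
    The effect on batching can be illustrated with a simplified userspace
    model (not kernel code): a `struct pagevec` here is just a counter, the
    capacity and page count are illustrative, and `pagevec_flush()` stands
    in for the LRU putback that takes the LRU lock.  Draining before every
    page caps the batch at one page per flush; letting the pagevec fill
    amortizes the flush over its whole capacity.

    ```c
    #include <stdio.h>

    /* Illustrative capacity of the per-cpu batch; the real pagevec of
     * that era held a similar small number of pages. */
    #define PAGEVEC_SIZE 14

    struct pagevec { int nr; };

    static int flushes; /* counts LRU putback rounds (lock round-trips) */

    static void pagevec_flush(struct pagevec *pv)
    {
        if (pv->nr) {
            flushes++;      /* one LRU lock round-trip per flush */
            pv->nr = 0;
        }
    }

    static void pagevec_add(struct pagevec *pv, int page)
    {
        (void)page;
        if (++pv->nr == PAGEVEC_SIZE)
            pagevec_flush(pv);  /* batch full: put back in one go */
    }

    /* Returns how many flushes it takes to putback npages pages,
     * optionally draining before every page as the old loop did. */
    static int count_flushes(int npages, int drain_each_iteration)
    {
        struct pagevec pv = { 0 };

        flushes = 0;
        for (int i = 0; i < npages; i++) {
            if (drain_each_iteration)
                pagevec_flush(&pv); /* models lru_add_drain() in the loop */
            pagevec_add(&pv, i);
        }
        pagevec_flush(&pv);         /* drain the remainder */
        return flushes;
    }

    int main(void)
    {
        printf("drain each iteration: %d flushes\n", count_flushes(1000, 1));
        printf("batched:              %d flushes\n", count_flushes(1000, 0));
        return 0;
    }
    ```

    With a drain before every page, 1000 pages cost 1000 one-page flushes;
    batched, the same workload needs only ceil(1000/14) = 72 flushes.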
    
    The lru_add_drain() call also does not serve any correctness purpose
    - it does not even drain the pagevecs of all CPUs.  The munlock code
    already expects and handles situations where a page cannot be isolated
    from the LRU (e.g.  because it is on some per-cpu pagevec).
    
    The history of the (uncommented) call also suggests that it appears
    there as an oversight rather than by intent.  Before commit ff6a6da6
    ("mm: accelerate munlock() treatment of THP pages") the call happened
    only once, upon entering the function.  That commit moved the call into
    the while loop.  So while its other changes improved munlock
    performance for THP pages, it introduced the abovementioned suboptimal
    per-cpu pagevec usage.
    
    Further back in history, before commit 408e82b7 ("mm: munlock use
    follow_page"), munlock_vma_pages_range() was just a wrapper around
    __mlock_vma_pages_range(), which performed both mlock and munlock
    depending on a flag.  However, before commit ba470de4 ("mmap: handle
    mlocked pages during map, remap, unmap") the function handled only
    mlock, not munlock.  The lru_add_drain() call thus comes from the
    implementation in commit b291f000 ("mlock: mlocked pages are
    unevictable") and was intended only for mlocking, not munlocking.  The
    original intention of draining the LRU pagevec at mlock time was to
    ensure the pages were on the LRU before the lock operation, so that
    they could be placed on the unevictable list immediately.  There is
    very little motivation to do the same in the munlock path, particularly
    for every single page.
    
    This patch therefore removes the call completely.  After removing the
    call, a 10% speedup was measured for munlock() of a 56GB memory area
    with THP disabled.
    
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Jörn Engel <joern@logfs.org>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Cc: Michel Lespinasse <walken@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@suse.cz>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>