    mm: munlock: fix potential race with THP page split · 01cc2e58
    Vlastimil Babka authored
    Since commit ff6a6da6 ("mm: accelerate munlock() treatment of THP
    pages") munlock skips tail pages of a munlocked THP page.  There is some
    attempt to prevent bad consequences of racing with a THP page split, but
    code inspection indicates that there are two problems that may lead to a
    non-fatal, yet wrong outcome.
    
    First, __split_huge_page_refcount() copies flags, including PageMlocked,
    from the head page to the tail pages.  Clearing PageMlocked by
    munlock_vma_page() in the middle of this operation might leave some of
    the tail pages with the PageMlocked flag set.  As the head page still
    appears to be a THP page until all tail pages are processed,
    munlock_vma_page() might think it munlocked the whole THP page and skip
    all the former tail pages.  Before ff6a6da6, those pages would be
    cleared in further iterations of munlock_vma_pages_range(), but NR_MLOCK
    would still become undercounted (related to the next point).
    
    Second, NR_MLOCK accounting is based on a call to hpage_nr_pages() after
    PageMlocked is cleared.  The accounting might also become inconsistent
    due to a race with __split_huge_page_refcount():
    
    - undercount when HPAGE_PMD_NR is subtracted, but some tail pages are
      left with PageMlocked set and counted again (only possible before
      ff6a6da6)
    
    - overcount when hpage_nr_pages() sees a normal page (split has already
      finished), but the parallel split has meanwhile cleared PageMlocked from
      additional tail pages
    
    This patch prevents both problems by extending the scope of lru_lock in
    munlock_vma_page().  This is convenient because:
    
    - __split_huge_page_refcount() takes lru_lock for its whole operation
    
    - munlock_vma_page() typically takes lru_lock anyway for page isolation
    
    As this makes munlock_vma_page() a second function where page isolation
    is done with lru_lock already held, factor the isolation out into a new
    __munlock_isolate_lru_page() function and clean up the surrounding code.
    
    [akpm@linux-foundation.org: avoid a coding-style ugliness]
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Sasha Levin <sasha.levin@oracle.com>
    Cc: Michel Lespinasse <walken@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>