• Mel Gorman's avatar
    mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns · 37a4094e
    Mel Gorman authored
    Commit 5d190420 ("mremap: fix race between mremap() and page
    cleanning") fixed races between mremap and other operations for both
    file-backed and anonymous mappings.  The file-backed was the most
    critical as it allowed the possibility that data could be changed on a
    physical page after page_mkclean returned which could trigger data loss
    or data integrity issues.
    
    A customer reported that the cost of the TLBs for anonymous regressions
    was excessive and resulting in a 30-50% drop in performance overall
    since this commit on a microbenchmark.  Unfortunately I neither have
    access to the test-case nor can I describe what it does other than
    saying that mremap operations dominate heavily.
    
    This patch removes the LATENCY_LIMIT to handle TLB flushes on a PMD
    boundary instead of every 64 pages to reduce the number of TLB
    shootdowns by a factor of 8 in the ideal case.  LATENCY_LIMIT was almost
    certainly used originally to limit the PTL hold times but the latency
    savings are likely offset by the cost of IPIs in many cases.  This patch
    is not reported to completely restore performance but gets it within an
    acceptable percentage.  The given metric here is simply described as
    "higher is better".
    
    Baseline that was known good
    002:  Metric:       91.05
    004:  Metric:      109.45
    008:  Metric:       73.08
    016:  Metric:       58.14
    032:  Metric:       61.09
    064:  Metric:       57.76
    128:  Metric:       55.43
    
    Current
    001:  Metric:       54.98
    002:  Metric:       56.56
    004:  Metric:       41.22
    008:  Metric:       35.96
    016:  Metric:       36.45
    032:  Metric:       35.71
    064:  Metric:       35.73
    128:  Metric:       34.96
    
    With patch
    001:  Metric:       61.43
    002:  Metric:       81.64
    004:  Metric:       67.92
    008:  Metric:       51.67
    016:  Metric:       50.47
    032:  Metric:       52.29
    064:  Metric:       50.01
    128:  Metric:       49.04
    
    So for low threads, it's not restored but for larger number of threads,
    it's closer to the "known good" baseline.
    
    Using a different mremap-intensive workload that is not representative
    of the real workload there is little difference observed outside of
    noise in the headline metrics However, the TLB shootdowns are reduced by
    11% on average and at the peak, TLB shootdowns were reduced by 21%.
    Interrupts were sampled every second while the workload ran to get those
    figures.  It's known that the figures will vary as the
    non-representative load is non-deterministic.
    
    An alternative patch was posted that should have significantly reduced
    the TLB flushes but unfortunately it does not perform as well as this
    version on the customer test case.  If revisited, the two patches can
    stack on top of each other.
    
    Link: http://lkml.kernel.org/r/20180606183803.k7qaw2xnbvzshv34@techsingularity.netSigned-off-by: 's avatarMel Gorman <mgorman@techsingularity.net>
    Reviewed-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Aaron Lu <aaron.lu@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
    37a4094e