    epoll: use rwlock in order to reduce ep_poll_callback() contention · fe1743ef
    Roman Penyaev authored
    The goal of this patch is to reduce contention of ep_poll_callback() which
    can be called concurrently from different CPUs in case of high event
    rates and many fds per epoll.  The problem can be reliably reproduced by
    generating events (writes to a pipe or eventfd) from many threads while
    a consumer thread does the polling.  In other words, this patch increases
    the bandwidth of events which can be delivered from sources to the poller
    by adding poll items to the list in a lockless way.
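
The reproduction described above can be sketched in userspace roughly as follows. This is a minimal illustration, not the stress-epoll.c test referenced in [1]: the names `stress_epoll`, `produce`, `struct producer` and `MAX_PRODUCERS` are hypothetical, and it assumes a Linux system with eventfd and epoll available. Each producer thread writes to its own eventfd while a single consumer polls all of them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

enum { MAX_PRODUCERS = 64 };      /* hypothetical limit for this sketch */

struct producer {
    int efd;                      /* eventfd this thread signals */
    uint64_t events;              /* number of writes to perform */
};

/* Producer: every write() adds 1 to the eventfd counter and wakes the poller. */
static void *produce(void *arg)
{
    struct producer *p = arg;
    const uint64_t one = 1;

    for (uint64_t i = 0; i < p->events; i++) {
        ssize_t n = write(p->efd, &one, sizeof(one));
        assert(n == (ssize_t)sizeof(one));
    }
    return NULL;
}

/* Consumer: polls all eventfds until every generated event has been fetched. */
static uint64_t stress_epoll(int nthreads, uint64_t events_each)
{
    struct producer prod[MAX_PRODUCERS];
    pthread_t tid[MAX_PRODUCERS];
    struct epoll_event ready[MAX_PRODUCERS];
    uint64_t total = 0;
    int epfd = epoll_create1(0);

    assert(epfd >= 0 && nthreads > 0 && nthreads <= MAX_PRODUCERS);
    for (int i = 0; i < nthreads; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = (uint32_t)i };
        int rc;

        prod[i].efd = eventfd(0, EFD_NONBLOCK);
        prod[i].events = events_each;
        assert(prod[i].efd >= 0);
        rc = epoll_ctl(epfd, EPOLL_CTL_ADD, prod[i].efd, &ev);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, produce, &prod[i]);
        assert(rc == 0);
    }

    while (total < (uint64_t)nthreads * events_each) {
        int n = epoll_wait(epfd, ready, MAX_PRODUCERS, -1);

        for (int j = 0; j < n; j++) {
            uint64_t value;
            int fd = prod[ready[j].data.u32].efd;

            /* Reading returns the accumulated counter and resets it to zero. */
            if (read(fd, &value, sizeof(value)) == (ssize_t)sizeof(value))
                total += value;
        }
    }

    for (int i = 0; i < nthreads; i++) {
        pthread_join(tid[i], NULL);
        close(prod[i].efd);
    }
    close(epfd);
    return total;
}
```

Calling `stress_epoll(8, 100000)` would mimic one row of such a test at small scale. Timing the call is left out here, since the figures depend on the machine and kernel.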
    
    The main change is the replacement of the spinlock with a rwlock, which is
    taken on read in ep_poll_callback(); poll items are then added to the
    tail of the list using the xchg atomic instruction.  The write lock is
    taken everywhere else in order to stop list modifications and guarantee
    that list updates are fully completed (I assume that the write side of a
    rwlock does not starve; the qrwlock implementation seems to have these
    guarantees).
    
    The following are some microbenchmark results based on the test [1], which
    starts threads that each generate N events.  The test ends when all
    events are successfully fetched by the poller thread:
    
     spinlock
     ========
    
     threads  events/ms  run-time ms
           8       6402        12495
          16       7045        22709
          32       7395        43268
    
     rwlock + xchg
     =============
    
     threads  events/ms  run-time ms
           8      10038         7969
          16      12178        13138
          32      13223        24199
    
    According to the results, the bandwidth of delivered events is
    significantly increased, and thus the execution time is reduced.
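
To put a number on the improvement, the ratios between the two tables can be computed directly. The sketch below only restates the figures listed above; the helper names `throughput_gain`, `run_time_ratio` and `print_gains` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

struct result {
    int threads;
    double events_per_ms;
    double run_time_ms;
};

/* Figures copied from the two result tables. */
static const struct result spinlock[] = {
    {  8, 6402, 12495 }, { 16, 7045, 22709 }, { 32, 7395, 43268 },
};
static const struct result rwlock_xchg[] = {
    {  8, 10038, 7969 }, { 16, 12178, 13138 }, { 32, 13223, 24199 },
};

/* How many times more events per millisecond the rwlock + xchg variant delivers. */
static double throughput_gain(int row)
{
    return rwlock_xchg[row].events_per_ms / spinlock[row].events_per_ms;
}

/* Fraction of the spinlock run time that the rwlock + xchg variant needs. */
static double run_time_ratio(int row)
{
    return rwlock_xchg[row].run_time_ms / spinlock[row].run_time_ms;
}

static void print_gains(void)
{
    for (int i = 0; i < 3; i++)
        printf("%2d threads: %.2fx events/ms, %.0f%% of the run time\n",
               spinlock[i].threads, throughput_gain(i),
               100.0 * run_time_ratio(i));
}
```

The throughput gain works out to roughly 1.57x, 1.73x and 1.79x for 8, 16 and 32 threads respectively, so the benefit grows with the number of producers.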
    
    This patch was tested with different sorts of microbenchmarks and with
    artificial delays (e.g.  "udelay(get_random_int() & 0xff)") introduced in
    the kernel on paths where items are added to lists.
    
    [1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
    
    Link: http://lkml.kernel.org/r/20190103150104.17128-5-rpenyaev@suse.de
    
    
    Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
    Cc: Davidlohr Bueso <dbueso@suse.de>
    Cc: Jason Baron <jbaron@akamai.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>