• Manfred Spraul's avatar
    ipc/sem.c: cacheline align the semaphore structures · f5c936c0
    Manfred Spraul authored
    As now each semaphore has its own spinlock and parallel operations are
    possible, give each semaphore its own cacheline.
    
    On a i3 laptop, this gives up to 28% better performance:
    
      #semscale 10 | grep "interleave 2"
      - before:
      Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
      Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
      Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
      Cpus 4, interleave 2 delay 0: 81963928 in 10 secs
    
      -after:
      Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
      Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
      Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
      Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%
    
    i3, with 2 cores and with hyperthreading enabled.  Interleave 2 in order
    use first the full cores.  HT partially hides the delay from cacheline
    trashing, thus the improvement is "only" 8.7% if 4 threads are running.
    Signed-off-by: 's avatarManfred Spraul <manfred@colorfullife.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
    Signed-off-by: 's avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
    f5c936c0
sem.c 47.3 KB