Skip to content
  • Emilio G. Cota's avatar
    qsp: QEMU's Synchronization Profiler · fe9959a2
    Emilio G. Cota authored
    
    
    The goal of this module is to profile synchronization primitives (i.e.
    mutexes, recursive mutexes and condition variables) so that scalability
    issues can be quickly diagnosed.
    
    Sync primitives are profiled by QSP based on the vaddr of the object accessed
    as well as the call site (file:line_nr). That means the same object called
    from two different call sites will be tracked in separate entries, which
    might be reported together or separately (see subsequent commit on
    call site coalescing).
    
    Some perf numbers:
    
    Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
    Command: taskset -c 0 tests/atomic_add-bench -d 5 -m
    
    - Before: 54.80 Mops/s
    - After:  54.75 Mops/s
    
    That is, a negligible slowdown due to the now indirect call to
    qemu_mutex_lock. Note that using a branch instead of an indirect
    call introduces a more severe slowdown (53.65 Mops/s, i.e. 2% slowdown).
    
    Enabling the profiler (with -p, added in this series) is more interesting:
    
    - No profiling: 54.75 Mops/s
    - W/ profiling: 12.53 Mops/s
    
    That is, a 4.36X slowdown.
    
    We can break down this slowdown by removing the get_clock calls or
    the entry lookup:
    
    - No profiling:     54.75 Mops/s
    - W/o get_clock:    25.37 Mops/s
    - W/o entry lookup: 19.30 Mops/s
    - W/ profiling:     12.53 Mops/s
    
    Signed-off-by: default avatarEmilio G. Cota <cota@braap.org>
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
    fe9959a2