    perf tools: Fix struct comm_str removal crash · 46b3722c
    Jiri Olsa authored
    We occasionally hit the following assert failure in 'perf top' when
    processing the /proc info in multiple threads.
    
      perf: ...include/linux/refcount.h:109: refcount_inc:
            Assertion `!(!refcount_inc_not_zero(r))' failed.
    
    The gdb backtrace looks like this:
    
      [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
      0x00007ffff50839fb in raise () from /lib64/libc.so.6
      (gdb)
      #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
      #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
      #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
      #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
      #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
          at ...include/linux/refcount.h:109
      #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
          at util/comm.c:24
      #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
          root=0xbed5c0 <comm_str_root>) at util/comm.c:72
      #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
          root=0xbed5c0 <comm_str_root>) at util/comm.c:95
      #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
          timestamp=0, exec=false) at util/comm.c:111
      #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
      #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
          threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
      #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
      ...
    
    The failing assertion is this one:
    
      REFCOUNT_WARN(!refcount_inc_not_zero(r), ...
    
    The problem is that we keep a global comm_str_root list, which
    is accessed by multiple threads during 'perf top' startup,
    and the following 2 paths can race:
    
      thread 1:
        ...
        thread__new
          comm__new
            comm_str__findnew
              down_write(&comm_str_lock);
              __comm_str__findnew
                comm_str__get
    
      thread 2:
        ...
        comm__override or comm__free
          comm_str__put
            refcount_dec_and_test
              down_write(&comm_str_lock);
              rb_erase(&cs->rb_node, &comm_str_root);
    
    Because thread 2 decrements the refcnt first and only afterwards removes the
    struct comm_str from the list, thread 1 can find this object on the list
    with a refcnt equal to 0 and hit the assert.
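
    The window can be replayed in a single thread. The following is only a
    minimal sketch, not perf code: struct node, the in_tree flag and the three
    helper names are hypothetical stand-ins for struct comm_str, membership in
    comm_str_root and the two halves of the comm_str__put path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct comm_str. */
struct node {
	atomic_int refcnt;
	bool in_tree;	/* models membership in comm_str_root */
};

/* First half of the put path: drop the count; true when it reaches zero. */
static bool put_step1_dec(struct node *n)
{
	return atomic_fetch_sub(&n->refcnt, 1) == 1;
}

/* Second half of the put path: unlink the node (done under the lock in perf). */
static void put_step2_unlink(struct node *n)
{
	assert(atomic_load(&n->refcnt) == 0);
	n->in_tree = false;
}

/* What a concurrent lookup can observe between the two halves. */
static bool lookup_sees_dead_node(struct node *n)
{
	return n->in_tree && atomic_load(&n->refcnt) == 0;
}
```

    Between put_step1_dec() and put_step2_unlink() the node is still reachable
    but its count is already zero, which is the state thread 1 trips over.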
    
    This patch fixes the thread 1 __comm_str__findnew path by ignoring objects
    whose refcnt has already dropped to 0. For the rest of the objects we take
    the refcnt before comparing the name and release it afterwards with
    comm_str__put, which can also release the object completely.
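
    The lookup pattern described above can be sketched with C11 atomics. This
    is an illustration under stated assumptions, not the actual patch: struct
    entry, entry_get_not_zero(), entry_put() and table_find() are hypothetical
    names, a flat array stands in for the rbtree, and locking and freeing are
    left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct comm_str: a refcounted string. */
struct entry {
	atomic_int refcnt;
	const char *str;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool entry_get_not_zero(struct entry *e)
{
	int old = atomic_load(&e->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; true if this was the last one (real code would free). */
static bool entry_put(struct entry *e)
{
	int old = atomic_fetch_sub(&e->refcnt, 1);

	assert(old > 0);
	return old == 1;
}

/* Look up str, skipping entries that are already on their way out. */
static struct entry *table_find(struct entry **table, size_t n, const char *str)
{
	for (size_t i = 0; i < n; i++) {
		struct entry *e = table[i];

		if (!entry_get_not_zero(e))
			continue;	/* refcnt is 0: being removed, ignore */
		if (strcmp(e->str, str) == 0)
			return e;	/* caller keeps the reference */
		entry_put(e);		/* not a match: release it again */
	}
	return NULL;
}
```

    An entry with a zero count is never handed back to the caller, so the
    increment-from-zero assertion cannot fire on this path.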
    
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Acked-by: Namhyung Kim <namhyung@kernel.org>
    Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Wang Nan <wangnan0@huawei.com>
    Cc: kernel-team@lge.com
    Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava
    
    
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>