    workqueue: dump workqueues on sysrq-t · 3494fc30
    Tejun Heo authored
    
    
    Workqueues are used extensively throughout the kernel, but stalls
    involving work items can be difficult to debug because visibility
    into their inner workings is fairly limited.  Although the sysrq-t task dump
    annotates each active worker task with the information on the work
    item being executed, it is challenging to find out which work items
    are pending or delayed on which queues and how pools are being
    managed.
    
    This patch implements show_workqueue_state() which dumps all busy
    workqueues and pools and is called from the sysrq-t handler.  At the
    end of sysrq-t dump, something like the following is printed.
    
     Showing busy workqueues and worker pools:
     ...
     workqueue filler_wq: flags=0x0
       pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
         in-flight: 491:filler_workfn, 507:filler_workfn
       pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
         in-flight: 501:filler_workfn
         pending: filler_workfn
     ...
     workqueue test_wq: flags=0x8
       pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
         in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
         delayed: test_workfn1 BAR(492), test_workfn2
     ...
     pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
     pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
     pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
     pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62
    
    The above shows that test_wq is executing test_workfn() on pid 510
    which is the rescuer and also that there are two tasks 69 and 500
    waiting for the work item to finish in flush_work().  As test_wq has
    max_active of 1, there are two work items for test_workfn1() and
    test_workfn2() which are delayed till the current work item is
    finished.  In addition, pid 492 is flushing test_workfn1().
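As an illustration of reading the dump, the following userspace sketch parses one "pwq" line and reports whether the pwq has reached its max_active limit, which is the situation in which further work items show up as delayed. The struct and function names (pwq_line, parse_pwq_line, pwq_saturated) are hypothetical helpers for this example, not part of the patch, and the parser assumes the line layout shown in the sample output above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields taken from one "pwq" line of the dump (hypothetical struct). */
struct pwq_line {
	int pool_id;	/* number following "pwq" */
	int nr_active;	/* N in "active=N/M" */
	int max_active;	/* M in "active=N/M" */
};

/*
 * Parse a line such as
 *   "pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1"
 * Leading whitespace is skipped.  Returns true on success.
 */
bool parse_pwq_line(const char *line, struct pwq_line *out)
{
	const char *p;

	if (sscanf(line, " pwq %d:", &out->pool_id) != 1)
		return false;

	p = strstr(line, "active=");
	if (!p)
		return false;

	return sscanf(p, "active=%d/%d",
		      &out->nr_active, &out->max_active) == 2;
}

/* True when no more work items can become active on this pwq. */
bool pwq_saturated(const struct pwq_line *pwq)
{
	return pwq->nr_active >= pwq->max_active;
}
```

Fed the test_wq line from the sample (active=1/1), pwq_saturated() returns true, which matches the delayed work items listed under it; the filler_wq lines (active=2/256) are not saturated.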
    
    The work item for test_workfn() is being executed on pwq of pool 2
    which is the normal priority per-cpu pool for CPU 1.  The pool has
    three workers, two of which are executing filler_workfn() for
    filler_wq and the last one is assuming the manager role trying to
    create more workers.
    
    This extra workqueue state dump will hopefully help in chasing down
    hangs involving workqueues.
    
    v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.
    
    v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
        printk()'s replaced with pr_info()'s, and cpumask printing now
        uses cpulist_pr_cont().
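For readers unfamiliar with the list format that "%*pbl" produces for the cpus= field (for example "cpus=0-3"), here is a rough userspace approximation. It is only a sketch: format_cpulist() is a hypothetical name, it handles a single 64-bit mask rather than an arbitrary-length bitmap, and it is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write the set bits of @mask into @buf as a comma-separated list of
 * ranges, e.g. 0x0f -> "0-3" and 0x15 -> "0,2,4".
 * Returns the number of characters written (excluding the NUL), or -1
 * if @buf is too small.
 */
int format_cpulist(uint64_t mask, char *buf, size_t len)
{
	size_t used = 0;
	int bit = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (bit < 64) {
		int start, end, n;

		if (!(mask & (UINT64_C(1) << bit))) {
			bit++;
			continue;
		}

		/* Extend the run while consecutive bits are set. */
		start = bit;
		while (bit < 64 && (mask & (UINT64_C(1) << bit)))
			bit++;
		end = bit - 1;

		if (start == end)
			n = snprintf(buf + used, len - used, "%s%d",
				     used ? "," : "", start);
		else
			n = snprintf(buf + used, len - used, "%s%d-%d",
				     used ? "," : "", start, end);

		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would not fit */
		used += (size_t)n;
	}

	return (int)used;
}
```

With a mask of 0x0f this yields "0-3", the same shape as the cpus= value printed for pool 8 in the sample output.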

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    CC: Ingo Molnar <mingo@redhat.com>
sysrq.c 26.1 KB