    sched: Fix SMT scheduler regression in find_busiest_queue() · 9000f05c
    Suresh Siddha authored
    Fix an SMT scheduler performance regression that leads to a scenario
    where the SMT threads in one core are completely idle while both SMT threads
    in another core (on the same socket) are busy.
    
    This is caused by this commit (with the problematic code highlighted)
    
       commit bdb94aa5
       Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Date:   Tue Sep 1 10:34:38 2009 +0200
    
       sched: Try to deal with low capacity
    
       @@ -4203,15 +4223,18 @@ find_busiest_queue()
       ...
    	for_each_cpu(i, sched_group_cpus(group)) {
       +	unsigned long power = power_of(i);
    
       ...
    
       -	wl = weighted_cpuload(i);
       +	wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
       +	wl /= power;
    
       -	if (rq->nr_running == 1 && wl > imbalance)
       +	if (capacity && rq->nr_running == 1 && wl > imbalance)
    		continue;
    
    On an SMT system, the power of an HT logical cpu is 589, and
    the scheduler load imbalance (in scenarios like the one mentioned above)
    can be approximately 1024 (SCHED_LOAD_SCALE). Scaling the weighted load
    by the power, as the change above does, makes "wl > imbalance" true, so
    find_busiest_queue() skips the busy cpu and ultimately returns NULL, causing
    load_balance() to conclude that the load is well balanced. In fact,
    one of the tasks could be moved to the idle core for optimal performance.
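    The arithmetic can be checked with a small userspace C sketch. It assumes a
    single nice-0 task whose weighted load is 1024 (NICE_0_WEIGHT is a
    hypothetical name for that value), the HT power of 589 and the imbalance of
    about 1024 quoted above, and it treats the `capacity` term as non-zero. The
    helper names are illustrative only; they are not kernel functions.

```c
#include <stdbool.h>

/* Values quoted in the commit message. NICE_0_WEIGHT is an assumption:
 * the load weight of one nice-0 task, equal to SCHED_LOAD_SCALE. */
#define SCHED_LOAD_SCALE 1024UL
#define HT_CPU_POWER      589UL
#define NICE_0_WEIGHT    1024UL

/* Hypothetical helper: load scaled by cpu power, as bdb94aa5 does for wl. */
static unsigned long scaled_load(unsigned long weighted_load, unsigned long power)
{
	return weighted_load * SCHED_LOAD_SCALE / power;
}

/* Hypothetical helper: true when a single-task runqueue would be skipped,
 * i.e. when the "nr_running == 1 && wl > imbalance" test fires
 * (capacity assumed non-zero). */
static bool single_task_rq_skipped(unsigned long wl, unsigned long imbalance)
{
	return wl > imbalance;
}
```

    With these numbers scaled_load(1024, 589) is 1048576 / 589 = 1780 in integer
    arithmetic, so the busy HT cpu is skipped; the unscaled load of 1024 does not
    exceed an imbalance of 1024, so that cpu would remain a candidate.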
    
    We don't need to use the weighted load (wl) scaled by the cpu power to
    compare with imbalance. In that condition, we already know there is only a
    single task ("rq->nr_running == 1"), and the comparison between imbalance
    and wl is there to make sure that we select the correct priority thread,
    i.e. one whose load matches the imbalance. So we really need to compare the
    imbalance with the original weighted load of the cpu, not the scaled load.
    
    But in other conditions, where we want the most hammered (busiest) cpu, we
    can use the scaled load to ensure that we consider the cpu power in addition
    to the actual load on that cpu, so that we can move load away from the cpu
    that is getting most hammered relative to its actual capacity, compared with
    the rest of the cpus in that busiest group.
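    The split described in the last two paragraphs can be modelled in userspace
    C. This is a simplified sketch, not the kernel patch: struct cpu_rq and
    find_busiest_cpu() are hypothetical stand-ins for the runqueue and
    find_busiest_queue(), and the group and capacity plumbing is reduced to plain
    parameters. The single-task check uses the unscaled load, and only the
    busiest-cpu comparison uses the power-scaled load.

```c
#include <stddef.h>

#define SCHED_LOAD_SCALE 1024UL

/* Hypothetical, simplified stand-in for a per-cpu runqueue. */
struct cpu_rq {
	unsigned long nr_running;     /* number of tasks on this cpu */
	unsigned long weighted_load;  /* models weighted_cpuload(i) */
	unsigned long power;          /* models power_of(i) */
};

/*
 * Sketch of the corrected selection. Returns the index of the busiest
 * cpu, or -1 when no candidate is found (the NULL case in the kernel).
 */
static int find_busiest_cpu(const struct cpu_rq *rqs, size_t n,
			    unsigned long imbalance, int capacity)
{
	unsigned long max_load = 0;
	int busiest = -1;

	for (size_t i = 0; i < n; i++) {
		unsigned long wl = rqs[i].weighted_load;

		/* Compare the lone task's unscaled load with the imbalance. */
		if (capacity && rqs[i].nr_running == 1 && wl > imbalance)
			continue;

		/* For picking the busiest cpu, scale by cpu power so that a
		 * lower-capacity cpu looks proportionally busier. */
		wl = wl * SCHED_LOAD_SCALE / rqs[i].power;

		if (wl > max_load) {
			max_load = wl;
			busiest = (int)i;
		}
	}
	return busiest;
}
```

    With two HT siblings each running one nice-0 task (load 1024, power 589) and
    an imbalance of 1024, this version returns cpu 0 instead of -1, so the model
    still finds a busiest cpu from which a task can be moved to the idle core.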
    
    Fix it.
    
    Reported-by: Ma Ling <ling.ma@intel.com>
    Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
    Cc: stable@kernel.org [2.6.32.x]
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>