1. 02 Mar, 2017 4 commits
  2. 27 Feb, 2017 1 commit
    • cpuidle: menu: Avoid taking spinlock for accessing QoS values · 6dbf5cea
      Rafael J. Wysocki authored
      After commit 9908859a (cpuidle/menu: add per CPU PM QoS resume
      latency consideration) the cpuidle menu governor calls
      dev_pm_qos_read_value() on CPU devices to read the current resume
      latency QoS constraint values for them.  That function takes a spinlock
      to prevent the device's power.qos pointer from becoming NULL during
      the access, which is a problem for the RT patchset, where spinlocks are
      converted into mutexes and so the idle loop stops working.
      
      However, it is not even necessary for the menu governor to take
      that spinlock, because the power.qos pointer accessed under it
      cannot be modified during the access anyway.
      
      For this reason, introduce a "raw" routine for accessing device
      QoS resume latency constraints without locking and use it in the
      menu governor.
      
      Fixes: 9908859a (cpuidle/menu: add per CPU PM QoS resume latency consideration)
      Acked-by: Alex Shi <alex.shi@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
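      A minimal sketch of what such a raw, lock-free accessor can look like
      (the helper name and the default value below are assumptions for
      illustration, not quoted from the patch):

       static inline s32 dev_pm_qos_raw_resume_latency(struct device *dev)
       {
               /*
                * Sketch only: no dev->power.lock is taken here.  The menu
                * governor calls this from the idle loop of the CPU whose
                * device is being inspected, so power.qos cannot go away
                * concurrently with this read.
                */
               return IS_ERR_OR_NULL(dev->power.qos) ?
                       PM_QOS_RESUME_LATENCY_DEFAULT_VALUE :
                       pm_qos_read_value(&dev->power.qos->resume_latency);
       }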
  3. 30 Jan, 2017 4 commits
  4. 24 Dec, 2016 1 commit
  5. 06 Dec, 2016 2 commits
  6. 29 Nov, 2016 1 commit
  7. 24 Nov, 2016 1 commit
  8. 23 Nov, 2016 1 commit
    • cpuidle: dt: assign ->enter_freeze to same as ->enter callback function · a94e502c
      Sudeep Holla authored
      The enter_freeze() callback is expected to do at least what enter() does,
      but it must guarantee that interrupts are not enabled at any point during
      its execution, because the tick is frozen.
      
      CPUs execute ->enter_freeze with the local tick or entire timekeeping
      suspended, so it must not re-enable interrupts at any point (even
      temporarily) or attempt to change states of clock event devices.
      
      It is called when the system goes into suspend-to-idle and reduces power
      usage, because CPUs are not woken up for unnecessary IRQs (i.e. they are
      only woken up by IRQs from "wakeup sources").
      
      We can reuse the same code for both the enter() and enter_freeze()
      callbacks as long as it does not re-enable interrupts.  Only the
      "coupled" cpuidle mechanism enables interrupts, and doing that with
      timekeeping suspended is generally not safe.
      
      Since this generic DT based idle driver doesn't support "coupled"
      states, it is safe to assume that the interrupts are not re-enabled.
      
      This patch assigns enter_freeze to the same function as the enter
      callback, which helps save power by avoiding intermittent spurious
      wakeups from suspend-to-idle.
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Tested-by: Andy Gross <andy.gross@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
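      In the generic DT idle driver the change amounts to wiring both callbacks
      to the same handler when a state node is initialized; roughly (a sketch,
      assuming the matched enter handler is stored in the OF match table's data
      pointer, as is usual for this driver):

       	idle_state->enter = match_id->data;
       	/*
       	 * Safe to reuse for suspend-to-idle: this driver has no "coupled"
       	 * states, so the handler never re-enables interrupts while
       	 * timekeeping is suspended.
       	 */
       	idle_state->enter_freeze = match_id->data;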
  9. 21 Oct, 2016 1 commit
  10. 08 Oct, 2016 1 commit
  11. 04 Oct, 2016 1 commit
  12. 06 Sep, 2016 3 commits
  13. 12 Aug, 2016 1 commit
  14. 21 Jul, 2016 1 commit
  15. 15 Jul, 2016 3 commits
  16. 04 Jul, 2016 1 commit
    • cpuidle: Fix last_residency division · dbd1b8ea
      Shreyas B. Prabhu authored
      Snooze is a poll idle state on the powernv and pseries platforms.  Snooze
      has a timeout, so that if a CPU stays in snooze for longer than the
      target residency of the next available idle state, it exits, giving the
      cpuidle governor a chance to re-evaluate and promote the CPU to a deeper
      idle state.  Therefore, whenever snooze exits due to this timeout, its
      last_residency will be the target_residency of the next deeper state.
      
      Commit e93e59ce "cpuidle: Replace ktime_get() with local_clock()"
      changed the math around the last_residency calculation.  Specifically,
      when converting the last_residency value from nanoseconds to
      microseconds, it now carries out a right shift by 10 bits (a division by
      1024 instead of 1000).  Because of that, in snooze timeout exit scenarios
      the calculated last_residency is roughly 2.3% less than the
      target_residency of the next available state.  This pattern is picked up
      by get_typical_interval() in the menu governor, and therefore
      expected_interval in menu_select() is frequently less than the
      target_residency of any state other than snooze.
      
      Due to this, snooze is entered at a higher rate, which hurts
      single-thread performance.
      
      Fix this by using more precise division via ktime_us_delta().
      
      Fixes: e93e59ce "cpuidle: Replace ktime_get() with local_clock()"
      Reported-by: Anton Blanchard <anton@samba.org>
      Bisected-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
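      A sketch of what the more precise conversion looks like (variable names
      follow the usual cpuidle_enter_state() pattern, but this is an
      illustration rather than the exact hunk): keep the timestamps as ktime_t
      and let ktime_us_delta() do an exact ns-to-us division instead of the
      >> 10 approximation.

       	ktime_t time_start, time_end;

       	time_start = ns_to_ktime(local_clock());
       	/* ... enter and exit the idle state ... */
       	time_end = ns_to_ktime(local_clock());

       	/* Exact division by 1000, so a timeout exit reports the full
       	 * target residency instead of ~2.3% less. */
       	dev->last_residency = (int)ktime_us_delta(time_end, time_start);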
  17. 18 May, 2016 1 commit
    • cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter() · e7387da5
      Daniel Lezcano authored
      Commit 0b89e9aa (cpuidle: delay enabling interrupts until all
      coupled CPUs leave idle) rightfully fixed a regression by letting
      the coupled idle state framework handle local interrupt enabling
      when the CPU is exiting an idle state.
      
      The current code checks whether the idle state is coupled and, if so,
      lets the coupled code enable interrupts.  This way, it can decrement
      the ready-count before handling any interrupt, which prevents the
      other CPUs from waiting for a CPU that is busy handling interrupts.
      
      But the check is done against the state index returned by the back
      end driver's ->enter function, which can be different from the
      initial index passed as a parameter to the cpuidle_enter_state()
      function.
      
       entered_state = target_state->enter(dev, drv, index);
      
       [ ... ]
      
       if (!cpuidle_state_is_coupled(drv, entered_state))
      	local_irq_enable();
      
       [ ... ]
      
      If 'index' refers to a coupled idle state but 'entered_state' is
      *not* coupled, then interrupts are enabled again.  All CPUs blocked
      on the sync barrier may busy-loop longer if the CPU has interrupts
      to handle before decrementing the ready-count.  That consumes more
      energy than it saves.
      
      Fixes: 0b89e9aa (cpuidle: delay enabling interrupts until all coupled CPUs leave idle)
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
      [ rjw: Subject & changelog ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
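      The fix, in sketch form (surrounding code abridged as above), is to key
      the coupled-state check on the index that was requested rather than on
      the value the driver returned:

       entered_state = target_state->enter(dev, drv, index);

       [ ... ]

       /* Check the requested state, not the one reported back: if 'index'
        * names a coupled state, interrupt enabling must be left to the
        * coupled framework even when 'entered_state' is not coupled. */
       if (!cpuidle_state_is_coupled(drv, index))
      	local_irq_enable();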
  18. 28 Apr, 2016 1 commit
  19. 26 Apr, 2016 1 commit
    • cpuidle: Replace ktime_get() with local_clock() · e93e59ce
      Daniel Lezcano authored
      ktime_get() can have non-negligible overhead; use local_clock()
      instead.
      
      In order to test the difference between ktime_get() and local_clock(),
      a quick hack was added to trigger, via debugfs, 10000 calls to
      ktime_get() and local_clock() and to measure the elapsed time.
      
      Then the average, minimum and maximum values were computed for each
      function.
      
      From userspace, the test above was called 100 times, every 2 seconds.
      
      So, ktime_get() and local_clock() were each called 1000000 times in
      total.
      
      The results are:
      
      ktime_get():
      ============
       * average: 101 ns (stddev: 27.4)
       * maximum: 38313 ns
       * minimum: 65 ns
      
      local_clock():
      ==============
       * average: 60 ns (stddev: 9.8)
       * maximum: 13487 ns
       * minimum: 46 ns
      
      local_clock() is faster and more stable.
      
      Even if it is a drop in the ocean, replacing ktime_get() with
      local_clock() saves about 80 ns per idle cycle (entry + exit).  And in
      some circumstances, especially when several CPUs are racing for the
      clock access, we save tens of microseconds.
      
      The idle duration resulting from the timestamp diff is converted from
      nanoseconds to microseconds.  This could be done with an integer
      division by 1000, which is an expensive operation, or with a right
      shift by 10 bits (a division by 1024), which is fast but imprecise.
      
      The following table gives some results at the limits.
      
       ------------------------------------------
      |   nsec   |   div(1000)   |   div(1024)   |
       ------------------------------------------
      |   1e3    |        1 usec |      976 nsec |
       ------------------------------------------
      |   1e6    |     1000 usec |      976 usec |
       ------------------------------------------
      |   1e9    |  1000000 usec |   976562 usec |
       ------------------------------------------
      
      There is a linear deviation of 2.34%.  This loss of precision is
      acceptable because the resulting diff is only used for statistics,
      which are processed to estimate the duration of the next idle period
      and, from that, to select an idle state.  The selection criteria
      compare that estimate against relatively large intervals, namely the
      idle states' target residencies.
      
      The division by 2^10 is good enough because the error relative to a
      division by 1000 is lost in all the other approximations made while
      computing the next idle duration.
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ rjw: Subject ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
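      The table above can be reproduced with a trivial standalone calculation
      (illustrative only, not part of the patch):

       #include <stdio.h>
       #include <stdint.h>

       int main(void)
       {
               /* Exact ns -> us division by 1000 vs. the fast >> 10
                * approximation (a division by 1024). */
               const uint64_t samples[] = { 1000000ULL, 1000000000ULL };
               unsigned int i;

               for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
                       uint64_t ns = samples[i];

                       printf("%12llu ns -> div(1000): %8llu us, div(1024): %8llu us\n",
                              (unsigned long long)ns,
                              (unsigned long long)(ns / 1000),
                              (unsigned long long)(ns >> 10));
               }
               /* The shift under-reports by 1 - 1000/1024 = 2.34%. */
               return 0;
       }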
  20. 09 Apr, 2016 1 commit
  21. 21 Mar, 2016 1 commit
    • cpuidle: menu: Fall back to polling if next timer event is near · 0c313cb2
      Rafael J. Wysocki authored
      Commit a9ceb78b (cpuidle,menu: use interactivity_req to disable
      polling) changed the behavior of the fallback state selection part
      of menu_select() so that it looks at interactivity_req instead of
      data->next_timer_us when it makes its decision.  That effectively
      caused polling to be used more often as the fallback idle state,
      which led to significant increases in energy consumption in some
      cases.
      
      Commit e132b9b3 (cpuidle: menu: use high confidence factors
      only when considering polling) changed that logic again to be more
      predictable, but that didn't help with the increased energy
      consumption problem.
      
      For this reason, go back to basing the fallback decision on
      data->next_timer_us, which is a time at which we know for sure that
      something will happen, rather than on a prediction (which may be
      inaccurate, and turns out to be so often enough to be problematic).
      However, take the target residency of the first proper idle state
      (C1) into account, so that state is not used as the fallback one
      if its target residency is greater than data->next_timer_us.
      
      Fixes: a9ceb78b (cpuidle,menu: use interactivity_req to disable polling)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reported-and-tested-by: Doug Smythies <dsmythies@telus.net>
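      In sketch form, the described fallback check looks roughly like this
      (the names follow the menu governor's conventions of the time, e.g.
      CPUIDLE_DRIVER_STATE_START as the first non-polling state, but the
      snippet is an illustration rather than the actual hunk):

       	struct cpuidle_state *s = &drv->states[CPUIDLE_DRIVER_STATE_START];

       	/* Default to C1 rather than busy polling, unless the next known
       	 * timer event is so close that even C1 cannot pay off. */
       	if (data->next_timer_us > s->target_residency)
       		data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
       	else
       		data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;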
  22. 17 Mar, 2016 1 commit
    • cpuidle: menu: use high confidence factors only when considering polling · e132b9b3
      Rik van Riel authored
      The menu governor uses five different factors to pick the
      idle state:
       - the user-configured latency_req
       - the time until the next timer (next_timer_us)
       - the typical sleep interval, as measured recently
       - an estimate of sleep time by dividing next_timer_us by an observed factor
       - a load-corrected version of the above, divided again by load
      
      Only the first three items are known with enough confidence that
      we can use them to consider polling, instead of an actual CPU
      idle state, because the cost of being wrong about polling can be
      excessive power use.
      
      The latter two are used in the menu governor's main selection
      loop, and can result in choosing a shallower idle state when
      the system is expected to be busy again soon.
      
      This pushes a busy system in the "performance" direction of
      the performance<>power tradeoff when choosing between idle
      states, but stays more strictly on the "power" side when
      deciding between polling and C1.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
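      A conceptual sketch of the split this describes (not the actual
      menu_select() code; the helper and its parameters are illustrative):
      only the high-confidence inputs are allowed to push the governor into
      polling, while the corrected estimates only influence the choice among
      real idle states.

       /* Illustrative helper: decide whether busy polling is justified
        * using only the high-confidence inputs. */
       static bool polling_justified(unsigned int latency_req,
                                     unsigned int next_timer_us,
                                     unsigned int typical_interval_us,
                                     const struct cpuidle_state *c1)
       {
               return latency_req < c1->exit_latency ||
                      next_timer_us < c1->target_residency ||
                      typical_interval_us < c1->target_residency;
       }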
  23. 16 Feb, 2016 2 commits
  24. 27 Jan, 2016 1 commit
  25. 22 Jan, 2016 1 commit
  26. 19 Jan, 2016 2 commits
  27. 15 Jan, 2016 1 commit