1. 15 Dec, 2009 1 commit
    • sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK · b9f8fcd5
      David Miller authored
      Relax stable-sched-clock architectures to not save/disable/restore
      hardirqs in cpu_clock().
      
      The background is that I was trying to resolve a sparc64 perf
      issue when I discovered this problem.
      
      On sparc64 I implement pseudo NMIs by simply running the kernel
      at IRQ level 14 when local_irq_disable() is called; this allows
      performance counter events to still come in at IRQ level 15.
      
      This doesn't work if any code in an NMI handler does
      local_irq_save() or local_irq_disable() since the "disable" will
      kick us back to cpu IRQ level 14 thus letting NMIs back in and
      we recurse.
      
      The only code that does that in the perf event IRQ handling
      path is the code supporting frequency-based events.  It
      uses cpu_clock().
      
      cpu_clock() simply invokes sched_clock() with IRQs disabled.
      
      And that's a fundamental bug all on its own, particularly for
      the HAVE_UNSTABLE_SCHED_CLOCK case.  NMIs can thus get into the
      sched_clock() code interrupting the local IRQ disable code
      sections of it.
      
      Furthermore, for the not-HAVE_UNSTABLE_SCHED_CLOCK case, the IRQ
      disabling done by cpu_clock() is just pure overhead and
      completely unnecessary.
      
      So the core problem is that sched_clock() is not NMI safe, but
      we are invoking it from NMI contexts in the perf events code
      (via cpu_clock()).
      
      A less important issue is the overhead of IRQ disabling when it
      isn't necessary in cpu_clock().
      
      CONFIG_HAVE_UNSTABLE_SCHED_CLOCK architectures are not
      affected by this patch.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20091213.182502.215092085.davem@davemloft.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 30 Sep, 2009 1 commit
  3. 18 Sep, 2009 1 commit
    • sched_clock: Make it NMI safe · def0a9b2
      Peter Zijlstra authored
      Arjan complained about the suckyness of TSC on modern machines, and
      asked if we could do something about that for PERF_SAMPLE_TIME.
      
      Make cpu_clock() NMI safe by removing the spinlock and using
      cmpxchg. This also makes it smaller and more robust.
      
      Affects architectures that use HAVE_UNSTABLE_SCHED_CLOCK, i.e. IA64
      and x86.
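
The lock-free update described above can be sketched in userspace C11 as follows. This is a minimal illustration, not the kernel code: `clock_word` and `clock_update()` are hypothetical names, and C11 atomics stand in for the kernel's cmpxchg.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU clock value. */
static _Atomic uint64_t clock_word;

/*
 * Advance clock_word to `candidate` unless that would move it backwards.
 * No lock is held, so a nested caller (for example an NMI) that interrupts
 * this loop only makes the compare-exchange fail; we then retry against
 * the newer value instead of deadlocking on a spinlock.
 */
static uint64_t clock_update(uint64_t candidate)
{
	uint64_t old = atomic_load(&clock_word);

	for (;;) {
		/* Wrap-safe test: candidate is not ahead of the stored value. */
		if ((int64_t)(candidate - old) <= 0)
			return old;
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&clock_word, &old, candidate))
			return candidate;
	}
}
```

Calling `clock_update()` with a value older than the stored one returns the stored value unchanged, so readers never observe the clock going backwards.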
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 09 May, 2009 1 commit
  5. 26 Feb, 2009 3 commits
  6. 31 Dec, 2008 1 commit
    • sched_clock: prevent scd->clock from moving backwards, take #2 · 1c5745aa
      Thomas Gleixner authored
      Redo:
      
        5b7dba4f: sched_clock: prevent scd->clock from moving backwards
      
      which had to be reverted due to s2ram hangs:
      
        ca7e716c: Revert "sched_clock: prevent scd->clock from moving backwards"
      
      ... this time with resume restoring GTOD later in the sequence
      taken into account as well.
      
      The "timekeeping_suspended" flag is not very nice but we cannot call into
      GTOD before it has been properly resumed and the scheduler will run very
      early in the resume sequence.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 15 Dec, 2008 1 commit
  8. 10 Oct, 2008 1 commit
  9. 25 Aug, 2008 1 commit
    • sched_clock: fix cpu_clock() · 354879bb
      Peter Zijlstra authored
      This patch fixes 3 issues:
      
      a) it removes the dependency on jiffies, because jiffies are incremented
         by a single CPU, and the tick is not synchronized between CPUs. Therefore
         relying on it to calculate a window to clip whacky TSC values doesn't work
         as it can drift around.
      
         So instead use [GTOD, GTOD+TICK_NSEC) as the window.
      
      b) __update_sched_clock() did (roughly speaking):
      
         delta = sched_clock() - scd->tick_raw;
         clock += delta;
      
         Which gives exponential growth, instead of linear.
      
      c) allows the sched_clock_cpu() value to wrap the u64 without breaking.
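
The three fixes can be sketched together in userspace C. This is an illustration under stated assumptions, not the patch itself: `struct clock_data`, `update_clock()` and the 1 ms `TICK_NSEC` are hypothetical, and the wrap helpers are written out by hand.

```c
#include <stdint.h>

#define TICK_NSEC 1000000ULL	/* assumed: 1 ms tick (HZ=1000) */

/* Hypothetical per-CPU state: raw clock and GTOD at the last tick. */
struct clock_data {
	uint64_t tick_raw;
	uint64_t tick_gtod;
	uint64_t clock;
};

/* (c) Compare via signed difference so u64 wraparound does not break. */
static uint64_t wrap_min(uint64_t x, uint64_t y)
{
	return (int64_t)(x - y) < 0 ? x : y;
}

static uint64_t wrap_max(uint64_t x, uint64_t y)
{
	return (int64_t)(x - y) > 0 ? x : y;
}

static uint64_t update_clock(struct clock_data *scd, uint64_t now)
{
	int64_t delta = (int64_t)(now - scd->tick_raw);
	uint64_t clock, lo, hi;

	if (delta < 0)
		delta = 0;

	/* (b) Base + delta, rather than accumulating delta into clock. */
	clock = scd->tick_gtod + (uint64_t)delta;

	/* (a) Clip to [GTOD, GTOD + TICK_NSEC), never below the last value. */
	lo = wrap_max(scd->tick_gtod, scd->clock);
	hi = scd->tick_gtod + TICK_NSEC;

	clock = wrap_max(clock, lo);
	clock = wrap_min(clock, hi);

	scd->clock = clock;
	return clock;
}
```

Because the result is always computed from `tick_gtod` plus the elapsed raw delta, calling `update_clock()` twice with the same `now` returns the same value instead of adding the delta a second time.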
      
      the results are more reliable sched_clock() deltas:
      
                 before       after   sched_clock
      
      cpu_clock: 15750        51312   51488
      cpu_clock: 59719        51052   50947
      cpu_clock: 15879        51249   51061
      cpu_clock: 1            50933   51198
      cpu_clock: 1            50931   51039
      cpu_clock: 1            51093   50981
      cpu_clock: 1            51043   51040
      cpu_clock: 1            50959   50938
      cpu_clock: 1            50981   51011
      cpu_clock: 1            51364   51212
      cpu_clock: 1            51219   51273
      cpu_clock: 1            51389   51048
      cpu_clock: 1            51285   51611
      cpu_clock: 1            50964   51137
      cpu_clock: 1            50973   50968
      cpu_clock: 1            50967   50972
      cpu_clock: 1            58910   58485
      cpu_clock: 1            51082   51025
      cpu_clock: 1            50957   50958
      cpu_clock: 1            50958   50957
      cpu_clock: 1006128      51128   50971
      cpu_clock: 1            51107   51155
      cpu_clock: 1            51371   51081
      cpu_clock: 1            51104   51365
      cpu_clock: 1            51363   51309
      cpu_clock: 1            51107   51160
      cpu_clock: 1            51139   51100
      cpu_clock: 1            51216   51136
      cpu_clock: 1            51207   51215
      cpu_clock: 1            51087   51263
      cpu_clock: 1            51249   51177
      cpu_clock: 1            51519   51412
      cpu_clock: 1            51416   51255
      cpu_clock: 1            51591   51594
      cpu_clock: 1            50966   51374
      cpu_clock: 1            50966   50966
      cpu_clock: 1            51291   50948
      cpu_clock: 1            50973   50867
      cpu_clock: 1            50970   50970
      cpu_clock: 998306       50970   50971
      cpu_clock: 1            50971   50970
      cpu_clock: 1            50970   50970
      cpu_clock: 1            50971   50971
      cpu_clock: 1            50970   50970
      cpu_clock: 1            51351   50970
      cpu_clock: 1            50970   51352
      cpu_clock: 1            50971   50970
      cpu_clock: 1            50970   50970
      cpu_clock: 1            51321   50971
      cpu_clock: 1            50974   51324
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 11 Aug, 2008 1 commit
  11. 31 Jul, 2008 5 commits
  12. 28 Jul, 2008 1 commit
  13. 11 Jul, 2008 7 commits
    • sched_clock: and multiplier for TSC to gtod drift · c300ba25
      Steven Rostedt authored
      The sched_clock code currently tries to keep all CPU clocks of all CPUs
      somewhat in sync. At every clock tick it records the gtod clock and
      uses that and jiffies and the TSC to calculate a CPU clock that tries to
      stay in sync with all the other CPUs.
      
      ftrace depends heavily on this timer and it detects when this timer
      "jumps".  One problem is that the TSC and the gtod also drift.
      When the TSC is 0.1% faster or slower than the gtod it is very noticeable
      in ftrace. To help compensate for this, I've added a multiplier that
      tries to keep the CPU clock updating at the same rate as the gtod.
      
      I've tried various ways to get it to be in sync and this ended up being
      the most reliable. At every scheduler tick we calculate the new multiplier:
      
        multi = delta_gtod / delta_TSC
      
      This means we perform one 64-bit divide per tick (HZ times a second). A
      shift is used to preserve precision.
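
A fixed-point version of that multiplier might look like the sketch below. The shift width `MULTI_SHIFT` and the helper names are assumptions chosen for illustration; the message itself does not state them.

```c
#include <stdint.h>

#define MULTI_SHIFT 15	/* assumed fixed-point precision */

/*
 * multi = delta_gtod / delta_TSC, kept as a fixed-point value scaled by
 * 2^MULTI_SHIFT so that ratios close to 1.0 do not truncate to 0 or 1.
 */
static uint64_t calc_multi(uint64_t delta_gtod, uint64_t delta_tsc)
{
	if (delta_tsc == 0)
		return 1ULL << MULTI_SHIFT;	/* no TSC progress: ratio 1.0 */
	return (delta_gtod << MULTI_SHIFT) / delta_tsc;
}

/* Scale a raw TSC delta by the multiplier; this path needs no divide. */
static uint64_t scale_delta(uint64_t delta_tsc, uint64_t multi)
{
	return (delta_tsc * multi) >> MULTI_SHIFT;
}
```

With a TSC running 0.1% fast (1001000 TSC ns per 1000000 gtod ns), scaling the TSC delta by the computed multiplier brings it back to within a few tens of nanoseconds of the gtod delta. The divide happens once per tick in `calc_multi()`, while every clock read only pays for a multiply and a shift.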
      
      Other methods that failed due to dynamic HZ are:
      
      (not used)  multi += (gtod - tsc) / delta_gtod
      (not used)  multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod
      
      as well as other variants.
      
      This code still allows for a slight drift between TSC and gtod, but
      it keeps the damage down to a minimum.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: record TSC after gtod · a83bc47c
      Steven Rostedt authored
      To read the gtod we need to grab the xtime lock for read. Reading the TSC
      before the gtod can cause a bigger gap if the xtime lock is contended.
      
      This patch simply reverses the order to read the TSC after the gtod.
      The locking in the reading of the gtod handles any barriers one might
      think are needed.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: only update deltas with local reads. · c0c87734
      Steven Rostedt authored
      Reading the CPU clock should try to stay accurate within the CPU.
      Reading the CPU clock from another CPU and updating the deltas can
      cause unneeded jumps when the clock is later read from the local CPU.
      
      This patch changes the code to update the last read TSC only when read
      from the local CPU.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: fix calculation of other CPU · 2b8a0cf4
      Steven Rostedt authored
      The algorithm to calculate the 'now' of another CPU is not correct.
      At each scheduler tick, each CPU records the last sched_clock and
      gtod (tick_raw and tick_gtod respectively). If the TSC runs at roughly
      the same speed on both CPUs, the algorithm would be:
      
        tick_gtod1 + (now1 - tick_raw1) = tick_gtod2 + (now2 - tick_raw2)
      
      To calculate now2 we would have:
      
        now2 = (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1
      
      Currently the algorithm is:
      
        now2 = (tick_gtod1 - tick_gtod2) + (tick_raw1 - tick_raw2) + now1
      
      This solves most of the rest of the issues I've had with timestamps in
      ftrace.
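
The corrected formula can be checked with a small userspace sketch. `struct tick_sample` and `remote_now()` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Values each CPU recorded at its last scheduler tick. */
struct tick_sample {
	uint64_t tick_gtod;
	uint64_t tick_raw;
};

/*
 * Solve
 *   tick_gtod1 + (now1 - tick_raw1) = tick_gtod2 + (now2 - tick_raw2)
 * for now2. Unsigned arithmetic wraps modulo 2^64, so intermediate
 * "negative" terms still give the right sum.
 */
static uint64_t remote_now(const struct tick_sample *c1,
			   const struct tick_sample *c2, uint64_t now1)
{
	return (c1->tick_gtod - c2->tick_gtod) +
	       (c2->tick_raw - c1->tick_raw) + now1;
}
```

For example, with CPU 1 at tick_gtod = 10000, tick_raw = 500, now1 = 800 and CPU 2 at tick_gtod = 10200, tick_raw = 900, both sides of the equality evaluate to 10300 once now2 = 1000 is substituted.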
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: stop maximum check on NO HZ · af52a90a
      Steven Rostedt authored
      Working with ftrace I would get large jumps of 11 millisecs or more with
      the clock tracer. This killed the latency timings of ftrace and also
      caused the irqoff self tests to fail.
      
      What was happening is that, with NO_HZ, idle would stop the jiffy counter,
      and until the jiffy counter was updated again sched_clock would use a
      stale delta jiffies when checking the clock against the gtod-based maximum.
      
      The jiffies would stop and the last sched_tick would record the last gtod.
      On wakeup, the sched clock update would take the gtod + delta jiffies
      (where the delta would be zero) and compare it to the TSC. The TSC would have
      correctly (with a stable TSC) moved forward several jiffies. But because the
      jiffies had not been updated yet, the clock would be prevented from moving
      forward because it would appear that the TSC jumped too far ahead.
      
      The clock would then virtually stop, until the jiffies are updated. Then
      the next sched clock update would see that the clock was very much behind
      since the delta jiffies is now correct. This would then jump the clock
      forward by several jiffies.
      
      This caused ftrace to report several milliseconds of interrupts off
      latency at every resume from NO_HZ idle.
      
      This patch adds hooks into the nohz code to disable the checking of the
      maximum clock update when nohz is in effect. It resumes the max check
      when nohz has updated the jiffies again.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: widen the max and min time · f7cce27f
      Steven Rostedt authored
      Keeping the max and min sched time within one jiffy of the gtod clock
      was too tight. Just before a schedule tick the max could easily be hit, as
      well as just after a schedule_tick the min could be hit. This caused the
      clock to jump around by a jiffy.
      
      This patch widens the minimum to
         last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSECS
      
      and the maximum to
          last gtod + (2 + delta_jiffies) * TICK_NSECS
      
      This keeps the minimum at the last gtod (or, once jiffies have advanced,
      one jiffy less than delta jiffies beyond it) and the maximum two jiffies
      beyond last gtod + delta jiffies. This may cause unstable TSCs to be
      a bit more sporadic, but it helps keep a clock with a stable TSC working well.
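
The widened window can be written out as a short userspace sketch. The 1 ms `TICK_NSEC` value and the function names are assumptions for illustration, and the plain comparisons ignore u64 wraparound for brevity.

```c
#include <stdint.h>

#define TICK_NSEC 1000000ULL	/* assumed: 1 ms tick (HZ=1000) */

/* Lower bound: last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) ticks. */
static uint64_t min_clock(uint64_t last_gtod, uint64_t delta_jiffies)
{
	return last_gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSEC;
}

/* Upper bound: last gtod + (2 + delta_jiffies) ticks. */
static uint64_t max_clock(uint64_t last_gtod, uint64_t delta_jiffies)
{
	return last_gtod + (2 + delta_jiffies) * TICK_NSEC;
}

/* Clip a candidate clock value into the widened window. */
static uint64_t clamp_clock(uint64_t clock, uint64_t last_gtod,
			    uint64_t delta_jiffies)
{
	uint64_t lo = min_clock(last_gtod, delta_jiffies);
	uint64_t hi = max_clock(last_gtod, delta_jiffies);

	if (clock < lo)
		return lo;
	if (clock > hi)
		return hi;
	return clock;
}
```

With no jiffies elapsed, a reading 1.5 ticks past the last gtod now passes through unchanged, whereas a one-jiffy window would have clipped it.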
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: record from last tick · 62c43dd9
      Steven Rostedt authored
      The sched_clock code tries to keep within the gtod time by one tick (jiffy).
      The current code mistakenly keeps track of the delta jiffies between
      updates of the clock, where the delta is used to compare with the
      number of jiffies that have passed since an update of the gtod. The gtod is
      updated at each schedule tick, not each sched_clock update. After one
      jiffy passes the clock is updated fine. But the delta is taken from the
      last update, so if the next update happens before the next tick the delta
      jiffies used will be incorrect.
      
      This patch changes the code to check the delta of jiffies between ticks
      and not updates to match the comparison of the updates with the gtod.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 29 Jun, 2008 1 commit
  15. 27 Jun, 2008 2 commits
  16. 29 May, 2008 1 commit
  17. 05 May, 2008 1 commit
    • sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK · 3e51f33f
      Peter Zijlstra authored
      this replaces the rq->clock stuff (and possibly cpu_clock()).
      
       - architectures that have an 'imperfect' hardware clock can set
         CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
      
       - the 'jiffie' window might be superfluous when we update tick_gtod
         before the __update_sched_clock() call in sched_clock_tick()
      
       - cpu_clock() might be implemented as:
      
           sched_clock_cpu(smp_processor_id())
      
         if the accuracy proves good enough - how far can TSC drift in a
         single jiffie when considering the filtering and idle hooks?
      
      [ mingo@elte.hu: various fixes and cleanups ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>