1. 11 Feb, 2018 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do. But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
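The "almost" above can be checked from userspace. The sketch below is only an illustration under stated assumptions: it compares the Linux userspace headers <poll.h> and <sys/epoll.h> (as shipped by glibc), not the kernel's internal definitions, and it leaves out POLLNVAL because we are not aware of an EPOLLNVAL in glibc's <sys/epoll.h>. The function name count_poll_epoll_mismatches() is our own.

```c
#define _GNU_SOURCE /* exposes POLLRDHUP and POLLMSG in glibc's <poll.h> */
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/epoll.h>

/* One POLL* constant paired with its EPOLL* counterpart. */
struct poll_pair {
    const char *name;
    unsigned int poll_val;
    unsigned int epoll_val;
};

/* PAIR(IN) expands to { "IN", POLLIN, EPOLLIN }. */
#define PAIR(x) { #x, POLL##x, EPOLL##x }

/* The suffixes from the replacement script, minus NVAL. */
static const struct poll_pair poll_pairs[] = {
    PAIR(IN), PAIR(OUT), PAIR(PRI), PAIR(ERR), PAIR(RDNORM), PAIR(RDBAND),
    PAIR(WRNORM), PAIR(WRBAND), PAIR(HUP), PAIR(RDHUP), PAIR(MSG),
};

/* Print every pair whose values differ and return how many differ. */
int count_poll_epoll_mismatches(void)
{
    size_t n = sizeof(poll_pairs) / sizeof(poll_pairs[0]);
    int mismatches = 0;

    for (size_t i = 0; i < n; i++) {
        const struct poll_pair *p = &poll_pairs[i];

        if (p->poll_val != p->epoll_val) {
            printf("POLL%s=%#x but EPOLL%s=%#x\n",
                   p->name, p->poll_val, p->name, p->epoll_val);
            mismatches++;
        }
    }
    assert((size_t)mismatches <= n);
    return mismatches;
}
```

On x86-64 we would expect no mismatches; on a platform whose headers define different POLL* values, the function prints the differing pairs.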
  2. 07 Feb, 2018 1 commit
  3. 27 Jan, 2018 1 commit
    • hrtimer: Reset hrtimer cpu base proper on CPU hotplug · d5421ea4
      Thomas Gleixner authored
      The hrtimer interrupt code contains a hang detection and mitigation
      mechanism, which prevents a long-delayed hrtimer interrupt from causing
      continuous retriggering of interrupts that would keep the system from
      making progress. If a hang is detected, the timer hardware is programmed with
      a certain delay into the future and a flag is set in the hrtimer cpu base
      which prevents newly enqueued timers from reprogramming the timer hardware
      prior to the chosen delay. The subsequent hrtimer interrupt after the delay
      clears the flag and resumes normal operation.
      
      If such a hang happens in the last hrtimer interrupt before a CPU is
      unplugged then the hang_detected flag is set and stays that way when the
      CPU is plugged in again. At that point the timer hardware is not armed and
      it cannot be armed because the hang_detected flag is still active, so
      nothing clears that flag. As a consequence the CPU does not receive hrtimer
      interrupts and no timers expire on that CPU which results in RCU stalls and
      other malfunctions.
      
      Clear the flag along with some other less critical members of the hrtimer
      cpu base to ensure starting from a clean state when a CPU is plugged in.
      
      Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
      root cause of that hard to reproduce heisenbug. Once understood it's
      trivial and certainly justifies a brown paperbag.
      
      Fixes: 41d2e494 ("hrtimer: Tune hrtimer_interrupt hang logic")
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
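The failure mode can be modeled in a few lines. This is a hypothetical userspace model, not the kernel code: struct model_cpu_base, model_may_reprogram() and model_prepare_cpu() are invented names standing in for the hrtimer cpu base, the hang_detected check, and the per-CPU initialization that runs when a CPU is plugged in. It shows how a leftover hang_detected flag blocks every reprogramming attempt, and how resetting it together with the other state restores normal operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_KTIME_MAX INT64_MAX

/* Hypothetical, heavily simplified stand-in for the per-CPU hrtimer base. */
struct model_cpu_base {
    bool hres_active;
    bool hang_detected;      /* set by hang mitigation, cleared by the next interrupt */
    int64_t expires_next;
    const void *next_timer;
};

/* New timers may reprogram the hardware only when no hang delay is pending. */
bool model_may_reprogram(const struct model_cpu_base *b)
{
    return !b->hang_detected;
}

/* Runs when a CPU is plugged in: start from a clean state (the gist of the fix). */
void model_prepare_cpu(struct model_cpu_base *b)
{
    assert(b != NULL);
    b->hres_active = false;
    b->hang_detected = false;
    b->expires_next = MODEL_KTIME_MAX;
    b->next_timer = NULL;
}
```

Without model_prepare_cpu(), nothing in this model clears hang_detected after the CPU returns, because the interrupt that would normally clear it is never programmed.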
  4. 23 Jan, 2018 1 commit
  5. 16 Jan, 2018 25 commits
    • hrtimer: Implement SOFT/HARD clock base selection · 42f42da4
      Anna-Maria Gleixner authored
      All prerequisites to handle hrtimers for expiry in either hard or soft
      interrupt context are in place.
      
      Add the missing bit in hrtimer_init() which associates the timer with the
      hard or the softirq clock base.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-30-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
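A minimal sketch of the base selection, assuming (as the related commits describe) that every clock ID has one hard and one soft clock base and that the soft bases follow the hard ones in a single array. The enums, model_clock_to_base() and model_select_base() are hypothetical names; the real hrtimer_init() path may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clock IDs; the numeric values mirror the Linux clock IDs. */
enum model_clock {
    MODEL_CLOCK_REALTIME = 0,
    MODEL_CLOCK_MONOTONIC = 1,
    MODEL_CLOCK_BOOTTIME = 7,
    MODEL_CLOCK_TAI = 11,
};

/* Hypothetical base layout: all hard bases first, then the matching soft bases. */
enum model_base {
    MODEL_BASE_MONOTONIC,
    MODEL_BASE_REALTIME,
    MODEL_BASE_BOOTTIME,
    MODEL_BASE_TAI,
    MODEL_BASE_MONOTONIC_SOFT,
    MODEL_BASE_REALTIME_SOFT,
    MODEL_BASE_BOOTTIME_SOFT,
    MODEL_BASE_TAI_SOFT,
    MODEL_MAX_CLOCK_BASES,
};

/* Map a (sparse) clock ID to its index among the hard bases. */
static int model_clock_to_base(enum model_clock clock)
{
    switch (clock) {
    case MODEL_CLOCK_REALTIME: return MODEL_BASE_REALTIME;
    case MODEL_CLOCK_BOOTTIME: return MODEL_BASE_BOOTTIME;
    case MODEL_CLOCK_TAI:      return MODEL_BASE_TAI;
    default:                   return MODEL_BASE_MONOTONIC;
    }
}

/* Pick the hard base for the clock, then shift into the soft half if requested. */
int model_select_base(enum model_clock clock, bool soft)
{
    int base = soft ? MODEL_MAX_CLOCK_BASES / 2 : 0;

    base += model_clock_to_base(clock);
    assert(base < MODEL_MAX_CLOCK_BASES);
    return base;
}
```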
    • hrtimer: Implement support for softirq based hrtimers · 5da70160
      Anna-Maria Gleixner authored
      hrtimer callbacks are always invoked in hard interrupt context. Several
      users in tree require soft interrupt context for their callbacks and
      achieve this by combining a hrtimer with a tasklet. The hrtimer schedules
      the tasklet in hard interrupt context and the tasklet callback gets invoked
      in softirq context later.
      
      That's suboptimal, and aside from that, the real-time patch moves most of the
      hrtimers into softirq context. So adding native support for hrtimers
      expiring in softirq context is a valuable extension for both mainline and
      the RT patch set.
      
      Each valid hrtimer clock id has two associated hrtimer clock bases: one for
      timers expiring in hardirq context and one for timers expiring in softirq
      context.
      
      Implement the functionality to associate a hrtimer with the hard or softirq
      related clock bases and update the relevant functions to take them into
      account when the next expiry time needs to be evaluated.
      
      Add a check into the hard interrupt context handler functions to check
      whether the first expiring softirq based timer has expired. If it's expired
      the softirq is raised and the accounting of softirq based timers to
      evaluate the next expiry time for programming the timer hardware is skipped
      until the softirq processing has finished. At the end of the softirq
      processing the regular processing is resumed.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-29-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Prepare handling of hard and softirq based hrtimers · c458b1d1
      Anna-Maria Gleixner authored
      The softirq based hrtimers can utilize most of the existing hrtimer
      functions, but need to operate on a different data set.
      
      Add an 'active_mask' parameter to various functions so the hard and soft bases
      can be selected. Fix up the existing callers and hand in the ACTIVE_HARD
      mask.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-28-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
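A sketch of what an 'active_mask' parameter buys, under our own assumption that the four hard bases occupy the low bits of the active-bases bitmap and the four soft bases the next four bits. MODEL_ACTIVE_HARD, MODEL_ACTIVE_SOFT and model_next_event_base() are hypothetical names; the point is that one function serves hard, soft, or all bases depending on the mask the caller hands in.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_CLOCK_BASES 8
/* Hypothetical masks: low four bits are hard bases, high four bits soft bases. */
#define MODEL_ACTIVE_HARD 0x0fu
#define MODEL_ACTIVE_SOFT 0xf0u
#define MODEL_ACTIVE_ALL  (MODEL_ACTIVE_HARD | MODEL_ACTIVE_SOFT)

/* Earliest expiry among bases that are both active and selected by the mask. */
int64_t model_next_event_base(const int64_t expires[MODEL_MAX_CLOCK_BASES],
                              unsigned int active_bases, unsigned int active_mask)
{
    unsigned int active = active_bases & active_mask;
    int64_t next = INT64_MAX;

    assert((active_mask & ~MODEL_ACTIVE_ALL) == 0);
    for (int i = 0; i < MODEL_MAX_CLOCK_BASES; i++) {
        if ((active & (1u << i)) && expires[i] < next)
            next = expires[i];
    }
    return next;
}
```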
    • hrtimer: Add clock bases and hrtimer mode for softirq context · 98ecadd4
      Anna-Maria Gleixner authored
      Currently hrtimer callback functions are always executed in hard interrupt
      context. Users of hrtimers that need their timer function to be executed
      in soft interrupt context use tasklets to get the proper context.
      
      Add additional hrtimer clock bases for timers which must expire in softirq
      context, so the detour via the tasklet can be avoided. This is also
      required for RT, where the majority of hrtimers are moved into softirq
      context.
      
      The selection of the expiry mode happens via a mode bit. Introduce
      HRTIMER_MODE_SOFT and the matching combinations with the ABS/REL/PINNED
      bits and update the decoding of hrtimer_mode in tracepoints.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-27-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
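The mode-bit scheme can be illustrated with a small decoder. The bit values below (ABS = 0, REL = 1, PINNED = 2, SOFT = 4) are our assumption about the layout, and model_mode_name() is a hypothetical helper that mimics how a tracepoint could print a combined mode; it is not the kernel's tracepoint code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed bit layout of the mode; SOFT is the newly added bit. */
enum model_hrtimer_mode {
    MODEL_MODE_ABS    = 0x00,
    MODEL_MODE_REL    = 0x01,
    MODEL_MODE_PINNED = 0x02,
    MODEL_MODE_SOFT   = 0x04,
};

/* Every combination of the three bits, indexed by the mode value itself. */
const char *model_mode_name(unsigned int mode)
{
    static const char *const names[] = {
        "ABS", "REL", "ABS|PINNED", "REL|PINNED",
        "ABS|SOFT", "REL|SOFT", "ABS|PINNED|SOFT", "REL|PINNED|SOFT",
    };

    assert(mode < sizeof(names) / sizeof(names[0]));
    return names[mode];
}

/* Convenience check, e.g. when comparing against a decoded trace field. */
bool model_mode_is(unsigned int mode, const char *expected)
{
    return strcmp(model_mode_name(mode), expected) == 0;
}
```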
    • hrtimer: Use irqsave/irqrestore around __run_hrtimer() · dd934aa8
      Anna-Maria Gleixner authored
      __run_hrtimer() is called with the hrtimer_cpu_base.lock held and
      interrupts disabled. Before invoking the timer callback the base lock is
      dropped, but interrupts stay disabled.
      
      The upcoming support for softirq based hrtimers requires that interrupts
      are enabled before the timer callback is invoked.
      
      To avoid code duplication, take hrtimer_cpu_base.lock with
      raw_spin_lock_irqsave(flags) at the call site and hand in the flags as
      a parameter. Then raw_spin_unlock_irqrestore() before the callback
      invocation either keeps interrupts disabled in hard interrupt context or
      restores the interrupt-enabled state when called from softirq context.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-26-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
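Why handing in the saved flags works can be shown with a toy model of interrupt state. Everything here is hypothetical: a global boolean stands in for the CPU's interrupt-enable bit, and model_lock_irqsave(), model_lock_irq() and model_unlock_irqrestore() only mimic the save/restore semantics of the raw spinlock primitives named above. The callback observes interrupts disabled when the shared expiry path runs in hard interrupt context and enabled when it runs in softirq context.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy CPU state: a global flag stands in for the local interrupt-enable bit. */
static bool model_irqs_enabled;
static bool model_lock_held;

static void model_lock_irqsave(unsigned long *flags)
{
    *flags = model_irqs_enabled;   /* remember the caller's interrupt state */
    model_irqs_enabled = false;
    model_lock_held = true;
}

static void model_lock_irq(void)
{
    model_irqs_enabled = false;
    model_lock_held = true;
}

static void model_unlock_irqrestore(unsigned long flags)
{
    model_lock_held = false;
    model_irqs_enabled = flags != 0;   /* restore, do not force-enable */
}

/* Models __run_hrtimer(): entered locked, with the caller's saved flags. */
static bool model_run_one(unsigned long flags)
{
    bool irqs_in_callback;

    model_unlock_irqrestore(flags);          /* drop the lock before the callback */
    irqs_in_callback = model_irqs_enabled;   /* the timer callback would run here */
    model_lock_irq();                        /* retake the lock afterwards */
    return irqs_in_callback;
}

/* Shared expiry path; returns the interrupt state the callback observed. */
bool model_expire_timers(bool in_softirq)
{
    unsigned long flags;
    bool seen;

    model_irqs_enabled = in_softirq;   /* hard irq context runs with irqs off */
    model_lock_irqsave(&flags);
    seen = model_run_one(flags);
    model_unlock_irqrestore(flags);
    assert(!model_lock_held && model_irqs_enabled == in_softirq);
    return seen;
}
```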
    • hrtimer: Factor out __hrtimer_next_event_base() · ad38f596
      Anna-Maria Gleixner authored
      Preparatory patch for softirq based hrtimers to avoid code duplication.
      
      No functional change.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-25-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Factor out __hrtimer_start_range_ns() · 138a6b7a
      Anna-Maria Gleixner authored
      Preparatory patch for softirq based hrtimers: to avoid code duplication,
      factor out the __hrtimer_start_range_ns() function from hrtimer_start_range_ns().
      
      No functional change.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-24-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Remove the 'base' parameter from hrtimer_reprogram() · 3ec7a3ee
      Anna-Maria Gleixner authored
      hrtimer_reprogram() must have access to the hrtimer_clock_base of the new
      first expiring timer to access hrtimer_clock_base.offset for adjusting the
      expiry time to CLOCK_MONOTONIC. This is required to evaluate whether the
      new leftmost timer in the hrtimer_clock_base is the first expiring timer
      of all clock bases in a hrtimer_cpu_base.
      
      The only user of hrtimer_reprogram() is hrtimer_start_range_ns(), which
      already has a pointer to the hrtimer_clock_base and hands it in as a parameter. But
      hrtimer_start_range_ns() will be split for the upcoming support for softirq
      based hrtimers to avoid code duplication and will lose the direct access to
      the clock base pointer.
      
      Instead of handing in both timer and timer->base, remove the base
      parameter from hrtimer_reprogram() and retrieve the clock base internally.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-23-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Make remote enqueue decision less restrictive · 2ac2dccc
      Anna-Maria Gleixner authored
      The current decision whether a timer can be queued on a remote CPU checks
      for timer->expiry <= remote_cpu_base.expires_next.
      
      This is too restrictive because a timer with the same expiry time as an
      existing timer will be enqueued on the right-hand side of the existing timer
      inside the rbtree, i.e. behind the first expiring timer.
      
      So it's safe to allow enqueuing timers with the same expiry time as the
      first expiring timer on a remote CPU base.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-22-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
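The change boils down to relaxing one comparison. The two predicates below have hypothetical names and contrast the old and new rule as described above; both assume the expiry values have already been converted to the remote CPU base's monotonic time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old rule: a timer expiring at or before the remote first expiry stays local. */
bool model_must_stay_local_old(int64_t expires, int64_t remote_expires_next)
{
    return expires <= remote_expires_next;
}

/* New rule: only a strictly earlier timer must stay local. An equal expiry is
 * inserted to the right of the existing first timer in the rbtree, so it
 * cannot become the new first expiring timer on the remote CPU. */
bool model_must_stay_local_new(int64_t expires, int64_t remote_expires_next)
{
    return expires < remote_expires_next;
}
```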
    • hrtimer: Unify remote enqueue handling · 14c80341
      Anna-Maria Gleixner authored
      hrtimer_reprogram() is conditionally invoked from hrtimer_start_range_ns()
      when hrtimer_cpu_base.hres_active is true.
      
      In the !hres_active case there is a special condition for the nohz_active
      case:
      
        If the newly enqueued timer expires before the first expiring timer on a
        remote CPU then the remote CPU needs to be notified and woken up from a
        NOHZ idle sleep to take the new first expiring timer into account.
      
      Previous changes have already established the prerequisites to make the
      remote enqueue behaviour the same whether high resolution mode is active or
      not:
      
        If the timer to be enqueued expires before the first expiring timer on a
        remote CPU, then it cannot be enqueued there.
      
      This was done for the high resolution mode because there is no way to
      access the remote CPU timer hardware. The same is true for NOHZ, but was
      handled differently by unconditionally enqueuing the timer and waking up
      the remote CPU so it can reprogram its timer. Again there is no compelling
      reason for this difference.
      
      hrtimer_check_target(), which makes the 'can remote enqueue' decision, is
      already unconditional, but not yet functional because nothing updates
      hrtimer_cpu_base.expires_next in the !hres_active case.
      
      To unify this the following changes are required:
      
       1) Make the store of the new first expiry time unconditional in
          hrtimer_reprogram() and check __hrtimer_hres_active() before proceeding
          to the actual hardware access. This check also lets the compiler
          eliminate the rest of the function in case of CONFIG_HIGH_RES_TIMERS=n.
      
       2) Invoke hrtimer_reprogram() unconditionally from
          hrtimer_start_range_ns().
      
       3) Remove the remote wakeup special case for the !high_res && nohz_active
          case.
      
      Confine the timers_nohz_active static key to timer.c, which is now its only
      user.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-21-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Unify hrtimer removal handling · 61bb4bcb
      Anna-Maria Gleixner authored
      When the first hrtimer on the current CPU is removed,
      hrtimer_force_reprogram() is invoked but only when
      CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active is set.
      
      hrtimer_force_reprogram() updates hrtimer_cpu_base.expires_next and
      reprograms the clock event device. When CONFIG_HIGH_RES_TIMERS=y and
      hrtimer_cpu_base.hres_active is set, a pointless hrtimer interrupt can be
      prevented.
      
      hrtimer_check_target() makes the 'can remote enqueue' decision. As soon as
      hrtimer_check_target() is unconditionally available and
      hrtimer_cpu_base.expires_next is updated by hrtimer_reprogram(),
      hrtimer_force_reprogram() needs to be available unconditionally as well to
      prevent the following scenario with CONFIG_HIGH_RES_TIMERS=n:
      
      - the first hrtimer on this CPU is removed and hrtimer_force_reprogram() is
        not executed
      
      - CPU goes idle (next timer is calculated and hrtimers are taken into
        account)
      
      - an hrtimer is enqueued remotely on the idle CPU: hrtimer_check_target()
        compares the expiry value with hrtimer_cpu_base.expires_next. The expiry
        value is after expires_next, so the hrtimer is enqueued. This timer will
        fire late if it expires before the effective first hrtimer on this CPU,
        because the comparison was made against an outdated expires_next value.
      
      To prevent this scenario, make hrtimer_force_reprogram() unconditional
      except for the actual reprogramming part, which gets eliminated by the
      compiler in the CONFIG_HIGH_RES_TIMERS=n case.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-20-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
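The scenario above can be replayed with a tiny model. It is hypothetical throughout: a CPU holds at most two timers, expires_next is the cached first expiry, and model_remove() either refreshes that cache (the effect attributed to hrtimer_force_reprogram()) or leaves it stale. With the stale cache, a remote timer that should have been refused is accepted and would fire late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU with at most two queued timers (INT64_MAX = empty slot). */
struct model_cpu {
    int64_t timers[2];
    int64_t expires_next;      /* cached first expiry used by remote enqueue */
};

static int64_t model_first(const struct model_cpu *c)
{
    return c->timers[0] < c->timers[1] ? c->timers[0] : c->timers[1];
}

/* Remove the timer in 'slot'; optionally refresh the cache (force reprogram). */
void model_remove(struct model_cpu *c, int slot, bool force_reprogram)
{
    assert(slot == 0 || slot == 1);
    c->timers[slot] = INT64_MAX;
    if (force_reprogram)
        c->expires_next = model_first(c);
}

/* Remote enqueue is allowed only if the timer does not expire before the cache. */
bool model_can_enqueue_remote(const struct model_cpu *c, int64_t expires)
{
    return expires >= c->expires_next;
}

/* True if an accepted remote timer would really be the first to expire, so the
 * idle CPU, sleeping until its later first timer, would fire it late. */
bool model_fires_late(const struct model_cpu *c, int64_t expires)
{
    return model_can_enqueue_remote(c, expires) && expires < model_first(c);
}
```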
    • hrtimer: Make hrtimer_force_reprogramm() unconditionally available · ebba2c72
      Anna-Maria Gleixner authored
      hrtimer_force_reprogram() needs to be available unconditionally for softirq
      based hrtimers. Move the function and all required struct members out of
      the CONFIG_HIGH_RES_TIMERS #ifdef.
      
      There is no functional change because hrtimer_force_reprogram() is only
      invoked when hrtimer_cpu_base.hres_active is true and
      CONFIG_HIGH_RES_TIMERS=y.
      
      Making it unconditional increases the text size for the
      CONFIG_HIGH_RES_TIMERS=n case slightly, but avoids replication of that code
      for the upcoming softirq based hrtimers support. Most of the code gets
      eliminated in the CONFIG_HIGH_RES_TIMERS=n case by the compiler.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-19-anna-maria@linutronix.de
      [ Made it build on !CONFIG_HIGH_RES_TIMERS ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Make hrtimer_reprogramm() unconditional · 11a9fe06
      Anna-Maria Gleixner authored
      hrtimer_reprogram() needs to be available unconditionally for softirq based
      hrtimers. Move the function and all required struct members out of the
      CONFIG_HIGH_RES_TIMERS #ifdef.
      
      There is no functional change because hrtimer_reprogram() is only invoked
      when hrtimer_cpu_base.hres_active is true. Making it unconditional
      increases the text size for the CONFIG_HIGH_RES_TIMERS=n case, but avoids
      replication of that code for the upcoming softirq based hrtimers support.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-18-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional · eb27926b
      Anna-Maria Gleixner authored
      hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer
      in a CPU base.
      
      This pointer cannot be dereferenced and is solely used to check whether a
      hrtimer which is removed is the hrtimer which is the first to expire in the
      CPU base. If this is the case, then the timer hardware needs to be
      reprogrammed to avoid an extra interrupt for nothing.
      
      Again, this is conditional functionality, but there is no compelling reason
      to make this conditional. As a preparation, hrtimer_cpu_base.next_timer
      needs to be available unconditionally.
      
      Aside from that, the upcoming support for softirq based hrtimers requires access
      to this pointer unconditionally as well, so our motivation is not entirely
      simplicity based.
      
      Make the update of hrtimer_cpu_base.next_timer unconditional and remove the
      #ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n is
      marginal as it's just a store on an already dirtied cacheline.
      
      No functional change.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-17-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
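A sketch of the pointer comparison described above, using hypothetical names: next_timer is only compared against the timer being removed, never dereferenced, and a match triggers the (here simulated) reprogramming that avoids an interrupt for nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_timer { long expires; };

struct model_cpu_base {
    const struct model_timer *next_timer;   /* compared only, never dereferenced */
    int reprograms;                         /* counts simulated reprogramming */
};

/* On removal, reprogram only if the removed timer was the first to expire. */
void model_remove_timer(struct model_cpu_base *b, const struct model_timer *t)
{
    assert(b != NULL && t != NULL);
    if (t == b->next_timer) {
        b->next_timer = NULL;   /* real code would determine the new first timer */
        b->reprograms++;
    }
}
```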
    • hrtimer: Make the remote enqueue check unconditional · 07a9a7ea
      Anna-Maria Gleixner authored
      hrtimer_cpu_base.expires_next is used to cache the next event armed in the
      timer hardware. The value is used to check whether an hrtimer can be
      enqueued remotely. If the new hrtimer is expiring before expires_next, then
      remote enqueue is not possible as the remote hrtimer hardware cannot be
      accessed for reprogramming to an earlier expiry time.
      
      The remote enqueue check is currently conditional on
      CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no
      compelling reason to make this conditional.
      
      Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y
      guarded area and remove the conditionals in hrtimer_check_target().
      
      The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the
      !hrtimer_cpu_base.hres_active case because in these cases nothing updates
      hrtimer_cpu_base.expires_next yet. This will be changed with later patches
      which further reduce the #ifdef zoo in this code.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-16-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Use accesor functions instead of direct access · 851cff8c
      Anna-Maria Gleixner authored
      __hrtimer_hres_active() is now available unconditionally, so replace open
      coded direct accesses to hrtimer_cpu_base.hres_active.
      
      No functional change.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-15-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Make the hrtimer_cpu_base::hres_active field unconditional, to simplify the code · 28bfd18b
      Anna-Maria Gleixner authored
      The hrtimer_cpu_base::hres_active field depends on CONFIG_HIGH_RES_TIMERS=y
      currently, and all related functions to this member are conditional as well.
      
      To simplify the code make it unconditional and set it to zero during initialization.
      
      (This will also help with the upcoming softirq based hrtimers code.)
      
      The conditional code sections can be avoided by adding IS_ENABLED(HIGHRES)
      conditionals into common functions, which ensures dead code elimination.
      
      There is no functional change.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-14-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
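The IS_ENABLED() technique can be sketched in plain C: a compile-time 0/1 constant guards the field access, so with the option off the helper is a constant 0 and the compiler can drop dependent branches, while the field itself always exists. MODEL_CONFIG_HIGH_RES_TIMERS and the model_* names are stand-ins we made up for IS_ENABLED(CONFIG_HIGH_RES_TIMERS) and the real helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for IS_ENABLED(CONFIG_HIGH_RES_TIMERS): a compile-time 0 or 1. */
#ifndef MODEL_CONFIG_HIGH_RES_TIMERS
#define MODEL_CONFIG_HIGH_RES_TIMERS 1
#endif

struct model_cpu_base {
    unsigned int hres_active;   /* now always present, zeroed at init */
};

static inline unsigned int model_hres_active(const struct model_cpu_base *b)
{
    /* With the option off this is a constant 0 and callers' branches fold away. */
    return MODEL_CONFIG_HIGH_RES_TIMERS ? b->hres_active : 0;
}

void model_init(struct model_cpu_base *b)
{
    b->hres_active = 0;
}

bool model_needs_hw_reprogram(const struct model_cpu_base *b)
{
    assert(b != NULL);
    return model_hres_active(b) != 0;
}
```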
    • hrtimer: Store running timer in hrtimer_clock_base · 3f0b9e8e
      Anna-Maria Gleixner authored
      The pointer to the currently running timer is stored in hrtimer_cpu_base
      before the base lock is dropped and the callback is invoked.
      
      This results in two levels of indirection and the upcoming support for
      softirq based hrtimer requires splitting the "running" storage into soft
      and hard IRQ context expiry.
      
      Storing both in the cpu base would require conditionals in all code paths
      accessing that information.
      
      It's possible to have a per clock base sequence count and running pointer
      without changing the semantics of the related mechanisms because the timer
      base pointer cannot be changed while a timer is running the callback.
      
      Unfortunately this makes cpu_clock base larger than 32 bytes on 32-bit
      kernels. Instead of having huge gaps due to alignment, remove the alignment
      and let the compiler pack CPU base for 32-bit kernels. The resulting cache access
      patterns are fortunately not really different from the current
      behaviour. On 64-bit kernels the 64-byte alignment stays and the behaviour is
      unchanged. This was determined by analyzing the resulting layout and
      looking at the number of cache lines involved for the frequently used
      clocks.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-12-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
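A sketch of per-clock-base bookkeeping, with hypothetical names: each clock base records the timer whose callback is running, and a timer checks "am I running" through its own base pointer. This relies on the property stated above, that the base pointer cannot change while the callback runs. The sequence count is only incremented here; real code would pair it with memory barriers, which this single-threaded model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_timer;

/* Hypothetical clock base that carries its own "running" pointer. */
struct model_clock_base {
    const struct model_timer *running;
    unsigned int seq;               /* bumped before and after the callback */
};

struct model_timer {
    struct model_clock_base *base;  /* cannot change while the callback runs */
};

bool model_callback_running(const struct model_timer *t)
{
    assert(t != NULL && t->base != NULL);
    return t->base->running == t;   /* one indirection via the timer's base */
}

/* Models the expiry path around a callback invocation. */
void model_run(struct model_timer *t, void (*cb)(struct model_timer *))
{
    struct model_clock_base *base = t->base;

    base->running = t;
    base->seq++;
    cb(t);
    base->seq++;
    base->running = NULL;
}

/* Example callback that records what it observed. */
bool model_observed_running;

void model_observe(struct model_timer *t)
{
    model_observed_running = model_callback_running(t);
}
```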
    • hrtimer: Switch 'for' loop to _ffs() evaluation · c272ca58
      Anna-Maria Gleixner authored
      Looping over all clock bases to find active bits is suboptimal if not all
      bases are active.
      
      Avoid this by converting it to a __ffs() evaluation. The functionality is
      moved into its own function and is called via a macro as suggested by
      Peter Zijlstra.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-11-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
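The __ffs() iteration can be sketched in userspace. __builtin_ctz() (a GCC/Clang builtin) stands in for the kernel's __ffs(), and model_collect_active() is a hypothetical helper that visits only the set bits of an active-bases mask, clearing each bit after handling it, instead of looping over every clock base.

```c
#include <assert.h>

/* Userspace stand-in for __ffs(): index of the lowest set bit.
 * Undefined for 0, like __builtin_ctz(), so callers must check first. */
static unsigned int model_ffs(unsigned int word)
{
    assert(word != 0);
    return (unsigned int)__builtin_ctz(word);
}

/* Visit only the active bases. Writes the visited indices to 'out'
 * and returns how many there were. */
unsigned int model_collect_active(unsigned int active, unsigned int out[32])
{
    unsigned int n = 0;

    while (active) {
        unsigned int idx = model_ffs(active);

        active &= ~(1u << idx);   /* clear the bit just handled */
        out[n++] = idx;
    }
    return n;
}
```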
    • tracing/hrtimer: Print the hrtimer mode in the 'hrtimer_start' tracepoint · 63e2ed36
      Anna-Maria Gleixner authored
      The 'hrtimer_start' tracepoint lacks the mode information. The mode is
      important because consecutive starts can switch from ABS to REL or from
      PINNED to non-PINNED.
      
      Append the mode field.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-10-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers) · 48d0c9be
      Anna-Maria Gleixner authored
      The POSIX specification defines that relative CLOCK_REALTIME timers are not
      affected by clock modifications. Those timers have to use CLOCK_MONOTONIC
      to ensure POSIX compliance.
      
      The introduction of the additional HRTIMER_MODE_PINNED mode broke this
      requirement for pinned timers.
      
      There is no user space visible impact because user space timers are not
      using pinned mode, but for consistency reasons this needs to be fixed.
      
      Check whether the mode has the HRTIMER_MODE_REL bit set instead of
      comparing with HRTIMER_MODE_ABS.
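      A minimal user-space sketch of the difference between the two checks,
      assuming the mode is a bit field in which bit 0 means "relative" and
      bit 1 means "pinned" (modeled on enum hrtimer_mode; the MODE_*/CLK_*
      names and both functions are hypothetical). The "old" variant assumes
      the comparison took the form mode != HRTIMER_MODE_ABS: with that form an
      absolute pinned timer is also moved to the monotonic clock, while the
      bit test redirects only modes that are genuinely relative.

```c
#include <assert.h>

/* Assumed bit layout, modeled on enum hrtimer_mode: bit 0 = relative, bit 1 = pinned. */
enum timer_mode {
	MODE_ABS        = 0x00,
	MODE_REL        = 0x01,
	MODE_PINNED     = 0x02,
	MODE_ABS_PINNED = MODE_ABS | MODE_PINNED,
	MODE_REL_PINNED = MODE_REL | MODE_PINNED,
};

/* Hypothetical stand-ins for CLOCK_REALTIME and CLOCK_MONOTONIC. */
enum clock_id { CLK_REALTIME, CLK_MONOTONIC };

/* Assumed old form: anything that is not exactly ABS is treated as relative. */
enum clock_id select_clock_old(enum clock_id clock, enum timer_mode mode)
{
	if (clock == CLK_REALTIME && mode != MODE_ABS)
		return CLK_MONOTONIC;
	return clock;
}

/* Bit test: only modes with the REL bit set are moved to the monotonic clock. */
enum clock_id select_clock(enum clock_id clock, enum timer_mode mode)
{
	if (clock == CLK_REALTIME && (mode & MODE_REL))
		return CLK_MONOTONIC;
	return clock;
}
```

      For example, select_clock(CLK_REALTIME, MODE_REL_PINNED) yields
      CLK_MONOTONIC, and select_clock(CLK_REALTIME, MODE_ABS_PINNED) keeps
      CLK_REALTIME, whereas select_clock_old() would switch the latter too.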
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Fixes: 597d0275 ("timers: Framework for identifying pinned timers")
      Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Anna-Maria Gleixner
      hrtimer: Fix hrtimer_start[_range_ns]() function descriptions · 6de6250c
      Anna-Maria Gleixner authored
      The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only
      when HRTIMER_MODE_PINNED is set.
      
      Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-6-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Anna-Maria Gleixner
      hrtimer: Clean up the 'int clock' parameter of schedule_hrtimeout_range_clock() · 90777713
      Anna-Maria Gleixner authored
      schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the
      clock ID, instead of the customary predefined "clockid_t" type.
      
      In hrtimer coding style the canonical variable name for the clock ID is
      'clock_id', therefore change the name of the parameter here as well
      to make it all consistent.
      
      While at it, clean up the description for the 'clock_id' and 'mode'
      function parameters. The clock modes and the clock IDs are not
      restricted as the comment suggests.
      
      Fix the mode description as well for the callers of schedule_hrtimeout_range_clock().
      
      No functional changes intended.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Thomas Gleixner
      hrtimer: Correct blatantly incorrect comment · d05ca13b
      Thomas Gleixner authored
      The protection of a hrtimer which runs its callback against migration to a
      different CPU has nothing to do with hard interrupt context.
      
      The protection against migration of a hrtimer running the expiry callback
      is the pointer in the cpu_base which holds a pointer to the currently
      running timer. This pointer is evaluated in the code which potentially
      switches the timer base and makes sure it's kept on the CPU on which the
      callback is running.
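      The rule described above can be sketched in simplified user-space C.
      The structures and choose_base() are hypothetical stand-ins for the
      kernel's hrtimer_cpu_base and its base-switching code; locking, clock
      bases and hardware reprogramming are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_timer;

/* Hypothetical per-CPU base: remembers which timer's callback is running. */
struct sketch_cpu_base {
	int cpu;
	const struct sketch_timer *running;
};

/* Hypothetical timer: knows the base it is currently queued on. */
struct sketch_timer {
	struct sketch_cpu_base *base;
};

/*
 * Pick the base for (re)enqueueing t. While t's callback is running on its
 * current base, the timer must stay there; otherwise it may move to target.
 */
struct sketch_cpu_base *choose_base(const struct sketch_timer *t,
				    struct sketch_cpu_base *target)
{
	struct sketch_cpu_base *cur = t->base;

	if (cur->running == t)
		return cur;
	return target;
}
```

      The decision hinges only on the running pointer, not on whether the
      caller happens to be in hard interrupt context.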
      Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/20171221104205.7269-3-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Thomas Gleixner
      hrtimer: Optimize the hrtimer code by using static keys for migration_enable/nohz_active · ae67bada
      Thomas Gleixner authored
      The hrtimer_cpu_base::migration_enable and ::nohz_active fields
      were originally introduced to avoid accessing global variables
      for these decisions.
      
      Still, that results in a (cache hot) load and a conditional branch,
      which can be avoided by using static keys.
      
      Implement it with static keys and optimize for the most critical
      case of high-performance networking, which tends to disable the
      timer migration functionality.
      
      No change in functionality.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: keescook@chromium.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos
      Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 14 Jan, 2018 2 commits
  7. 10 Jan, 2018 1 commit
  8. 04 Jan, 2018 1 commit
  9. 29 Dec, 2017 4 commits
    • Thomas Gleixner
      timers: Invoke timer_start_debug() where it makes sense · fd45bb77
      Thomas Gleixner authored
      The timer start debug function is called before the proper timer base is
      set. As a consequence the trace data contains the stale CPU and flags
      values.
      
      Call the debug function after setting the new base and flags.
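      A toy sketch of why the call order matters: if the debug hook records
      the timer before its base and flags are updated, the record carries the
      old values. The structures, the flag encoding (the CPU number stored
      directly in flags) and all function names are hypothetical; only the
      ordering mirrors the fix.

```c
#include <assert.h>

/* Hypothetical timer: flags simply hold the CPU number of its base. */
struct sketch_timer {
	unsigned int flags;
};

/* Hypothetical trace record filled in by the debug hook. */
struct trace_rec {
	unsigned int flags;
	unsigned long expires;
};

static void debug_start(struct trace_rec *rec, const struct sketch_timer *t,
			unsigned long expires)
{
	rec->flags = t->flags;
	rec->expires = expires;
}

/* Old order: the hook runs before the new base/flags are set. */
void start_timer_old(struct sketch_timer *t, unsigned int cpu,
		     unsigned long expires, struct trace_rec *rec)
{
	debug_start(rec, t, expires);	/* records the stale flags */
	t->flags = cpu;
}

/* Fixed order: switch base/flags first, then record. */
void start_timer(struct sketch_timer *t, unsigned int cpu,
		 unsigned long expires, struct trace_rec *rec)
{
	t->flags = cpu;
	debug_start(rec, t, expires);	/* records the flags actually in use */
}
```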
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: rt@linutronix.de
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
    • Thomas Gleixner
      nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() · 5d62c183
      Thomas Gleixner authored
      The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
      subsequently invokes tick_nohz_stop_sched_tick() are:
      
        if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
      
      If need_resched() is not set, but a timer softirq is pending, then this is
      an indication that the softirq code punted and delegated the execution to
      ksoftirqd. need_resched() is not true because the current interrupted task
      takes precedence over ksoftirqd.
      
      Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
      timer interrupts because the timer wheel contains an expired timer, but
      softirqs are not yet executed. So get_next_timer_interrupt() returns an
      immediate expiry request, which causes the timer to fire immediately
      again. Lather, rinse and repeat...
      
      Prevent that by adding a check for a pending timer soft interrupt to the
      conditions in tick_nohz_stop_sched_tick() which avoid calling
      get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
      prevents a repetitive programming of an already expired timer.
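      The decision can be sketched as a pure function. This is a hypothetical
      simplification: next_timer_event stands in for the result of
      get_next_timer_interrupt(), and the other conditions the real
      tick_nohz_stop_sched_tick() checks before consulting the timer wheel are
      omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * If the timer softirq is already pending, the expired timer has not run
 * yet, so asking the wheel for the next event would just yield "now" again.
 * Stay on the regular tick period instead.
 */
uint64_t next_tick_ns(uint64_t basemono, uint64_t tick_nsec,
		      bool timer_softirq_pending, uint64_t next_timer_event)
{
	if (timer_softirq_pending)
		return basemono + tick_nsec;	/* keep the periodic tick */
	return next_timer_event;		/* may stop the tick until then */
}
```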
      Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
    • Thomas Gleixner
      timers: Reinitialize per cpu bases on hotplug · 26456f87
      Thomas Gleixner authored
      The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
      them with potentially stale clk and next_expiry values, which can cause
      trouble when the CPU is plugged in again.
      
      Add a prepare callback which forwards the clock, sets next_expiry to far in
      the future and resets the control flags to a known state.
      
      Set base->must_forward_clk so the first timer which is queued will try to
      forward the clock to current jiffies.
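      A sketch of such a prepare callback over a hypothetical per-CPU array of
      wheel bases. The field names follow the commit text (clk, next_expiry,
      must_forward_clk); treating is_idle as the control flag, the
      NEXT_MAX_DELTA value and the function name are assumptions for
      illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_BASES	2			/* hypothetical: standard + deferrable */
#define NEXT_MAX_DELTA	((1UL << 30) - 1)	/* hypothetical "far future" offset */

struct wheel_base {
	unsigned long clk;		/* wheel clock, in jiffies */
	unsigned long next_expiry;	/* earliest expiry known to the wheel */
	bool is_idle;
	bool must_forward_clk;
};

/* Hypothetical prepare callback run before a CPU comes (back) online. */
int timers_prepare_cpu_sketch(struct wheel_base bases[], unsigned long jiffies)
{
	for (int b = 0; b < NUM_BASES; b++) {
		bases[b].clk = jiffies;				/* forward the clock */
		bases[b].next_expiry = jiffies + NEXT_MAX_DELTA;	/* far in the future */
		bases[b].is_idle = false;			/* known control state */
		bases[b].must_forward_clk = true;		/* first enqueue forwards clk */
	}
	return 0;
}
```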
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
    • Anna-Maria Gleixner
      timers: Use deferrable base independent of base::nohz_active · ced6d5c1
      Anna-Maria Gleixner authored
      During boot and before base::nohz_active is set in the timer bases, deferrable
      timers are enqueued into the standard timer base. This works correctly as
      long as base::nohz_active is false.
      
      Once base::nohz_active is set and a timer which was enqueued before that
      is accessed, the lock selector code chooses the lock of the deferrable
      base. This causes unlocked access to the standard base, and in case the
      timer is removed, it does not clear the pending flag in the standard base
      bitmap, which causes get_next_timer_interrupt() to return bogus values.
      
      To prevent that, the deferrable timers must be enqueued in the deferrable
      base, even when base::nohz_active is not set. Those deferrable timers also
      need to be expired unconditionally.
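      The base selection before and after the change can be sketched as
      follows. The flag value, the enum and both functions are hypothetical;
      the point is that the old choice depends on nohz_active, which can change
      between enqueue and a later lookup, whereas the new choice depends only
      on the timer's own flags.

```c
#include <assert.h>
#include <stdbool.h>

#define TIMER_DEFERRABLE_FLAG 0x1u	/* hypothetical flag bit */

enum base_idx { BASE_STD = 0, BASE_DEF = 1 };

/* Old rule (sketch): deferrable timers use BASE_DEF only once NOHZ is active. */
enum base_idx pick_base_old(unsigned int tflags, bool nohz_active)
{
	if (nohz_active && (tflags & TIMER_DEFERRABLE_FLAG))
		return BASE_DEF;
	return BASE_STD;
}

/* New rule (sketch): enqueue and later lock lookup always agree on the base. */
enum base_idx pick_base_new(unsigned int tflags)
{
	return (tflags & TIMER_DEFERRABLE_FLAG) ? BASE_DEF : BASE_STD;
}
```

      With the old rule, a deferrable timer enqueued while nohz_active is
      false lands in BASE_STD, but a lookup after nohz_active flips resolves to
      BASE_DEF, which is the lock mismatch described above.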
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: rt@linutronix.de
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
  10. 28 Dec, 2017 1 commit
  11. 18 Dec, 2017 1 commit
    • Paul E. McKenney
      sched/isolation: Make CONFIG_NO_HZ_FULL select CONFIG_CPU_ISOLATION · bf29cb23
      Paul E. McKenney authored
      CONFIG_NO_HZ_FULL doesn't make sense without CONFIG_CPU_ISOLATION. In
      fact, enabling the first without the second is a regression, as the
      nohz_full= boot parameter gets silently ignored.
      
      Besides, this unnatural combination hangs the RCU grace-period kthread
      when running rcutorture, for reasons that are not yet fully understood:
      
      	rcu_preempt kthread starved for 9974 jiffies! g4294967208
      	+c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
      	rcu_preempt     I 7464     8      2 0x80000000
      	Call Trace:
      		__schedule+0x493/0x620
      		schedule+0x24/0x40
      		schedule_timeout+0x330/0x3b0
      		? preempt_count_sub+0xea/0x140
      		? collect_expired_timers+0xb0/0xb0
      		rcu_gp_kthread+0x6bf/0xef0
      
      This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
      prevents all these bad behaviours.
      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      Fixes: 5c4991e2 ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL")
      Link: http://lkml.kernel.org/r/1513275507-29200-2-git-send-email-frederic@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 15 Dec, 2017 1 commit
    • Thomas Gleixner
      posix-timer: Properly check sigevent->sigev_notify · cef31d9a
      Thomas Gleixner authored
      timer_create() specifies via sigevent->sigev_notify the signal delivery for
      the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
      and (SIGEV_SIGNAL | SIGEV_THREAD_ID).
      
      The sanity check in good_sigevent() only checks the valid combination
      for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
      not set, it accepts any random value.
      
      This has no real effect on the posix timer and signal delivery code, but
      it affects show_timer(), which handles the output of /proc/$PID/timers. That
      function uses a string array to pretty print sigev_notify. The access to
      that array has no bounds checks, so a random sigev_notify value causes an
      access beyond the array bounds.
      
      Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
      masking from various code paths, as SIGEV_NONE can never be set in
      combination with SIGEV_THREAD_ID.
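      A sketch of the stricter validation together with a bounds-safe name
      lookup in the style of show_timer(). The N_* constants are hypothetical
      names assumed to carry the Linux SIGEV_* values (SIGNAL 0, NONE 1,
      THREAD 2, THREAD_ID 4); the real good_sigevent() additionally validates
      the signal number and the target thread, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Values assumed to match Linux's SIGEV_* definitions. */
#define N_SIGNAL	0
#define N_NONE		1
#define N_THREAD	2
#define N_THREAD_ID	4

/* Accept only the four valid notify modes; everything else is rejected. */
bool valid_notify(int notify)
{
	switch (notify) {
	case N_SIGNAL | N_THREAD_ID:
	case N_SIGNAL:
	case N_THREAD:
	case N_NONE:
		return true;
	default:
		return false;
	}
}

/* Pretty-printer: index the table only after the value has been validated. */
const char *notify_name(int notify)
{
	static const char *const names[] = {
		[N_SIGNAL] = "signal",
		[N_NONE]   = "none",
		[N_THREAD] = "thread",
	};

	if (!valid_notify(notify))
		return "invalid";
	return names[notify & ~N_THREAD_ID];	/* index is 0, 1 or 2 here */
}
```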
      Reported-by: Eric Biggers <ebiggers3@gmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: stable@vger.kernel.org