1. 02 Mar, 2017 2 commits
  2. 01 Feb, 2017 2 commits
  3. 08 Oct, 2016 1 commit
  4. 02 Aug, 2016 1 commit
  5. 13 Dec, 2014 1 commit
    • Thomas Gleixner's avatar
      genirq: Prevent proc race against freeing of irq descriptors · c291ee62
      Thomas Gleixner authored
      Since the rework of the sparse interrupt code to actually free the
      unused interrupt descriptors there exists a race between the /proc
      interfaces to the irq subsystem and the code which frees the interrupt
      descriptor.
      
      CPU0				CPU1
      				show_interrupts()
      				  desc = irq_to_desc(X);
      free_desc(desc)
        remove_from_radix_tree();
        kfree(desc);
      				  raw_spinlock_irq(&desc->lock);
      
      /proc/interrupts is the only interface which can actively corrupt
      kernel memory via the lock access. /proc/stat can only read from freed
      memory. Extremly hard to trigger, but possible.
      
      The interfaces in /proc/irq/N/ are not affected by this because the
      removal of the proc file is serialized in procfs against concurrent
      readers/writers. The removal happens before the descriptor is freed.
      
      For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
      as the descriptor is never freed. It's merely cleared out with the irq
      descriptor lock held. So any concurrent proc access will either see
      the old correct value or the cleared out ones.
      
      Protect the lookup and access to the irq descriptor in
      show_interrupts() with the sparse_irq_lock.
      
      Provide kstat_irqs_usr() which is protecting the lookup and access
      with sparse_irq_lock and switch /proc/stat to use it.
      
      Document the existing kstat_irqs interfaces so it's clear that the
      caller needs to take care about protection. The users of these
      interfaces are either not affected due to SPARSE_IRQ=n or already
      protected against removal.
      
      Fixes: 1f5a5b87 "genirq: Implement a sane sparse_irq allocator"
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      c291ee62
  6. 03 Jul, 2014 1 commit
    • Heiko Carstens's avatar
      /proc/stat: convert to single_open_size() · f74373a5
      Heiko Carstens authored
      These two patches are supposed to "fix" failed order-4 memory
      allocations which have been observed when reading /proc/stat.  The
      problem has been observed on s390 as well as on x86.
      
      To address the problem change the seq_file memory allocations to
      fallback to use vmalloc, so that allocations also work if memory is
      fragmented.
      
      This approach seems to be simpler and less intrusive than changing
      /proc/stat to use an interator.  Also it "fixes" other users as well,
      which use seq_file's single_open() interface.
      
      This patch (of 2):
      
      Use seq_file's single_open_size() to preallocate a buffer that is large
      enough to hold the whole output, instead of open coding it.  Also
      calculate the requested size using the number of online cpus instead of
      possible cpus, since the size of the output only depends on the number
      of online cpus.
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com>
      Cc: Andrea Righi <andrea@betterlinux.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Stefan Bader <stefan.bader@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f74373a5
  7. 13 Mar, 2014 1 commit
    • Frederic Weisbecker's avatar
      cputime: Default implementation of nsecs -> cputime conversion · bfc3f028
      Frederic Weisbecker authored
      The architectures that override cputime_t (s390, ppc) don't provide
      any version of nsecs_to_cputime(). Indeed this cputime_t implementation
      by backend only happens when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y under
      which the core code doesn't make any use of nsecs_to_cputime().
      
      At least for now.
      
      We are going to make a broader use of it so lets provide a default
      version with a per usecs granularity. It should be good enough for most
      usecases.
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      bfc3f028
  8. 24 Jan, 2014 1 commit
    • Paul Gortmaker's avatar
      fs/proc: don't use module_init for non-modular core code · abaf3787
      Paul Gortmaker authored
      PROC_FS is a bool, so this code is either present or absent.  It will
      never be modular, so using module_init as an alias for __initcall is
      rather misleading.
      
      Fix this up now, so that we can relocate module_init from init.h into
      module.h in the future.  If we don't do this, we'd have to add module.h to
      obviously non-modular code, and that would be ugly at best.
      
      Note that direct use of __initcall is discouraged, vs.  one of the
      priority categorized subgroups.  As __initcall gets mapped onto
      device_initcall, our use of fs_initcall (which makes sense for fs code)
      will thus change these registrations from level 6-device to level 5-fs
      (i.e.  slightly earlier).  However no observable impact of that small
      difference has been observed during testing, or is expected.
      
      Also note that this change uncovers a missing semicolon bug in the
      registration of vmcore_init as an initcall.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      abaf3787
  9. 01 Feb, 2013 1 commit
  10. 10 Oct, 2012 1 commit
  11. 30 Mar, 2012 1 commit
  12. 23 Mar, 2012 2 commits
    • KAMEZAWA Hiroyuki's avatar
      procfs: add num_to_str() to speed up /proc/stat · 1ac101a5
      KAMEZAWA Hiroyuki authored
      == stat_check.py
      num = 0
      with open("/proc/stat") as f:
              while num < 1000 :
                      data = f.read()
                      f.seek(0, 0)
                      num = num + 1
      ==
      
      perf shows
      
          20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
          13.41%  stat_check.py  [kernel.kallsyms]    [k] number
          12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
          10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
           4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
           4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf
      
      This patch removes most of calls to vsnprintf() by adding num_to_str()
      and seq_print_decimal_ull(), which prints decimal numbers without rich
      functions provided by printf().
      
      On my 8cpu box.
      == Before patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.150s
      user    0m0.026s
      sys     0m0.121s
      
      == After patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.055s
      user    0m0.022s
      sys     0m0.030s
      
      [akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
      [andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrea Righi <andrea@betterlinux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Turner <pjt@google.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1ac101a5
    • Eric Dumazet's avatar
      proc: speed up /proc/stat handling · 59a32e2c
      Eric Dumazet authored
      On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
      and is slow :
      
        # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
        5826
        # grep "cpu " /tmp/STRACE
        read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>
      
      Thats partly because show_stat() must be called twice since initial
      buffer size is too small (4096 bytes for less than 32 possible cpus)
      
      Fix this by :
      
       1) Taking into account nr_irqs in the initial buffer sizing.
      
       2) Using ksize() to allow better filling of initial buffer.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      59a32e2c
  13. 16 Jan, 2012 1 commit
  14. 30 Dec, 2011 1 commit
    • Andreas Schwab's avatar
      procfs: do not confuse jiffies with cputime64_t · 34845636
      Andreas Schwab authored
      Commit 2a95ea6c ("procfs: do not overflow get_{idle,iowait}_time
      for nohz") did not take into account that one some architectures jiffies
      and cputime use different units.
      
      This causes get_idle_time() to return numbers in the wrong units, making
      the idle time fields in /proc/stat wrong.
      
      Instead of converting the usec value returned by
      get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
      usecs_to_cputime64 to convert it to the correct unit of cputime64_t.
      Signed-off-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Artem S. Tashkinov" <t.artem@mailcity.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      34845636
  15. 15 Dec, 2011 1 commit
  16. 09 Dec, 2011 1 commit
    • Michal Hocko's avatar
      procfs: do not overflow get_{idle,iowait}_time for nohz · 2a95ea6c
      Michal Hocko authored
      Since commit a25cac51 ("proc: Consider NO_HZ when printing idle and
      iowait times") we are reporting idle/io_wait time also while a CPU is
      tickless.  We rely on get_{idle,iowait}_time functions to retrieve
      proper data.
      
      These functions, however, use usecs_to_cputime to translate micro
      seconds time to cputime64_t.  This is just an alias to usecs_to_jiffies
      which reduces the data type from u64 to unsigned int and also checks
      whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
      and returns MAX_JIFFY_OFFSET in that case.
      
      When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
      it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
      >3000s! until we overflow unsigned int.  Just for reference
      CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
      CONFIG_HZ_1000 ~2s.
      
      This results in a bug when people saw [h]top going mad reporting 100%
      CPU usage even though there was basically no CPU load.  The reason was
      simply that /proc/stat stopped reporting idle/io_wait changes (and
      reported MAX_JIFFY_OFFSET) and so the only change happening was for user
      system time.
      
      Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
      to 32b type and it is much more appropriate for cumulative time values
      (unlike usecs_to_jiffies which intended for timeout calculations).
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Tested-by: default avatarArtem S. Tashkinov <t.artem@mailcity.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2a95ea6c
  17. 06 Dec, 2011 1 commit
  18. 08 Sep, 2011 1 commit
    • Michal Hocko's avatar
      proc: Consider NO_HZ when printing idle and iowait times · a25cac51
      Michal Hocko authored
      show_stat handler of the /proc/stat file relies on kstat_cpu(cpu)
      statistics when priting information about idle and iowait times.
      This is OK if we are not using tickless kernel (CONFIG_NO_HZ) because
      counters are updated periodically.
      With NO_HZ things got more tricky because we are not doing idle/iowait
      accounting while we are tickless so the value might get outdated.
      Users of /proc/stat will notice that by unchanged idle/iowait values
      which is then interpreted as 0% idle/iowait time. From the user space
      POV this is an unexpected behavior and a change of the interface.
      
      Let's fix this by using get_cpu_{idle,iowait}_time_us which accounts the
      total idle/iowait time since boot and it doesn't rely on sampling or any
      other periodic activity. Fall back to the previous behavior if NO_HZ is
      disabled or not configured.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Link: http://lkml.kernel.org/r/39181366adac1b39cb6aa3cd53ff0f7c78d32676.1314172057.git.mhocko@suse.czSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      a25cac51
  19. 27 May, 2011 1 commit
  20. 13 Jan, 2011 1 commit
  21. 28 Oct, 2010 2 commits
  22. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  23. 25 Oct, 2009 1 commit
  24. 18 Jun, 2009 1 commit
  25. 23 Apr, 2009 1 commit
    • Martin Schwidefsky's avatar
      [S390] /proc/stat idle field for idle cpus · e1c80530
      Martin Schwidefsky authored
      The cpu idle field in the output of /proc/stat is too small for cpus
      that have been idle for more than a tick. Add the architecture hook
      arch_idle_time that allows to add the not accounted idle time of a
      sleeping cpu without waking the cpu.
      
      The s390 implementation of arch_idle_time uses the already existing
      s390_idle_data per_cpu variable to find the sleep time of a neighboring
      idle cpu.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      e1c80530
  26. 26 Dec, 2008 1 commit
  27. 16 Dec, 2008 1 commit
  28. 09 Dec, 2008 1 commit
    • Yinghai Lu's avatar
      sparseirq: fix Alpha build failure · 240d367b
      Yinghai Lu authored
      Impact: build fix on Alpha
      
      -tip testing found this build failure on the Alpha defconfig:
      
      /home/mingo/tip/fs/proc/stat.c: In function 'show_stat':
      /home/mingo/tip/fs/proc/stat.c:48: error: implicit declaration of function 'for_each_irq_desc'
      /home/mingo/tip/fs/proc/stat.c:48: error: expected ';' before '{' token
      
      can not use irq_desc() in stat.c on older architectures.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.orgg>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      240d367b
  29. 08 Dec, 2008 1 commit
    • Yinghai Lu's avatar
      sparse irq_desc[] array: core kernel and x86 changes · 0b8f1efa
      Yinghai Lu authored
      Impact: new feature
      
      Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
      NR_CPUS set to large values. The goal is to be able to scale up to much
      larger NR_IRQS value without impacting the (important) common case.
      
      To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
      irq_desc pointers.
      
      When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
      this also makes the IRQ descriptors NUMA-local (to the site that calls
      request_irq()).
      
      This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
      uses desc->chip_data for x86 to store irq_cfg.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0b8f1efa
  30. 23 Oct, 2008 1 commit