1. 01 Sep, 2006 1 commit
    • [PATCH] task delay accounting fixes · 35df17c5
      Shailabh Nagar authored
      Cleanup allocation and freeing of tsk->delays used by delay accounting.
      This solves two problems reported for delay accounting:
      1. oops in __delayacct_blkio_ticks
      Currently tsk->delays is freed too early during task exit, which can
      cause a NULL tsk->delays to be dereferenced when /proc/<tgid>/stats is
      read.  The patch fixes this problem by freeing tsk->delays closer to when
      task_struct itself is freed up.  As a result, it also eliminates the use of
      tsk->delays_lock which was only being used (inadequately) to safeguard
      access to tsk->delays while a task was exiting.
      2. Possible memory leak in kernel/delayacct.c
      The patch adds the previously missing cleanup of tsk->delays allocations
      after a bad fork.
      The patch has been tested to fix the problems listed above and stress
      tested with rapid calls to delay accounting's taskstats command interface
      (which is the other path that can access the same data, besides the /proc
      interface causing the oops above).
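      A minimal user-space sketch of the lifetime rule described above, assuming
      a simplified, hypothetical task object: struct task, struct delay_info,
      task_fork(), task_put() and read_blkio() are illustrative names, not
      kernel APIs.  The delay data is freed only together with the task, and the
      fork error path releases it so nothing leaks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for task_struct and its delay data;
 * none of these names are the kernel's. */
struct delay_info {
    unsigned long long blkio_delay_ns;   /* accumulated block I/O wait */
};

struct task {
    int refcount;                 /* readers such as /proc hold a reference */
    struct delay_info *delays;    /* freed together with the task itself */
};

/* Drop a reference; the delay data is freed only with the task, so a
 * reader holding a reference never sees a NULL ->delays during exit. */
static void task_put(struct task *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0) {
        free(t->delays);
        free(t);
    }
}

/* Fork path: every failure after ->delays is allocated must free it,
 * otherwise the allocation leaks (the "bad fork" case). */
static struct task *task_fork(int later_step_fails)
{
    struct task *t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->refcount = 1;
    t->delays = calloc(1, sizeof(*t->delays));
    if (!t->delays || later_step_fails)
        goto bad_fork;
    return t;

bad_fork:
    free(t->delays);   /* free(NULL) is a harmless no-op */
    free(t);
    return NULL;
}

/* Reader path, e.g. producing a stats file for a task it holds a ref on. */
static unsigned long long read_blkio(const struct task *t)
{
    return t->delays->blkio_delay_ns;
}
```

      Tying the side allocation to the owner's final release, rather than to an
      intermediate exit step, is also what makes a separate lock guarding the
      pointer unnecessary in this model.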
      Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 06 Aug, 2006 1 commit
  3. 15 Jul, 2006 2 commits
    • [PATCH] delay accounting taskstats interface send tgid once · ad4ecbcb
      Shailabh Nagar authored
      Send per-tgid data only once during exit of a thread group instead of once
      with each member thread exit.
      Currently, when a thread exits, besides its per-tid data, the per-tgid data
      of its thread group is also sent out, if its thread group is non-empty.
      The per-tgid data sent consists of the sum of per-tid stats for all
      *remaining* threads of the thread group.
      This patch modifies this sending in two ways:
      - the per-tgid data is sent only when the last thread of a thread group
        exits.  This cuts down heavily on the overhead of sending/receiving
        per-tgid data, especially when other exploiters of the taskstats
        interface aren't interested in per-tgid stats
      - the semantics of the per-tgid data sent are changed.  Instead of being
        the sum of per-tid data for remaining threads, the value now sent is the
        true total accumulated statistics for all threads that are/were part of
        the thread group.
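      The new per-tgid semantics can be sketched as follows, assuming a
      hypothetical, simplified thread-group object (struct thread_group,
      struct thread_stats and thread_exit() are illustrative names, not the
      kernel's).  Each exiting thread folds its stats into a running group total,
      and a single per-tgid record is emitted only when the last thread leaves.

```c
#include <assert.h>

/* Hypothetical, simplified model: each thread has its own stats and the
 * thread group keeps a running total plus a count of live threads. */
struct thread_stats { unsigned long long cpu_delay; };

struct thread_group {
    int live_threads;
    struct thread_stats exited_total;  /* sum over threads that already exited */
    int tgid_messages_sent;            /* how many per-tgid records went out */
    unsigned long long last_sent_total;
};

/* On each thread exit: fold this thread's stats into the group total,
 * and emit the per-tgid record only when the last thread leaves. The
 * record therefore covers every thread that was ever in the group. */
static void thread_exit(struct thread_group *g, const struct thread_stats *s)
{
    assert(g->live_threads > 0);
    g->exited_total.cpu_delay += s->cpu_delay;
    g->live_threads--;
    if (g->live_threads == 0) {
        g->last_sent_total = g->exited_total.cpu_delay;  /* "send" it */
        g->tgid_messages_sent++;
    }
}
```

      With three threads exiting, listeners receive one per-tgid record instead
      of three, and its value is the full total rather than a sum over the
      threads still running.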
      The patch also addresses a minor issue where a failure of one accounting
      subsystem to fill in the taskstats structure was causing the taskstats
      not to be sent at all.
      The patch has been tested for stability by running cerberus for over 4
      hours on an SMP system.
      [akpm@osdl.org: bugfixes]
      Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: Balbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] per-task-delay-accounting: setup · ca74e92b
      Shailabh Nagar authored
      Initialization code related to the collection of per-task "delay"
      statistics, which measure how long a task had to wait for the CPU, sync
      block I/O, swapping, etc.  The collection of statistics and the interface
      are in other patches.  This patch sets up the data structures and allows
      the statistics collection to be disabled through a kernel boot parameter.
      Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
      Cc: Erich Focht <efocht@ess.nec.de>
      Cc: Levent Serinol <lserinol@gmail.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 10 Jul, 2006 1 commit
  5. 03 Jul, 2006 5 commits
    • [PATCH] sched: cleanup, remove task_t, convert to struct task_struct · 36c8b586
      Ingo Molnar authored
      cleanup: remove task_t and convert all the uses to struct task_struct. I
      introduced it for the scheduler anno and it was a mistake.
      Conversion was mostly scripted, the result was reviewed and all
      secondary whitespace and style impact (if any) was fixed up by hand.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: annotate ->mmap_sem · ad339451
      Ingo Molnar authored
      Teach special (recursive) locking code to the lock validator.  Has no effect
      on non-lockdep kernels.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: core · fbb9ce95
      Ingo Molnar authored
      Do 'make oldconfig' and accept all the defaults for new config options -
      reboot into the kernel and if everything goes well it should boot up fine and
      you should have /proc/lockdep and /proc/lockdep_stats files.
      Typically, if the lock validator finds a problem, it prints voluminous
      debug output beginning with "BUG: ...", and kernel developers can use this
      syslog output to figure out the precise locking scenario.
      What does the lock validator do?  It "observes" and maps all locking rules as
      they occur dynamically (as triggered by the kernel's natural use of spinlocks,
      rwlocks, mutexes and rwsems).  Whenever the lock validator subsystem detects a
      new locking scenario, it validates this new rule against the existing set of
      rules.  If this new rule is consistent with the existing set of rules then the
      new rule is added transparently and the kernel continues as normal.  If the
      new rule could create a deadlock scenario then this condition is printed out.
      When determining the validity of locking, all possible "deadlock
      scenarios" are considered: assuming an arbitrary number of CPUs, arbitrary
      irq-context and task-context constellations, running arbitrary
      combinations of all the existing locking scenarios.  In a typical system
      this means millions of separate scenarios.  This is why we call it a
      "locking correctness" validator: for all rules that are observed, the lock
      validator proves with mathematical certainty that a deadlock could not
      occur (assuming that the lock validator implementation itself is correct
      and its internal data structures are not corrupted by some other kernel
      subsystem).  [See more details and conditionals of this statement in
      include/linux/lockdep.h and ...]
      Furthermore, this "all possible scenarios" property of the validator also
      enables the finding of complex, highly unlikely multi-CPU multi-context races
      via single-context rules, increasing the likelihood of finding bugs
      drastically.  In practical terms: the lock validator already found a bug in
      the upstream kernel that could only occur on systems with 3 or more CPUs, and
      which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
      That bug was found and reported on a single-CPU system (!).  So in essence a
      race will be found "piecemeal-wise", triggering all the necessary components
      for the race, without having to reproduce the race scenario itself!  In its
      short existence the lock validator found and reported many bugs before they
      actually caused a real deadlock.
      To further increase the efficiency of the validator, the mapping is not per
      "lock instance", but per "lock-class".  For example, all struct inode objects
      in the kernel have inode->inotify_mutex.  If there are 10,000 inodes cached,
      then there are 10,000 lock objects.  But ->inotify_mutex is a single "lock
      type", and all locking activities that occur against ->inotify_mutex are
      "unified" into this single lock-class.  The advantage of the lock-class
      approach is that all historical ->inotify_mutex uses are mapped into a single
      (and as narrow as possible) set of locking rules - regardless of how many
      different tasks or inode structures it took to build this set of rules.  The
      set of rules persists for the lifetime of the kernel.
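      The class-based dependency tracking described above can be sketched in user
      space as a small graph with a reachability check.  This is a simplified
      model under stated assumptions, not lockdep's implementation:
      struct lock_instance, lock_acquire(), lock_release() and the fixed-size
      adjacency matrix are hypothetical.  Every instance carries a class id, an
      edge A -> B is recorded whenever class B is taken while class A is held,
      and an acquisition that would close a cycle is reported even though no
      deadlock actually happened in that run.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 16
#define MAX_HELD 8

/* Hypothetical model: many lock instances share one class, the way every
 * inode's ->inotify_mutex maps to a single lock-class. */
struct lock_instance { int cls; };

static bool dep[MAX_CLASSES][MAX_CLASSES]; /* dep[a][b]: b taken while a held */
static int held[MAX_HELD];                 /* this task's held classes */
static int nheld;

/* Depth-first search: can class 'from' reach class 'to' via recorded deps? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Returns false (a "possible deadlock" report) if taking this class while
 * holding the current ones would close a cycle in the class graph. Taking
 * a class that is already held is reported too (from == to). */
static bool lock_acquire(const struct lock_instance *l)
{
    for (int i = 0; i < nheld; i++) {
        bool seen[MAX_CLASSES] = { false };
        if (reachable(l->cls, held[i], seen))
            return false;
    }
    for (int i = 0; i < nheld; i++)
        dep[held[i]][l->cls] = true;   /* remember the new ordering rules */
    assert(nheld < MAX_HELD);
    held[nheld++] = l->cls;
    return true;
}

/* Simplification: locks are released in LIFO order. */
static void lock_release(void)
{
    assert(nheld > 0);
    nheld--;
}
```

      In this model, code that takes A then B and, at some other time, takes B
      then A is flagged on the second sequence, using a different instance of
      class A: the two orderings never have to race for the report to appear.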
      To see the rough magnitude of checking that the lock validator does, here's a
      portion of /proc/lockdep_stats, fresh after bootup:
       lock-classes:                            694 [max: 2048]
       direct dependencies:                  1598 [max: 8192]
       indirect dependencies:               17896
       all direct dependencies:             16206
       dependency chains:                    1910 [max: 8192]
       in-hardirq chains:                      17
       in-softirq chains:                     105
       in-process chains:                    1065
       stack-trace entries:                 38761 [max: 131072]
       combined max dependencies:         2033928
       hardirq-safe locks:                     24
       hardirq-unsafe locks:                  176
       softirq-safe locks:                     53
       softirq-unsafe locks:                  137
       irq-safe locks:                         59
       irq-unsafe locks:                      176
      The lock validator has observed 1598 actual single-thread locking patterns,
      and has validated all possible 2033928 distinct locking scenarios.
      More details about the design of the lock validator can be found in
      Documentation/lockdep-design.txt, which can also be found at:
      [bunk@stusta.de: cleanups]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: irqtrace subsystem, core · de30a2b3
      Ingo Molnar authored
      Accurate hard-IRQ-flags and softirq-flags state tracing.
      This allows us to attach extra functionality to IRQ flags on/off
      events (such as trace-on/off).
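      As an illustration of attaching functionality to IRQ-flags transitions,
      here is a user-space sketch under stated assumptions: traced_irq_set(),
      set_irq_hook(), the global state and the example hook are hypothetical
      stand-ins, not the kernel's irqtrace API.  All enable/disable calls go
      through one wrapper that keeps an accurate state and fires a hook only on
      real transitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of IRQ-flags state tracing (global instead of
 * per-CPU for simplicity). */
typedef void (*irq_hook_fn)(bool enabled);

static bool hardirqs_enabled = true;
static irq_hook_fn irq_hook;          /* e.g. a tracer switching on/off */
static unsigned long transitions;

static void set_irq_hook(irq_hook_fn fn) { irq_hook = fn; }

static void traced_irq_set(bool enable)
{
    if (hardirqs_enabled == enable)
        return;                       /* redundant call: not an event */
    hardirqs_enabled = enable;
    transitions++;
    if (irq_hook)
        irq_hook(enable);
}

static void traced_irq_disable(void) { traced_irq_set(false); }
static void traced_irq_enable(void)  { traced_irq_set(true); }

/* Example hook: counts how many times IRQs were switched off. */
static unsigned long off_events;
static void count_off_events(bool enabled) { if (!enabled) off_events++; }
```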
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: better lock debugging · 9a11b49a
      Ingo Molnar authored
      Generic lock debugging:
       - generalized lock debugging framework. For example, a bug in one lock
         subsystem turns off debugging in all lock subsystems.
       - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
         the mutex/rtmutex debugging code: it caused way too much prototype
         hackery, and lockdep will give the same information anyway.
       - ability to do silent tests
       - check lock freeing in vfree too.
       - more fine-grained debugging options, to allow distributions to
         turn off more expensive debugging features.
      There's no separate 'held mutexes' list anymore - but there's a 'held locks'
      stack within lockdep, which unifies deadlock detection across all lock
      classes.  (this is independent of the lockdep validation stuff - lockdep first
      checks whether we are holding a lock already)
      Here are the current debugging options and what they do:
       config DEBUG_MUTEXES
               bool "Mutex debugging, basic checks"
       config DEBUG_LOCK_ALLOC
               bool "Detect incorrect freeing of live mutexes"
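      Two ideas from this patch, a single debugging switch shared by all lock
      types and one "held locks" stack per task, can be sketched as follows.
      This is a user-space model under stated assumptions: debug_locks,
      report_lock_bug(), debug_lock_acquire() and the fixed-size stack are
      hypothetical names for illustration.  The first bug reported by any lock
      kind turns debugging off for all of them, and the held-locks stack catches
      an attempt to take a lock that is already held.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HELD_LOCKS 8

/* Hypothetical model: one global switch shared by every lock type, so
 * the first bug reported by any lock subsystem silences all of them. */
static bool debug_locks = true;
static int bugs_reported;

static void report_lock_bug(void)
{
    if (!debug_locks)
        return;          /* debugging already turned off: stay quiet */
    debug_locks = false;
    bugs_reported++;
}

/* One 'held locks' stack per task, shared by mutexes, spinlocks, etc. */
static const void *held_locks[MAX_HELD_LOCKS];
static int held_depth;

static void debug_lock_acquire(const void *lock)
{
    for (int i = 0; i < held_depth; i++)
        if (held_locks[i] == lock) {
            report_lock_bug();   /* self-deadlock: lock already held */
            return;
        }
    assert(held_depth < MAX_HELD_LOCKS);
    held_locks[held_depth++] = lock;
}
```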
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 30 Jun, 2006 1 commit
  7. 28 Jun, 2006 2 commits
  8. 26 Jun, 2006 2 commits
  9. 25 Jun, 2006 1 commit
    • [PATCH] pacct: add pacct_struct to fix some pacct bugs. · 0e464814
      KaiGai Kohei authored
      The pacct facility needs an I/O operation when an accounting record is
      generated, and this can wake up the OOM killer.  If the OOM killer is
      activated, it kills some processes so that they release their memory.
      But acct_process() is called in the killed process's context before
      exit_mm(), so those processes cannot release their own memory.  As a
      result, processes get stuck at this point, which eventually causes a
      system stall.
  10. 23 Jun, 2006 2 commits
  11. 01 May, 2006 1 commit
  12. 20 Apr, 2006 1 commit
  13. 19 Apr, 2006 1 commit
  14. 15 Apr, 2006 1 commit
  15. 31 Mar, 2006 3 commits
    • [PATCH] wrong error path in dup_fd() leading to oopses in RCU · 42862298
      Kirill Korotaev authored
      Wrong error path in dup_fd() - it should return NULL on error,
      not an address of already freed memory :/
      Triggered by OpenVZ stress test suite.
      Interestingly, it was causing various oopses in RCU, such as:
      Call Trace:
         [<c013492c>] rcu_do_batch+0x2c/0x80
         [<c0134bdd>] rcu_process_callbacks+0x3d/0x70
         [<c0126cf3>] tasklet_action+0x73/0xe0
         [<c01269aa>] __do_softirq+0x10a/0x130
         [<c01058ff>] do_softirq+0x4f/0x60
         [<c0113817>] smp_apic_timer_interrupt+0x77/0x110
         [<c0103b54>] apic_timer_interrupt+0x1c/0x24
        Code:  Bad EIP value.
         <0>Kernel panic - not syncing: Fatal exception in interrupt
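      The error-path rule can be shown with a small sketch, assuming a
      hypothetical, simplified struct files and dup_files() (not the kernel's
      dup_fd()).  On failure the function frees the partially built copy and
      returns NULL; the bug pattern described above is freeing it but still
      returning the pointer, which hands the caller freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for files_struct duplication.
 * Assumes nfds > 0 so malloc() of the fd array is a real allocation. */
struct files { int nfds; int *fds; };

static struct files *dup_files(const struct files *old, int simulate_failure)
{
    struct files *newf = malloc(sizeof(*newf));
    if (!newf)
        return NULL;
    newf->nfds = old->nfds;
    newf->fds = simulate_failure ? NULL
                                 : malloc(sizeof(int) * (size_t)old->nfds);
    if (!newf->fds)
        goto out_release;
    for (int i = 0; i < old->nfds; i++)
        newf->fds[i] = old->fds[i];
    return newf;

out_release:
    free(newf);
    return NULL;        /* correct: not 'return newf;' after freeing it */
}
```

      Returning the freed pointer would not fail at the call site; the damage
      shows up later wherever that memory is reused, which matches the report
      above of crashes surfacing in unrelated RCU callbacks.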
      Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
      Signed-Off-By: Dmitry Mishin <dim@openvz.org>
      Signed-Off-By: Kirill Korotaev <dev@openvz.org>
      Signed-Off-By: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pidhash: Refactor the pid hash table · 92476d7f
      Eric W. Biederman authored
      Simplifies the code, reduces the need for 4 pid hash tables, and makes the
      code more capable.
      In the discussions I had with Oleg it was felt that to a large extent the
      cleanup itself justified the work.  Having struct pid dynamically
      allocated meant we could create the hash table entry when the pid was
      allocated and free the hash table entry when the pid was freed, instead of
      playing with the hash lists whenever a process would attach or detach to a ...
      For myself the fact that it gave what my previous task_ref patch gave for free
      with simpler code was a big win.  The problem is that if you hold a reference
      to struct task_struct you lock in 10K of low memory.  If you do that in a user
      controllable way like /proc does, with an unprivileged but hostile user space
      application with typical resource limits of 1000 fds and 100 processes I can
      trigger the OOM killer by consuming all of low memory with task structs, on a
      machine with 1GB of low memory.
      If I instead hold a reference to struct pid which holds a pointer to my
      task_struct, I don't suffer from that problem because struct pid is 2 orders
      of magnitude smaller.  In fact struct pid is small enough that most other
      kernel data structures dwarf it, so simply limiting the number of referring
      data structures is enough to prevent exhaustion of low memory.
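      A rough back-of-the-envelope check of the figures above, assuming every
      open fd pins a distinct object, a task_struct of about 10K, and a struct
      pid of about 100 bytes (taking "2 orders of magnitude smaller" literally):

```latex
\begin{aligned}
N_{\text{refs}} &= 100~\text{processes} \times 1000~\text{fds} = 10^{5} \\
\text{task\_struct pinned} &\approx 10^{5} \times 10~\text{KB} = 10^{6}~\text{KB} \approx 1~\text{GB} \\
\text{struct pid pinned} &\approx 10^{5} \times 100~\text{B} = 10^{7}~\text{B} \approx 10~\text{MB}
\end{aligned}
```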
      This splits the current struct pid into two structures, struct pid and struct
      pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
      struct pid_link is the per process linkage into the hash tables and lives in
      struct task_struct.  struct pid is given an independent lifetime, and holds
      pointers to each of the pid types.
      The independent life of struct pid simplifies attach_pid, and detach_pid,
      because we are always manipulating the list of pids and not the hash table.
      In addition, giving struct pid an independent life makes the concept much
      more powerful.
      Kernel data structures can now embed a struct pid * instead of a pid_t and
      not suffer from pid wrap around problems or from keeping unnecessarily
      large amounts of memory allocated.
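      The benefit of holding a struct pid * instead of a pid_t can be sketched
      in user space.  This is a simplified model under stated assumptions: the
      fields and the helpers alloc_pid(), get_pid(), put_pid(), attach_pid(),
      detach_pid() and pid_task() are written here for illustration and do not
      reproduce the kernel's signatures (the hash table and pid types are
      omitted).  A small refcounted struct pid outlives its task, and a holder
      sees NULL after the task exits instead of whatever task later reuses the
      same number.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model: a small, refcounted struct pid with its
 * own lifetime, pointing at the task currently using it (or NULL). */
struct task;

struct pid {
    int nr;              /* the numeric pid, which may later be reused */
    int count;           /* references from tasks, /proc, etc. */
    struct task *task;   /* current user of this pid, or NULL */
};

struct task {
    struct pid *pid;     /* stands in for the per-task pid_link */
};

static struct pid *alloc_pid(int nr)
{
    struct pid *p = calloc(1, sizeof(*p));
    if (p) {
        p->nr = nr;
        p->count = 1;
    }
    return p;
}

static struct pid *get_pid(struct pid *p) { p->count++; return p; }

static void put_pid(struct pid *p)
{
    assert(p->count > 0);
    if (--p->count == 0)
        free(p);
}

static void attach_pid(struct task *t, struct pid *p) { t->pid = p; p->task = t; }

/* On exit the task detaches and drops its reference; anyone still
 * holding the struct pid now sees NULL instead of a recycled number. */
static void detach_pid(struct task *t)
{
    struct pid *p = t->pid;
    p->task = NULL;
    t->pid = NULL;
    put_pid(p);
}

static struct task *pid_task(const struct pid *p) { return p->task; }
```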
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] resurrect __put_task_struct · 158d9ebd
      Andrew Morton authored
      This just got nuked in mainline.  Bring it back because Eric's patches use it.
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 29 Mar, 2006 9 commits
  17. 27 Mar, 2006 1 commit
  18. 26 Mar, 2006 2 commits
  19. 24 Mar, 2006 1 commit
    • [PATCH] cpuset memory spread slab cache optimizations · c61afb18
      Paul Jackson authored
      The hooks in the slab cache allocator code path for support of NUMA
      mempolicies and cpuset memory spreading are in an important code path.  Many
      systems will use neither feature.
      This patch optimizes those hooks down to a single check of some bits in the
      current task's task_struct flags.  For non-NUMA systems, this hook and related
      code is already ifdef'd out.
      The optimization is done by using another task flag, set if the task is using
      a non-default NUMA mempolicy.  Taking this flag bit along with the
      PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
      memory spreading' patch set, one can check for the combination of any of these
      special case memory placement mechanisms with a single test of the current
      task's task_struct flags.
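      The single-test idea can be sketched as one mask check over the task's
      flags word.  The flag name PF_MEMPOLICY (standing for the "non-default
      NUMA mempolicy" flag described above) and all bit values are assumptions
      for illustration, as are struct task and needs_alternate_placement().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; the real PF_* constants
 * live in the kernel headers and may differ. */
#define PF_SPREAD_PAGE  0x01000000u
#define PF_SPREAD_SLAB  0x02000000u
#define PF_MEMPOLICY    0x10000000u   /* assumed name for the new flag */

struct task { unsigned int flags; };

/* One test of the flags word decides whether any special placement
 * mechanism applies; tasks using none of them fall straight through. */
static bool needs_alternate_placement(const struct task *t)
{
    return (t->flags & (PF_SPREAD_PAGE | PF_SPREAD_SLAB | PF_MEMPOLICY)) != 0;
}
```

      The point of the design is that the common case costs one AND and one
      branch, while the rarely used placement logic can live out of line.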
      This patch also tightens up the code, to save a few bytes of kernel text
      space, and moves some of it out of line.  Due to the nested inlines called
      from multiple places, we were ending up with three copies of this code, which
      once we get off the main code path (for local node allocation) seems a bit
      wasteful of instruction memory.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  20. 23 Mar, 2006 2 commits
    • Jens Axboe authored
    • [PATCH] Shrinks sizeof(files_struct) and better layout · 0c9e63fd
      Eric Dumazet authored
      1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32-bit
         platforms, lowering kmalloc() allocated space by 50%.
      2) Reduce the size of (files_struct), using a special 32-bit (or
         64-bit) embedded_fd_set, instead of a 1024-bit fd_set for the
         close_on_exec_init and open_fds_init fields.  This saves some RAM (248
         bytes per task), as most tasks don't open more than 32 files.  The
         D-cache footprint for such tasks is also reduced to the minimum.
      3) Reduce the size of the allocated fdset.  Currently two full pages are
         allocated, that is 32768 bits on x86 for example, which is way too
         much.  The minimum is now L1_CACHE_BYTES.
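      The 248-byte figure in point 2 can be checked with a little arithmetic,
      modelled here for a 32-bit platform.  The type names full_fd_set and
      embedded_fd_set_32 and the helper functions are hypothetical stand-ins:
      two 1024-bit sets (128 bytes each) are replaced by two 32-bit words
      (4 bytes each), so the saving is 2 x (128 - 4) = 248 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Modelled for a 32-bit platform: a full fd_set is 1024 bits, the
 * embedded set is a single 32-bit word. */
typedef struct { uint32_t bits[1024 / 32]; } full_fd_set;   /* 128 bytes */
typedef struct { uint32_t bits; } embedded_fd_set_32;        /* 4 bytes */

/* files_struct embeds two such sets: close_on_exec_init and open_fds_init. */
enum { SETS_PER_TASK = 2 };

static unsigned long bytes_saved_per_task(void)
{
    return SETS_PER_TASK * (sizeof(full_fd_set) - sizeof(embedded_fd_set_32));
}

/* The embedded word suffices while every fd number is below 32. */
static int fits_embedded(int max_fd) { return max_fd < 32; }
```

      On a 64-bit platform the embedded set is one 64-bit word instead, so the
      same calculation gives 2 x (128 - 8) = 240 bytes.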
      UP and SMP should both benefit from this patch, because most tasks will
      touch only one cache line when they open()/close() stdin/stdout/stderr
      (0/1/2), since next_fd, close_on_exec_init, open_fds_init and
      fd_array[0 .. 2] are in the same cache line.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>