1. 29 Mar, 2006 4 commits
    • [PATCH] copy_process: cleanup bad_fork_cleanup_sighand · 7001510d
      Oleg Nesterov authored
      The only caller of exit_sighand(tsk) is copy_process's error path.  We can
      call __exit_sighand() directly and kill exit_sighand().
      This 'tsk' was not yet registered in pid_hash[] or init_task.tasks, it has no
      external references, nobody can see it, and
      	IF (clone_flags & CLONE_SIGHAND)
      		At least 'current' has a reference to ->sighand, which
      		means atomic_dec_and_test(sighand->count) can't be true.
      	ELSE
      		Nobody can see this ->sighand, which means we can free it
      		without any locking.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU · aa1757f9
      Oleg Nesterov authored
      This patch borrows Hugh's clever 'struct anon_vma' trick.
      Without tasklist_lock held we can't trust task->sighand until we have locked
      it and re-checked that it is still the same.
      But this means we don't need to defer 'kmem_cache_free(sighand)'.  We can
      return the memory to the slab immediately; all we need is to be sure that
      sighand->siglock can't disappear inside an RCU-protected section.
      To do so we need to initialize ->siglock inside the ctor function;
      SLAB_DESTROY_BY_RCU does the rest.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
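The "lock it, then re-check" step described in this commit can be modelled in userspace. The sketch below is only an illustration under stated assumptions: struct sighand_model, struct task_model, model_lock(), model_unlock() and lock_sighand_model() are hypothetical names, a C11 atomic_flag stands in for ->siglock, and the RCU read-side section that the kernel relies on to keep the lock memory valid is not modelled at all.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct sighand_model {
    atomic_flag siglock;    /* assumed to remain a valid lock while the slab page lives */
};

struct task_model {
    struct sighand_model *_Atomic sighand;  /* may be cleared or replaced concurrently */
};

void model_lock(struct sighand_model *sh)
{
    /* Simple spinlock: loop until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&sh->siglock, memory_order_acquire))
        ;
}

void model_unlock(struct sighand_model *sh)
{
    atomic_flag_clear_explicit(&sh->siglock, memory_order_release);
}

/*
 * Return the task's sighand with its lock held, or NULL if the task has none.
 * The pointer is read without any lock, so it is only trusted after the lock
 * has been taken and the pointer has been read again and found unchanged.
 */
struct sighand_model *lock_sighand_model(struct task_model *tsk)
{
    for (;;) {
        struct sighand_model *sh = atomic_load(&tsk->sighand);

        if (sh == NULL)
            return NULL;
        model_lock(sh);
        if (sh == atomic_load(&tsk->sighand))
            return sh;          /* still the same object: lock is ours */
        model_unlock(sh);       /* object was swapped under us: retry */
    }
}
```

A caller would pair a non-NULL return with model_unlock() on the returned pointer. The retry loop is what makes immediate reuse of the memory tolerable in this model: locking a recycled object is harmless as long as the lock field itself stays initialized.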
    • [PATCH] pidhash: don't count idle threads · 73b9ebfe
      Oleg Nesterov authored
      fork_idle() does unhash_process() just after copy_process().  In contrast,
      boot_cpu's idle thread explicitly registers itself for each pid_type with nr
      = 0.
      copy_process() already checks p->pid != 0 before process_counts++, I think we
      can just skip attach_pid() calls and job control inits for idle threads and
      kill unhash_process().  We don't need to cleanup ->proc_dentry in fork_idle()
      because with this patch idle threads are never hashed in
      We don't need to hash pid == 0 in pidmap_init().  free_pidmap() is never
      called with pid == 0 arg, so it will never be reused.  So it is still possible
      to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.
      However, with this patch we don't hash pid == 0 for the PIDTYPE_PID case.  We
      still have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
      kernel threads which don't call daemonize().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kill SET_LINKS/REMOVE_LINKS · c97d9893
      Oleg Nesterov authored
      Both SET_LINKS() and REMOVE_LINKS() have exactly one caller, and
      these callers already check thread_group_leader().
      This patch kills these macros because they mix two different things: setting
      a process's parent and registering it in the init_task.tasks list.  Callers
      are updated to do these actions by hand.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 27 Mar, 2006 1 commit
  3. 26 Mar, 2006 2 commits
  4. 24 Mar, 2006 1 commit
    • [PATCH] cpuset memory spread slab cache optimizations · c61afb18
      Paul Jackson authored
      The hooks in the slab cache allocator code path for support of NUMA
      mempolicies and cpuset memory spreading are in an important code path.  Many
      systems will use neither feature.
      This patch optimizes those hooks down to a single check of some bits in the
      current task's task_struct flags.  For non-NUMA systems, this hook and related
      code is already ifdef'd out.
      The optimization is done by using another task flag, set if the task is using
      a non-default NUMA mempolicy.  Taking this flag bit along with the
      PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
      memory spreading' patch set, one can check for the combination of any of these
      special case memory placement mechanisms with a single test of the current
      task's task_struct flags.
      This patch also tightens up the code, to save a few bytes of kernel text
      space, and moves some of it out of line.  Due to the nested inlines called
      from multiple places, we were ending up with three copies of this code, which
      once we get off the main code path (for local node allocation) seems a bit
      wasteful of instruction memory.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
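The single-test idea in this commit can be sketched as one mask-and-compare on the task flags word. Everything named below is an assumption for illustration: the SK_ prefixed macros and their bit values are not taken from the kernel headers, the commit text does not name the new mempolicy flag, and needs_slow_path() is a hypothetical helper.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real kernel flag values are not given in the text. */
#define SK_PF_MEMPOLICY   0x1u  /* hypothetical: task uses a non-default NUMA mempolicy */
#define SK_PF_SPREAD_PAGE 0x2u  /* stand-in for PF_SPREAD_PAGE */
#define SK_PF_SPREAD_SLAB 0x4u  /* stand-in for PF_SPREAD_SLAB */

/* All special-case placement mechanisms folded into one mask. */
#define SK_PF_PLACEMENT_MASK \
    (SK_PF_MEMPOLICY | SK_PF_SPREAD_PAGE | SK_PF_SPREAD_SLAB)

/*
 * True if any special placement mechanism is active for the task, so the
 * allocator must leave the fast local-node path. One AND plus one compare,
 * regardless of how many mechanisms are folded into the mask.
 */
bool needs_slow_path(unsigned long task_flags)
{
    return (task_flags & SK_PF_PLACEMENT_MASK) != 0;
}
```

A task with none of the three bits set costs the hot path a single test, which is the point the commit message makes for systems that use neither feature.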
  5. 23 Mar, 2006 2 commits
    • Jens Axboe's avatar
    • [PATCH] Shrinks sizeof(files_struct) and better layout · 0c9e63fd
      Eric Dumazet authored
      1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32-bit
         platforms, lowering kmalloc() allocated space by 50%.
      2) Reduce the size of (files_struct), using a special 32-bit (or
         64-bit) embedded_fd_set, instead of a 1024-bit fd_set, for the
         close_on_exec_init and open_fds_init fields.  This saves some RAM (248
         bytes per task) as most tasks don't open more than 32 files.  D-cache
         footprint for such tasks is also reduced to the minimum.
      3) Reduce the size of the allocated fdset.  Currently two full pages are
         allocated, that is 32768 bits on x86 for example, which is way too much.
         The minimum is now L1_CACHE_BYTES.
      UP and SMP should benefit from this patch, because most tasks will touch
      only one cache line when they open()/close() stdin/stdout/stderr (0/1/2),
      since next_fd, close_on_exec_init, open_fds_init and fd_array[0 .. 2] are in
      the same cache line.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
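The 248-byte figure in point 2 follows from simple arithmetic, checked in the sketch below. The helper names and SK_ macros are hypothetical, and the 4096-byte page size used in the usage note is an assumption about x86 rather than something the commit states.

```c
#include <stddef.h>

#define SK_FD_SETSIZE_BITS 1024u  /* width of the fd_set named in the commit text */
#define SK_BITS_PER_BYTE   8u

/*
 * Bytes saved per files_struct when the two 1024-bit sets
 * (close_on_exec_init and open_fds_init) shrink to word_bits each.
 */
size_t embedded_fd_set_savings(size_t word_bits)
{
    size_t old_bytes = 2u * (SK_FD_SETSIZE_BITS / SK_BITS_PER_BYTE); /* 2 * 128 */
    size_t new_bytes = 2u * (word_bits / SK_BITS_PER_BYTE);

    return old_bytes - new_bytes;
}

/* Number of descriptor bits that fit in one page of page_bytes bytes. */
size_t fdset_bits_in_page(size_t page_bytes)
{
    return page_bytes * SK_BITS_PER_BYTE;
}
```

With word_bits = 32 the helper gives 256 - 8 = 248 bytes, matching the commit; with word_bits = 64 the same arithmetic gives 240, a number derived here and not quoted from the commit. fdset_bits_in_page(4096) gives the 32768 bits mentioned in point 3.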
  6. 22 Mar, 2006 1 commit
  7. 18 Mar, 2006 1 commit
  8. 17 Mar, 2006 1 commit
  9. 14 Mar, 2006 1 commit
    • [PATCH] Fix sigaltstack corruption among cloned threads · f9a3879a
      GOTO Masanori authored
      This patch fixes alternate signal stack corruption among cloned threads
      with CLONE_SIGHAND (and CLONE_VM) for linux-2.6.16-rc6.
      The value of the alternate signal stack is currently inherited after a call
      to clone(...  CLONE_SIGHAND | CLONE_VM).  But if sigaltstack is set by a
      parent thread, and multiple cloned child threads (plus the parent thread)
      then run signal handlers at the same time, some threads may conflict
      because they share the same alternate signal stack region.
      Eventually they get SIGSEGV.  It's an undesirable race condition.  Note that
      child threads created by NPTL pthread_create() also hit this conflict
      when the parent thread uses sigaltstack, without my patch.
      To fix this problem, this patch clears the child threads' sigaltstack
      information, as exec() does.  This behavior follows the SUSv3 specification.
      In SUSv3, pthread_create() says "The alternate stack shall not be inherited
      (when new threads are initialized)".  It means that sigaltstack should be
      cleared when sigaltstack memory space is shared by cloned threads with
      Note that I chose the "if (clone_flags & CLONE_SIGHAND)" line because:
        - If the clone_flags check did not exist, fork() would not inherit sigaltstack.
        - CLONE_VM is another choice, but then vfork() would not inherit sigaltstack.
        - CLONE_SIGHAND implies CLONE_VM, and it looks suitable.
        - CLONE_THREAD is another candidate, and includes CLONE_SIGHAND + CLONE_VM,
          but this flag has slightly different semantics.
      I decided to use CLONE_SIGHAND.
      [ Changed to test for CLONE_VM && !CLONE_VFORK after discussion --Linus ]
      Signed-off-by: GOTO Masanori <gotom@sanori.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: Linus Torvalds <torvalds@osdl.org>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
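The bracketed note from Linus says the merged test became "CLONE_VM && !CLONE_VFORK". A minimal sketch of that predicate follows. It is a userspace illustration and not the kernel source: the SK_ macros are local names that mirror the Linux clone(2) flag bits, and child_clears_sigaltstack() is a hypothetical helper.

```c
#include <stdbool.h>

/* Local copies of the Linux clone(2) flag bits, so <sched.h> is not needed. */
#define SK_CLONE_VM      0x00000100ul
#define SK_CLONE_SIGHAND 0x00000800ul
#define SK_CLONE_VFORK   0x00004000ul
#define SK_CLONE_THREAD  0x00010000ul

/*
 * True when the child shares the parent's address space (CLONE_VM) but is
 * not a vfork() child (CLONE_VFORK clear). Only then would two tasks run
 * handlers on the same alternate stack memory at the same time, so only
 * then does the child's sigaltstack setting need to be cleared.
 */
bool child_clears_sigaltstack(unsigned long clone_flags)
{
    return (clone_flags & (SK_CLONE_VM | SK_CLONE_VFORK)) == SK_CLONE_VM;
}
```

Under this predicate a plain fork() (no flags) and a vfork() (CLONE_VM | CLONE_VFORK) both keep the inherited setting, while a thread-style clone with CLONE_VM | CLONE_SIGHAND | CLONE_THREAD clears it.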
  10. 11 Mar, 2006 1 commit
  11. 15 Feb, 2006 2 commits
    • [PATCH] fix kill_proc_info() vs fork() theoretical race · dadac81b
      Oleg Nesterov authored
      	attach_pid(p, PIDTYPE_PID, p->pid);
      	attach_pid(p, PIDTYPE_TGID, p->tgid);
      What if kill_proc_info(p->pid) happens in between?
      copy_process() holds current->sighand.siglock, so we are safe
      in CLONE_THREAD case, because current->sighand == p->sighand.
      Otherwise, p->sighand is unlocked, and the new process is already
      visible to find_task_by_pid(), but has a copy of the parent's
      'struct pid' in ->pids[PIDTYPE_TGID].
      This means that __group_complete_signal() may hang while doing
      	do ... while (next_thread() != p)
      We can solve this problem if we reverse these 2 attach_pid()s:
      	attach_pid() does wmb()
      	group_send_sig_info() calls spin_lock(), which
      	provides a read barrier. // Yes ?
      I don't think we can hit this race in practice, but still.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix kill_proc_info() vs CLONE_THREAD race · 3f17da69
      Oleg Nesterov authored
      There is a window after copy_process() unlocks ->sighand.siglock
      and before it adds the new thread to the thread list.
      In that window __group_complete_signal(SIGKILL) will not see the
      new thread yet, so this thread will start running while the whole
      thread group was supposed to exit.
      I believe we have another good reason to place attach_pid(PID/TGID)
      under ->sighand.siglock. We can do the same for
      After that we don't need tasklist_lock to iterate over the thread
      list, and we can simplify things, see for example do_sigaction()
      or sys_times().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  12. 08 Feb, 2006 5 commits
  13. 01 Feb, 2006 1 commit
  14. 12 Jan, 2006 2 commits
  15. 10 Jan, 2006 1 commit
  16. 09 Jan, 2006 5 commits
  17. 28 Nov, 2005 2 commits
  18. 22 Nov, 2005 1 commit
  19. 14 Nov, 2005 3 commits
  20. 07 Nov, 2005 2 commits
    • [SPARC64] mm: context switch ptlock · dedeb002
      Hugh Dickins authored
      sparc64 is unique among architectures in taking the page_table_lock in
      its context switch (well, cris does too, but erroneously, and it's not
      yet SMP anyway).
      This seems to be a private affair between switch_mm and activate_mm,
      using page_table_lock as a per-mm lock, without any relation to its uses
      elsewhere.  That's fine, but comment it as such; and unlock sooner in
      switch_mm, more like in activate_mm (preemption is disabled here).
      There is a block of "if (0)"ed code in smp_flush_tlb_pending which would
      have liked to rely on the page_table_lock, in switch_mm and elsewhere;
      but its comment explains how dup_mmap's flush_tlb_mm defeated it.  And
      though that could have been changed at any time over the past few years,
      now the chance vanishes as we push the page_table_lock downwards, and
      perhaps split it per page table page.  Just delete that block of code.
      Which leaves the mysterious spin_unlock_wait(&oldmm->page_table_lock)
      in kernel/fork.c copy_mm.  Textual analysis (supported by Nick Piggin)
      suggests that the comment was written by DaveM, and that it relates to
      the defeated approach in the sparc64 smp_flush_tlb_pending.  Just delete
      this block too.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PATCH] Process Events Connector · 9f46080c
      Matt Helsley authored
      This patch adds a connector that reports fork, exec, id change, and exit
      events for all processes to userspace.  It replaces the fork_advisor patch
      that ELSA is currently using.  Applications that may find these events
      useful include accounting/auditing (e.g.  ELSA), system activity monitoring
      (e.g.  top), security, and resource management (e.g.  CKRM).
      Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 30 Oct, 2005 1 commit
    • [PATCH] mm: ptd_alloc take ptlock · c74df32c
      Hugh Dickins authored
      Second step in pushing down the page_table_lock.  Remove the temporary
      bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
      to hold page_table_lock, whether it's on init_mm or a user mm; take
      page_table_lock internally to check if a racing task already allocated.
      Convert their callers from common code.  But avoid coming back to change them
      again later: instead of moving the spin_lock(&mm->page_table_lock) down,
      switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
      encapsulate the mapping+locking and unlocking+unmapping together, and in the
      end may use alternatives to the mm page_table_lock itself.
      These callers all hold mmap_sem (some exclusively, some not), so at no level
      can a page table be whipped away from beneath them; and pte_alloc uses the
      "atomic" pmd_present to test whether it needs to allocate.  It appears that on
      all arches we can safely descend without page_table_lock.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
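The "take page_table_lock internally to check if a racing task already allocated" step can be modelled in userspace as allocate first, then lock, then check. This is a sketch under assumptions and not the kernel code: struct dir_model and table_alloc_model() are hypothetical names, a C11 atomic_flag spinlock stands in for mm->page_table_lock, and calloc()/free() stand in for the page table allocator.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an upper-level entry that points at a lower table. */
struct dir_model {
    atomic_flag lock;   /* plays the role of mm->page_table_lock */
    void *table;        /* NULL until some caller installs a table */
};

/* Returns 0 if a table is present on exit, -1 if allocation failed. */
int table_alloc_model(struct dir_model *dir, size_t table_bytes)
{
    /* Allocate before taking the lock, so the lock is never held across it. */
    void *new_table = calloc(1, table_bytes);

    if (new_table == NULL)
        return -1;

    while (atomic_flag_test_and_set_explicit(&dir->lock, memory_order_acquire))
        ;                           /* spin until the lock is ours */
    if (dir->table == NULL) {
        dir->table = new_table;     /* we won: install our table */
        new_table = NULL;
    }
    atomic_flag_clear_explicit(&dir->lock, memory_order_release);

    free(new_table);                /* lost the race: drop ours (free(NULL) is a no-op) */
    return 0;
}
```

Calling the helper twice on the same dir_model leaves the first table in place and frees the second allocation, which is the outcome a caller that did not hold the lock beforehand needs.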