1. 23 Mar, 2006 2 commits
    • Jens Axboe's avatar
    • Eric Dumazet's avatar
      [PATCH] Shrinks sizeof(files_struct) and better layout · 0c9e63fd
      Eric Dumazet authored
      
      
      1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
         platforms, lowering kmalloc() allocated space by 50%.
      
      2) Reduce the size of (files_struct), using a special 32 bits (or
         64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
         close_on_exec_init and open_fds_init fields.  This save some ram (248
         bytes per task) as most tasks dont open more than 32 files.  D-Cache
         footprint for such tasks is also reduced to the minimum.
      
      3) Reduce size of allocated fdset.  Currently two full pages are
         allocated, that is 32768 bits on x86 for example, and way too much.  The
         minimum is now L1_CACHE_BYTES.
      
      UP and SMP should benefit from this patch, because most tasks will touch
      only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
      (next_fd, close_on_exec_init, open_fds_init, fd_array[0 ..  2] being in the
      same cache line)
      Signed-off-by: default avatarEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      0c9e63fd
  2. 22 Mar, 2006 1 commit
  3. 18 Mar, 2006 1 commit
  4. 17 Mar, 2006 1 commit
  5. 14 Mar, 2006 1 commit
    • GOTO Masanori's avatar
      [PATCH] Fix sigaltstack corruption among cloned threads · f9a3879a
      GOTO Masanori authored
      
      
      This patch fixes alternate signal stack corruption among cloned threads
      with CLONE_SIGHAND (and CLONE_VM) for linux-2.6.16-rc6.
      
      The value of alternate signal stack is currently inherited after a call of
      clone(...  CLONE_SIGHAND | CLONE_VM).  But if sigaltstack is set by a
      parent thread, and then if multiple cloned child threads (+ parent threads)
      call signal handler at the same time, some threads may be conflicted -
      because they share to use the same alternative signal stack region.
      Finally they get sigsegv.  It's an undesirable race condition.  Note that
      child threads created from NPTL pthread_create() also hit this conflict
      when the parent thread uses sigaltstack, without my patch.
      
      To fix this problem, this patch clears the child threads' sigaltstack
      information like exec().  This behavior follows the SUSv3 specification.
      In SUSv3, pthread_create() says "The alternate stack shall not be inherited
      (when new threads are initialized)".  It means that sigaltstack should be
      cleared when sigaltstack memory space is shared by cloned threads with
      CLONE_SIGHAND.
      
      Note that I chose "if (clone_flags & CLONE_SIGHAND)" line because:
        - If clone_flags line is not existed, fork() does not inherit sigaltstack.
        - CLONE_VM is another choice, but vfork() does not inherit sigaltstack.
        - CLONE_SIGHAND implies CLONE_VM, and it looks suitable.
        - CLONE_THREAD is another candidate, and includes CLONE_SIGHAND + CLONE_VM,
          but this flag has a bit different semantics.
      I decided to use CLONE_SIGHAND.
      
      [ Changed to test for CLONE_VM && !CLONE_VFORK after discussion --Linus ]
      Signed-off-by: default avatarGOTO Masanori <gotom@sanori.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: default avatarLinus Torvalds <torvalds@osdl.org>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      f9a3879a
  6. 11 Mar, 2006 1 commit
  7. 15 Feb, 2006 2 commits
    • Oleg Nesterov's avatar
      [PATCH] fix kill_proc_info() vs fork() theoretical race · dadac81b
      Oleg Nesterov authored
      
      
      copy_process:
      
      	attach_pid(p, PIDTYPE_PID, p->pid);
      	attach_pid(p, PIDTYPE_TGID, p->tgid);
      
      What if kill_proc_info(p->pid) happens in between?
      
      copy_process() holds current->sighand.siglock, so we are safe
      in CLONE_THREAD case, because current->sighand == p->sighand.
      
      Otherwise, p->sighand is unlocked, the new process is already
      visible to the find_task_by_pid(), but have a copy of parent's
      'struct pid' in ->pids[PIDTYPE_TGID].
      
      This means that __group_complete_signal() may hang while doing
      
      	do ... while (next_thread() != p)
      
      We can solve this problem if we reverse these 2 attach_pid()s:
      
      	attach_pid() does wmb()
      
      	group_send_sig_info() calls spin_lock(), which
      	provides a read barrier. // Yes ?
      
      I don't think we can hit this race in practice, but still.
      Signed-off-by: default avatarOleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      dadac81b
    • Oleg Nesterov's avatar
      [PATCH] fix kill_proc_info() vs CLONE_THREAD race · 3f17da69
      Oleg Nesterov authored
      
      
      There is a window after copy_process() unlocks ->sighand.siglock
      and before it adds the new thread to the thread list.
      
      In that window __group_complete_signal(SIGKILL) will not see the
      new thread yet, so this thread will start running while the whole
      thread group was supposed to exit.
      
      I beleive we have another good reason to place attach_pid(PID/TGID)
      under ->sighand.siglock. We can do the same for
      
      	release_task()->__unhash_process()
      
      	de_thread()->switch_exec_pids()
      
      After that we don't need tasklist_lock to iterate over the thread
      list, and we can simplify things, see for example do_sigaction()
      or sys_times().
      Signed-off-by: default avatarOleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      3f17da69
  8. 08 Feb, 2006 5 commits
  9. 01 Feb, 2006 1 commit
  10. 12 Jan, 2006 2 commits
  11. 10 Jan, 2006 1 commit
  12. 09 Jan, 2006 5 commits
  13. 28 Nov, 2005 2 commits
  14. 22 Nov, 2005 1 commit
  15. 14 Nov, 2005 3 commits
  16. 07 Nov, 2005 2 commits
    • Hugh Dickins's avatar
      [SPARC64] mm: context switch ptlock · dedeb002
      Hugh Dickins authored
      
      
      sparc64 is unique among architectures in taking the page_table_lock in
      its context switch (well, cris does too, but erroneously, and it's not
      yet SMP anyway).
      
      This seems to be a private affair between switch_mm and activate_mm,
      using page_table_lock as a per-mm lock, without any relation to its uses
      elsewhere.  That's fine, but comment it as such; and unlock sooner in
      switch_mm, more like in activate_mm (preemption is disabled here).
      
      There is a block of "if (0)"ed code in smp_flush_tlb_pending which would
      have liked to rely on the page_table_lock, in switch_mm and elsewhere;
      but its comment explains how dup_mmap's flush_tlb_mm defeated it.  And
      though that could have been changed at any time over the past few years,
      now the chance vanishes as we push the page_table_lock downwards, and
      perhaps split it per page table page.  Just delete that block of code.
      
      Which leaves the mysterious spin_unlock_wait(&oldmm->page_table_lock)
      in kernel/fork.c copy_mm.  Textual analysis (supported by Nick Piggin)
      suggests that the comment was written by DaveM, and that it relates to
      the defeated approach in the sparc64 smp_flush_tlb_pending.  Just delete
      this block too.
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dedeb002
    • Matt Helsley's avatar
      [PATCH] Process Events Connector · 9f46080c
      Matt Helsley authored
      
      
      This patch adds a connector that reports fork, exec, id change, and exit
      events for all processes to userspace.  It replaces the fork_advisor patch
      that ELSA is currently using.  Applications that may find these events
      useful include accounting/auditing (e.g.  ELSA), system activity monitoring
      (e.g.  top), security, and resource management (e.g.  CKRM).
      Signed-off-by: default avatarMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      9f46080c
  17. 30 Oct, 2005 6 commits
    • Hugh Dickins's avatar
      [PATCH] mm: ptd_alloc take ptlock · c74df32c
      Hugh Dickins authored
      
      
      Second step in pushing down the page_table_lock.  Remove the temporary
      bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
      to hold page_table_lock, whether it's on init_mm or a user mm; take
      page_table_lock internally to check if a racing task already allocated.
      
      Convert their callers from common code.  But avoid coming back to change them
      again later: instead of moving the spin_lock(&mm->page_table_lock) down,
      switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
      encapsulate the mapping+locking and unlocking+unmapping together, and in the
      end may use alternatives to the mm page_table_lock itself.
      
      These callers all hold mmap_sem (some exclusively, some not), so at no level
      can a page table be whipped away from beneath them; and pte_alloc uses the
      "atomic" pmd_present to test whether it needs to allocate.  It appears that on
      all arches we can safely descend without page_table_lock.
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      c74df32c
    • Hugh Dickins's avatar
      [PATCH] mm: dup_mmap down new mmap_sem · 7ee78232
      Hugh Dickins authored
      
      
      One anomaly remains from when Andrea rationalized the responsibilities of
      mmap_sem and page_table_lock: in dup_mmap we add vmas to the child holding its
      page_table_lock, but not the mmap_sem which normally guards the vma list and
      rbtree.  Which could be an issue for unuse_mm: though since it just walks down
      the list (today with page_table_lock, tomorrow not), it's probably okay.  Will
      need a memory barrier?  Oh, keep it simple, Nick and I agreed, no harm in
      taking child's mmap_sem here.
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      7ee78232
    • Hugh Dickins's avatar
      [PATCH] mm: dup_mmap use oldmm more · fd3e42fc
      Hugh Dickins authored
      
      
      Use the parent's oldmm throughout dup_mmap, instead of perversely going back
      to current->mm.  (Can you hear the sigh of relief from those mpnts?  Usually I
      squash them, but not today.)
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      fd3e42fc
    • Hugh Dickins's avatar
      [PATCH] mm: rss = file_rss + anon_rss · 4294621f
      Hugh Dickins authored
      
      
      I was lazy when we added anon_rss, and chose to change as few places as
      possible.  So currently each anonymous page has to be counted twice, in rss
      and in anon_rss.  Which won't be so good if those are atomic counts in some
      configurations.
      
      Change that around: keep file_rss and anon_rss separately, and add them
      together (with get_mm_rss macro) when the total is needed - reading two
      atomics is much cheaper than updating two atomics.  And update anon_rss
      upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      4294621f
    • Hugh Dickins's avatar
      [PATCH] mm: mm_init set_mm_counters · 404351e6
      Hugh Dickins authored
      
      
      How is anon_rss initialized?  In dup_mmap, and by mm_alloc's memset; but
      that's not so good if an mm_counter_t is a special type.  And how is rss
      initialized?  By set_mm_counter, all over the place.  Come on, we just need to
      initialize them both at once by set_mm_counter in mm_init (which follows the
      memcpy when forking).
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      404351e6
    • Hugh Dickins's avatar
      [PATCH] mm: vm_stat_account unshackled · ab50b8ed
      Hugh Dickins authored
      
      
      The original vm_stat_account has fallen into disuse, with only one user, and
      only one user of vm_stat_unaccount.  It's easier to keep track if we convert
      them all to __vm_stat_account, then free it from its __shackles.
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ab50b8ed
  18. 20 Oct, 2005 1 commit
  19. 17 Sep, 2005 1 commit
  20. 09 Sep, 2005 1 commit
    • Dipankar Sarma's avatar
      [PATCH] files: files struct with RCU · ab2af1f5
      Dipankar Sarma authored
      
      
      Patch to eliminate struct files_struct.file_lock spinlock on the reader side
      and use rcu refcounting rcuref_xxx api for the f_count refcounter.  The
      updates to the fdtable are done by allocating a new fdtable structure and
      setting files->fdt to point to the new structure.  The fdtable structure is
      protected by RCU thereby allowing lock-free lookup.  For fd arrays/sets that
      are vmalloced, we use keventd to free them since RCU callbacks can't sleep.  A
      global list of fdtable to be freed is not scalable, so we use a per-cpu list.
      If keventd is already handling the current cpu's work, we use a timer to defer
      queueing of that work.
      
      Since the last publication, this patch has been re-written to avoid using
      explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
      premitives instead.  This required that the fd information is kept in a
      separate structure (fdtable) and updated atomically.
      Signed-off-by: default avatarDipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ab2af1f5