1. 09 May, 2007 1 commit
    • Roman Zippel's avatar
      rename thread_info to stack · f7e4217b
      Roman Zippel authored
      This finally renames the thread_info field in task structure to stack, so that
      the assumptions about this field are gone and archs have more freedom about
      placing the thread_info structure.
      Nonbroken archs which have a proper thread pointer can do the access to both
      current thread and task structure via a single pointer.
      It'll allow for a few more cleanups of the fork code, from which e.g.  ia64
      could benefit.
      Signed-off-by: default avatarRoman Zippel <zippel@linux-m68k.org>
      [akpm@linux-foundation.org: build fix]
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  2. 08 May, 2007 2 commits
  3. 07 May, 2007 1 commit
    • Christoph Lameter's avatar
      slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter authored
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code object
      manipulation of the object.
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there would be code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  4. 02 May, 2007 1 commit
    • Jeremy Fitzhardinge's avatar
      [PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction · d6dd61c8
      Jeremy Fitzhardinge authored
      Add hooks to allow a paravirt implementation to track the lifetime of
      an mm.  Paravirtualization requires three hooks, but only two are
      needed in common code.  They are:
      arch_dup_mmap, which is called when a new mmap is created at fork
      arch_exit_mmap, which is called when the last process reference to an
        mm is dropped, which typically happens on exit and exec.
      The third hook is activate_mm, which is called from the arch-specific
      activate_mm() macro/function, and so doesn't need stub versions for
      other architectures.  It's called when an mm is first used.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: linux-arch@vger.kernel.org
      Cc: James Bottomley <James.Bottomley@SteelEye.com>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
  5. 17 Mar, 2007 1 commit
  6. 16 Feb, 2007 1 commit
  7. 12 Feb, 2007 1 commit
  8. 11 Feb, 2007 1 commit
  9. 02 Feb, 2007 1 commit
  10. 30 Jan, 2007 2 commits
  11. 13 Dec, 2006 1 commit
  12. 10 Dec, 2006 4 commits
    • Vadim Lobanov's avatar
      [PATCH] fdtable: Remove the free_files field · 4fd45812
      Vadim Lobanov authored
      An fdtable can either be embedded inside a files_struct or standalone (after
      being expanded).  When an fdtable is being discarded after all RCU references
      to it have expired, we must either free it directly, in the standalone case,
      or free the files_struct it is contained within, in the embedded case.
      Currently the free_files field controls this behavior, but we can get rid of
      it entirely, as all the necessary information is already recorded.  We can
      distinguish embedded and standalone fdtables using max_fds, and if it is
      embedded we can divine the relevant files_struct using container_of().
      Signed-off-by: default avatarVadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Vadim Lobanov's avatar
      [PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69
      Vadim Lobanov authored
      Currently, each fdtable supports three dynamically-sized arrays of data: the
      fdarray and two fdsets.  The code allows the number of fds supported by the
      fdarray (fdtable->max_fds) to differ from the number of fds supported by each
      of the fdsets (fdtable->max_fdset).
      In practice, it is wasteful for these two sizes to differ: whenever we hit a
      limit on the smaller-capacity structure, we will reallocate the entire fdtable
      and all the dynamic arrays within it, so any delta in the memory used by the
      larger-capacity structure will never be touched at all.
      Rather than hogging this excess, we shouldn't even allocate it in the first
      place, and keep the capacities of the fdarray and the fdsets equal.  This
      patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
      code becomes simpler.
      Signed-off-by: default avatarVadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Vadim Lobanov's avatar
      [PATCH] fdtable: Delete pointless code in dup_fd() · f3d19c90
      Vadim Lobanov authored
      The dup_fd() function creates a new files_struct and fdtable embedded inside
      that files_struct, and then possibly expands the fdtable using expand_files().
      The out_release error path is invoked when expand_files() returns an error
      code.  However, when this attempt to expand fails, the fdtable is left in its
      original embedded form, so it is pointless to try to free the associated
      fdarray and fdsets.
      Signed-off-by: default avatarVadim Lobanov <vlobanov@speakeasy.net>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Andrew Morton's avatar
      [PATCH] io-accounting: core statistics · 7c3ab738
      Andrew Morton authored
      The present per-task IO accounting isn't very useful.  It simply counts the
      number of bytes passed into read() and write().  So if a process reads 1MB
      from an already-cached file, it is accused of having performed 1MB of I/O,
      which is wrong.
      (David Wright had some comments on the applicability of the present logical IO accounting:
        For billing purposes it is useless but for workload analysis it is very
        read_bytes/read_calls  average read request size
        write_bytes/write_calls average write request size
        read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
        write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                      be missed
        I often look for logical larger than physical to see filesystem cache
        problems.  And the bytes/cpusec can help find applications that are
        dominating the cache and causing slow interactive response from page cache
        I want to find the IO intensive applications and make sure they are doing
        efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
        IO commands).
      This patchset adds new accounting which tries to be more accurate.  We account
      for three things:
        attempt to count the number of bytes which this process really did cause
        to be fetched from the storage layer.  Done at the submit_bio() level, so it
        is accurate for block-backed filesystems.  I also attempt to wire up NFS and
        attempt to count the number of bytes which this process caused to be sent
        to the storage layer.  This is done at page-dirtying time.
        The big inaccuracy here is truncate.  If a process writes 1MB to a file
        and then deletes the file, it will in fact perform no writeout.  But it will
        have been accounted as having caused 1MB of write.
        account the number of bytes which this process caused to not happen, by
        truncating pagecache.
        We _could_ just subtract this from the process's `write' accounting.  But
        that means that some processes would be reported to have done negative
        amounts of write IO, which is silly.
        So we just report the raw number and punt this decision up to userspace.
      Now, we _could_ account for writes at the physical I/O level.  But
      - This would require that we track memory-dirtying tasks at the per-page
        level (would require a new pointer in struct page).
      - It would mean that IO statistics for a process are usually only available
        long after that process has exitted.  Which means that we probably cannot
        communicate this info via taskstats.
      This patch:
      Wire up the kernel-private data structures and the accessor functions to
      manipulate them.
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  13. 08 Dec, 2006 5 commits
  14. 07 Dec, 2006 6 commits
    • Oleg Nesterov's avatar
      [PATCH] taskstats: cleanup ->signal->stats allocation · 34ec1234
      Oleg Nesterov authored
      Allocate ->signal->stats on demand in taskstats_exit(), this allows us to
      remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's
      public interface.
      Signed-off-by: default avatarOleg Nesterov <oleg@tv-sign.ru>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Roland McGrath's avatar
      [PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit · fec1d011
      Roland McGrath authored
      The CLONE_CHILD_CLEARTID flag is used by NPTL to have its threads
      communicate via memory/futex when they exit, so pthread_join can
      synchronize using a simple futex wait.  The word of user memory where NPTL
      stores a thread's own TID is what it passes; this gets reset to zero at
      thread exit.
      It is not desireable to touch this user memory when threads are dying due
      to a fatal signal.  A core dump is more usefully representative of the
      dying program state if the threads live at the time of the crash have their
      NPTL data structures unperturbed.  The userland expectation of
      CLONE_CHILD_CLEARTID has only ever been that it works for a thread making
      an _exit system call.
      This problem was identified by Ernie Petrides <petrides@redhat.com>.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Cc: Ernie Petrides <petrides@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Christoph Lameter's avatar
      [PATCH] slab: remove kmem_cache_t · e18b890b
      Christoph Lameter authored
      Replace all uses of kmem_cache_t with struct kmem_cache.
      The patch was generated using the following script:
      	# Replace one string by another in all the kernel sources.
      	set -e
      	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
      		quilt add $file
      		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
      		mv /tmp/$$ $file
      		quilt refresh
      The script was run like this
      	sh replace kmem_cache_t "struct kmem_cache"
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_KERNEL · e94b1766
      Christoph Lameter authored
      SLAB_KERNEL is an alias of GFP_KERNEL.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Ashwin Chaugule's avatar
      [PATCH] new scheme to preempt swap token · 7602bdf2
      Ashwin Chaugule authored
      The new swap token patches replace the current token traversal algo.  The old
      algo had a crude timeout parameter that was used to handover the token from
      one task to another.  This algo, transfers the token to the tasks that are in
      need of the token.  The urgency for the token is based on the number of times
      a task is required to swap-in pages.  Accordingly, the priority of a task is
      incremented if it has been badly affected due to swap-outs.  To ensure that
      the token doesnt bounce around rapidly, the token holders are given a priority
      boost.  The priority of tasks is also decremented, if their rate of swap-in's
      keeps reducing.  This way, the condition to check whether to pre-empt the swap
      token, is a matter of comparing two task's priority fields.
      [akpm@osdl.org: cleanups]
      Signed-off-by: default avatarAshwin Chaugule <ashwin.chaugule@celunite.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Use %gs as the PDA base-segment in the kernel · f95d47ca
      Jeremy Fitzhardinge authored
      This patch is the meat of the PDA change.  This patch makes several related
      1: Most significantly, %gs is now used in the kernel.  This means that on
         entry, the old value of %gs is saved away, and it is reloaded with
      2: entry.S constructs the stack in the shape of struct pt_regs, and this
         is passed around the kernel so that the process's saved register
         state can be accessed.
         Unfortunately struct pt_regs doesn't currently have space for %gs
         (or %fs). This patch extends pt_regs to add space for gs (no space
         is allocated for %fs, since it won't be used, and it would just
         complicate the code in entry.S to work around the space).
      3: Because %gs is now saved on the stack like %ds, %es and the integer
         registers, there are a number of places where it no longer needs to
         be handled specially; namely context switch, and saving/restoring the
         register state in a signal context.
      4: And since kernel threads run in kernel space and call normal kernel
         code, they need to be created with their %gs == __KERNEL_PDA.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
  15. 25 Nov, 2006 1 commit
  16. 14 Nov, 2006 1 commit
    • Linus Torvalds's avatar
      Revert "[PATCH] fix Data Acess error in dup_fd" · 9a3a04ac
      Linus Torvalds authored
      This reverts commit 0130b0b3.
      Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was
      supposed to fix must be some unrelated memory corruption, and the "fix"
      actually causes more problems:
        "However, the new code does not look safe in all cases.  If some other
         task has opened more files while dup_fd() released oldf->file_lock, the
         new code will update open_files to the new larger value.  But newf was
         allocated with the old smaller value of open_files, therefore subsequent
         accesses to newf may try to write into unallocated memory."
      so revert it.
      Cc: Sharyathi Nagesh <sharyath@in.ibm.com>
      Cc: Sergey Vlasov <vsu@altlinux.ru>
      Cc: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  17. 13 Nov, 2006 1 commit
    • Sharyathi Nagesh's avatar
      [PATCH] fix Data Acess error in dup_fd · 0130b0b3
      Sharyathi Nagesh authored
      On running the Stress Test on machine for more than 72 hours following
      error message was observed.
      0:mon> e
      cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
          pc: c000000000060d90: .dup_fd+0x240/0x39c
          lr: c000000000060d6c: .dup_fd+0x21c/0x39c
          sp: c00000007ce2fa70
         msr: 800000000000b032
         dar: ffffffff00000028
       dsisr: 40000000
        current = 0xc000000074950980
        paca    = 0xc000000000454500
          pid   = 27330, comm = bash
      0:mon> t
      [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
      [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
      [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
      [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
      [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
      [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
      The problem is because of race window.  When if(expand) block is executed in
      dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be
      modified.  So actual open_files in oldf may not match with open_files
      Cc: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  18. 28 Oct, 2006 2 commits
  19. 17 Oct, 2006 1 commit
    • Peter Zijlstra's avatar
      [PATCH] rt-mutex: fixup rt-mutex debug code · bea493a0
      Peter Zijlstra authored
      BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted)
       [<c04051e3>] show_trace_log_lvl+0x58/0x16a
       [<c04057f0>] show_trace+0xd/0x10
       [<c0405900>] dump_stack+0x19/0x1b
       [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a
       [<c04224c0>] free_task+0x15/0x24
       [<c042378c>] copy_process+0x12bd/0x1324
       [<c0423835>] do_fork+0x42/0x113
       [<c04021dd>] sys_fork+0x19/0x1b
       [<c0403fb7>] syscall_call+0x7/0xb
      In copy_process(), dup_task_struct() also duplicates the ->pi_lock,
      ->pi_waiters and ->pi_blocked_on members.  rt_mutex_debug_task_free()
      called from free_task() validates these members.  However free_task() can
      be invoked before these members are reset for the new task.
      Move the initialization code before the first bail that can hit free_task().
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  20. 02 Oct, 2006 6 commits