1. 11 Feb, 2007 1 commit
  2. 02 Feb, 2007 1 commit
  3. 30 Jan, 2007 2 commits
  4. 13 Dec, 2006 1 commit
  5. 10 Dec, 2006 4 commits
    • Vadim Lobanov's avatar
      [PATCH] fdtable: Remove the free_files field · 4fd45812
      Vadim Lobanov authored
      
      
      An fdtable can either be embedded inside a files_struct or standalone (after
      being expanded).  When an fdtable is being discarded after all RCU references
      to it have expired, we must either free it directly, in the standalone case,
      or free the files_struct it is contained within, in the embedded case.
      
      Currently the free_files field controls this behavior, but we can get rid of
      it entirely, as all the necessary information is already recorded.  We can
      distinguish embedded and standalone fdtables using max_fds, and if it is
      embedded we can divine the relevant files_struct using container_of().
      Signed-off-by: default avatarVadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      4fd45812
    • Vadim Lobanov's avatar
      [PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69
      Vadim Lobanov authored
      
      
      Currently, each fdtable supports three dynamically-sized arrays of data: the
      fdarray and two fdsets.  The code allows the number of fds supported by the
      fdarray (fdtable->max_fds) to differ from the number of fds supported by each
      of the fdsets (fdtable->max_fdset).
      
      In practice, it is wasteful for these two sizes to differ: whenever we hit a
      limit on the smaller-capacity structure, we will reallocate the entire fdtable
      and all the dynamic arrays within it, so any delta in the memory used by the
      larger-capacity structure will never be touched at all.
      
      Rather than hogging this excess, we shouldn't even allocate it in the first
      place, and keep the capacities of the fdarray and the fdsets equal.  This
      patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
      code becomes simpler.
      Signed-off-by: default avatarVadim Lobanov <vlobanov@speakeasy.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      bbea9f69
    • Vadim Lobanov's avatar
      [PATCH] fdtable: Delete pointless code in dup_fd() · f3d19c90
      Vadim Lobanov authored
      
      
      The dup_fd() function creates a new files_struct and fdtable embedded inside
      that files_struct, and then possibly expands the fdtable using expand_files().
      
      The out_release error path is invoked when expand_files() returns an error
      code.  However, when this attempt to expand fails, the fdtable is left in its
      original embedded form, so it is pointless to try to free the associated
      fdarray and fdsets.
      Signed-off-by: default avatarVadim Lobanov <vlobanov@speakeasy.net>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      f3d19c90
    • Andrew Morton's avatar
      [PATCH] io-accounting: core statistics · 7c3ab738
      Andrew Morton authored
      
      
      The present per-task IO accounting isn't very useful.  It simply counts the
      number of bytes passed into read() and write().  So if a process reads 1MB
      from an already-cached file, it is accused of having performed 1MB of I/O,
      which is wrong.
      
      (David Wright had some comments on the applicability of the present logical IO accounting:
      
        For billing purposes it is useless but for workload analysis it is very
        useful
      
        read_bytes/read_calls  average read request size
        write_bytes/write_calls average write request size
      
        read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
        write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                      be missed
      
        I often look for logical larger than physical to see filesystem cache
        problems.  And the bytes/cpusec can help find applications that are
        dominating the cache and causing slow interactive response from page cache
        contention.
      
        I want to find the IO intensive applications and make sure they are doing
        efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
        IO commands).
      
      This patchset adds new accounting which tries to be more accurate.  We account
      for three things:
      
      reads:
      
        attempt to count the number of bytes which this process really did cause
        to be fetched from the storage layer.  Done at the submit_bio() level, so it
        is accurate for block-backed filesystems.  I also attempt to wire up NFS and
        CIFS.
      
      writes:
      
        attempt to count the number of bytes which this process caused to be sent
        to the storage layer.  This is done at page-dirtying time.
      
        The big inaccuracy here is truncate.  If a process writes 1MB to a file
        and then deletes the file, it will in fact perform no writeout.  But it will
        have been accounted as having caused 1MB of write.
      
        So...
      
      cancelled_writes:
      
        account the number of bytes which this process caused to not happen, by
        truncating pagecache.
      
        We _could_ just subtract this from the process's `write' accounting.  But
        that means that some processes would be reported to have done negative
        amounts of write IO, which is silly.
      
        So we just report the raw number and punt this decision up to userspace.
      
      Now, we _could_ account for writes at the physical I/O level.  But
      
      - This would require that we track memory-dirtying tasks at the per-page
        level (would require a new pointer in struct page).
      
      - It would mean that IO statistics for a process are usually only available
        long after that process has exitted.  Which means that we probably cannot
        communicate this info via taskstats.
      
      This patch:
      
      Wire up the kernel-private data structures and the accessor functions to
      manipulate them.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      7c3ab738
  6. 08 Dec, 2006 5 commits
  7. 07 Dec, 2006 6 commits
    • Oleg Nesterov's avatar
      [PATCH] taskstats: cleanup ->signal->stats allocation · 34ec1234
      Oleg Nesterov authored
      
      
      Allocate ->signal->stats on demand in taskstats_exit(), this allows us to
      remove taskstats_tgid_alloc() (the last non-trivial inline) from taskstat's
      public interface.
      Signed-off-by: default avatarOleg Nesterov <oleg@tv-sign.ru>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      34ec1234
    • Roland McGrath's avatar
      [PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit · fec1d011
      Roland McGrath authored
      
      
      The CLONE_CHILD_CLEARTID flag is used by NPTL to have its threads
      communicate via memory/futex when they exit, so pthread_join can
      synchronize using a simple futex wait.  The word of user memory where NPTL
      stores a thread's own TID is what it passes; this gets reset to zero at
      thread exit.
      
      It is not desireable to touch this user memory when threads are dying due
      to a fatal signal.  A core dump is more usefully representative of the
      dying program state if the threads live at the time of the crash have their
      NPTL data structures unperturbed.  The userland expectation of
      CLONE_CHILD_CLEARTID has only ever been that it works for a thread making
      an _exit system call.
      
      This problem was identified by Ernie Petrides <petrides@redhat.com>.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Cc: Ernie Petrides <petrides@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      fec1d011
    • Christoph Lameter's avatar
      [PATCH] slab: remove kmem_cache_t · e18b890b
      Christoph Lameter authored
      
      
      Replace all uses of kmem_cache_t with struct kmem_cache.
      
      The patch was generated using the following script:
      
      	#!/bin/sh
      	#
      	# Replace one string by another in all the kernel sources.
      	#
      
      	set -e
      
      	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
      		quilt add $file
      		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
      		mv /tmp/$$ $file
      		quilt refresh
      	done
      
      The script was run like this
      
      	sh replace kmem_cache_t "struct kmem_cache"
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      e18b890b
    • Christoph Lameter's avatar
      [PATCH] slab: remove SLAB_KERNEL · e94b1766
      Christoph Lameter authored
      
      
      SLAB_KERNEL is an alias of GFP_KERNEL.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      e94b1766
    • Ashwin Chaugule's avatar
      [PATCH] new scheme to preempt swap token · 7602bdf2
      Ashwin Chaugule authored
      
      
      The new swap token patches replace the current token traversal algo.  The old
      algo had a crude timeout parameter that was used to handover the token from
      one task to another.  This algo, transfers the token to the tasks that are in
      need of the token.  The urgency for the token is based on the number of times
      a task is required to swap-in pages.  Accordingly, the priority of a task is
      incremented if it has been badly affected due to swap-outs.  To ensure that
      the token doesnt bounce around rapidly, the token holders are given a priority
      boost.  The priority of tasks is also decremented, if their rate of swap-in's
      keeps reducing.  This way, the condition to check whether to pre-empt the swap
      token, is a matter of comparing two task's priority fields.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: default avatarAshwin Chaugule <ashwin.chaugule@celunite.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      7602bdf2
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Use %gs as the PDA base-segment in the kernel · f95d47ca
      Jeremy Fitzhardinge authored
      
      
      This patch is the meat of the PDA change.  This patch makes several related
      changes:
      
      1: Most significantly, %gs is now used in the kernel.  This means that on
         entry, the old value of %gs is saved away, and it is reloaded with
         __KERNEL_PDA.
      
      2: entry.S constructs the stack in the shape of struct pt_regs, and this
         is passed around the kernel so that the process's saved register
         state can be accessed.
      
         Unfortunately struct pt_regs doesn't currently have space for %gs
         (or %fs). This patch extends pt_regs to add space for gs (no space
         is allocated for %fs, since it won't be used, and it would just
         complicate the code in entry.S to work around the space).
      
      3: Because %gs is now saved on the stack like %ds, %es and the integer
         registers, there are a number of places where it no longer needs to
         be handled specially; namely context switch, and saving/restoring the
         register state in a signal context.
      
      4: And since kernel threads run in kernel space and call normal kernel
         code, they need to be created with their %gs == __KERNEL_PDA.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      f95d47ca
  8. 25 Nov, 2006 1 commit
  9. 14 Nov, 2006 1 commit
    • Linus Torvalds's avatar
      Revert "[PATCH] fix Data Acess error in dup_fd" · 9a3a04ac
      Linus Torvalds authored
      This reverts commit 0130b0b3
      
      .
      
      Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was
      supposed to fix must be some unrelated memory corruption, and the "fix"
      actually causes more problems:
      
        "However, the new code does not look safe in all cases.  If some other
         task has opened more files while dup_fd() released oldf->file_lock, the
         new code will update open_files to the new larger value.  But newf was
         allocated with the old smaller value of open_files, therefore subsequent
         accesses to newf may try to write into unallocated memory."
      
      so revert it.
      
      Cc: Sharyathi Nagesh <sharyath@in.ibm.com>
      Cc: Sergey Vlasov <vsu@altlinux.ru>
      Cc: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      9a3a04ac
  10. 13 Nov, 2006 1 commit
    • Sharyathi Nagesh's avatar
      [PATCH] fix Data Acess error in dup_fd · 0130b0b3
      Sharyathi Nagesh authored
      
      
      On running the Stress Test on machine for more than 72 hours following
      error message was observed.
      
      0:mon> e
      cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0]
          pc: c000000000060d90: .dup_fd+0x240/0x39c
          lr: c000000000060d6c: .dup_fd+0x21c/0x39c
          sp: c00000007ce2fa70
         msr: 800000000000b032
         dar: ffffffff00000028
       dsisr: 40000000
        current = 0xc000000074950980
        paca    = 0xc000000000454500
          pid   = 27330, comm = bash
      
      0:mon> t
      [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable)
      [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88
      [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520
      [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4
      [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74
      [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc
      
      The problem is because of race window.  When if(expand) block is executed in
      dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be
      modified.  So actual open_files in oldf may not match with open_files
      variable.
      
      Cc: Vadim Lobanov <vlobanov@speakeasy.net>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      0130b0b3
  11. 28 Oct, 2006 2 commits
  12. 17 Oct, 2006 1 commit
    • Peter Zijlstra's avatar
      [PATCH] rt-mutex: fixup rt-mutex debug code · bea493a0
      Peter Zijlstra authored
      
      
      BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted)
       [<c04051e3>] show_trace_log_lvl+0x58/0x16a
       [<c04057f0>] show_trace+0xd/0x10
       [<c0405900>] dump_stack+0x19/0x1b
       [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a
       [<c04224c0>] free_task+0x15/0x24
       [<c042378c>] copy_process+0x12bd/0x1324
       [<c0423835>] do_fork+0x42/0x113
       [<c04021dd>] sys_fork+0x19/0x1b
       [<c0403fb7>] syscall_call+0x7/0xb
      
      In copy_process(), dup_task_struct() also duplicates the ->pi_lock,
      ->pi_waiters and ->pi_blocked_on members.  rt_mutex_debug_task_free()
      called from free_task() validates these members.  However free_task() can
      be invoked before these members are reset for the new task.
      
      Move the initialization code before the first bail that can hit free_task().
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      bea493a0
  13. 02 Oct, 2006 6 commits
  14. 01 Oct, 2006 1 commit
    • Jay Lan's avatar
      [PATCH] csa: convert CONFIG tag for extended accounting routines · 8f0ab514
      Jay Lan authored
      
      
      There were a few accounting data/macros that are used in CSA but are #ifdef'ed
      inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
      CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
      kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
      include/linux/tsacct_kern.h.
      Signed-off-by: default avatarJay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      8f0ab514
  15. 29 Sep, 2006 2 commits
  16. 26 Sep, 2006 1 commit
  17. 20 Sep, 2006 1 commit
  18. 01 Sep, 2006 1 commit
    • Shailabh Nagar's avatar
      [PATCH] task delay accounting fixes · 35df17c5
      Shailabh Nagar authored
      Cleanup allocation and freeing of tsk->delays used by delay accounting.
      This solves two problems reported for delay accounting:
      
      1. oops in __delayacct_blkio_ticks
      http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1844.html
      
      Currently tsk->delays is getting freed too early in task exit which can
      cause a NULL tsk->delays to get accessed via reading of /proc/<tgid>/stats.
       The patch fixes this problem by freeing tsk->delays closer to when
      task_struct itself is freed up.  As a result, it also eliminates the use of
      tsk->delays_lock which was only being used (inadequately) to safeguard
      access to tsk->delays while a task was exiting.
      
      2. Possible memory leak in kernel/delayacct.c
      http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1389.html
      
      
      
      The patch cleans up tsk->delays allocations after a bad fork which was
      missing earlier.
      
      The patch has been tested to fix the problems listed above and stress
      tested with rapid calls to delay accounting's taskstats command interface
      (which is the other path that can access the same data, besides the /proc
      interface causing the oops above).
      Signed-off-by: default avatarShailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      35df17c5
  19. 06 Aug, 2006 1 commit
  20. 15 Jul, 2006 1 commit
    • Shailabh Nagar's avatar
      [PATCH] delay accounting taskstats interface send tgid once · ad4ecbcb
      Shailabh Nagar authored
      
      
      Send per-tgid data only once during exit of a thread group instead of once
      with each member thread exit.
      
      Currently, when a thread exits, besides its per-tid data, the per-tgid data
      of its thread group is also sent out, if its thread group is non-empty.
      The per-tgid data sent consists of the sum of per-tid stats for all
      *remaining* threads of the thread group.
      
      This patch modifies this sending in two ways:
      
      - the per-tgid data is sent only when the last thread of a thread group
        exits.  This cuts down heavily on the overhead of sending/receiving
        per-tgid data, especially when other exploiters of the taskstats
        interface aren't interested in per-tgid stats
      
      - the semantics of the per-tgid data sent are changed.  Instead of being
        the sum of per-tid data for remaining threads, the value now sent is the
        true total accumalated statistics for all threads that are/were part of
        the thread group.
      
      The patch also addresses a minor issue where failure of one accounting
      subsystem to fill in the taskstats structure was causing the send of
      taskstats to not be sent at all.
      
      The patch has been tested for stability and run cerberus for over 4 hours
      on an SMP.
      
      [akpm@osdl.org: bugfixes]
      Signed-off-by: default avatarShailabh Nagar <nagar@watson.ibm.com>
      Signed-off-by: default avatarBalbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ad4ecbcb