1. 09 Nov, 2007 1 commit
    • Srivatsa Vaddagiri's avatar
      sched: fix copy_namespace() <-> sched_fork() dependency in do_fork · 3c90e6e9
      Srivatsa Vaddagiri authored
      Sukadev Bhattiprolu reported a kernel crash with control groups.
      There are couple of problems discovered by Suka's test:
      
      - The test requires the cgroup filesystem to be mounted with
        atleast the cpu and ns options (i.e both namespace and cpu 
        controllers are active in the same hierarchy). 
      
      	# mkdir /dev/cpuctl
      	# mount -t cgroup -ocpu,ns none cpuctl
      	(or simply)
      	# mount -t cgroup none cpuctl -> Will activate all controllers
      					 in same hierarchy.
      
      - The test invokes clone() with CLONE_NEWNS set. This causes a a new child
        to be created, also a new group (do_fork->copy_namespaces->ns_cgroup_clone->
        cgroup_clone) and the child is attached to the new group (cgroup_clone->
        attach_task->sched_move_task). At this point in time, the child's scheduler 
        related fields are uninitialized (including its on_rq field, which it has
        inherited from parent). As a result sched_move_task thinks its on
        runqueue, when it isn't.
      
        As a solution to this problem, I moved sched_fork() call, which
        initializes scheduler related fields on a new task, before
        copy_namespaces(). I am not sure though whether moving up will
        cause other side-effects. Do you see any issue?
      
      - The second problem exposed by this test is that task_new_fair()
        assumes that parent and child will be part of the same group (which 
        needn't be as this test shows). As a result, cfs_rq->curr can be NULL
        for the child.
      
        The solution is to test for curr pointer being NULL in
        task_new_fair().
      
      With the patch below, I could run ns_exec() fine w/o a crash.
      Reported-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: default avatarSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3c90e6e9
  2. 29 Oct, 2007 2 commits
  3. 19 Oct, 2007 15 commits
    • Alexey Dobriyan's avatar
      Uninline fork.c/exit.c · a39bc516
      Alexey Dobriyan authored
      Save ~650 bytes here.
      
      add/remove: 4/0 grow/shrink: 0/7 up/down: 430/-1088 (-658)
      function                                     old     new   delta
      __copy_fs_struct                               -     202    +202
      __put_fs_struct                                -     112    +112
      __exit_fs                                      -      58     +58
      __exit_files                                   -      58     +58
      exit_files                                    58       2     -56
      put_fs_struct                                112       5    -107
      exit_fs                                      161       2    -159
      sys_unshare                                  774     590    -184
      copy_process                                4031    3840    -191
      do_exit                                     1791    1597    -194
      copy_fs_struct                               202       5    -197
      
      No difference in lmbench lat_proc tests on 2-way Opteron 246.
      Smaaaal degradation on UP P4 (within errors).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@sw.ru>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a39bc516
    • Mariusz Kozlowski's avatar
      kernel/fork.c: remove unneeded variable initialization in copy_process() · a24efe62
      Mariusz Kozlowski authored
      This initialization of is not needed so just remove it.
      Signed-off-by: default avatarMariusz Kozlowski <m.kozlowski@tuxland.pl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a24efe62
    • Pavel Emelyanov's avatar
      Isolate the explicit usage of signal->pgrp · 9a2e7057
      Pavel Emelyanov authored
      The pgrp field is not used widely around the kernel so it is now marked as
      deprecated with appropriate comment.
      
      The initialization of INIT_SIGNALS is trimmed because
      a) they are set to 0 automatically;
      b) gcc cannot properly initialize two anonymous (the second one
         is the one with the session) unions. In this particular case
         to make it compile we'd have to add some field initialized
         right before the .pgrp.
      
      This is the same patch as the 1ec320af one
      (from Cedric), but for the pgrp field.
      
      Some progress report:
      
      We have to deprecate the pid, tgid, session and pgrp fields on struct
      task_struct and struct signal_struct.  The session and pgrp are already
      deprecated.  The tgid value is close to being such - the worst known usage
      in in fs/locks.c and audit code.  The pid field deprecation is mainly
      blocked by numerous printk-s around the kernel that print the tsk->pid to
      log.
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a2e7057
    • Eugene Teo's avatar
      Fix tsk->exit_state usage · 270f722d
      Eugene Teo authored
      tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD.  A non-zero test
      is the same as tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD), so just testing
      tsk->exit_state is sufficient.
      Signed-off-by: default avatarEugene Teo <eugeneteo@kernel.sg>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      270f722d
    • Pavel Emelyanov's avatar
      pid namespaces: changes to show virtual ids to user · b488893a
      Pavel Emelyanov authored
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or get from user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used, but however when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one need to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b488893a
    • Pavel Emelyanov's avatar
      pid namespaces: initialize the namespace's proc_mnt · 6f4e6433
      Pavel Emelyanov authored
      The namespace's proc_mnt must be kern_mount-ed to make this pointer always
      valid, independently of whether the user space mounted the proc or not.  This
      solves raced in proc_flush_task, etc.  with the proc_mnt switching from NULL
      to not-NULL.
      
      The initialization is done after the init's pid is created and hashed to make
      proc_get_sb() finr it and get for root inode.
      
      Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the
      superblock holds the namespace we must explicitly break this circle to destroy
      all the stuff.  This is done after the init of the namespace dies.  Running a
      few steps forward - when init exits it will kill all its children, so no
      proc_mnt will be needed after its death.
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6f4e6433
    • Pavel Emelyanov's avatar
      pid namespaces: allow cloning of new namespace · 30e49c26
      Pavel Emelyanov authored
      When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
      create a new struct pid for the new process.  Allocate pid_t's for the new
      process in the new pid namespace and all ancestor pid namespaces.  Make the
      newly cloned process the session and process group leader.
      
      Since the active pid namespace is special and expected to be the first entry
      in pid->upid_list, preserve the order of pid namespaces.
      
      The size of 'struct pid' is dependent on the the number of pid namespaces the
      process exists in, so we use multiple pid-caches'.  Only one pid cache is
      created during system startup and this used by processes that exist only in
      init_pid_ns.
      
      When a process clones its pid namespace, we create additional pid caches as
      necessary and use the pid cache to allocate 'struct pids' for that depth.
      
      Note, that with this patch the newly created namespace won't work, since the
      rest of the kernel still uses global pids, but this is to be fixed soon.  Init
      pid namespace still works.
      
      [oleg@tv-sign.ru: merge fix]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30e49c26
    • Pavel Emelyanov's avatar
      pid namespaces: move alloc_pid() lower in copy_process() · 425fb2b4
      Pavel Emelyanov authored
      When we create new namespace we will need to allocate the struct pid, that
      will have one extra struct upid in array, comparing to the parent.
      
      Thus we need to know the new namespace (if any) in alloc_pid() to init this
      struct upid properly, so move the alloc_pid() call lower in copy_process().
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      425fb2b4
    • Pavel Emelyanov's avatar
      pid namespaces: make alloc_pid(), free_pid() and put_pid() work with struct upid · 8ef047aa
      Pavel Emelyanov authored
      Each struct upid element of struct pid has to be initialized properly, i.e.
      its nr mst be allocated from appropriate pidmap and ns set to appropriate
      namespace.
      
      When allocating a new pid, we need to know the namespace this pid will live
      in, so the additional argument is added to alloc_pid().
      
      On the other hand, the rest of the kernel still uses the pid->nr and
      pid->pid_chain fields, so these ones are still initialized, but this will be
      removed soon.
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8ef047aa
    • Pavel Emelyanov's avatar
      Make access to task's nsproxy lighter · cf7b708c
      Pavel Emelyanov authored
      When someone wants to deal with some other taks's namespaces it has to lock
      the task and then to get the desired namespace if the one exists.  This is
      slow on read-only paths and may be impossible in some cases.
      
      E.g.  Oleg recently noticed a race between unshare() and the (sent for
      review in cgroups) pid namespaces - when the task notifies the parent it
      has to know the parent's namespace, but taking the task_lock() is
      impossible there - the code is under write locked tasklist lock.
      
      On the other hand switching the namespace on task (daemonize) and releasing
      the namespace (after the last task exit) is rather rare operation and we
      can sacrifice its speed to solve the issues above.
      
      The access to other task namespaces is proposed to be performed
      like this:
      
           rcu_read_lock();
           nsproxy = task_nsproxy(tsk);
           if (nsproxy != NULL) {
                   / *
                     * work with the namespaces here
                     * e.g. get the reference on one of them
                     * /
           } / *
               * NULL task_nsproxy() means that this task is
               * almost dead (zombie)
               * /
           rcu_read_unlock();
      
      This patch has passed the review by Eric and Oleg :) and,
      of course, tested.
      
      [clg@fr.ibm.com: fix unshare()]
      [ebiederm@xmission.com: Update get_net_ns_by_pid]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cf7b708c
    • Sukadev Bhattiprolu's avatar
      pid namespaces: move alloc_pid() to copy_process() · a6f5e063
      Sukadev Bhattiprolu authored
      Move alloc_pid() into copy_process().  This will keep all pid and pid
      namespace code together and simplify error handling when we support multiple
      pid namespaces.
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a6f5e063
    • Pavel Emelianov's avatar
      pid namespaces: round up the API · a47afb0f
      Pavel Emelianov authored
      The set of functions process_session, task_session, process_group and
      task_pgrp is confusing, as the names can be mixed with each other when looking
      at the code for a long time.
      
      The proposals are to
      * equip the functions that return the integer with _nr suffix to
        represent that fact,
      * and to make all functions work with task (not process) by making
        the common prefix of the same name.
      
      For monotony the routines signal_session() and set_signal_session() are
      replaced with task_session_nr() and set_task_session(), especially since they
      are only used with the explicit task->signal dereference.
      Signed-off-by: default avatarPavel Emelianov <xemul@openvz.org>
      Acked-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a47afb0f
    • Paul Menage's avatar
      Task Control Groups: make cpusets a client of cgroups · 8793d854
      Paul Menage authored
      Remove the filesystem support logic from the cpusets system and makes cpusets
      a cgroup subsystem
      
      The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
      passed through to the cgroup filesystem with the appropriate options to
      emulate the old cpuset filesystem behaviour.
      Signed-off-by: default avatarPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8793d854
    • Paul Menage's avatar
      Task Control Groups: shared cgroup subsystem group arrays · 817929ec
      Paul Menage authored
      Replace the struct css_set embedded in task_struct with a pointer; all tasks
      that have the same set of memberships across all hierarchies will share a
      css_set object, and will be linked via their css_sets field to the "tasks"
      list_head in the css_set.
      
      Assuming that many tasks share the same cgroup assignments, this reduces
      overall space usage and keeps the size of the task_struct down (three pointers
      added to task_struct compared to a non-cgroups kernel, no matter how many
      subsystems are registered).
      
      [akpm@linux-foundation.org: fix a printk]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: default avatarPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      817929ec
    • Paul Menage's avatar
      Task Control Groups: add fork()/exit() hooks · b4f48b63
      Paul Menage authored
      This adds the necessary hooks to the fork() and exit() paths to ensure
      that new children inherit their parent's cgroup assignments, and that
      exiting processes release reference counts on their cgroups.
      Signed-off-by: default avatarPaul Menage <menage@google.com>
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b4f48b63
  4. 18 Oct, 2007 4 commits
  5. 17 Oct, 2007 4 commits
  6. 15 Oct, 2007 1 commit
  7. 10 Oct, 2007 1 commit
    • Eric W. Biederman's avatar
      [NET]: Add network namespace clone & unshare support. · 9dd776b6
      Eric W. Biederman authored
      This patch allows you to create a new network namespace
      using sys_clone, or sys_unshare.
      
      As the network namespace is still experimental and under development
      clone and unshare support is only made available when CONFIG_NET_NS is
      selected at compile time.
      
      As this patch introduces network namespace support into code paths
      that exist when the CONFIG_NET is not selected there are a few
      additions made to net_namespace.h to allow a few more functions
      to be used when the networking stack is not compiled in.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9dd776b6
  8. 20 Sep, 2007 1 commit
    • Davide Libenzi's avatar
      signalfd simplification · b8fceee1
      Davide Libenzi authored
      This simplifies signalfd code, by avoiding it to remain attached to the
      sighand during its lifetime.
      
      In this way, the signalfd remain attached to the sighand only during
      poll(2) (and select and epoll) and read(2).  This also allows to remove
      all the custom "tsk == current" checks in kernel/signal.c, since
      dequeue_signal() will only be called by "current".
      
      I think this is also what Ben was suggesting time ago.
      
      The external effect of this, is that a thread can extract only its own
      private signals and the group ones.  I think this is an acceptable
      behaviour, in that those are the signals the thread would be able to
      fetch w/out signalfd.
      Signed-off-by: default avatarDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b8fceee1
  9. 20 Jul, 2007 1 commit
    • Paul Mundt's avatar
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt authored
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: default avatarPaul Mundt <lethal@linux-sh.org>
      20c2df83
  10. 19 Jul, 2007 3 commits
  11. 17 Jul, 2007 1 commit
    • Rafael J. Wysocki's avatar
      Freezer: make kernel threads nonfreezable by default · 83144186
      Rafael J. Wysocki authored
      Currently, the freezer treats all tasks as freezable, except for the kernel
      threads that explicitly set the PF_NOFREEZE flag for themselves.  This
      approach is problematic, since it requires every kernel thread to either
      set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
      care for the freezing of tasks at all.
      
      It seems better to only require the kernel threads that want to or need to
      be frozen to use some freezer-related code and to remove any
      freezer-related code from the other (nonfreezable) kernel threads, which is
      done in this patch.
      
      The patch causes all kernel threads to be nonfreezable by default (ie.  to
      have PF_NOFREEZE set by default) and introduces the set_freezable()
      function that should be called by the freezable kernel threads in order to
      unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
      threads call set_freezable(), so it shouldn't cause any (intentional)
      change of behaviour to appear.  Additionally, it updates documentation to
      describe the freezing of tasks more accurately.
      
      [akpm@linux-foundation.org: build fixes]
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: default avatarNigel Cunningham <nigel@nigel.suspend2.net>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      83144186
  12. 16 Jul, 2007 4 commits
    • Serge E. Hallyn's avatar
      user namespace: add unshare · 77ec739d
      Serge E. Hallyn authored
      This patch enables the unshare of user namespaces.
      
      It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
      resets the current user_struct and adds a new root user (uid == 0)
      
      For now, unsharing the user namespace allows a process to reset its
      user_struct accounting and uid 0 in the new user namespace should be contained
      using appropriate means, for instance selinux
      
      The plan, when the full support is complete (all uid checks covered), is to
      keep the original user's rights in the original namespace, and let a process
      become uid 0 in the new namespace, with full capabilities to the new
      namespace.
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Acked-by: default avatarPavel Emelianov <xemul@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andrew Morgan <agm@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      77ec739d
    • Cedric Le Goater's avatar
      user namespace: add the framework · acce292c
      Cedric Le Goater authored
      Basically, it will allow a process to unshare its user_struct table,
      resetting at the same time its own user_struct and all the associated
      accounting.
      
      A new root user (uid == 0) is added to the user namespace upon creation.
      Such root users have full privileges and it seems that theses privileges
      should be controlled through some means (process capabilities ?)
      
      The unshare is not included in this patch.
      
      Changes since [try #4]:
      	- Updated get_user_ns and put_user_ns to accept NULL, and
      	  get_user_ns to return the namespace.
      
      Changes since [try #3]:
      	- moved struct user_namespace to files user_namespace.{c,h}
      
      Changes since [try #2]:
      	- removed struct user_namespace* argument from find_user()
      
      Changes since [try #1]:
      	- removed struct user_namespace* argument from find_user()
      	- added a root_user per user namespace
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Acked-by: default avatarPavel Emelianov <xemul@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andrew Morgan <agm@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      acce292c
    • Miloslav Trmac's avatar
      Audit: add TTY input auditing · 522ed776
      Miloslav Trmac authored
      Add TTY input auditing, used to audit system administrator's actions.  This is
      required by various security standards such as DCID 6/3 and PCI to provide
      non-repudiation of administrator's actions and to allow a review of past
      actions if the administrator seems to overstep their duties or if the system
      becomes misconfigured for unknown reasons.  These requirements do not make it
      necessary to audit TTY output as well.
      
      Compared to an user-space keylogger, this approach records TTY input using the
      audit subsystem, correlated with other audit events, and it is completely
      transparent to the user-space application (e.g.  the console ioctls still
      work).
      
      TTY input auditing works on a higher level than auditing all system calls
      within the session, which would produce an overwhelming amount of mostly
      useless audit events.
      
      Add an "audit_tty" attribute, inherited across fork ().  Data read from TTYs
      by process with the attribute is sent to the audit subsystem by the kernel.
      The audit netlink interface is extended to allow modifying the audit_tty
      attribute, and to allow sending explanatory audit events from user-space (for
      example, a shell might send an event containing the final command, after the
      interactive command-line editing and history expansion is performed, which
      might be difficult to decipher from the TTY input alone).
      
      Because the "audit_tty" attribute is inherited across fork (), it would be set
      e.g.  for sshd restarted within an audited session.  To prevent this, the
      audit_tty attribute is cleared when a process with no open TTY file
      descriptors (e.g.  after daemon startup) opens a TTY.
      
      See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
      more detailed rationale document for an older version of this patch.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: default avatarMiloslav Trmac <mitr@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Paul Fulghum <paulkf@microgate.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Steve Grubb <sgrubb@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      522ed776
    • Tomas Janousek's avatar
      Use boot based time for process start time and boot time in /proc · 924b42d5
      Tomas Janousek authored
      Commit 411187fb caused boot time to move and
      process start times to become invalid after suspend.  Using boot based time
      for those restores the old behaviour and fixes the issue.
      
      [akpm@linux-foundation.org: little cleanup]
      Signed-off-by: default avatarTomas Janousek <tjanouse@redhat.com>
      Cc: Tomas Smetana <tsmetana@redhat.com>
      Acked-by: default avatarJohn Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      924b42d5
  13. 09 Jul, 2007 1 commit
  14. 24 May, 2007 1 commit
    • Rafael J. Wysocki's avatar
      freezer: fix vfork problem · ba96a0c8
      Rafael J. Wysocki authored
      Currently try_to_freeze_tasks() has to wait until all of the vforked processes
      exit and for this reason every user can make it fail.  To fix this problem we
      can introduce the additional process flag PF_FREEZER_SKIP to be used by tasks
      that do not want to be counted as freezable by the freezer and want to have
      TIF_FREEZE set nevertheless.  Then, this flag can be set by tasks using
      sys_vfork() before they call wait_for_completion(&vfork) and cleared after
      they have woken up.  After clearing it, the tasks should call try_to_freeze()
      as soon as possible.
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ba96a0c8