1. 18 Jun, 2009 1 commit
  2. 24 Nov, 2008 1 commit
    • Serge Hallyn's avatar
      User namespaces: set of cleanups (v2) · 18b6e041
      Serge Hallyn authored
      The user_ns is moved from nsproxy to user_struct, so that a struct
      cred by itself is sufficient to determine access (which it otherwise
      would not be).  Corresponding ecryptfs fixes (by David Howells) are
      here as well.
      Fix refcounting.  The following rules now apply:
              1. The task pins the user struct.
              2. The user struct pins its user namespace.
              3. The user namespace pins the struct user which created it.
      User namespaces are cloned during copy_creds().  Unsharing a new user_ns
      is no longer possible.  (We could re-add that, but it'll cause code
      duplication and doesn't seem useful if PAM doesn't need to clone user
      When a user namespace is created, its first user (uid 0) gets empty
      keyrings and a clean group_info.
      This incorporates a previous patch by David Howells.  Here
      is his original patch description:
      >I suggest adding the attached incremental patch.  It makes the following
      > (1) Provides a current_user_ns() macro to wrap accesses to current's user
      >     namespace.
      > (2) Fixes eCryptFS.
      > (3) Renames create_new_userns() to create_user_ns() to be more consistent
      >     with the other associated functions and because the 'new' in the name is
      >     superfluous.
      > (4) Moves the argument and permission checks made for CLONE_NEWUSER to the
      >     beginning of do_fork() so that they're done prior to making any attempts
      >     at allocation.
      > (5) Calls create_user_ns() after prepare_creds(), and gives it the new creds
      >     to fill in rather than have it return the new root user.  I don't imagine
      >     the new root user being used for anything other than filling in a cred
      >     struct.
      >     This also permits me to get rid of a get_uid() and a free_uid(), as the
      >     reference the creds were holding on the old user_struct can just be
      >     transferred to the new namespace's creator pointer.
      > (6) Makes create_user_ns() reset the UIDs and GIDs of the creds under
      >     preparation rather than doing it in copy_creds().
      >Signed-off-by: David Howells <dhowells@redhat.com>
      	Oct 20: integrate dhowells comments
      		1. leave thread_keyring alone
      		2. use current_user_ns() in set_user()
      Signed-off-by: default avatarSerge Hallyn <serue@us.ibm.com>
  3. 23 Aug, 2008 1 commit
  4. 25 Jul, 2008 1 commit
    • Serge E. Hallyn's avatar
      cgroup_clone: use pid of newly created task for new cgroup · e885dcde
      Serge E. Hallyn authored
      cgroup_clone creates a new cgroup with the pid of the task.  This works
      correctly for unshare, but for clone cgroup_clone is called from
      copy_namespaces inside copy_process, which happens before the new pid is
      created.  As a result, the new cgroup was created with current's pid.
      This patch:
      	1. Moves the call inside copy_process to after the new pid
      	   is created
      	2. Passes the struct pid into ns_cgroup_clone (as it is not
      	   yet attached to the task)
      	3. Passes a name from ns_cgroup_clone() into cgroup_clone()
      	   so as to keep cgroup_clone() itself simpler
      	4. Uses pid_vnr() to get the process id value, so that the
      	   pid used to name the new cgroup is always the pid as it
      	   would be known to the task which did the cloning or
      	   unsharing.  I think that is the most intuitive thing to
      	   do.  This way, task t1 does clone(CLONE_NEWPID) to get
      	   t2, which does clone(CLONE_NEWPID) to get t3, then the
      	   cgroup for t3 will be named for the pid by which t2 knows
      (Thanks to Dan Smith for finding the main bug)
      	June 11: Incorporate Paul Menage's feedback:  don't pass
      	         NULL to ns_cgroup_clone from unshare, and reduce
      		 patch size by using 'nodename' in cgroup_clone.
      	June 10: Original version
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarSerge Hallyn <serge@us.ibm.com>
      Acked-by: default avatarPaul Menage <menage@google.com>
      Tested-by: default avatarDan Smith <danms@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  5. 29 Apr, 2008 1 commit
    • Serge E. Hallyn's avatar
      ipc: sysvsem: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC) · 02fdb36a
      Serge E. Hallyn authored
      CLONE_NEWIPC|CLONE_SYSVSEM interaction isn't handled properly.  This can cause
      a kernel memory corruption.  CLONE_NEWIPC must detach from the existing undo
      Fix, part 3: refuse clone(CLONE_SYSVSEM|CLONE_NEWIPC).
      With unshare, specifying CLONE_SYSVSEM means unshare the sysvsem.  So it seems
      reasonable that CLONE_NEWIPC without CLONE_SYSVSEM would just imply
      However with clone, specifying CLONE_SYSVSEM means *share* the sysvsem.  So
      calling clone(CLONE_SYSVSEM|CLONE_NEWIPC) is explicitly asking for something
      we can't allow.  So return -EINVAL in that case.
      [akpm@linux-foundation.org: cleanups]
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  6. 08 Feb, 2008 1 commit
    • Pavel Emelyanov's avatar
      namespaces: move the IPC namespace under IPC_NS option · ae5e1b22
      Pavel Emelyanov authored
      Currently the IPC namespace management code is spread over the ipc/*.c files.
      I moved this code into ipc/namespace.c file which is compiled out when needed.
      The linux/ipc_namespace.h file is used to store the prototypes of the
      functions in namespace.c and the stubs for NAMESPACES=n case.  This is done
      so, because the stub for copy_ipc_namespace requires the knowledge of the
      CLONE_NEWIPC flag, which is in sched.h.  But the linux/ipc.h file itself in
      included into many many .c files via the sys.h->sem.h sequence so adding the
      sched.h into it will make all these .c depend on sched.h which is not that
      good.  On the other hand the knowledge about the namespaces stuff is required
      in 4 .c files only.
      Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
      msg.c and shm.c files.  It turned out that moving these functions into
      namespaces.c is not that easy because they use many other calls and macros
      from the original file.  Moving them would make this patch complicated.  On
      the other hand all these functions can be consolidated, so I will send a
      separate patch doing this a bit later.
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Acked-by: default avatarSerge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  7. 19 Oct, 2007 4 commits
    • Pavel Emelyanov's avatar
      pid namespaces: allow cloning of new namespace · 30e49c26
      Pavel Emelyanov authored
      When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then
      create a new struct pid for the new process.  Allocate pid_t's for the new
      process in the new pid namespace and all ancestor pid namespaces.  Make the
      newly cloned process the session and process group leader.
      Since the active pid namespace is special and expected to be the first entry
      in pid->upid_list, preserve the order of pid namespaces.
      The size of 'struct pid' is dependent on the the number of pid namespaces the
      process exists in, so we use multiple pid-caches'.  Only one pid cache is
      created during system startup and this used by processes that exist only in
      When a process clones its pid namespace, we create additional pid caches as
      necessary and use the pid cache to allocate 'struct pids' for that depth.
      Note, that with this patch the newly created namespace won't work, since the
      rest of the kernel still uses global pids, but this is to be fixed soon.  Init
      pid namespace still works.
      [oleg@tv-sign.ru: merge fix]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Pavel Emelyanov's avatar
      Make access to task's nsproxy lighter · cf7b708c
      Pavel Emelyanov authored
      When someone wants to deal with some other taks's namespaces it has to lock
      the task and then to get the desired namespace if the one exists.  This is
      slow on read-only paths and may be impossible in some cases.
      E.g.  Oleg recently noticed a race between unshare() and the (sent for
      review in cgroups) pid namespaces - when the task notifies the parent it
      has to know the parent's namespace, but taking the task_lock() is
      impossible there - the code is under write locked tasklist lock.
      On the other hand switching the namespace on task (daemonize) and releasing
      the namespace (after the last task exit) is rather rare operation and we
      can sacrifice its speed to solve the issues above.
      The access to other task namespaces is proposed to be performed
      like this:
           nsproxy = task_nsproxy(tsk);
           if (nsproxy != NULL) {
                   / *
                     * work with the namespaces here
                     * e.g. get the reference on one of them
                     * /
           } / *
               * NULL task_nsproxy() means that this task is
               * almost dead (zombie)
               * /
      This patch has passed the review by Eric and Oleg :) and,
      of course, tested.
      [clg@fr.ibm.com: fix unshare()]
      [ebiederm@xmission.com: Update get_net_ns_by_pid]
      Signed-off-by: default avatarPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Sukadev Bhattiprolu's avatar
      pid namespaces: define and use task_active_pid_ns() wrapper · 2894d650
      Sukadev Bhattiprolu authored
      With multiple pid namespaces, a process is known by some pid_t in every
      ancestor pid namespace.  Every time the process forks, the child process also
      gets a pid_t in every ancestor pid namespace.
      While a process is visible in >=1 pid namespaces, it can see pid_t's in only
      one pid namespace.  We call this pid namespace it's "active pid namespace",
      and it is always the youngest pid namespace in which the process is known.
      This patch defines and uses a wrapper to find the active pid namespace of a
      process.  The implementation of the wrapper will be changed in when support
      for multiple pid namespaces are added.
      	- [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
      	  task_active_pid_ns() in child_reaper() since task->nsproxy
      	  can be NULL during task exit (so child_reaper() continues to
      	  use init_pid_ns).
      	  to implement child_reaper() since init_pid_ns.child_reaper to
      	  implement child_reaper() since tsk->nsproxy can be NULL during exit.
      	- Rename task_pid_ns() to task_active_pid_ns() to reflect that a
      	  process can have multiple pid namespaces.
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: default avatarPavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Serge E. Hallyn's avatar
      cgroups: implement namespace tracking subsystem · 858d72ea
      Serge E. Hallyn authored
      When a task enters a new namespace via a clone() or unshare(), a new cgroup
      is created and the task moves into it.
      This version names cgroups which are automatically created using
      cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or
      cloned process.  (Thanks Pavel for the idea) This is safe because if the
      process unshares again, it will create
      The only possibilities (AFAICT) for a -EEXIST on unshare are
      	1. pid wraparound
      	2. a process fails an unshare, then tries again.
      Case 1 is unlikely enough that I ignore it (at least for now).  In case 2, the
      node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare()
      	Name cloned cgroups as "node_<pid>".
      [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig]
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  8. 17 Oct, 2007 1 commit
  9. 10 Oct, 2007 1 commit
    • Eric W. Biederman's avatar
      [NET]: Add network namespace clone & unshare support. · 9dd776b6
      Eric W. Biederman authored
      This patch allows you to create a new network namespace
      using sys_clone, or sys_unshare.
      As the network namespace is still experimental and under development
      clone and unshare support is only made available when CONFIG_NET_NS is
      selected at compile time.
      As this patch introduces network namespace support into code paths
      that exist when the CONFIG_NET is not selected there are a few
      additions made to net_namespace.h to allow a few more functions
      to be used when the networking stack is not compiled in.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  10. 20 Jul, 2007 1 commit
    • Paul Mundt's avatar
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt authored
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: default avatarPaul Mundt <lethal@linux-sh.org>
  11. 16 Jul, 2007 6 commits
  12. 24 Jun, 2007 1 commit
  13. 08 May, 2007 1 commit
    • Badari Pulavarty's avatar
      Merge sys_clone()/sys_unshare() nsproxy and namespace handling · e3222c4e
      Badari Pulavarty authored
      sys_clone() and sys_unshare() both makes copies of nsproxy and its associated
      namespaces.  But they have different code paths.
      This patch merges all the nsproxy and its associated namespace copy/clone
      handling (as much as possible).  Posted on container list earlier for
      - Create a new nsproxy and its associated namespaces and pass it back to
        caller to attach it to right process.
      - Changed all copy_*_ns() routines to return a new copy of namespace
        instead of attaching it to task->nsproxy.
      - Moved the CAP_SYS_ADMIN checks out of copy_*_ns() routines.
      - Removed unnessary !ns checks from copy_*_ns() and added BUG_ON()
        just incase.
      - Get rid of all individual unshare_*_ns() routines and make use of
        copy_*_ns() instead.
      [akpm@osdl.org: cleanups, warning fix]
      [clg@fr.ibm.com: remove dup_namespaces() declaration]
      [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval]
      [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n]
      Signed-off-by: default avatarBadari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: default avatarSerge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <containers@lists.osdl.org>
      Signed-off-by: default avatarCedric Le Goater <clg@fr.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  14. 30 Jan, 2007 2 commits
  15. 13 Dec, 2006 1 commit
  16. 08 Dec, 2006 3 commits
  17. 20 Oct, 2006 1 commit
  18. 02 Oct, 2006 7 commits