1. 16 Nov, 2013 1 commit
  2. 15 Nov, 2013 4 commits
    • Tetsuo Handa's avatar
      seq_file: remove "%n" usage from seq_file users · 652586df
      Tetsuo Handa authored
      All seq_printf() users are using "%n" for calculating padding size,
      convert them to use seq_setwidth() / seq_pad() pair.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      652586df
    • Kirill A. Shutemov's avatar
      mm, hugetlb: convert hugetlbfs to use split pmd lock · cb900f41
      Kirill A. Shutemov authored
      Hugetlb supports multiple page sizes. We use split lock only for PMD
      level, but not for PUD.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cb900f41
    • Kirill A. Shutemov's avatar
      mm, thp: change pmd_trans_huge_lock() to return taken lock · bf929152
      Kirill A. Shutemov authored
      With split ptlock it's important to know which lock
      pmd_trans_huge_lock() took.  This patch adds one more parameter to the
      function to return the lock.
      
      In most places migration to new api is trivial.  Exception is
      move_huge_pmd(): we need to take two locks if pmd tables are different.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bf929152
    • Kirill A. Shutemov's avatar
      mm: convert mm->nr_ptes to atomic_long_t · e1f56c89
      Kirill A. Shutemov authored
      With split page table lock for PMD level we can't hold mm->page_table_lock
      while updating nr_ptes.
      
      Let's convert it to atomic_long_t to avoid races.
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: default avatarAlex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e1f56c89
  3. 13 Nov, 2013 6 commits
  4. 25 Oct, 2013 1 commit
  5. 17 Oct, 2013 3 commits
  6. 10 Oct, 2013 1 commit
    • Rob Herring's avatar
      of: remove HAVE_ARCH_DEVTREE_FIXUPS · 32df8dca
      Rob Herring authored
      HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc,
      but it is only used for /proc/device-teee and sparc does not enable
      /proc/device-tree. So this option is redundant. Remove the option and
      always enable it. This has the side effect of fixing /proc/device-tree
      on arches such as arm64 which failed to define this option.
      Signed-off-by: default avatarRob Herring <rob.herring@calxeda.com>
      Acked-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Acked-by: default avatarGrant Likely <grant.likely@linaro.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      32df8dca
  7. 09 Oct, 2013 1 commit
  8. 12 Sep, 2013 1 commit
  9. 11 Sep, 2013 6 commits
    • Michael Holzheu's avatar
      vmcore: enable /proc/vmcore mmap for s390 · 11e376a3
      Michael Holzheu authored
      The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" allows
      now to use mmap also on s390.
      
      So enable mmap for s390 again.
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      11e376a3
    • Michael Holzheu's avatar
      vmcore: introduce remap_oldmem_pfn_range() · 9cb21813
      Michael Holzheu authored
      For zfcpdump we can't map the HSA storage because it is only available via
      a read interface.  Therefore, for the new vmcore mmap feature we have
      introduce a new mechanism to create mappings on demand.
      
      This patch introduces a new architecture function remap_oldmem_pfn_range()
      that should be used to create mappings with remap_pfn_range() for oldmem
      areas that can be directly mapped.  For zfcpdump this is everything
      besides of the HSA memory.  For the areas that are not mapped by
      remap_oldmem_pfn_range() a generic vmcore a new generic vmcore fault
      handler mmap_vmcore_fault() is called.
      
      This handler works as follows:
      
      * Get already available or new page from page cache (find_or_create_page)
      * Check if /proc/vmcore page is filled with data (PageUptodate)
      * If yes:
        Return that page
      * If no:
        Fill page using __vmcore_read(), set PageUptodate, and return page
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9cb21813
    • Michael Holzheu's avatar
      vmcore: introduce ELF header in new memory feature · be8a8d06
      Michael Holzheu authored
      For s390 we want to use /proc/vmcore for our SCSI stand-alone dump
      (zfcpdump).  We have support where the first HSA_SIZE bytes are saved into
      a hypervisor owned memory area (HSA) before the kdump kernel is booted.
      When the kdump kernel starts, it is restricted to use only HSA_SIZE bytes.
      
      The advantages of this mechanism are:
      
       * No crashkernel memory has to be defined in the old kernel.
       * Early boot problems (before kexec_load has been done) can be dumped
       * Non-Linux systems can be dumped.
      
      We modify the s390 copy_oldmem_page() function to read from the HSA memory
      if memory below HSA_SIZE bytes is requested.
      
      Since we cannot use the kexec tool to load the kernel in this scenario,
      we have to build the ELF header in the 2nd (kdump/new) kernel.
      
      So with the following patch set we would like to introduce the new
      function that the ELF header for /proc/vmcore can be created in the 2nd
      kernel memory.
      
      The following steps are done during zfcpdump execution:
      
      1.  Production system crashes
      2.  User boots a SCSI disk that has been prepared with the zfcpdump tool
      3.  Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
      4.  Boot loader loads kernel into low memory area
      5.  Kernel boots and uses only HSA_SIZE bytes of memory
      6.  Kernel saves registers of non-boot CPUs
      7.  Kernel does memory detection for dump memory map
      8.  Kernel creates ELF header for /proc/vmcore
      9.  /proc/vmcore uses this header for initialization
      10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
          - copy_oldmem_page() copies from HSA for memory below HSA_SIZE
          - copy_oldmem_page() copies from real memory for memory above HSA_SIZE
      
      Currently for s390 we create the ELF core header in the 2nd kernel with a
      small trick.  We relocate the addresses in the ELF header in a way that
      for the /proc/vmcore code it seems to be in the 1st kernel (old) memory
      and the read_from_oldmem() returns the correct data.  This allows the
      /proc/vmcore code to use the ELF header in the 2nd kernel.
      
      This patch:
      
      Exchange the old mechanism with the new and much cleaner function call
      override feature that now offcially allows to create the ELF core header
      in the 2nd kernel.
      
      To use the new feature the following function have to be defined
      by the architecture backend code to read from new memory:
      
       * elfcorehdr_alloc: Allocate ELF header
       * elfcorehdr_free: Free the memory of the ELF header
       * elfcorehdr_read: Read from ELF header
       * elfcorehdr_read_notes: Read from ELF notes
      Signed-off-by: default avatarMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      be8a8d06
    • Oleg Nesterov's avatar
      proc: make proc_fd_permission() thread-friendly · 96d0df79
      Oleg Nesterov authored
      proc_fd_permission() says "process can still access /proc/self/fd after it
      has executed a setuid()", but the "task_pid() = proc_pid() check only
      helps if the task is group leader, /proc/self points to
      /proc/<leader-pid>.
      
      Change this check to use task_tgid() so that the whole thread group can
      access its /proc/self/fd or /proc/<tid-of-sub-thread>/fd.
      
      Notes:
      	- CLONE_THREAD does not require CLONE_FILES so task->files
      	  can differ, but I don't think this can lead to any security
      	  problem. And this matches same_thread_group() in
      	  __ptrace_may_access().
      
      	- /proc/self should probably point to /proc/<thread-tid>, but
      	  it is too late to change the rules. Perhaps it makes sense
      	  to add /proc/thread though.
      
      Test-case:
      
      	void *tfunc(void *arg)
      	{
      		assert(opendir("/proc/self/fd"));
      		return NULL;
      	}
      
      	int main(void)
      	{
      		pthread_t t;
      		pthread_create(&t, NULL, tfunc, NULL);
      		pthread_join(t, NULL);
      		return 0;
      	}
      
      fails if, say, this executable is not readable and suid_dumpable = 0.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96d0df79
    • Chen Gang's avatar
      fs/proc/task_mmu.c: check the return value of mpol_to_str() · a3c03992
      Chen Gang authored
      mpol_to_str() may fail, and not fill the buffer (e.g. -EINVAL), so need
      check about it, or buffer may not be zero based, and next seq_printf()
      will cause issue.
      
      The failure return need after mpol_cond_put() to match get_vma_policy().
      Signed-off-by: default avatarChen Gang <gang.chen@asianux.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a3c03992
    • Cyrill Gorcunov's avatar
      mm: track vma changes with VM_SOFTDIRTY bit · d9104d1c
      Cyrill Gorcunov authored
      Pavel reported that in case if vma area get unmapped and then mapped (or
      expanded) in-place, the soft dirty tracker won't be able to recognize this
      situation since it works on pte level and ptes are get zapped on unmap,
      loosing soft dirty bit of course.
      
      So to resolve this situation we need to track actions on vma level, there
      VM_SOFTDIRTY flag comes in.  When new vma area created (or old expanded)
      we set this bit, and keep it here until application calls for clearing
      soft dirty bit.
      
      Thus when user space application track memory changes now it can detect if
      vma area is renewed.
      Reported-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d9104d1c
  10. 05 Sep, 2013 1 commit
  11. 27 Aug, 2013 1 commit
    • Eric W. Biederman's avatar
      userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Eric W. Biederman authored
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      e51db735
  12. 26 Aug, 2013 1 commit
    • Eric W. Biederman's avatar
      proc: Restrict mounting the proc filesystem · aee1c13d
      Eric W. Biederman authored
      Don't allow mounting the proc filesystem unless the caller has
      CAP_SYS_ADMIN rights over the pid namespace.  The principle here is if
      you create or have capabilities over it you can mount it, otherwise
      you get to live with what other people have mounted.
      
      Andy pointed out that this is needed to prevent users in a user
      namespace from remounting proc and specifying different hidepid and gid
      options on already existing proc mounts.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      aee1c13d
  13. 24 Aug, 2013 1 commit
  14. 19 Aug, 2013 2 commits
    • Linus Torvalds's avatar
      proc: more readdir conversion bug-fixes · fd3930f7
      Linus Torvalds authored
      In the previous commit, Richard Genoud fixed proc_root_readdir(), which
      had lost the check for whether all of the non-process /proc entries had
      been returned or not.
      
      But that in turn exposed _another_ bug, namely that the original readdir
      conversion patch had yet another problem: it had lost the return value
      of proc_readdir_de(), so now checking whether it had completed
      successfully or not didn't actually work right anyway.
      
      This reinstates the non-zero return for the "end of base entries" that
      had also gotten lost in commit f0c3b509 ("[readdir] convert
      procfs").  So now you get all the base entries *and* you get all the
      process entries, regardless of getdents buffer size.
      
      (Side note: the Linux "getdents" manual page actually has a nice example
      application for testing getdents, which can be easily modified to use
      different buffers.  Who knew? Man-pages can be useful)
      Reported-by: default avatarEmmanuel Benisty <benisty.e@gmail.com>
      Reported-by: default avatarMarc Dionne <marc.c.dionne@gmail.com>
      Cc: Richard Genoud <richard.genoud@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fd3930f7
    • Richard Genoud's avatar
      proc: return on proc_readdir error · 94fc5d9d
      Richard Genoud authored
      Commit f0c3b509 ("[readdir] convert procfs") introduced a bug on the
      listing of the proc file-system.  The return value of proc_readdir()
      isn't tested anymore in the proc_root_readdir function.
      
      This lead to an "interesting" behaviour when we are using the getdents()
      system call with a buffer too small: instead of failing, it returns the
      first entries of /proc (enough to fill the given buffer), plus the PID
      directories.
      
      This is not triggered on glibc (as getdents is called with a 32KB
      buffer), but on uclibc, the buffer size is only 1KB, thus some proc
      entries are missing.
      
      See https://lkml.org/lkml/2013/8/12/288 for more background.
      Signed-off-by: default avatarRichard Genoud <richard.genoud@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      94fc5d9d
  15. 14 Aug, 2013 3 commits
  16. 18 Jul, 2013 1 commit
  17. 03 Jul, 2013 6 commits