1. 11 Nov, 2014 1 commit
  2. 30 Apr, 2014 1 commit
    • H. Peter Anvin's avatar
      x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 3891a04a
      H. Peter Anvin authored
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  This
      causes some 16-bit software to break, but it also leaks kernel state
      to user space.  We have a software workaround for that ("espfix") for
      the 32-bit kernel, but it relies on a nonzero stack segment base which
      is not available in 64-bit mode.
      In checkin:
       x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
      the logic that 16-bit support is crippled on 64-bit kernels anyway (no
      V86 support), but it turns out that people are doing stuff like
      running old Win16 binaries under Wine and expect it to work.
      This works around this by creating percpu "ministacks", each of which
      is mapped 2^16 times 64K apart.  When we detect that the return SS is
      on the LDT, we copy the IRET frame to the ministack and use the
      relevant alias to return to userspace.  The ministacks are mapped
      readonly, so if IRET faults we promote #GP to #DF which is an IST
      vector and thus has its own stack; we then do the fixup in the #DF
      (Making #GP an IST exception would make the msr_safe functions unsafe
      in NMI/MC context, and quite possibly have other effects.)
      Special thanks to:
      - Andy Lutomirski, for the suggestion of using very small stack slots
        and copy (as opposed to map) the IRET frame there, and for the
        suggestion to mark them readonly and let the fault promote to #DF.
      - Konrad Wilk for paravirt fixup and testing.
      - Borislav Petkov for testing help and useful comments.
      Reported-by: default avatarBrian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Lutomriski <amluto@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
      Cc: comex <comexk@gmail.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # consider after upstream merge
  3. 13 Oct, 2013 1 commit
  4. 29 Jan, 2013 1 commit
    • H. Peter Anvin's avatar
      x86, 64bit: Use a #PF handler to materialize early mappings on demand · 8170e6be
      H. Peter Anvin authored
      Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all
      64-bit code has to use page tables.  This makes it awkward before we
      have first set up properly all-covering page tables to access objects
      that are outside the static kernel range.
      So far we have dealt with that simply by mapping a fixed amount of
      low memory, but that fails in at least two upcoming use cases:
      1. We will support load and run kernel, struct boot_params, ramdisk,
         command line, etc. above the 4 GiB mark.
      2. need to access ramdisk early to get microcode to update that as
         early possible.
      We could use early_iomap to access them too, but it will make code to
      messy and hard to be unified with 32 bit.
      Hence, set up a #PF table and use a fixed number of buffers to set up
      page tables on demand.  If the buffers fill up then we simply flush
      them and start over.  These buffers are all in __initdata, so it does
      not increase RAM usage at runtime.
      Thus, with the help of the #PF handler, we can set the final kernel
      mapping from blank, and switch to init_level4_pgt later.
      During the switchover in head_64.S, before #PF handler is available,
      we use three pages to handle kernel crossing 1G, 512G boundaries with
      sharing page by playing games with page aliasing: the same page is
      mapped twice in the higher-level tables with appropriate wraparound.
      The kernel region itself will be properly mapped; other mappings may
      be spurious.
      early_make_pgtable is using kernel high mapping address to access pages
      to set page table.
      -v4: Add phys_base offset to make kexec happy, and add
      	init_mapping_kernel()   - Yinghai
      -v5: fix compiling with xen, and add back ident level3 and level2 for xen
           also move back init_level4_pgt from BSS to DATA again.
           because we have to clear it anyway.  - Yinghai
      -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
      -v7: remove not needed clear_page for init_level4_page
           it is with fill 512,8,0 already in head_64.S  - Yinghai
      -v8: we need to keep that handler alive until init_mem_mapping and don't
           let early_trap_init to trash that early #PF handler.
           So split early_trap_pf_init out and move it down. - Yinghai
      -v9: switchover only cover kernel space instead of 1G so could avoid
           touch possible mem holes. - Yinghai
      -v11: change far jmp back to far return to initial_code, that is needed
           to fix failure that is reported by Konrad on AMD systems.  - Yinghai
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
  5. 06 May, 2009 1 commit
    • Rik van Riel's avatar
      x86: 46 bit physical address support on 64 bits · c898faf9
      Rik van Riel authored
      Extend the maximum addressable memory on x86-64 from 2^44 to
      2^46 bytes. This requires some shuffling around of the vmalloc
      and virtual memmap memory areas, to keep them away from the
      direct mapping of up to 64TB of physical memory.
      This patch also introduces a guard hole between the vmalloc
      area and the virtual memory map space.  There's really no
      good reason why we wouldn't have a guard hole there.
      [ Impact: future hardware enablement ]
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      LKML-Reference: <20090505172856.6820db22@cuia.bos.redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
  6. 13 Feb, 2009 1 commit
  7. 11 Feb, 2009 2 commits