  1. Aug 28, 2019
  2. Aug 27, 2019
  3. Aug 23, 2019
    • KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity · 2e16f3e9
      Andre Przywara authored
      
      At the moment we initialise the target *mask* of a virtual IRQ to the
      VCPU it belongs to, even though this mask is only defined for GICv2 and
      quickly runs out of bits for many GICv3 guests.
      This behaviour triggers an UBSAN complaint for more than 32 VCPUs:
      ------
      [ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21
      [ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int'
      ------
      Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs
      dump is wrong, due to this very same problem.
      
      Because there is no requirement to create the VGIC device before the
      VCPUs (and QEMU actually does it the other way round), we can't safely
      initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch
      every private IRQ for each VCPU anyway later (in vgic_init()), we can
      just move the initialisation of those fields into there, where we
      definitely know the VGIC type.
      
      On the way, make sure we really have either a VGICv2 or a VGICv3 device,
      since the existing code just checks for "VGICv3 or not", silently
      ignoring the uninitialised case.
      
      Signed-off-by: Andre Przywara <andre.przywara@arm.com>
      Reported-by: Dave Martin <dave.martin@arm.com>
      Tested-by: Julien Grall <julien.grall@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      2e16f3e9
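
      A rough sketch of the change described above (illustrative, not the
      exact diff; field and helper names follow the vgic code but may
      differ in detail):

          /* Before: in kvm_vgic_vcpu_init(), before the VGIC model is known. */
          irq->targets = 1U << vcpu->vcpu_id;   /* undefined behaviour once vcpu_id >= 32 */

          /* After: in vgic_init(), for each private IRQ, once the model is fixed. */
          switch (dist->vgic_model) {
          case KVM_DEV_TYPE_ARM_VGIC_V2:
                  irq->targets = 1U << vcpu->vcpu_id;   /* GICv2 caps guests at 8 VCPUs */
                  break;
          case KVM_DEV_TYPE_ARM_VGIC_V3:
                  irq->mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
                  break;
          default:
                  return -EINVAL;   /* neither v2 nor v3: refuse instead of guessing */
          }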
  4. Aug 22, 2019
    • KVM: arm/arm64: Only skip MMIO insn once · 2113c5f6
      Andrew Jones authored
      
      If after an MMIO exit to userspace a VCPU is immediately run with an
      immediate_exit request, such as when a signal is delivered or an MMIO
      emulation completion is needed, then the VCPU completes the MMIO
      emulation and immediately returns to userspace. As the exit_reason
      does not get changed from KVM_EXIT_MMIO in these cases, we have to
      be careful not to complete the MMIO emulation again when the VCPU is
      eventually run again, because the emulation does an instruction skip
      (and doing too many skips would be a waste of guest code :-). We need
      to use additional VCPU state to track whether the emulation is complete.
      As luck would have it, we already have 'mmio_needed', which even
      appears to be used in this way by other architectures already.
      
      Fixes: 0d640732 ("arm64: KVM: Skip MMIO insn after emulation")
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      2113c5f6
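
      The shape of the fix, as a sketch (illustrative rather than the exact
      patch): set mmio_needed when the access is forwarded to userspace, and
      only complete the emulation, and clear the flag, if it is still set.

          /* When forwarding the MMIO access to userspace: */
          vcpu->mmio_needed = 1;

          /* When re-entering the VCPU after a KVM_EXIT_MMIO: */
          int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
          {
                  if (!vcpu->mmio_needed)
                          return 0;   /* already completed; don't skip the insn again */
                  vcpu->mmio_needed = 0;
                  /* ... write back run->mmio.data and skip the trapped instruction once ... */
                  return 0;
          }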
  5. Aug 09, 2019
  6. Aug 05, 2019
    • KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e
      Marc Zyngier authored
      
      Since commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point where we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
      changes were made to PMR are not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #0xff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, so sampling of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortunate situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      5eeaf10e
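
      Roughly, the fix boils down to the following sketch (the helper name
      matches the vgic code, but treat the exact placement as illustrative):

          void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
          {
                  /*
                   * WFI is about to block: write the loaded VMCR (and thus the
                   * guest's PMR) back to memory so that a subsequent
                   * kvm_vcpu_check_block() sees the up-to-date priority mask.
                   */
                  kvm_vgic_vmcr_sync(vcpu);
          }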
    • KVM: no need to check return value of debugfs_create functions · 3e7093d0
      Greg KH authored
      
      When calling debugfs functions, there is no need to ever check the
      return value.  The function can work or not, but the code logic should
      never do something different based on this.
      
      Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return
      void instead of an integer, as we should not care at all whether this
      function actually does anything or not.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <x86@kernel.org>
      Cc: <kvm@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3e7093d0
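
      As a hedged illustration of the pattern (field names here are made up
      for the example), the calling convention goes from checked to
      fire-and-forget:

          /* Before: error handling that nobody can meaningfully act on. */
          vcpu->debugfs_dir = debugfs_create_dir(name, parent);
          if (!vcpu->debugfs_dir)
                  return -ENOMEM;
          if (kvm_arch_create_vcpu_debugfs(vcpu))
                  goto out_err;

          /* After: just call it; a debugfs failure must never change behaviour. */
          vcpu->debugfs_dir = debugfs_create_dir(name, parent);
          kvm_arch_create_vcpu_debugfs(vcpu);   /* now returns void */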
    • KVM: remove kvm_arch_has_vcpu_debugfs() · 741cbbae
      Paolo Bonzini authored
      
      There is no need for this function as all arches have to implement
      kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
      lets us actually simplify the code.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      741cbbae
    • KVM: Fix leak vCPU's VMCS value into other pCPU · 17e433b5
      Wanpeng Li authored
      
      After commit d73eb57b (KVM: Boost vCPUs that are delivering interrupts), a
      five-year-old bug is exposed. Running the ebizzy benchmark in three 80-vCPU
      VMs on one 80-pCPU Skylake server produces a lot of rcu_sched stall warnings
      in the VMs after stress testing:
      
       INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
       Call Trace:
         flush_tlb_mm_range+0x68/0x140
         tlb_flush_mmu.part.75+0x37/0xe0
         tlb_finish_mmu+0x55/0x60
         zap_page_range+0x142/0x190
         SyS_madvise+0x3cd/0x9c0
         system_call_fastpath+0x1c/0x21
      
      swait_active() remains true until finish_swait() is called in
      kvm_vcpu_block(), and voluntarily preempted vCPUs are now taken into
      account by the kvm_vcpu_on_spin() loop, which greatly increases the
      probability that the condition kvm_arch_vcpu_runnable(vcpu) is checked
      and can be true. When APICv is enabled, the yield-candidate vCPU's VMCS
      RVI field then leaks (via vmx_sync_pir_to_irr()) into the current VMCS
      of the vCPU that is spinning on the taken lock.
      
      This patch fixes it by conservatively checking a subset of events.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Marc Zyngier <Marc.Zyngier@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 98f4a146 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      17e433b5
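
      A rough sketch of the "conservative subset" idea (names are
      illustrative; the real patch adds an arch hook for this): the spin
      loop consults only cheap in-memory state and never calls into code
      that can write the currently loaded VMCS.

          static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
          {
                  if (READ_ONCE(vcpu->ready))
                          return true;
                  /* Arch-specific, memory-only check; must not touch hardware state. */
                  return kvm_arch_dy_runnable(vcpu);
          }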
    • KVM: Check preempted_in_kernel for involuntary preemption · 046ddeed
      Wanpeng Li authored
      
      preempted_in_kernel is updated in the preempt notifier when involuntary
      preemption occurs, so it can be stale by the time voluntarily preempted
      vCPUs are taken into account by the kvm_vcpu_on_spin() loop. This patch
      makes the loop check preempted_in_kernel only for involuntary preemption.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      046ddeed
  7. Jul 26, 2019
    • KVM: arm: vgic-v3: Mark expected switch fall-through · 1a8248c7
      Anders Roxell authored
      
      When fall-through warnings were enabled by default, the following
      warnings started to show up:
      
      ../virt/kvm/arm/hyp/vgic-v3-sr.c: In function ‘__vgic_v3_save_aprs’:
      ../virt/kvm/arm/hyp/vgic-v3-sr.c:351:24: warning: this statement may fall
       through [-Wimplicit-fallthrough=]
         cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
         ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
      ../virt/kvm/arm/hyp/vgic-v3-sr.c:352:2: note: here
        case 6:
        ^~~~
      ../virt/kvm/arm/hyp/vgic-v3-sr.c:353:24: warning: this statement may fall
       through [-Wimplicit-fallthrough=]
         cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
         ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
      ../virt/kvm/arm/hyp/vgic-v3-sr.c:354:2: note: here
        default:
        ^~~~~~~
      
      Rework so that the compiler doesn't warn about fall-through.
      
      Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      1a8248c7
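
      The fix simply annotates the intentional fall-throughs in
      __vgic_v3_save_aprs() with comments that -Wimplicit-fallthrough
      recognises; roughly (a sketch reconstructed from the warning output
      above, with the switch expression shown illustratively):

          switch (nr_pre_bits) {
          case 7:
                  cpu_if->vgic_ap0r[3] = __vgic_v3_read_ap0rn(3);
                  cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2);
                  /* Fall through */
          case 6:
                  cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1);
                  /* Fall through */
          default:
                  cpu_if->vgic_ap0r[0] = __vgic_v3_read_ap0rn(0);
          }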
  8. Jul 24, 2019
  9. Jul 23, 2019
  10. Jul 20, 2019
    • KVM: Boost vCPUs that are delivering interrupts · d73eb57b
      Wanpeng Li authored
      
      Inspired by commit 9cac38dd (KVM/s390: Set preempted flag during
      vcpu wakeup and interrupt delivery), we want to boost not just
      lock holders but also vCPUs that are delivering interrupts. Most
      smp_call_function_many calls are synchronous, so the IPI target vCPUs
      are also good yield candidates.  This patch introduces vcpu->ready to
      boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do
      not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken
      into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected
      (VT-d PI handles voluntary preemption separately, in pi_pre_block).
      
      Testing on an 80-HT, 2-socket Xeon Skylake server, each VM with 80 vCPUs and 80 GB RAM:
      ebizzy -M
      
                  vanilla     boosting    improved
      1VM          21443       23520         9%
      2VM           2800        8000       180%
      3VM           1800        3100        72%
      
      Testing on my 8-HT Haswell desktop, with two 8-vCPU, 8 GB RAM VMs, one
      running ebizzy -M and the other running 'stress --cpu 2':
      
      w/ boosting + w/o pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1570         4000      155%
      
      w/ boosting + w/ pv sched yield(vanilla)
      
                  vanilla     boosting   improved
                    1844         5157      179%
      
      w/o boosting, perf top in VM:
      
       72.33%  [kernel]       [k] smp_call_function_many
        4.22%  [kernel]       [k] call_function_i
        3.71%  [kernel]       [k] async_page_fault
      
      w/ boosting, perf top in VM:
      
       38.43%  [kernel]       [k] smp_call_function_many
        6.31%  [kernel]       [k] async_page_fault
        6.13%  libc-2.23.so   [.] __memcpy_avx_unaligned
        4.88%  [kernel]       [k] call_function_interrupt
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d73eb57b
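
      Sketch of the mechanism (illustrative; the real patch also manages
      when the flag is cleared again): mark a vCPU "ready" at wakeup / IPI
      delivery, and prefer such vCPUs as yield targets.

          /* In kvm_vcpu_wake_up() / kvm_vcpu_kick(): */
          WRITE_ONCE(vcpu->ready, true);

          /* In the kvm_vcpu_on_spin() candidate loop: */
          kvm_for_each_vcpu(i, vcpu, kvm) {
                  if (!READ_ONCE(vcpu->ready))
                          continue;   /* neither woken up nor receiving an interrupt */
                  /* ... existing eligibility checks, then kvm_vcpu_yield_to(vcpu) ... */
          }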
  11. Jul 12, 2019
    • arm64: switch to generic version of pte allocation · 50f11a8a
      Mike Rapoport authored
      The PTE allocations in arm64 are identical to the generic ones modulo the
      GFP flags.
      
      Using the generic pte_alloc_one() functions ensures that the user page
      tables are allocated with __GFP_ACCOUNT set.
      
      The arm64 definition of PGALLOC_GFP is removed and replaced with
      GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and
      GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now
      using GFP_PGTABLE_USER.
      
      The mappings created with create_pgd_mapping() are now using
      GFP_PGTABLE_KERNEL.
      
      The conversion to the generic version of pte_free_kernel() removes the NULL
      check for pte.
      
      The pte_free() version on arm64 is identical to the generic one and
      can be simply dropped.
      
      [cai@lca.pw: fix a bogus GFP flag in pgd_alloc()]
        Link: https://lore.kernel.org/r/1559656836-24940-1-git-send-email-cai@lca.pw/
      [and fix it more]
        Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/
      Link: http://lkml.kernel.org/r/1557296232-15361-5-git-send-email-rppt@linux.ibm.com
      
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50f11a8a
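
      For reference, the generic flags look roughly like this (a sketch; see
      the generic pgalloc/mm headers for the authoritative definitions):

          #define GFP_PGTABLE_KERNEL   (GFP_KERNEL | __GFP_ZERO)
          #define GFP_PGTABLE_USER     (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)

          /*
           * User page-table pages are charged to the allocating cgroup via
           * __GFP_ACCOUNT; kernel page-table pages are not.
           */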
  12. Jul 10, 2019
  13. Jul 08, 2019
    • KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register · 1e0cf16c
      Marc Zyngier authored
      
      As part of setting up the host context, we populate its
      MPIDR by using cpu_logical_map(). It turns out that contrary
      to arm64, cpu_logical_map() on 32bit ARM doesn't return the
      *full* MPIDR, but a truncated version.
      
      This leaves the host MPIDR slightly corrupted after the first
      run of a VM, since we won't correctly restore the MPIDR on
      exit. Oops.
      
      Since we cannot trust cpu_logical_map(), let's adopt a different
      strategy. We move the initialization of the host CPU context as
      part of the per-CPU initialization (which, in retrospect, makes
      a lot of sense), and directly read the MPIDR from the HW. This
      is guaranteed to work on both arm and arm64.
      
      Reported-by: Andre Przywara <Andre.Przywara@arm.com>
      Tested-by: Andre Przywara <Andre.Przywara@arm.com>
      Fixes: 32f13955 ("arm/arm64: KVM: Statically configure the host's view of MPIDR")
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      1e0cf16c
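
      The arm64 flavour of the idea, as a sketch (32-bit arm stores the
      value in its cp15 array instead; the call site is the per-CPU hyp
      (re)init path):

          static void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
          {
                  /* Read the *full* MPIDR from hardware rather than cpu_logical_map(). */
                  cpu_ctxt->sys_regs[MPIDR_EL1] = read_cpuid_mpidr();
          }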
  14. Jul 05, 2019
  15. Jun 19, 2019
  16. Jun 12, 2019
    • KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy · 4729ec8c
      Dave Martin authored
      
      kvm_device->destroy() seems to be expected to free its kvm_device
      struct, but vgic_its_destroy() is not currently doing this, resulting
      in a memory leak that shows up in kmemleak reports such as the
      following:
      
      unreferenced object 0xffff800aeddfe280 (size 128):
        comm "qemu-system-aar", pid 13799, jiffies 4299827317 (age 1569.844s)
        [...]
        backtrace:
          [<00000000a08b80e2>] kmem_cache_alloc+0x178/0x208
          [<00000000dcad2bd3>] kvm_vm_ioctl+0x350/0xbc0
      
      Fix it.
      
      Cc: Andre Przywara <andre.przywara@arm.com>
      Fixes: 1085fdc6 ("KVM: arm64: vgic-its: Introduce new KVM ITS device")
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      4729ec8c
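
      The fix is essentially one missing kfree() at the end of the
      destructor; sketched:

          static void vgic_its_destroy(struct kvm_device *kvm_dev)
          {
                  struct vgic_its *its = kvm_dev->private;

                  /* ... tear down the ITS tables and free 'its' as before ... */

                  kfree(kvm_dev);   /* .destroy() owns the kvm_device struct itself */
          }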
  17. Jun 05, 2019
  18. Jun 04, 2019
    • KVM: Directly return result from kvm_arch_check_processor_compat() · f257d6dc
      Sean Christopherson authored
      
      Add a wrapper to invoke kvm_arch_check_processor_compat() so that the
      boilerplate ugliness of checking virtualization support on all CPUs is
      hidden from the arch specific code.  x86's implementation in particular
      is quite heinous, as it unnecessarily propagates the out-param pattern
      into kvm_x86_ops.
      
      While the x86 specific issue could be resolved solely by changing
      kvm_x86_ops, make the change for all architectures as returning a value
      directly is prettier and technically more robust, e.g. s390 doesn't set
      the out param, which could lead to subtle breakage in the (highly
      unlikely) scenario where the out-param was not pre-initialized by the
      caller.
      
      Opportunistically annotate svm_check_processor_compat() with __init.
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f257d6dc
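
      Sketched with illustrative bodies: the per-arch hook goes from an
      out-param to a plain return value, and the cross-call boilerplate
      moves into a single common wrapper.

          /* Before (per arch): */
          void kvm_arch_check_processor_compat(void *rtn)
          {
                  *(int *)rtn = 0;
          }

          /* After (per arch): */
          int kvm_arch_check_processor_compat(void)
          {
                  return 0;   /* or a negative errno if this CPU lacks virt support */
          }

          /* Common code keeps the smp_call_function_single() ugliness in one place: */
          static void check_processor_compat(void *rtn)
          {
                  *(int *)rtn = kvm_arch_check_processor_compat();
          }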
  19. May 30, 2019
  20. May 28, 2019
    • KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID · a86cb413
      Thomas Huth authored
      
      KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
      architectures. However, on s390x, the number of usable CPUs is determined
      at runtime - it depends on the features of the machine the code
      is running on. Since we are using the vcpu_id as an index into the SCA
      structures that are defined by the hardware (see e.g. the sca_add_vcpu()
      function), it is not only the number of CPUs that is limited by the
      hardware, but also the range of IDs that we can use.
      Thus KVM_CAP_MAX_VCPU_ID must be determined at runtime on s390x, too.
      So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
      code into the architecture-specific code, and on s390x we have to return
      the same value here as for KVM_CAP_MAX_VCPUS.
      This problem has been discovered with the kvm_create_max_vcpus selftest.
      With this change applied, the selftest now passes on s390x, too.
      
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-Id: <20190523164309.13345-9-thuth@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      a86cb413
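
      A rough sketch of the s390x direction described above (the exact
      facility checks in kvm_vm_ioctl_check_extension() may differ; the
      constants are the SCA slot counts):

          case KVM_CAP_MAX_VCPU_ID:
          case KVM_CAP_MAX_VCPUS:
                  /* vcpu_id indexes the SCA, so usable IDs == usable CPUs. */
                  r = KVM_S390_BSCA_CPU_SLOTS;
                  if (sclp.has_esca && sclp.has_64bscao)
                          r = KVM_S390_ESCA_CPU_SLOTS;
                  break;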