- Aug 22, 2018
-
-
Michal Hocko authored
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start, and that is a problem for the oom_reaper because it needs to guarantee forward progress, so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet.

We can do much better, though. Even if mmu notifiers use sleepable locks, there is no reason to automatically assume those locks are held. Moreover, the majority of notifiers only care about a portion of the address space, and there is absolutely zero reason to fail when we are unmapping an unrelated range. Some notifiers really do block and wait for HW, which is harder to handle, and there we have to bail out.

This patch handles the low-hanging fruit: __mmu_notifier_invalidate_range_start gets a blockable flag, and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continuing as long as we do not block down the call chain (a sketch of the resulting callback pattern follows this entry). I think we can improve that even further, because there is a common pattern of doing a range lookup first and then acting on it; the first part can be done without a sleeping lock in most cases AFAICS.

The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem, and this is basically the same thing.

The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu-notifier-managed memory, by setting the hard limit to something really small. Then we are looking for a proper process teardown.

[akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org Signed-off-by:
Michal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by:
David Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
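A minimal sketch of the callback pattern this change enables, assuming the post-patch invalidate_range_start() signature with a blockable argument; the context structure and lock below are illustrative, not taken from the patch:

    /* Hypothetical driver notifier: sleep only when allowed, otherwise
     * trylock and let the caller (e.g. the oom_reaper) retry on -EAGAIN. */
    static int my_invalidate_range_start(struct mmu_notifier *mn,
                                         struct mm_struct *mm,
                                         unsigned long start,
                                         unsigned long end,
                                         bool blockable)
    {
            struct my_ctx *ctx = container_of(mn, struct my_ctx, mn);

            if (blockable)
                    mutex_lock(&ctx->lock);
            else if (!mutex_trylock(&ctx->lock))
                    return -EAGAIN;         /* !blockable caller will retry */

            my_invalidate_range(ctx, start, end);   /* driver-specific teardown */
            mutex_unlock(&ctx->lock);
            return 0;
    }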
-
- Aug 13, 2018
-
-
Punit Agrawal authored
When there is contention on faulting in a particular page table entry at stage 2, the break-before-make requirement of the architecture can lead to additional refaulting due to TLB invalidation. Avoid this by skipping a page table update if the new value of the PTE matches the previous value. Cc: stable@vger.kernel.org Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup") Reviewed-by:
Suzuki Poulose <suzuki.poulose@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
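The shape of the fix, as I read it from the description (a sketch around the stage-2 PTE installation path; surrounding code abbreviated):

    old_pte = *pte;
    if (pte_present(old_pte)) {
            /* Skip the break-before-make sequence (and the TLB invalidation
             * that triggers refaults) if nothing actually changes. */
            if (pte_val(old_pte) == pte_val(*new_pte))
                    return 0;

            kvm_set_pte(pte, __pte(0));
            kvm_tlb_flush_vmid_ipa(kvm, addr);
    }
    kvm_set_pte(pte, *new_pte);

The PMD hugepage fix in the next entry applies the same check before clearing the PMD and flushing the TLBs.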
-
Punit Agrawal authored
Contention on updating a PMD entry by a large number of vcpus can lead to duplicate work when handling stage 2 page faults. As the page table update follows the break-before-make requirement of the architecture, it can lead to repeated refaults due to clearing the entry and flushing the TLBs. This problem is more likely when:
* there are a large number of vcpus
* the mapping is a large block mapping, such as a PMD hugepage (512MB with 64k pages)
Fix this by skipping the page table update if there is no change in the entry being updated. Cc: stable@vger.kernel.org Fixes: ad361f09 ("KVM: ARM: Support hugetlbfs backed huge pages") Reviewed-by:
Suzuki Poulose <suzuki.poulose@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Aug 12, 2018
-
-
Jia He authored
kvm_vgic_sync_hwstate is only called with IRQs disabled. There is thus no need to call spin_lock_irqsave/restore in vgic_fold_lr_state and vgic_prune_ap_list. This patch replaces them with the non-irq-safe versions (see the sketch below). Signed-off-by:
Jia He <jia.he@hxt-semitech.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> [maz: commit message tidy-up] Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
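A sketch of the transformation described above; since the callers already run with interrupts disabled, the irqsave/irqrestore variants are unnecessary. The assertion line uses the DEBUG_SPINLOCK_BUG_ON helper from the next entry and is illustrative:

    /* Before: */
    spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
    /* ... fold/prune the ap_list ... */
    spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);

    /* After (IRQs are already disabled by the caller): */
    DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
    spin_lock(&vgic_cpu->ap_list_lock);
    /* ... fold/prune the ap_list ... */
    spin_unlock(&vgic_cpu->ap_list_lock);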
-
Jia He authored
DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3, so let's move it to vgic.h Signed-off-by:
Jia He <jia.he@hxt-semitech.com> [maz: commit message tidy-up] Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
Although vgic-v3 now supports Group0 interrupts, it still doesn't deal with Group0 SGIs. As usual with the GIC, nothing is simple:
- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1 with KVM (as per 8.1.10, Non-secure EL1 access)
- ICC_SGI0R can only generate Group0 SGIs
- ICC_ASGI1R sees its scope refocused to generate only Group0 SGIs (as per the note at the bottom of Table 8-14)
We only support Group1 SGIs so far, so there is no material change. Reviewed-by:
Eric Auger <eric.auger@redhat.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Aug 06, 2018
-
-
Paolo Bonzini authored
We are currently cutting hva_to_pfn_fast short if we do not want an immediate exit, which is represented by !async && !atomic. However, this is unnecessary, and __get_user_pages_fast is *much* faster because the regular get_user_pages takes pmd_lock/pte_lock. In fact, when many CPUs take a nested vmexit at the same time, the contention on those locks is visible, and this patch removes about 25% (compared to 4.18) from vmexit.flat on a 16-vCPU nested guest. Suggested-by:
Andrea Arcangeli <aarcange@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
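Roughly what the fast path looks like after the change (a simplified sketch; the real function carries additional arguments for the atomic/async cases that the patch stops short-circuiting on):

    static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
                                bool *writable, kvm_pfn_t *pfn)
    {
            struct page *page[1];

            /* Only pin a writable pfn here; read faults that cannot accept
             * a writable mapping must go through the slow path. */
            if (!(write_fault || writable))
                    return false;

            /* Lockless: does not take the pmd/pte locks that make
             * get_user_pages() contend under many simultaneous vmexits. */
            if (__get_user_pages_fast(addr, 1, 1, page) == 1) {
                    *pfn = page_to_pfn(page[0]);
                    if (writable)
                            *writable = true;
                    return true;
            }

            return false;
    }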
-
Tianyu Lan authored
This patch provides a way for platforms to register an hv tlb remote flush callback, which helps optimize tlb flushes among vcpus in the nested virtualization case. Signed-off-by:
Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
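The shape of the hook, as I understand it from the description (treat the exact names as approximate): a platform-registered callback is tried first, with a fallback to the existing IPI-based flush when it is absent or fails.

    static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm)
    {
            /* Prefer the hypervisor-assisted remote flush if one is registered. */
            if (kvm_x86_ops->tlb_remote_flush &&
                !kvm_x86_ops->tlb_remote_flush(kvm))
                    return 0;

            return -ENOTSUPP;       /* caller falls back to IPI-based flushing */
    }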
-
Junaid Shahid authored
Use the fast CR3 switch mechanism to locklessly change the MMU root page when switching between L1 and L2. The switch from L2 to L1 should always go through the fast path, while the switch from L1 to L2 should go through the fast path if L1's CR3/EPTP for L2 hasn't changed since the last time. Signed-off-by:
Junaid Shahid <junaids@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jul 31, 2018
-
-
Christoffer Dall authored
When the VCPU is blocked (for example from WFI) we don't inject the physical timer interrupt if it should fire while the CPU is blocked, but instead we just wake up the VCPU and expect kvm_timer_vcpu_load to take care of injecting the interrupt. Unfortunately, kvm_timer_vcpu_load() doesn't actually do that; it only has support to schedule a soft timer if the emulated phys timer is expected to fire in the future. Follow the same pattern as kvm_timer_update_state() and update the irq state after potentially scheduling a soft timer. Reported-by:
Andre Przywara <andre.przywara@arm.com> Cc: Stable <stable@vger.kernel.org> # 4.15+ Fixes: bbdd52cf ("KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit") Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
kvm_timer_update_state() is called when changing the phys timer configuration registers, either via vcpu reset, as a result of a trap from the guest, or when userspace programs the registers. phys_timer_emulate() is in turn called by kvm_timer_update_state() to either cancel an existing software timer, or program a new software timer, to emulate the behavior of a real phys timer, based on the change in configuration registers. Unfortunately, the interaction between these two functions left a small race: if the conceptual emulated phys timer should actually fire, but the soft timer hasn't executed its callback yet, we cancel the timer in phys_timer_emulate without injecting an irq. This only happens if the check in kvm_timer_update_state is called before the timer should fire, which is relatively unlikely, but possible. The solution is to update the state of the phys timer after calling phys_timer_emulate, which will pick up the pending timer state and update the interrupt value (a sketch of this ordering follows below). Note that this leaves the opportunity of raising the interrupt twice, once in the just-programmed soft timer, and once in kvm_timer_update_state. Since this always happens synchronously with the VCPU execution, there is no harm in this, and the guest only ever sees a single timer interrupt. Cc: Stable <stable@vger.kernel.org> # 4.15+ Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
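Both timer fixes above reduce to the same ordering rule: (re)program the soft timer first, then recompute the pending state so an already-expired phys timer still injects its interrupt. A rough sketch, with function names taken from the commit messages and helper names approximate:

    phys_timer_emulate(vcpu);                       /* schedule or cancel the soft timer */
    if (kvm_timer_should_fire(ptimer))              /* timer already expired? */
            kvm_timer_update_irq(vcpu, true, ptimer);   /* inject now, don't wait for the callback */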
-
- Jul 24, 2018
-
-
Mark Rutland authored
It's possible for userspace to control n. Sanitize n when using it as an array index, to inhibit the potential spectre-v1 write gadget. Note that while it appears that n must be bound to the interval [0,3] due to the way it is extracted from addr, we cannot guarantee that compiler transformations (and/or future refactoring) will ensure this is the case, and given this is a slow path it's better to always perform the masking. Found by smatch. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
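The standard idiom for this class of fix, shown as a sketch (the actual array and bound belong to the vgic APR emulation; the destination below is illustrative):

    #include <linux/nospec.h>

    /* n is derived from a userspace-controlled address; clamp it under
     * speculation before using it as an array index. */
    n = array_index_nospec(n, 4);
    cpu_if->vgic_ap1r[n] = val;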
-
- Jul 21, 2018
-
-
Eric W. Biederman authored
Signed-off-by:
"Eric W. Biederman" <ebiederm@xmission.com>
-
James Morse authored
arm64's new use of KVM's get_events/set_events API isn't just for RAS; it allows an SError that has been made pending by KVM as part of its device emulation to be migrated. Wire this up for 32bit too. We only need to read/write the HCR_VA bit, and check that no ESR has been provided, as we don't yet support VDFSR. Signed-off-by:
James Morse <james.morse@arm.com> Reviewed-by:
Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
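A sketch of what the 32-bit wiring described above could look like; the accessor and helper names are my best guesses from the arm KVM code, not verbatim from the patch:

    int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
                                  struct kvm_vcpu_events *events)
    {
            /* Only the pending-SError (HCR.VA) state exists on 32bit. */
            events->exception.serror_pending = !!(*vcpu_hcr(vcpu) & HCR_VA);
            return 0;
    }

    int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
                                  struct kvm_vcpu_events *events)
    {
            /* 32bit has no VDFSR, so a provided ESR cannot be honoured. */
            if (events->exception.serror_has_esr)
                    return -EINVAL;

            if (events->exception.serror_pending)
                    kvm_inject_vabt(vcpu);          /* sets HCR.VA */

            return 0;
    }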
-
James Morse authored
The get/set events helpers to do some work to check reserved and padding fields are zero. This is useful on 32bit too. Move this code into virt/kvm/arm/arm.c, and give the arch code some underscores. This is temporarily hidden behind __KVM_HAVE_VCPU_EVENTS until 32bit is wired up. Signed-off-by:
James Morse <james.morse@arm.com> Reviewed-by:
Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Dongjiu Geng authored
For migrating VMs, user space may need to know the exception state. For example, if KVM makes an SError pending on machine A, then after migration to machine B, KVM also needs to pend an SError. This new IOCTL exposes the otherwise user-invisible state related to SErrors. Together with the appropriate user space changes, user space can get/set the SError exception state to do migrate/snapshot/suspend (see the usage sketch below). Signed-off-by:
Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by:
James Morse <james.morse@arm.com> [expanded documentation wording] Signed-off-by:
James Morse <james.morse@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
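From the userspace side, migration code could use the new ioctl roughly like this (a hedged sketch; error handling and the rest of the vcpu state transfer are omitted):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>
    #include <err.h>

    struct kvm_vcpu_events events;

    /* Source: capture the pending SError state. */
    if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &events) < 0)
            err(1, "KVM_GET_VCPU_EVENTS");

    /* ... send 'events' along with the rest of the vcpu state ... */

    /* Destination: re-pend the SError before resuming the guest. */
    if (ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events) < 0)
            err(1, "KVM_SET_VCPU_EVENTS");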
-
Christoffer Dall authored
Simply letting IGROUPR be writable from userspace would break migration from old kernels to newer kernels, because old kernels incorrectly report interrupt groups as group 1. This would not be a big problem if userspace wrote GICD_IIDR as read from the kernel, because we could detect the incompatibility and return an error to userspace. Unfortunately, this is not the case with current userspace implementations and simply letting IGROUPR be writable from userspace for an emulated GICv2 silently breaks migration and causes the destination VM to no longer run after migration. We now encourage userspace to write the read and expected value of GICD_IIDR as the first part of a GIC register restore, and if we observe a write to GICD_IIDR we know that userspace has been updated and has had a chance to cope with older kernels (VGICv2 IIDR.Revision == 0) incorrectly reporting interrupts as group 1, and therefore we now allow groups to be user writable. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
Implement the required MMIO accessors for GICv2 and GICv3 for the IGROUPR distributor and redistributor registers. This can allow guests to change behavior compared to running on previous versions of KVM, but only to align with the architecture and hardware implementations. This also allows userspace to configure the interrupts groups for GICv3. We don't allow userspace to write the groups on GICv2 just yet, because that would result in GICv2 guests not receiving interrupts after migrating from an older kernel that exposes GICv2 interrupts as group 1. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
If userspace attempts to write a GICD_IIDR that does not match the kernel version, return an error to userspace. The intention is to allow implementation changes inside KVM while avoiding silently breaking migration resulting in guests not running without any clear indication of what went wrong. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
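The check can be as small as rejecting a mismatching value on the userspace-access path; a sketch along the lines of the GICv2 "misc" register handler (names taken from the vgic code, details approximate):

    static int vgic_mmio_uaccess_write_v2_misc(struct kvm_vcpu *vcpu,
                                               gpa_t addr, unsigned int len,
                                               unsigned long val)
    {
            switch (addr & 0x0c) {
            case GIC_DIST_IIDR:
                    /* Refuse restores coming from a different implementation
                     * revision instead of silently breaking the guest. */
                    if (val != vgic_mmio_read_v2_misc(vcpu, addr, len))
                            return -EINVAL;
            }

            vgic_mmio_write_v2_misc(vcpu, addr, len, val);
            return 0;
    }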
-
Christoffer Dall authored
Currently we do not allow any vgic mmio write operations to fail, which makes sense for mmio traps from the guest. However, we should be able to report failures to userspace if userspace writes incompatible values to read-only registers. Rework the internal interface to allow errors to be returned on the write side for userspace writes. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
Now that we have a group configuration on the struct irq, use this state when populating the LR and signaling interrupts as either group 0 or group 1 to the VM. Depending on the model of the emulated GIC and the guest's configuration of the VMCR, interrupts may be signaled as IRQs or FIQs to the VM. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
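When populating a GICv3 list register, the per-IRQ group then selects whether the guest sees the interrupt as Group0 (typically FIQ) or Group1 (IRQ), per its VMCR configuration. A sketch of the relevant fragment, abbreviated from the LR population path:

    /* In the vgic-v3 LR population code: */
    if (irq->group)
            val |= ICH_LR_GROUP;    /* present the interrupt as Group1 */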
-
Christoffer Dall authored
In preparation for proper group 0 and group 1 support in the vgic, we add a field in the struct irq to store the group of all interrupts. We initialize the group to group 0 when emulating GICv2 and to group 1 when emulating GICv3, just like we treat them today. LPIs are always group 1. We also continue to ignore writes from the guest, preserving existing functionality, for now. Finally, we also add this field to the vgic debug logic to show the group for all interrupts. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
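The initialization rule described above, as a sketch:

    /* Default group depends on which GIC model is being emulated;
     * LPIs are always Group1. */
    if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
            irq->group = 1;
    else
            irq->group = 0;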
-
Christoffer Dall authored
We currently don't support grouping in the emulated VGIC, which is a known defect of KVM (not hurting any currently used guests as far as we're aware). This is currently handled by treating all interrupts as group 0 interrupts for an emulated GICv2 and always signaling interrupts as group 0 to the virtual CPU interface. However, when the guest reads which group interrupts belong to from the emulated VGIC, the VGIC currently reports group 1 instead of group 0, which is misleading. Fix this temporarily, before introducing full group support, by changing the handler to _raz instead of _rao. Fixes: fb848db3 ("KVM: arm/arm64: vgic-new: Add GICv2 MMIO handling framework") Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
As we are about to tweak implementation aspects of the VGIC emulation, while still preserving some level of backwards compatibility support, add a field to keep track of the implementation revision field which is reported to the VM and to userspace. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
Instead of hardcoding the shifts and masks in the GICD_IIDR register emulation, let's add the definition of these fields to the GIC header files and use them. This will make things more obvious when we're going to bump the revision in the IIDR when we'll make guest-visible changes to the implementation. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
The vgic debugfs file only knows about SGI/PPI/SPI interrupts, and completely ignores LPIs. Let's fix that. Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Kees Cook authored
In the quest to remove all stack VLA usage from the kernel[1], this switches to using a maximum size and adds sanity checks. Additionally cleans up some of the int-vs-u32 usage and adds additional bounds checking. As it currently stands, this will always be 8 bytes until the ABI changes. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Signed-off-by:
Kees Cook <keescook@chromium.org> [maz: dropped WARN_ONs] Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
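The generic shape of a VLA removal like this one (the identifiers here are illustrative, not the actual vgic names):

    /* Before: u8 entry[esz];  -- esz comes from the ABI description (a VLA) */

    #define MAX_ENTRY_SIZE  8       /* fixed upper bound; 8 bytes under the current ABI */
    u8 entry[MAX_ENTRY_SIZE];

    /* An explicit bound check replaces the implicit trust in esz. */
    if (esz > sizeof(entry))
            return -EINVAL;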
-
Christoffer Dall authored
The vgic_init function can race with kvm_arch_vcpu_create() which does not hold kvm_lock() and we therefore have no synchronization primitives to ensure we're doing the right thing. As the user is trying to initialize or run the VM while at the same time creating more VCPUs, we just have to refuse to initialize the VGIC in this case rather than silently failing with a broken VCPU. Reviewed-by:
Eric Auger <eric.auger@redhat.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
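A sketch of the refusal described above (the exact check in the patch may differ):

    /* In vgic_init(): bail out if vcpus are still being created, since we
     * cannot synchronize against kvm_arch_vcpu_create() here. */
    if (kvm->created_vcpus != atomic_read(&kvm->online_vcpus))
            return -EBUSY;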
-
- Jul 18, 2018
-
-
Paolo Bonzini authored
A comment warning against this bug is there, but the code is not doing what the comment says. Therefore it is possible that an EPOLLHUP races against irq_bypass_register_consumer. The EPOLLHUP handler schedules irqfd_shutdown, and if that runs soon enough, you get a use-after-free. Reported-by:
syzbot <syzkaller@googlegroups.com> Cc: stable@vger.kernel.org Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com>
-
Lan Tianyu authored
Syzbot reports crashes in kvm_irqfd_assign(), caused by a use-after-free when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel for one specific eventfd. When the assign path hasn't finished but the irqfd has been added to the kvm->irqfds.items list, another thread may deassign the eventfd and free the struct kvm_kernel_irqfd(). The assign path then uses the struct kvm_kernel_irqfd that has been freed by the deassign path. To avoid such an issue, keep the irqfd under kvm->irq_srcu protection after the irqfd has been added to the kvm->irqfds.items list, and call synchronize_srcu() in irq_shutdown() to make sure that the irqfd has been fully initialized in the assign path (see the sketch below). Reported-by:
Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by:
Tianyu Lan <tianyu.lan@intel.com> Cc: stable@vger.kernel.org Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
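The core of the fix, as I read the description (a sketch; the existing teardown logic is abbreviated):

    static void irqfd_shutdown(struct work_struct *work)
    {
            struct kvm_kernel_irqfd *irqfd =
                    container_of(work, struct kvm_kernel_irqfd, shutdown);
            struct kvm *kvm = irqfd->kvm;

            /* Wait for a racing kvm_irqfd_assign() to finish publishing this
             * irqfd under irq_srcu before tearing it down and freeing it. */
            synchronize_srcu(&kvm->irq_srcu);

            /* ... existing teardown: flush pending injection, free the irqfd ... */
    }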
-
- Jul 13, 2018
-
-
Claudio Imbrenda authored
Introduce a utility function that will be used later on for storage attributes migration, and use it in kvm_main.c to replace existing code that does the same thing. Signed-off-by:
Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Message-Id: <1525106005-13931-2-git-send-email-imbrenda@linux.vnet.ibm.com> Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com>
-
- Jul 09, 2018
-
-
Marc Zyngier authored
Trapping blocking WFE is extremely beneficial in situations where the system is oversubscribed, as it allows another thread to run while being blocked. In a non-oversubscribed environment, this is the complete opposite, and trapping WFE is just unnecessary overhead. Let's only enable WFE trapping if the CPU has more than a single task to run (that is, more than just the vcpu thread). Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
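A sketch of the heuristic in its arm64 flavour (field and bit names approximate; the real patch wraps this in small helpers):

    /* In kvm_arch_vcpu_load(): trap WFE only when this physical CPU has
     * someone else to run, i.e. when yielding is actually useful. */
    if (single_task_running())
            vcpu->arch.hcr_el2 &= ~HCR_TWE;
    else
            vcpu->arch.hcr_el2 |= HCR_TWE;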
-
Marc Zyngier authored
There is no need to perform cache maintenance operations when creating the HYP page tables if we have the multiprocessing extensions. ARMv7 mandates them with the virtualization support, and ARMv8 just mandates them unconditionally. Let's remove these operations. Acked-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
The {pmd,pud,pgd}_populate accessors usage has always been a bit weird in KVM. We don't have a struct mm to pass (and neither does the kernel most of the time, but still...), and the 32bit code has all kinds of cache maintenance that doesn't make sense on ARMv7+ when MP extensions are mandatory (which is the case when the VEs are present). Let's bite the bullet and provide our own implementations. The only bit of architectural code left has to do with building the table entry itself (arm64 having up to 52bit PA, arm lacking a PUD level). Acked-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
The arm and arm64 KVM page tables accessors are pointlessly different between the two architectures, and likely both wrong one way or another: arm64 lacks a dsb(), and arm doesn't use WRITE_ONCE. Let's unify them. Acked-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
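The unified accessor ends up looking roughly like this: the write becomes single-copy atomic via WRITE_ONCE, and dsb(ishst) orders it before the subsequent TLB maintenance (a sketch of the idea, not necessarily the verbatim helper):

    static inline void kvm_set_pte(pte_t *ptep, pte_t pte)
    {
            WRITE_ONCE(*ptep, pte);
            dsb(ishst);
    }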
-
Marc Zyngier authored
Up to ARMv8.3, the combination of Stage-1 and Stage-2 attributes results in the strongest attribute of the two stages. This means that the hypervisor has to perform quite a lot of cache maintenance just in case the guest has some non-cacheable mappings around. ARMv8.4 solves this problem by offering a different mode (FWB) where Stage-2 has total control over the memory attributes (this is limited to systems where both I/O and instruction fetches are coherent with the dcache). This is achieved by having a different set of memory attributes in the page tables, and a new bit set in HCR_EL2. On such a system, we can then safely sidestep any form of dcache management. Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
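Conceptually, on FWB-capable hardware the hypervisor flips one HCR_EL2 bit and can then skip dcache maintenance on stage-2 faults; a sketch, with the capability and bit names as I recall them from this series:

    /* When the CPU implements ARMv8.4-S2FWB, let Stage-2 attributes win
     * and elide the cache maintenance on guest memory faults. */
    if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
            vcpu->arch.hcr_el2 |= HCR_FWB;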
-
- Jul 05, 2018
-
-
Mark Rutland authored
Some code cares about the SPSR_ELx format for exceptions taken from AArch32 to inspect or manipulate the SPSR_ELx value, which is already in the SPSR_ELx format, and not in the AArch32 PSR format. To separate these from cases where we care about the AArch32 PSR format, migrate these cases to use the PSR_AA32_* definitions rather than COMPAT_PSR_*. There should be no functional change as a result of this patch. Note that arm64 KVM does not support a compat KVM API, and always uses the SPSR_ELx format, even for AArch32 guests. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Will Deacon <will.deacon@arm.com>
-
- Jun 21, 2018
-
-
Marc Zyngier authored
There is very little point in trying to support the 32bit KVM/arm API on arm64, and this was never an anticipated use case. Let's make it clear by not selecting KVM_COMPAT. Acked-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
The current behaviour of the compat ioctls is a bit odd. We provide a compat_ioctl method when KVM_COMPAT is set, and NULL otherwise. But NULL means that the normal, non-compat ioctl should be used directly for compat tasks, and there is no way to actually prevent a compat task from issuing KVM ioctls. This patch changes this behaviour by always registering a compat_ioctl method, even if KVM_COMPAT is not selected. In that case, the callback will always return -EINVAL (see the sketch below). Fixes: de8e5d74 ("KVM: Disable compat ioctl for s390") Reported-by:
Mark Rutland <mark.rutland@arm.com> Acked-by:
Christian Borntraeger <borntraeger@de.ibm.com> Acked-by:
Radim Krčmář <rkrcmar@redhat.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
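One way to express this, in the spirit of the description (a sketch; the macro name is approximate):

    #ifdef CONFIG_KVM_COMPAT
    #define KVM_COMPAT(c)   .compat_ioctl = (c)
    #else
    /* With compat support disabled, still register a handler so that compat
     * tasks get -EINVAL instead of silently falling through to the native
     * ioctl path. */
    static long kvm_no_compat_ioctl(struct file *file, unsigned int ioctl,
                                    unsigned long arg)
    {
            return -EINVAL;
    }
    #define KVM_COMPAT(c)   .compat_ioctl = kvm_no_compat_ioctl
    #endif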
-
Jia He authored
There is a panic on an ARMv8-A server (QDF2400) under memory pressure tests (start 20 guests and run memhog in the host):

---------------------------------begin--------------------------------
[35380.800950] BUG: Bad page state in process qemu-kvm pfn:dd0b6
[35380.805825] page:ffff7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0
[35380.815024] flags: 0x1fffc00000000000()
[35380.818845] raw: 1fffc00000000000 0000000000000000 0000000000000000 ffffecf981470000
[35380.826569] raw: dead000000000100 dead000000000200 ffff8017c001c000 0000000000000000
[35380.834294] page dumped because: nonzero _refcount
[...]
--------------------------------end--------------------------------------

The root cause might be what was fixed at [1]. But from the KVM point of view, it would be better if the issue were caught earlier. If the size is not PAGE_SIZE aligned, unmap_stage2_range might unmap the wrong (more or less) page range, hence the "BUG: Bad page state". Let's WARN in that case, so that the issue is obvious (see the sketch below). [1] https://lkml.org/lkml/2018/5/3/1042 Reviewed-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
<jia.he@hxt-semitech.com> [maz: tidied up commit message] Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
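The added check is tiny; a sketch of its placement at the top of unmap_stage2_range():

    /* A misaligned size would make us unmap more or less than the caller
     * intended; make that mistake loud instead of corrupting page state. */
    WARN_ON(size & ~PAGE_MASK);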
-