- Mar 11, 2014
-
Paolo Bonzini authored
When not running in guest-debug mode, the guest controls the debug registers and having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. After this patch, VMX- and SVM-specific code can set a flag in switch_db_regs, telling vcpu_enter_guest that on the next exit the debug registers might be dirty and need to be reloaded (syncing will be taken care of by a new callback in kvm_x86_ops). This flag can be set on the first access to a debug register, so that multiple accesses to the debug registers cause only one vmexit. Note that since the guest will be able to read debug registers and enable breakpoints in DR7, we need to ensure that they are synchronized on entry to the guest, including DR6, which was not synced before.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
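A minimal C sketch of the lazy-sync flow this describes. The flag names and the sync_dirty_debug_regs callback follow this series, but the surrounding helper is condensed and illustrative, not the verbatim patch:

```c
#include <linux/kvm_host.h>

#define KVM_DEBUGREG_BP_ENABLED	1	/* guest-debug mode owns the DRs  */
#define KVM_DEBUGREG_WONT_EXIT	2	/* DRs may be dirty on next exit  */

static void vcpu_enter_guest_dr_fragment(struct kvm_vcpu *vcpu)
{
	/* Reload guest DR0-DR3 and DR6 before entry when the guest owns them. */
	if (unlikely(vcpu->arch.switch_db_regs)) {
		set_debugreg(vcpu->arch.eff_db[0], 0);
		set_debugreg(vcpu->arch.eff_db[1], 1);
		set_debugreg(vcpu->arch.eff_db[2], 2);
		set_debugreg(vcpu->arch.eff_db[3], 3);
		set_debugreg(vcpu->arch.dr6, 6);
	}

	/* ... VM entry and exit happen here ... */

	/*
	 * If vendor code disabled DR intercepts after the guest's first
	 * access, the hardware values may have changed; pull them back
	 * into the vcpu state via the new kvm_x86_ops callback.
	 */
	if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT))
		kvm_x86_ops->sync_dirty_debug_regs(vcpu);
}
```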
-
Paolo Bonzini authored
The next patch will add another bit that we can test with the same "if".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jan Kiszka authored
It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest, so we can remove the check from the VMX version of enable_irq_window, and with it the need to return an error code from both enable_irq_window and enable_nmi_window.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Jan Kiszka authored
Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback's task is to switch from L2 to L1 if needed and to inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption-timer emulation.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
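A hedged sketch of how such a callback might be invoked from the generic injection path; check_nested_events is the name used by this series, while the surrounding function is condensed for illustration:

```c
/*
 * Hedged sketch: before deciding whether an IRQ/NMI can be injected,
 * give the nested hypervisor a chance to take a vmexit instead.
 */
static int inject_pending_event_fragment(struct kvm_vcpu *vcpu,
					 bool req_int_win)
{
	int r;

	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
		/* May switch from L2 to L1 and queue a nested vmexit. */
		r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
		if (r != 0)
			return r;	/* a vmexit to L1 was injected */
	}

	/* ... normal IRQ/NMI injection for the current level ... */
	return 0;
}
```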
-
- Mar 04, 2014
-
Andrew Jones authored
commit 0061d53d introduced a mechanism to execute a global clock update for a VM. We can apply this periodically in order to propagate host NTP corrections. Also, if all vcpus of a VM are pinned, then without an additional trigger no guest NTP corrections can propagate either, as the current trigger is only vcpu CPU migration.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
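A minimal sketch of one way to make the trigger periodic with a self-rearming delayed work item; the 300-second period and the kvmclock_sync_work/kvmclock_update_work field names are assumptions modeled on the kvm_arch layout, not quoted from the patch:

```c
#include <linux/kvm_host.h>
#include <linux/workqueue.h>

#define KVMCLOCK_SYNC_PERIOD	(300 * HZ)	/* assumed 5-minute period */

static void kvmclock_sync_fn(struct work_struct *work)
{
	struct delayed_work *dwork = to_delayed_work(work);
	struct kvm_arch *ka = container_of(dwork, struct kvm_arch,
					   kvmclock_sync_work);
	struct kvm *kvm = container_of(ka, struct kvm, arch);

	/* Kick a global clock update now, then re-arm ourselves. */
	schedule_delayed_work(&kvm->arch.kvmclock_update_work, 0);
	schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
			      KVMCLOCK_SYNC_PERIOD);
}
```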
-
Andrew Jones authored
When we update a vcpu's local clock it may pick up an NTP correction. We can't wait an indeterminate amount of time for other vcpus to pick up that correction, so commit 0061d53d introduced a global clock update. However, we can't request a global clock update on every vcpu load either (which is what happens if the TSC is marked as unstable). The solution is to rate-limit the global clock updates. Marcelo calculated that we should delay the global clock updates by no more than 0.1 s, as follows: assume an NTP correction c is applied to one vcpu but not the other; then in n seconds the delta of the vcpu system_timestamps will be c * n. If we assume a correction of 500 ppm (worst case), the two vcpus will diverge by 500 × 10⁻⁶ × 0.1 s = 50 µs in 0.1 s, which is a considerable amount.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
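A hedged sketch of the rate limiting, assuming a delayed-work implementation: the first vcpu to notice a correction schedules one global update 0.1 s out, and further requests within that window coalesce into the same work item. The helper and field names are assumptions:

```c
#include <linux/kvm_host.h>
#include <linux/workqueue.h>

/* Assumed bound derived from the 500 ppm / 50 us calculation above. */
#define KVMCLOCK_UPDATE_DELAY	msecs_to_jiffies(100)

static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
{
	struct kvm *kvm = v->kvm;

	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
	/*
	 * Scheduling an already pending delayed work is a no-op, so a
	 * burst of local updates produces a single global update.
	 */
	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
			      KVMCLOCK_UPDATE_DELAY);
}
```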
-
- Feb 27, 2014
-
Andrew Honig authored
The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes, emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_needed is not set to 0. As a result, KVM exits to userspace and then returns to complete_emulated_mmio. In complete_emulated_mmio, vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and forth to userspace, incrementing mmio_cur_fragment past its buffer. If the guest does nothing else, this eventually leads to a crash on a memcpy from an invalid memory address. However, if guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution. Fixes: f78146b0
Signed-off-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org (3.5+)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
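A hedged sketch of the hardening the description implies: make the fragment-completion path terminate whenever the cursor reaches or passes the fragment count, so a stale cursor can no longer run past the array. The structure is condensed from the era's complete_emulated_mmio; treat the details as illustrative:

```c
static int complete_emulated_mmio_fragment(struct kvm_vcpu *vcpu)
{
	struct kvm_run *run = vcpu->run;
	struct kvm_mmio_fragment *frag;
	unsigned int len;

	BUG_ON(!vcpu->mmio_needed);

	/* Complete the fragment the previous userspace exit serviced. */
	frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
	len = min(8u, frag->len);
	if (!vcpu->mmio_is_write)
		memcpy(frag->data, run->mmio.data, len);

	if (frag->len <= 8) {
		vcpu->mmio_cur_fragment++;	/* fragment fully consumed */
	} else {
		frag->data += len;		/* more of this fragment left */
		frag->gpa += len;
		frag->len -= len;
	}

	/* '>=' rather than '==' so a stale cursor still terminates. */
	if (vcpu->mmio_cur_fragment >= vcpu->mmio_nr_fragments) {
		vcpu->mmio_needed = 0;
		return 0;			/* emulation complete */
	}

	/* ... otherwise set up the next userspace mmio exit ... */
	return 1;
}
```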
-
Takuya Yoshikawa authored
No need to scan the entire VCPU array.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Feb 26, 2014
-
Marcelo Tosatti authored
emulator_cmpxchg_emulated writes to guest memory, therefore it should update the dirty bitmap accordingly.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
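A minimal sketch of the kind of fix described: after an emulated cmpxchg successfully writes to a guest page, mark that gfn dirty. mark_page_dirty is the long-standing KVM helper; the surrounding fragment is reconstructed, not quoted:

```c
/*
 * Hedged sketch: the tail of emulator_cmpxchg_emulated() after the
 * cmpxchg against guest memory; 'exchanged' reflects its success and
 * gpa/new/bytes describe the write. The added line is mark_page_dirty().
 */
if (!exchanged)
	return X86EMUL_CMPXCHG_FAILED;

mark_page_dirty(vcpu->kvm, gpa >> PAGE_SHIFT);	/* update dirty bitmap */
kvm_mmu_pte_write(vcpu, gpa, new, bytes);

return X86EMUL_CONTINUE;
```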
-
- Feb 25, 2014
-
Liu, Jinsong authored
This patch (v5 3/3 of the series, "KVM: x86: Enable Intel MPX for guest") enables the Intel MPX feature for the guest.
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Liu, Jinsong authored
This patch (v5 2/3 of the series, "KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save") adds MSR_IA32_BNDCFGS to msrs_to_save, and the corresponding logic to kvm_get/set_msr().
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
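A hedged sketch of what the described MSR plumbing could look like: the MSR joins the save list, and on the VMX side BNDCFGS lives in a guest VMCS field, so the get/set cases read and write that field. The accessor wrappers are condensed stand-ins for the real switch cases:

```c
/* Hedged sketch: save-list entry plus VMX-side accessors for BNDCFGS. */
static u32 msrs_to_save[] = {
	/* ...existing MSRs..., */
	MSR_IA32_BNDCFGS,		/* newly saved/restored for migration */
};

static u64 vmx_read_bndcfgs(void)	/* the get_msr case, condensed */
{
	return vmcs_read64(GUEST_BNDCFGS);
}

static void vmx_write_bndcfgs(u64 data)	/* the set_msr case, condensed */
{
	vmcs_write64(GUEST_BNDCFGS, data);
}
```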
-
- Feb 22, 2014
-
Liu, Jinsong authored
This patch ("KVM: x86: Fix xsave cpuid exposing bug") fixes the xsave cpuid exposing bug: EBX of cpuid(0xD, 0) is dynamic, depending on which XCR0 features are enabled or disabled. Bit 63 of XCR0 is reserved for future expansion.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
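Since EBX of CPUID(0xD, 0) must track the currently enabled XCR0 bits, one plausible way to compute it (sketched from the era's cpuid code; names hedged) is to walk the enabled extended features and take the maximum of offset plus size per feature:

```c
#include <linux/kernel.h>

#define XSTATE_EXTEND_MASK	(~(u64)0x3)	/* drop always-present FP/SSE */

/* Hedged sketch: required XSAVE area size for the enabled xstate_bv bits. */
static u32 xstate_required_size(u64 xstate_bv)
{
	int feature_bit = 0;
	u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;

	xstate_bv &= XSTATE_EXTEND_MASK;
	while (xstate_bv) {
		if (xstate_bv & 1) {
			u32 eax, ebx, ecx, edx;

			/* EAX = size, EBX = offset of this save area. */
			cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx);
			ret = max(ret, eax + ebx);
		}
		xstate_bv >>= 1;
		feature_bit++;
	}
	return ret;
}
```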
-
- Feb 04, 2014
-
Marcelo Tosatti authored
Remove the unused last_kernel_ns variable.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 29, 2014
-
Paolo Bonzini authored
Self-explanatory.
Reported-by: Radim Krcmar <rkrcmar@redhat.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 27, 2014
-
Jan Kiszka authored
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This addresses both enabling the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
[Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
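A hedged sketch of such a validity check. The transition rules follow SDM 10.12.5 (no x2APIC without CPUID support, no jump straight from disabled into x2APIC, no x2APIC back to plain xAPIC-enabled), but the helper names and the exact reserved-bit mask are assumptions:

```c
#define X2APIC_ENABLE	(1ULL << 10)	/* MSR_IA32_APICBASE x2APIC bit */

/* Hedged sketch of guest-initiated APICBASE validation. */
static int kvm_validate_apic_base(struct kvm_vcpu *vcpu, u64 new_val)
{
	u64 mode_bits = MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE;
	u64 old_state = vcpu->arch.apic_base & mode_bits;
	u64 new_state = new_val & mode_bits;
	u64 reserved = ((~0ULL) << cpuid_maxphyaddr(vcpu)) |
		       0x2ff |	/* low reserved bits (BSP bit 8 excluded) */
		       (guest_cpuid_has_x2apic(vcpu) ? 0 : X2APIC_ENABLE);

	if (new_val & reserved)
		return 1;		/* reserved bit set by the guest */
	if (new_state == X2APIC_ENABLE)
		return 1;		/* x2APIC without xAPIC enable */
	if (new_state == MSR_IA32_APICBASE_ENABLE &&
	    old_state == (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE))
		return 1;		/* x2APIC -> xAPIC directly */
	return 0;
}
```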
-
- Jan 24, 2014
-
Vadim Rozenfeld authored
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 23, 2014
-
Vadim Rozenfeld authored
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 17, 2014
-
Jan Kiszka authored
In contrast to VMX, SVM does not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state up from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of 020df079.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
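A hedged sketch of the pair of vendor hooks on the SVM side; to_svm() and the VMCB save-area layout are real SVM idioms, while the dirty-tracking detail is an assumption:

```c
/* Hedged sketch: on SVM, DR6 lives in the VMCB save area. */
static u64 svm_get_dr6(struct kvm_vcpu *vcpu)
{
	return to_svm(vcpu)->vmcb->save.dr6;
}

static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	svm->vmcb->save.dr6 = value;
	mark_dirty(svm->vmcb, VMCB_DR);	/* tell hw the DR state changed */
}
```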
-
Jan Kiszka authored
Whenever we change arch.dr7, we also have to call kvm_update_dr7. In case guest debugging is off, this will synchronize the new state into hardware.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Vadim Rozenfeld authored
Signed-off: Peter Lieven <pl@kamp.de>
Signed-off: Gleb Natapov
Signed-off: Vadim Rozenfeld <vrozenfe@redhat.com>
After some consideration I decided to submit only Hyper-V reference counter support this time. I will submit iTSC support as a separate patch as soon as it is ready.
v1 -> v2:
1. Mark the TSC page dirty, as suggested by Eric Northup <digitaleric@google.com> and Gleb.
2. Disable local irqs when calling get_kernel_ns, as was done by Peter Lieven <pl@kamp.de>.
3. Move the check for TSC page enable from the second patch to this one.
v3 -> v4: Get rid of the ref counter offset.
v4 -> v5: Replace __copy_to_user with kvm_write_guest when updating the iTSC page.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
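A hedged sketch of the MSR handler this implies: when the guest writes HV_X64_MSR_REFERENCE_TSC with the enable bit set, zero-initialize the reference TSC page through kvm_write_guest and mark it dirty (the v2 and v5 points above). This is a fragment of an MSR-write switch, reconstructed rather than quoted:

```c
/* Hedged sketch of the HV_X64_MSR_REFERENCE_TSC write path. */
case HV_X64_MSR_REFERENCE_TSC: {
	u64 gfn;
	HV_REFERENCE_TSC_PAGE tsc_ref;

	memset(&tsc_ref, 0, sizeof(tsc_ref));
	kvm->arch.hv_tsc_page = data;
	if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE))
		break;			/* page disabled: nothing to write */

	gfn = data >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
	if (kvm_write_guest(kvm, gfn << HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT,
			    &tsc_ref, sizeof(tsc_ref)))
		return 1;
	mark_page_dirty(kvm, gfn);	/* keep migration consistent (v2) */
	break;
}
```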
-
- Jan 15, 2014
-
Paolo Bonzini authored
After the previous patch from Marcelo, the comment before this write became obsolete. In fact, the write is unnecessary: the calls to kvm_write_tsc ultimately result in a master clock update as soon as all TSCs agree and the master clock is re-enabled, and this master clock update will rewrite tsc_timestamp. So, together with the comment, delete the dead write too.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marcelo Tosatti authored
To fix a problem related to the different resolutions of the TSC and the system clock, the offset in TSC units is approximated by

delta = vcpu->hv_clock.tsc_timestamp - vcpu->last_guest_tsc

where hv_clock.tsc_timestamp is the guest TSC value at the last kvm_guest_time_update call, and last_guest_tsc is the guest TSC value at the last VM-exit. Delta is then later scaled using the mult,shift pair found in the hv_clock structure (which is correct against tsc_timestamp in that structure). However, if a frequency change is performed between these two points, this delta is measured using different TSC frequencies but scaled using the mult,shift pair for one frequency only. The end result is an incorrect delta. The bug which this code works around is not the only cause of clock-backwards events. The global accumulator is still necessary, so remove the max_kernel_ns fix and rely on the global accumulator to prevent clock-backwards events.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Marcelo Tosatti authored
Limit the PIT timer frequency similarly to the limit applied to the LAPIC timer.
Cc: stable@kernel.org
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
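A hedged sketch of what such a clamp looks like when the guest programs the PIT period, mirroring the existing LAPIC min_timer_period_us floor; the surrounding function is condensed and illustrative:

```c
/* Hedged sketch: clamp a guest-programmed PIT period, as for the LAPIC. */
static void create_pit_timer_fragment(struct kvm_kpit_state *ps, s64 interval)
{
	ps->period = interval;

	/* min_timer_period_us is the module-wide floor (microseconds). */
	if (ps->period < min_timer_period_us * NSEC_PER_USEC) {
		pr_info_ratelimited("kvm: i8254 timer period %lld ns limited to %lld ns\n",
				    ps->period,
				    (s64)(min_timer_period_us * NSEC_PER_USEC));
		ps->period = min_timer_period_us * NSEC_PER_USEC;
	}
}
```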
-
- Dec 13, 2013
-
Takuya Yoshikawa authored
Giving proper names to the 0 and 1 was once suggested. But since 0 is returned to userspace, giving it another name could introduce extra confusion. This patch just explains the meanings instead.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Takuya Yoshikawa authored
Since commit 15ad7146 ("KVM: Use the scheduler preemption notifiers to make kvm preemptible"), the remaining stuff in this function is a simple cond_resched() call with an extra need_resched() check, which was there to avoid dropping VCPUs unnecessarily. Now it is meaningless.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Dec 12, 2013
-
Andy Honig authored
In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patch converts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_addr specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: b93463aa ('KVM: Accelerated apic support')
Reported-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
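A hedged sketch of the cached-access pattern the fix adopts: validate the GPA once at ioctl time by initializing a gfn_to_hva_cache, then use the cached accessors on the hot path so a page-straddling address is never dereferenced through raw pointer math. The vapic_cache field is the shape of the fix; surrounding details are condensed:

```c
/* Hedged sketch: ioctl-time validation of the userspace vapic address. */
int kvm_lapic_set_vapic_addr_sketch(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
{
	if (vapic_addr &&
	    kvm_gfn_to_hva_cache_init(vcpu->kvm,
				      &vcpu->arch.apic->vapic_cache,
				      vapic_addr, sizeof(u32)))
		return -EINVAL;		/* not a valid GPA */

	vcpu->arch.apic->vapic_addr = vapic_addr;
	return 0;
}

/* Hot path: bounds-checked write through the cache. */
static void sync_to_vapic_sketch(struct kvm_vcpu *vcpu, u32 tpr)
{
	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apic->vapic_cache,
			       &tpr, sizeof(u32));
}
```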
-
- Nov 06, 2013
-
Michael S. Tsirkin authored
I noticed that srcu_read_lock/unlock both have a memory barrier, so just by moving srcu_read_unlock earlier we can get rid of one call to smp_mb(), using smp_mb__after_srcu_read_unlock instead. Unsurprisingly, the gain is small but measurable using the unit-test microbenchmark: before, vmcall was in the ballpark of 1410 cycles; after, vmcall is in the ballpark of 1360 cycles.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
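A hedged before/after sketch of the reordering: smp_mb__after_srcu_read_unlock() documents that srcu_read_unlock() already implied a full barrier, so the explicit smp_mb() can be dropped once the unlock is moved next to the ->mode store. The surrounding function bodies are condensed:

```c
#include <linux/srcu.h>

/* Before (sketch): an explicit full barrier ordered the ->mode store
 * against the later ->requests check; the srcu unlock happened elsewhere. */
static void enter_guest_old(struct kvm_vcpu *vcpu)
{
	vcpu->mode = IN_GUEST_MODE;
	smp_mb();			/* paid for with a real mfence */
	/* ... check vcpu->requests, srcu_read_unlock() later ... */
}

/* After (sketch): move srcu_read_unlock() up and reuse its implied
 * smp_mb(); smp_mb__after_srcu_read_unlock() is then a no-op. */
static void enter_guest_new(struct kvm_vcpu *vcpu)
{
	vcpu->mode = IN_GUEST_MODE;
	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
	smp_mb__after_srcu_read_unlock();
	/* ... check vcpu->requests ... */
}
```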
-
- Oct 31, 2013
-
Paolo Bonzini authored
The loop was always using 0 as the index. This means that any rubbish after the first element of the array went undetected. It seems reasonable to assume that no KVM userspace did that.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
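A hedged sketch of the bug pattern being described, as a fragment of a KVM_SET_XCRS-style handler: the loop variable advances, but the body keeps indexing element 0, so only the first XCR is ever validated or applied:

```c
/* Hedged sketch: before, the loop body indexed element 0 every time. */
for (i = 0; i < guest_xcrs->nr_xcrs; i++)
	if (guest_xcrs->xcrs[0].xcr == XCR_XFEATURE_ENABLED_MASK)   /* bug */
		r = __kvm_set_xcr(vcpu, XCR_XFEATURE_ENABLED_MASK,
				  guest_xcrs->xcrs[0].value);        /* bug */

/* After: walk the array for real, so junk entries are detected too. */
for (i = 0; i < guest_xcrs->nr_xcrs; i++)
	if (guest_xcrs->xcrs[i].xcr == XCR_XFEATURE_ENABLED_MASK)
		r = __kvm_set_xcr(vcpu, XCR_XFEATURE_ENABLED_MASK,
				  guest_xcrs->xcrs[i].value);
```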
-
Paolo Bonzini authored
The KVM_SET_XCRS ioctl must accept anything that KVM_GET_XCRS could return. XCR0's bit 0 is always 1 in real processors with XSAVE, and KVM_GET_XCRS will always leave bit 0 set even if the emulated processor does not have XSAVE. So, KVM_SET_XCRS must ignore that bit when checking for attempts to enable unsupported save states.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
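A hedged sketch of the check this implies inside a __kvm_set_xcr-style validator: treat the always-set FP bit as valid regardless of what CPUID exposes to the guest. The guest_supported_xcr0 field name is an assumption:

```c
/* Hedged sketch: bit 0 (x87/FP state) is architecturally always 1. */
u64 valid_bits = vcpu->arch.guest_supported_xcr0 | XSTATE_FP;

if (xcr0 & ~valid_bits)
	return 1;	/* attempt to enable an unsupported save state */
```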
-
- Oct 30, 2013
-
Alex Williamson authored
We currently use some ad hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
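A hedged sketch of a minimal counting interface matching the description: users register and unregister non-coherent DMA sources, and the emulator asks whether any are currently present:

```c
/* Hedged sketch of a refcount-style non-coherent DMA interface. */
void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
{
	atomic_inc(&kvm->arch.noncoherent_dma_count);
}

void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
{
	atomic_dec(&kvm->arch.noncoherent_dma_count);
}

bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
{
	/* e.g. consulted when emulating WBINVD or choosing cache type */
	return atomic_read(&kvm->arch.noncoherent_dma_count);
}
```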
-
Alex Williamson authored
Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Call it EmulateOnUD, which is exactly what we're trying to do with vendor-specific instructions. Rename ->only_vendor_specific_insn to something shorter, while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Add a field to the current emulation context which contains the instruction opcode length. This will streamline handling of opcodes of different length.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Add a kvm ioctl which states which system functionality kvm emulates. The format used is that of CPUID, and we return the corresponding CPUID bits set for which we do emulate functionality. Make sure ->padding is being passed on clean from userspace so that we can use it for something in the future, after the ioctl gets cast in stone. s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
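A hedged userspace sketch of calling the new ioctl (KVM_GET_EMULATED_CPUID), modeled on the KVM_GET_SUPPORTED_CPUID calling convention; the entry count chosen here is an arbitrary assumption:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int nent = 64;			/* assumed large enough */
	struct kvm_cpuid2 *cpuid;
	int kvm = open("/dev/kvm", O_RDWR);

	if (kvm < 0)
		return 1;

	cpuid = calloc(1, sizeof(*cpuid) +
			  nent * sizeof(struct kvm_cpuid_entry2));
	cpuid->nent = nent;

	/* Ask which functionality KVM *emulates* (vs. hardware-backed). */
	if (ioctl(kvm, KVM_GET_EMULATED_CPUID, cpuid) < 0) {
		perror("KVM_GET_EMULATED_CPUID");
		return 1;
	}

	for (int i = 0; i < cpuid->nent; i++)
		printf("leaf %#x idx %#x: %#x %#x %#x %#x\n",
		       cpuid->entries[i].function, cpuid->entries[i].index,
		       cpuid->entries[i].eax, cpuid->entries[i].ebx,
		       cpuid->entries[i].ecx, cpuid->entries[i].edx);
	return 0;
}
```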
-
- Oct 17, 2013
-
Aneesh Kumar K.V authored
We will use that in a later patch to find the kvm ops handler.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
-
- Oct 15, 2013
-
chai wen authored
Page pinning is not mandatory in kvm async page fault processing, since after an async page fault event is delivered to a guest it accesses the page once again and does its own GUP. Drop the FOLL_GET flag in GUP in the async_pf code, and simplify the check/clear processing.
Suggested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
- Oct 03, 2013
-
Paolo Bonzini authored
kvm_mmu initialization is mostly filling in function pointers; there is no way for it to fail. Clean up the unused return values.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
Paolo Bonzini authored
The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit e676505a ("KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation", 2012-07-08). The commit message mentioned that "mmu_free_roots() is somewhat of an overkill, but fixing that is more complicated and will be done after this minimal fix". One year has passed, and no one really felt the need to do a different fix. Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the callback.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
Paolo Bonzini authored
This makes the interface more deterministic for userspace, which can expect (after configuring only the features it supports) to get exactly the same state from the kernel, independent of the host CPU and kernel version.
Suggested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
Paolo Bonzini authored
A guest can still attempt to save and restore XSAVE states even if they have been masked in CPUID leaf 0Dh. This usually is not visible to the guest, but is still wrong: "Any attempt to set a reserved bit (as determined by the contents of EAX and EDX after executing CPUID with EAX=0DH, ECX=0H) in XCR0 for a given processor will result in a #GP exception". The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE. This catches migration from a newer to an older kernel/processor before the guest starts running.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
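A hedged sketch of the two enforcement points described: the guest-facing XSETBV path injects #GP when __kvm_set_xcr rejects reserved XCR0 bits, and the KVM_SET_XSAVE ioctl applies the same mask so a bad migration stream is rejected up front. The kvm_supported_xcr0-style helper is an assumption:

```c
/* Guest-facing path (sketch): reserved XCR0 bit -> #GP, per the SDM. */
int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
{
	if (__kvm_set_xcr(vcpu, index, xcr)) {	/* rejects reserved bits */
		kvm_inject_gp(vcpu, 0);
		return 1;
	}
	return 0;
}

/* Ioctl path (sketch): refuse unsupported xstate bits at restore time. */
static int set_xsave_check_fragment(struct kvm_vcpu *vcpu, u64 xstate_bv)
{
	if (xstate_bv & ~kvm_supported_xcr0())	/* assumed helper */
		return -EINVAL;
	return 0;
}
```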
-