History log of /linux/arch/x86/kvm/vmx/nested.c (Results 1 – 25 of 2159)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# ab93e0dd 06-Aug-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.17 merge window.


# a7bee4e7 04-Aug-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'ib-mfd-gpio-input-pwm-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next

Merge an immutable branch between MFD, GPIO, Input and PWM to resolve
conflicts for the mer

Merge tag 'ib-mfd-gpio-input-pwm-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next

Merge an immutable branch between MFD, GPIO, Input and PWM to resolve
conflicts for the merge window pull request.

show more ...


# 63eb28bb 31-Jul-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"ARM:

- Host driver for GICv5, the next generation interrupt controller for
arm64, i

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"ARM:

- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts

- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface

- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on
hardware that previously advertised it unconditionally

- Map supporting endpoints with cacheable memory attributes on
systems with FEAT_S2FWB and DIC where KVM no longer needs to
perform cache maintenance on the address range

- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
guest hypervisor to inject external aborts into an L2 VM and take
traps of masked external aborts to the hypervisor

- Convert more system register sanitization to the config-driven
implementation

- Fixes to the visibility of EL2 registers, namely making VGICv3
system registers accessible through the VGIC device instead of the
ONE_REG vCPU ioctls

- Various cleanups and minor fixes

LoongArch:

- Add stat information for in-kernel irqchip

- Add tracepoints for CPUCFG and CSR emulation exits

- Enhance in-kernel irqchip emulation

- Various cleanups

RISC-V:

- Enable ring-based dirty memory tracking

- Improve perf kvm stat to report interrupt events

- Delegate illegal instruction trap to VS-mode

- MMU improvements related to upcoming nested virtualization

s390x

- Fixes

x86:

- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
APIC, PIC, and PIT emulation at compile time

- Share device posted IRQ code between SVM and VMX and harden it
against bugs and runtime errors

- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
O(1) instead of O(n)

- For MMIO stale data mitigation, track whether or not a vCPU has
access to (host) MMIO based on whether the page tables have MMIO
pfns mapped; using VFIO is prone to false negatives

- Rework the MSR interception code so that the SVM and VMX APIs are
more or less identical

- Recalculate all MSR intercepts from scratch on MSR filter changes,
instead of maintaining shadow bitmaps

- Advertise support for LKGS (Load Kernel GS base), a new instruction
that's loosely related to FRED, but is supported and enumerated
independently

- Fix a user-triggerable WARN that syzkaller found by setting the
vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
possible path leading to architecturally forbidden states is hard
and even risks breaking userspace (if it goes from valid to valid
state but passes through invalid states), so just wait until
KVM_RUN to detect that the vCPU state isn't allowed

- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
interception of APERF/MPERF reads, so that a "properly" configured
VM can access APERF/MPERF. This has many caveats (APERF/MPERF
cannot be zeroed on vCPU creation or saved/restored on suspend and
resume, or preserved over thread migration let alone VM migration)
but can be useful whenever you're interested in letting Linux
guests see the effective physical CPU frequency in /proc/cpuinfo

- Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
created, as there's no known use case for changing the default
frequency for other VM types and it goes counter to the very reason
why the ioctl was added to the vm file descriptor. And also, there
would be no way to make it work for confidential VMs with a
"secure" TSC, so kill two birds with one stone

- Dynamically allocation the shadow MMU's hashed page list, and defer
allocating the hashed list until it's actually needed (the TDP MMU
doesn't use the list)

- Extract many of KVM's helpers for accessing architectural local
APIC state to common x86 so that they can be shared by guest-side
code for Secure AVIC

- Various cleanups and fixes

x86 (Intel):

- Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
Failure to honor FREEZE_IN_SMM can leak host state into guests

- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
prevent L1 from running L2 with features that KVM doesn't support,
e.g. BTF

x86 (AMD):

- WARN and reject loading kvm-amd.ko instead of panicking the kernel
if the nested SVM MSRPM offsets tracker can't handle an MSR (which
is pretty much a static condition and therefore should never
happen, but still)

- Fix a variety of flaws and bugs in the AVIC device posted IRQ code

- Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation

- Extend enable_ipiv module param support to SVM, by simply leaving
IsRunning clear in the vCPU's physical ID table entry

- Disable IPI virtualization, via enable_ipiv, if the CPU is affected
by erratum #1235, to allow (safely) enabling AVIC on such CPUs

- Request GA Log interrupts if and only if the target vCPU is
blocking, i.e. only if KVM needs a notification in order to wake
the vCPU

- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
the vCPU's CPUID model

- Accept any SNP policy that is accepted by the firmware with respect
to SMT and single-socket restrictions. An incompatible policy
doesn't put the kernel at risk in any way, so there's no reason for
KVM to care

- Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
use WBNOINVD instead of WBINVD when possible for SEV cache
maintenance

- When reclaiming memory from an SEV guest, only do cache flushes on
CPUs that have ever run a vCPU for the guest, i.e. don't flush the
caches for CPUs that can't possibly have cache lines with dirty,
encrypted data

Generic:

- Rework irqbypass to track/match producers and consumers via an
xarray instead of a linked list. Using a linked list leads to
O(n^2) insertion times, which is hugely problematic for use cases
that create large numbers of VMs. Such use cases typically don't
actually use irqbypass, but eliminating the pointless registration
is a future problem to solve as it likely requires new uAPI

- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
"void *", to avoid making a simple concept unnecessarily difficult
to understand

- Decouple device posted IRQs from VFIO device assignment, as binding
a VM to a VFIO group is not a requirement for enabling device
posted IRQs

- Clean up and document/comment the irqfd assignment code

- Disallow binding multiple irqfds to an eventfd with a priority
waiter, i.e. ensure an eventfd is bound to at most one irqfd
through the entire host, and add a selftest to verify eventfd:irqfd
bindings are globally unique

- Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
related to private <=> shared memory conversions

- Drop guest_memfd's .getattr() implementation as the VFS layer will
call generic_fillattr() if inode_operations.getattr is NULL

- Fix issues with dirty ring harvesting where KVM doesn't bound the
processing of entries in any way, which allows userspace to keep
KVM in a tight loop indefinitely

- Kill off kvm_arch_{start,end}_assignment() and x86's associated
tracking, now that KVM no longer uses assigned_device_count as a
heuristic for either irqbypass usage or MDS mitigation

Selftests:

- Fix a comment typo

- Verify KVM is loaded when getting any KVM module param so that
attempting to run a selftest without kvm.ko loaded results in a
SKIP message about KVM not being loaded/enabled (versus some random
parameter not existing)

- Skip tests that hit EACCES when attempting to access a file, and
print a "Root required?" help message. In most cases, the test just
needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
Documentation: KVM: Use unordered list for pre-init VGIC registers
RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
RISC-V: perf/kvm: Add reporting of interrupt events
RISC-V: KVM: Enable ring-based dirty memory tracking
RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
RISC-V: KVM: Delegate illegal instruction fault to VS mode
RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
RISC-V: KVM: Factor-out g-stage page table management
RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
RISC-V: KVM: Introduce struct kvm_gstage_mapping
RISC-V: KVM: Factor-out MMU related declarations into separate headers
RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
RISC-V: KVM: Don't flush TLB when PTE is unchanged
RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
...

show more ...


# 1a14928e 28-Jul-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.17

- Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
guest. Failu

Merge tag 'kvm-x86-misc-6.17' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.17

- Prevert the host's DEBUGCTL.FREEZE_IN_SMM (Intel only) when running the
guest. Failure to honor FREEZE_IN_SMM can bleed host state into the guest.

- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter (Intel only) to
prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF.

- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the
vCPU's CPUID model.

- Rework the MSR interception code so that the SVM and VMX APIs are more or
less identical.

- Recalculate all MSR intercepts from the "source" on MSR filter changes, and
drop the dedicated "shadow" bitmaps (and their awful "max" size defines).

- WARN and reject loading kvm-amd.ko instead of panicking the kernel if the
nested SVM MSRPM offsets tracker can't handle an MSR.

- Advertise support for LKGS (Load Kernel GS base), a new instruction that's
loosely related to FRED, but is supported and enumerated independently.

- Fix a user-triggerable WARN that syzkaller found by stuffing INIT_RECEIVED,
a.k.a. WFS, and then putting the vCPU into VMX Root Mode (post-VMXON). Use
the same approach KVM uses for dealing with "impossible" emulation when
running a !URG guest, and simply wait until KVM_RUN to detect that the vCPU
has architecturally impossible state.

- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of
APERF/MPERF reads, so that a "properly" configured VM can "virtualize"
APERF/MPERF (with many caveats).

- Reject KVM_SET_TSC_KHZ if vCPUs have been created, as changing the "default"
frequency is unsupported for VMs with a "secure" TSC, and there's no known
use case for changing the default frequency for other VM types.

show more ...


Revision tags: v6.16, v6.16-rc7, v6.16-rc6, v6.16-rc5, v6.16-rc4
# a7cec208 26-Jun-2025 Jim Mattson <jmattson@google.com>

KVM: x86: Provide a capability to disable APERF/MPERF read intercepts

Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs
without interception.

The IA32_APERF and IA32_MPERF MSRs are

KVM: x86: Provide a capability to disable APERF/MPERF read intercepts

Allow a guest to read the physical IA32_APERF and IA32_MPERF MSRs
without interception.

The IA32_APERF and IA32_MPERF MSRs are not virtualized. Writes are not
handled at all. The MSR values are not zeroed on vCPU creation, saved
on suspend, or restored on resume. No accommodation is made for
processor migration or for sharing a logical processor with other
tasks. No adjustments are made for non-unit TSC multipliers. The MSRs
do not account for time the same way as the comparable PMU events,
whether the PMU is virtualized by the traditional emulation method or
the new mediated pass-through approach.

Nonetheless, in a properly constrained environment, this capability
can be combined with a guest CPUID table that advertises support for
CPUID.6:ECX.APERFMPERF[bit 0] to induce a Linux guest to report the
effective physical CPU frequency in /proc/cpuinfo. Moreover, there is
no performance cost for this capability.

Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250530185239.2335185-3-jmattson@google.com
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250626001225.744268-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


# 74f1af95 29-Jun-2025 Rob Clark <robin.clark@oss.qualcomm.com>

Merge remote-tracking branch 'drm/drm-next' into msm-next

Back-merge drm-next to (indirectly) get arm-smmu updates for making
stall-on-fault more reliable.

Signed-off-by: Rob Clark <robin.clark@oss

Merge remote-tracking branch 'drm/drm-next' into msm-next

Back-merge drm-next to (indirectly) get arm-smmu updates for making
stall-on-fault more reliable.

Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>

show more ...


Revision tags: v6.16-rc3, v6.16-rc2
# 6b1dd265 10-Jun-2025 Maxim Levitsky <mlevitsk@redhat.com>

KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest

Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host'

KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest

Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.

Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.

Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.

Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.

Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.

WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).

Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


# 7d0cce6c 10-Jun-2025 Maxim Levitsky <mlevitsk@redhat.com>

KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs

Introduce vmx_guest_debugctl_{read,write}() to handle all accesses to
vmcs.GUEST_IA32_DEBUGCTL. This will allow stuffing FREEZE_I

KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs

Introduce vmx_guest_debugctl_{read,write}() to handle all accesses to
vmcs.GUEST_IA32_DEBUGCTL. This will allow stuffing FREEZE_IN_SMM into
GUEST_IA32_DEBUGCTL based on the host setting without bleeding the state
into the guest, and without needing to copy+paste the FREEZE_IN_SMM
logic into every patch that accesses GUEST_IA32_DEBUGCTL.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
[sean: massage changelog, make inline, use in all prepare_vmcs02() cases]
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20250610232010.162191-8-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


# 095686e6 10-Jun-2025 Maxim Levitsky <mlevitsk@redhat.com>

KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter

Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on

KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter

Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.

Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.

Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


# c598d5eb 11-Jun-2025 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to forward to v6.16-rc1

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 86e2d052 09-Jun-2025 Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge drm/drm-next into drm-xe-next

Backmerging to bring in 6.16

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


# 34c55367 09-Jun-2025 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to v6.16-rc1, among other things to get the fixed size GENMASK_U*()
and BIT_U*() macros.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


Revision tags: v6.16-rc1
# 7f9039c5 02-Jun-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
Generic:

- Clean up locking of all vCPUs for a VM by using the *_nest_lock()
f

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
Generic:

- Clean up locking of all vCPUs for a VM by using the *_nest_lock()
family of functions, and move duplicated code to virt/kvm/. kernel/
patches acked by Peter Zijlstra

- Add MGLRU support to the access tracking perf test

ARM fixes:

- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding behind stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuking the vLPI mapping upon a routing change

- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o
private IRQs allocated

- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum

- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping

- Avoid dereferencing a NULL pointer in the VGIC debug code, which
can happen if the device doesn't have any mapping yet

s390:

- Fix interaction between some filesystems and Secure Execution

- Some cleanups and refactorings, preparing for an upcoming big
series

x86:

- Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
to fix a race between AP destroy and VMRUN

- Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
the VM

- Refine and harden handling of spurious faults

- Add support for ALLOWED_SEV_FEATURES

- Add #VMGEXIT to the set of handlers special cased for
CONFIG_RETPOLINE=y

- Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
features that utilize those bits

- Don't account temporary allocations in sev_send_update_data()

- Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
Threshold

- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
IBPB, between SVM and VMX

- Advertise support to userspace for WRMSRNS and PREFETCHI

- Rescan I/O APIC routes after handling EOI that needed to be
intercepted due to the old/previous routing, but not the
new/current routing

- Add a module param to control and enumerate support for device
posted interrupts

- Fix a potential overflow with nested virt on Intel systems running
32-bit kernels

- Flush shadow VMCSes on emergency reboot

- Add support for SNP to the various SEV selftests

- Add a selftest to verify fastops instructions via forced emulation

- Refine and optimize KVM's software processing of the posted
interrupt bitmap, and share the harvesting code between KVM and the
kernel's Posted MSI handler"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
rtmutex_api: provide correct extern functions
KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
KVM: s390: Simplify and move pv code
KVM: s390: Refactor and split some gmap helpers
KVM: s390: Remove unneeded srcu lock
s390: Remove unneeded includes
s390/uv: Improve splitting of large folios that cannot be split while dirty
s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
rust: add helper for mutex_trylock
RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
...

show more ...


# 4f978603 02-Jun-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.16 merge window.


# 43db1111 29-May-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.

Quotes are appropr

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.

Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.

ARM:

- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.

- Enable nested virtualisation support on systems that support it,
though it is disabled by default.

- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.

- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
dealing with the evolution of the architecture.

- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.

- New selftest checking the pKVM ownership transition rules

- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.

- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.

- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.

- Add a new selftest for the SVE host state being corrupted by a
guest.

- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.

- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.

- and the usual random cleanups.

LoongArch:

- Don't flush tlb if the host supports hardware page table walks.

- Add KVM selftests support.

RISC-V:

- Add vector registers to get-reg-list selftest

- VCPU reset related improvements

- Remove scounteren initialization from VCPU reset

- Support VCPU reset from userspace using set_mpstate() ioctl

x86:

- Initial support for TDX in KVM.

This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.

This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c42a ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...

show more ...


# bbfd5594 28-May-2025 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Need to pull in a67221b5eb8d ("drm/i915/dp: Return min bpc supported by source instead of 0")
in order to fix build breakage on GCC 9.4.0 (from Ubuntu 20.04

Merge drm/drm-next into drm-intel-gt-next

Need to pull in a67221b5eb8d ("drm/i915/dp: Return min bpc supported by source instead of 0")
in order to fix build breakage on GCC 9.4.0 (from Ubuntu 20.04).

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

show more ...


# 3e89d5fd 27-May-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-vmx-6.16' of https://github.com/kvm-x86/linux into HEAD

KVM VMX changes for 6.16:

- Explicitly check MSR load/store list counts to fix a potential overflow on
32-bit kernels.

Merge tag 'kvm-x86-vmx-6.16' of https://github.com/kvm-x86/linux into HEAD

KVM VMX changes for 6.16:

- Explicitly check MSR load/store list counts to fix a potential overflow on
32-bit kernels.

- Flush shadow VMCSes on emergency reboot.

- Revert mem_enc_ioctl() back to an optional hook, as it's nullified when
SEV or TDX is disabled via Kconfig.

- Macrofy the handling of vt_x86_ops to eliminate a pile of boilerplate code
needed for TDX, and to optimize CONFIG_KVM_INTEL_TDX=n builds.

show more ...


# ebd38b26 27-May-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-misc-6.16' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.16:

- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between
SVM and

Merge tag 'kvm-x86-misc-6.16' of https://github.com/kvm-x86/linux into HEAD

KVM x86 misc changes for 6.16:

- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between
SVM and VMX.

- Advertise support to userspace for WRMSRNS and PREFETCHI.

- Rescan I/O APIC routes after handling EOI that needed to be intercepted due
to the old/previous routing, but not the new/current routing.

- Add a module param to control and enumerate support for device posted
interrupts.

- Misc cleanups.

show more ...


# 785cdec4 26-May-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
"Boot code changes:

- A large series of changes to reorganize th

Merge tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
"Boot code changes:

- A large series of changes to reorganize the x86 boot code into a
better isolated and easier to maintain base of PIC early startup
code in arch/x86/boot/startup/, by Ard Biesheuvel.

Motivation & background:

| Since commit
|
| c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C")
|
| dated Jun 6 2017, we have been using C code on the boot path in a way
| that is not supported by the toolchain, i.e., to execute non-PIC C
| code from a mapping of memory that is different from the one provided
| to the linker. It should have been obvious at the time that this was a
| bad idea, given the need to sprinkle fixup_pointer() calls left and
| right to manipulate global variables (including non-pointer variables)
| without crashing.
|
| This C startup code has been expanding, and in particular, the SEV-SNP
| startup code has been expanding over the past couple of years, and
| grown many of these warts, where the C code needs to use special
| annotations or helpers to access global objects.

This tree includes the first phase of this work-in-progress x86
boot code reorganization.

Scalability enhancements and micro-optimizations:

- Improve code-patching scalability (Eric Dumazet)

- Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)

CPU features enumeration updates:

- Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
Darwish)

- Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
Thomas Gleixner)

- Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)

Memory management changes:

- Allow temporary MMs when IRQs are on (Andy Lutomirski)

- Opt-in to IRQs-off activate_mm() (Andy Lutomirski)

- Simplify choose_new_asid() and generate better code (Borislav
Petkov)

- Simplify 32-bit PAE page table handling (Dave Hansen)

- Always use dynamic memory layout (Kirill A. Shutemov)

- Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)

- Make 5-level paging support unconditional (Kirill A. Shutemov)

- Stop prefetching current->mm->mmap_lock on page faults (Mateusz
Guzik)

- Predict valid_user_address() returning true (Mateusz Guzik)

- Consolidate initmem_init() (Mike Rapoport)

FPU support and vector computing:

- Enable Intel APX support (Chang S. Bae)

- Reorgnize and clean up the xstate code (Chang S. Bae)

- Make task_struct::thread constant size (Ingo Molnar)

- Restore fpu_thread_struct_whitelist() to fix
CONFIG_HARDENED_USERCOPY=y (Kees Cook)

- Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
Nesterov)

- Always preserve non-user xfeatures/flags in __state_perm (Sean
Christopherson)

Microcode loader changes:

- Help users notice when running old Intel microcode (Dave Hansen)

- AMD: Do not return error when microcode update is not necessary
(Annie Li)

- AMD: Clean the cache if update did not load microcode (Boris
Ostrovsky)

Code patching (alternatives) changes:

- Simplify, reorganize and clean up the x86 text-patching code (Ingo
Molnar)

- Make smp_text_poke_batch_process() subsume
smp_text_poke_batch_finish() (Nikolay Borisov)

- Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)

Debugging support:

- Add early IDT and GDT loading to debug relocate_kernel() bugs
(David Woodhouse)

- Print the reason for the last reset on modern AMD CPUs (Yazen
Ghannam)

- Add AMD Zen debugging document (Mario Limonciello)

- Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)

- Stop decoding i64 instructions in x86-64 mode at opcode (Masami
Hiramatsu)

CPU bugs and bug mitigations:

- Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)

- Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)

- Restructure and harmonize the various CPU bug mitigation methods
(David Kaplan)

- Fix spectre_v2 mitigation default on Intel (Pawan Gupta)

MSR API:

- Large MSR code and API cleanup (Xin Li)

- In-kernel MSR API type cleanups and renames (Ingo Molnar)

PKEYS:

- Simplify PKRU update in signal frame (Chang S. Bae)

NMI handling code:

- Clean up, refactor and simplify the NMI handling code (Sohil Mehta)

- Improve NMI duration console printouts (Sohil Mehta)

Paravirt guests interface:

- Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)

SEV support:

- Share the sev_secrets_pa value again (Tom Lendacky)

x86 platform changes:

- Introduce the <asm/amd/> header namespace (Ingo Molnar)

- i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
<asm/amd/fch.h> (Mario Limonciello)

Fixes and cleanups:

- x86 assembly code cleanups and fixes (Uros Bizjak)

- Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"

* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
x86/bugs: Fix spectre_v2 mitigation default on Intel
x86/bugs: Restructure ITS mitigation
x86/xen/msr: Fix uninitialized variable 'err'
x86/msr: Remove a superfluous inclusion of <asm/asm.h>
x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
x86/mm/64: Make 5-level paging support unconditional
x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
x86/mm/64: Always use dynamic memory layout
x86/bugs: Fix indentation due to ITS merge
x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header
x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h>
x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
x86/mm: Fix kernel-doc descriptions of various pgtable methods
x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
x86/boot: Defer initialization of VM space related global variables
...

show more ...


Revision tags: v6.15, v6.15-rc7
# db5302ae 16-May-2025 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Backmerge to sync with v6.15-rc, xe, and specifically async flip changes
in drm-misc.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# d51b9d81 15-May-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.15-rc6' into next

Sync up with mainline to bring in xpad controller changes.


# 1f82e8e1 13-May-2025 Ingo Molnar <mingo@kernel.org>

Merge branch 'x86/msr' into x86/core, to resolve conflicts

Conflicts:
arch/x86/boot/startup/sme.c
arch/x86/coco/sev/core.c
arch/x86/kernel/fpu/core.c
arch/x86/kernel/fpu/xstate.c

Semantic con

Merge branch 'x86/msr' into x86/core, to resolve conflicts

Conflicts:
arch/x86/boot/startup/sme.c
arch/x86/coco/sev/core.c
arch/x86/kernel/fpu/core.c
arch/x86/kernel/fpu/xstate.c

Semantic conflict:
arch/x86/include/asm/sev-internal.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>

show more ...


Revision tags: v6.15-rc6, v6.15-rc5
# efef7f18 01-May-2025 Xin Li (Intel) <xin@zytor.com>

x86/msr: Add explicit includes of <asm/msr.h>

For historic reasons there are some TSC-related functions in the
<asm/msr.h> header, even though there's an <asm/tsc.h> header.

To facilitate the reloc

x86/msr: Add explicit includes of <asm/msr.h>

For historic reasons there are some TSC-related functions in the
<asm/msr.h> header, even though there's an <asm/tsc.h> header.

To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h>
to <asm/tsc.h> and to eventually eliminate the inclusion of
<asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency
to the source files that reference definitions from <asm/msr.h>.

[ mingo: Clarified the changelog. ]

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com

show more ...


# 844e31bb 29-Apr-2025 Rob Clark <robdclark@chromium.org>

Merge remote-tracking branch 'drm-misc/drm-misc-next' into msm-next

Merge drm-misc-next to get commit Fixes: fec450ca15af ("drm/display:
hdmi: provide central data authority for ACR params").

Signe

Merge remote-tracking branch 'drm-misc/drm-misc-next' into msm-next

Merge drm-misc-next to get commit Fixes: fec450ca15af ("drm/display:
hdmi: provide central data authority for ACR params").

Signed-off-by: Rob Clark <robdclark@chromium.org>

show more ...


# 54a1a24f 29-Apr-2025 Sean Christopherson <seanjc@google.com>

KVM: x86: Unify cross-vCPU IBPB

Both SVM and VMX have similar implementation for executing an IBPB
between running different vCPUs on the same CPU to create separate
prediction domains for different

KVM: x86: Unify cross-vCPU IBPB

Both SVM and VMX have similar implementation for executing an IBPB
between running different vCPUs on the same CPU to create separate
prediction domains for different vCPUs.

For VMX, when the currently loaded VMCS is changed in
vmx_vcpu_load_vmcs(), an IBPB is executed if there is no 'buddy', which
is the case on vCPU load. The intention is to execute an IBPB when
switching vCPUs, but not when switching the VMCS within the same vCPU.
Executing an IBPB on nested transitions within the same vCPU is handled
separately and conditionally in nested_vmx_vmexit().

For SVM, the current VMCB is tracked on vCPU load and an IBPB is
executed when it is changed. The intention is also to execute an IBPB
when switching vCPUs, although it is possible that in some cases an IBBP
is executed when switching VMCBs for the same vCPU. Executing an IBPB on
nested transitions should be handled separately, and is proposed at [1].

Unify the logic by tracking the last loaded vCPU and execuintg the IBPB
on vCPU change in kvm_arch_vcpu_load() instead. When a vCPU is
destroyed, make sure all references to it are removed from any CPU. This
is similar to how SVM clears the current_vmcb tracking on vCPU
destruction. Remove the current VMCB tracking in SVM as it is no longer
required, as well as the 'buddy' parameter to vmx_vcpu_load_vmcs().

[1] https://lore.kernel.org/lkml/20250221163352.3818347-4-yosry.ahmed@linux.dev

Link: https://lore.kernel.org/all/20250320013759.3965869-1-yosry.ahmed@linux.dev
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: tweak comment to stay at/under 80 columns]
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


12345678910>>...87