History log of /linux/arch/x86/kvm/Kconfig (Results 1 – 25 of 1518)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# ab93e0dd 06-Aug-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.17 merge window.


# a7bee4e7 04-Aug-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'ib-mfd-gpio-input-pwm-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next

Merge an immutable branch between MFD, GPIO, Input and PWM to resolve
conflicts for the mer

Merge tag 'ib-mfd-gpio-input-pwm-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next

Merge an immutable branch between MFD, GPIO, Input and PWM to resolve
conflicts for the merge window pull request.

show more ...


# 63eb28bb 31-Jul-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"ARM:

- Host driver for GICv5, the next generation interrupt controller for
arm64, i

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"ARM:

- Host driver for GICv5, the next generation interrupt controller for
arm64, including support for interrupt routing, MSIs, interrupt
translation and wired interrupts

- Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on
GICv5 hardware, leveraging the legacy VGIC interface

- Userspace control of the 'nASSGIcap' GICv3 feature, allowing
userspace to disable support for SGIs w/o an active state on
hardware that previously advertised it unconditionally

- Map supporting endpoints with cacheable memory attributes on
systems with FEAT_S2FWB and DIC where KVM no longer needs to
perform cache maintenance on the address range

- Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the
guest hypervisor to inject external aborts into an L2 VM and take
traps of masked external aborts to the hypervisor

- Convert more system register sanitization to the config-driven
implementation

- Fixes to the visibility of EL2 registers, namely making VGICv3
system registers accessible through the VGIC device instead of the
ONE_REG vCPU ioctls

- Various cleanups and minor fixes

LoongArch:

- Add stat information for in-kernel irqchip

- Add tracepoints for CPUCFG and CSR emulation exits

- Enhance in-kernel irqchip emulation

- Various cleanups

RISC-V:

- Enable ring-based dirty memory tracking

- Improve perf kvm stat to report interrupt events

- Delegate illegal instruction trap to VS-mode

- MMU improvements related to upcoming nested virtualization

s390x

- Fixes

x86:

- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O
APIC, PIC, and PIT emulation at compile time

- Share device posted IRQ code between SVM and VMX and harden it
against bugs and runtime errors

- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups
O(1) instead of O(n)

- For MMIO stale data mitigation, track whether or not a vCPU has
access to (host) MMIO based on whether the page tables have MMIO
pfns mapped; using VFIO is prone to false negatives

- Rework the MSR interception code so that the SVM and VMX APIs are
more or less identical

- Recalculate all MSR intercepts from scratch on MSR filter changes,
instead of maintaining shadow bitmaps

- Advertise support for LKGS (Load Kernel GS base), a new instruction
that's loosely related to FRED, but is supported and enumerated
independently

- Fix a user-triggerable WARN that syzkaller found by setting the
vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting
the vCPU into VMX Root Mode (post-VMXON). Trying to detect every
possible path leading to architecturally forbidden states is hard
and even risks breaking userspace (if it goes from valid to valid
state but passes through invalid states), so just wait until
KVM_RUN to detect that the vCPU state isn't allowed

- Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling
interception of APERF/MPERF reads, so that a "properly" configured
VM can access APERF/MPERF. This has many caveats (APERF/MPERF
cannot be zeroed on vCPU creation or saved/restored on suspend and
resume, or preserved over thread migration let alone VM migration)
but can be useful whenever you're interested in letting Linux
guests see the effective physical CPU frequency in /proc/cpuinfo

- Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been
created, as there's no known use case for changing the default
frequency for other VM types and it goes counter to the very reason
why the ioctl was added to the vm file descriptor. And also, there
would be no way to make it work for confidential VMs with a
"secure" TSC, so kill two birds with one stone

- Dynamically allocation the shadow MMU's hashed page list, and defer
allocating the hashed list until it's actually needed (the TDP MMU
doesn't use the list)

- Extract many of KVM's helpers for accessing architectural local
APIC state to common x86 so that they can be shared by guest-side
code for Secure AVIC

- Various cleanups and fixes

x86 (Intel):

- Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest.
Failure to honor FREEZE_IN_SMM can leak host state into guests

- Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to
prevent L1 from running L2 with features that KVM doesn't support,
e.g. BTF

x86 (AMD):

- WARN and reject loading kvm-amd.ko instead of panicking the kernel
if the nested SVM MSRPM offsets tracker can't handle an MSR (which
is pretty much a static condition and therefore should never
happen, but still)

- Fix a variety of flaws and bugs in the AVIC device posted IRQ code

- Inhibit AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation

- Extend enable_ipiv module param support to SVM, by simply leaving
IsRunning clear in the vCPU's physical ID table entry

- Disable IPI virtualization, via enable_ipiv, if the CPU is affected
by erratum #1235, to allow (safely) enabling AVIC on such CPUs

- Request GA Log interrupts if and only if the target vCPU is
blocking, i.e. only if KVM needs a notification in order to wake
the vCPU

- Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to
the vCPU's CPUID model

- Accept any SNP policy that is accepted by the firmware with respect
to SMT and single-socket restrictions. An incompatible policy
doesn't put the kernel at risk in any way, so there's no reason for
KVM to care

- Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and
use WBNOINVD instead of WBINVD when possible for SEV cache
maintenance

- When reclaiming memory from an SEV guest, only do cache flushes on
CPUs that have ever run a vCPU for the guest, i.e. don't flush the
caches for CPUs that can't possibly have cache lines with dirty,
encrypted data

Generic:

- Rework irqbypass to track/match producers and consumers via an
xarray instead of a linked list. Using a linked list leads to
O(n^2) insertion times, which is hugely problematic for use cases
that create large numbers of VMs. Such use cases typically don't
actually use irqbypass, but eliminating the pointless registration
is a future problem to solve as it likely requires new uAPI

- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a
"void *", to avoid making a simple concept unnecessarily difficult
to understand

- Decouple device posted IRQs from VFIO device assignment, as binding
a VM to a VFIO group is not a requirement for enabling device
posted IRQs

- Clean up and document/comment the irqfd assignment code

- Disallow binding multiple irqfds to an eventfd with a priority
waiter, i.e. ensure an eventfd is bound to at most one irqfd
through the entire host, and add a selftest to verify eventfd:irqfd
bindings are globally unique

- Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues
related to private <=> shared memory conversions

- Drop guest_memfd's .getattr() implementation as the VFS layer will
call generic_fillattr() if inode_operations.getattr is NULL

- Fix issues with dirty ring harvesting where KVM doesn't bound the
processing of entries in any way, which allows userspace to keep
KVM in a tight loop indefinitely

- Kill off kvm_arch_{start,end}_assignment() and x86's associated
tracking, now that KVM no longer uses assigned_device_count as a
heuristic for either irqbypass usage or MDS mitigation

Selftests:

- Fix a comment typo

- Verify KVM is loaded when getting any KVM module param so that
attempting to run a selftest without kvm.ko loaded results in a
SKIP message about KVM not being loaded/enabled (versus some random
parameter not existing)

- Skip tests that hit EACCES when attempting to access a file, and
print a "Root required?" help message. In most cases, the test just
needs to be run with elevated permissions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits)
Documentation: KVM: Use unordered list for pre-init VGIC registers
RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map()
RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs
RISC-V: perf/kvm: Add reporting of interrupt events
RISC-V: KVM: Enable ring-based dirty memory tracking
RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap
RISC-V: KVM: Delegate illegal instruction fault to VS mode
RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs
RISC-V: KVM: Factor-out g-stage page table management
RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence
RISC-V: KVM: Introduce struct kvm_gstage_mapping
RISC-V: KVM: Factor-out MMU related declarations into separate headers
RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect()
RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range()
RISC-V: KVM: Don't flush TLB when PTE is unchanged
RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH
RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize()
RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init()
RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value
KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
...

show more ...


# f02b1bcc 28-Jul-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEAD

KVM IRQ changes for 6.17

- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked

Merge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEAD

KVM IRQ changes for 6.17

- Rework irqbypass to track/match producers and consumers via an xarray
instead of a linked list. Using a linked list leads to O(n^2) insertion
times, which is hugely problematic for use cases that create large numbers
of VMs. Such use cases typically don't actually use irqbypass, but
eliminating the pointless registration is a future problem to solve as it
likely requires new uAPI.

- Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *",
to avoid making a simple concept unnecessarily difficult to understand.

- Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC,
and PIT emulation at compile time.

- Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c.

- Fix a variety of flaws and bugs in the AVIC device posted IRQ code.

- Inhibited AVIC if a vCPU's ID is too big (relative to what hardware
supports) instead of rejecting vCPU creation.

- Extend enable_ipiv module param support to SVM, by simply leaving IsRunning
clear in the vCPU's physical ID table entry.

- Disable IPI virtualization, via enable_ipiv, if the CPU is affected by
erratum #1235, to allow (safely) enabling AVIC on such CPUs.

- Dedup x86's device posted IRQ code, as the vast majority of functionality
can be shared verbatime between SVM and VMX.

- Harden the device posted IRQ code against bugs and runtime errors.

- Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1)
instead of O(n).

- Generate GA Log interrupts if and only if the target vCPU is blocking, i.e.
only if KVM needs a notification in order to wake the vCPU.

- Decouple device posted IRQs from VFIO device assignment, as binding a VM to
a VFIO group is not a requirement for enabling device posted IRQs.

- Clean up and document/comment the irqfd assignment code.

- Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e.
ensure an eventfd is bound to at most one irqfd through the entire host,
and add a selftest to verify eventfd:irqfd bindings are globally unique.

show more ...


Revision tags: v6.16, v6.16-rc7, v6.16-rc6, v6.16-rc5, v6.16-rc4
# 74f1af95 29-Jun-2025 Rob Clark <robin.clark@oss.qualcomm.com>

Merge remote-tracking branch 'drm/drm-next' into msm-next

Back-merge drm-next to (indirectly) get arm-smmu updates for making
stall-on-fault more reliable.

Signed-off-by: Rob Clark <robin.clark@oss

Merge remote-tracking branch 'drm/drm-next' into msm-next

Back-merge drm-next to (indirectly) get arm-smmu updates for making
stall-on-fault more reliable.

Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>

show more ...


Revision tags: v6.16-rc3, v6.16-rc2
# 628a2773 11-Jun-2025 Sean Christopherson <seanjc@google.com>

KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC

Add a Kconfig to allow building KVM without support for emulating a I/O
APIC, PIC, and PIT, which is desirable for deployments t

KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC

Add a Kconfig to allow building KVM without support for emulating a I/O
APIC, PIC, and PIT, which is desirable for deployments that effectively
don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
create an in-kernel I/O APIC. E.g. compiling out support eliminates a few
thousand lines of guest-facing code and gives security folks warm fuzzies.

As a bonus, wrapping relevant paths with CONFIG_KVM_IOAPIC #ifdefs makes
it much easier for readers to understand which bits and pieces exist
specifically for fully in-kernel IRQ chips.

Opportunistically convert all two in-kernel uses of __KVM_HAVE_IOAPIC to
CONFIG_KVM_IOAPIC, e.g. rather than add a second #ifdef to generate a stub
for kvm_arch_post_irq_routing_update().

Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/r/20250611213557.294358-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


# c598d5eb 11-Jun-2025 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to forward to v6.16-rc1

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 86e2d052 09-Jun-2025 Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge drm/drm-next into drm-xe-next

Backmerging to bring in 6.16

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


# 34c55367 09-Jun-2025 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to v6.16-rc1, among other things to get the fixed size GENMASK_U*()
and BIT_U*() macros.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


Revision tags: v6.16-rc1
# 4f978603 02-Jun-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.16 merge window.


# 43db1111 29-May-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.

Quotes are appropr

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.

Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.

ARM:

- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.

- Enable nested virtualisation support on systems that support it,
though it is disabled by default.

- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.

- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
dealing with the evolution of the architecture.

- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.

- New selftest checking the pKVM ownership transition rules

- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.

- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.

- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.

- Add a new selftest for the SVE host state being corrupted by a
guest.

- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.

- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.

- and the usual random cleanups.

LoongArch:

- Don't flush tlb if the host supports hardware page table walks.

- Add KVM selftests support.

RISC-V:

- Add vector registers to get-reg-list selftest

- VCPU reset related improvements

- Remove scounteren initialization from VCPU reset

- Support VCPU reset from userspace using set_mpstate() ioctl

x86:

- Initial support for TDX in KVM.

This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.

This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c42a ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...

show more ...


# bbfd5594 28-May-2025 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Merge drm/drm-next into drm-intel-gt-next

Need to pull in a67221b5eb8d ("drm/i915/dp: Return min bpc supported by source instead of 0")
in order to fix build breakage on GCC 9.4.0 (from Ubuntu 20.04

Merge drm/drm-next into drm-intel-gt-next

Need to pull in a67221b5eb8d ("drm/i915/dp: Return min bpc supported by source instead of 0")
in order to fix build breakage on GCC 9.4.0 (from Ubuntu 20.04).

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

show more ...


Revision tags: v6.15, v6.15-rc7
# db5302ae 16-May-2025 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Backmerge to sync with v6.15-rc, xe, and specifically async flip changes
in drm-misc.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# d51b9d81 15-May-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.15-rc6' into next

Sync up with mainline to bring in xpad controller changes.


Revision tags: v6.15-rc6, v6.15-rc5
# 844e31bb 29-Apr-2025 Rob Clark <robdclark@chromium.org>

Merge remote-tracking branch 'drm-misc/drm-misc-next' into msm-next

Merge drm-misc-next to get commit Fixes: fec450ca15af ("drm/display:
hdmi: provide central data authority for ACR params").

Signe

Merge remote-tracking branch 'drm-misc/drm-misc-next' into msm-next

Merge drm-misc-next to get commit Fixes: fec450ca15af ("drm/display:
hdmi: provide central data authority for ACR params").

Signed-off-by: Rob Clark <robdclark@chromium.org>

show more ...


Revision tags: v6.15-rc4
# 3ab7ae8e 24-Apr-2025 Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge drm/drm-next into drm-xe-next

Backmerge to bring in linux 6.15-rc.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


Revision tags: v6.15-rc3, v6.15-rc2
# 9f13acb2 11-Apr-2025 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.15-rc1' into x86/cpu, to refresh the branch with upstream changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6ce0fdaa 09-Apr-2025 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.15-rc1' into x86/asm, to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1260ed77 08-Apr-2025 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Backmerging to get updates from v6.15-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 1afba39f 07-Apr-2025 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a
build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST.

Signed-off-by: Thomas Zimmermann <tzimmerm

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.15-rc1 into drm-misc-next. Also fixes a
build issue when enabling CONFIG_DRM_SCHED_KUNIT_TEST.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>

show more ...


Revision tags: v6.15-rc1, v6.14
# fd02aa45 19-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-initial' into HEAD

This large commit contains the initial support for TDX in KVM. All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a soft

Merge branch 'kvm-tdx-initial' into HEAD

This large commit contains the initial support for TDX in KVM. All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a software component that runs in a special CPU mode called SEAM
(Secure Arbitration Mode).

The series is in turn split into multiple sub-series, each with a separate
merge commit:

- Initialization: basic setup for using the TDX module from KVM, plus
ioctls to create TDX VMs and vCPUs.

- MMU: in TDX, private and shared halves of the address space are mapped by
different EPT roots, and the private half is managed by the TDX module.
Using the support that was added to the generic MMU code in 6.14,
add support for TDX's secure page tables to the Intel side of KVM.
Generic KVM code takes care of maintaining a mirror of the secure page
tables so that they can be queried efficiently, and ensuring that changes
are applied to both the mirror and the secure EPT.

- vCPU enter/exit: implement the callbacks that handle the entry of a TDX
vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore
of host state.

- Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to
userspace. These correspond to the usual KVM_EXIT_* "heavyweight vmexits"
but are triggered through a different mechanism, similar to VMGEXIT for
SEV-ES and SEV-SNP.

- Interrupt handling: support for virtual interrupt injection as well as
handling VM-Exits that are caused by vectored events. Exclusive to
TDX are machine-check SMIs, which the kernel already knows how to
handle through the kernel machine check handler (commit 7911f145de5f,
"x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")

- Loose ends: handling of the remaining exits from the TDX module, including
EPT violation/misconfig and several TDVMCALL leaves that are handled in
the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning
an error or ignoring operations that are not supported by TDX guests

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# edb0e8f6 25-Mar-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"ARM:

- Nested virtualization support for VGICv3, giving the nested
hypervisor contr

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"ARM:

- Nested virtualization support for VGICv3, giving the nested
hypervisor control of the VGIC hardware when running an L2 VM

- Removal of 'late' nested virtualization feature register masking,
making the supported feature set directly visible to userspace

- Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage
of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers

- Paravirtual interface for discovering the set of CPU
implementations where a VM may run, addressing a longstanding issue
of guest CPU errata awareness in big-little systems and
cross-implementation VM migration

- Userspace control of the registers responsible for identifying a
particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1),
allowing VMs to be migrated cross-implementation

- pKVM updates, including support for tracking stage-2 page table
allocations in the protected hypervisor in the 'SecPageTable' stat

- Fixes to vPMU, ensuring that userspace updates to the vPMU after
KVM_RUN are reflected into the backing perf events

LoongArch:

- Remove unnecessary header include path

- Assume constant PGD during VM context switch

- Add perf events support for guest VM

RISC-V:

- Disable the kernel perf counter during configure

- KVM selftests improvements for PMU

- Fix warning at the time of KVM module removal

x86:

- Add support for aging of SPTEs without holding mmu_lock.

Not taking mmu_lock allows multiple aging actions to run in
parallel, and more importantly avoids stalling vCPUs. This includes
an implementation of per-rmap-entry locking; aging the gfn is done
with only a per-rmap single-bin spinlock taken, whereas locking an
rmap for write requires taking both the per-rmap spinlock and the
mmu_lock.

Note that this decreases slightly the accuracy of accessed-page
information, because changes to the SPTE outside aging might not
use atomic operations even if they could race against a clear of
the Accessed bit.

This is deliberate because KVM and mm/ tolerate false
positives/negatives for accessed information, and testing has shown
that reducing the latency of aging is far more beneficial to
overall system performance than providing "perfect" young/old
information.

- Defer runtime CPUID updates until KVM emulates a CPUID instruction,
to coalesce updates when multiple pieces of vCPU state are
changing, e.g. as part of a nested transition

- Fix a variety of nested emulation bugs, and add VMX support for
synthesizing nested VM-Exit on interception (instead of injecting
#UD into L2)

- Drop "support" for async page faults for protected guests that do
not set SEND_ALWAYS (i.e. that only want async page faults at CPL3)

- Bring a bit of sanity to x86's VM teardown code, which has
accumulated a lot of cruft over the years. Particularly, destroy
vCPUs before the MMU, despite the latter being a VM-wide operation

- Add common secure TSC infrastructure for use within SNP and in the
future TDX

- Block KVM_CAP_SYNC_REGS if guest state is protected. It does not
make sense to use the capability if the relevant registers are not
available for reading or writing

- Don't take kvm->lock when iterating over vCPUs in the suspend
notifier to fix a largely theoretical deadlock

- Use the vCPU's actual Xen PV clock information when starting the
Xen timer, as the cached state in arch.hv_clock can be stale/bogus

- Fix a bug where KVM could bleed PVCLOCK_GUEST_STOPPED across
different PV clocks; restrict PVCLOCK_GUEST_STOPPED to kvmclock, as
KVM's suspend notifier only accounts for kvmclock, and there's no
evidence that the flag is actually supported by Xen guests

- Clean up the per-vCPU "cache" of its reference pvclock, and instead
only track the vCPU's TSC scaling (multipler+shift) metadata (which
is moderately expensive to compute, and rarely changes for modern
setups)

- Don't write to the Xen hypercall page on MSR writes that are
initiated by the host (userspace or KVM) to fix a class of bugs
where KVM can write to guest memory at unexpected times, e.g.
during vCPU creation if userspace has set the Xen hypercall MSR
index to collide with an MSR that KVM emulates

- Restrict the Xen hypercall MSR index to the unofficial synthetic
range to reduce the set of possible collisions with MSRs that are
emulated by KVM (collisions can still happen as KVM emulates
Hyper-V MSRs, which also reside in the synthetic range)

- Clean up and optimize KVM's handling of Xen MSR writes and
xen_hvm_config

- Update Xen TSC leaves during CPUID emulation instead of modifying
the CPUID entries when updating PV clocks; there is no guarantee PV
clocks will be updated between TSC frequency changes and CPUID
emulation, and guest reads of the TSC leaves should be rare, i.e.
are not a hot path

x86 (Intel):

- Fix a bug where KVM unnecessarily reads XFD_ERR from hardware and
thus modifies the vCPU's XFD_ERR on a #NM due to CR0.TS=1

- Pass XFD_ERR as the payload when injecting #NM, as a preparatory
step for upcoming FRED virtualization support

- Decouple the EPT entry RWX protection bit macros from the EPT
Violation bits, both as a general cleanup and in anticipation of
adding support for emulating Mode-Based Execution Control (MBEC)

- Reject KVM_RUN if userspace manages to gain control and stuff
invalid guest state while KVM is in the middle of emulating nested
VM-Enter

- Add a macro to handle KVM's sanity checks on entry/exit VMCS
control pairs in anticipation of adding sanity checks for secondary
exit controls (the primary field is out of bits)

x86 (AMD):

- Ensure the PSP driver is initialized when both the PSP and KVM
modules are built-in (the initcall framework doesn't handle
dependencies)

- Use long-term pins when registering encrypted memory regions, so
that the pages are migrated out of MIGRATE_CMA/ZONE_MOVABLE and
don't lead to excessive fragmentation

- Add macros and helpers for setting GHCB return/error codes

- Add support for Idle HLT interception, which elides interception if
the vCPU has a pending, unmasked virtual IRQ when HLT is executed

- Fix a bug in INVPCID emulation where KVM fails to check for a
non-canonical address

- Don't attempt VMRUN for SEV-ES+ guests if the vCPU's VMSA is
invalid, e.g. because the vCPU was "destroyed" via SNP's AP
Creation hypercall

- Reject SNP AP Creation if the requested SEV features for the vCPU
don't match the VM's configured set of features

Selftests:

- Fix again the Intel PMU counters test; add a data load and do
CLFLUSH{OPT} on the data instead of executing code. The theory is
that modern Intel CPUs have learned new code prefetching tricks
that bypass the PMU counters

- Fix a flaw in the Intel PMU counters test where it asserts that an
event is counting correctly without actually knowing what the event
counts on the underlying hardware

- Fix a variety of flaws, bugs, and false failures/passes
dirty_log_test, and improve its coverage by collecting all dirty
entries on each iteration

- Fix a few minor bugs related to handling of stats FDs

- Add infrastructure to make vCPU and VM stats FDs available to tests
by default (open the FDs during VM/vCPU creation)

- Relax an assertion on the number of HLT exits in the xAPIC IPI test
when running on a CPU that supports AMD's Idle HLT (which elides
interception of HLT if a virtual IRQ is pending and unmasked)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (216 commits)
RISC-V: KVM: Optimize comments in kvm_riscv_vcpu_isa_disable_allowed
RISC-V: KVM: Teardown riscv specific bits after kvm_exit
LoongArch: KVM: Register perf callbacks for guest
LoongArch: KVM: Implement arch-specific functions for guest perf
LoongArch: KVM: Add stub for kvm_arch_vcpu_preempted_in_kernel()
LoongArch: KVM: Remove PGD saving during VM context switch
LoongArch: KVM: Remove unnecessary header include path
KVM: arm64: Tear down vGIC on failed vCPU creation
KVM: arm64: PMU: Reload when resetting
KVM: arm64: PMU: Reload when user modifies registers
KVM: arm64: PMU: Fix SET_ONE_REG for vPMC regs
KVM: arm64: PMU: Assume PMU presence in pmu-emul.c
KVM: arm64: PMU: Set raw values from user to PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
KVM: arm64: Initialize HCRX_EL2 traps in pKVM
KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
KVM: x86: block KVM_CAP_SYNC_REGS if guest state is protected
KVM: x86: Add infrastructure for secure TSC
KVM: x86: Push down setting vcpu.arch.user_set_tsc
...

show more ...


# 4286a3ec 19-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-mmu-6.15' of https://github.com/kvm-x86/linux into HEAD

KVM x86/mmu changes for 6.15

Add support for "fast" aging of SPTEs in both the TDP MMU and Shadow MMU, where
"fast" means

Merge tag 'kvm-x86-mmu-6.15' of https://github.com/kvm-x86/linux into HEAD

KVM x86/mmu changes for 6.15

Add support for "fast" aging of SPTEs in both the TDP MMU and Shadow MMU, where
"fast" means "without holding mmu_lock". Not taking mmu_lock allows multiple
aging actions to run in parallel, and more importantly avoids stalling vCPUs,
e.g. due to holding mmu_lock for an extended duration while a vCPU is faulting
in memory.

For the TDP MMU, protect aging via RCU; the page tables are RCU-protected and
KVM doesn't need to access any metadata to age SPTEs.

For the Shadow MMU, use bit 1 of rmap pointers (bit 0 is used to terminate a
list of rmaps) to implement a per-rmap single-bit spinlock. When aging a gfn,
acquire the rmap's spinlock with read-only permissions, which allows hardening
and optimizing the locking and aging, e.g. locking an rmap for write requires
mmu_lock to also be held. The lock is NOT a true R/W spinlock, i.e. multiple
concurrent readers aren't supported.

To avoid forcing all SPTE updates to use atomic operations (clearing the
Accessed bit out of mmu_lock makes it inherently volatile), rework and rename
spte_has_volatile_bits() to spte_needs_atomic_update() and deliberately exclude
the Accessed bit. KVM (and mm/) already tolerates false positives/negatives
for Accessed information, and all testing has shown that reducing the latency
of aging is far more beneficial to overall system performance than providing
"perfect" young/old information.

show more ...


Revision tags: v6.14-rc7, v6.14-rc6
# 0d20742b 06-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-initialization' into HEAD

This series kicks off the actual interaction of KVM with the TDX module.
This series encompasses the basic setup for using the TDX module from KVM,
an

Merge branch 'kvm-tdx-initialization' into HEAD

This series kicks off the actual interaction of KVM with the TDX module.
This series encompasses the basic setup for using the TDX module from KVM,
and the creation of TD VMs and vCPUs.

The TDX Module is a software component that runs in a special CPU mode
called SEAM (Secure Arbitration Mode). Loading it is mostly handled
outside of KVM by the core kernel. Once it’s loaded KVM can interact with
the TDX Module via a new instruction called SEAMCALL to virtualize a TD
guests. This instruction can be used to make various types of seamcalls,
with names organized into a hierarchy. The format is TDH.[AREA].[ACTION],
where “TDH” stands for “Trust Domain Host”, and differentiates from
another set of calls that can be done by the guest “TDG”. The KVM relevant
areas of SEAMCALLs are:
SYS – TDX module management, static metadata reading.
MNG – TD management. VM scoped things that operate on a TDX module
controlled structure called the TDCS.
VP – vCPU management. vCPU scoped things that operate on TDX module
controlled structures called the TDVPS.
PHYMEM - Operations related to physical memory management (page
reclaiming, cache operations, etc).

This series introduces some TDX specific KVM APIs and stops short of
fully “finalizing” the creation of a TD VM. The part of initializing
a guest where initial private memory is loaded is left to a separate
MMU related series.

show more ...


Revision tags: v6.14-rc5
# 8d032b68 25-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: TDX: create/destroy VM structure

Implement managing the TDX private KeyID to implement, create, destroy
and free for a TDX guest.

When creating at TDX guest, assign a TDX private KeyID for the

KVM: TDX: create/destroy VM structure

Implement managing the TDX private KeyID to implement, create, destroy
and free for a TDX guest.

When creating at TDX guest, assign a TDX private KeyID for the TDX guest
for memory encryption, and allocate pages for the guest. These are used
for the Trust Domain Root (TDR) and Trust Domain Control Structure (TDCS).

On destruction, free the allocated pages, and the KeyID.

Before tearing down the private page tables, TDX requires the guest TD to
be destroyed by reclaiming the KeyID. Do it in the vm_pre_destroy() kvm_x86_ops
hook. The TDR control structures can be freed in the vm_destroy() hook,
which runs last.

Co-developed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
- Fix build issue in kvm-coco-queue
- Init ret earlier to fix __tdx_td_init() error handling. (Chao)
- Standardize -EAGAIN for __tdx_td_init() retry errors (Rick)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


12345678910>>...61