tdp_iter.c - OpenGrok history log for /linux/arch/x86/kvm/mmu/tdp

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
Revision tags: v6.16, v6.16-rc7, v6.16-rc6, v6.16-rc5, v6.16-rc4, v6.16-rc3, v6.16-rc2, v6.16-rc1, v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2
# 1260ed77	08-Apr-2025	Thomas Zimmermann <tzimmermann@suse.de>	Merge drm/drm-fixes into drm-misc-fixes Backmerging to get updates from v6.15-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Revision tags: v6.15-rc1
# 946661e3	05-Apr-2025	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge branch 'next' into for-linus Prepare input updates for 6.15 merge window.
# b3cc7428	26-Mar-2025	Jiri Kosina <jkosina@suse.com>	Merge branch 'for-6.15/amd_sfh' into for-linus From: Mario Limonciello <mario.limonciello@amd.com> Some platforms include a human presence detection (HPD) sensor. When enabled and a user is detecte Merge branch 'for-6.15/amd_sfh' into for-linus From: Mario Limonciello <mario.limonciello@amd.com> Some platforms include a human presence detection (HPD) sensor. When enabled and a user is detected a wake event will be emitted from the sensor fusion hub that software can react to. Example use cases are "wake from suspend on approach" or to "lock when leaving". This is currently enabled by default on supported systems, but users can't control it. This essentially means that wake on approach is enabled which is a really surprising behavior to users that don't expect it. Instead of defaulting to enabled add a sysfs knob that users can use to enable the feature if desirable and set it to disabled by default. show more ...
Revision tags: v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5
# 0410c612	28-Feb-2025	Lucas De Marchi <lucas.demarchi@intel.com>	Merge drm/drm-next into drm-xe-next Sync to fix conlicts between drm-xe-next and drm-intel-next. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
# 0b119045	26-Feb-2025	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge tag 'v6.14-rc4' into next Sync up with the mainline.
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2
# 93c7dd1b	06-Feb-2025	Maxime Ripard <mripard@kernel.org>	Merge drm/drm-next into drm-misc-next Bring rc1 to start the new release dev. Signed-off-by: Maxime Ripard <mripard@kernel.org>
# 9e676a02	05-Feb-2025	Namhyung Kim <namhyung@kernel.org>	Merge tag 'v6.14-rc1' into perf-tools-next To get the various fixes in the current master. Signed-off-by: Namhyung Kim <namhyung@kernel.org>
# ea9f8f2b	05-Feb-2025	Jani Nikula <jani.nikula@intel.com>	Merge drm/drm-next into drm-intel-next Sync with v6.14-rc1. Signed-off-by: Jani Nikula <jani.nikula@intel.com>
# c771600c	05-Feb-2025	Tvrtko Ursulin <tursulin@ursulin.net>	Merge drm/drm-next into drm-intel-gt-next We need 4ba4f1afb6a9 ("perf: Generic hotplug support for a PMU with a scope") in order to land a i915 PMU simplification and a fix. That landed in 6.12 and Merge drm/drm-next into drm-intel-gt-next We need 4ba4f1afb6a9 ("perf: Generic hotplug support for a PMU with a scope") in order to land a i915 PMU simplification and a fix. That landed in 6.12 and we are stuck at 6.9 so lets bump things forward. Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net> show more ...
Revision tags: v6.14-rc1
# 0f8e26b3	25-Jan-2025	Linus Torvalds <torvalds@linux-foundation.org>	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Loongarch: - Clear LLBCTL if secondary mmu mapping changes - Add hypercall service s Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "Loongarch: - Clear LLBCTL if secondary mmu mapping changes - Add hypercall service support for usermode VMM x86: - Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a direct call to kvm_tdp_page_fault() when RETPOLINE is enabled - Ensure that all SEV code is compiled out when disabled in Kconfig, even if building with less brilliant compilers - Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes - Use str_enabled_disabled() to replace open coded strings - Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache prior to every VM-Enter - Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature - Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM - Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively - Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU - Make the completion of hypercalls go through the complete_hypercall function pointer argument, no matter if the hypercall exits to userspace or not. Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically went to userspace, and all the others did not; the new code need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at all whether there was an exit to userspace or not - As part of enabling TDX virtual machines, support support separation of private/shared EPT into separate roots. When TDX will be enabled, operations on private pages will need to go through the privileged TDX Module via SEAMCALLs; as a result, they are limited and relatively slow compared to reading a PTE. The patches included in 6.14 allow KVM to keep a mirror of the private EPT in host memory, and define entries in kvm_x86_ops to operate on external page tables such as the TDX private EPT - The recently introduced conversion of the NX-page reclamation kthread to vhost_task moved the task under the main process. The task is created as soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that didn't expect to see any child task of the VM process until it started creating its own userspace threads. In particular crosvm refuses to fork() if procfs shows any child task, so unbreak it by creating the task lazily. This is arguably a userspace bug, as there can be other kinds of legitimate worker tasks and they wouldn't impede fork(); but it's not like userspace has a way to distinguish kernel worker tasks right now. Should they show as "Kthread: 1" in proc/.../status? x86 - Intel: - Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit while L2 is active, while ultimately results in a hardware-accelerated L1 EOI effectively being lost - Honor event priority when emulating Posted Interrupt delivery during nested VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the interrupt - Rework KVM's processing of the Page-Modification Logging buffer to reap entries in the same order they were created, i.e. to mark gfns dirty in the same order that hardware marked the page/PTE dirty - Misc cleanups Generic: - Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when setting memory regions and add a dedicated API for setting KVM-internal memory regions. The API can then explicitly disallow all flags for KVM-internal memory regions - Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug where KVM would return a pointer to a vCPU prior to it being fully online, and give kvm_for_each_vcpu() similar treatment to fix a similar flaw - Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a bug where userspace could coerce KVM into handling the ioctl on a vCPU that isn't yet onlined - Gracefully handle xarray insertion failures; even though such failures are impossible in practice after xa_reserve(), reserving an entry is always followed by xa_store() which does not know (or differentiate) whether there was an xa_reserve() before or not RISC-V: - Zabha, Svvptc, and Ziccrse extension support for guests. None of them require anything in KVM except for detecting them and marking them as supported; Zabha adds byte and halfword atomic operations, while the others are markers for specific operation of the TLB and of LL/SC instructions respectively - Virtualize SBI system suspend extension for Guest/VM - Support firmware counters which can be used by the guests to collect statistics about traps that occur in the host Selftests: - Rework vcpu_get_reg() to return a value instead of using an out-param, and update all affected arch code accordingly - Convert the max_guest_memory_test into a more generic mmu_stress_test. The basic gist of the "conversion" is to have the test do mprotect() on guest memory while vCPUs are accessing said memory, e.g. to verify KVM and mmu_notifiers are working as intended - Play nice with treewrite builds of unsupported architectures, e.g. arm (32-bit), as KVM selftests' Makefile doesn't do anything to ensure the target architecture is actually one KVM selftests supports - Use the kernel's $(ARCH) definition instead of the target triple for arch specific directories, e.g. arm64 instead of aarch64, mainly so as not to be different from the rest of the kernel - Ensure that format strings for logging statements are checked by the compiler even when the logging statement itself is disabled - Attempt to whack the last LLC references/misses mole in the Intel PMU counters test by adding a data load and doing CLFLUSH{OPT} on the data instead of the code being executed. It seems that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that events are counting correctly without actually knowing what the events count given the underlying hardware; this can happen if Intel reuses a formerly microarchitecture-specific event encoding as an architectural event, as was the case for Top-Down Slots" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits) kvm: defer huge page recovery vhost task to later KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault() KVM: Disallow all flags for KVM-internal memslots KVM: x86: Drop double-underscores from __kvm_set_memory_region() KVM: Add a dedicated API for setting KVM-internal memslots KVM: Assert slots_lock is held when setting memory regions KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API) LoongArch: KVM: Add hypercall service support for usermode VMM LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup() KVM: VMX: read the PML log in the same order as it was written KVM: VMX: refactor PML terminology KVM: VMX: Fix comment of handle_vmx_instruction() KVM: VMX: Reinstate __exit attribute for vmx_exit() KVM: SVM: Use str_enabled_disabled() helper in sev_hardware_setup() KVM: x86: Avoid double RDPKRU when loading host/guest PKRU KVM: x86: Use LVT_TIMER instead of an open coded literal RISC-V: KVM: Add new exit statstics for redirected traps RISC-V: KVM: Update firmware counters for various events RISC-V: KVM: Redirect instruction access fault trap to guest ... show more ...
# 86eb1aef	20-Jan-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-mirror-page-tables' into HEAD As part of enabling TDX virtual machines, support support separation of private/shared EPT into separate roots. Confidential computing solutions almo Merge branch 'kvm-mirror-page-tables' into HEAD As part of enabling TDX virtual machines, support support separation of private/shared EPT into separate roots. Confidential computing solutions almost invariably have concepts of private and shared memory, but they may different a lot in the details. In SEV, for example, the bit is handled more like a permission bit as far as the page tables are concerned: the private/shared bit is not included in the physical address. For TDX, instead, the bit is more like a physical address bit, with the host mapping private memory in one half of the address space and shared in another. Furthermore, the two halves are mapped by different EPT roots and only the shared half is managed by KVM; the private half (also called Secure EPT in Intel documentation) gets managed by the privileged TDX Module via SEAMCALLs. As a result, the operations that actually change the private half of the EPT are limited and relatively slow compared to reading a PTE. For this reason the design for KVM is to keep a mirror of the private EPT in host memory. This allows KVM to quickly walk the EPT and only perform the slower private EPT operations when it needs to actually modify mid-level private PTEs. There are thus three sets of EPT page tables: external, mirror and direct. In the case of TDX (the only user of this framework) the first two cover private memory, whereas the third manages shared memory: external EPT - Hidden within the TDX module, modified via TDX module calls. mirror EPT - Bookkeeping tree used as an optimization by KVM, not used by the processor. direct EPT - Normal EPT that maps unencrypted shared memory. Managed like the EPT of a normal VM. Modifying external EPT ---------------------- Modifications to the mirrored page tables need to also perform the same operations to the private page tables, which will be handled via kvm_x86_ops. Although this prep series does not interact with the TDX module at all to actually configure the private EPT, it does lay the ground work for doing this. In some ways updating the private EPT is as simple as plumbing PTE modifications through to also call into the TDX module; however, the locking is more complicated because inserting a single PTE cannot anymore be done atomically with a single CMPXCHG. For this reason, the existing FROZEN_SPTE mechanism is used whenever a call to the TDX module updates the private EPT. FROZEN_SPTE acts basically as a spinlock on a PTE. Besides protecting operation of KVM, it limits the set of cases in which the TDX module will encounter contention on its own PTE locks. Zapping external EPT -------------------- While the framework tries to be relatively generic, and to be understandable without knowing TDX much in detail, some requirements of TDX sometimes leak; for example the private page tables also cannot be zapped while the range has anything mapped, so the mirrored/private page tables need to be protected from KVM operations that zap any non-leaf PTEs, for example kvm_mmu_reset_context() or kvm_mmu_zap_all_fast(). For normal VMs, guest memory is zapped for several reasons: user memory getting paged out by the guest, memslots getting deleted, passthrough of devices with non-coherent DMA. Confidential computing adds to these the conversion of memory between shared and privates. These operations must not zap any private memory that is in use by the guest. This is possible because the only zapping that is out of the control of KVM/userspace is paging out userspace memory, which cannot apply to guestmemfd operations. Thus a TDX VM will only zap private memory from memslot deletion and from conversion between private and shared memory which is triggered by the guest. To avoid zapping too much memory, enums are introduced so that operations can choose to target only private or shared memory, and thus only direct or mirror EPT. For example: Memslot deletion - Private and shared MMU notifier based zapping - Shared only Conversion to shared - Private only Conversion to private - Shared only Other cases of zapping will not be supported for KVM, for example APICv update or non-coherent DMA status update; for the latter, TDX will simply require that the CPU supports self-snoop and honor guest PAT unconditionally for shared memory. show more ...
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1
# 3fc3f718	18-Jul-2024	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: x86/mmu: Support GFN direct bits Teach the MMU to map guest GFNs at a massaged position on the TDP, to aid in implementing TDX shared memory. Like other Coco technologies, TDX has the concept KVM: x86/mmu: Support GFN direct bits Teach the MMU to map guest GFNs at a massaged position on the TDP, to aid in implementing TDX shared memory. Like other Coco technologies, TDX has the concept of private and shared memory. For TDX the private and shared mappings are managed on separate EPT roots. The private half is managed indirectly through calls into a protected runtime environment called the TDX module, where the shared half is managed within KVM in normal page tables. For TDX, the shared half will be mapped in the higher alias, with a "shared bit" set in the GPA. However, KVM will still manage it with the same memslots as the private half. This means memslot looks ups and zapping operations will be provided with a GFN without the shared bit set. So KVM will either need to apply or strip the shared bit before mapping or zapping the shared EPT. Having GFNs sometimes have the shared bit and sometimes not would make the code confusing. So instead arrange the code such that GFNs never have shared bit set. Create a concept of "direct bits", that is stripped from the fault address when setting fault->gfn, and applied within the TDP MMU iterator. Calling code will behave as if it is operating on the PTE mapping the GFN (without shared bits) but within the iterator, the actual mappings will be shifted using bits specific for the root. SPs will have the GFN set without the shared bit. In the end the TDP MMU will behave like it is mapping things at the GFN without the shared bit but with a strange page table format where everything is offset by the shared bit. Since TDX only needs to shift the mapping like this for the shared bit, which is mapped as the normal TDP root, add a "gfn_direct_bits" field to the kvm_arch structure for each VM with a default value of 0. It will have the bit set at the position of the GPA shared bit in GFN through TD specific initialization code. Keep TDX specific concepts out of the MMU code by not naming it "shared". Ranged TLB flushes (i.e. flush_remote_tlbs_range()) target specific GFN ranges. In convention established above, these would need to target the shifted GFN range. It won't matter functionally, since the actual implementation will always result in a full flush for the only planned user (TDX). For correctness reasons, future changes can provide a TDX x86_ops.flush_remote_tlbs_range implementation to return -EOPNOTSUPP and force the full flush for TDs. This leaves one problem. Some operations use a concept of max GFN (i.e. kvm_mmu_max_gfn()), to iterate over the whole TDP range. When applying the direct mask to the start of the range, the iterator would end up skipping iterating over the range not covered by the direct mask bit. For safety, make sure the __tdp_mmu_zap_root() operation iterates over the full GFN range supported by the underlying TDP format. Add a new iterator helper, for_each_tdp_pte_min_level_all(), that iterates the entire TDP GFN range, regardless of root. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Message-ID: <20240718211230.1492011-9-rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# a23e1966	15-Jul-2024	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge branch 'next' into for-linus Prepare input updates for 6.11 merge window.
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 6f47c7ae	28-May-2024	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge tag 'v6.9' into next Sync up with the mainline to bring in the new cleanup API.
Revision tags: v6.10-rc1
# 60a2f25d	16-May-2024	Tvrtko Ursulin <tursulin@ursulin.net>	Merge drm/drm-next into drm-intel-gt-next Some display refactoring patches are needed in order to allow conflict- less merging. Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7
# 06d07429	29-Feb-2024	Jani Nikula <jani.nikula@intel.com>	Merge drm/drm-next into drm-intel-next Sync to get the drm_printer changes to drm-intel-next. Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Revision tags: v6.8-rc6, v6.8-rc5
# 03c11eb3	14-Feb-2024	Ingo Molnar <mingo@kernel.org>	Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h Signed-off-by: Ingo Molnar <mingo@k Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h Signed-off-by: Ingo Molnar <mingo@kernel.org> show more ...
# 41c177cf	11-Feb-2024	Rob Clark <robdclark@chromium.org>	Merge tag 'drm-misc-next-2024-02-08' into msm-next Merge the drm-misc tree to uprev MSM CI. Signed-off-by: Rob Clark <robdclark@chromium.org>
Revision tags: v6.8-rc4, v6.8-rc3
# 4db102dc	29-Jan-2024	Maxime Ripard <mripard@kernel.org>	Merge drm/drm-next into drm-misc-next Kickstart 6.9 development cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>
Revision tags: v6.8-rc2
# 42ac0be1	26-Jan-2024	Ingo Molnar <mingo@kernel.org>	Merge branch 'linus' into x86/mm, to refresh the branch and pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>
# be3382ec	22-Jan-2024	Lucas De Marchi <lucas.demarchi@intel.com>	Merge drm/drm-next into drm-xe-next Sync to v6.8-rc1. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
# cf79f291	22-Jan-2024	Maxime Ripard <mripard@kernel.org>	Merge v6.8-rc1 into drm-misc-fixes Let's kickstart the 6.8 fix cycle. Signed-off-by: Maxime Ripard <mripard@kernel.org>
# a23e1966	15-Jul-2024	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge branch 'next' into for-linus Prepare input updates for 6.11 merge window.
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 6f47c7ae	28-May-2024	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge tag 'v6.9' into next Sync up with the mainline to bring in the new cleanup API.
Revision tags: v6.10-rc1
# 60a2f25d	16-May-2024	Tvrtko Ursulin <tursulin@ursulin.net>	Merge drm/drm-next into drm-intel-gt-next Some display refactoring patches are needed in order to allow conflict- less merging. Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
12 3 4 5 6 7 8 9 10 >>...13