| #
5f0ab9d9
|
| 12-Mar-2026 |
Zhenlei Huang <zlei@FreeBSD.org> |
amd64: Make start_all_aps() static
It is not used elsewhere since the change [1].
[1] ac3ede5371af x86/xen: remove PVHv1 code
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D55668
|
| #
190d0a96
|
| 26-Oct-2025 |
Justin Hibbits <jhibbits@FreeBSD.org> |
amd64: Add cpu_stop() support to go UP after SMP
Reviewed by: kib
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D51622
|
| #
fa02551d
|
| 21-Jul-2025 |
Mark Johnston <markj@FreeBSD.org> |
amd64: Remove support for "nooptions SMP"
It does not appear to get much, if any, testing, and doesn't seem to be worth the maintenance overhead. Virtually all amd64 hardware has multiple cores. The CPU and memory usage overhead of the SMP option in single-vCPU VMs is quite marginal, so a separate non-SMP configuration is not worth maintaining.
Reviewed by: alc (pmap.c), kib
Differential Revision: https://reviews.freebsd.org/D51403
Differential Revision: https://reviews.freebsd.org/D51345
|
| #
95ee2897
|
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
| #
d6717f87
|
| 10-Jul-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: rework AP startup
Stop using temporary page table with 1:1 mapping of low 1G populated over the whole VA. Use 1:1 mapping of low 4G temporarily installed in the normal kernel page table.
The features are:
- now there is one less step for startup asm to perform
- the startup code still needs to be at lower 1M because CPU starts in real mode. But everything else can be located anywhere in low 4G because it is accessed by non-paged 32bit protected mode. Note that kernel page table root page is at low 4G, as well as the kernel itself.
- the page table pages can be allocated by normal allocator, there is no need to carve them from the phys_avail segments at very early time. The allocation of the page for startup code still requires some magic. Pages are freed after APs are ignited.
- la57 startup for APs is less tricky, we directly load the final page table and do not need to tweak the paging mode.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
|
| #
ac3ede53
|
| 12-May-2021 |
Roger Pau Monné <royger@FreeBSD.org> |
x86/xen: remove PVHv1 code
PVHv1 was officially removed from Xen in 4.9, so just axe the related code from FreeBSD.
Note FreeBSD supports PVHv2, which is the replacement for PVHv1.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228
|
| #
c7aa572c
|
| 31-Jul-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
| #
aba10e13
|
| 25-Jul-2020 |
Alexander Motin <mav@FreeBSD.org> |
Allow swi_sched() to be called from NMI context.
To handle hardware errors reported via NMIs I need a way to escape NMI context, which is too restrictive to do anything significant.
To do so, this change introduces a new swi_sched() flag, SWI_FROMNMI, which makes swi_sched() careful about the KPIs it uses. On platforms allowing IPI sending from NMI context (x86 for now) it immediately wakes clk_intr_event via the new IPI_SWI, otherwise it works just like SWI_DELAY. To handle the delayed SWIs this patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
|
| #
e2c0e292
|
| 16-Jul-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
| #
dc43978a
|
| 14-Jul-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
amd64: allow parallel shootdown IPIs
Stop using smp_ipi_mtx to protect global shootdown state, and move/multiply the global state into pcpu. Now each CPU can initiate shootdown IPI independently from other CPUs. Initiator enters critical section, then fills its local PCPU shootdown info (pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu, my_cpuid) for each target cpu. After that IPI is sent to all targets, which scan for zeroed scoreboard generation words. Upon finding such a word the shootdown data is read from the corresponding CPU's pcpu, and generation is set. Meantime initiator loops waiting for all zeroed generations in scoreboard to update.
Initiator does not disable interrupts, which should keep non-invalidation IPIs from deadlocking; it only needs to disable preemption to pin itself to the instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in handler. It is safe because target CPU cannot return to userspace before handler finishes. In principle only NMI can preempt the handler, but NMI would see the kernel handler frame and not touch not-invalidated user page table.
Handlers loop until they do not see zeroed scoreboard generations. This, together with hardware keeping one pending IPI in LAPIC IRR, should prevent lost shootdowns.
Notes.
1. The code does not protect writes to LAPIC ICR with exclusion. I believe this is fine because we in fact do not send IPIs from interrupt handlers. Moreover, for !x2APIC mode, where an ICR write requires two register writes, we disable interrupts around it. If considered incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the cache line, to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
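The scoreboard handshake described in this commit message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the kernel code: `NCPU`, `struct tlb_req`, `sb_init()`, `shootdown_post()`, `shootdown_handle()`, `shootdown_done()` and `last_invalidated` are all hypothetical names, the "invalidation" is a plain store, and the atomics, memory barriers and real IPIs that the kernel needs are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for this model */

/* Hypothetical per-CPU request slot, standing in for pc_smp_tlb_XXX. */
struct tlb_req {
	uintptr_t addr;	/* address the initiator wants invalidated */
	uint32_t gen;	/* initiator's generation, kept non-zero */
};

static struct tlb_req req[NCPU];		/* indexed by initiator */
static uint32_t scoreboard[NCPU][NCPU];		/* [target][initiator] */
static uintptr_t last_invalidated[NCPU];	/* model of each target's last flush */

/* Non-zero generation words everywhere mean "nothing pending". */
static void
sb_init(void)
{
	for (int t = 0; t < NCPU; t++) {
		req[t].gen = 1;
		for (int i = 0; i < NCPU; i++)
			scoreboard[t][i] = 1;
	}
}

/* Initiator: fill own request slot, then zero its word on every target line. */
static void
shootdown_post(int self, uintptr_t addr, unsigned targets)
{
	assert(self >= 0 && self < NCPU);
	req[self].addr = addr;
	if (++req[self].gen == 0)	/* zero is reserved for "pending" */
		req[self].gen = 1;
	for (int t = 0; t < NCPU; t++)
		if (targets & (1u << t))
			scoreboard[t][self] = 0;
	/* A real initiator would send the IPI to the targets here. */
}

/* Target (IPI handler): loop until no zeroed word remains on its line. */
static int
shootdown_handle(int self)
{
	int handled = 0, found;

	assert(self >= 0 && self < NCPU);
	do {
		found = 0;
		for (int i = 0; i < NCPU; i++) {
			if (scoreboard[self][i] != 0)
				continue;
			uintptr_t addr = req[i].addr;
			/* Generation is set before the invalidation itself. */
			scoreboard[self][i] = req[i].gen;
			last_invalidated[self] = addr;	/* stand-in for the flush */
			handled++;
			found = 1;
		}
	} while (found);
	return (handled);
}

/* Initiator's wait condition: every targeted word is non-zero again. */
static int
shootdown_done(int self, unsigned targets)
{
	for (int t = 0; t < NCPU; t++)
		if ((targets & (1u << t)) && scoreboard[t][self] == 0)
			return (0);
	return (1);
}
```

In this model an initiator calls `shootdown_post()` and then polls `shootdown_done()`, while each target runs `shootdown_handle()`. Because every initiator owns its own column of the scoreboard, two initiators can post to the same target without any shared lock, which is the point of the change.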
|
| #
9dba82a4
|
| 05-Apr-2018 |
Roger Pau Monné <royger@FreeBSD.org> |
x86: improve reservation of AP trampoline memory
So that it doesn't rely on physmap[1] containing an address below 1MiB. Instead scan the full physmap and search for a suitable address to place the trampoline code (below 1MiB) and the initial memory pages (below 4GiB).
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D14878
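The idea of scanning the whole physmap for a suitable range, rather than trusting one fixed entry, might look roughly like the sketch below. It assumes a physmap laid out as consecutive (start, end) pairs with exclusive ends; the function name `find_range_below()`, the `NOT_FOUND` sentinel and the choice of returning the lowest fitting address are illustrative assumptions, not the kernel's actual placement policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL
#define NOT_FOUND	UINT64_MAX	/* hypothetical "no suitable range" marker */

/*
 * Scan every (start, end) pair of a physmap-like array and return the first
 * page-aligned address where `size` bytes fit entirely below `limit`.
 */
static uint64_t
find_range_below(const uint64_t *physmap, size_t nentries, uint64_t size,
    uint64_t limit)
{
	assert(nentries % 2 == 0);	/* entries come in start/end pairs */
	for (size_t i = 0; i + 1 < nentries; i += 2) {
		/* Round the start up and the clipped end down to page bounds. */
		uint64_t start = (physmap[i] + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
		uint64_t end = physmap[i + 1] < limit ? physmap[i + 1] : limit;

		end &= ~(PAGE_SIZE - 1);
		if (start < end && end - start >= size)
			return (start);
	}
	return (NOT_FOUND);
}
```

Calling it once with a 1 MiB limit for the trampoline code and once with a 4 GiB limit for the initial memory pages mirrors the two constraints named in the commit message; the scan succeeds regardless of which physmap entry holds the low memory.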
|
| #
c8f9c1f3
|
| 27-Jan-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Use PCID to optimize PTI.
Use PCID to avoid complete TLB shootdown when switching between user and kernel mode with PTI enabled.
I use the model close to what I read about KAISER, user-mode PCID has 1:1 correspondence to the kernel-mode PCID, by setting bit 11 in PCID. Full kernel-mode TLB shootdown is performed on context switches, since KVA TLB invalidation only works in the current pmap. User-mode part of TLB is flushed on the pmap activations as well.
Similarly, IPI TLB shootdowns must handle both kernel and user address spaces for each address. Note that machines which implement PCID but do not have INVPCID instructions cause the usual complications in the IPI handlers, due to the need to switch to the target PCID temporarily. This is racy, but because for PCID/no-INVPCID we disable the interrupts in pmap_activate_sw(), IPI handler cannot see inconsistent state of CPU PCID vs PCPU pmap/kcr3/ucr3 pointers.
On the other hand, on kernel/user switches, CR3_PCID_SAVE bit is set and we do not clear TLB.
I can imagine alternative use of PCID, where there is only one PCID allocated for the kernel pmap. Then, there is no need to shootdown kernel TLB entries on context switch. But copyout(3) would need to either use method similar to proc_rwmem() to access the userspace data, or (in reverse) provide a temporary mapping for the kernel buffer into user mode PCID and use trampoline for copy.
Reviewed by: markj (previous version)
Tested by: pho
Discussed with: alc (some aspects)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D13985
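The 1:1 pairing of kernel-mode and user-mode PCIDs through bit 11, and the effect of CR3_PCID_SAVE, can be shown with a few lines of bit arithmetic. This is a sketch under assumptions: `PCID_MASK`, `PCID_USER_BIT`, `user_pcid()` and `make_cr3()` are hypothetical names, and the value given to `CR3_PCID_SAVE` reflects the architectural rule that bit 63 of the value written to CR3 suppresses the TLB flush, not a definition quoted from this commit.

```c
#include <assert.h>
#include <stdint.h>

#define PCID_MASK	0xfffULL	/* CR3 bits 0..11 carry the PCID */
#define PCID_USER_BIT	0x800ULL	/* bit 11 marks the user-mode twin (assumed name) */
#define CR3_PCID_SAVE	(1ULL << 63)	/* assumed value: keep TLB entries on CR3 load */

/* Derive the user-mode PCID that pairs 1:1 with a kernel-mode PCID. */
static uint64_t
user_pcid(uint64_t kern_pcid)
{
	assert((kern_pcid & ~PCID_MASK) == 0);		/* must fit in 12 bits */
	assert((kern_pcid & PCID_USER_BIT) == 0);	/* bit 11 is reserved */
	return (kern_pcid | PCID_USER_BIT);
}

/* Compose a CR3 value from a page-table root, a PCID and a no-flush flag. */
static uint64_t
make_cr3(uint64_t root_phys, uint64_t pcid, int preserve_tlb)
{
	uint64_t cr3 = (root_phys & ~PCID_MASK) | (pcid & PCID_MASK);

	if (preserve_tlb)
		cr3 |= CR3_PCID_SAVE;
	return (cr3);
}
```

Under this scheme kernel PCIDs are limited to the low 11 bits, so a kernel/user switch only toggles bit 11 and swaps the root page; with the save bit set, entries tagged with either PCID survive the switch.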
|
| #
bd50262f
|
| 17-Jan-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
PTI for amd64.
The implementation of the Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now, enable with the loader tunable vm.pmap.pti=1.
The pmap page table is split into kernel-mode table and user-mode table. Kernel-mode table is identical to the non-PTI table, while usermode table is obtained from kernel table by leaving userspace mappings intact, but only leaving the following parts of the kernel mapped:
- kernel text (but not modules text)
- PCPU
- GDT/IDT/user LDT/task structures
- IST stacks for NMI and doublefault handlers.
Kernel switches to user page table before returning to usermode, and restores full kernel page table on the entry. Initial kernel-mode stack for PTI trampoline is allocated in PCPU, it is only 16 qwords. Kernel entry trampoline switches page tables, then the hardware trap frame is copied to the normal kstack, and execution continues.
IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course page table switch is performed.
On return to usermode, the trampoline is used again, iret frame is copied to the trampoline stack, page tables are switched and iretq is executed. The case of iretq faulting due to the invalid usermode context is tricky, since the frame for fault is appended to the trampoline frame. Besides copying the fault frame and original (corrupted) frame to kstack, the fault frame must be patched to make it look as if the fault occurred on the kstack, see the comment in doret_iret detection code in trap().
Currently kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides initial registrations done during boot, LDT and non-common TSS segments are registered if user requested their use. In principle, they can be installed into kernel page table per pmap with some work. Similarly, PCPU can be hidden from userspace mapping using trampoline PCPU page, but again I do not see much benefits besides complexity.
PDPE pages for the kernel half of the user page tables are pre-allocated during boot because we need to know pml4 entries which are copied to the top-level paging structure page, in advance on a new pmap creation. I enforce this to avoid iterating over all existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386.
The need to flush hidden kernel translations on the switch to user mode makes global tables (PG_G) meaningless and even harmful, so PG_G use is disabled for PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developer's benefit.
MCE is known to be broken, it requires IST stack to operate completely correctly even for non-PTI case, and absolutely needs dedicated IST stack because MCE delivery while trampoline has not yet switched from PTI stack is fatal. The fix is pending.
Reviewed by: markj (partially)
Tested by: pho (previous version)
Discussed with: jeff, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
|
| #
02ebdc78
|
| 31-Oct-2016 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r307736 through r308146.
|
| #
0a4c51f4
|
| 31-Oct-2016 |
John Baldwin <jhb@FreeBSD.org> |
Move declarations of invpcid_works and pmap_pcid_enabled to pmap.h.
Previously these were only declared under #ifdef SMP in <machine/smp.h>. However, these variables are defined in pmap.c unconditionally, and efirt.c references them unconditionally. This fixes non-SMP kernel builds.
Discussed with: kib
MFC after: 1 week
|
| #
b626f5a7
|
| 04-Jan-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH r289384-r293170
Sponsored by: The FreeBSD Foundation
|