.. SPDX-License-Identifier: GPL-2.0

==================
The Page Allocator
==================

The kernel page allocator services all general page allocation requests, such
as :code:`kmalloc`.  CXL configuration steps affect the behavior of the page
allocator based on the `Memory Zone` and `NUMA node` in which the capacity is
placed.

This section mostly focuses on how these configurations affect the page
allocator (as of Linux v6.15) rather than the overall page allocator behavior.

NUMA nodes and mempolicy
========================
Unless a task explicitly registers a mempolicy, the default memory policy
of the Linux kernel is to allocate memory from the `local NUMA node` first,
and fall back to other nodes only if the local node is under memory pressure.

Generally, we expect to see local DRAM and CXL memory on separate NUMA nodes,
with the CXL memory being non-local.  Technically, however, it is possible
for a compute node to have no local DRAM, and for CXL memory to be the
`local` capacity for that compute node.
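
The local-first behavior described above can be sketched with a toy model.
This is only an illustration, not kernel code: :code:`pick_node`, the node
numbering, and the page counts are all hypothetical, and "pressure" is
simplified to "no free pages left".

```python
def pick_node(local_node, fallback_order, free_pages):
    """Toy model: return the first node with free pages, local node first."""
    candidates = [local_node] + [n for n in fallback_order if n != local_node]
    for node in candidates:
        if free_pages.get(node, 0) > 0:
            return node
    return None  # every node is depleted


# Hypothetical topology: node 0 is local DRAM, node 1 is CXL memory.
free_pages = {0: 2, 1: 8}
while_dram_free = pick_node(0, [1], free_pages)      # local DRAM is chosen

free_pages[0] = 0                                    # local DRAM is exhausted
after_dram_depleted = pick_node(0, [1], free_pages)  # falls back to CXL
```

In this model the CXL node is only ever selected once the local node has
nothing left to give, which mirrors the default policy when no mempolicy is
registered.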


Memory Zones
============
CXL capacity may be onlined in :code:`ZONE_NORMAL` or :code:`ZONE_MOVABLE`.

As of v6.15, the page allocator first attempts to satisfy an allocation from
the highest available zone on the local node that is compatible with the
request.

An example of a `zone incompatibility` is attempting to service an allocation
marked :code:`GFP_KERNEL` from :code:`ZONE_MOVABLE`.  Kernel allocations are
typically not migratable, and as a result can only be serviced from
:code:`ZONE_NORMAL` or lower.

Put simply, for allocations that are allowed to use it, the page allocator
will prefer :code:`ZONE_MOVABLE` over :code:`ZONE_NORMAL` by default, but if
:code:`ZONE_MOVABLE` is depleted, it will fall back to allocating from
:code:`ZONE_NORMAL`.
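
The zone selection rule above can be illustrated with a small model.  It is a
sketch under simplifying assumptions: the :code:`movable` flag stands in for
the real GFP flag handling, and :code:`pick_zone` and the page counts are
hypothetical names and values chosen for the example.

```python
ZONES = ["ZONE_DMA", "ZONE_DMA32", "ZONE_NORMAL", "ZONE_MOVABLE"]  # low to high


def highest_zone_index(movable):
    """Highest zone a request may use: movable requests may reach ZONE_MOVABLE."""
    return ZONES.index("ZONE_MOVABLE" if movable else "ZONE_NORMAL")


def pick_zone(free_pages, movable):
    """Walk from the highest compatible zone downward; return the first with pages."""
    for idx in range(highest_zone_index(movable), -1, -1):
        zone = ZONES[idx]
        if free_pages.get(zone, 0) > 0:
            return zone
    return None  # no compatible zone has free pages


free_pages = {"ZONE_NORMAL": 4, "ZONE_MOVABLE": 4}
user_zone = pick_zone(free_pages, movable=True)      # prefers ZONE_MOVABLE
kernel_zone = pick_zone(free_pages, movable=False)   # limited to ZONE_NORMAL or lower

free_pages["ZONE_MOVABLE"] = 0                       # ZONE_MOVABLE is depleted
user_zone_fallback = pick_zone(free_pages, movable=True)
```

Note that a kernel-style request never reaches :code:`ZONE_MOVABLE` in this
model, even when every lower zone is empty.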


Zone and Node Quirks
====================
Let's consider a configuration where the local DRAM capacity is largely onlined
into :code:`ZONE_NORMAL`, with no :code:`ZONE_MOVABLE` capacity present. The
CXL capacity has the opposite configuration: it is all onlined in
:code:`ZONE_MOVABLE`.

Under the default allocation policy, the page allocator will completely skip
:code:`ZONE_MOVABLE` as a valid allocation target.  This is because, as of
Linux v6.15, the page allocator does (approximately) the following: ::

  for (each zone in local_node):

    for (each node in fallback_order):

      attempt_allocation(gfp_flags);

Because the local node does not have :code:`ZONE_MOVABLE`, the CXL node is
functionally unreachable for direct allocation.  As a result, the only way
for CXL capacity to be used is via `demotion` in the reclaim path.

This loop ordering also means that if the DRAM node does have
:code:`ZONE_MOVABLE` capacity, then once that capacity is depleted the page
allocator will actually prefer CXL :code:`ZONE_MOVABLE` pages over DRAM
:code:`ZONE_NORMAL` pages.
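
Both quirks can be reproduced with a toy version of the approximate loop
shown above.  This is a sketch of the behavior as described in this section,
not the kernel implementation: :code:`allocate`, the node numbers (0 for DRAM,
1 for CXL), and the page counts are assumptions made for illustration.

```python
ZONES_HIGH_TO_LOW = ["ZONE_MOVABLE", "ZONE_NORMAL"]


def allocate(nodes, local_node, fallback_order, movable=True):
    """Toy version of the loop above. nodes maps node id -> {zone: free pages}."""
    for zone in ZONES_HIGH_TO_LOW:
        if zone not in nodes[local_node]:
            continue  # the outer loop only walks zones present on the local node
        if zone == "ZONE_MOVABLE" and not movable:
            continue  # kernel-style allocations cannot use ZONE_MOVABLE
        for node in fallback_order:
            if nodes[node].get(zone, 0) > 0:
                nodes[node][zone] -= 1
                return (node, zone)
    return None  # nothing reachable: reclaim, demotion, or OOM would follow


# Case 1: DRAM (node 0) has only ZONE_NORMAL, CXL (node 1) only ZONE_MOVABLE.
split = {0: {"ZONE_NORMAL": 2}, 1: {"ZONE_MOVABLE": 8}}
case1 = [allocate(split, 0, [0, 1]) for _ in range(3)]

# Case 2: DRAM also has a small ZONE_MOVABLE that is quickly depleted.
mixed = {0: {"ZONE_MOVABLE": 1, "ZONE_NORMAL": 4}, 1: {"ZONE_MOVABLE": 8}}
case2 = [allocate(mixed, 0, [0, 1]) for _ in range(3)]
```

In the first case the third request fails while all eight CXL pages remain
free.  In the second case the allocator moves on to CXL
:code:`ZONE_MOVABLE` pages while DRAM :code:`ZONE_NORMAL` is still untouched.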

We may wish to invert this priority in future Linux versions.

If `demotion` and `swap` are disabled, Linux will start hitting OOM conditions
when the DRAM nodes are depleted. See the reclaim section for more details.


CGroups and CPUSets
===================
Finally, assuming CXL memory is reachable via the page allocator (i.e. onlined
in :code:`ZONE_NORMAL`), the :code:`cpusets.mems_allowed` may be used by
containers to limit the accessibility of certain NUMA nodes for tasks in that
container.  Users may wish to utilize this in multi-tenant systems where some
tasks prefer not to use slower memory.
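
The effect of restricting a container's allowed nodes can be modeled as a
filter over the fallback order.  This is a userspace sketch only:
:code:`parse_nodelist` and :code:`allowed_fallback` are hypothetical helpers,
and the input is assumed to be a node list string in the usual
:code:`0-1,3` list format.

```python
def parse_nodelist(text):
    """Parse a node list such as "0-1,3" into a set of node ids."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            lo, hi = part.split("-")
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes


def allowed_fallback(fallback_order, mems_allowed):
    """Drop nodes the container is not allowed to allocate from."""
    return [node for node in fallback_order if node in mems_allowed]


# Hypothetical system: node 0 is DRAM, node 1 is CXL memory.
dram_only = allowed_fallback([0, 1], parse_nodelist("0"))
dram_and_cxl = allowed_fallback([0, 1], parse_nodelist("0-1"))
```

A latency-sensitive tenant restricted to node 0 never sees the CXL node in its
fallback order, while a tenant allowed on both nodes can spill over to it.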

In the reclaim section we'll discuss some limitations of this interface in
preventing demotions of shared data to CXL memory (if demotions are enabled).