Lines Matching +full:memory +full:- +full:controller

9 conventions of cgroup v2.  It describes all userland-visible aspects
10 of cgroup including core and specific controller behaviors. All
12 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
17 1-1. Terminology
18 1-2. What is cgroup?
20 2-1. Mounting
21 2-2. Organizing Processes and Threads
22 2-2-1. Processes
23 2-2-2. Threads
24 2-3. [Un]populated Notification
25 2-4. Controlling Controllers
26 2-4-1. Enabling and Disabling
27 2-4-2. Top-down Constraint
28 2-4-3. No Internal Process Constraint
29 2-5. Delegation
30 2-5-1. Model of Delegation
31 2-5-2. Delegation Containment
32 2-6. Guidelines
33 2-6-1. Organize Once and Control
34 2-6-2. Avoid Name Collisions
36 3-1. Weights
37 3-2. Limits
38 3-3. Protections
39 3-4. Allocations
41 4-1. Format
42 4-2. Conventions
43 4-3. Core Interface Files
45 5-1. CPU
46 5-1-1. CPU Interface Files
47 5-2. Memory
48 5-2-1. Memory Interface Files
49 5-2-2. Usage Guidelines
50 5-2-3. Memory Ownership
51 5-3. IO
52 5-3-1. IO Interface Files
53 5-3-2. Writeback
54 5-3-3. IO Latency
55 5-3-3-1. How IO Latency Throttling Works
56 5-3-3-2. IO Latency Interface Files
57 5-4. PID
58 5-4-1. PID Interface Files
59 5-5. Cpuset
60 5-5-1. Cpuset Interface Files
61 5-6. Device
62 5-7. RDMA
63 5-7-1. RDMA Interface Files
64 5-8. HugeTLB
65 5-8-1. HugeTLB Interface Files
66 5-9. Misc
67 5-9-1. perf_event
68 5-N. Non-normative information
69 5-N-1. CPU controller root cgroup process behaviour
70 5-N-2. IO controller root cgroup process behaviour
72 6-1. Basics
73 6-2. The Root and Views
74 6-3. Migration and setns(2)
75 6-4. Interaction with Other Namespaces
77 P-1. Filesystem Support for Writeback
80 R-1. Multiple Hierarchies
81 R-2. Thread Granularity
82 R-3. Competition Between Inner Nodes and Threads
83 R-4. Other Interface Issues
84 R-5. Controller Issues and Remedies
85 R-5-1. Memory
92 -----------
101 ---------------
107 cgroup is largely composed of two parts - the core and controllers.
109 processes. A cgroup controller is usually responsible for
122 disabled selectively on a cgroup. All controller behaviors are
123 hierarchical - if a controller is enabled on a cgroup, it affects all
125 sub-hierarchy of the cgroup. When a controller is enabled on a nested
135 --------
140 # mount -t cgroup2 none $MOUNT_POINT
149 A controller can be moved across hierarchies only after the controller
150 is no longer referenced in its current hierarchy. Because per-cgroup
151 controller states are destroyed asynchronously and controllers may
152 have lingering references, a controller may not show up immediately on
154 Similarly, a controller should be fully disabled to be moved out of
156 controller to become available for other hierarchies; furthermore, due
157 to inter-controller dependencies, other controllers may need to be
163 the hierarchies and controller associations before starting to use the
179 ignored on non-init namespace mounts. Please refer to the
184 Only populate memory.events with data for the current cgroup,
189 option is ignored on non-init namespace mounts.
193 Recursively apply memory.min and memory.low protection to
198 behavior but is a mount-option to avoid regressing setups
204 --------------------------------
210 A child cgroup can be created by creating a sub-directory::
215 structure. Each cgroup has a read-writable interface file
217 belong to the cgroup one-per-line. The PIDs are not ordered and the
248 0::/test-cgroup/test-cgroup-nested
255 0::/test-cgroup/test-cgroup-nested (deleted)
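The two entries above are in the format used by "/proc/$PID/cgroup", where the cgroup v2 membership is the line starting with "0::". As a minimal sketch, assuming that format is fed on stdin, the path and the "(deleted)" marker can be separated as follows. The function name cgroup2_path and the "live"/"deleted" output words are illustrative choices, not part of any kernel interface.

```shell
# Print the cgroup v2 path found in /proc/<pid>/cgroup-style input on stdin,
# followed by "live" or "deleted". Returns 1 if no "0::" entry is present.
# (cgroup2_path is a hypothetical helper name.)
cgroup2_path() {
    suffix=' (deleted)'
    while IFS= read -r line; do
        case "$line" in
            0::*)
                path=${line#0::}          # drop the "0::" prefix
                case "$path" in
                    *"$suffix") printf '%s deleted\n' "${path%"$suffix"}" ;;
                    *)          printf '%s live\n' "$path" ;;
                esac
                return 0
                ;;
        esac
    done
    return 1
}
```

A typical call would be "cgroup2_path < /proc/self/cgroup".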
281 constraint - threaded controllers can be enabled on non-leaf cgroups
305 - As the cgroup will join the parent's resource domain. The parent
308 - When the parent is an unthreaded domain, it must not have any domain
312 Topology-wise, a cgroup can be in an invalid state. Please consider
315 A (threaded domain) - B (threaded) - C (domain, just created)
330 threads in the cgroup. Except that the operations are per-thread
331 instead of per-process, "cgroup.threads" has the same format and
346 a threaded controller is enabled inside a threaded subtree, it only
352 constraint, a threaded controller must be able to handle competition
353 between threads in a non-leaf cgroup and its child cgroups. Each
354 threaded controller defines how such competitions are handled.
358 --------------------------
360 Each non-root cgroup has a "cgroup.events" file which contains
361 "populated" field indicating whether the cgroup's sub-hierarchy has
365 example, to start a clean-up operation after all processes of a given
366 sub-hierarchy have exited. The populated state updates and
367 notifications are recursive. Consider the following sub-hierarchy
371 A(4) - B(0) - C(1)
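As a rough sketch of how the "populated" field could be consumed from a script, the helper below reads it from a flat-keyed "cgroup.events" file. The function name cgroup_populated and the idea of passing the file path as an argument are assumptions made for illustration; waiting for change notifications is not shown.

```shell
# Print the value of the "populated" key (0 or 1) from the flat-keyed
# file given as $1. Exits non-zero if the key is missing.
# (cgroup_populated is a hypothetical helper name.)
cgroup_populated() {
    awk '$1 == "populated" { print $2; found = 1 } END { exit !found }' "$1"
}
```

For example, "cgroup_populated $MOUNT_POINT/A/cgroup.events" would print 1 while any cgroup in A's sub-hierarchy still has live processes.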
381 -----------------------
390 cpu io memory
392 No controller is enabled by default. Controllers can be enabled and
395 # echo "+cpu +memory -io" > cgroup.subtree_control
399 all succeed or fail. If multiple operations on the same controller
402 Enabling a controller in a cgroup indicates that the distribution of
404 Consider the following sub-hierarchy. The enabled controllers are
407 A(cpu,memory) - B(memory) - C()
410 As A has "cpu" and "memory" enabled, A will control the distribution
411 of CPU cycles and memory to its children, in this case, B. As B has
412 "memory" enabled but not "cpu", C and D will compete freely on CPU
413 cycles but their division of memory available to B will be controlled.
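The enable and disable requests described above can be wrapped in a small helper. This is only a sketch: the function name set_subtree_control is hypothetical, and it assumes the cgroup directory is passed as the first argument. All tokens are sent in one write, matching the rule that multiple operations either all succeed or all fail.

```shell
# Write a list of "+name" / "-name" tokens to <dir>/cgroup.subtree_control
# in a single write. $1 is the cgroup directory, the rest are tokens.
# (set_subtree_control is a hypothetical helper name.)
set_subtree_control() {
    dir=$1; shift
    for tok in "$@"; do
        case "$tok" in
            [+-]?*) ;;                                   # "+cpu", "-io", ...
            *) echo "invalid token: $tok" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$*" > "$dir/cgroup.subtree_control"
}
```

For the hierarchy in the example, "set_subtree_control $MOUNT_POINT/A +cpu +memory" followed by "set_subtree_control $MOUNT_POINT/A/B +memory" would produce the described configuration, provided the parent of each cgroup already has the controller enabled.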
415 As a controller regulates the distribution of the target resource to
416 the cgroup's children, enabling it creates the controller's interface
418 would create the "cpu." prefixed controller interface files in C and
419 D. Likewise, disabling "memory" from B would remove the "memory."
420 prefixed controller interface files from C and D. This means that the
421 controller interface files - anything which doesn't start with
425 Top-down Constraint
428 Resources are distributed top-down and a cgroup can further distribute
430 parent. This means that all non-root "cgroup.subtree_control" files
432 "cgroup.subtree_control" file. A controller can be enabled only if
433 the parent has the controller enabled and a controller can't be
440 Non-root cgroups can distribute domain resources to their children
445 This guarantees that, when a domain controller is looking at the part
454 is up to each controller (for more information on this topic please
455 refer to the Non-normative information section in the Controllers
459 enabled controller in the cgroup's "cgroup.subtree_control". This is
468 ----------
488 delegated, the user can build a sub-hierarchy under the directory,
492 happens in the delegated sub-hierarchy, nothing can escape the
496 cgroups in or nesting depth of a delegated sub-hierarchy; however,
503 A delegated sub-hierarchy is contained in the sense that processes
504 can't be moved into or out of the sub-hierarchy by the delegatee.
507 requiring the following conditions for a process with a non-root euid
511 - The writer must have write access to the "cgroup.procs" file.
513 - The writer must have write access to the "cgroup.procs" file of the
517 processes around freely in the delegated sub-hierarchy it can't pull
518 in from or push out to outside the sub-hierarchy.
524 ~~~~~~~~~~~~~ - C0 - C00
527 ~~~~~~~~~~~~~ - C1 - C10
534 will be denied with -EACCES.
539 is not reachable, the migration is rejected with -ENOENT.
543 ----------
549 and stateful resources such as memory are not moved together with the
551 inherent trade-offs between migration and various hot paths in terms
557 resource structure once on start-up. Dynamic adjustments to resource
558 distribution can be made by changing controller configuration through
570 controller's interface files are prefixed with the controller name and
571 a dot. A controller's name is composed of lower case letters and
590 -------
596 work-conserving. Due to the dynamic nature, this model is usually
612 ------
615 Limits can be over-committed - the sum of the limits of children can
620 As limits can be over-committed, all configuration combinations are
629 -----------
634 soft boundaries. Protections can also be over-committed in which case
641 As protections can be over-committed, all configuration combinations
645 "memory.low" implements best-effort memory protection and is an
650 -----------
653 resource. Allocations can't be over-committed - the sum of the
660 As allocations can't be over-committed, some configuration
665 "cpu.rt.max" hard-allocates realtime slices and is an example of this
673 ------
678 New-line separated values
686 (when read-only or multiple values can be written at once)
712 -----------
714 - Settings for a single feature should be contained in a single file.
716 - The root cgroup should be exempt from resource control and thus
719 - The default time unit is microseconds. If a different unit is ever
722 - A parts-per quantity should use a percentage decimal with at least
723 two digit fractional part - e.g. 13.40.
725 - If a controller implements weight based resource distribution, its
731 - If a controller implements an absolute resource guarantee and/or
733 respectively. If a controller implements best effort resource
740 - If a setting has a configurable default value and keyed specific
754 # cat cgroup-example-interface-file
760 # echo 125 > cgroup-example-interface-file
764 # echo "default 125" > cgroup-example-interface-file
768 # echo "8:16 170" > cgroup-example-interface-file
772 # echo "8:0 default" > cgroup-example-interface-file
773 # cat cgroup-example-interface-file
777 - For events which are not very high frequency, an interface file
784 --------------------
790 A read-write single value file which exists on non-root
796 - "domain" : A normal valid domain cgroup.
798 - "domain threaded" : A threaded domain cgroup which is
801 - "domain invalid" : A cgroup which is in an invalid state.
805 - "threaded" : A threaded cgroup which is a member of a
812 A read-write new-line separated values file which exists on
816 the cgroup one-per-line. The PIDs are not ordered and the
825 - It must have write access to the "cgroup.procs" file.
827 - It must have write access to the "cgroup.procs" file of the
830 When delegating a sub-hierarchy, write access to this file
838 A read-write new-line separated values file which exists on
842 the cgroup one-per-line. The TIDs are not ordered and the
851 - It must have write access to the "cgroup.threads" file.
853 - The cgroup that the thread is currently in must be in the
856 - It must have write access to the "cgroup.procs" file of the
859 When delegating a sub-hierarchy, write access to this file
863 A read-only space separated values file which exists on all
870 A read-write space separated values file which exists on all
877 Space separated list of controllers prefixed with '+' or '-'
878 can be written to enable or disable controllers. A controller
879 name prefixed with '+' enables the controller and '-'
880 disables. If a controller appears more than once on the list,
885 A read-only flat-keyed file which exists on non-root cgroups.
897 A read-write single value file. The default is "max".
904 A read-write single value file. The default is "max".
911 A read-only flat-keyed file with the following entries:
929 A read-write single value file which exists on non-root cgroups.
952 create new sub-cgroups.
958 ---
961 controller implements weight and absolute bandwidth limit models for
973 the cpu controller can only be enabled when all RT processes are in
977 before the cpu controller can be enabled.
986 A read-only flat-keyed file.
987 This file exists whether the controller is enabled or not.
991 - usage_usec
992 - user_usec
993 - system_usec
995 and the following three when the controller is enabled:
997 - nr_periods
998 - nr_throttled
999 - throttled_usec
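As an illustration of reading this flat-keyed file, the sketch below derives the share of periods in which the cgroup was throttled. It assumes that nr_periods counts elapsed enforcement periods and nr_throttled counts the periods in which the cgroup was throttled; the function name cpu_throttle_pct is hypothetical.

```shell
# Print the integer percentage nr_throttled / nr_periods from the cpu.stat
# file given as $1, or "n/a" when nr_periods is absent or zero (for
# example when the controller is not enabled).
# (cpu_throttle_pct is a hypothetical helper name.)
cpu_throttle_pct() {
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            if (periods > 0) print int(100 * throttled / periods)
            else             print "n/a"
        }
    ' "$1"
}
```

A call such as "cpu_throttle_pct $MOUNT_POINT/A/cpu.stat" gives a quick indication of how often a bandwidth limit is being hit.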
1002 A read-write single value file which exists on non-root
1008 A read-write single value file which exists on non-root
1011 The nice value is in the range [-20, 19].
1020 A read-write two value file which exists on non-root cgroups.
1032 A read-only nested-key file which exists on non-root cgroups.
1038 A read-write single value file which exists on non-root cgroups.
1053 A read-write single value file which exists on non-root cgroups.
1065 Memory
1066 ------
1068 The "memory" controller regulates distribution of memory. Memory is
1070 intertwining between memory usage and reclaim pressure and the
1071 stateful nature of memory, the distribution model is relatively
1074 While not completely water-tight, all major memory usages by a given
1075 cgroup are tracked so that the total memory consumption can be
1077 following types of memory usages are tracked.
1079 - Userland memory - page cache and anonymous memory.
1081 - Kernel data structures such as dentries and inodes.
1083 - TCP socket buffers.
1088 Memory Interface Files
1091 All memory amounts are in bytes. If a value which is not aligned to
1095 memory.current
1096 A read-only single value file which exists on non-root
1099 The total amount of memory currently being used by the cgroup
1102 memory.min
1103 A read-write single value file which exists on non-root
1106 Hard memory protection. If the memory usage of a cgroup
1107 is within its effective min boundary, the cgroup's memory
1109 unprotected reclaimable memory available, OOM killer
1115 Effective min boundary is limited by memory.min values of
1116 all ancestor cgroups. If there is memory.min overcommitment
1117 (child cgroup or cgroups are requiring more protected memory
1120 actual memory usage below memory.min.
1122 Putting more memory than generally available under this
1125 If a memory cgroup is not populated with processes,
1126 its memory.min is ignored.
1128 memory.low
1129 A read-write single value file which exists on non-root
1132 Best-effort memory protection. If the memory usage of a
1134 memory won't be reclaimed unless there is no reclaimable
1135 memory available in unprotected cgroups.
1141 Effective low boundary is limited by memory.low values of
1142 all ancestor cgroups. If there is memory.low overcommitment
1143 (child cgroup or cgroups are requiring more protected memory
1146 actual memory usage below memory.low.
1148 Putting more memory than generally available under this
1151 memory.high
1152 A read-write single value file which exists on non-root
1155 Memory usage throttle limit. This is the main mechanism to
1156 control memory usage of a cgroup. If a cgroup's usage goes
1163 memory.max
1164 A read-write single value file which exists on non-root
1167 Memory usage hard limit. This is the final protection
1168 mechanism. If a cgroup's memory usage reaches this limit and
1173 In default configuration regular 0-order allocations always
1178 as -ENOMEM or silently ignore in cases like disk readahead.
1184 memory.oom.group
1185 A read-write single value file which exists on non-root
1191 (if the memory cgroup is not a leaf cgroup) are killed
1195 Tasks with the OOM protection (oom_score_adj set to -1000)
1200 memory.oom.group values of ancestor cgroups.
1202 memory.events
1203 A read-only flat-keyed file which exists on non-root cgroups.
1211 memory.events.local.
1215 high memory pressure even though its usage is under
1217 boundary is over-committed.
1221 throttled and routed to perform direct memory reclaim
1222 because the high memory boundary was exceeded. For a
1223 cgroup whose memory usage is capped by the high limit
1224 rather than global memory pressure, this event's
1228 The number of times the cgroup's memory usage was
1233 The number of times the cgroup's memory usage was
1237 considered as an option, e.g. for failed high-order
1244 memory.events.local
1245 Similar to memory.events but the fields in the file are local
1249 memory.stat
1250 A read-only flat-keyed file which exists on non-root cgroups.
1252 This breaks down the cgroup's memory footprint into different
1253 types of memory, type-specific details, and other information
1254 on the state and past events of the memory management system.
1256 All memory amounts are in bytes.
1262 If the entry has no per-node counter (or is not shown in
1263 memory.numa_stat), we use 'npn' (non-per-node) as the tag
1267 Amount of memory used in anonymous mappings such as
1271 Amount of memory used to cache filesystem data,
1272 including tmpfs and shared memory.
1275 Amount of memory allocated to kernel stacks.
1278 Amount of memory used for storing per-cpu kernel
1282 Amount of memory used in network transmission buffers
1285 Amount of cached filesystem data that is swap-backed,
1300 Amount of memory used in anonymous mappings backed by
1304 Amount of memory, swap-backed and filesystem-backed,
1305 on the internal memory management lists used by the
1309 memory management lists), inactive_foo + active_foo may not be equal to
1310 the value for the foo counter, since the foo counter is type-based, not
1311 list-based.
1318 Part of "slab" that cannot be reclaimed on memory
1322 Amount of memory used for storing in-kernel data
1371 Amount of pages postponed to be freed under memory pressure
1386 memory.numa_stat
1387 A read-only nested-keyed file which exists on non-root cgroups.
1389 This breaks down the cgroup's memory footprint into different
1390 types of memory, type-specific details, and other information
1391 per node on the state of the memory management system.
1399 All memory amounts are in bytes.
1401 The output format of memory.numa_stat is::
1409 For the meaning of the entries, refer to memory.stat.
1411 memory.swap.current
1412 A read-only single value file which exists on non-root
1418 memory.swap.high
1419 A read-write single value file which exists on non-root
1424 allow userspace to implement custom out-of-memory procedures.
1428 during regular operation. Compare to memory.swap.max, which
1430 continue unimpeded as long as other memory can be reclaimed.
1434 memory.swap.max
1435 A read-write single value file which exists on non-root
1439 limit, anonymous memory of the cgroup will not be swapped out.
1441 memory.swap.events
1442 A read-only flat-keyed file which exists on non-root cgroups.
1458 because of running out of swap system-wide or max
1464 reduces the impact on the workload and memory management.
1466 memory.pressure
1467 A read-only nested-key file which exists on non-root cgroups.
1469 Shows pressure stall information for memory. See
1476 "memory.high" is the main mechanism to control memory usage.
1477 Over-committing on high limit (sum of high limits > available memory)
1478 and letting global memory pressure distribute memory according to
1484 more memory or terminating the workload.
1486 Determining whether a cgroup has enough memory is not trivial as
1487 memory usage doesn't indicate whether the workload can benefit from
1488 more memory. For example, a workload which writes data received from
1489 network to a file can use all available memory but can also perform
1490 just as well with a small amount of memory. A measure of memory
1491 pressure - how much the workload is being impacted due to lack of
1492 memory - is necessary to determine whether a workload needs more
1493 memory; unfortunately, the memory pressure monitoring mechanism isn't
1497 Memory Ownership
1500 A memory area is charged to the cgroup which instantiated it and stays
1502 to a different cgroup doesn't move the memory usages that it
1505 A memory area may be used by processes belonging to different cgroups.
1506 To which cgroup the area will be charged is indeterminate; however,
1507 over time, the memory area is likely to end up in a cgroup which has
1508 enough memory allowance to avoid high reclaim pressure.
1510 If a cgroup sweeps a considerable amount of memory which is expected
1512 POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
1513 belonging to the affected files to ensure correct memory ownership.
1517 --
1519 The "io" controller regulates the distribution of IO resources. This
1520 controller implements both weight based and absolute bandwidth or IOPS
1522 only if cfq-iosched is in use and neither scheme is available for
1523 blk-mq devices.
1530 A read-only nested-keyed file.
1550 A read-write nested-keyed file which exists only on the root
1554 model based controller (CONFIG_BLK_CGROUP_IOCOST) which
1562 enable Weight-based control enable
1572 The controller is disabled by default and can be enabled by
1574 to zero and the controller uses internal device saturation
1582 shows that on sdb, the controller is enabled, will consider
1594 devices which show wide temporary behavior changes - e.g. a
1605 A read-write nested-keyed file which exists only on the root
1609 controller (CONFIG_BLK_CGROUP_IOCOST) which currently
1618 model The cost model in use - "linear"
1644 generate device-specific coefficients.
1647 A read-write flat-keyed file which exists on non-root cgroups.
1667 A read-write nested-keyed file which exists on non-root
1681 When writing, any number of nested key-value pairs can be
1706 A read-only nested-key file which exists on non-root cgroups.
1717 mechanism. Writeback sits between the memory and IO domains and
1718 regulates the proportion of dirty memory by balancing dirtying and
1721 The io controller, in conjunction with the memory controller,
1722 implements control of page cache writeback IOs. The memory controller
1723 defines the memory domain that dirty memory ratio is calculated and
1724 maintained for and the io controller defines the io domain which
1725 writes out dirty pages for the memory domain. Both system-wide and
1726 per-cgroup dirty memory states are examined and the more restrictive
1734 There are inherent differences in memory and writeback management
1735 which affect how cgroup ownership is tracked. Memory is tracked per
1740 As cgroup ownership for memory is tracked per page, there can be pages
1752 As the memory controller assigns page ownership on the first use and
1763 amount of available memory capped by limits imposed by the
1764 memory controller and system-wide clean memory.
1768 total available memory and applied the same way as
1775 This is a cgroup v2 controller for IO workload protection. You provide a group
1777 controller will throttle any peers that have a lower latency target than the
1797 your real setting, setting at 10-15% higher than the value in io.stat.
1803 target the controller doesn't do anything. Once a group starts missing its
1807 - Queue depth throttling. This is the number of outstanding IO's a group is
1811 - Artificial delay induction. There are certain types of IO that cannot be
1834 If the controller is enabled you will see extra stats in io.stat in
1852 ---
1854 The process number controller is used to allow a cgroup to stop any
1859 controllers cannot prevent, thus warranting its own controller. For
1861 hitting memory restrictions.
1863 Note that PIDs used in this controller refer to TIDs, process IDs as
1871 A read-write single value file which exists on non-root
1877 A read-only single value file which exists on all cgroups.
1887 through fork() or clone(). These will return -EAGAIN if the creation
1892 ------
1894 The "cpuset" controller provides a mechanism for constraining
1895 the CPU and memory node placement of tasks to only the resources
1899 memory placement to reduce cross-node memory access and contention
1902 The "cpuset" controller is hierarchical. That means the controller
1903 cannot use CPUs or memory nodes not allowed in its parent.
1910 A read-write multiple values file which exists on non-root
1911 cpuset-enabled cgroups.
1918 The CPU numbers are comma-separated numbers or ranges.
1922 0-4,6,8-10
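The list format shown above (comma-separated numbers or ranges) can be expanded into individual CPU numbers with a few lines of shell. This is a sketch under the assumption that the input is well-formed; the function name expand_cpulist is hypothetical.

```shell
# Expand a list such as "0-4,6,8-10" into space-separated numbers.
# (expand_cpulist is a hypothetical helper name; no input validation.)
expand_cpulist() {
    out=
    old_ifs=$IFS
    IFS=,
    set -- $1              # split the argument on commas
    IFS=$old_ifs
    for part in "$@"; do
        case "$part" in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # a range "lo-hi"
            *)   lo=$part; hi=$part ;;             # a single number
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="${out:+$out }$i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
}
```

Calling "expand_cpulist 0-4,6,8-10" prints "0 1 2 3 4 6 8 9 10".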
1925 setting as the nearest cgroup ancestor with a non-empty
1932 A read-only multiple values file which exists on all
1933 cpuset-enabled cgroups.
1949 A read-write multiple values file which exists on non-root
1950 cpuset-enabled cgroups.
1952 It lists the requested memory nodes to be used by tasks within
1953 this cgroup. The actual list of memory nodes granted, however,
1955 from the requested memory nodes.
1957 The memory node numbers are comma-separated numbers or ranges.
1961 0-1,3
1964 setting as the nearest cgroup ancestor with a non-empty
1965 "cpuset.mems" or all the available memory nodes if none
1969 and won't be affected by any memory nodes hotplug events.
1972 A read-only multiple values file which exists on all
1973 cpuset-enabled cgroups.
1975 It lists the onlined memory nodes that are actually granted to
1976 this cgroup by its parent. These memory nodes are allowed to
1979 If "cpuset.mems" is empty, it shows all the memory nodes from the
1982 the memory nodes listed in "cpuset.mems" can be granted. In this
1985 Its value will be affected by memory nodes hotplug events.
1988 A read-write single value file which exists on non-root
1989 cpuset-enabled cgroups. This flag is owned by the parent cgroup
1994 "root" - a partition root
1995 "member" - a non-root member of a partition
2036 "member" Non-root member of a partition
2061 Device controller
2062 -----------------
2064 Device controller manages access to device files. It includes both
2068 Cgroup v2 device controller has no interface files and is implemented
2073 the attempt will succeed or fail with -EPERM.
2078 If the program returns 0, the attempt fails with -EPERM, otherwise
2086 ----
2088 The "rdma" controller regulates the distribution and accounting of
2095 A read-write nested-keyed file that exists for all the cgroups
2116 A read-only file that describes current resource usage.
2125 -------
2127 The HugeTLB controller allows limiting HugeTLB usage per control group and
2128 enforces the controller limit during page fault.
2142 A read-only flat-keyed file which exists on non-root cgroups.
2153 ----
2158 perf_event controller, if not mounted on a legacy hierarchy, is
2160 always be filtered by cgroup v2 path. The controller can still be
2164 Non-normative information
2165 -------------------------
2171 CPU controller root cgroup process behaviour
2181 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2184 IO controller root cgroup process behaviour
2197 ------
2216 The path '/batchjobs/container_id1' can be considered as system-data
2221 # ls -l /proc/self/ns/cgroup
2222 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2228 # ls -l /proc/self/ns/cgroup
2229 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2233 When some thread from a multi-threaded process unshares its cgroup
2245 ------------------
2256 # ~/unshare -c # unshare cgroupns in some cgroup
2264 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2295 ----------------------
2324 ---------------------------------
2327 running inside a non-init cgroup namespace::
2329 # mount -t cgroup2 none $MOUNT_POINT
2336 the view of cgroup hierarchy by namespace-private cgroupfs mount
2349 --------------------------------
2352 address_space_operations->writepage[s]() to annotate bio's using the
2369 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2386 - Multiple hierarchies including named ones are not supported.
2388 - None of the v1 mount options are supported.
2390 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2392 - "cgroup.clone_children" is removed.
2394 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2402 --------------------
2408 For example, as there is only one instance of each controller, utility
2415 the specific controller.
2419 each controller on its own hierarchy. Only closely related ones, such
2438 Also, as a controller couldn't have any expectation regarding the
2440 controller had to assume that all other controllers were attached to
2447 depending on the specific controller. In other words, hierarchy may
2450 how memory is distributed beyond a certain level while still wanting
2455 ------------------
2463 Generally, in-process knowledge is available only to the process
2464 itself; thus, unlike service-level organization of processes,
2471 sub-hierarchies and control resource distributions along them. This
2472 effectively raised cgroup to the status of a syscall-like API exposed
2482 that the process would actually be operating on its own sub-hierarchy.
2486 system-management pseudo filesystem. cgroup ended up with interface
2489 individual applications through the ill-defined delegation mechanism
2499 -------------------------------------------
2507 The cpu controller considered threads and cgroups as equivalents and
2510 cycles and the number of internal threads fluctuated - the ratios
2516 The io controller implicitly created a hidden leaf node for each
2524 The memory controller didn't have a way to control what happened
2526 clearly defined. There were attempts to add ad-hoc behaviors and
2540 ----------------------
2544 was how an empty cgroup was notified - a userland helper binary was
2547 to in-kernel event delivery filtering mechanism further complicating
2550 Controller interfaces were problematic too. An extreme example is
2562 formats and units even in the same controller.
2568 Controller Issues and Remedies
2569 ------------------------------
2571 Memory
2576 global reclaim prefers is opt-in, rather than opt-out. The costs for
2586 becomes self-defeating.
2588 The memory.low boundary on the other hand is a top-down allocated
2597 available memory. The memory consumption of workloads varies during
2605 The memory.high boundary on the other hand can be set much more
2611 and make corrections until the minimal memory footprint that still
2618 system than killing the group. Otherwise, memory.max is there to
2622 Setting the original memory.limit_in_bytes below the current usage was
2624 limit setting to fail. memory.max on the other hand will first set the
2626 new limit is met - or the task writing to memory.max is killed.
2628 The combined memory+swap accounting and limiting is replaced by real
2631 The main argument for a combined memory+swap facility in the original
2633 able to swap all anonymous memory of a child group, regardless of the
2635 groups can sabotage swapping by other means - such as referencing its
2636 anonymous memory in a tight loop - and an admin cannot assume full