1 .. _cgroup-v2:
11 conventions of cgroup v2. It describes all userland-visible aspects
14 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
19 1-1. Terminology
20 1-2. What is cgroup?
22 2-1. Mounting
23 2-2. Organizing Processes and Threads
24 2-2-1. Processes
25 2-2-2. Threads
26 2-3. [Un]populated Notification
27 2-4. Controlling Controllers
28 2-4-1. Enabling and Disabling
29 2-4-2. Top-down Constraint
30 2-4-3. No Internal Process Constraint
31 2-5. Delegation
32 2-5-1. Model of Delegation
33 2-5-2. Delegation Containment
34 2-6. Guidelines
35 2-6-1. Organize Once and Control
36 2-6-2. Avoid Name Collisions
38 3-1. Weights
39 3-2. Limits
40 3-3. Protections
41 3-4. Allocations
43 4-1. Format
44 4-2. Conventions
45 4-3. Core Interface Files
47 5-1. CPU
48 5-1-1. CPU Interface Files
49 5-2. Memory
50 5-2-1. Memory Interface Files
51 5-2-2. Usage Guidelines
52 5-2-3. Memory Ownership
53 5-3. IO
54 5-3-1. IO Interface Files
55 5-3-2. Writeback
56 5-3-3. IO Latency
57 5-3-3-1. How IO Latency Throttling Works
58 5-3-3-2. IO Latency Interface Files
59 5-3-4. IO Priority
60 5-4. PID
61 5-4-1. PID Interface Files
62 5-5. Cpuset
63 5-5-1. Cpuset Interface Files
64 5-6. Device
65 5-7. RDMA
66 5-7-1. RDMA Interface Files
67 5-8. HugeTLB
68 5-8-1. HugeTLB Interface Files
69 5-9. Misc
70 5-9-1. Miscellaneous cgroup Interface Files
71 5-9-2. Migration and Ownership
72 5-10. Others
73 5-10-1. perf_event
74 5-N. Non-normative information
75 5-N-1. CPU controller root cgroup process behaviour
76 5-N-2. IO controller root cgroup process behaviour
78 6-1. Basics
79 6-2. The Root and Views
80 6-3. Migration and setns(2)
81 6-4. Interaction with Other Namespaces
83 P-1. Filesystem Support for Writeback
86 R-1. Multiple Hierarchies
87 R-2. Thread Granularity
88 R-3. Competition Between Inner Nodes and Threads
89 R-4. Other Interface Issues
90 R-5. Controller Issues and Remedies
91 R-5-1. Memory
98 -----------
107 ---------------
113 cgroup is largely composed of two parts - the core and controllers.
129 hierarchical - if a controller is enabled on a cgroup, it affects all
131 sub-hierarchy of the cgroup. When a controller is enabled on a nested
141 --------
146 # mount -t cgroup2 none $MOUNT_POINT
156 is no longer referenced in its current hierarchy. Because per-cgroup
163 to inter-controller dependencies, other controllers may need to be
184 ignored on non-init namespace mounts. Please refer to the
197 and not any subtrees. This is legacy behaviour; the default
201 option is ignored on non-init namespace mounts.
208 within those subtrees. This should have been the default
209 behavior, but it is a mount option to avoid regressing setups
223 controller. The pre-allocated pool does not belong to anyone.
244 --------------------------------
250 A child cgroup can be created by creating a sub-directory::
255 structure. Each cgroup has a read-write interface file
257 belong to the cgroup one-per-line. The PIDs are not ordered and the
288 0::/test-cgroup/test-cgroup-nested
295 0::/test-cgroup/test-cgroup-nested (deleted)
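The "/proc/$PID/cgroup" entries above use the "hierarchy-ID:controller-list:cgroup-path" layout. On a cgroup v2 entry the hierarchy ID is 0 and the controller list is empty, and " (deleted)" is appended once the cgroup has been removed. The sketch below assumes exactly that layout. The names `parse_cgroup_v2_entry`, `CgroupEntry` and `own_cgroup_path` are hypothetical helpers made up for this example.

```python
from typing import NamedTuple, Optional


class CgroupEntry(NamedTuple):
    path: str       # cgroup path as shown in /proc/$PID/cgroup
    deleted: bool   # True if the kernel appended " (deleted)"


def parse_cgroup_v2_entry(line: str) -> Optional[CgroupEntry]:
    """Hypothetical helper: parse one /proc/$PID/cgroup line.

    Returns None for cgroup v1 lines (non-zero ID or a controller list).
    """
    # maxsplit=2 keeps any ':' inside the path intact
    hier_id, controllers, path = line.rstrip("\n").split(":", 2)
    if hier_id != "0" or controllers != "":
        return None
    suffix = " (deleted)"
    deleted = path.endswith(suffix)
    if deleted:
        path = path[: -len(suffix)]
    return CgroupEntry(path, deleted)


def own_cgroup_path() -> Optional[str]:
    """Hypothetical helper: the calling process's cgroup v2 path (Linux only)."""
    with open("/proc/self/cgroup") as f:
        for line in f:
            entry = parse_cgroup_v2_entry(line)
            if entry is not None:
                return entry.path
    return None
```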
303 the threads of a group of processes. By default, all threads of a
321 constraint - threaded controllers can be enabled on non-leaf cgroups
345 - Because the cgroup will join the parent's resource domain, the parent
348 - When the parent is an unthreaded domain, it must not have any domain
352 Topology-wise, a cgroup can be in an invalid state. Please consider
355 A (threaded domain) - B (threaded) - C (domain, just created)
370 threads in the cgroup. Except that the operations are per-thread
371 instead of per-process, "cgroup.threads" has the same format and
393 between threads in a non-leaf cgroup and its child cgroups. Each
399 - cpu
400 - cpuset
401 - perf_event
402 - pids
405 --------------------------
407 Each non-root cgroup has a "cgroup.events" file which contains
408 a "populated" field indicating whether the cgroup's sub-hierarchy has
412 example, to start a clean-up operation after all processes of a given
413 sub-hierarchy have exited. The populated state updates and
414 notifications are recursive. Consider the following sub-hierarchy
418 A(4) - B(0) - C(1)
428 -----------------------
439 No controller is enabled by default. Controllers can be enabled and
442 # echo "+cpu +memory -io" > cgroup.subtree_control
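A write to "cgroup.subtree_control" is a space separated list of controller names. Each name carries a '+' prefix to enable the controller or a '-' prefix to disable it. The sketch below tokenizes such a string on the userland side only, and assumes the rule that the last occurrence of a controller wins. `parse_subtree_control_write` is a hypothetical helper, not a kernel or library API.

```python
from __future__ import annotations


def parse_subtree_control_write(text: str) -> tuple[set[str], set[str]]:
    """Hypothetical helper: split a write into (enable, disable) sets.

    Assumption: if a controller is listed more than once, the last
    occurrence wins.
    """
    state: dict[str, bool] = {}
    for token in text.split():
        prefix, name = token[0], token[1:]
        if prefix not in ("+", "-") or not name:
            raise ValueError(f"malformed token: {token!r}")
        state[name] = prefix == "+"
    enable = {n for n, on in state.items() if on}
    disable = {n for n, on in state.items() if not on}
    return enable, disable
```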
451 Consider the following sub-hierarchy. The enabled controllers are
454 A(cpu,memory) - B(memory) - C()
468 controller interface files - anything which doesn't start with
472 Top-down Constraint
475 Resources are distributed top-down and a cgroup can further distribute
477 parent. This means that all non-root "cgroup.subtree_control" files
487 Non-root cgroups can distribute domain resources to their children
502 refer to the Non-normative information section in the Controllers
515 ----------
535 delegated, the user can build a sub-hierarchy under the directory,
539 happens in the delegated sub-hierarchy, nothing can escape the
543 cgroups in or nesting depth of a delegated sub-hierarchy; however,
550 A delegated sub-hierarchy is contained in the sense that processes
551 can't be moved into or out of the sub-hierarchy by the delegatee.
554 requiring the following conditions for a process with a non-root euid
558 - The writer must have write access to the "cgroup.procs" file.
560 - The writer must have write access to the "cgroup.procs" file of the
564 processes around freely in the delegated sub-hierarchy, it can't pull
565 in from or push out to outside the sub-hierarchy.
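The second condition depends on the common ancestor of the source and destination cgroups. The sketch below computes that ancestor from cgroup paths, using the C0/C1 hierarchy from this section as test data. It assumes absolute paths relative to the cgroup root. `common_ancestor` and `procs_file_to_check` are hypothetical helpers, and the real permission check is performed by the kernel.

```python
import posixpath


def common_ancestor(src: str, dst: str) -> str:
    """Hypothetical helper: deepest cgroup containing both src and dst."""
    # commonpath compares whole components, so "/C0" and "/C00" share only "/"
    return posixpath.commonpath([src, dst])


def procs_file_to_check(src: str, dst: str) -> str:
    """Hypothetical helper: the ancestor's cgroup.procs the writer must be able to write."""
    return posixpath.join(common_ancestor(src, dst), "cgroup.procs")
```

For a PID in C10 being written into C00, the common ancestor is the root. A delegatee owning only C0 cannot write the root's "cgroup.procs", which matches the -EACCES outcome described in this section.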
571 ~~~~~~~~~~~~~ - C0 - C00
574 ~~~~~~~~~~~~~ - C1 - C10
581 will be denied with -EACCES.
586 is not reachable, the migration is rejected with -ENOENT.
590 ----------
598 inherent trade-offs between migration and various hot paths in terms
604 resource structure once on start-up. Dynamic adjustments to resource
637 -------
643 work-conserving. Due to the dynamic nature, this model is usually
646 All weights are in the range [1, 10000] with the default at 100. This
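Under weight-based distribution, each active child receives a fraction of the parent's resource: its weight divided by the sum of all active children's weights. Idle children drop out of the sum, which is what makes the model work-conserving. The sketch below models that arithmetic with exact fractions. `weight_shares` is a hypothetical helper, and it says nothing about how any controller enforces the shares.

```python
from __future__ import annotations

from fractions import Fraction

WEIGHT_MIN, WEIGHT_MAX, WEIGHT_DEFAULT = 1, 10000, 100


def weight_shares(weights: dict[str, int], active: set[str]) -> dict[str, Fraction]:
    """Hypothetical helper: each child's share of the parent's resource.

    Inactive children get 0; their share goes to the active siblings.
    """
    for name, w in weights.items():
        if not WEIGHT_MIN <= w <= WEIGHT_MAX:
            raise ValueError(f"{name}: weight {w} outside [1, 10000]")
    total = sum(weights[n] for n in active)
    return {n: (Fraction(weights[n], total) if n in active else Fraction(0))
            for n in weights}
```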
658 .. _cgroupv2-limits-distributor:
661 ------
664 Limits can be over-committed - the sum of the limits of children can
669 As limits can be over-committed, all configuration combinations are
676 .. _cgroupv2-protections-distributor:
679 -----------
684 soft boundaries. Protections can also be over-committed in which case
691 As protections can be over-committed, all configuration combinations
695 "memory.low" implements best-effort memory protection and is an
700 -----------
703 resource. Allocations can't be over-committed - the sum of the
710 As allocations can't be over-committed, some configuration
715 "cpu.rt.max" hard-allocates realtime slices and is an example of this
723 ------
728 New-line separated values
736 (when read-only or multiple values can be written at once)
762 -----------
764 - Settings for a single feature should be contained in a single file.
766 - The root cgroup should be exempt from resource control and thus
769 - The default time unit is microseconds. If a different unit is ever
772 - A parts-per quantity should use a percentage decimal with at least
773 two-digit fractional part - e.g. 13.40.
775 - If a controller implements weight-based resource distribution, its
777 10000] with 100 as the default. The values are chosen to allow
779 intuitive (the default is 100%).
781 - If a controller implements an absolute resource guarantee and/or
790 - If a setting has a configurable default value and keyed specific
791 overrides, the default entry should be keyed with "default" and
794 The default value can be updated by writing either "default $VAL" or
797 When writing to update a specific override, "default" can be used as
799 with "default" as the value must not appear when read.
804 # cat cgroup-example-interface-file
805 default 150
808 The default value can be updated by::
810 # echo 125 > cgroup-example-interface-file
814 # echo "default 125" > cgroup-example-interface-file
818 # echo "8:16 170" > cgroup-example-interface-file
822 # echo "8:0 default" > cgroup-example-interface-file
823 # cat cgroup-example-interface-file
824 default 125
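The convention of a "default" entry plus keyed overrides can be modelled as a small state machine. The sketch below mirrors the read/write semantics of the example above, assuming "$MAJ:$MIN" keys and integer values. `KeyedDefaultFile` is a hypothetical userland model, not how any controller stores its settings.

```python
from __future__ import annotations


class KeyedDefaultFile:
    """Hypothetical model of a file with a "default" entry and keyed overrides."""

    def __init__(self, default: int) -> None:
        self.default = default
        self.overrides: dict[str, int] = {}

    def write(self, text: str) -> None:
        fields = text.split()
        if len(fields) == 1:              # "$VAL" updates the default
            self.default = int(fields[0])
        elif len(fields) != 2:
            raise ValueError(f"malformed write: {text!r}")
        elif fields[0] == "default":      # "default $VAL"
            self.default = int(fields[1])
        elif fields[1] == "default":      # "$KEY default" removes the override
            self.overrides.pop(fields[0], None)
        else:                             # "$KEY $VAL" sets an override
            self.overrides[fields[0]] = int(fields[1])

    def read(self) -> str:
        # "default" never appears as a value when read
        lines = [f"default {self.default}"]
        lines += [f"{k} {v}" for k, v in sorted(self.overrides.items())]
        return "\n".join(lines) + "\n"
```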
827 - For events which are not very high frequency, an interface file
834 --------------------
839 A read-write single value file which exists on non-root
845 - "domain" : A normal valid domain cgroup.
847 - "domain threaded" : A threaded domain cgroup which is
850 - "domain invalid" : A cgroup which is in an invalid state.
854 - "threaded" : A threaded cgroup which is a member of a
861 A read-write new-line separated values file which exists on
865 the cgroup one-per-line. The PIDs are not ordered and the
874 - It must have write access to the "cgroup.procs" file.
876 - It must have write access to the "cgroup.procs" file of the
879 When delegating a sub-hierarchy, write access to this file
887 A read-write new-line separated values file which exists on
891 the cgroup one-per-line. The TIDs are not ordered and the
900 - It must have write access to the "cgroup.threads" file.
902 - The cgroup that the thread is currently in must be in the
905 - It must have write access to the "cgroup.procs" file of the
908 When delegating a sub-hierarchy, write access to this file
912 A read-only space separated values file which exists on all
919 A read-write space separated values file which exists on all
926 Space separated list of controllers prefixed with '+' or '-'
928 name prefixed with '+' enables the controller and '-'
934 A read-only flat-keyed file which exists on non-root cgroups.
946 A read-write single value file. The default is "max".
953 A read-write single value file. The default is "max".
960 A read-only flat-keyed file with the following entries:
978 A read-write single value file which exists on non-root cgroups.
979 Allowed values are "0" and "1". The default is "0".
1001 create new sub-cgroups.
1004 A write-only single value file which exists in non-root cgroups.
1016 the whole thread-group.
1019 A read-write single value file whose allowed values are "0" and "1".
1020 The default is "1".
1023 Writing "1" to the file will re-enable the cgroup PSI accounting.
1031 This may cause non-negligible overhead for some workloads when under
1033 be used to disable PSI accounting in the non-leaf cgroups.
1036 A read-write nested-keyed file.
1044 .. _cgroup-v2-cpu:
1047 ---
1075 A read-only flat-keyed file.
1080 - usage_usec
1081 - user_usec
1082 - system_usec
1086 - nr_periods
1087 - nr_throttled
1088 - throttled_usec
1089 - nr_bursts
1090 - burst_usec
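"cpu.stat" is a flat-keyed file with one "KEY VALUE" pair per line. The sketch below parses it and derives the fraction of enforcement periods in which the cgroup was throttled. The bandwidth keys are treated as possibly absent, which is an assumption made here for robustness. `parse_flat_keyed` and `throttled_fraction` are hypothetical helpers, and the sample values in the test are made up.

```python
from __future__ import annotations


def parse_flat_keyed(text: str) -> dict[str, int]:
    """Hypothetical helper: parse "KEY VALUE" lines such as cpu.stat."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats


def throttled_fraction(stats: dict[str, int]) -> float:
    """Hypothetical helper: nr_throttled / nr_periods, or 0.0 if unavailable."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0
```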
1093 A read-write single value file which exists on non-root
1094 cgroups. The default is "100".
1096 For non-idle groups (cpu.idle = 0), the weight is in the
1103 A read-write single value file which exists on non-root
1104 cgroups. The default is "0".
1106 The nice value is in the range [-20, 19].
1115 A read-write two value file which exists on non-root cgroups.
1116 The default is "max 100000".
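"cpu.max" holds "$MAX $PERIOD": the group may consume up to $MAX in each $PERIOD, and "max" means no limit. The sketch below parses the file and expresses the limit as a number of CPUs (quota divided by period). It assumes both fields are in microseconds, as the default "max 100000" suggests. `parse_cpu_max` and `cpu_limit_in_cpus` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Optional


def parse_cpu_max(text: str) -> tuple[Optional[int], int]:
    """Hypothetical helper: (quota_usec or None for "max", period_usec)."""
    max_field, period_field = text.split()
    quota = None if max_field == "max" else int(max_field)
    return quota, int(period_field)


def cpu_limit_in_cpus(text: str) -> Optional[float]:
    """Hypothetical helper: quota / period, or None when unlimited."""
    quota, period = parse_cpu_max(text)
    return None if quota is None else quota / period
```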
1127 A read-write single value file which exists on non-root
1128 cgroups. The default is "0".
1133 A read-write nested-keyed file.
1139 A read-write single value file which exists on non-root cgroups.
1140 The default is "0", i.e. no utilization boosting.
1154 A read-write single value file which exists on non-root cgroups.
1155 The default is "max", i.e. no utilization capping.
1165 A read-write single value file which exists on non-root cgroups.
1166 The default is "0".
1168 This is the cgroup analog of the per-task SCHED_IDLE sched policy.
1177 ------
1185 While not completely water-tight, all major memory usages by a given
1190 - Userland memory - page cache and anonymous memory.
1192 - Kernel data structures such as dentries and inodes.
1194 - TCP socket buffers.
1207 A read-only single value file which exists on non-root
1214 A read-write single value file which exists on non-root
1215 cgroups. The default is "0".
1240 A read-write single value file which exists on non-root
1241 cgroups. The default is "0".
1243 Best-effort memory protection. If the memory usage of a
1263 A read-write single value file which exists on non-root
1264 cgroups. The default is "max".
1277 A read-write single value file which exists on non-root
1278 cgroups. The default is "max".
1286 In the default configuration, regular 0-order allocations always
1291 as -ENOMEM or silently ignore in cases like disk readahead.
1294 A write-only nested-keyed file which exists for all cgroups.
1312 specified amount, -EAGAIN is returned.
1322 A read-only single value file which exists on non-root
1329 A read-write single value file which exists on non-root
1330 cgroups. The default value is "0".
1339 Tasks with the OOM protection (oom_score_adj set to -1000)
1347 A read-only flat-keyed file which exists on non-root cgroups.
1361 boundary is over-committed.
1381 considered as an option, e.g. for failed high-order
1397 A read-only flat-keyed file which exists on non-root cgroups.
1400 types of memory, type-specific details, and other information
1409 If the entry has no per-node counter (or is not shown in
1410 memory.numa_stat), we use 'npn' (non-per-node) as the tag
1438 Amount of memory used for storing per-cpu kernel
1448 Amount of cached filesystem data that is swap-backed,
1485 Amount of memory, swap-backed and filesystem-backed,
1491 the value for the foo counter, since the foo counter is type-based, not
1492 list-based.
1503 Amount of memory used for storing in-kernel data
1595 A read-only nested-keyed file which exists on non-root cgroups.
1598 types of memory, type-specific details, and other information
1620 A read-only single value file which exists on non-root
1627 A read-write single value file which exists on non-root
1628 cgroups. The default is "max".
1632 allow userspace to implement custom out-of-memory procedures.
1643 A read-only single value file which exists on non-root
1650 A read-write single value file which exists on non-root
1651 cgroups. The default is "max".
1657 A read-only flat-keyed file which exists on non-root cgroups.
1673 because of running out of swap system-wide or max
1682 A read-only single value file which exists on non-root
1689 A read-write single value file which exists on non-root
1690 cgroups. The default is "max".
1697 A read-write single value file. The default value is "1". The
1712 A read-only nested-keyed file.
1722 Over-committing on high limit (sum of high limits > available memory)
1736 pressure - how much the workload is being impacted due to lack of
1737 memory - is necessary to determine whether a workload needs more
1751 To which cgroup the area will be charged is non-deterministic; however,
1762 --
1767 only if cfq-iosched is in use and neither scheme is available for
1768 blk-mq devices.
1775 A read-only nested-keyed file.
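Nested-keyed files such as "io.stat" carry one line per key, followed by "SUB_KEY=VALUE" pairs. The sketch below parses that shape into a dictionary of dictionaries. Values are kept as strings because some nested-keyed files use "max" rather than a number. `parse_nested_keyed` is a hypothetical helper, and the sample line in the test is illustrative data.

```python
from __future__ import annotations


def parse_nested_keyed(text: str) -> dict[str, dict[str, str]]:
    """Hypothetical helper: parse "KEY SUB=VAL SUB=VAL ..." lines."""
    result: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, pairs = fields[0], fields[1:]
        # split on the first '=' only, so values may themselves contain '='
        result[key] = dict(pair.split("=", 1) for pair in pairs)
    return result
```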
1795 A read-write nested-keyed file which exists only on the root
1807 enable Weight-based control enable
1817 The controller is disabled by default and can be enabled by
1818 setting "enable" to 1. "rpct" and "wpct" parameters default
1839 devices which show wide temporary behavior changes - e.g. a
1850 A read-write nested-keyed file which exists only on the root
1863 model The cost model in use - "linear"
1889 generate device-specific coefficients.
1892 A read-write flat-keyed file which exists on non-root cgroups.
1893 The default is "default 100".
1895 The first line is the default weight applied to devices
1901 The default weight can be updated by writing either "default
1903 "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default".
1907 default 100
1912 A read-write nested-keyed file which exists on non-root
1926 When writing, any number of nested key-value pairs can be
1951 A read-only nested-keyed file.
1970 writes out dirty pages for the memory domain. Both system-wide and
1971 per-cgroup dirty memory states are examined and the more restrictive
2009 memory controller and system-wide clean memory.
2042 your real setting, then set it 10-15% higher than the value in io.stat.
2052 - Queue depth throttling. This is the number of outstanding IOs a group is
2056 - Artificial delay induction. There are certain types of IO that cannot be
2103 no-change
2106 promote-to-rt
2107 For requests that have a non-RT I/O priority class, change it into RT.
2111 restrict-to-be
2121 none-to-rt
2122 Deprecated. Just an alias for promote-to-rt.
2126 +----------------+---+
2127 | no-change | 0 |
2128 +----------------+---+
2129 | promote-to-rt | 1 |
2130 +----------------+---+
2131 | restrict-to-be | 2 |
2132 +----------------+---+
2134 +----------------+---+
2138 +-------------------------------+---+
2140 +-------------------------------+---+
2141 | IOPRIO_CLASS_RT (real-time) | 1 |
2142 +-------------------------------+---+
2144 +-------------------------------+---+
2146 +-------------------------------+---+
2150 - If I/O priority class policy is promote-to-rt, change the request I/O
2153 - If I/O priority class policy is not promote-to-rt, translate the I/O priority
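The translation rule can be sketched as a small function. Policy and class numbers follow the two tables above. The rows not visible in this excerpt are assumed to match the kernel's IOPRIO_CLASS_* constants: "idle" maps to 3, and NONE, BE and IDLE map to 0, 2 and 3. Only the priority class is modelled; the priority level that promote-to-rt also adjusts is left out. `effective_class` and `POLICY_NUMBERS` are hypothetical names.

```python
from enum import IntEnum


class IoprioClass(IntEnum):
    # Numeric values as in the kernel's IOPRIO_CLASS_* constants
    NONE = 0
    RT = 1
    BE = 2
    IDLE = 3


# Hypothetical table of policy numbers (see the policy table above)
POLICY_NUMBERS = {
    "no-change": 0,
    "promote-to-rt": 1,
    "restrict-to-be": 2,
    "idle": 3,
    "none-to-rt": 1,  # deprecated alias for promote-to-rt
}


def effective_class(policy: str, request_class: IoprioClass) -> IoprioClass:
    """Hypothetical helper: the class a request ends up with (level not modelled)."""
    number = POLICY_NUMBERS[policy]
    if number == POLICY_NUMBERS["promote-to-rt"]:
        return IoprioClass.RT
    # Otherwise: maximum of the policy number and the request's class number
    return IoprioClass(max(number, request_class))
```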
2159 ---
2178 A read-write single value file which exists on non-root
2179 cgroups. The default is "max".
2184 A read-only single value file which exists on all cgroups.
2194 through fork() or clone(). These will return -EAGAIN if the creation
2199 ------
2206 memory placement to reduce cross-node memory access and contention
2217 A read-write multiple values file which exists on non-root
2218 cpuset-enabled cgroups.
2225 The CPU numbers are comma-separated numbers or ranges.
2229 0-4,6,8-10
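The comma-separated numbers-or-ranges format used by "cpuset.cpus" (and "cpuset.mems") can be expanded and re-collapsed as shown below. This sketch handles only plain numbers and "a-b" ranges, and treats an empty value as an empty set. `parse_cpu_list` and `format_cpu_list` are hypothetical helpers.

```python
from __future__ import annotations


def parse_cpu_list(text: str) -> set[int]:
    """Hypothetical helper: expand "0-4,6,8-10" into a set of numbers."""
    result: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty value: no CPUs/nodes
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            if first > last:
                raise ValueError(f"bad range: {part}")
            result.update(range(first, last + 1))
        else:
            result.add(int(part))
    return result


def format_cpu_list(cpus: set[int]) -> str:
    """Hypothetical helper: collapse consecutive numbers back into ranges."""
    ranges: list[list[int]] = []
    for n in sorted(cpus):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n
        else:
            ranges.append([n, n])
    return ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)
```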
2232 setting as the nearest cgroup ancestor with a non-empty
2239 A read-only multiple values file which exists on all
2240 cpuset-enabled cgroups.
2256 A read-write multiple values file which exists on non-root
2257 cpuset-enabled cgroups.
2264 The memory node numbers are comma-separated numbers or ranges.
2268 0-1,3
2271 setting as the nearest cgroup ancestor with a non-empty
2278 Setting a non-empty value to "cpuset.mems" causes memory of
2290 A read-only multiple values file which exists on all
2291 cpuset-enabled cgroups.
2306 A read-write multiple values file which exists on non-root
2307 cpuset-enabled cgroups.
2336 A read-only multiple values file which exists on all non-root
2337 cpuset-enabled cgroups.
2349 A read-only multiple values file which exists only on the root cgroup.
2356 A read-write single value file which exists on non-root
2357 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2363 "member" Non-root member of a partition
2368 A cpuset partition is a collection of cpuset-enabled cgroups with
2375 There are two types of partitions - local and remote. A local
2391 be changed. All other non-root cgroups start out as "member".
2404 two possible states - valid or invalid. An invalid partition
2415 "member" Non-root member of a partition
2442 A valid non-root parent partition may distribute out all its CPUs
2461 A user can pre-configure certain CPUs to an isolated state
2468 -----------------
2479 on the return value the attempt will succeed or fail with -EPERM.
2484 If the program returns 0, the attempt fails with -EPERM, otherwise it
2492 ----
2501 A read-write nested-keyed file that exists for all the cgroups
2522 A read-only file that describes current resource usage.
2531 -------
2545 The default value is "max". It exists for all cgroups except the root.
2548 A read-only flat-keyed file which exists on non-root cgroups.
2561 use hugetlb pages are included. The per-node values are in bytes.
2564 ----
2586 A read-only flat-keyed file shown only in the root cgroup. It shows
2595 A read-only flat-keyed file shown in all cgroups. It shows
2603 A read-write flat-keyed file shown in the non-root cgroups. Allowed
2622 A read-only flat-keyed file which exists on non-root cgroups. The
2640 ------
2651 Non-normative information
2652 -------------------------
2668 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2684 ------
2703 The path '/batchjobs/container_id1' can be considered as system-data
2708 # ls -l /proc/self/ns/cgroup
2709 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2715 # ls -l /proc/self/ns/cgroup
2716 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
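The link target "cgroup:[4026531835]" identifies the cgroup namespace by an inode number. Comparing the numbers of two processes shows whether they share a namespace. The sketch below extracts the number with a regular expression and compares only the inode numbers shown in the links. `ns_inode` and `same_cgroup_ns` are hypothetical helpers; the second one is Linux-only and needs permission to read both /proc entries.

```python
import os
import re

_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def ns_inode(link_target: str) -> int:
    """Hypothetical helper: inode number from "cgroup:[4026531835]"."""
    m = _NS_LINK.match(link_target)
    if m is None or m.group(1) != "cgroup":
        raise ValueError(f"not a cgroup namespace link: {link_target!r}")
    return int(m.group(2))


def same_cgroup_ns(pid_a: int, pid_b: int) -> bool:
    """Hypothetical helper: do two processes share a cgroup namespace?"""
    a = os.readlink(f"/proc/{pid_a}/ns/cgroup")
    b = os.readlink(f"/proc/{pid_b}/ns/cgroup")
    return ns_inode(a) == ns_inode(b)
```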
2720 When some thread from a multi-threaded process unshares its cgroup
2732 ------------------
2743 # ~/unshare -c # unshare cgroupns in some cgroup
2751 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2782 ----------------------
2811 ---------------------------------
2814 running inside a non-init cgroup namespace::
2816 # mount -t cgroup2 none $MOUNT_POINT
2823 the view of cgroup hierarchy by namespace-private cgroupfs mount
2836 --------------------------------
2839 address_space_operations->writepage[s]() to annotate bios using the
2856 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2873 - Multiple hierarchies including named ones are not supported.
2875 - None of the v1 mount options are supported.
2877 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2879 - "cgroup.clone_children" is removed.
2881 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2889 --------------------
2942 ------------------
2950 Generally, in-process knowledge is available only to the process
2951 itself; thus, unlike service-level organization of processes,
2958 sub-hierarchies and control resource distributions along them. This
2959 effectively raised cgroup to the status of a syscall-like API exposed
2969 that the process would actually be operating on its own sub-hierarchy.
2973 system-management pseudo filesystem. cgroup ended up with interface
2976 individual applications through the ill-defined delegation mechanism
2986 -------------------------------------------
2997 cycles and the number of internal threads fluctuated - the ratios
3013 clearly defined. There were attempts to add ad-hoc behaviors and
3027 ----------------------
3031 was how an empty cgroup was notified - a userland helper binary was
3034 to an in-kernel event delivery filtering mechanism, further complicating
3056 ------------------------------
3062 that is unset by default. As a result, the set of cgroups that
3063 global reclaim prefers is opt-in, rather than opt-out. The costs for
3073 becomes self-defeating.
3075 The memory.low boundary on the other hand is a top-down allocated
3113 new limit is met - or the task writing to memory.max is killed.
3122 groups can sabotage swapping by other means - such as referencing its
3123 anonymous memory in a tight loop - and an admin cannot assume full