Documentation/x86/resctrl_ui.rst

1 .. SPDX-License-Identifier: GPL-2.0
9 :Authors: - Fenghua Yu <fenghua.yu@intel.com>
10           - Tony Luck <tony.luck@intel.com>
11           - Vikas Shivappa <vikas.shivappa@intel.com>
31  # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
47 pseudo-locking is a unique way of using cache control to "pin" or
49 "Cache Pseudo-Locking".
86 		own settings for cache use which can override
118 			      Corresponding region is pseudo-locked. No
138 		non-linear. This field is purely informational
149 		"per-thread":
168 		counter can be considered for re-use.
181 	mask f7 has non-consecutive 1-bits
224 	When the resource group is in pseudo-locked mode this file will
226 	pseudo-locked region.
237 	Each resource has its own line and format - see below for details.
248 	cache pseudo-locked region is created by first writing
249 	"pseudo-locksetup" to the "mode" file before writing the cache
250 	pseudo-locked region's schemata to the resource group's "schemata"
251 	file. On successful pseudo-locked region creation the mode will
252 	automatically change to "pseudo-locked".
268 -------------------------
273 1) If the task is a member of a non-default group, then the schemata
283 -------------------------
284 1) If a task is a member of a MON group, or non-default CTRL_MON group
305 are evicted and re-used while the occupancy in the new group rises as
320 max_threshold_occupancy - generic concepts
321 ------------------------------------------
327 limbo RMIDs that are not yet ready to be used, the user may see an -EBUSY
333 Schemata files - general concepts
334 ---------------------------------
340 ---------
352 ---------------------
359 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
360 and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
425 ----------------------------------------------------------------
431 ------------------------------------------------------------------
439 ------------------------
452 ------------------------------------------
460 ---------------------------------------------
468 ---------------------------------
482 Cache Pseudo-Locking
485 application can fill. Cache pseudo-locking builds on the fact that a
486 CPU can still read and write data pre-allocated outside its current
487 allocated area on a cache hit. With cache pseudo-locking, data can be
490 pseudo-locked memory is made accessible to user space where an
492 a region of memory with reduced average read latency.
494 The creation of a cache pseudo-locked region is triggered by a request
496 to be pseudo-locked. The cache pseudo-locked region is created as follows:
498 - Create a CAT allocation CLOSNEW with a CBM matching the schemata
499   from the user of the cache region that will contain the pseudo-locked
502   while the pseudo-locked region exists.
503 - Create a contiguous region of memory of the same size as the cache
505 - Flush the cache, disable hardware prefetchers, disable preemption.
506 - Make CLOSNEW the active CLOS and touch the allocated memory to load
508 - Set the previous CLOS as active.
509 - At this point the closid CLOSNEW can be released - the cache
510   pseudo-locked region is protected as long as its CBM does not appear in
511   any CAT allocation. Even though the cache pseudo-locked region will from
513   any CLOS will be able to access the memory in the pseudo-locked region since
515 - The contiguous region of memory loaded into the cache is exposed to
516   user-space as a character device.
518 Cache pseudo-locking increases the probability that data will remain
522 “locked” data from cache. Power management C-states may shrink or
523 power off cache. Deeper C-states will automatically be restricted on
524 pseudo-locked region creation.
526 It is required that an application using a pseudo-locked region runs
528 with the cache on which the pseudo-locked region resides. A sanity check
529 within the code will not allow an application to map pseudo-locked memory
531 pseudo-locked region resides. The sanity check is only done during the
535 Pseudo-locking is accomplished in two stages:
538    of cache that should be dedicated to pseudo-locking. At this time an
541 2) During the second stage a user-space application maps (mmap()) the
542    pseudo-locked memory into its address space.
544 Cache Pseudo-Locking Interface
545 ------------------------------
546 A pseudo-locked region is created using the resctrl interface as follows:
549 2) Change the new resource group's mode to "pseudo-locksetup" by writing
550    "pseudo-locksetup" to the "mode" file.
551 3) Write the schemata of the pseudo-locked region to the "schemata" file. All
555 On successful pseudo-locked region creation the "mode" file will contain
556 "pseudo-locked" and a new character device with the same name as the resource
558 by user space in order to obtain access to the pseudo-locked memory region.
560 An example of cache pseudo-locked region creation and usage can be found below.
562 Cache Pseudo-Locking Debugging Interface
563 ----------------------------------------
564 The pseudo-locking debugging interface is enabled by default (if
568 location is present in the cache. The pseudo-locking debugging interface uses
570 the pseudo-locked region:
572 1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
574    example below). In this test the pseudo-locked region is traversed at
582 When a pseudo-locked region is created a new debugfs directory is created for
584 write-only file, pseudo_lock_measure, is present in this directory. The
585 measurement of the pseudo-locked region depends on the number written to this
589      writing "1" to the pseudo_lock_measure file will trigger the latency
604 Example of latency debugging interface
606 In this example a pseudo-locked region named "newlock" was created. Here is
607 how we can measure the latency in cycles of reading from this region and
612 …# echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trig…
620   # trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
623   { latency:        456 } hitcount:          1
624   { latency:         50 } hitcount:         83
625   { latency:         36 } hitcount:         96
626   { latency:         44 } hitcount:        174
627   { latency:         48 } hitcount:        195
628   { latency:         46 } hitcount:        262
629   { latency:         42 } hitcount:        693
630   { latency:         40 } hitcount:       3204
631   { latency:         38 } hitcount:       3484
640 In this example a pseudo-locked region named "newlock" was created on the L2
653   #                              _-----=> irqs-off
654   #                             / _----=> need-resched
655   #                            | / _---=> hardirq/softirq
656   #                            || / _--=> preempt-depth
658   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
660   pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0
673   # mount -t resctrl resctrl /sys/fs/resctrl
706 Again two sockets, but this time with a more realistic 20-bit mask.
709 processor 1 on socket 0 on a 2-socket, dual-core machine. To avoid noisy
710 neighbors, each of the two real-time tasks exclusively occupies one quarter
714   # mount -t resctrl resctrl /sys/fs/resctrl
737   # taskset -cp 1 1234
744   # taskset -cp 2 5678
753   # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
759   # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
763 A single-socket system which has real-time tasks running on cores 4-7 and
764 a non-real-time workload assigned to cores 0-3. The real-time tasks share text
770   # mount -t resctrl resctrl /sys/fs/resctrl
787 Finally, we move cores 4-7 over to the new group and make sure that the
789 also get 50% of memory bandwidth, assuming that cores 4-7 are SMT
790 siblings and only the real-time threads are scheduled on cores 4-7.
803 system with two L2 cache instances that can be configured with an 8-bit
808   # mount -t resctrl resctrl /sys/fs/resctrl/
825   -sh: echo: write error: Invalid argument
860   -sh: echo: write error: Invalid argument
864 Example of Cache Pseudo-Locking
866 Lock a portion of the L2 cache from cache id 1 using CBM 0x3. Pseudo-locked
871   # mount -t resctrl resctrl /sys/fs/resctrl/
874 Ensure that there are bits available that can be pseudo-locked. Since only
875 unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
884 Create a new resource group that will be associated with the pseudo-locked
885 region, indicate that it will be used for a pseudo-locked region, and
886 configure the requested pseudo-locked region capacity bitmask::
889   # echo pseudo-locksetup > newlock/mode
892 On success the resource group's mode will change to pseudo-locked, the
893 bit_usage will reflect the pseudo-locked region, and the character device
894 exposing the pseudo-locked region will exist::
897   pseudo-locked
900   # ls -l /dev/pseudo_lock/newlock
901   crw------- 1 root root 243, 0 Apr  3 05:01 /dev/pseudo_lock/newlock
906   * Example code to access one page of pseudo-locked cache region
919   * cores associated with the pseudo-locked region. Here the cpu
939       exit(EXIT_FAILURE);
945       exit(EXIT_FAILURE);
953       exit(EXIT_FAILURE);
956     /* Application interacts with pseudo-locked memory @mapping */
962       exit(EXIT_FAILURE);
966     exit(EXIT_SUCCESS);
970 ----------------------------
978   1. Read the cbmmasks from each directory or the per-resource "bit_usage"
1009   $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl
1013   $ cat create-dir.sh
1015   mask = function-of(output.txt)
1019   $ flock /sys/fs/resctrl/ ./create-dir.sh
1038       exit(-1);
1050       exit(-1);
1062       exit(-1);
1071     if (fd == -1) {
1073       exit(-1);
1087 ----------------------
1094 ------------------------------------------------------------------------
1098   # mount -t resctrl resctrl /sys/fs/resctrl
1138 --------------------------------------------
1141   # mount -t resctrl resctrl /sys/fs/resctrl
1158 ---------------------------------------------------------------------
1169   # mount -t resctrl resctrl /sys/fs/resctrl
1193 -----------------------------------
1195 A single-socket system which has real-time tasks running on cores 4-7
1200   # mount -t resctrl resctrl /sys/fs/resctrl
1204 Move the cpus 4-7 over to p1::