========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher-order page allocation, which is very likely to
fail under memory pressure. On the other hand, if we just used single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, the allocated memory should be accessed through the
proper handle-based APIs.

stat
====

With CONFIG_ZSMALLOC_STAT, zsmalloc internal information is available via
``/sys/kernel/debug/zsmalloc/<user name>``.
Here is a sample of stat output::

    # cat /sys/kernel/debug/zsmalloc/zram0/classes

    class  size  10%  20%  30%  40%  50%  60%  70%  80%  90%  99%  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
      ...
      ...
       30   512    0   12    4    1    0    1    0    0    1    0   414           3464      3346         433                 1        14
       31   528    2    7    2    2    1    0    1    0    0    2   117           4154      3793         536                 4        44
       32   544    6    3    4    1    2    1    0    0    0    1   260           4170      3965         556                 2        26
      ...
      ...


class
    index
size
    object size zspage stores
10%
    the number of zspages with usage ratio less than 10% (see below)
20%
    the number of zspages with usage ratio between 10% and 20%
30%
    the number of zspages with usage ratio between 20% and 30%
40%
    the number of zspages with usage ratio between 30% and 40%
50%
    the number of zspages with usage ratio between 40% and 50%
60%
    the number of zspages with usage ratio between 50% and 60%
70%
    the number of zspages with usage ratio between 60% and 70%
80%
    the number of zspages with usage ratio between 70% and 80%
90%
    the number of zspages with usage ratio between 80% and 90%
99%
    the number of zspages with usage ratio between 90% and 99%
100%
    the number of zspages with usage ratio 100%
obj_allocated
    the number of objects allocated
obj_used
    the number of objects allocated to the user
pages_used
    the number of pages allocated for the class
pages_per_zspage
    the number of 0-order pages to make a zspage
freeable
    the approximate number of pages class compaction can free

Each zspage maintains an inuse counter, which keeps track of the number of
objects stored in the zspage. The inuse counter determines the zspage's
"fullness group", which is calculated as the ratio of the "inuse" objects to
the total number of objects the zspage can hold (objs_per_zspage). The
closer the inuse counter is to objs_per_zspage, the better.

Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
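
As a rough illustration of how object sizes map onto the 255 size classes,
the sketch below assumes 4K pages, a smallest class of 32 bytes and a 16-byte
step between classes. These values are inferred from the sample stat output
above (class 30 stores 512-byte objects, class 31 stores 528-byte objects)
rather than taken from the source. ``class_size()`` and ``class_index()`` are
hypothetical helper names, and the sketch ignores class merging and any
per-object overhead:

.. code-block:: python

    # Assumptions inferred from the sample stat output (4K pages).
    PAGE_SIZE = 4096
    MIN_ALLOC_SIZE = 32   # hypothetical name: object size of class 0
    CLASS_DELTA = 16      # hypothetical name: size step between classes
    NR_CLASSES = 255

    def class_size(index):
        """Object size stored by the size class with the given index."""
        if not 0 <= index < NR_CLASSES:
            raise ValueError("class index out of range")
        return min(MIN_ALLOC_SIZE + index * CLASS_DELTA, PAGE_SIZE)

    def class_index(size):
        """Smallest class index whose object size can hold `size` bytes."""
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("size must be in 1..PAGE_SIZE")
        if size <= MIN_ALLOC_SIZE:
            return 0
        # Ceiling division without floating point.
        return -(-(size - MIN_ALLOC_SIZE) // CLASS_DELTA)

Under these assumptions ``class_size(30)`` is 512 and ``class_size(254)`` is
4096, so the last class stores page-sized objects.
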
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

    class  size  10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
      ...
       94  1536    0  ....     0              0         0           0                 3         0
      100  1632    0  ....     0              0         0           0                 2         0
      ...


Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage  wasted bytes  used%
                   1           960     76
                   2           352     95
                   3          1312     89
                   4           704     95
                   5            96     99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.

As the zspage chain size for class #96 increases, its key characteristics
such as pages per zspage and objects per zspage also change.
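
The chain-size table above can be reproduced with simple modular arithmetic.
The following is a minimal sketch, assuming 4K pages and assuming that the
preferred chain size is the one that leaves the fewest unused bytes at the
end of the zspage. The helper names are hypothetical, and the kernel's
`calculate_zspage_chain_size()` may differ in detail:

.. code-block:: python

    PAGE_SIZE = 4096  # assumption: 4K pages, as in the tables of this document

    def chain_stats(class_size, max_chain):
        """Return (pages, wasted_bytes, used_percent) for chains of 1..max_chain pages."""
        rows = []
        for pages in range(1, max_chain + 1):
            zspage_size = pages * PAGE_SIZE
            # Bytes left over after packing as many whole objects as possible.
            wasted = zspage_size % class_size
            used = (zspage_size - wasted) * 100 // zspage_size
            rows.append((pages, wasted, used))
        return rows

    def best_chain_size(class_size, max_chain):
        """Chain size with the fewest wasted bytes; the smallest chain wins ties."""
        return min(chain_stats(class_size, max_chain), key=lambda row: row[1])[0]

    def pages_needed(nr_objects, class_size, chain_pages):
        """Physical pages used to store nr_objects in zspages of chain_pages pages."""
        objs_per_zspage = chain_pages * PAGE_SIZE // class_size
        zspages = -(-nr_objects // objs_per_zspage)  # ceiling division
        return zspages * chain_pages

With these helpers, ``best_chain_size(1568, 5)`` selects a 5-page chain, and
``pages_needed(13, 1568, 5)`` gives 5 pages against 6 pages for
``pages_needed(13, 1632, 2)``, matching the numbers in the text.
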
With a larger chain size there are fewer class mergers, resulting in a more
compact grouping of classes, which reduces memory wastage.

Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`::

    class  size  10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

      ...
      202  3264    0    ..     0              0         0           0                 4         0
      254  4096    0    ..     0              0         0           0                 1         0
      ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the zspage chain size also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

With a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

    class  size  10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

      ...
      202  3264    0    ..     0              0         0           0                 4         0
      211  3408    0    ..     0              0         0           0                 5         0
      217  3504    0    ..     0              0         0           0                 6         0
      222  3584    0    ..     0              0         0           0                 7         0
      225  3632    0    ..     0              0         0           0                 8         0
      254  4096    0    ..     0              0         0           0                 1         0
      ...

With a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

    class  size  10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

      ...
      202  3264    0    ..     0              0         0           0                 4         0
      206  3328    0    ..     0              0         0           0                13         0
      207  3344    0    ..     0              0         0           0                 9         0
      208  3360    0    ..     0              0         0           0                14         0
      211  3408    0    ..     0              0         0           0                 5         0
      212  3424    0    ..     0              0         0           0                16         0
      214  3456    0    ..     0              0         0           0                11         0
      217  3504    0    ..     0              0         0           0                 6         0
      219  3536    0    ..     0              0         0           0                13         0
      222  3584    0    ..     0              0         0           0                 7         0
      223  3600    0    ..     0              0         0           0                15         0
      225  3632    0    ..     0              0         0           0                 8         0
      228  3680    0    ..     0              0         0           0                 9         0
      230  3712    0    ..     0              0         0           0                10         0
      232  3744    0    ..     0              0         0           0                11         0
      234  3776    0    ..     0              0         0           0                12         0
      235  3792    0    ..     0              0         0           0                13         0
      236  3808    0    ..     0              0         0           0                14         0
      238  3840    0    ..     0              0         0           0                15         0
      254  4096    0    ..     0              0         0           0                 1         0
      ...

Overall, the combined effect of zspage chain size on zsmalloc pool configuration::

    pages per zspage  number of size classes (clusters)  huge size class watermark
                   4                                 69                       3264
                   5                                 86                       3408
                   6                                 93                       3504
                   7                                112                       3584
                   8                                123                       3632
                   9                                140                       3680
                  10                                143                       3712
                  11                                159                       3744
                  12                                164                       3776
                  13                                180                       3792
                  14                                183                       3808
                  15                                188                       3840
                  16                                191                       3840


A synthetic test
----------------

zram as a build artifacts storage (Linux kernel compilation).

* `CONFIG_ZSMALLOC_CHAIN_SIZE=4`

  zsmalloc classes stats::

      class  size  10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

        ...
      Total         13    ..    51         413836    412973      159955                           3

  zram mm_stat::

      1691783168 628083717 655175680 0 655175680 60 0 34048 34049


* `CONFIG_ZSMALLOC_CHAIN_SIZE=8`

  zsmalloc classes stats::

      class  size  10%  ....  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable

        ...
      Total         18    ..    87         414852    412978      156666                           0

  zram mm_stat::

      1691803648 627793930 641703936 0 641703936 60 0 33591 33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666. At the same time, the maximum zsmalloc pool memory usage went down
from 655175680 to 641703936 bytes.

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zsmalloc pool compaction is unable to
relocate objects and release zspages. In these cases, it is recommended to
decrease the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).

Functions
=========

.. kernel-doc:: mm/zsmalloc.c