========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher-order page allocations, which are very likely to
fail under memory pressure. On the other hand, if it used only single
(0-order) pages, it would suffer from very high fragmentation:
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called a zspage.

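The benefit of linking pages can be illustrated with a little arithmetic.
The sketch below assumes 4096-byte pages and uses a hypothetical helper,
``zspage_waste()``, which is not part of zsmalloc. It only compares the
bytes left unused when an object has a page to itself with the case where
objects may span the pages of a linked group:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def zspage_waste(obj_size, nr_pages):
    """Return (objects stored, unused bytes) for nr_pages linked pages.

    Hypothetical helper for illustration only: objects are packed back to
    back and may cross page boundaries inside the linked area.
    """
    total = nr_pages * PAGE_SIZE
    nr_objs = total // obj_size
    return nr_objs, total - nr_objs * obj_size


# A single page holds one 3264-byte object and leaves 832 bytes unused.
single = zspage_waste(3264, 1)
# Four linked pages hold five such objects and leave 64 bytes unused.
linked = zspage_waste(3264, 4)
print(single, linked)
```
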
For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). Allocation requests larger than this size fail
(see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, the allocated memory must be accessed through the
proper handle-based APIs.

stat
====

With CONFIG_ZSMALLOC_STAT, zsmalloc internal information is available via
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

 # cat /sys/kernel/debug/zsmalloc/zram0/classes

 class  size       10%       20%       30%       40%       50%       60%       70%       80%       90%       99%      100% obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    ...
    30   512         0        12         4         1         0         1         0         0         1         0       414          3464       3346        433                1       14
    31   528         2         7         2         2         1         0         1         0         0         2       117          4154       3793        536                4       44
    32   544         6         3         4         1         2         1         0         0         0         1       260          4170       3965        556                2       26
    ...
    ...


class
	index
size
	the object size that zspages of this class store
10%
	the number of zspages with usage ratio less than 10% (see below)
20%
	the number of zspages with usage ratio between 10% and 20%
30%
	the number of zspages with usage ratio between 20% and 30%
40%
	the number of zspages with usage ratio between 30% and 40%
50%
	the number of zspages with usage ratio between 40% and 50%
60%
	the number of zspages with usage ratio between 50% and 60%
70%
	the number of zspages with usage ratio between 60% and 70%
80%
	the number of zspages with usage ratio between 70% and 80%
90%
	the number of zspages with usage ratio between 80% and 90%
99%
	the number of zspages with usage ratio between 90% and 99%
100%
	the number of zspages with usage ratio 100%
obj_allocated
	the number of object slots allocated in the zspages of the class
	(used and unused)
obj_used
	the number of objects allocated to the user
pages_used
	the number of pages allocated for the class
pages_per_zspage
	the number of 0-order pages that make up a zspage
freeable
	the approximate number of pages class compaction can free

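The columns described above can be read programmatically. The following is
a minimal sketch, not an official tool: ``parse_class_row()`` is a
hypothetical helper, and the column order is taken from the sample header
shown earlier. It parses only numeric per-class rows; a summary row such as
``Total`` leaves some columns blank and is not handled. In the sample, the
fullness columns of a class add up to ``pages_used / pages_per_zspage``,
i.e. to the number of zspages in that class:

```python
COLUMNS = [
    "class", "size", "10%", "20%", "30%", "40%", "50%", "60%", "70%",
    "80%", "90%", "99%", "100%", "obj_allocated", "obj_used",
    "pages_used", "pages_per_zspage", "freeable",
]
FULLNESS = COLUMNS[2:13]  # the eleven usage-ratio columns


def parse_class_row(line):
    """Turn one numeric row of the classes file into a dict of integers."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError("unexpected number of columns")
    return dict(zip(COLUMNS, values))


# Row for class 31, copied from the sample output.
row = parse_class_row("31 528 2 7 2 2 1 0 1 0 0 2 117 4154 3793 536 4 44")

zspages = sum(row[c] for c in FULLNESS)                 # zspages in the class
unused_slots = row["obj_allocated"] - row["obj_used"]   # free object slots
print(zspages, unused_slots)
```
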
Each zspage maintains an inuse counter which keeps track of the number of
objects stored in the zspage. The inuse counter determines the zspage's
"fullness group", which is calculated as the ratio of the "inuse" objects to
the total number of objects the zspage can hold (objs_per_zspage). The
closer the inuse counter is to objs_per_zspage, the better the zspage's
memory is utilized.

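As a rough sketch of that calculation, the function below maps an inuse
counter to one of the column labels of the stat output. The function name
and the handling of exact boundaries (for example, a ratio of exactly 10%
counting towards the ``20%`` column) are assumptions made for illustration
and are not taken from the zsmalloc sources:

```python
LABELS = ["10%", "20%", "30%", "40%", "50%",
          "60%", "70%", "80%", "90%", "99%"]


def fullness_column(inuse, objs_per_zspage):
    """Return the stat column that a zspage with this usage falls into."""
    if not 0 < inuse <= objs_per_zspage:
        raise ValueError("inuse must be in 1..objs_per_zspage")
    if inuse == objs_per_zspage:
        return "100%"
    ratio = 100 * inuse // objs_per_zspage  # integer percentage, 0..99
    return LABELS[ratio // 10]


print(fullness_column(4, 5))    # 80% full
print(fullness_column(30, 31))  # almost full, but one slot is free
```
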
Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

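The merging criterion can be modelled in a few lines. This is a simplified
sketch under stated assumptions, not the kernel implementation: pages are
4096 bytes, the chain size limit is 4, class sizes follow ``32 + 16 * index``
(inferred from the tables in this document), and the chain size is picked by
minimizing the unused bytes at the end of the zspage. All function names are
hypothetical:

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
MAX_CHAIN = 4      # assumption: zspage chain size limit of 4
LAST_CLASS = 254   # index of the last of the 255 size classes


def class_size(index):
    # Assumption inferred from the stat tables: class 0 stores 32-byte
    # objects and every following class adds 16 bytes.
    return 32 + 16 * index


def chain_size(size):
    # Chain length with the fewest unused bytes; min() keeps the first
    # (shortest) candidate when several chains waste the same amount.
    return min(range(1, MAX_CHAIN + 1), key=lambda n: (n * PAGE_SIZE) % size)


def characteristics(index):
    """Return (pages per zspage, objects per zspage) for a class index."""
    size = class_size(index)
    pages = chain_size(size)
    return pages, pages * PAGE_SIZE // size


def merged_into(index):
    """Highest following class that shares the characteristics of index."""
    target = index
    while (target < LAST_CLASS
           and characteristics(target + 1) == characteristics(index)):
        target += 1
    return target


print(characteristics(96), merged_into(96))
```

Under these assumptions, classes whose zspages have the same number of pages
and the same number of objects collapse into the largest class of the run.
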
For instance, consider the following size classes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
     94  1536        0    ....       0             0          0          0                3        0
    100  1632        0    ....       0             0          0          0                2        0
  ...


Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

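The numbers in this example can be re-derived with a short calculation. The
helper ``pages_needed()`` below is hypothetical and assumes 4096-byte pages;
it is only meant to make the arithmetic explicit:

```python
import math

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_needed(nr_objects, slot_size, pages_per_zspage):
    """Return (objects per zspage, zspages, physical pages) for a class."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // slot_size
    zspages = math.ceil(nr_objects / objs_per_zspage)
    return objs_per_zspage, zspages, zspages * pages_per_zspage


# 13 objects of 1568 bytes stored in class #100 (1632-byte slots, 2 pages).
objs, zspages, pages = pages_needed(13, 1632, 2)
padding = 13 * (1632 - 1568)  # bytes lost to the larger slot size
print(objs, zspages, pages, padding)
```
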
However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace calculate_zspage_chain_size(), we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.

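The table for class #96 can be reproduced with the sketch below. It assumes
4096-byte pages and that the wasted bytes are simply the remainder left at
the end of the chain; ``chain_table()`` and ``best_chain()`` are
illustrative names, not kernel functions:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def chain_table(size, max_chain):
    """Return (pages per zspage, wasted bytes, used%) for each chain size."""
    rows = []
    for pages in range(1, max_chain + 1):
        total = pages * PAGE_SIZE
        wasted = total % size                     # tail that fits no object
        used = 100 * (total - wasted) // total    # truncated percentage
        rows.append((pages, wasted, used))
    return rows


def best_chain(rows):
    """Chain size with the fewest wasted bytes (shortest chain on a tie)."""
    return min(rows, key=lambda row: row[1])[0]


rows = chain_table(1568, 5)
for pages, wasted, used in rows:
    print(f"{pages:12d} {wasted:20d} {used:12d}")
print("best chain size:", best_chain(rows))
```

With the limit lowered to 4 pages, the same calculation selects a chain of
2 pages, which matches the characteristics that class #96 shares with class
#100 in the earlier example.
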
As the zspage chain size for class #96 increases, its key characteristics,
such as pages per zspage and objects per zspage, also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of ``/sys/kernel/debug/zsmalloc/zramX/classes``::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the zspage chain size also results in a higher watermark for the
huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    211  3408         0   ..         0             0          0          0                5        0
    217  3504         0   ..         0             0          0          0                6        0
    222  3584         0   ..         0             0          0          0                7        0
    225  3632         0   ..         0             0          0          0                8        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

For a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    206  3328         0   ..         0             0          0          0               13        0
    207  3344         0   ..         0             0          0          0                9        0
    208  3360         0   ..         0             0          0          0               14        0
    211  3408         0   ..         0             0          0          0                5        0
    212  3424         0   ..         0             0          0          0               16        0
    214  3456         0   ..         0             0          0          0               11        0
    217  3504         0   ..         0             0          0          0                6        0
    219  3536         0   ..         0             0          0          0               13        0
    222  3584         0   ..         0             0          0          0                7        0
    223  3600         0   ..         0             0          0          0               15        0
    225  3632         0   ..         0             0          0          0                8        0
    228  3680         0   ..         0             0          0          0                9        0
    230  3712         0   ..         0             0          0          0               10        0
    232  3744         0   ..         0             0          0          0               11        0
    234  3776         0   ..         0             0          0          0               12        0
    235  3792         0   ..         0             0          0          0               13        0
    236  3808         0   ..         0             0          0          0               14        0
    238  3840         0   ..         0             0          0          0               15        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

Overall, the combined effect of the zspage chain size on the zsmalloc pool
configuration::

  pages per zspage   number of size classes (clusters)   huge size class watermark
         4                        69                               3264
         5                        86                               3408
         6                        93                               3504
         7                       112                               3584
         8                       123                               3632
         9                       140                               3680
        10                       143                               3712
        11                       159                               3744
        12                       164                               3776
        13                       180                               3792
        14                       183                               3808
        15                       188                               3840
        16                       191                               3840


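The watermark column can be approximated with a small model. The sketch
below is an illustration under assumptions, not the kernel code: pages are
4096 bytes, class sizes run from 32 to 4096 bytes in 16-byte steps (inferred
from the tables in this document), the chain size is the one that wastes the
fewest bytes, and a class counts as huge when its zspage is a single page
holding a single object. The function names are hypothetical:

```python
PAGE_SIZE = 4096           # assumption: 4 KiB pages
MIN_SIZE, DELTA = 32, 16   # assumption: inferred from the stat tables


def chain_size(size, max_chain):
    # Chain length with the fewest unused bytes; shortest chain on a tie.
    return min(range(1, max_chain + 1), key=lambda n: (n * PAGE_SIZE) % size)


def is_huge(size, max_chain):
    # One page per zspage and one object per page.
    return chain_size(size, max_chain) == 1 and PAGE_SIZE // size == 1


def huge_watermark(max_chain):
    """Largest class size that is not a huge class for this chain limit."""
    sizes = range(MIN_SIZE, PAGE_SIZE + 1, DELTA)
    return max(s for s in sizes if not is_huge(s, max_chain))


for limit in (4, 8, 16):
    print(limit, huge_watermark(limit))
```
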
A synthetic test
----------------

zram used as storage for build artifacts (Linux kernel compilation).

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=4``

  zsmalloc classes stats::

    class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

    ...
    Total              13   ..        51        413836     412973     159955                         3

  zram mm_stat::

    1691783168 628083717 655175680        0 655175680       60        0    34048    34049


* ``CONFIG_ZSMALLOC_CHAIN_SIZE=8``

  zsmalloc classes stats::

    class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

    ...
    Total              18   ..        87        414852     412978     156666                         0

  zram mm_stat::

    1691803648 627793930 641703936        0 641703936       60        0    33591    33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666. At the same time, the maximum zsmalloc pool memory usage went down
from 655175680 to 641703936 bytes.

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).

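The savings reported for the synthetic test can be cross-checked with simple
arithmetic. The snippet assumes 4096-byte pages; with that page size, the
``pages_used`` totals multiplied by the page size match the pool memory
usage figures quoted from ``mm_stat`` above:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

pages_chain4 = 159955  # pages_used total with CONFIG_ZSMALLOC_CHAIN_SIZE=4
pages_chain8 = 156666  # pages_used total with CONFIG_ZSMALLOC_CHAIN_SIZE=8

saved_pages = pages_chain4 - pages_chain8
saved_bytes = saved_pages * PAGE_SIZE
saved_pct = 100 * saved_pages / pages_chain4

print(f"{saved_pages} pages, {saved_bytes} bytes, {saved_pct:.2f}% fewer pages")
```
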
Functions
=========

.. kernel-doc:: mm/zsmalloc.c