=============================
Guidance for writing policies
=============================

Try to keep transactionality out of it.  The core is careful to
avoid asking about anything that is migrating.  This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.
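
A minimal sketch of that per-bio decision follows. It assumes a toy fully associative cache of four blocks and a simple "promote after N misses" rule. Every name in it (toy_policy, toy_lookup, TOY_PROMOTE and so on) is hypothetical and unrelated to the kernel's real policy interface::

	#include <assert.h>
	#include <stdbool.h>
	#include <stddef.h>
	#include <stdint.h>

	/* Hypothetical answer a toy policy gives for one mapped bio. */
	enum toy_result { TOY_HIT, TOY_MISS, TOY_PROMOTE };

	#define TOY_NR_CBLOCKS	4	/* cache blocks in this toy cache */
	#define TOY_NR_OBLOCKS	64	/* origin blocks this toy can track */

	/* Hypothetical toy policy state, not the kernel's dm_cache_policy. */
	struct toy_policy {
		uint64_t oblock[TOY_NR_CBLOCKS];	/* origin block held by each cache block */
		bool valid[TOY_NR_CBLOCKS];
		unsigned misses[TOY_NR_OBLOCKS];	/* misses seen per origin block */
		unsigned promote_after;			/* misses before asking to migrate */
	};

	/* Classify a bio to origin block ob.  On a hit, *cb gets the cache block.
	 * The policy only decides; the caller acts on a TOY_PROMOTE answer. */
	static enum toy_result toy_lookup(struct toy_policy *p, uint64_t ob, size_t *cb)
	{
		assert(ob < TOY_NR_OBLOCKS);
		for (size_t i = 0; i < TOY_NR_CBLOCKS; i++) {
			if (p->valid[i] && p->oblock[i] == ob) {
				*cb = i;
				return TOY_HIT;
			}
		}
		if (++p->misses[ob] >= p->promote_after)
			return TOY_PROMOTE;	/* ask for ob to be migrated into the cache */
		return TOY_MISS;
	}

Suppose promote_after is set to 2. The first bio to an uncached block is answered with TOY_MISS and the second with TOY_PROMOTE. A bio to a block already in the cache gets TOY_HIT.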

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios rather than requests, it's easy for the policy
to get fooled by many small bios.  For this reason the core target
issues periodic ticks to the policy.  It's suggested that the policy
doesn't update states (e.g. hit counts) for a block more than once
for each tick.  The core ticks by watching bios complete, and so
tries to see when the I/O scheduler has let the I/Os run.
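
A sketch of the suggested once-per-tick rule follows. It assumes each block remembers the tick in which its statistics last changed. The names (block_stats, record_hit, policy_tick) are hypothetical, and the tick counter here is advanced by hand rather than by completed bios::

	#include <assert.h>
	#include <stdbool.h>
	#include <stdint.h>

	/* Hypothetical per-block statistics kept by a policy. */
	struct block_stats {
		uint32_t hits;
		uint32_t last_tick;	/* tick in which hits was last bumped */
		bool touched;		/* false until the first update */
	};

	/* Current tick; in a real target this would follow bio completion. */
	static uint32_t current_tick;

	static void policy_tick(void)
	{
		current_tick++;
	}

	/* Count a hit at most once per tick, so a burst of small bios to one
	 * block has the same effect as a single large bio. */
	static void record_hit(struct block_stats *b)
	{
		if (b->touched && b->last_tick == current_tick)
			return;
		b->hits++;
		b->last_tick = current_tick;
		b->touched = true;
	}

Eight small bios to the same block within one tick raise its hit count by one, exactly as a single large bio would. After policy_tick(), the next hit counts again.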


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).

The following tunables are accepted, but have no effect::

	'sequential_threshold <#nr_sequential_ios>'
	'random_threshold <#nr_random_ios>'
	'read_promote_adjustment <value>'
	'write_promote_adjustment <value>'
	'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

The smq policy (vs mq) offers the promise of less memory utilization,
improved performance and increased adaptability in the face of changing
workloads.  smq also does not have any cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target.  Doing so will cause all of the
mq policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.

Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory: 88 bytes per cache block on a
64-bit machine.

smq uses 28-bit indexes to implement its data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All this means smq uses ~25 bytes per cache block.  Still a lot of
memory, but a substantial improvement nonetheless.
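
The sketch below shows why indexes save space. It assumes the policy keeps all entries in one array, so a list link can be an array index instead of a pointer. The packing layout and the helper names (pack_links, link_prev, link_next) are hypothetical; smq's own layout may differ::

	#include <assert.h>
	#include <stdint.h>

	/* Hypothetical layout: a 28-bit index can name any of 2^28 - 1 entries
	 * in one entry array; the all-ones value is reserved as "no entry". */
	#define INDEX_BITS	28
	#define INDEX_MASK	((UINT64_C(1) << INDEX_BITS) - 1)
	#define INDEX_NULL	((uint32_t)INDEX_MASK)

	/* Pack prev (bits 0-27) and next (bits 28-55) into one 64-bit word,
	 * leaving 8 bits for flags; two 64-bit pointers would need 16 bytes. */
	static uint64_t pack_links(uint32_t prev, uint32_t next)
	{
		assert(prev <= INDEX_MASK && next <= INDEX_MASK);
		return (uint64_t)prev | ((uint64_t)next << INDEX_BITS);
	}

	static uint32_t link_prev(uint64_t w)
	{
		return (uint32_t)(w & INDEX_MASK);
	}

	static uint32_t link_next(uint64_t w)
	{
		return (uint32_t)((w >> INDEX_BITS) & INDEX_MASK);
	}

For a sense of scale, take a cache of 2^22 blocks, for example a 1 TiB cache device split into 256 KiB (512-sector) blocks. Under mq it needs 88 bytes times 2^22, or 352 MiB, of policy memory. Under smq it needs roughly 25 bytes times 2^22, or about 100 MiB.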

Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This meant the bottom
levels generally had the most entries, and the top ones had very
few.  Having unbalanced levels like this reduced the efficacy of the
multiqueue.

smq does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above.  The overall
ordering is a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.
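
A simplified sketch of that swap follows. It assumes each level is a fixed-size array ordered from least to most recently used. The layout and the names hit(), take() and levels are hypothetical and do not reproduce the smq code::

	#include <assert.h>

	#define NR_LEVELS	4
	#define PER_LEVEL	3	/* every level holds exactly this many entries */

	/* levels[l][0] is the least recently used entry of level l and
	 * levels[l][PER_LEVEL - 1] the most recently used; higher l is hotter. */
	static int levels[NR_LEVELS][PER_LEVEL];

	/* Remove the entry in slot i of level l, closing the gap, and return it.
	 * Afterwards the MRU slot of level l is free to be overwritten. */
	static int take(int l, int i)
	{
		int e = levels[l][i];

		for (int j = i; j + 1 < PER_LEVEL; j++)
			levels[l][j] = levels[l][j + 1];
		return e;
	}

	/* On a hit to slot i of level l, swap the entry with the LRU entry of
	 * the level above instead of counting hits. */
	static void hit(int l, int i)
	{
		assert(l >= 0 && l < NR_LEVELS && i >= 0 && i < PER_LEVEL);
		int e = take(l, i);

		if (l + 1 < NR_LEVELS) {
			levels[l][PER_LEVEL - 1] = take(l + 1, 0);	/* demote */
			levels[l + 1][PER_LEVEL - 1] = e;		/* promote */
		} else {
			levels[l][PER_LEVEL - 1] = e;	/* top level: refresh only */
		}
	}

Start from level 0 = {0, 1, 2} and level 1 = {3, 4, 5}. A hit on entry 1 leaves level 0 = {0, 2, 3} and level 1 = {4, 5, 1}. Entry 1 moves up, entry 3 moves down, and both levels keep three entries, which is what lets the population of each level be fixed in advance.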

Adaptability
^^^^^^^^^^^^

The mq policy maintained a hit count for each cache block.  For a
different block to get promoted to the cache, its hit count had to
exceed the lowest hit count currently in the cache.  This meant it
could take a long time for the cache to adapt between varying IO patterns.
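
The cost of that rule can be shown with a small sketch. It assumes hit counts are plain integers and the cached blocks' counts sit in an array. The function names are hypothetical. Blocks that were hot under an old workload keep their large counts, so a newly hot block must out-count the coldest of them before it can be promoted::

	#include <assert.h>
	#include <stddef.h>

	/* Lowest hit count among the n cached blocks. */
	static unsigned min_cached_hits(const unsigned *cached, size_t n)
	{
		assert(n > 0);
		unsigned min = cached[0];

		for (size_t i = 1; i < n; i++)
			if (cached[i] < min)
				min = cached[i];
		return min;
	}

	/* Hypothetical mq-style rule: how many more hits an uncached block
	 * with new_hits hits needs before its count exceeds the minimum. */
	static unsigned hits_needed(const unsigned *cached, size_t n, unsigned new_hits)
	{
		unsigned min = min_cached_hits(cached, n);

		return new_hits > min ? 0 : min - new_hits + 1;
	}

Suppose the cached counts are 5000, 1200 and 900. A block that has just become hot needs 901 hits before it qualifies for promotion.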

smq doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.
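
One way to express that feedback loop follows. It assumes the policy counts hotspot-queue hits and misses per period and turns the hit ratio into the number of levels an entry jumps on a hit. The thresholds (25% and 75%), the jump sizes (1, 2, 4) and the names are invented for illustration. The sketch shows the shape of the rule, not smq's actual tuning::

	#include <assert.h>

	/* Hypothetical hotspot-queue statistics gathered over one period. */
	struct hotspot_stats {
		unsigned hits;
		unsigned misses;
	};

	/* Illustrative thresholds, in percent of hotspot lookups that hit. */
	#define POOR_BELOW	25ULL
	#define WELL_ABOVE	75ULL

	/* Number of levels an entry moves up on a hit.  The worse the hotspot
	 * queue predicts, the bigger the jump, so the ordering adapts sooner. */
	static unsigned level_jump(const struct hotspot_stats *s)
	{
		unsigned long long total = (unsigned long long)s->hits + s->misses;
		unsigned long long pct;

		if (total == 0)
			return 2;	/* no evidence yet: middle setting */
		pct = 100ULL * s->hits / total;
		if (pct < POOR_BELOW)
			return 4;	/* performing badly: move entries quickly */
		if (pct > WELL_ABOVE)
			return 1;	/* performing well: one level at a time */
		return 2;
	}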

Performance
^^^^^^^^^^^

Testing smq shows substantially better performance than mq.

cleaner
-------

The cleaner writes back all dirty blocks in a cache to decommission it.
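
A minimal sketch of that decommissioning loop follows. It assumes the policy can see a dirty flag per cache block and is told when each writeback completes. The names are hypothetical. This sketch never promotes anything; it only hands out dirty blocks until none remain::

	#include <assert.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* Hypothetical view of the cache: which blocks are still dirty. */
	struct cache_state {
		bool *dirty;
		size_t nr_blocks;
	};

	/* Next dirty cache block to write back, or -1 once all are clean. */
	static long next_writeback(const struct cache_state *c)
	{
		for (size_t i = 0; i < c->nr_blocks; i++)
			if (c->dirty[i])
				return (long)i;
		return -1;
	}

	/* Called when a writeback of the given block has completed. */
	static void writeback_done(struct cache_state *c, size_t block)
	{
		assert(block < c->nr_blocks);
		c->dirty[block] = false;
	}

Once next_writeback() returns -1, every block is clean, which is the state the cleaner works towards.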
1118735a813SHeinz Mauelshagen
112f2836352SJoe ThornberExamples
113f2836352SJoe Thornber========
114f2836352SJoe Thornber
115*f0ba4377SMauro Carvalho ChehabThe syntax for a table is::
116*f0ba4377SMauro Carvalho Chehab
117f2836352SJoe Thornber	cache <metadata dev> <cache dev> <origin dev> <block size>
118f2836352SJoe Thornber	<#feature_args> [<feature arg>]*
119f2836352SJoe Thornber	<policy> <#policy_args> [<policy arg>]*
120f2836352SJoe Thornber
121*f0ba4377SMauro Carvalho ChehabThe syntax to send a message using the dmsetup command is::
122*f0ba4377SMauro Carvalho Chehab
123f2836352SJoe Thornber	dmsetup message <mapped device> 0 sequential_threshold 1024
124f2836352SJoe Thornber	dmsetup message <mapped device> 0 random_threshold 8
125f2836352SJoe Thornber
Using dmsetup::

	dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
	    /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"

This creates a 128 GiB mapped device named 'blah' with the
sequential_threshold set to 1024 and the random_threshold set to 8.
Because mq is now an alias for smq, both tunables are accepted but
have no effect.
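
The length field in the example is the origin size in 512-byte sectors: 128 GiB is 128 * 2^30 / 512 = 268435456 sectors. The hypothetical helper below assembles a table line in the syntax given above (with no feature arguments) so that arithmetic is explicit. It is an illustration only, not part of dmsetup or libdevmapper::

	#include <assert.h>
	#include <stdio.h>
	#include <string.h>

	#define SECTOR_SIZE	512ULL

	/* Hypothetical helper: format "<start> <length> cache ..." for dmsetup
	 * with no feature args.  Length and block size are in 512-byte sectors. */
	static int cache_table(char *buf, size_t len, unsigned long long origin_bytes,
			       const char *meta, const char *cache, const char *origin,
			       unsigned block_sectors, const char *policy,
			       unsigned nr_policy_args, const char *policy_args)
	{
		int n = snprintf(buf, len, "0 %llu cache %s %s %s %u 0 %s %u%s%s",
				 origin_bytes / SECTOR_SIZE, meta, cache, origin,
				 block_sectors, policy, nr_policy_args,
				 strlen(policy_args) ? " " : "", policy_args);

		assert(n >= 0 && (size_t)n < len);	/* output was not truncated */
		return n;
	}

Call it with 128 GiB, the three devices, block size 512, policy mq and its four arguments. It produces exactly the table string passed to dmsetup create in the example above.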