=============================
Guidance for writing policies
=============================

Try to keep transactionality out of it.  The core is careful to
avoid asking about anything that is migrating.  This is a pain, but
makes it easier to write the policies.

Mappings are loaded into the policy at construction time.

Every bio that is mapped by the target is referred to the policy.
The policy can return a simple HIT or MISS or issue a migration.

Currently there's no way for the policy to issue background work,
e.g. to start writing back dirty blocks that are going to be evicted
soon.

Because we map bios, rather than requests, it's easy for the policy
to get fooled by many small bios.  For this reason the core target
issues periodic ticks to the policy.  It's suggested that the policy
doesn't update states (e.g. hit counts) for a block more than once
for each tick.  The core generates ticks by watching bios complete,
which lets it estimate when the I/O scheduler has let the I/Os run.


Overview of supplied cache replacement policies
===============================================

multiqueue (mq)
---------------

This policy is now an alias for smq (see below).


The following tunables are accepted, but have no effect::

    'sequential_threshold <#nr_sequential_ios>'
    'random_threshold <#nr_random_ios>'
    'read_promote_adjustment <value>'
    'write_promote_adjustment <value>'
    'discard_promote_adjustment <value>'

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

The smq policy (vs mq) offers the promise of less memory utilization,
improved performance and increased adaptability in the face of changing
workloads.  smq also does not have any cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target.  Doing so will cause all of the
mq policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.

Memory usage
^^^^^^^^^^^^

The mq policy used a lot of memory: 88 bytes per cache block on a
64-bit machine.

smq uses 28-bit indexes to implement its data structures rather than
pointers.
It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All this means smq uses ~25 bytes per cache block.  Still a lot of
memory, but a substantial improvement nonetheless.

Level balancing
^^^^^^^^^^^^^^^

mq placed entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This meant the bottom
levels generally had the most entries, and the top ones had very
few.  Having unbalanced levels like this reduced the efficacy of the
multiqueue.

smq does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above.  The overall
ordering is a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.

Adaptability
^^^^^^^^^^^^

The mq policy maintained a hit count for each cache block.  For a
different block to get promoted to the cache its hit count had to
exceed the lowest currently in the cache.  This meant it could take a
long time for the cache to adapt between varying IO patterns.

smq doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance
^^^^^^^^^^^

Testing smq shows substantially better performance than mq.

cleaner
-------

The cleaner writes back all dirty blocks in a cache to decommission it.

Examples
========

The syntax for a table is::

    cache <metadata dev> <cache dev> <origin dev> <block size>
    <#feature_args> [<feature arg>]*
    <policy> <#policy_args> [<policy arg>]*

The syntax to send a message using the dmsetup command is::

    dmsetup message <mapped device> 0 sequential_threshold 1024
    dmsetup message <mapped device> 0 random_threshold 8

Using dmsetup::

    dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
        /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"

This creates a 128GB mapped device named 'blah' with
sequential_threshold set to 1024 and random_threshold set to 8.
Because mq is now an alias for smq, both tunables are accepted but
have no effect.
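
The 128GB figure can be checked from the table line itself.  The sketch
below is only an arithmetic aid, not part of the cache target's
interface.  It assumes the usual device-mapper convention that table
lengths are counted in 512-byte sectors, and that the cache block size
(the ``512`` after the origin device) uses the same unit, as described
in the cache target's own documentation.  The variable names are
illustrative.

```shell
#!/bin/sh
# Values copied from the example table line above.
sectors=268435456      # length of the mapped device, in sectors
block_sectors=512      # cache block size, in sectors

# Assumption: one sector is 512 bytes (device-mapper convention).
sector_bytes=512

# Mapped device size in GiB, using integer arithmetic.
size_gib=$(( sectors * sector_bytes / 1024 / 1024 / 1024 ))

# Cache block size in KiB.
block_kib=$(( block_sectors * sector_bytes / 1024 ))

echo "mapped device size: ${size_gib} GiB"
echo "cache block size:   ${block_kib} KiB"
```

With the values from the example this reports a 128 GiB device and
256 KiB cache blocks, so the "128GB" in the text is a binary (GiB)
figure.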