.. _soft_dirty:

===============
Soft-Dirty PTEs
===============

Soft-dirty is a bit in a PTE that helps track which pages a task writes to.
To do this tracking, one should:

  1. Clear soft-dirty bits from the task's PTEs.

     This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the
     task in question.

  2. Wait some time.

  3. Read soft-dirty bits from the PTEs.

     This is done by reading ``/proc/PID/pagemap``. Bit 55 of each 64-bit
     entry is the soft-dirty bit. If it is set, the respective PTE has been
     written to since step 1.
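
The three steps above can be sketched in Python for the calling process itself
(``self`` in place of ``PID``). This is a minimal illustration, not a reference
tool. It assumes a Linux kernel built with ``CONFIG_MEM_SOFT_DIRTY``, and the
helper names (``clear_soft_dirty``, ``read_pagemap_entry`` and so on) are
hypothetical. Each pagemap entry is 8 bytes in native byte order, located at
offset ``(vaddr / PAGE_SIZE) * 8``.

.. code-block:: python

    # Sketch only: assumes CONFIG_MEM_SOFT_DIRTY; helper names are hypothetical.
    import ctypes
    import mmap
    import os
    import struct

    PAGEMAP_ENTRY_SIZE = 8  # /proc/PID/pagemap holds one 64-bit entry per page
    SOFT_DIRTY_BIT = 55     # bit 55 of each entry is the soft-dirty flag


    def pagemap_offset(vaddr, page_size=mmap.PAGESIZE):
        """Byte offset of the pagemap entry describing virtual address vaddr."""
        return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


    def is_soft_dirty(entry):
        """True if bit 55 is set in a raw 64-bit pagemap entry."""
        return bool((entry >> SOFT_DIRTY_BIT) & 1)


    def clear_soft_dirty(pid="self"):
        """Step 1: clear the soft-dirty bits by writing "4" to clear_refs."""
        with open(f"/proc/{pid}/clear_refs", "w") as f:
            f.write("4")


    def read_pagemap_entry(vaddr, pid="self"):
        """Step 3: fetch the raw pagemap entry for vaddr (native byte order)."""
        fd = os.open(f"/proc/{pid}/pagemap", os.O_RDONLY)
        try:
            raw = os.pread(fd, PAGEMAP_ENTRY_SIZE, pagemap_offset(vaddr))
        finally:
            os.close(fd)
        return struct.unpack("=Q", raw)[0]


    def demo():
        npages = 2
        # fileno -1 gives an anonymous mapping; MAP_PRIVATE keeps it private
        buf = mmap.mmap(-1, npages * mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
        anchor = ctypes.c_char.from_buffer(buf)  # only used to get the address
        base = ctypes.addressof(anchor)
        for i in range(npages):
            buf[i * mmap.PAGESIZE] = 1           # fault every page in first
        clear_soft_dirty()                       # step 1
        buf[0] = 2                               # step 2: write page 0 only
        for i in range(npages):                  # step 3
            entry = read_pagemap_entry(base + i * mmap.PAGESIZE)
            print(f"page {i}: soft-dirty={is_soft_dirty(entry)}")
        del anchor                               # drop the export before close
        buf.close()


    if __name__ == "__main__":
        try:
            demo()
        except OSError as exc:
            print(f"soft-dirty tracking not available here: {exc}")

On a kernel with soft-dirty support, page 0 would typically be reported as
soft-dirty and page 1 as clean. Page 1 can also show up as soft-dirty if the
interpreter happens to create an adjacent mapping that the kernel merges with
the buffer, since, as explained below, expanded regions are marked soft-dirty.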

Internally, when the soft-dirty bits are cleared, the kernel also clears the
writable bit in those PTEs. After this, when the task tries to modify a page
at some virtual address, a page fault (#PF) occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note that although all of the task's address space is marked read-only after
the soft-dirty bits are cleared, the #PFs that occur after that are processed
quickly. This is because the pages are still mapped to physical memory, so all
the kernel has to do is detect this and set both the writable and soft-dirty
bits on the PTE.
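
The write-protect trick described above can be illustrated with a toy model.
This is not kernel code: ``ToyPte``, ``toy_clear_refs`` and ``toy_write`` are
hypothetical names, and the model assumes every page sits in a writable
mapping and stays mapped, which is what makes the fault cheap. A real kernel
must also tell genuinely read-only pages apart, which the toy ignores.

.. code-block:: python

    # Toy model only (hypothetical names); not how the kernel is written.
    from dataclasses import dataclass


    @dataclass
    class ToyPte:
        """A toy PTE holding only the two bits this document talks about."""
        writable: bool = True
        soft_dirty: bool = False


    def toy_clear_refs(ptes):
        """Model writing "4" to clear_refs: clear soft-dirty, write-protect."""
        for pte in ptes:
            pte.soft_dirty = False
            pte.writable = False


    def toy_write(pte):
        """Model a store to the page; return True if it took a #PF.

        The page is assumed to be still mapped, so handling the fault only
        means setting both bits again: no allocation and no I/O.
        """
        if pte.writable:
            return False          # no fault: the write goes straight through
        pte.writable = True       # fast #PF path: restore write access...
        pte.soft_dirty = True     # ...and record that the page was written
        return True


    if __name__ == "__main__":
        ptes = [ToyPte(), ToyPte()]
        toy_clear_refs(ptes)
        faults = [toy_write(ptes[0]), toy_write(ptes[0])]
        print(faults, [p.soft_dirty for p in ptes])

Running it prints ``[True, False] [True, False]``: only the first write to
page 0 faults, and page 1, never written, stays clean.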

While in most cases tracking memory changes by #PFs is more than enough,
there is still a scenario in which soft-dirty bits can be lost: a task
unmaps a previously mapped memory region and then maps a new one at exactly
the same place. When unmap is called, the kernel internally clears the PTE
values, including the soft-dirty bits. To notify user-space applications about
such memory region renewal, the kernel always marks new memory regions (and
expanded regions) as soft-dirty.
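
The unmap/remap case can be observed from user space with a short Python
sketch, again assuming a kernel built with ``CONFIG_MEM_SOFT_DIRTY``;
``map_page``, ``soft_dirty_at`` and ``remap_demo`` are hypothetical helpers.
It clears the soft-dirty bits, unmaps a page, maps a fresh one (which the
kernel may or may not place at the old address), and then checks the new page
without ever writing to it.

.. code-block:: python

    # Sketch only: assumes CONFIG_MEM_SOFT_DIRTY; helper names are hypothetical.
    import ctypes
    import mmap
    import os
    import struct

    SOFT_DIRTY = 1 << 55  # bit 55 of a /proc/PID/pagemap entry


    def pagemap_offset(vaddr, page_size=mmap.PAGESIZE):
        """Byte offset of the 8-byte pagemap entry for virtual address vaddr."""
        return (vaddr // page_size) * 8


    def soft_dirty_at(vaddr):
        """Return True if /proc/self/pagemap reports vaddr's page as soft-dirty."""
        fd = os.open("/proc/self/pagemap", os.O_RDONLY)
        try:
            raw = os.pread(fd, 8, pagemap_offset(vaddr))
        finally:
            os.close(fd)
        return bool(struct.unpack("=Q", raw)[0] & SOFT_DIRTY)


    def map_page():
        """Map one private anonymous page; return (mapping, its address)."""
        region = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
        anchor = ctypes.c_char.from_buffer(region)
        addr = ctypes.addressof(anchor)
        del anchor  # release the buffer export so region.close() can succeed
        return region, addr


    def remap_demo():
        old, old_addr = map_page()
        old[0] = 1                                # dirty the old page
        with open("/proc/self/clear_refs", "w") as f:
            f.write("4")                          # start a new tracking round
        old.close()                               # munmap: the PTE is gone
        new, new_addr = map_page()                # may reuse old_addr
        print(f"same address: {new_addr == old_addr}, "
              f"soft-dirty: {soft_dirty_at(new_addr)}")
        new.close()


    if __name__ == "__main__":
        try:
            remap_demo()
        except OSError as exc:
            print(f"soft-dirty tracking not available here: {exc}")

On such a kernel the new page should read as soft-dirty even though nothing
was written to it, which is the marking of new regions described above.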

This feature is actively used by the checkpoint-restore project. You
can find more details about it at http://criu.org.


-- Pavel Emelyanov, Apr 9, 2013