.. _soft_dirty:

===============
Soft-Dirty PTEs
===============

The soft-dirty bit is a bit on a PTE that helps track which pages a task
writes to. To do this tracking, one should:

  1. Clear the soft-dirty bits from the task's PTEs.

     This is done by writing "4" into the ``/proc/PID/clear_refs`` file of the
     task in question.

  2. Wait some time.

  3. Read the soft-dirty bits from the PTEs.

     This is done by reading from ``/proc/PID/pagemap``. Bit 55 of each
     64-bit entry is the soft-dirty bit. If it is set, the respective PTE was
     written to since step 1.

Internally, to do this tracking, the writable bit is cleared from PTEs
when the soft-dirty bit is cleared. So, after this, when the task tries to
modify a page at some virtual address, a page fault (#PF) occurs and the
kernel sets the soft-dirty bit on the respective PTE.

Note that although the task's entire address space is marked read-only after
the soft-dirty bits are cleared, the page faults that occur after that are
processed quickly. This is because the pages are still mapped to physical
memory, so all the kernel has to do is detect this and set both the writable
and soft-dirty bits on the PTE.
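
The three steps above can be sketched in C roughly as follows. This is a minimal sketch, not a reference implementation: it assumes Linux with glibc, a kernel built with ``CONFIG_MEM_SOFT_DIRTY``, and that ``/proc/PID/pagemap`` holds one 64-bit entry per virtual page, indexed by virtual page number. The helper names ``clear_soft_dirty()``, ``pagemap_offset()``, ``entry_is_soft_dirty()``, ``page_is_soft_dirty()`` and ``soft_dirty_self_test()`` are hypothetical and exist only for illustration.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS and pread() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of each 64-bit /proc/PID/pagemap entry is the soft-dirty bit. */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper for step 1: write "4" to /proc/PID/clear_refs.
 * Returns 0 on success, -1 on failure. */
int clear_soft_dirty(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/clear_refs", (int)pid);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* Assumption: pagemap has one 8-byte entry per virtual page, so the entry
 * for `addr` sits at (addr / page_size) * 8. */
off_t pagemap_offset(uintptr_t addr, long page_size)
{
    assert(page_size > 0);
    return (off_t)(addr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* True if bit 55 (soft-dirty) is set in a raw pagemap entry. */
bool entry_is_soft_dirty(uint64_t entry)
{
    return (entry >> PM_SOFT_DIRTY_BIT) & 1;
}

/* Hypothetical helper for step 3: returns 1 if the page holding `addr` is
 * soft-dirty, 0 if it is clean, -1 if pagemap cannot be read. */
int page_is_soft_dirty(pid_t pid, const void *addr)
{
    char path[64];
    uint64_t entry;
    snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, &entry, sizeof(entry),
                      pagemap_offset((uintptr_t)addr, sysconf(_SC_PAGESIZE)));
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;
    return entry_is_soft_dirty(entry) ? 1 : 0;
}

/* Runs steps 1-3 on one private anonymous page of the calling process.
 * Returns 1 if the page reads clean right after clearing and soft-dirty
 * after a write, 0 if it does not, -1 if mmap or procfs access fails. */
int soft_dirty_self_test(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    volatile char *page = mem;
    page[0] = 1; /* fault the page in so it is mapped to physical memory */

    int result = -1;
    if (clear_soft_dirty(getpid()) == 0) {               /* step 1 */
        int before = page_is_soft_dirty(getpid(), mem);
        page[0] = 2; /* step 2: this write hits the write-protected PTE */
        int after = page_is_soft_dirty(getpid(), mem);   /* step 3 */
        if (before >= 0 && after >= 0)
            result = (before == 0 && after == 1);
    }
    munmap(mem, (size_t)page_size);
    return result;
}
```

In ``soft_dirty_self_test()`` the page is touched before clearing, so the later write should take the fast write-protect fault described above rather than allocating a new page. Restricted environments may deny access to the procfs files; the sketch reports that as -1 instead of guessing.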

While in most cases tracking memory changes via page faults is more than
enough, there is still a scenario in which soft-dirty bits can be lost: a task
unmaps a previously mapped memory region and then maps a new one at exactly
the same place. When unmap is called, the kernel internally clears PTE values,
including the soft-dirty bits. To notify the user-space application about such
a memory region renewal, the kernel always marks new memory regions (and
expanded regions) as soft dirty.

This feature is actively used by the checkpoint-restore project. You
can find more details about it at http://criu.org.


-- Pavel Emelyanov, Apr 9, 2013