==============
Page fragments
==============

A page fragment is an arbitrary-length, arbitrary-offset area of memory
which resides within a 0 or higher order compound page.  Multiple
fragments within that page are individually refcounted in the page's
reference counter.

The page_frag functions, page_frag_alloc and page_frag_free, provide a
simple allocation framework for page fragments.  This framework is used by
the network stack and network device drivers to provide a backing region of
memory for use as either an sk_buff->head or the "frags" portion of
skb_shared_info.

In order to make use of the page fragment APIs a backing page fragment
cache is needed.  This provides a central point for the fragment allocation
and allows multiple calls to make use of a cached page.  The advantage to
doing this is that multiple calls to get_page, which can be expensive at
allocation time, can be avoided.  However, due to the nature of this
caching, any calls to the cache must be protected by either a per-cpu
limitation, or a per-cpu limitation and forcing interrupts to be disabled
when executing the fragment allocation.

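As a rough illustration of why a cache avoids repeated get_page calls, the
following userspace C sketch models one possible scheme: the page's
reference counter is charged once with a large bias when the page enters
the cache, and each fragment allocation simply hands over one of those
pre-charged references.  Everything prefixed with ``model_`` or ``MODEL_``
is hypothetical and exists only for this example; it is not the kernel
implementation, and it performs no locking, which is exactly why the real
cache needs the per-cpu or interrupt protection described above.

.. code-block:: c

   #include <assert.h>
   #include <stdlib.h>

   #define MODEL_PAGE_SIZE 4096u            /* assumed page size for the model */
   #define MODEL_BIAS      MODEL_PAGE_SIZE  /* references charged up front */

   /* Stand-in for a page: a reference counter plus the backing memory. */
   struct model_page {
           unsigned int refcount;
           unsigned char data[MODEL_PAGE_SIZE];
   };

   /* Stand-in for a page fragment cache (hypothetical layout). */
   struct model_frag_cache {
           struct model_page *page;  /* cached page, or NULL */
           unsigned int offset;      /* next free byte in the page */
           unsigned int bias;        /* pre-charged references not yet handed out */
   };

   /* Drop one reference; returns 1 if this freed the page. */
   int model_put_page(struct model_page *page)
   {
           assert(page->refcount > 0);
           if (--page->refcount > 0)
                   return 0;
           free(page);
           return 1;
   }

   /* Give up the cached page, returning all unused references at once. */
   void model_cache_retire(struct model_frag_cache *fc)
   {
           if (!fc->page)
                   return;
           fc->page->refcount -= fc->bias;
           if (fc->page->refcount == 0)
                   free(fc->page);
           fc->page = NULL;
           fc->offset = 0;
           fc->bias = 0;
   }

   /* Carve a fragment out of the cached page, refilling the cache if needed. */
   void *model_frag_alloc(struct model_frag_cache *fc, unsigned int size)
   {
           void *frag;

           if (size == 0 || size > MODEL_PAGE_SIZE)
                   return NULL;
           if (fc->page && MODEL_PAGE_SIZE - fc->offset < size)
                   model_cache_retire(fc);
           if (!fc->page) {
                   fc->page = malloc(sizeof(*fc->page));
                   if (!fc->page)
                           return NULL;
                   /* One bulk charge replaces one increment per fragment. */
                   fc->page->refcount = MODEL_BIAS;
                   fc->bias = MODEL_BIAS;
                   fc->offset = 0;
           }
           frag = fc->page->data + fc->offset;
           fc->offset += size;
           /* Each fragment is at least one byte, so bias cannot underflow. */
           fc->bias--;
           return frag;
   }

In this model the shared counter is written only when a page enters or
leaves the cache; the per-allocation cost is a decrement of the private
``bias`` field.  Each fragment's owner later drops its reference with
``model_put_page``, and the page is freed once the last one is gone.
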
The network stack uses two separate caches per CPU to handle fragment
allocation.  The netdev_alloc_cache is used by callers making use of the
netdev_alloc_frag and __netdev_alloc_skb calls.  The napi_alloc_cache is
used by callers of the __napi_alloc_frag and napi_alloc_skb calls.  The
main difference between these two sets of calls is the context in which
they may be called.  The "netdev" prefixed functions are usable in any
context as these functions will disable interrupts, while the "napi"
prefixed functions are only usable within the softirq context.

Many network device drivers use a similar methodology for allocating page
fragments, but the page fragments are cached at the ring or descriptor
level.  In order to enable these cases it is necessary to provide a generic
way of tearing down a page cache.  For this reason __page_frag_cache_drain
was implemented.  It allows for freeing multiple references from a single
page via a single call.  The advantage to doing this is that it allows for
cleaning up the multiple references that were added to a page in order to
avoid calling get_page per allocation.

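The teardown case can be sketched in the same userspace style.  The
example below assumes a driver that keeps, per receive buffer, a page
pointer and a count of pre-charged references it still owns; on teardown
it returns all of them with one call instead of one put per reference.
The ``ring_`` names and the structure layout are hypothetical and only
model the idea behind __page_frag_cache_drain; they are not taken from any
driver or from the kernel.

.. code-block:: c

   #include <assert.h>
   #include <stdlib.h>

   /* Stand-in for a page; only the reference counter matters here. */
   struct ring_page {
           unsigned int refcount;
   };

   /* Hypothetical per-buffer state kept at the ring level. */
   struct ring_rx_buffer {
           struct ring_page *page;
           unsigned int bias;  /* pre-charged references still owned by the ring */
   };

   /* Allocate a model page already holding 'refs' references. */
   struct ring_page *ring_page_new(unsigned int refs)
   {
           struct ring_page *page = malloc(sizeof(*page));

           if (page)
                   page->refcount = refs;
           return page;
   }

   /* Drop 'count' references in one step; returns 1 if the page was freed. */
   int ring_page_drain(struct ring_page *page, unsigned int count)
   {
           assert(count <= page->refcount);
           page->refcount -= count;
           if (page->refcount > 0)
                   return 0;
           free(page);
           return 1;
   }

   /* Tear down one buffer: return every reference the ring still owns. */
   int ring_rx_buffer_teardown(struct ring_rx_buffer *buf)
   {
           int freed;

           if (!buf->page)
                   return 0;
           freed = ring_page_drain(buf->page, buf->bias);
           buf->page = NULL;
           buf->bias = 0;
           return freed;
   }

If some fragments are still held elsewhere when the ring is torn down,
the page survives the drain and is freed only when those remaining
references are dropped.
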
Alexander Duyck, Nov 29, 2016.