.. SPDX-License-Identifier: GPL-2.0

================
FUSE Passthrough
================

Introduction
============

FUSE (Filesystem in Userspace) passthrough is a feature designed to improve the
performance of FUSE filesystems for I/O operations. Typically, FUSE operations
involve communication between the kernel and a userspace FUSE daemon, which can
incur overhead. Passthrough allows certain operations on a FUSE file to bypass
the userspace daemon and be executed directly by the kernel on an underlying
"backing file".

This is achieved by the FUSE daemon registering a file descriptor (pointing to
the backing file on a lower filesystem) with the FUSE kernel module. The kernel
then returns an identifier (``backing_id``) for this registered backing file to
the daemon. When a FUSE file is subsequently opened, the FUSE daemon can, in its
response to the ``OPEN`` request, include this ``backing_id`` and set the
``FOPEN_PASSTHROUGH`` flag. This establishes a direct link for specific
operations.

Currently, passthrough is supported for operations like ``read(2)``/``write(2)``
(via ``read_iter``/``write_iter``), ``splice(2)``, and ``mmap(2)``.

Enabling Passthrough
====================

To use FUSE passthrough:

  1. The kernel's FUSE filesystem support must be built with
     ``CONFIG_FUSE_PASSTHROUGH`` enabled.
  2. The FUSE daemon, during the ``FUSE_INIT`` handshake, must negotiate the
     ``FUSE_PASSTHROUGH`` capability and specify its desired
     ``max_stack_depth``.
  3. The (privileged) FUSE daemon uses the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl
     on its connection file descriptor (e.g., ``/dev/fuse``) to register a
     backing file descriptor and obtain a ``backing_id``.
  4. When handling an ``OPEN`` or ``CREATE`` request for a FUSE file, the daemon
     replies with the ``FOPEN_PASSTHROUGH`` flag set in
     ``fuse_open_out::open_flags`` and provides the corresponding ``backing_id``
     in ``fuse_open_out::backing_id``.
  5. The FUSE daemon should eventually call ``FUSE_DEV_IOC_BACKING_CLOSE`` with
     the ``backing_id`` to release the kernel's reference to the backing file
     once it is no longer needed for setting up passthrough opens.

Privilege Requirements
======================

Setting up passthrough functionality currently requires the FUSE daemon to
possess the ``CAP_SYS_ADMIN`` capability. This requirement stems from several
security and resource management considerations that are actively being
discussed and worked on. The primary reasons for this restriction are detailed
below.

Resource Accounting and Visibility
----------------------------------

The core mechanism for passthrough involves the FUSE daemon opening a file
descriptor to a backing file and registering it with the FUSE kernel module via
the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl. This ioctl returns a ``backing_id``
associated with a kernel-internal ``struct fuse_backing`` object, which holds a
reference to the backing ``struct file``.

A significant concern arises because the FUSE daemon can close its own file
descriptor to the backing file after registration. The kernel, however, will
still hold a reference to the ``struct file`` via the ``struct fuse_backing``
object as long as it's associated with a ``backing_id`` (or subsequently, with
an open FUSE file in passthrough mode).

This behavior leads to two main issues for unprivileged FUSE daemons:

  1. **Invisibility to lsof and other inspection tools**: Once the FUSE
     daemon closes its file descriptor, the open backing file held by the kernel
     becomes "hidden." Standard tools like ``lsof``, which typically inspect
     process file descriptor tables, would not be able to identify that this
     file is still held open by the kernel on behalf of the FUSE filesystem.
     This makes it difficult for system administrators to track resource usage
     or debug issues related to open files (e.g., open files that prevent an
     unmount).

  2. **Bypassing RLIMIT_NOFILE**: The FUSE daemon process is subject to
     resource limits, including the maximum number of open file descriptors
     (``RLIMIT_NOFILE``). If an unprivileged daemon could register backing files
     and then close its own FDs, it could potentially cause the kernel to hold
     an unlimited number of open ``struct file`` references without these being
     accounted against the daemon's ``RLIMIT_NOFILE``. This could lead to a
     denial-of-service (DoS) by exhausting system-wide file resources.

The ``CAP_SYS_ADMIN`` requirement acts as a safeguard against these issues,
restricting this powerful capability to trusted processes.

**NOTE**: ``io_uring`` solves a similar issue by exposing its "fixed files",
which are visible via ``fdinfo`` and accounted under the registering user's
``RLIMIT_NOFILE``.

Filesystem Stacking and Shutdown Loops
--------------------------------------

Another concern relates to the potential for creating complex and problematic
filesystem stacking scenarios if unprivileged users could set up passthrough.
A FUSE passthrough filesystem might use a backing file that resides:

  * On the *same* FUSE filesystem.
  * On another filesystem (like OverlayFS) which itself might have an upper or
    lower layer that is a FUSE filesystem.

These configurations could create dependency loops, particularly during
filesystem shutdown or unmount sequences, leading to deadlocks or system
instability. This is conceptually similar to the risks associated with the
``LOOP_SET_FD`` ioctl, which also requires ``CAP_SYS_ADMIN``.

To mitigate this, FUSE passthrough already incorporates checks based on
filesystem stacking depth (``sb->s_stack_depth`` and ``fc->max_stack_depth``).
For example, during the ``FUSE_INIT`` handshake, the FUSE daemon can negotiate
the ``max_stack_depth`` it supports. When a backing file is registered via
``FUSE_DEV_IOC_BACKING_OPEN``, the kernel checks if the backing file's
filesystem stack depth is within the allowed limit.

The ``CAP_SYS_ADMIN`` requirement provides an additional layer of security,
ensuring that only privileged users can create these potentially complex
stacking arrangements.

General Security Posture
------------------------

As a general principle for new kernel features that allow userspace to instruct
the kernel to perform direct operations on its behalf based on user-provided
file descriptors, starting with a higher privilege requirement (like
``CAP_SYS_ADMIN``) is a conservative and common security practice. This allows
the feature to be used and tested while further security implications are
evaluated and addressed.
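
As an illustration of steps 3 to 5 listed under "Enabling Passthrough", the
following is a minimal sketch of how a daemon might register a backing file,
fill in its ``OPEN`` reply, and later drop the registration. All
``sketch_``/``SKETCH_`` names are hypothetical local mirrors of definitions
assumed to match ``<linux/fuse.h>``; a real daemon would include that header
and use ``struct fuse_backing_map``, ``struct fuse_open_out`` and the
``FUSE_DEV_IOC_BACKING_*`` ioctls directly.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Local mirrors of uapi definitions, prefixed so they cannot clash with
 * <linux/fuse.h>. ASSUMPTION: layouts, ioctl numbers and the flag value
 * match the kernel header; they are reproduced here only to keep the
 * sketch self-contained.
 */
struct sketch_backing_map {
	int32_t  fd;       /* daemon's fd for the backing file */
	uint32_t flags;    /* left at zero in this sketch */
	uint64_t padding;  /* left at zero in this sketch */
};

struct sketch_open_out {
	uint64_t fh;
	uint32_t open_flags;
	int32_t  backing_id;
};

#define SKETCH_DEV_IOC_MAGIC 229
#define SKETCH_BACKING_OPEN \
	_IOW(SKETCH_DEV_IOC_MAGIC, 1, struct sketch_backing_map)
#define SKETCH_BACKING_CLOSE \
	_IOW(SKETCH_DEV_IOC_MAGIC, 2, uint32_t)
#define SKETCH_FOPEN_PASSTHROUGH (1u << 7)

/* Step 3: register backing_fd; returns the backing_id or -errno. */
static int sketch_backing_open(int fuse_dev_fd, int backing_fd)
{
	struct sketch_backing_map map;
	int id;

	memset(&map, 0, sizeof(map));
	map.fd = backing_fd;
	id = ioctl(fuse_dev_fd, SKETCH_BACKING_OPEN, &map);
	return id < 0 ? -errno : id;
}

/* Step 4: fill the OPEN/CREATE reply so the kernel uses passthrough. */
static void sketch_fill_open_reply(struct sketch_open_out *out,
				   uint64_t fh, int backing_id)
{
	memset(out, 0, sizeof(*out));
	out->fh = fh;
	out->open_flags = SKETCH_FOPEN_PASSTHROUGH;
	out->backing_id = backing_id;
}

/* Step 5: drop the kernel's registration; returns 0 or -errno. */
static int sketch_backing_close(int fuse_dev_fd, int backing_id)
{
	uint32_t id = (uint32_t)backing_id;

	if (ioctl(fuse_dev_fd, SKETCH_BACKING_CLOSE, &id) < 0)
		return -errno;
	return 0;
}
```

A daemon would typically call ``sketch_backing_open()`` while handling the
``OPEN`` request, pass the result to ``sketch_fill_open_reply()``, and call
``sketch_backing_close()`` once no further opens need that ``backing_id``.
Without ``CAP_SYS_ADMIN``, the registration step is expected to fail.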
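
The stacking-depth check described under "Filesystem Stacking and Shutdown
Loops" can be modelled in a few lines. This is a simplified userspace model,
not the kernel code: the function name is hypothetical, and both the exact
comparison (reject when the backing depth is greater than or equal to the
negotiated limit) and the ``ELOOP`` error code are assumptions about how the
kernel applies ``sb->s_stack_depth`` and ``fc->max_stack_depth``.

```c
#include <errno.h>

/*
 * Model of the registration-time check.
 * ASSUMPTION: a backing filesystem is rejected when its stack depth is
 * not strictly below the max_stack_depth negotiated at FUSE_INIT.
 */
static int sketch_check_stack_depth(int backing_s_stack_depth,
				    int max_stack_depth)
{
	if (backing_s_stack_depth >= max_stack_depth)
		return -ELOOP;
	return 0;
}
```

Under this model, a daemon that negotiated a ``max_stack_depth`` of 1 could
register backing files on a non-stacked filesystem (depth 0), but not on a
stacked one such as OverlayFS (depth 1 or more).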