==============================
General notification mechanism
==============================

The general notification mechanism is built on top of the standard pipe driver
whereby it effectively splices notification messages from the kernel into pipes
opened by userspace.  This can be used in conjunction with:

  * Key/keyring notifications


The notification buffers can be enabled by:

	"General setup"/"General notification queue"
	(CONFIG_WATCH_QUEUE)

This document has the following sections:

.. contents:: :local:


Overview
========

This facility appears as a pipe that is opened in a special mode.  The pipe's
internal ring buffer is used to hold messages that are generated by the kernel.
These messages are then read out by read().  Splice and similar operations are
disabled on such pipes because, under some circumstances, they may need to
revert their additions to the ring, and those additions might by then be
interleaved with notification messages.

The owner of the pipe has to tell the kernel which sources it would like to
watch through that pipe.  Only sources that have been connected to a pipe will
insert messages into it.  Note that a source may be bound to multiple pipes and
insert messages into all of them simultaneously.

Filters may also be emplaced on a pipe so that certain source types and
subevents can be ignored if they're not of interest.

A message will be discarded if there isn't a slot available in the ring or if
no preallocated message buffer is available.  In both of these cases, read()
will insert a WATCH_META_LOSS_NOTIFICATION message into the output buffer after
the last message currently in the buffer has been read.

Note that when producing a notification, the kernel does not wait for the
consumers to collect it, but rather just continues on.  This means that
notifications can be generated whilst spinlocks are held and also protects the
kernel from being held up indefinitely by a userspace malfunction.


Message Structure
=================

Notification messages begin with a short header::

	struct watch_notification {
		__u32	type:24;
		__u32	subtype:8;
		__u32	info;
	};

"type" indicates the source of the notification record and "subtype" indicates
the type of record from that source (see the Watch Sources section below).  The
type may also be "WATCH_TYPE_META".  This is a special record type generated
internally by the watch queue itself.  There are two subtypes:

  * WATCH_META_REMOVAL_NOTIFICATION
  * WATCH_META_LOSS_NOTIFICATION

The first indicates that an object on which a watch was installed was removed
or destroyed and the second indicates that some messages have been lost.

"info" indicates a bunch of things, including:

  * The length of the message in bytes, including the header (mask with
    WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT).  This indicates
    the size of the record, which may be between 8 and 127 bytes.

  * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT).
    This indicates the caller's ID of the watch, which may be between 0
    and 255.  Multiple watches may share a queue, and this provides a means to
    distinguish them.

  * A type-specific field (WATCH_INFO_TYPE_INFO).  This is set by the
    notification producer to indicate some meaning specific to the type and
    subtype.

Everything in info apart from the length can be used for filtering.

The header can be followed by supplementary information.  The format of this is
defined by the type and subtype.
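
As an illustration of the "info" layout above, a consumer might decode the
field with small helpers such as the following.  This is a sketch that assumes
the UAPI definitions from ``<linux/watch_queue.h>`` (including
WATCH_INFO_TYPE_INFO__SHIFT); the ``wn_*`` helper names are hypothetical and
not part of any kernel or library API::

	#include <stdbool.h>
	#include <stddef.h>
	#include <linux/watch_queue.h>

	/* Length of the whole record in bytes, header included. */
	static inline size_t wn_length(const struct watch_notification *n)
	{
		return (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
	}

	/* The 8-bit watch ID chosen by the subscriber. */
	static inline unsigned int wn_watch_id(const struct watch_notification *n)
	{
		return (n->info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;
	}

	/* The type-specific bits set by the notification producer. */
	static inline unsigned int wn_type_info(const struct watch_notification *n)
	{
		return (n->info & WATCH_INFO_TYPE_INFO) >> WATCH_INFO_TYPE_INFO__SHIFT;
	}

	/*
	 * A record can never be shorter than its own header; the mask already
	 * limits the length to 127 bytes.
	 */
	static inline bool wn_length_valid(const struct watch_notification *n)
	{
		return wn_length(n) >= sizeof(*n);
	}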


Watch List (Notification Source) API
====================================

A "watch list" is a list of watchers that are subscribed to a source of
notifications.  A list may be attached to an object (say a key or a superblock)
or may be global (say for device events).  From a userspace perspective, a
non-global watch list is typically referred to by reference to the object it
belongs to (such as using KEYCTL_WATCH_KEY and giving it a key serial number to
watch that specific key).

To manage a watch list, the following functions are provided:

  * ::

	void init_watch_list(struct watch_list *wlist,
			     void (*release_watch)(struct watch *watch));

    Initialise a watch list.  If ``release_watch`` is not NULL, then this
    indicates a function that should be called when the watch_list object is
    destroyed to discard any references the watch list holds on the watched
    object.

  * ``void remove_watch_list(struct watch_list *wlist);``

    This removes all of the watches subscribed to a watch_list and frees them
    and then destroys the watch_list object itself.


Watch Queue (Notification Output) API
=====================================

A "watch queue" is the buffer allocated by an application that notification
records will be written into.  The workings of this are hidden entirely inside
of the pipe device driver, but it is necessary to gain a reference to it to set
a watch.  These can be managed with:

  * ``struct watch_queue *get_watch_queue(int fd);``

    Since watch queues are indicated to the kernel by the fd of the pipe that
    implements the buffer, userspace must hand that fd through a system call.
    This can be used to look up an opaque pointer to the watch queue from the
    system call.

  * ``void put_watch_queue(struct watch_queue *wqueue);``

    This discards the reference obtained from ``get_watch_queue()``.


Watch Subscription API
======================

A "watch" is a subscription on a watch list, indicating the watch queue, and
thus the buffer, into which notification records should be written.  The watch
queue object may also carry filtering rules for that object, as set by
userspace.  Some parts of the watch struct can be set by the driver::

	struct watch {
		union {
			u32		info_id;	/* ID to be OR'd in to info field */
			...
		};
		void			*private;	/* Private data for the watched object */
		u64			id;		/* Internal identifier */
		...
	};

The ``info_id`` value should be an 8-bit number obtained from userspace and
shifted by WATCH_INFO_ID__SHIFT.  This is OR'd into the WATCH_INFO_ID field of
struct watch_notification::info when and if the notification is written into
the associated watch queue buffer.
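
The ``info_id`` handling can be modelled in plain C as below.  This is a
userspace sketch, not kernel code: ``make_info_id()`` and ``stamp_info()`` are
hypothetical names, and clearing the ID bits of the producer's header before
OR'ing in ``info_id`` is an assumption made so that each watch sees only its
own ID::

	#include <stdint.h>
	#include <linux/watch_queue.h>

	/*
	 * Hypothetical driver-side helper: validate the 8-bit watch ID supplied
	 * by userspace and shift it into place for watch->info_id.  Returns -1
	 * if the ID does not fit in the WATCH_INFO_ID field.
	 */
	static long make_info_id(int watch_id)
	{
		if (watch_id < 0 || watch_id > 0xff)
			return -1;
		return (long)((uint32_t)watch_id << WATCH_INFO_ID__SHIFT);
	}

	/*
	 * Model of the write into a queue (assumption): the ID bits of the
	 * producer's info are replaced by the watch's info_id, leaving the
	 * length and type-specific bits intact.
	 */
	static uint32_t stamp_info(uint32_t producer_info, uint32_t info_id)
	{
		return (producer_info & ~(uint32_t)WATCH_INFO_ID) | info_id;
	}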

The ``private`` field is the driver's data associated with the watch_list and
is cleaned up by the ``watch_list::release_watch()`` method.

The ``id`` field is the source's ID.  Notifications that are posted with a
different ID are ignored.

The following functions are provided to manage watches:

  * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);``

    Initialise a watch object, setting its pointer to the watch queue, using
    appropriate barriering to avoid lockdep complaints.

  * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);``

    Subscribe a watch to a watch list (notification source).  The
    driver-settable fields in the watch struct must have been set before this
    is called.

  * ::

	int remove_watch_from_object(struct watch_list *wlist,
				     struct watch_queue *wqueue,
				     u64 id, false);

    Remove a watch from a watch list, where the watch must match the specified
    watch queue (``wqueue``) and object identifier (``id``).  A notification
    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to
    indicate that the watch got removed.

  * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);``

    Remove all the watches from a watch list.  It is expected that this will be
    called in preparation for destruction and that the watch list will be
    inaccessible to new watches by this point.  A notification
    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each
    subscribed watch to indicate that the watch got removed.


Notification Posting API
========================

To post a notification to a watch list so that the subscribed watches can see
it, the following function should be used::

	void post_watch_notification(struct watch_list *wlist,
				     struct watch_notification *n,
				     const struct cred *cred,
				     u64 id);

The notification should be preformatted and a pointer to the header (``n``)
should be passed in.  The notification may be larger than this and its total
size in bytes, including the header, is noted in
``n->info & WATCH_INFO_LENGTH``.

The ``cred`` struct indicates the credentials of the source (subject) and is
passed to the LSMs, such as SELinux, to allow or suppress the recording of the
note in each individual queue according to the credentials of that queue
(object).

The ``id`` is the ID of the source object (such as the serial number on a key).
Only watches that have the same ID set in them will see this notification.
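
A producer typically lays out a record as the header followed by its payload,
with the length bits covering the whole record in bytes.  The sketch below
builds such a record in userspace using ``struct key_notification`` and
``NOTIFY_KEY_LINKED`` from ``<linux/watch_queue.h>``.
``format_key_notification()`` is a hypothetical helper.  The
``post_watch_notification()`` call appears only in a comment because it is
kernel-internal::

	#include <stdint.h>
	#include <linux/watch_queue.h>

	static struct key_notification format_key_notification(uint32_t key_serial,
								unsigned int subtype,
								uint32_t aux)
	{
		struct key_notification n = {
			.watch.type	= WATCH_TYPE_KEY_NOTIFY,
			.watch.subtype	= subtype,
			/* Length in bytes of header plus payload. */
			.watch.info	= sizeof(n) << WATCH_INFO_LENGTH__SHIFT,
			.key_id		= key_serial,
			.aux		= aux,
		};

		/*
		 * Kernel-side, the record would now be posted to the object's
		 * watch list, matching watches by the object's ID, e.g.:
		 *
		 *   post_watch_notification(wlist, &n.watch, current_cred(), n.key_id);
		 */
		return n;
	}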


Watch Sources
=============

Any particular buffer can be fed from multiple sources.  Sources include:

  * WATCH_TYPE_KEY_NOTIFY

    Notifications of this type indicate changes to keys and keyrings, including
    the changes of keyring contents or the attributes of keys.

    See Documentation/security/keys/core.rst for more information.


Event Filtering
===============

Once a watch queue has been created, a set of filters can be applied to limit
the events that are received using::

	struct watch_notification_filter filter = {
		...
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);

The filter description is a variable of type::

	struct watch_notification_filter {
		__u32	nr_filters;
		__u32	__reserved;
		struct watch_notification_type_filter filters[];
	};

Where "nr_filters" is the number of filters in filters[] and "__reserved"
should be 0.  The "filters" array has elements of the following type::

	struct watch_notification_type_filter {
		__u32	type;
		__u32	info_filter;
		__u32	info_mask;
		__u32	subtype_filter[8];
	};

Where:

  * ``type`` is the event type to filter for and should be something like
    "WATCH_TYPE_KEY_NOTIFY"

  * ``info_filter`` and ``info_mask`` act as a filter on the info field of the
    notification record.  The notification is only written into the buffer if::

	(notification.info & info_mask) == info_filter

    This could be used, for example, to ignore events that are not exactly on
    the watched point in a mount tree.

  * ``subtype_filter`` is a bitmask indicating the subtypes that are of
    interest.  Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to
    subtype 1, and so on.

If the argument to the ioctl() is NULL, then the filters will be removed and
all events from the watched sources will come through.
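
Building and installing a filter might look like the following sketch.  It
assumes the UAPI definitions in ``<linux/watch_queue.h>`` and that ``fd``
refers to a pipe opened as a watch queue.  ``make_key_filter()``,
``set_key_filter()`` and ``clear_filters()`` are hypothetical helpers::

	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <linux/watch_queue.h>

	/* Build a one-entry filter passing only the key subtypes set in subtypes. */
	static struct watch_notification_filter *make_key_filter(__u32 subtypes)
	{
		struct watch_notification_filter *filter;

		/* filters[] is a flexible array member: allocate room for one entry. */
		filter = calloc(1, sizeof(*filter) + sizeof(filter->filters[0]));
		if (!filter)
			return NULL;

		filter->nr_filters = 1;			/* __reserved is left 0 by calloc() */
		filter->filters[0].type = WATCH_TYPE_KEY_NOTIFY;
		filter->filters[0].info_filter = 0;	/* mask 0: info is not examined */
		filter->filters[0].info_mask = 0;
		filter->filters[0].subtype_filter[0] = subtypes;	/* subtypes 0-31 */
		return filter;
	}

	static int set_key_filter(int fd, __u32 subtypes)
	{
		struct watch_notification_filter *filter = make_key_filter(subtypes);
		int ret;

		if (!filter)
			return -1;
		ret = ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, filter);
		free(filter);	/* assumed safe: the kernel copies the filter */
		return ret;
	}

	/* A NULL argument removes the filters, letting all events through. */
	static int clear_filters(int fd)
	{
		return ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, NULL);
	}

For example, ``set_key_filter(fd, 1u << NOTIFY_KEY_LINKED)`` would ask for
only keyring-link events.  Freeing the buffer straight after the ioctl() relies
on the assumption that the kernel takes a copy rather than keeping a pointer.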


Userspace Code Example
======================

A buffer is created with something like the following, where
O_NOTIFICATION_PIPE and IOC_WATCH_QUEUE_SET_SIZE come from
``<linux/watch_queue.h>``::

	int fds[2];

	pipe2(fds, O_NOTIFICATION_PIPE);
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, 256);

It can then be set to receive keyring change notifications, here using watch
ID 0x01::

	keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);

The notifications can then be consumed by something like the following::

	#include <string.h>
	#include <unistd.h>
	#include <linux/watch_queue.h>

	/* Application-defined handlers (not shown). */
	void got_meta(const struct watch_notification *n);
	void saw_key_change(const struct watch_notification *n);

	static void consumer(int rfd)
	{
		unsigned char buffer[128];
		ssize_t buf_len;

		while ((buf_len = read(rfd, buffer, sizeof(buffer))) > 0) {
			unsigned char *p = buffer;
			unsigned char *end = buffer + buf_len;

			while (p < end) {
				/* Copy each record out so the header is suitably aligned. */
				union {
					struct watch_notification n;
					unsigned char buf1[128];
				} n;
				size_t largest, len;

				largest = end - p;
				if (largest > sizeof(n))
					largest = sizeof(n);
				if (largest < sizeof(n.n))
					return;		/* truncated header */
				memcpy(&n, p, largest);

				len = (n.n.info & WATCH_INFO_LENGTH) >>
					WATCH_INFO_LENGTH__SHIFT;
				/* Reject records shorter than a header or overrunning the data. */
				if (len < sizeof(n.n) || len > largest)
					return;

				switch (n.n.type) {
				case WATCH_TYPE_META:
					got_meta(&n.n);
					break;
				case WATCH_TYPE_KEY_NOTIFY:
					saw_key_change(&n.n);
					break;
				}

				p += len;
			}
		}
	}
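
The ``got_meta()`` handler in the example is left to the application.  One
possible implementation is sketched below.  The ``classify_meta()`` helper and
``enum meta_event`` are hypothetical.  The sketch relies on
``struct watch_notification_removal`` from ``<linux/watch_queue.h>`` and reads
its trailing object ID only when the record length says one is present::

	#include <stdio.h>
	#include <string.h>
	#include <linux/watch_queue.h>

	enum meta_event { META_OTHER, META_REMOVED, META_LOST };

	/*
	 * Classify a WATCH_TYPE_META record.  For a removal, *obj_id receives
	 * the object ID if the record is long enough to carry one, else 0.
	 */
	static enum meta_event classify_meta(const struct watch_notification *n,
					     unsigned long long *obj_id)
	{
		size_t len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;

		*obj_id = 0;
		if (n->type != WATCH_TYPE_META)
			return META_OTHER;

		switch (n->subtype) {
		case WATCH_META_REMOVAL_NOTIFICATION:
			if (len >= sizeof(struct watch_notification_removal)) {
				struct watch_notification_removal r;

				/* memcpy() avoids assuming the record is 8-byte aligned. */
				memcpy(&r, n, sizeof(r));
				*obj_id = r.id;
			}
			return META_REMOVED;
		case WATCH_META_LOSS_NOTIFICATION:
			return META_LOST;
		default:
			return META_OTHER;
		}
	}

	void got_meta(const struct watch_notification *n)
	{
		unsigned long long id;

		switch (classify_meta(n, &id)) {
		case META_REMOVED:
			printf("watched object removed (id %llu)\n", id);
			break;
		case META_LOST:
			/* Messages were dropped: any cached state may be stale. */
			printf("notifications lost\n");
			break;
		default:
			break;
		}
	}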