History log of /linux/fs/netfs/read_retry.c (Results 51 – 75 of 95)
Revision  Date  Author  Comments
Revision tags: v6.14-rc1
# 220ed690 30-Jan-2025 Lucas De Marchi <lucas.demarchi@intel.com>

Merge drm/drm-next into drm-xe-next

Backmerge drm-next to get the common APIs and refactors as well as
getting the display changes from i915 in xe so the probe order can be
improved.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>



# 37ba6c7f 23-Jan-2025 Maarten Lankhorst <dev@lankhorst.se>

Merge remote-tracking branch 'drm/drm-next' into drm-misc-next-fixes

A regression was caused by commit e4b5ccd392b9 ("drm/v3d: Ensure job pointer is set to NULL
after job completion"), but this commit is not yet in next-fixes,
fast-forward it.

Try #2, first one didn't have v6.13 in it.

Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>



# 07c5b277 23-Jan-2025 Simona Vetter <simona.vetter@ffwll.ch>

Merge v6.13 into drm-next

A regression was caused by commit e4b5ccd392b9 ("drm/v3d: Ensure job
pointer is set to NULL after job completion"), but this commit is not
yet in next-fixes, fast-forward it.

Note that this recreates Linus merge in 96c84703f1cf ("Merge tag
'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel")
because I didn't want to backmerge a random point in the merge window.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>



# 25768de5 21-Jan-2025 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.14 merge window.


# ca56a74a 20-Jan-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs netfs updates from Christian Brauner:
"This contains read performance improvements and support for monolithic
single-blob objects that have to be read/written as such (e.g. AFS
directory contents). The implementation of the two parts is interwoven
as each makes the other possible.

- Read performance improvements

The read performance improvements are intended to speed up some
loss of performance detected in cifs and to a lesser extent in afs.

The problem is that we queue too many work items during the
collection of read results: each individual subrequest is collected
by its own work item, and then they have to interact with each
other when a series of subrequests don't exactly align with the
pattern of folios that are being read by the overall request.

Whilst the processing of the pages covered by individual
subrequests as they complete potentially allows folios to be woken
in parallel and with minimum delay, it can shuffle wakeups for
sequential reads out of order - and that is the most common I/O
pattern.

The final assessment and cleanup of an operation is then held up
until the last I/O completes - and for a synchronous sequential
operation, this means the bouncing around of work items just adds
latency.

Two changes have been made to make this work:

(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and
also dispatches retries as necessary).

(2) For readahead and AIO, this work item may be done on a workqueue
and can run in parallel with the ultimate consumer of the data;
for synchronous direct or unbuffered reads, the collection is
run in the application thread and not offloaded.

Functions such as smb2_readv_callback() then just tell netfslib
that the subrequest has terminated; netfslib does a minimal bit of
processing on the spot - stat counting and tracing mostly - and
then queues/wakes up the worker. This simplifies the logic as the
collector just walks sequentially through the subrequests as they
complete and walks through the folios, if buffered, unlocking them
as it goes. It also keeps to a minimum the amount of latency
injected into the filesystem's low-level I/O handling.

The way netfs supports filesystems using the deprecated
PG_private_2 flag is changed: folios are flagged and added to a
write request as they complete and that takes care of scheduling
the writes to the cache. The originating read request can then just
unlock the pages whatever happens.

- Single-blob object support

Single-blob objects are files for which the content of the file
must be read from or written to the server in a single operation
because reading them in parts may yield inconsistent results. AFS
directories are an example of this as there exists the possibility
that the contents are generated on the fly and would differ between
reads or might change due to third party interference.

Such objects will be written to and retrieved from the cache if one
is present, though we allow/may need to propose multiple
subrequests to do so. The important part is that read from/write to
the *server* is monolithic.

Single blob reading is, for the moment, fully synchronous and does
result collection in the application thread and, also for the
moment, the API is supplied the buffer in the form of a folio_queue
chain rather than using the pagecache.

- Related afs changes

This series makes a number of changes to the kafs filesystem,
primarily in the area of directory handling:

- AFS's FetchData RPC reply processing is made partially
asynchronous which allows the netfs_io_request's outstanding
operation counter to be removed as part of reducing the
collection to a single work item.

- Directory and symlink reading are plumbed through netfslib using
the single-blob object API and are now cacheable with fscache.
This also allows the afs_read struct to be eliminated and
netfs_io_subrequest to be used directly instead.

- Directory and symlink content are now stored in a folio_queue
buffer rather than in the pagecache. This means we don't require
the RCU read lock and xarray iteration to access it, and folios
won't randomly disappear under us because the VM wants them
back.

- The vnode operation lock is changed from a mutex struct to a
private lock implementation. The problem is that the lock now
needs to be dropped in a separate thread and mutexes don't
permit that.

- When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know
what it's likely to look like).

- We now use the in-directory hashtable to reduce the number of
entries we need to scan when doing a lookup. The edit routines
have to maintain the hash chains.

- Cancellation (e.g. by signal) of an async call after the
rxrpc_call has been set up is now offloaded to the worker thread
as there will be a notification from rxrpc upon completion. This
avoids a double cleanup.

- A "rolling buffer" implementation is created to abstract out the
two separate folio_queue chaining implementations I had (one for
read and one for write).

- Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again.

This is used to handle AFS directories, but could also be used to
create bounce buffers for content crypto and transport crypto.

- The was_async argument is dropped from netfs_read_subreq_terminated()

Instead we wake the read collection work item by either queuing it
or waking up the app thread.

- We don't need to use BH-excluding locks when communicating between
the issuing thread and the collection thread as neither of them now
run in BH context.

- Also included are a number of new tracepoints; a split of the
netfslib write collection code to put retrying into its own file
(it gets more complicated with content encryption).

- There are also some minor AFS fixes included, including fixing the
AFS directory format struct layout, reducing some directory
over-invalidation and making afs_rmdir() translate EEXIST to
ENOTEMPTY (which is not available on all systems the servers
support).

- Finally, there's a patch to try and detect entry into the folio
unlock function with no folio_queue structs in the buffer (which
isn't allowed in the cases that can get there).

This is a debugging patch, but should be minimal overhead"

* tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...



Revision tags: v6.13
# 2ee738e9 16-Jan-2025 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.13-rc8).

Conflicts:

drivers/net/ethernet/realtek/r8169_main.c
1f691a1fc4be ("r8169: remove redundant hwmon support")
152d00a91396 ("r8169: simplify setting hwmon attribute visibility")
https://lore.kernel.org/20250115122152.760b4e8d@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
152f4da05aee ("bnxt_en: add support for rx-copybreak ethtool command")
f0aa6a37a3db ("eth: bnxt: always recalculate features after XDP clearing, fix null-deref")

drivers/net/ethernet/intel/ice/ice_type.h
50327223a8bb ("ice: add lock to protect low latency interface")
dc26548d729e ("ice: Fix quad registers read on E825")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>



# 5cf32aff 15-Jan-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'loongarch-kvm-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD

LoongArch KVM changes for v6.14

1. Clear LLBCTL if secondary mmu mapping changed.
2. Add hypercall service support for usermode VMM.

This is a really small changeset, because the Chinese New Year
(Spring Festival) is coming. Happy New Year!



# 568bfce0 13-Jan-2025 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.13-rc7 into tty-next

We need the serial driver fixes in here to build on top of.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b37333c8 13-Jan-2025 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.13-rc7 into staging next

We need the gpib changes in here as well to build on top of.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# dd19f411 13-Jan-2025 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.13-rc7 into driver-core-next

We need the debugfs / driver-core fixes in here as well for testing and
to build on top of.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# be887fca 13-Jan-2025 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.13-rc4 into char-misc-next

We need the IIO fixes in here as well, and it resolves a merge conflict
in:
drivers/iio/adc/ti-ads1119.c

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>



# 2919c4a3 13-Jan-2025 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.13-rc7 into usb-next

We need the USB fixes in here as well for testing.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


Revision tags: v6.13-rc7
# 7110f24f 10-Jan-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.13-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
"afs:

- Fix the maximum cell name length

- Fix merge preference rule failure condition

fuse:

- Fix fuse_get_user_pages() so it doesn't risk misleading the caller
to think pages have been allocated when they actually haven't

- Fix direct-io folio offset and length calculation

netfs:

- Fix async direct-io handling

- Fix read-retry for filesystems that don't provide a
->prepare_read() method

vfs:

- Prevent truncating 64-bit offsets to 32-bits in iomap

- Fix memory barrier interactions when polling

- Remove MNT_ONRB to fix concurrent modification of @mnt->mnt_flags
leading to MNT_ONRB not being raised and invalid access to a list
member"

* tag 'vfs-6.13-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
poll: kill poll_does_not_wait()
sock_poll_wait: kill the no longer necessary barrier after poll_wait()
io_uring_poll: kill the no longer necessary barrier after poll_wait()
poll_wait: kill the obsolete wait_address check
poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll()
afs: Fix merge preference rule failure condition
netfs: Fix read-retry for fs with no ->prepare_read()
netfs: Fix kernel async DIO
fs: kill MNT_ONRB
iomap: avoid avoid truncating 64-bit offset to 32 bits
afs: Fix the maximum cell name length
fuse: Set *nbytesp=0 in fuse_get_user_pages on allocation failure
fuse: fix direct io folio offset and length calculation



# 14ea4cd1 10-Jan-2025 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.13-rc7).

Conflicts:
a42d71e322a8 ("net_sched: sch_cake: Add drop reasons")
737d4d91d35b ("sched: sch_cake: add bounds checks to host bulk flow fairness counts")

Adjacent changes:

drivers/net/ethernet/meta/fbnic/fbnic.h
3a856ab34726 ("eth: fbnic: add IRQ reuse support")
95978931d55f ("eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>



# 904abff4 07-Jan-2025 David Howells <dhowells@redhat.com>

netfs: Fix read-retry for fs with no ->prepare_read()

Fix netfslib's read-retry to only call ->prepare_read() in the backing
filesystem if such a function is provided. We can get to this point if
there's an active cache, as failed reads from the cache need negotiating
with the server instead.

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/529329.1736261010@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
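
The fix above amounts to guarding a call through an optional method pointer
in the filesystem's ops table. As a minimal userspace sketch of that pattern
(hypothetical names and structures, not the actual netfslib code):

    /* Sketch only: models guarding an optional ->prepare_read() hook, as
     * described in the commit above.  Names are hypothetical. */
    #include <stdio.h>
    #include <stddef.h>

    struct subreq {
        size_t len;                     /* amount to read in this subrequest */
    };

    struct fs_ops {
        int (*prepare_read)(struct subreq *sr);  /* optional; may be NULL */
    };

    static int retry_subreq(const struct fs_ops *ops, struct subreq *sr)
    {
        /* Only negotiate with the filesystem if it provides the hook;
         * otherwise reuse the subrequest parameters as they stand. */
        if (ops->prepare_read) {
            int ret = ops->prepare_read(sr);
            if (ret < 0)
                return ret;
        }
        printf("retrying read of %zu bytes\n", sr->len);
        return 0;
    }

    static int clamp_to_1k(struct subreq *sr)
    {
        if (sr->len > 1024)
            sr->len = 1024;             /* say, a server-imposed limit */
        return 0;
    }

    int main(void)
    {
        struct fs_ops with_hook = { .prepare_read = clamp_to_1k };
        struct fs_ops without_hook = { 0 };   /* e.g. reads coming from a cache */
        struct subreq a = { .len = 4096 }, b = { .len = 4096 };

        retry_subreq(&with_hook, &a);         /* hook trims the request */
        retry_subreq(&without_hook, &b);      /* unguarded, this would crash */
        return 0;
    }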



# fbfd64d2 06-Jan-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.13-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:

- Relax assertions on failure to encode file handles

The ->encode_fh() method can fail for various reasons. None of them
warrant a WARN_ON().

- Fix overlayfs file handle encoding by allowing encoding an fid from
an inode without an alias

- Make sure fuse_dir_open() handles FOPEN_KEEP_CACHE. If it's not
specified, fuse needs to invalidate the directory inode page cache

- Fix qnx6 so it builds with gcc-15

- Various fixes for netfslib and ceph and nfs filesystems:
- Ignore silly rename files from afs and nfs when building header
archives
- Fix read result collection in netfslib with multiple subrequests
- Handle ENOMEM for netfslib buffered reads
- Fix oops in nfs_netfs_init_request()
- Parse the secctx command immediately in cachefiles
- Remove a redundant smp_rmb() in netfslib
- Handle recursion in read retry in netfslib
- Fix clearing of folio_queue
- Fix missing cancellation of copy-to-cache when the cache for a
file is temporarily disabled in netfslib

- Sanity check the hfs root record

- Fix zero padding data issues in concurrent write scenarios

- Fix is_mnt_ns_file() after converting nsfs to path_from_stashed()

- Fix missing declaration of init_files

- Increase I/O priority when writing revoke records in jbd2

- Flush filesystem device before updating tail sequence in jbd2

* tag 'vfs-6.13-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
ovl: support encoding fid from inode with no alias
ovl: pass realinode to ovl_encode_real_fh() instead of realdentry
fuse: respect FOPEN_KEEP_CACHE on opendir
netfs: Fix is-caching check in read-retry
netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled
netfs: Fix ceph copy to cache on write-begin
netfs: Work around recursion by abandoning retry if nothing read
netfs: Fix missing barriers by using clear_and_wake_up_bit()
netfs: Remove redundant use of smp_rmb()
cachefiles: Parse the "secctx" immediately
nfs: Fix oops in nfs_netfs_init_request() when copying to cache
netfs: Fix enomem handling in buffered reads
netfs: Fix non-contiguous donation between completed reads
kheaders: Ignore silly-rename files
fs: relax assertions on failure to encode file handles
fs: fix missing declaration of init_files
fs: fix is_mnt_ns_file()
iomap: fix zero padding data issue in concurrent append writes
iomap: pass byte granular end position to iomap_add_to_ioend
jbd2: flush filesystem device before updating tail sequence
...



Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4
# 7a47db23 20-Dec-2024 Christian Brauner <brauner@kernel.org>

Merge patch series "netfs: Read performance improvements and "single-blob" support"

David Howells <dhowells@redhat.com> says:

This set of patches is primarily about two things: improving read
perfo

Merge patch series "netfs: Read performance improvements and "single-blob" support"

David Howells <dhowells@redhat.com> says:

This set of patches is primarily about two things: improving read
performance and supporting monolithic single-blob objects that have to be
read/written as such (e.g. AFS directory contents). The implementation of
the two parts is interwoven as each makes the other possible.

READ PERFORMANCE
================

The read performance improvements are intended to speed up some loss of
performance detected in cifs and to a lesser extent in afs. The problem is
that we queue too many work items during the collection of read results:
each individual subrequest is collected by its own work item, and then they
have to interact with each other when a series of subrequests don't exactly
align with the pattern of folios that are being read by the overall
request.

Whilst the processing of the pages covered by individual subrequests as
they complete potentially allows folios to be woken in parallel and with
minimum delay, it can shuffle wakeups for sequential reads out of order -
and that is the most common I/O pattern.

The final assessment and cleanup of an operation is then held up until the
last I/O completes - and for a synchronous sequential operation, this means
the bouncing around of work items just adds latency.

Two changes have been made to make this work:

(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and also
dispatches retries as necessary).

(2) For readahead and AIO, this work item may be done on a workqueue and can
run in parallel with the ultimate consumer of the data; for
synchronous direct or unbuffered reads, the collection is run in the
application thread and not offloaded.

Functions such as smb2_readv_callback() then just tell netfslib that the
subrequest has terminated; netfslib does a minimal bit of processing on the
spot - stat counting and tracing mostly - and then queues/wakes up the
worker. This simplifies the logic as the collector just walks sequentially
through the subrequests as they complete and walks through the folios, if
buffered, unlocking them as it goes. It also keeps to a minimum the amount
of latency injected into the filesystem's low-level I/O handling.

The way netfs supports filesystems using the deprecated PG_private_2 flag
is changed: folios are flagged and added to a write request as they
complete and that takes care of scheduling the writes to the cache. The
originating read request can then just unlock the pages whatever happens.

SINGLE-BLOB OBJECT SUPPORT
==========================

Single-blob objects are files for which the content of the file must be
read from or written to the server in a single operation because reading
them in parts may yield inconsistent results. AFS directories are an
example of this as there exists the possibility that the contents are
generated on the fly and would differ between reads or might change due to
third party interference.

Such objects will be written to and retrieved from the cache if one is
present, though we allow/may need to propose multiple subrequests to do so.
The important part is that read from/write to the *server* is monolithic.

Single blob reading is, for the moment, fully synchronous and does result
collection in the application thread and, also for the moment, the API is
supplied the buffer in the form of a folio_queue chain rather than using
the pagecache.

AFS CHANGES
===========

This series makes a number of changes to the kafs filesystem, primarily in
the area of directory handling:

(1) AFS's FetchData RPC reply processing is made partially asynchronous
which allows the netfs_io_request's outstanding operation counter to
be removed as part of reducing the collection to a single work item.

(2) Directory and symlink reading are plumbed through netfslib using the
single-blob object API and are now cacheable with fscache. This also
allows the afs_read struct to be eliminated and netfs_io_subrequest to
be used directly instead.

(3) Directory and symlink content are now stored in a folio_queue buffer
rather than in the pagecache. This means we don't require the RCU
read lock and xarray iteration to access it, and folios won't randomly
disappear under us because the VM wants them back.

There are some downsides to this, though: the storage folios are no
longer known to the VM, drop_caches can't flush them, the folios are
not migrateable. The inode must also be marked dirty manually to get
the data written to the cache in the background.

(4) The vnode operation lock is changed from a mutex struct to a private
lock implementation. The problem is that the lock now needs to be
dropped in a separate thread and mutexes don't permit that.

(5) When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know what
it's likely to look like).

(6) We now use the in-directory hashtable to reduce the number of entries
we need to scan when doing a lookup. The edit routines have to
maintain the hash chains.

(7) Cancellation (e.g. by signal) of an async call after the rxrpc_call
has been set up is now offloaded to the worker thread as there will be
a notification from rxrpc upon completion. This avoids a double
cleanup.

SUPPORTING CHANGES
==================

To support the above some other changes are also made:

(1) A "rolling buffer" implementation is created to abstract out the two
separate folio_queue chaining implementations I had (one for read and
one for write).

(2) Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again. This is used to handle AFS directories,
but could also be used to create bounce buffers for content crypto and
transport crypto.

(3) The was_async argument is dropped from netfs_read_subreq_terminated().
Instead we wake the read collection work item by either queuing it or
waking up the app thread.

(4) We don't need to use BH-excluding locks when communicating between the
issuing thread and the collection thread as neither of them now run in
BH context.

MISCELLANY
==========

Also included are a number of new tracepoints; a split of the netfslib
write collection code to put retrying into its own file (it gets more
complicated with content encryption).

There are also some minor AFS fixes included, including fixing the AFS
directory format struct layout, reducing some directory over-invalidation
and making afs_rmdir() translate EEXIST to ENOTEMPTY (which is not available
on all systems the servers support).

Finally, there's a patch to try and detect entry into the folio unlock
function with no folio_queue structs in the buffer (which isn't allowed in
the cases that can get there). This is a debugging patch, but should be
minimal overhead.

* patches from https://lore.kernel.org/r/20241216204124.3752367-1-dhowells@redhat.com: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...

Link: https://lore.kernel.org/r/20241216204124.3752367-1-dhowells@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>



# e2d46f2e 16-Dec-2024 David Howells <dhowells@redhat.com>

netfs: Change the read result collector to only use one work item

Change the way netfslib collects read results to do all the collection for
a particular read request using a single work item that walks along the
subrequest queue as subrequests make progress or complete, unlocking folios
progressively rather than doing the unlock in parallel as parallel requests
come in.

The code is remodelled to be more like the write-side code, though only
using a single stream. This makes it more directly comparable and thus
easier to duplicate fixes between the two sides.

This has a number of advantages:

(1) It's simpler. There doesn't need to be a complex donation mechanism
to handle mismatches between the size and alignment of subrequests and
folios. The collector unlocks folios as the subrequests covering each
complete.

(2) It should cause less scheduler overhead as there's a single work item
in play unlocking pages in parallel when a read gets split up into a
lot of subrequests instead of one per subrequest.

Whilst the parallelism is nice in theory, in practice, the vast
majority of loads are sequential reads of the whole file, so
committing a bunch of threads to unlocking folios out of order doesn't
help in those cases.

(3) It should make it easier to implement content decryption. A folio
cannot be decrypted until all the requests that contribute to it have
completed - and, again, most loads are sequential and so, most of the
time, we want to begin decryption sequentially (though it's great if
the decryption can happen in parallel).

There is a disadvantage in that we're losing the ability to decrypt and
unlock things on an as-things-arrive basis which may affect some
applications.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-28-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
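
As a rough model of the single-collector design described above, the sketch
below (plain userspace C, hypothetical names, not the netfslib
implementation) has one collection routine walk an ordered queue of
subrequests and release results strictly in sequence, however out of order
the completions arrive:

    /* Sketch: one collector walks the subrequest queue in order instead of
     * one work item per subrequest racing to unlock folios. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NR_SUBREQS 5

    struct subreq {
        int index;
        bool done;              /* set when the I/O for this subrequest ends */
    };

    static struct subreq queue[NR_SUBREQS];
    static int collected;       /* how far the single collector has progressed */

    /* Run whenever some subrequest terminates (in any order). */
    static void collector_work(void)
    {
        /* Walk forward only; stop at the first subrequest still in flight. */
        while (collected < NR_SUBREQS && queue[collected].done) {
            printf("unlocking folios covered by subrequest %d\n",
                   queue[collected].index);
            collected++;
        }
        if (collected == NR_SUBREQS)
            printf("request complete, final cleanup\n");
    }

    static void subreq_terminated(int i)
    {
        queue[i].done = true;
        collector_work();       /* netfslib would queue/wake a single worker */
    }

    int main(void)
    {
        for (int i = 0; i < NR_SUBREQS; i++)
            queue[i].index = i;

        /* Completions arrive out of order; the output stays sequential. */
        subreq_terminated(2);
        subreq_terminated(0);
        subreq_terminated(1);
        subreq_terminated(4);
        subreq_terminated(3);
        return 0;
    }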



# 627cf645 16-Dec-2024 David Howells <dhowells@redhat.com>

netfs: Don't use bh spinlock

All the accessing of the subrequest lists is now done in process context,
possibly in a workqueue, but not now in a BH context, so we don't need the
lock against BH interference when taking the netfs_io_request::lock
spinlock.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-11-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Christian Brauner <brauner@kernel.org>



# 31fc366a 16-Dec-2024 David Howells <dhowells@redhat.com>

netfs: Drop the was_async arg from netfs_read_subreq_terminated()

Drop the was_async argument from netfs_read_subreq_terminated(). Almost
every caller is in process context and passes false. Some
filesystems delegate the call to a workqueue to avoid doing the work in
their network message queue parsing thread.

The only exception is netfs_cache_read_terminated() which handles
completion in the cache - which is usually a callback from the backing
filesystem in softirq context, though it can be from process context if an
error occurred. In this case, delegate to a workqueue.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wiVC5Cgyz6QKXFu6fTaA6h4CjexDR-OV9kL6Vo5x9v8=A@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-10-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>



# 06fa229c 16-Dec-2024 David Howells <dhowells@redhat.com>

netfs: Abstract out a rolling folio buffer implementation

A rolling buffer is a series of folios held in a list of folio_queues. New
folios and folio_queue structs may be inserted at the head simultaneously
with spent ones being removed from the tail without the need for locking.

The rolling buffer includes an iov_iter and it has to be careful managing
this as the list of folio_queues is extended such that an oops isn't
incurred because the iterator was pointing to the end of a folio_queue
segment that got appended to and then removed.

We need to use the mechanism twice, once for read and once for write, and,
in future patches, we will use a second rolling buffer to handle bounce
buffering for content encryption.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-6-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
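
One way to picture such a rolling buffer is as a chain of fixed-size
segments where the producer appends at the head while the consumer retires
spent segments from the tail. The sketch below is a single-threaded
userspace toy with hypothetical names; the kernel version additionally
arranges that one producer and one consumer need no locking:

    /* Sketch of a rolling buffer: a chain of fixed-size segments, appended
     * to at the head, with fully consumed segments dropped from the tail. */
    #include <stdio.h>
    #include <stdlib.h>

    #define SEG_SLOTS 4         /* stand-in for the folios in one folio_queue */

    struct segment {
        struct segment *next;
        int used;               /* slots filled by the producer */
        int consumed;           /* slots already consumed at the tail */
        int slots[SEG_SLOTS];
    };

    struct rolling_buffer {
        struct segment *tail;   /* oldest segment, consumed first */
        struct segment *head;   /* newest segment, appended to */
    };

    static void rb_append(struct rolling_buffer *rb, int value)
    {
        if (!rb->head || rb->head->used == SEG_SLOTS) {
            struct segment *seg = calloc(1, sizeof(*seg));

            if (rb->head)
                rb->head->next = seg;
            else
                rb->tail = seg;
            rb->head = seg;
        }
        rb->head->slots[rb->head->used++] = value;
    }

    static int rb_consume(struct rolling_buffer *rb, int *value)
    {
        struct segment *seg = rb->tail;

        if (!seg || seg->consumed == seg->used)
            return 0;           /* nothing available yet */
        *value = seg->slots[seg->consumed++];
        if (seg->consumed == SEG_SLOTS) {   /* segment fully spent: drop it */
            rb->tail = seg->next;
            if (rb->head == seg)
                rb->head = NULL;
            free(seg);
        }
        return 1;
    }

    int main(void)
    {
        struct rolling_buffer rb = { 0 };
        int v;

        for (int i = 0; i < 10; i++)
            rb_append(&rb, i);
        while (rb_consume(&rb, &v))
            printf("consumed %d\n", v);
        return 0;
    }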



# 5fe85a5c 20-Dec-2024 Christian Brauner <brauner@kernel.org>

Merge patch series "netfs, ceph, nfs, cachefiles: Miscellaneous fixes/changes"

David Howells <dhowells@redhat.com> says:

Here are some miscellaneous fixes and changes for netfslib and the ceph and

Merge patch series "netfs, ceph, nfs, cachefiles: Miscellaneous fixes/changes"

David Howells <dhowells@redhat.com> says:

Here are some miscellaneous fixes and changes for netfslib and the ceph and
nfs filesystems:

(1) Ignore silly-rename files from afs and nfs when building the header
archive in a kernel build.

(2) netfs: Fix the way read result collection applies results to folios
when each folio is being read by multiple subrequests and the results
come out of order.

(3) netfs: Fix ENOMEM handling in buffered reads.

(4) nfs: Fix an oops in nfs_netfs_init_request() when copying to the cache.

(5) cachefiles: Parse the "secctx" command immediately to get the correct
error rather than leaving it to the "bind" command.

(6) netfs: Remove a redundant smp_rmb(). This isn't a bug per se and
could be deferred.

(7) netfs: Fix missing barriers by using clear_and_wake_up_bit().

(8) netfs: Work around recursion in read retry by failing and abandoning
the retried subrequest if no I/O is performed.

[!] NOTE: This only works around the recursion problem if the
recursion keeps returning no data. If the server manages, say, to
repeatedly return a single byte of data faster than the retry
algorithm can complete, it will still recurse and the stack
overrun may still occur. Actually fixing this requires quite an
intrusive change which will hopefully make the next merge window.

(9) netfs: Fix the clearance of a folio_queue when unlocking the page if
we're going to want to subsequently send the queue for copying to the
cache (if, for example, we're using ceph).

(10) netfs: Fix the lack of cancellation of copy-to-cache when the cache
for a file is temporarily disabled (for example when a DIO write is
done to the file). This patch and (9) fix hangs with ceph.

With these patches, I can run xfstest -g quick to completion on ceph with a
local cache.

The patches can also be found here with a bonus cifs patch:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-fixes

* patches from https://lore.kernel.org/r/20241213135013.2964079-1-dhowells@redhat.com:
netfs: Fix is-caching check in read-retry
netfs: Fix the (non-)cancellation of copy when cache is temporarily disabled
netfs: Fix ceph copy to cache on write-begin
netfs: Work around recursion by abandoning retry if nothing read
netfs: Fix missing barriers by using clear_and_wake_up_bit()
netfs: Remove redundant use of smp_rmb()
cachefiles: Parse the "secctx" immediately
nfs: Fix oops in nfs_netfs_init_request() when copying to cache
netfs: Fix enomem handling in buffered reads
netfs: Fix non-contiguous donation between completed reads
kheaders: Ignore silly-rename files

Link: https://lore.kernel.org/r/20241213135013.2964079-1-dhowells@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>



# d4e338de 16-Dec-2024 David Howells <dhowells@redhat.com>

netfs: Fix is-caching check in read-retry

The read-retry code checks the NETFS_RREQ_COPY_TO_CACHE flag to determine
if there might be failed reads from the cache that need turning into reads
from the server, with the intention of skipping the complicated part if it
can. The code that set the flag, however, got lost during the read-side
rewrite.

Fix the check to see if the cache_resources are valid instead. The flag
can then be removed.

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/3752048.1734381285@warthog.procyon.org.uk
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
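
The pattern behind this fix is to replace a flag that some other code path
must remember to set with a check of the state that actually matters. A
small illustrative sketch (hypothetical structures, not the real netfs
types):

    /* Sketch: derive "are we using a cache?" from the cache state itself
     * rather than from a flag nothing sets any more. */
    #include <stdbool.h>
    #include <stdio.h>

    struct cache_ops {
        int dummy;                      /* stand-in for real cache operations */
    };

    struct cache_resources {
        const struct cache_ops *ops;    /* non-NULL once a cache is in use */
    };

    struct read_request {
        bool copy_to_cache;             /* old flag: nothing sets it any more */
        struct cache_resources cache;
    };

    /* Old check: silently wrong once the flag-setting code went away. */
    static bool is_caching_old(const struct read_request *rreq)
    {
        return rreq->copy_to_cache;
    }

    /* New check: ask the state that actually matters. */
    static bool is_caching_new(const struct read_request *rreq)
    {
        return rreq->cache.ops != NULL;
    }

    int main(void)
    {
        static const struct cache_ops ops;
        struct read_request rreq = { .cache = { .ops = &ops } };

        printf("old check: caching=%d, new check: caching=%d\n",
               is_caching_old(&rreq), is_caching_new(&rreq));
        return 0;
    }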



Revision tags: v6.13-rc3
# 4acb665c 13-Dec-2024 David Howells <dhowells@redhat.com>

netfs: Work around recursion by abandoning retry if nothing read

syzkaller reported recursion with a loop of three calls (netfs_rreq_assess,
netfs_retry_reads and netfs_rreq_terminated) hitting the limit of the stack
during an unbuffered or direct I/O read.

There are a number of issues:

(1) There is no limit on the number of retries.

(2) A subrequest is supposed to be abandoned if it does not transfer
anything (NETFS_SREQ_NO_PROGRESS), but that isn't checked under all
circumstances.

(3) The actual root cause, which is this:

if (atomic_dec_and_test(&rreq->nr_outstanding))
        netfs_rreq_terminated(rreq, ...);

When we do a retry, we bump the rreq->nr_outstanding counter to
prevent the final cleanup phase running before we've finished
dispatching the retries. The problem is if we hit 0, we have to do
the cleanup phase - but we're in the cleanup phase and end up
repeating the retry cycle, hence the recursion.

Work around the problem by limiting the number of retries. This is based
on Lizhi Xu's patch[1], and makes the following changes:

(1) Replace NETFS_SREQ_NO_PROGRESS with NETFS_SREQ_MADE_PROGRESS and make
the filesystem set it if it managed to read or write at least one byte
of data. Clear this bit before issuing a subrequest.

(2) Add a ->retry_count member to the subrequest and increment it any time
we do a retry.

(3) Remove the NETFS_SREQ_RETRYING flag as it is superfluous with
->retry_count. If the latter is non-zero, we're doing a retry.

(4) Abandon a subrequest if retry_count is non-zero and we made no
progress.

(5) Use ->retry_count in both the write-side and the read-side.

[?] Question: Should I set a hard limit on retry_count in both read and
write? Say it hits 50, we always abandon it. The problem is that
these changes only mitigate the issue. As long as it made at least one
byte of progress, the recursion is still an issue. This patch
mitigates the problem, but does not fix the underlying cause. I have
patches that will do that, but it's an intrusive fix that's currently
pending for the next merge window.

The oops generated by KASAN looks something like:

BUG: TASK stack guard page was hit at ffffc9000482ff48 (stack is ffffc90004830000..ffffc90004838000)
Oops: stack guard page: 0000 [#1] PREEMPT SMP KASAN NOPTI
...
RIP: 0010:mark_lock+0x25/0xc60 kernel/locking/lockdep.c:4686
...
mark_usage kernel/locking/lockdep.c:4646 [inline]
__lock_acquire+0x906/0x3ce0 kernel/locking/lockdep.c:5156
lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5825
local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
___slab_alloc+0x123/0x1880 mm/slub.c:3695
__slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
__slab_alloc_node mm/slub.c:3961 [inline]
slab_alloc_node mm/slub.c:4122 [inline]
kmem_cache_alloc_noprof+0x2a7/0x2f0 mm/slub.c:4141
radix_tree_node_alloc.constprop.0+0x1e8/0x350 lib/radix-tree.c:253
idr_get_free+0x528/0xa40 lib/radix-tree.c:1506
idr_alloc_u32+0x191/0x2f0 lib/idr.c:46
idr_alloc+0xc1/0x130 lib/idr.c:87
p9_tag_alloc+0x394/0x870 net/9p/client.c:321
p9_client_prepare_req+0x19f/0x4d0 net/9p/client.c:644
p9_client_zc_rpc.constprop.0+0x105/0x880 net/9p/client.c:793
p9_client_read_once+0x443/0x820 net/9p/client.c:1570
p9_client_read+0x13f/0x1b0 net/9p/client.c:1534
v9fs_issue_read+0x115/0x310 fs/9p/vfs_addr.c:74
netfs_retry_read_subrequests fs/netfs/read_retry.c:60 [inline]
netfs_retry_reads+0x153a/0x1d00 fs/netfs/read_retry.c:232
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
...
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_retry_reads+0x155e/0x1d00 fs/netfs/read_retry.c:235
netfs_rreq_assess+0x5d3/0x870 fs/netfs/read_collect.c:371
netfs_rreq_terminated+0xe5/0x110 fs/netfs/read_collect.c:407
netfs_dispatch_unbuffered_reads fs/netfs/direct_read.c:103 [inline]
netfs_unbuffered_read fs/netfs/direct_read.c:127 [inline]
netfs_unbuffered_read_iter_locked+0x12f6/0x19b0 fs/netfs/direct_read.c:221
netfs_unbuffered_read_iter+0xc5/0x100 fs/netfs/direct_read.c:256
v9fs_file_read_iter+0xbf/0x100 fs/9p/vfs_file.c:361
do_iter_readv_writev+0x614/0x7f0 fs/read_write.c:832
vfs_readv+0x4cf/0x890 fs/read_write.c:1025
do_preadv fs/read_write.c:1142 [inline]
__do_sys_preadv fs/read_write.c:1192 [inline]
__se_sys_preadv fs/read_write.c:1187 [inline]
__x64_sys_preadv+0x22d/0x310 fs/read_write.c:1187
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Closes: https://syzkaller.appspot.com/bug?extid=1fc6f64c40a9d143cfb6
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241108034020.3695718-1-lizhi.xu@windriver.com/ [1]
Link: https://lore.kernel.org/r/20241213135013.2964079-9-dhowells@redhat.com
Tested-by: syzbot+885c03ad650731743489@syzkaller.appspotmail.com
Suggested-by: Lizhi Xu <lizhi.xu@windriver.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Reported-by: syzbot+885c03ad650731743489@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
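
The workaround above can be modelled iteratively: track a per-subrequest
retry count and a made-progress bit, and abandon a retried subrequest that
transferred nothing. The sketch below is a userspace toy with a fake server
and hypothetical names, not the netfslib code; it only shows the
bounded-retry decision:

    /* Sketch of the retry policy: clear the progress bit before each issue,
     * bump retry_count on each retry, and abandon a retried subrequest that
     * made no progress.  Iterative, so it cannot recurse down the stack. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct subreq {
        size_t want;                /* bytes still needed */
        unsigned int retry_count;
        bool made_progress;
    };

    /* Fake issuer: returns how many bytes the "server" gave us this time. */
    static size_t issue_read(struct subreq *sr, size_t server_gives)
    {
        sr->made_progress = server_gives > 0;
        return server_gives;
    }

    /* Collector-side retry decision. */
    static void retry_until_done(struct subreq *sr, const size_t replies[],
                                 size_t n)
    {
        for (size_t attempt = 0; sr->want > 0; attempt++) {
            size_t got = issue_read(sr, attempt < n ? replies[attempt] : 0);

            sr->want -= got < sr->want ? got : sr->want;
            if (sr->want == 0) {
                printf("subrequest complete after %u retries\n",
                       sr->retry_count);
                return;
            }
            /* A retried subrequest that made no progress is abandoned. */
            if (sr->retry_count > 0 && !sr->made_progress) {
                printf("abandoning subrequest: retry made no progress\n");
                return;
            }
            sr->retry_count++;
        }
    }

    int main(void)
    {
        /* First attempt short-reads, the retry returns nothing: abandon. */
        struct subreq sr = { .want = 4096 };
        const size_t replies[] = { 1024, 0 };

        retry_until_done(&sr, replies, 2);
        return 0;
    }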



# 6d4a0f4e 17-Dec-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.13-rc3' into next

Sync up with the mainline.

