Lines Matching full:that
22 exploration is needed to discover, is that it is complex. There are
23 many rules, special cases, and implementation alternatives that all
26 tool that we will make extensive use of is "divide and conquer". For
41 of elements: "slashes" that are sequences of one or more "``/``"
42 characters, and "components" that are sequences of one or more
43 non-"``/``" characters. These form two kinds of paths. Those that
52 component, but that isn't always accurate: a pathname can lack both
62 it must identify a directory that already exists, otherwise an error
68 pathname that is just slashes have a final component. If it does
75 tempting to consider that to have an empty final component. In many
76 ways that would lead to correct results, but not always. In
81 A pathname that contains at least one non-<slash> character and
82 that ends with one or more trailing <slash> characters shall not
85 directory entry that is to be created for a directory immediately
91 checking that the trailing slash is not used where it isn't
96 changes that affect that lookup. One fairly extreme case is that if
98 "a/b/..", that process might successfully resolve on "a/c".
102 "dcache" and an understanding of that is central to understanding
112 contains further information about the object in that parent with
113 the given name. The inode pointer can be ``NULL`` indicating that the
115 dentry of a directory to the dentries of the children, that linkage is
119 that will be particularly relevant is that it is closely integrated
120 with the mount table that records which filesystem is mounted where.
127 Some filesystems ensure that the information in the dcache is always
130 without checking with the filesystem, and means that the VFS can
134 Other filesystems don't provide that guarantee because they cannot.
135 These are typically filesystems that are shared across a network,
150 you ignore all the places that only run when "``LOOKUP_RCU``"
168 reference count. The special-sauce of this primitive is that the
172 Holding a reference on a dentry ensures that the dentry won't suddenly
199 ``d_lock`` is a synonym for the spinlock that is part of ``d_lockref`` above.
206 each candidate dentry that it finds in the hash table and then checks
207 that the parent and name are correct. So it doesn't lock the parent
222 accessing that slot in a hash table, and searching the linked list
223 that is found there.
228 happened to be looking at a dentry that was moved in this way,
234 ``rename_lock`` is a seqlock that is updated whenever any dentry is
235 renamed. If ``d_lookup`` finds that a rename happened while it
249 ``i_rwsem`` is a read/write semaphore that serializes all changes to a particular
250 directory. This ensures that, for example, an ``unlink()`` and a ``rename()``
252 stable while the filesystem is asked to look up a name that is not
256 This has a complementary role to that of ``d_lock``: ``i_rwsem`` on a
257 directory protects all of the names in that directory, while ``d_lock``
268 falls back to ``lookup_slow()`` which takes a shared lock on ``i_rwsem``, checks again that
275 that the required exclusion can be achieved. How path lookup chooses
280 name that is not yet in the dcache - the shared lock on ``i_rwsem`` will
293 If a matching dentry was found in the primary hash table then that is
294 returned and the caller can know that it lost a race with some other
298 knows that it has won any race and now is responsible for asking the
303 added to the primary hash table already. Note that a ``struct
310 ``DCACHE_PAR_LOOKUP`` to be cleared, using a wait_queue that was passed
311 to the instance of ``d_alloc_parallel()`` that won the race and that
314 has, the dentry is returned and the caller just sees that it lost any
316 likely explanation is that some other dentry was added instead using
325 Per-CPU here means that incrementing the count is cheap as it only
330 ``mnt_count`` doesn't ensure that the mount remains in the namespace and,
332 does, however, ensure that the ``mount`` data structure remains coherent,
344 crossing a mount point to check that the crossing was safe. That is,
345 the value in the seqlock is read, then the code finds the mount that
383 all the way back to `First Edition Unix`_ - of the function that
402 that is the "next" component in the pathname.
414 filesystem. Often that reference won't be needed, so this field is
416 is requested. Keeping a reference in the ``nameidata`` ensures that
420 It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
432 escape that subtree. It works a bit like a local ``chroot()``.
438 Given a path (``name``) and a nameidata structure (``nd``), check that the
440 over one component while updating ``last_type`` and ``last``. If that
448 filesystem to revalidate the result if it is that sort of filesystem.
449 If that doesn't get a good result, it calls "``lookup_slow()``" which
463 seem obvious, but is worth pointing out so that we will recognize its
471 not call ``walk_component()`` that last time. Handling that final
493 implementation of ``lookup_slow()`` which skips that step. This is
494 important when unmounting a filesystem that is inaccessible, such as
506 the possibility that the final component is not ``LAST_NORM``. If the
510 won't try to create that name. They also check for trailing slashes
521 On filesystems that require it, the lookup routines will call the
522 ``->d_revalidate()`` dentry method to ensure that the cached information
524 from a server. In some cases it may find that there has been a change
525 further up the path and that something that was thought to be valid
532 look up a name can trigger changes to how that lookup should be
541 to three different flags that might be set in ``dentry->d_flags``:
546 If this flag has been set, then the filesystem has requested that the
551 unmounted, the ``d_manage()`` function will usually wait for that
558 processing. That server process can identify itself to the ``autofs``
565 This flag is set on every dentry that is mounted on. As Linux
566 supports multiple filesystem namespaces, it is possible that the
584 report that there was an error, that there was nothing to mount, or
590 There is no new locking of import here and it is important that no
604 We noted that REF-walk is complex because there are numerous details
616 thread from changing the data structures that a given thread is
619 same time, this can be very costly. Even when using locks that permit
622 goal when reading a shared data structure that no other process is
632 other parts it is important that RCU-walk can quickly fall back to
639 notices that something has changed or is changing, or if something
644 ``vfsmount`` and ``dentry``, and ensuring that these are still valid -
645 that a path walk with REF-walk would have found the same entries.
646 This is an invariant that RCU-walk must guarantee. It can only make
647 decisions, such as selecting the next step, that are decisions which
654 This pattern of "try RCU-walk, if that fails try REF-walk" can be
662 that fails with the error ``ECHILD`` they are called again with no
665 ``LOOKUP_RCU``) to ensure that entries found in the cache are forcibly
667 determines that they are too old to trust.
669 The ``LOOKUP_RCU`` attempt may drop that flag internally and switch to
671 that trip up RCU-walk are much more likely to be near the leaves and
672 so it is very unlikely that there will be much, if any, benefit from
679 ``rcu_read_lock()`` is held for the entire time that RCU-walk is walking
680 down a path. The particular guarantee it provides is that the key
685 is the only guarantee that RCU provides; everything else is done using
697 To preserve the invariant mentioned above (that RCU-walk may only make
698 decisions that REF-walk could have made), it must make the checks at
699 or near the same places that REF-walk holds the references. So, when
706 However, there is a little bit more to seqlocks than that. If
711 use ``read_seqcount_retry()`` to validate that copy.
714 imposes a memory barrier so that no memory-read instruction from
726 sufficient to catch any problem that could occur at this point.
728 With that little refresher on seqlocks out of the way we can look at
735 ensure that crossing a mount point is performed safely. RCU-walk uses
736 it for that too, but for quite a bit more.
745 that any "mount" or "unmount" happens.
755 If RCU-walk finds that ``mount_lock`` hasn't changed then it can be sure
756 that, had REF-walk taken counted references on each vfsmount, the
778 check if we have landed on a mount point and, if so, must find that
781 starting point of the path lookup was in part of the filesystem that
792 ``lookup_fast()`` is the only lookup routine that is used in RCU-mode,
794 ``lookup_fast()`` that we find the important "hand over hand" tracking
804 getting a counted reference to the new dentry before dropping that for
810 A semaphore is a fairly heavyweight lock that can only be taken when it is
813 take ``i_rwsem`` and modifies the directory in a way that RCU-walk needs
814 to notice, the result will be either that RCU-walk fails to find the
815 dentry that it is looking for, or it will find a dentry which
823 something that actually is there. When RCU-walk fails to find
832 That "dropping down to REF-walk" typically involves a call to
845 Other reasons for dropping out of RCU-walk that do not trigger a call
846 to ``unlazy_walk()`` are when some inconsistency is found that cannot be
853 takes a reference on each of the pointers that it holds (vfsmount,
854 dentry, and possibly some symbolic links) and then verifies that the
860 incrementing a counter. That works to take a second reference if you
869 ``mount_lock`` is then used to validate the reference. If that
870 validation fails, it may *not* be safe to just drop that reference in
873 finds that the reference it got might not be safe, checks the
890 In this case an extra "``MAY_NOT_BLOCK``" flag is passed so that it
914 the big picture, there are a couple of related patterns that are worth
917 The first is "try quickly and check, if that fails try slowly". We
918 can see that in the high-level approach of first trying RCU-walk and
924 The second pattern is "try quickly and check, if that fails try
931 "try quickly *and carefully*, then check". The fact that checking is
932 needed is a reminder that the system is dynamic and only a limited
941 There are several basic issues that we will examine to understand the
951 There are only two sorts of filesystem objects that can usefully
959 a component name refers to a symbolic link, then that component is
960 replaced by the body of the link and, if that body starts with a '/',
997 further limit of eight on the maximum depth of recursion, but that was
1001 The ``nameidata`` structure that we met in an earlier article contains a
1002 small stack that can be used to store the remaining part of up to two
1005 lookup will never exceed that stack as, once the 40th symlink is
1008 It might seem that the name remnants are all that needs to be stored on
1009 this stack, but we need a bit more. To see that, we need to move on to
1018 able to find and temporarily hold onto these cached entries, so that
1030 pathname in a symlink can be seen as the content of that symlink and
1034 that the filesystem will allocate some temporary memory and copy or
1035 construct the symlink content into that memory whenever it is needed.
1039 on the dentry. This means that the mechanisms that pathname lookup
1047 on an inode does not imply any reference on cached pages of that
1048 inode, and even an ``rcu_read_lock()`` is not sufficient to ensure that
1051 significantly, needs to release that reference when it is finished
1056 but that isn't necessarily a big cost and it is better than dropping
1057 out of RCU-walk mode completely. Even filesystems that allocate
1066 RCU-walk mode as the rewrite is not quite complete. It is likely that
1070 looked at previously, ``->follow_link()`` would need to be careful that
1074 code is ready to release the reference when that does happen.
1077 complexity. It requires a reference to the inode so that the
1078 ``i_op->put_link()`` inode operation can be called. In REF-walk, that
1083 we also need the seq number for the dentry so we can confirm that
1087 provides an opaque "cookie" that must be passed to ``->put_link()`` so that it
1099 - the ``cookie`` that tells ``->put_link()`` what to put.
1101 This means that each entry in the symlink stack needs to hold five
1108 Note that, in a given stack frame, the path remnant (``name``) is not
1109 part of the symlink that the other fields refer to. It is the remnant
1110 to be followed once that symlink has been fully parsed.
1118 symlink, or is restored from the stack, so that much of the loop
1126 called; it then gets the link from the filesystem. Providing that
1136 the symlink-just-found to avoid leaving empty path remnants that would
1141 ``walk_component()`` is also the last piece of code that needs to look at the
1142 old symlink as it walks that last component. So it is quite
1164 so ``NULL`` is returned to indicate that the symlink can be released and
1167 The other case involves things in ``/proc`` that look like symlinks but
1174 something that looks like a symlink. It is really a reference to the
1176 objects you get a name that might refer to the same file - unless it
1180 ``nameidata`` in place to point to that target. ``->follow_link()`` then
1191 For some callers, this is all they need; they want to create that
1194 apply special handling to the last component of that symlink, rather
1197 successive symlinks until one is found that doesn't point to another
1201 ``path_lookupat()`` using a loop that calls ``link_path_walk()``, and then
1203 that needs to be followed, then ``trailing_symlink()`` is called to set
1208 The various functions that examine the final component and possibly
1209 report that it is a symlink are ``lookup_last()``, ``mountpoint_last()``
1211 ``walk_component()`` of returning ``1`` if a symlink was found that needs
1235 If that doesn't work, only then is the lookup restarted from the top.
1246 so does ``do_last()`` so that ``trailing_symlink()`` gets called and the
1247 open process continues on the symlink that was found.
1252 We previously said of RCU-walk that it would "take no locks, increment
1253 no counts, leave no footprints." We have since seen that some
1259 footprints in a way that doesn't affect directories is in updating access times.
1266 update the atime on that symlink.
1271 subject. The `clearest statement`_ is that, if a particular implementation
1273 documented "except that any changes caused by pathname resolution need
1274 not be documented". This seems to imply that POSIX doesn't really
1279 An examination of history shows that prior to `Linux 1.3.87`_, the ext2
1281 Unfortunately we have no record of why that behavior was changed.
1283 In any case, access time must now be updated and that operation can be
1288 limits the updates of ``atime`` to once per day on files that aren't
1303 the various flags that can be stored in the ``nameidata`` to guide the
1320 ``LOOKUP_PARENT`` indicates that the final component hasn't been reached
1324 ``LOOKUP_ROOT`` indicates that the ``root`` field in the ``nameidata`` was
1328 ``LOOKUP_JUMPED`` means that the current dentry was chosen not because
1351 ensure that they return errors from ``nd_jump_link()``, because that is how
1355 bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
1377 considered. Others are only checked for when considering that final
1380 ``LOOKUP_AUTOMOUNT`` ensures that, if the final component is an automount
1391 ``WALK_GET`` that we already met, but it is used in a different way.
1393 ``LOOKUP_DIRECTORY`` insists that the final component is a directory.
1401 if it knows that it will be asked to open or create the file soon.
1410 than even a couple of releases ago. But that doesn't mean it is
1412 symlinks that are stored in the inode so, while it handles many ext4
1413 symlinks, it doesn't help with NFS, XFS, or Btrfs. That support