Dealing with many files: Difference between revisions
Jump to navigation
Jump to search
mNo edit summary |
m (→On stat()) |
||
Line 106: | Line 106: | ||
===On stat()=== | ===On stat()=== | ||
<!-- | <!-- | ||
Commands like {{inlinecode|ls}} (without -l or similar) only needs to list entry names, which it can read in bulk from readdir()/getdents() calls. | |||
: {{comment|(side note: readdir() (POSIX interface) can be slower than the underlying syscall (getdents) for interesting reasons)}} | |||
ls -l, or anything else showing more details per file, does a separate stat() on every entry, because you ''asked'' for that detail. So where listing names was ''O(n)'', listing details ''can'' be ''O(n<sup>2</sup>)'' (though due to lookup optimization may be more like | |||
''O(n lg n)''{{verify}} | |||
Also, code that ''walks'' a filesystem tree (looking to recurse into directories) | |||
requires a stat() on each entry to figure out which are directories. | |||
If you see the filesystem as a database, | |||
then for the purpose keeping ''many'' files | |||
it's actually a relatively dumbly implemented database. | |||
ls then iterates through a readdir(), then stat()s every item | |||
* using libc readdir() is less efficient than using getdents() directly | * using libc readdir() is less efficient than using getdents() directly | ||
* so in a naive scan of the directory, the combination of is ''O(n<sup>2</sup>)'', while a filesystem with directory indices is often closer to ''O(n lg n)'' | * so in a naive scan of the directory, the combination of is ''O(n<sup>2</sup>)'', while a filesystem with directory indices is often closer to ''O(n lg n)'' |
Revision as of 16:42, 26 February 2024
📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense. |
(When you want to watch/monitor a large amount of files for change, see File polling and event notification)
The below is more about the practicalities of keeping many files around.