Dealing with many files: Difference between revisions

From Helpful
Jump to navigation Jump to search
mNo edit summary
Line 106: Line 106:
===On stat()===
===On stat()===
<!--
<!--
ls only cares for entry names, which can be had from just readdir()/getdents() calls
: and note that readdir() (POSIX interface) can be slower than the underlying syscall (getdents) for interesting reasons


Commands like {{inlinecode|ls}} (without -l or similar) only needs to list entry names, which it can read in bulk from readdir()/getdents() calls.
: {{comment|(side note: readdir() (POSIX interface) can be slower than the underlying syscall (getdents) for interesting reasons)}}


ls -l does a stat() on each because you ''asked'' for the detail


code that walks a tree entries requires a stat() on each entry to figure out which are directories.
ls -l, or anything else showing more details per file, does a separate stat() on every entry, because you ''asked'' for that detail. So where listing names was ''O(n)'', listing details ''can'' be ''O(n<sup>2</sup>)'' (though due to lookup optimization may be more like
''O(n lg n)''{{verify}}




Also, code that ''walks'' a filesystem tree (looking to recurse into directories)
requires a stat() on each entry to figure out which are directories.




Note that '''ls is a rather unusual test''', and how good a test it is depend on your purposes.


It iterates through a readdir(), then stat()s every item
If you see the filesystem as a database,
then for the purpose keeping ''many'' files
it's actually a relatively dumbly implemented database.
 
 
 
 
ls then iterates through a readdir(), then stat()s every item
* using libc readdir() is less efficient than using getdents() directly
* using libc readdir() is less efficient than using getdents() directly
* so in a naive scan of the directory, the combination of is ''O(n<sup>2</sup>)'', while a filesystem with directory indices is often closer to ''O(n lg n)''
* so in a naive scan of the directory, the combination of is ''O(n<sup>2</sup>)'', while a filesystem with directory indices is often closer to ''O(n lg n)''

Revision as of 16:42, 26 February 2024

📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.


(When you want to watch/monitor a large amount of files for change, see File polling and event notification)


The below is more about the practicalities of keeping many files around.


Potential issues with many files: partly the filesystem, partly the uses

On permission checks

On stat()

Splitting off directories

Notes on fragmentation

some filesystem properties