File polling, event notification, and asynchronous IO
Revision as of 18:43, 15 January 2024
Why would you want this?
Polling means regularly checking the state of something else, to see whether something has changed.
The problem is that regularly checking a lot of things tends to become proportionally slower, less because the check itself is slow or hard, and more because the means of communication is.
Also, if multiple programs poll for the same things, that makes it even less efficient.
Event notification is asking something to notify you when something you care about has changed.
It also means that even if the thing responsible for noticing isn't the fastest, at least there will only ever be one.
With most of these watching mechanisms, you get results faster, with fewer syscalls and less IO being done,
and in general less overhead.
So this is useful for anything that would regularly poll more than a few files, such as desktop search and sync clients.
Underlying mechanisms
noticing changes with manual stats
Availability: anything (with a POSIX filesystem interface, or similar)
Idea: frequently stat() things, and notice when e.g. their ctime/mtime/size has changed.
Pros:
- simple to implement
- works almost everywhere - doesn't rely on a kernel/OS-specific feature, or on limited kernel resources
- you can't e.g. miss a change by missing an event
- Possible optimization:
- a directory's mtime will change when a file/dir entry is added or removed
- so if a stat() of a known directory shows the same mtime, no entries were added or removed, and you don't need to list that directory again
- however, directory mtime doesn't change when only file contents are changed (because that's not a directory-altering operation). This is mostly irrelevant for e.g. updatedb, but matters for a bunch of other applications, which would still have to stat each file
- watching a directory tree this way has some further rough edges
Cons:
- doing this for thousands of files, or noticing changes quickly, means a lot of syscall overhead
- more so for network filesystems
- relies on caching of filesystem metadata to be decently fast (and to not cause a lot of continuous IO)
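The stat-and-compare idea above can be sketched in a few lines of Python. This is a minimal illustration, not a complete watcher: the helper names `snapshot` and `changed_paths` are made up for this example, and a real poller would call `snapshot` in a loop with a sleep in between.

```python
import os
import tempfile


def snapshot(paths):
    """Map each path to (mtime_ns, ctime_ns, size), or None if it cannot be stat()ed."""
    state = {}
    for path in paths:
        try:
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_ctime_ns, st.st_size)
        except OSError:
            state[path] = None  # missing or unreadable counts as its own state
    return state


def changed_paths(old, new):
    """Return the sorted paths whose recorded state differs between two snapshots."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


def demo():
    """Create two files, alter one and delete the other, and report what changed."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        for p in (a, b):
            with open(p, "w") as f:
                f.write("x")
        before = snapshot([a, b])
        with open(a, "a") as f:
            f.write("more")  # size changes, so timestamp granularity does not matter here
        os.remove(b)
        after = snapshot([a, b])
        return [os.path.basename(p) for p in changed_paths(before, after)]
```

Note that every check costs one stat() per watched path, which is exactly the overhead listed under Cons.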
POSIX select and POSIX poll
Availability: most *nixes (though poll was not in the first POSIX versions, so isn't there on some now-ancient systems)
select is older and was more widely present, so you would historically use it, or at least still implement it as a fallback.
These days poll is present everywhere that matters, so it is generally preferred over select.
Both can watch one or more file descriptors (sockets, files, etc).
Pros:
- 'wait for kernel to mention change' means it stays fast in most cases
- fairly ubiquitous
Cons:
- Doesn't report much detail, so for some purposes it's still fairly clunky and only part of the solution
- each select can't watch a lot of file descriptors
- select uses a fixed-size fd set (size fixed at compile time, see FD_SETSIZE, commonly 1024)
- poll requires you to allocate the array of fd references
- so not the most efficient to scale to watching many things
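As a rough sketch of the 'wait for the kernel to mention change' model, the following uses Python's `select.poll` on two pipes. The function name `wait_readable` is invented for this example, and `select.poll` is not available on Windows.

```python
import os
import select


def wait_readable(fds, timeout_ms):
    """Block until at least one fd is readable or the timeout expires; return the ready fds."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)  # the whole set is handed to the kernel per call
    return [fd for fd, events in poller.poll(timeout_ms) if events & select.POLLIN]


def demo():
    """Poll two pipes before and after writing to one of them."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    try:
        nothing_yet = wait_readable([r1, r2], 0)   # nothing written, returns immediately
        os.write(w2, b"hello")
        ready = wait_readable([r1, r2], 1000)      # only r2 has data
        return nothing_yet, ready == [r2], os.read(r2, 5)
    finally:
        for fd in (r1, w1, r2, w2):
            os.close(fd)
```

This works well for pipes and sockets. Regular files always report as ready, so select and poll say nothing about whether a file on disk has changed.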
/dev/poll
Availability: Solaris-only
Alternative interface, functionally mostly like *nix poll.
Not implemented on Linux because by that time, epoll was nicer anyway.
event ports
Availability: (Solaris, Illumos, versions?(verify))
Event ports are a generic event system.
File Events Notification is basically what you get when you use event ports for files (verify)
https://blogs.oracle.com/dap/entry/event_ports_and_performance
https://docs.oracle.com/cd/E36784_01/html/E36874/port-create-3c.html
epoll
Availability: Linux-only.
epoll is a variant of poll (present since Linux 2.5.44, so in practice since 2.6)
that scales better to watching a larger number of file descriptors,
because the kernel keeps the set of watched descriptors between calls and hands back only the ready ones, rather than iterating over all watched descriptors each time.
The API is more complex and a little more flexible than poll()'s.
Notes
- up to maybe a thousand, you won't see much difference(verify)
- changing what you're polling is roughly as expensive as with poll (because each change goes through the kernel),
- so if you change often, it may make little difference(verify).
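A minimal sketch of the same pipe example with epoll, using Python's `select.epoll` (Linux-only). The point to notice is that the descriptor is registered once and the kernel remembers it across calls to `poll()`.

```python
import os
import select


def demo():
    """Register one pipe with epoll, then poll before a write, after it, and after draining."""
    ep = select.epoll()  # Linux-only; this attribute does not exist on other platforms
    r, w = os.pipe()
    try:
        ep.register(r, select.EPOLLIN)  # the interest set now lives in the kernel
        idle = ep.poll(0)               # nothing written yet
        os.write(w, b"x")
        ready = [fd for fd, ev in ep.poll(1.0) if ev & select.EPOLLIN]
        os.read(r, 1)                   # drain, so the level-triggered event clears
        drained = ep.poll(0)
        return idle, ready == [r], drained
    finally:
        ep.close()
        for fd in (r, w):
            os.close(fd)
```

Changing the interest set afterwards goes through `ep.modify()` or `ep.unregister()`, each of which is a separate syscall, which is the cost mentioned in the notes above.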
dnotify
Availability: Linux-only, since around 2.4
Can only watch directories, and uses an open fd for each watch.
Note that because file alterations in a directory also mark the containing directory as changed, this can be a fairly minimal way to report "has something changed in this directory / tree?".
Functionally, you may well want inotify now - it can do much the same.
inotify
Availability: Linux-only, kernel ≥2.6.13 and glibc ≥2.4 / 2.5, so since ~2005.
Practical notes:
- Watches inodes, so things with entries on a filesystem
- (...that works with inodes, which isn't all)
- (...and not things you can only refer to via file descriptors, such as sockets)
- does not apply to (mounted) remote filesystems, mostly because remote change does not involve a local syscall
- and you probably wouldn't want them to (basic scaling issues)
- Recursive watching is not provided by the kernel component,
- because it's a complex task for which there is no minimum-latency in-kernel way of doing it.
- libraries tend to provide it by building on inotify itself
- also e.g. automatically watch newly created directories.
- inotify's interface is itself a file descriptor, so can be watched with epoll (or poll or select)
For CLI tools, see e.g. the inotifywait, inotifywatch section under CLI tools.
Applicable limits
Some tuning you may need to do:
- fs.inotify.max_user_watches
- may default to as low as 8192, but probably a few multiples higher
- for some purposes (e.g. sync clients when you keep lots of small files) you may need tens of thousands or more
- actual use costs ~1KB of unswappable kernel memory per watch, so if you actually watch 1 million files that means 1GB of physical RAM
- (when you actually watch that many files, also look at the CPU overhead)
- Note that multiple processes watching the same files/dirs is free in terms of watches, because you get references to existing watches
- fs.inotify.max_user_instances
- defaults to something like 128
- since there are rarely that many individual things watching your files, you don't often need to increase it.
- fs.inotify.max_queued_events
- defaults to something like 16K (272 bytes each, meaning ~4MB)
- a full queue means dropped events(verify)
- Look at your clients before you increase this: if the reason is that they consume events more slowly than inotify generates them, a larger queue doesn't help for very long
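To see the current values of the three limits above without calling sysctl, you can read them from /proc. The sketch below assumes the usual Linux location `/proc/sys/fs/inotify`; the function names are made up for this example, and the 1KB-per-watch figure is the rough estimate from the list above, not a measured value.

```python
from pathlib import Path

LIMIT_NAMES = ("max_user_watches", "max_user_instances", "max_queued_events")


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return {limit name: int value}, with None for anything that cannot be read."""
    limits = {}
    for name in LIMIT_NAMES:
        try:
            limits[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None  # e.g. not Linux, or /proc not mounted
    return limits


def estimated_watch_memory_bytes(watches, bytes_per_watch=1024):
    """Rough kernel memory cost of a number of watches (assumes ~1KB per watch)."""
    return watches * bytes_per_watch


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(f"fs.inotify.{name} = {value}")
    print("1 million watches ~", estimated_watch_memory_bytes(1_000_000) // 10**6, "MB")
```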
You can check whether you are at an inotify limit (watch or instance?) with tail -f somefile, because tail uses inotify by default to look for changes, and will give you a warning if it can't.
If you want to know which processes are using inotify at all, you can get the number of instances (not watches) per named process with something like the following (thanks to [1]):
find /proc/*/fd -lname anon_inode:inotify 2>/dev/null | cut -d/ -f3 | xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' | uniq -c | sort -nr
Seeing a watch count, or a list of all watches, is much harder.
The API doesn't seem to provide either, so this takes some work.
People suggest lsof for the count, although it seems this isn't universal (verify)
You can strace a process to look for inotify_add_watch calls, but this is only directly useful if you restart the process, because watches it added before you attached won't show up.
Stuff you can watch
The mask is used both to ask the kernel to filter the events it sends to us,
and to check what exactly each received event reports.
A few are request-only:
IN_ONESHOT      only send the event once
IN_ONLYDIR      only watch the path if it is a directory
IN_DONT_FOLLOW  don't follow a symlink
IN_EXCL_UNLINK  exclude events on unlinked objects
IN_MASK_ADD     add to the mask of an already existing watch
A few are response-only: (verify)
IN_ISDIR event occurred against dir
A few are special-case events:
IN_Q_OVERFLOW event queue overflowed. You want to read them faster (and maybe raise fs.inotify.max_queued_events)
Most of them relate to accesses/alterations you may want to look for, so can be used in both the request and the result (verify)
IN_CREATE         file/directory created in watched directory, e.g. open() with O_CREAT, mkdir(), link(), symlink(), bind() on a domain socket
IN_ATTRIB         metadata change, e.g. chmod(), chown(), utimensat(), setxattr(), link count (link()/unlink()) since Linux 2.6.25
IN_OPEN           file/directory opened
IN_ACCESS         e.g. read(2), execve(2)
IN_MODIFY         e.g. write(), truncate()
IN_CLOSE_WRITE    file opened for writing was close()d
IN_CLOSE_NOWRITE  file or directory not opened for writing was close()d
IN_DELETE         file/directory deleted from watched directory
IN_DELETE_SELF    watched file/dir was deleted (including 'moved to another filesystem')
IN_MOVE_SELF      watched file/directory was itself moved
IN_UNMOUNT        filesystem backing the watches was unmounted
IN_IGNORED        watch was removed (e.g. directly after a delete, unmount, move(verify))
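To illustrate how the mask is used on the receiving side, here is a small Python sketch that decodes the raw bytes you would get from read() on an inotify descriptor. It does not open inotify itself. The bit values are given as I understand them from linux/inotify.h and should be checked against your headers; `mask_names` and `parse_events` are names invented for this example.

```python
import os
import struct

# Bit values as defined in linux/inotify.h (assumption: verify against your headers).
IN_FLAGS = {
    0x00000001: "IN_ACCESS",
    0x00000002: "IN_MODIFY",
    0x00000004: "IN_ATTRIB",
    0x00000008: "IN_CLOSE_WRITE",
    0x00000010: "IN_CLOSE_NOWRITE",
    0x00000020: "IN_OPEN",
    0x00000040: "IN_MOVED_FROM",
    0x00000080: "IN_MOVED_TO",
    0x00000100: "IN_CREATE",
    0x00000200: "IN_DELETE",
    0x00000400: "IN_DELETE_SELF",
    0x00000800: "IN_MOVE_SELF",
    0x00002000: "IN_UNMOUNT",
    0x00004000: "IN_Q_OVERFLOW",
    0x00008000: "IN_IGNORED",
    0x40000000: "IN_ISDIR",
}

# struct inotify_event header: int wd; uint32 mask; uint32 cookie; uint32 len
EVENT_HEADER = struct.Struct("iIII")


def mask_names(mask):
    """Return the names of all known bits set in an event mask."""
    return [name for bit, name in IN_FLAGS.items() if mask & bit]


def parse_events(buffer):
    """Split a buffer of packed inotify events into (wd, [flag names], name) tuples."""
    events = []
    offset = 0
    while offset + EVENT_HEADER.size <= len(buffer):
        wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(buffer, offset)
        offset += EVENT_HEADER.size
        raw_name = buffer[offset:offset + name_len].split(b"\0", 1)[0]  # name is NUL-padded
        offset += name_len
        events.append((wd, mask_names(mask), os.fsdecode(raw_name)))
    return events
```

For example, a directory created inside a watched directory arrives as one event with both IN_CREATE and IN_ISDIR set, with the new entry's name in the name field.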
fanotify
Availability: Linux
API that mostly just intercepts open, read, write, and close syscalls (note: originally not create, delete, move; newer kernels, 5.1 and later, added events for those).
Applies to all objects on a filesystem.
When your aim is to get reports of all changes in directory trees (rather than watching specific files for changes),
this can be more lightweight than inotify(verify).
As an API this also lets you insert code before the real access happens, or e.g. delay or deny access, which can be handy for use cases like virus scanning.
kqueue
Availability: BSD, OSX
FSEvents
Availability: OSX 10.7 (Lion)(verify)
File System Events, a.k.a. fsevents, fseventsd
Libraries
fam
Doesn't scale very well (TODO: explain which conditions)
Gamin
Separate implementation of a subset of FAM.
On Linux it uses inotify or dnotify, on BSD it uses kqueue/kevent.
Doesn't always scale very well (TODO: explain which conditions)
libevent
Released in 2000
Frontend to select, poll, epoll, kqueue and/or /dev/poll
libev
Released in 2007
"Full-featured high-performance event loop loosely modelled after libevent". It has an emulation layer for libevent, and was apparently meant as a cleaner alternative to it.
Can use epoll, kqueue, select, poll (no windows)
libuv
Developed for node.js to abstract non-blocking IO, now used more widely.
Can use epoll, kqueue, IOCP, event ports.
See also:
http://docs.libuv.org/en/v1.x/
libae
Made for redis.
Can use epoll, kqueue, event ports, select
Boost asio
CLI tools
fswatch
Basically unifies:
- inotify (Linux),
- File System Events (OSX),
- kqueue (BSD, OSX),
- Solaris/Illumos File Events Notification,
- stat()-based scan-and-difference
- ReadDirectoryChangesW (Windows)
For example:
fswatch -0 /var/log/ | xargs -0 -n 1 ls -l
inotifywait, inotifywatch
inotifywait
- wait for a change of the type specified, then
- exits (default), or
- prints out (-m / --monitor)
When you want to parse the output, -q is useful to remove the "establishing watches" and such, -e to listen for specific events, and use something like read to separate out the filename and event name fields, for example:
inotifywait -m -q -e modify /var/log/syslog | \
while read -r filename event; do
  ls -l "${filename}"
done
inotifywatch - establishes watches with inotify, counts and summarizes events received (for some time or until a signal like Ctrl-C)
- useful for filesystem usage statistics
Example: (note that -r will watch only directories, not files, not because it couldn't but because it's easy to run out of watches)
inotifywatch -r /var/log
Then after a Ctrl-C some time later:
total  access  modify  close_write  close_nowrite  open  filename
2088   1553    249     1            142            143   /var/log/
1064   345     273     0            223            223   /var/log/apache2/
96     48      0       0            24             24    /var/log/dist-upgrade/
51     6       31      4            3              7     /var/log/munin/
48     24      0       0            12             12    /var/log/samba/cores/
39     15      0       0            12             12    /var/log/cups/
38     18      0       1            9              10    /var/log/samba/
36     12      0       0            12             12    /var/log/mysql/
34     10      14      0            5              5     /var/log/journal/16e361e7d1b2e06a9a71a09e544c11b1/
32     16      0       0            8              8     /var/log/journal/
24     12      0       0            6              6     /var/log/installer/
14     6       2       0            3              3     /var/log/atop/
watchman
inotify, FSEvents/kqueue for OSX, and windows in theory
https://facebook.github.io/watchman/
Windows
- Completion Ports (IOCP)
- very nice for watching specific files or connections, not so much for directories (verify)
- C++: FindFirstChangeNotification + ReadDirectoryChangesW
- under some well known but complex-to-describe conditions it will fail to send some notifications
- so it doesn't scale very well
- .NET's FileSystemWatcher
- seems to be built directly on the above, so equally flaky
- ...so you may want to use periodic scans to make sure you're up-to-date.
- USN Change Journals
- https://msdn.microsoft.com/en-us/library/aa363798(v=vs.85).aspx
- https://msdn.microsoft.com/en-us/library/aa363803(VS.85).aspx
- file system filter
- basically a kernel-level driver, so it has to be of good quality.