File polling, event notification, and asynchronous IO


Some fragmented programming-related notes, not meant as introduction or tutorial



Why would you want this?

Polling means regularly checking the state of something else, to see whether something has changed.


Polling doesn't scale very well, though:

  • regularly checking a lot of things tends to become proportionally slower to evaluate
often less because the check is slow or hard, and more because the means of communication is.
  • checking more often, for lower latency, tends to become more costly to do (because you do it more often)
  • multiple things polling the same things tends to become more costly to do (because the same checks are done several times over)


What's the alternative?

There are a few approaches, but many address the issues with two main ingredients:

  • the "polling by multiple things" problem can be alleviated by putting a single thing in charge
which notifies you (ideally; there are some in-between forms)
  • the "monitoring lots of things" and "polling more often for lower latency" problems can be alleviated by making that single thing somehow monitor the mechanism that makes the changes, instead of interrogating the state and noticing that things have changed
which then notifies you


Notes:

  • We often call these event notification systems
  • Desktop search and sync clients will probably stand to gain the most. Your battery as well.
  • distributed systems will need their own flavours of these

Underlying mechanisms

noticing changes with manual stats

Availability: anything (with a POSIX filesystem interface, or similar)

Idea: frequently stat() things, and notice when e.g. their ctime/mtime/size has changed.


Pros:

  • simple to implement
  • works almost everywhere - doesn't rely on a kernel/OS-specific feature
  • you can't e.g. miss a change by missing an event
  • Possible optimization:
a directory's mtime will change when a file/dir entry is added or removed
so if a stat() of a known directory shows the same mtime, you don't need to re-list that directory's entries
however, directory mtime doesn't change when only file contents are changed (because that's not a directory-altering operation). This is mostly irrelevant for e.g. updatedb, but matters for a bunch of other applications, so you would still stat each file
watching a directory tree this way has some further rough edges


Cons:

  • doing this for thousands of files, or noticing quickly, means a lot of syscall overhead
more so for network filesystems
  • relies on caching of filesystem metadata to be decently fast (and to not be a lot of IO continuously)
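
A minimal sketch of the stat-and-compare idea in Python, standard library only. The function names snapshot and diff are made up for this illustration; it compares mtime, ctime and size per file, and ignores the directory-mtime optimization mentioned above.

```python
import os

def snapshot(root):
    """Map each file path under root to (mtime_ns, ctime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_ctime_ns, st.st_size)
    return state

def diff(old, new):
    """Return (added, removed, changed) path lists between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, changed
```

You would call snapshot() in a loop with a sleep in between and feed consecutive results to diff(). The sleep interval is the latency/cost trade-off described under Cons: every round costs one stat() per file.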

POSIX select and POSIX poll

Availability: most *nices (though poll was not in the first POSIX versions, so isn't there on some now-ancient things)


select is older and was more widely present, so you would use it, or at least implement it as a fallback.

These days poll is everywhere worth mentioning, so it is generally preferred over select.

Both can watch one or more file descriptors (sockets, files, etc).


Pros:

  • 'wait for kernel to mention change' means it stays fast in most cases
  • fairly ubiquitous


Cons:

  • Doesn't report much detail, so for some purposes it's still fairly clunky and only part of the solution
  • each select can't watch a lot of file descriptors
select takes a fixed-size bitmask (fd_set; its size is fixed at compile time by FD_SETSIZE, typically 1024)
poll requires you to allocate the array of fd references
  • both hand the whole set to the kernel on every call, so neither is the most efficient way to scale to watching many things
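
As an illustration, a small sketch using Python's wrapper around poll() (select.poll, which is not available on Windows). The helper name wait_readable is made up here, and a pipe stands in for whatever descriptor you actually care about.

```python
import os
import select

def wait_readable(fds, timeout_ms):
    """Block until any of fds is readable, or the timeout passes; return the ready fds."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    # poll() returns (fd, event_mask) pairs; the list is empty on timeout
    return [fd for fd, events in poller.poll(timeout_ms) if events & select.POLLIN]

read_end, write_end = os.pipe()
print(wait_readable([read_end], 100))   # nothing has been written yet
os.write(write_end, b"hello")
print(wait_readable([read_end], 100))   # the pipe now has data waiting
```

Note that the answer is only "this fd is readable", which is the "doesn't report much detail" point above: you still have to read from it to find out what happened.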



/dev/poll

Availability: Solaris-only

Alternative interface, functionally mostly like *nix poll.

Not implemented on Linux because by that time, epoll was nicer anyway.

event ports

Availability: (Solaris, Illumos, versions?(verify))

Event ports are a generic event system.

File Events Notification is basically what you get when you use that for file descriptors(verify)


https://blogs.oracle.com/dap/entry/event_ports_and_performance

https://docs.oracle.com/cd/E36784_01/html/E36874/port-create-3c.html

epoll

Availability: Linux-only.


Compared to poll(), epoll scales better to watching a larger number of file descriptors, because it keeps the watched set in a kernel-side data structure, and so avoids iterating over all watched descriptors on each call.


Notes

  • present since Linux 2.5.44, so in practice since 2.6
  • the API is more complex, and a little more flexible, than poll()'s
  • up to maybe a thousand, you won't see much difference(verify)
  • changing what you're polling is roughly as expensive as with poll (because each change is a trip into the kernel),
so if you change often, it may make little difference(verify).
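
A sketch of the register-once, wait-many-times pattern, using Python's select.epoll (Linux only). The socket pairs and the helper name ready_fds are just for illustration.

```python
import select
import socket

# Register once; the kernel keeps the interest set between waits.
ep = select.epoll()
pairs = [socket.socketpair() for _ in range(3)]
for reader, _writer in pairs:
    ep.register(reader.fileno(), select.EPOLLIN)

def ready_fds(timeout_s):
    """Return the registered fds that currently have data to read."""
    # unlike poll(), the timeout here is in seconds
    return sorted(fd for fd, events in ep.poll(timeout_s) if events & select.EPOLLIN)

pairs[1][1].send(b"ping")   # make only the second pair readable
print(ready_fds(0.1))
```

Each wait returns only the descriptors that have something to report, which is where the scaling difference with poll() comes from.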



dnotify

Availability: Linux-only, since around 2.4


Can only watch directories, and uses an open fd for each watch.

Note that as file alterations in a directory also mark the containing directory as changed, this can be a fairly minimal way to report "has something changed in this directory / tree?"


Functionally, you may well want inotify now - it can do much the same.

inotify

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Availability: Linux-only, kernel ≥2.6.13 and glibc ≥2.4 / 2.5, so since ~2005.


Practical notes:

  • Watches inodes, so things with entries on a filesystem
(...that works with inodes, which isn't all)
(...and not things you can only refer to via file descriptors, such as sockets)
  • does not report changes made remotely on (mounted) network filesystems, mostly because a remote change does not involve a local syscall
and you probably wouldn't want to watch remote changes this way (arguably at all) -- it would scale badly
  • Recursive watching is not provided by the kernel component,
because it's a complex task for which there is no minimum-latency in-kernel way of doing it.
libraries tend to provide it by building on inotify itself
and e.g. automatically watch newly created directories.
note that you can miss file creations that happened before that directory's watch was registered - an almost-unavoidable race, since these are independent syscalls(verify). If this is important, handle it in your code (e.g. list a new directory right after adding its watch), or look at the more widely targeted fanotify.
  • inotify's interface is itself a file descriptor, so can be watched with epoll (or poll or select)
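
Python's standard library has no inotify wrapper, so the following sketch calls the C functions through ctypes (Linux only). The helper names watch_dir and read_events are made up; the event layout (wd, mask, cookie, len, then the name) follows struct inotify_event. It also shows the last point above: the inotify fd is handed to select() like any other descriptor.

```python
import ctypes
import os
import select
import struct

IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

_libc = ctypes.CDLL(None, use_errno=True)   # symbols of the already-loaded libc
_HEADER = struct.Struct("iIII")             # wd, mask, cookie, len

def watch_dir(path, mask=IN_CREATE | IN_MODIFY | IN_DELETE):
    """Create an inotify instance watching one directory; return its fd."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    if _libc.inotify_add_watch(fd, os.fsencode(path), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd

def read_events(fd, timeout_s):
    """Wait on the inotify fd with select(), then return (mask, name) tuples."""
    readable, _, _ = select.select([fd], [], [], timeout_s)
    if not readable:
        return []
    buf = os.read(fd, 4096)
    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, name_len = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        name = buf[offset:offset + name_len].rstrip(b"\0")   # name is NUL-padded
        offset += name_len
        events.append((mask, os.fsdecode(name)))
    return events
```

This watches a single directory, non-recursively; a recursive watcher would have to add one watch per subdirectory and deal with the race described above.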


For CLI tools, see e.g. the inotifywait, inotifywatch section below.



Applicable limits

Some tuning you may need to do:

  • fs.inotify.max_user_watches
how many filesystem items can be watched (per user?)
may default to as low as 8192, but recentish installs set it vaguely proportionate to your amount of RAM, e.g. 60000
for some purposes (e.g. sync clients when you keep lots of small files) you may need tens of thousands or more
actual use costs ~1KB (of unswappable kernel memory) per watch, so when you actually start watching a million files that means ~1GB of physical RAM
(when you actually watch that many files, also look at the CPU overhead)
Note that multiple processes watching the same files/dirs is almost free in terms of watches, because you get references to existing watches
  • fs.inotify.max_user_instances
defaults to something like 128
since there are rarely that many individual things watching your files, you don't often need to increase it.
  • fs.inotify.max_queued_events
defaults to something like 16384 (times ~272 bytes each is ~4MB)
how many un-consumed events to keep (per inotify instance) before further events are dropped -- assuming the applications using them are paying active attention, you shouldn't need this very high
See about your clients before you increase this, because if the reason is that they are consuming slower than inotify is generating, a larger queue doesn't really help (for very long)
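
To check what a given machine is set to, a small sketch that reads the three values from /proc and redoes the memory estimate from above. The function names are made up, and the 1KB-per-watch figure is just the rough number quoted above, not something measured here.

```python
import os

def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return {name: int} for the inotify sysctl files found under base."""
    limits = {}
    for name in ("max_user_watches", "max_user_instances", "max_queued_events"):
        try:
            with open(os.path.join(base, name)) as f:
                limits[name] = int(f.read().strip())
        except (OSError, ValueError):
            pass  # not Linux, or not readable
    return limits

def watch_memory_bytes(n_watches, bytes_per_watch=1024):
    """Rough kernel memory estimate; bytes_per_watch is an assumed ballpark."""
    return n_watches * bytes_per_watch

print(read_inotify_limits())
print(watch_memory_bytes(1_000_000) / 2**30, "GiB for a million watches")
```

Changing the values is done with sysctl (e.g. fs.inotify.max_user_watches) rather than from code like this.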


If you want to know which processes are using inotify at all, you can get the number of instances (not watches) per process, with something like the following (thanks to [1]):

find /proc/*/fd -lname anon_inode:inotify 2>/dev/null |
  cut -d/ -f3 |
  xargs -I '{}' -- ps --no-headers -o '%p %U %c' -p '{}' |
  uniq -c |
  sort -nr


Seeing a watch count, or a list of all watches, is much harder.

The API doesn't seem to provide either, so this takes some work.

People suggest lsof for the count, although it seems this isn't universal (verify)

You can strace a process to look for inotify_add_watch, but this is only directly useful if you restart it.
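
One more avenue, offered as an assumption to verify on your own kernel: proc(5) describes /proc/PID/fdinfo/FD for an inotify descriptor as containing one line starting with "inotify wd:" per watch (on reasonably recent kernels). If that holds, counting those lines gives a per-process watch count. The function names below are made up for this sketch.

```python
import glob
import os

def count_watch_lines(fdinfo_text):
    """Count the 'inotify wd:...' lines in the text of one fdinfo file."""
    return sum(1 for line in fdinfo_text.splitlines() if line.startswith("inotify "))

def watches_per_pid():
    """Return {pid: watch_count} for the processes we are allowed to inspect."""
    counts = {}
    for fd_path in glob.glob("/proc/[0-9]*/fd/*"):
        try:
            if os.readlink(fd_path) != "anon_inode:inotify":
                continue
            # fd_path looks like /proc/<pid>/fd/<fd>
            _, _, pid, _, fd = fd_path.split("/")
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                n = count_watch_lines(f.read())
        except OSError:
            continue  # process exited, or no permission
        counts[int(pid)] = counts.get(int(pid), 0) + n
    return counts
```

Without root you will only see your own processes. The lines identify watches by inode number rather than by path, so this gives a count, not a readable list.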

What things you can ask for

The mask is used both to

tell the kernel which events we want sent to us (when adding a watch), and
tell us what exactly each event we receive was about


A few are request-only:

IN_ONESHOT            only send event once
IN_ONLYDIR            only watch the path if it is a directory
IN_DONT_FOLLOW        don't follow a symlink
IN_EXCL_UNLINK        exclude events on unlinked objects
IN_MASK_ADD           add to the mask of an already existing watch


A few are response-only: (verify)

IN_ISDIR              event occurred against dir


A few are special-case events:

IN_Q_OVERFLOW    event queue overflowed. Lets your app e.g. log the error that it needs to read them faster
                 (and maybe raise fs.inotify.max_queued_events)


Most of them relate to accesses/alterations you may want to look for, so can be used in both the request and the result (verify)

IN_CREATE         File/directory created in watched directory, e.g., open() with O_CREAT,
                  mkdir(), link(), symlink(), bind() on a domain socket

IN_ATTRIB         metadata change, e.g. chmod(), chown(), utimensat(), setxattr(), 
                  link count (link()/unlink()) since Linux 2.6.25

IN_OPEN           file/directory opened
IN_ACCESS         e.g. read(2), execve(2)
IN_MODIFY         e.g. write(), truncate()

IN_CLOSE_WRITE    File opened for writing was close()d
IN_CLOSE_NOWRITE  File or directory not opened for writing was close()d

IN_DELETE        file/directory deleted from watched directory
IN_DELETE_SELF   watched file/dir was deleted   (including 'moved to another filesystem')

IN_MOVE_SELF     watched file/directory was itself moved

IN_UNMOUNT       when filesystem backing watches is unmounted

IN_IGNORED       watch is removed (e.g. directly after a delete, unmount, move(verify))
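
Since a received mask is a set of bits, a small decoder helps when logging events. The values below are the ones from Linux's sys/inotify.h; IN_MOVED_FROM and IN_MOVED_TO are not in the lists above but are included for completeness. The name decode_mask is made up for this sketch.

```python
IN_FLAGS = {
    0x00000001: "IN_ACCESS",
    0x00000002: "IN_MODIFY",
    0x00000004: "IN_ATTRIB",
    0x00000008: "IN_CLOSE_WRITE",
    0x00000010: "IN_CLOSE_NOWRITE",
    0x00000020: "IN_OPEN",
    0x00000040: "IN_MOVED_FROM",
    0x00000080: "IN_MOVED_TO",
    0x00000100: "IN_CREATE",
    0x00000200: "IN_DELETE",
    0x00000400: "IN_DELETE_SELF",
    0x00000800: "IN_MOVE_SELF",
    0x00002000: "IN_UNMOUNT",
    0x00004000: "IN_Q_OVERFLOW",
    0x00008000: "IN_IGNORED",
    0x40000000: "IN_ISDIR",
}

def decode_mask(mask):
    """Return the names of all known bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(IN_FLAGS.items()) if mask & bit]

# e.g. a mkdir inside a watched directory sets both a 'what' and an 'is dir' bit
print(decode_mask(0x40000100))
```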

fanotify

Availability: Linux

API that mostly just intercepts open, read, write, and close syscalls (note: originally not create, delete, move; kernels since around Linux 5.1 can report those as well).

Applies to all objects on a filesystem.


When your aim is to get reports of all changes in directory trees (rather than changes to specific files), this can be more lightweight than inotify(verify).


As an API this also lets you insert code before the real access happens, e.g. to delay or deny access, which can be handy for use cases like virus scanning.

kqueue

Availability: BSD, OSX


FSEvents

Availability: OSX since 10.5 (Leopard), with per-file events since 10.7 (Lion)(verify)

File System Events, a.k.a. fsevents, fseventsd

https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/UsingtheFSEventsFramework/UsingtheFSEventsFramework.html

Libraries - some of which talk to multiple underlying APIs

fam

Doesn't scale very well (TODO: explain which conditions)



Gamin

Separate implementation of a subset of FAM.

On Linux it uses inotify or dnotify, on BSD it uses kqueue/kevent.

Doesn't always scale very well (TODO: explain which conditions)




libevent

Released in 2000

Frontend to select, poll, epoll, kqueue and/or /dev/poll
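
This is not libevent itself, but Python's standard selectors module follows the same "one frontend, best available backend" idea, so it can serve as an illustration. DefaultSelector picks epoll, kqueue, /dev/poll, poll or select depending on the platform. The helper name ready_labels and the label string are made up.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # backend is chosen per platform
a, b = socket.socketpair()
sel.register(a, selectors.EVENT_READ, data="socket a")

def ready_labels(timeout_s):
    """Return the data labels of registered objects that are ready to read."""
    return [key.data for key, _events in sel.select(timeout_s)]

b.send(b"x")
print(type(sel).__name__, ready_labels(0.1))
```

The calling code stays the same whichever backend ends up being used, which is the main selling point of this kind of library.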



libev

Released in 2007

"Full-featured high-performance event loop loosely modelled after libevent", and has emulation layer for libevent, and was apparently meant as a cleaner alternative.

Can use epoll, kqueue, select, poll (no Windows)




libuv

Developed for node.js to abstract non-blocking IO, now used more widely.

Can use epoll, kqueue, IOCP, event ports.


See also:

http://docs.libuv.org/en/v1.x/

libae

Made for redis.

Can use epoll, kqueue, event ports, select


Boost asio

CLI tools

fswatch

Basically unifies:

inotify (Linux),
File System Events (OSX),
kqueue (BSD, OSX),
Solaris/Illumos File Events Notification,
stat()-based scan-and-difference,
ReadDirectoryChangesW (Windows)


For example:

fswatch -0 /var/log/ | xargs -0 -n 1 ls -l




inotifywait, inotifywatch

inotifywait

  • wait for a change of the type specified, then
    • exits (default), or
    • prints out (-m / --monitor)

When you want to parse the output, -q is useful to remove the "establishing watches" and such, -e to listen for specific events, and use something like read to separate out the filename and event name fields, for example:

inotifywait -m -q -e modify /var/log/syslog |
while read -r filename event; do
  ls -l "${filename}"
done


inotifywatch - establishes watches with inotify, counts and summarizes events received (for some time or until a signal like Ctrl-C)

useful for filesystem usage statistics

Example: (note that -r will watch only directories, not files, not because it couldn't but because it's easy to run out of watches)

inotifywatch -r /var/log

Then after a Ctrl-C some time later:

total  access  modify  close_write  close_nowrite  open  filename
2088   1553    249     1            142            143   /var/log/
1064   345     273     0            223            223   /var/log/apache2/
96     48      0       0            24             24    /var/log/dist-upgrade/
51     6       31      4            3              7     /var/log/munin/
48     24      0       0            12             12    /var/log/samba/cores/
39     15      0       0            12             12    /var/log/cups/
38     18      0       1            9              10    /var/log/samba/
36     12      0       0            12             12    /var/log/mysql/
34     10      14      0            5              5     /var/log/journal/16e361e7d1b2e06a9a71a09e544c11b1/
32     16      0       0            8              8     /var/log/journal/
24     12      0       0            6              6     /var/log/installer/
14     6       2       0            3              3     /var/log/atop/

watchman

inotify, FSEvents/kqueue for OSX, and Windows in theory

https://facebook.github.io/watchman/



Windows

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
  • Completion Ports (IOCP)
very nice for watching specific files or connections, not so much for directories (verify)


  • C++: FindFirstChangeNotification + ReadDirectoryChangesW
under some well known but complex-to-describe conditions it will fail to send some notifications
so it doesn't scale very well


  • .NET's FileSystemWatcher
seems to be built directly on the above, so equally flaky
...so you may want to use periodic scans to make sure you're up-to-date.


  • USN Change Journals
https://msdn.microsoft.com/en-us/library/aa363798(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/aa363803(VS.85).aspx


  • file system filter
basically a kernel-level driver, so must be good quality.

