Python usage notes/joblib
|This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)|
Joblib is a way to
- serialize jobs,
- execute them on demand,
- execute them in parallel, and
- memoize results to avoid double work.
There's also some file-memmapping functionality (mostly to share read-only data), and some shared-memory functionality.
It has its own way of doing things, but also has a few options that let you bolt it onto existing code.
It's numpy-aware - and should deal okayish with large arrays, compressing them where that's easy.
joblib.Memory is disk-backed memoization, which you
- could apply to functions as a decorator
- could wrap around specific calls more explicitly, to do the occasional checkpoint
- could wrap around the function used in every Parallel call
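The decorator and the more explicit wrapping can be sketched roughly as follows. This is a minimal sketch, not a recommended setup: the temp-directory cache location and the names slow_square, cube, and calls are made up for illustration.

```python
import tempfile
from joblib import Memory

# Hypothetical cache location: a throwaway temp directory for this sketch.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

calls = []  # records which inputs were actually computed


@memory.cache
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed, and the result is written to disk
second = slow_square(4)  # same argument, so read back from the disk cache

# The more explicit form: wrap an existing function at the point of use.
def cube(x):
    return x ** 3


cached_cube = memory.cache(cube)
checkpoint = cached_cube(3)
```

Because the cache lives on disk, it survives between runs for as long as the cache directory does, which is what makes it usable as an occasional checkpoint.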
joblib.Parallel is
- like multiprocessing, with a few extra nice details
- also capable of threading
It can run work in
- threads: lower overhead, but not always faster (consider the GIL)
- processes: more overhead, but can be safer.
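The thread-versus-process choice can be sketched with Parallel's prefer argument. This is a minimal sketch, assuming a joblib version recent enough to have prefer; the tiny sqrt workload is only there to show the calls, and is far too small to show any speedup.

```python
from math import sqrt
from joblib import Parallel, delayed

# Threads: cheap to start, but pure-Python work still contends for the GIL.
threaded = Parallel(n_jobs=2, prefer="threads")(
    delayed(sqrt)(i) for i in range(5)
)

# Processes (the default preference): arguments and results are pickled
# and sent between processes, which costs more but isolates the workers.
procs = Parallel(n_jobs=2, prefer="processes")(
    delayed(sqrt)(i) for i in range(5)
)
```

Either way, the results come back as a list in the same order as the tasks were submitted.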
delayed() is basically a cleanish way to pass in the function and its arguments to Parallel, without accidentally calling it as a function and doing that work in the main interpreter.
For example, suppose we are trying to parallelize
[sqrt(i ** 2) for i in range(10)]
which works out as something like
Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
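(Note that i ** 2 is still computed in the main process: it is an ordinary argument expression, so it is evaluated before delayed(sqrt) packages up the call. Only sqrt itself runs in the workers.)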
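A runnable version of the above, as a minimal sketch. It also peeks at what delayed() hands to Parallel: in the joblib versions I know of this is a plain (function, args, kwargs) tuple, but treat that as an implementation detail rather than a documented contract.

```python
from math import sqrt
from joblib import Parallel, delayed

# delayed(sqrt)(...) does not call sqrt; it only packages the call.
# The argument expression 3 ** 2 is evaluated right here, in the caller.
task = delayed(sqrt)(3 ** 2)
func, args, kwargs = task

# The parallel version, and the serial list comprehension it replaces.
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
serial = [sqrt(i ** 2) for i in range(10)]
```

If the argument computation is itself the expensive part, it would have to move inside the function that is handed to delayed() for it to run in the workers.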