Python usage notes/joblib
|This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)|
Joblib is a way to serialize jobs and execute them on demand and in parallel. It also offers memoization to avoid double work.
It has a few options that let you bolt it onto existing code, rather than requiring you to write code towards a specific API.
It's numpy-aware - and should deal okayish with large arrays, compressing them where that's easy.
joblib.Memory is disk-backed memoization, which you could
- use as a decorator on functions
- wrap around a function more explicitly, to do the occasional checkpoint
- wrap into every Parallel call
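A minimal sketch of the decorator and explicit-wrap uses of joblib.Memory. The temporary cache directory and the names slow_square, cube, and calls are made up for illustration; a real project would probably point Memory at a persistent directory.

```python
import tempfile
from joblib import Memory

# Hypothetical cache location; a throwaway temp dir keeps the sketch self-contained.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

calls = []  # records each time the function body actually runs


@memory.cache
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed, result stored on disk
second = slow_square(4)  # same argument: loaded from the cache, body not run again


# The more explicit form: wrap an existing function without decorating it.
def cube(x):
    return x ** 3


cached_cube = memory.cache(cube)
cubed = cached_cube(3)
```

The cache is keyed on the function and its arguments, so calling slow_square with a different argument would run the body again.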
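joblib.Parallel can run jobs in threads or in separate processes, depending on the backend:
- threading: lower overhead, but not always faster (consider GIL stuff)
- multiprocessing: more overhead, but can be safer.
- loky(verify): like multiprocessing with a few extra nice details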
- (also capable of threading)
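As a rough sketch of that threads-versus-processes choice: Parallel accepts a backend argument, and leaving it out uses joblib's default process-based backend. The input list here is made up for illustration.

```python
from math import sqrt
from joblib import Parallel, delayed

numbers = [1, 4, 9]  # illustrative input

# Default backend: worker processes (more startup overhead, sidesteps the GIL).
by_processes = Parallel(n_jobs=2)(delayed(sqrt)(n) for n in numbers)

# Thread-based backend: cheap to start, but pure-Python CPU work
# still contends for the GIL.
by_threads = Parallel(n_jobs=2, backend="threading")(
    delayed(sqrt)(n) for n in numbers
)
```

Both calls return results in input order. Which one is faster depends on whether the work releases the GIL (much of numpy and I/O does) and on how expensive the arguments are to ship to another process.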
delayed() is basically a cleanish way to pass in the function and its arguments to Parallel, without accidentally calling it as a function and doing that work in the main interpreter.
For example, the example from  is trying to parallelize
[sqrt(i ** 2) for i in range(10)]
which works out as something like
Parallel( n_jobs=2 )( delayed(sqrt)(i ** 2) for i in range(10) )
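(Note that i**2 is still computed in the main interpreter: the generator evaluates each argument before delayed(sqrt) captures it, so only the sqrt call itself is handed to a worker.)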
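A self-contained version of that example, with the imports it needs. The peek at what delayed() returns relies on current joblib behaviour (a plain tuple of function, positional arguments, and keyword arguments), which is an implementation detail rather than something to build on.

```python
from math import sqrt
from joblib import Parallel, delayed

# delayed(sqrt)(...) does not call sqrt; it only packages the call.
# The argument 3 ** 2 has already been evaluated here, in the main interpreter.
task = delayed(sqrt)(3 ** 2)

# Plain sequential version.
sequential = [sqrt(i ** 2) for i in range(10)]

# Same work, with each sqrt call handed to one of two workers.
parallel = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
```

If the argument computation is the expensive part, it would have to move inside the function being delayed for the workers to do it.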