Python usage notes/joblib
|This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)|
joblib.Memory is disk-backed memoization. You can:
- decorate functions with it
- wrap it around calls more explicitly, to do the occasional checkpoint
- wrap it into every Parallel call
It's numpy-aware (and should deal okayish with large arrays).
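A minimal sketch of the three uses above, assuming a throwaway cache directory from tempfile. The function names (slow_square, slow_cube, cached_cube) are hypothetical, and the calls list is only there to show when a function body actually runs.

```python
import shutil
import tempfile

from joblib import Memory, Parallel, delayed

# Throwaway cache directory for this sketch; real use would pick a fixed
# path so the cache survives between runs.
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

calls = []  # records every time a function body really executes

# 1. Decorate: every call goes through the on-disk cache.
@memory.cache
def slow_square(x):
    calls.append(("slow_square", x))
    return x * x

# 2. Wrap explicitly, e.g. to checkpoint one expensive step without
#    touching the function's definition.
def slow_cube(x):
    calls.append(("slow_cube", x))
    return x ** 3

cached_cube = memory.cache(slow_cube)

first = slow_square(3)   # computed, result written to cache_dir
second = slow_square(3)  # loaded from cache_dir, body does not run
cube = cached_cube(2)

# 3. Use the cached function inside Parallel. The threading backend is used
#    here only so the `calls` list is shared with this process; the cache
#    itself is on disk, so process-based workers could reuse it too.
results = Parallel(n_jobs=2, backend="threading")(
    delayed(slow_square)(x) for x in [3, 4]
)

shutil.rmtree(cache_dir, ignore_errors=True)  # clean up the sketch's cache
```

Only slow_square(4) is new work in the Parallel call; the x=3 task is served from the cache.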
(On Parallel backends: threading often has lower overhead when the function you call is a compiled extension that releases the GIL while it works. Process-based backends (multiprocessing, or joblib's default, loky) have more overhead, but are better when the function holds the GIL for most of its run, whether it is pure Python or an extension.)
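A rough sketch of picking a backend per call. math.sqrt stands in for "something cheap per call", and cpu_bound_python is a hypothetical pure-Python function that holds the GIL; this shows the switch, not a benchmark.

```python
import math

from joblib import Parallel, delayed

def cpu_bound_python(n):
    # Pure-Python loop: holds the GIL throughout, so threads would take
    # turns on it, while separate processes really run it in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

# Thread-based: cheap to start, no pickling of arguments or results.
# Pays off mainly when the real work happens in GIL-releasing code.
thread_results = Parallel(n_jobs=2, backend="threading")(
    delayed(math.sqrt)(x) for x in [4.0, 9.0, 16.0]
)

# Process-based (default loky backend): function, arguments and results
# are pickled and sent between processes, which costs more, but
# GIL-holding Python code actually runs concurrently.
process_results = Parallel(n_jobs=2)(
    delayed(cpu_bound_python)(n) for n in [10, 100, 1000]
)
```

Parallel(n_jobs=2, prefer="threads") is a softer way to ask for threads, which still lets a surrounding parallel_backend context override the choice.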
This is not a portable format: it is pickle-based, and cloudpickle in particular is only guaranteed to work with the exact same version of Python. (verify) In particular, creating a cache in one major version and loading it in another is likely to break (e.g. ValueError: unsupported pickle protocol: 3 means it was created in py3 and is being loaded in py2).
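Where that error message comes from, sketched with the plain pickle module rather than joblib itself: protocol 2 and later pickles start with a PROTO opcode (byte 0x80) followed by the protocol number, and Python 2 only knows protocols 0 to 2, so it rejects a protocol-3 header outright. pickle_protocol is a hypothetical helper for peeking at that header.

```python
import pickle

def pickle_protocol(data):
    # Protocol 2+ pickles begin with the PROTO opcode (0x80) and then the
    # protocol number. Protocols 0 and 1 have no such header, so we can't
    # tell them apart this way and return None.
    if data[:1] == b"\x80":
        return data[1]
    return None

p3 = pickle.dumps([1, 2, 3], protocol=3)  # what a py3 process could write
p2 = pickle.dumps([1, 2, 3], protocol=2)  # the highest py2 can read
p0 = pickle.dumps([1, 2, 3], protocol=0)  # old text-based protocol
```

Python 3 writes protocol 3 or higher by default, which is why data written in py3 tends to trip this in py2. Going the other way (old pickles into newer Python) usually works at the protocol level, though the pickled objects themselves may still fail to load.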
delayed() is basically a cleaner-looking way to pass in the function and its arguments to Parallel, without accidentally doing the work in the main interpreter.
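A small sketch of what delayed() does and does not do; record is a hypothetical function. In the joblib versions I know of, calling the wrapper just returns a (function, args, kwargs) tuple, but treat that as an implementation detail rather than API.

```python
from joblib import Parallel, delayed

calls = []

def record(x, y=0):
    calls.append((x, y))
    return x + y

# delayed() wraps the function; calling the wrapper captures the arguments
# instead of running record.
task = delayed(record)(2, y=3)
nothing_ran_yet = calls == []

# Implementation detail (not a documented API): a plain tuple.
func, args, kwargs = task

# Parallel is what actually calls the function. The threading backend is
# used only so the `calls` list is visible from this process.
results = Parallel(n_jobs=2, backend="threading")(
    delayed(record)(x, y=10) for x in range(3)
)
```

Results come back in task order even if the workers finish out of order.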
For example, the example from  parallelizes
from math import sqrt
[sqrt(i ** 2) for i in range(10)]
which becomes something like
from joblib import Parallel, delayed
Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
(Note that i ** 2 is still computed in the main process, as the generator builds each task; only the sqrt calls are sent to the workers.)
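A sketch that makes the argument evaluation visible by recording process IDs. make_arg and sqrt_of_square are hypothetical helpers: the first stands in for i ** 2, the second shows the usual fix of moving the argument computation into the delayed function so it runs in the workers as well.

```python
import os
from math import sqrt

from joblib import Parallel, delayed

main_pid = os.getpid()
arg_pids = []

def make_arg(i):
    # Runs wherever the generator expression is consumed: the process that
    # called Parallel, not a worker. (If it ran in a worker, this append
    # would never show up in our list.)
    arg_pids.append(os.getpid())
    return i ** 2

# make_arg(i), like i ** 2, is evaluated while each task is being built;
# only sqrt itself is shipped to the workers.
results = Parallel(n_jobs=2)(delayed(sqrt)(make_arg(i)) for i in range(10))

# To push the argument work into the workers too, put it inside the
# function being delayed.
def sqrt_of_square(i):
    return sqrt(i ** 2)

results_in_workers = Parallel(n_jobs=2)(
    delayed(sqrt_of_square)(i) for i in range(10)
)
```

For something as cheap as i ** 2 this makes no difference, but it matters when building the arguments is the expensive part.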