Python usage notes/joblib

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Joblib is a way to

serialize jobs,
execute them on demand,
execute them in parallel, and
memoize results to avoid double work.

There's also some file-memmapping functionality (mostly for sharing read-only data).
There's also some shared-memory functionality.
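As a rough illustration of the memmapping side, here is a minimal sketch, assuming numpy is installed and using a made-up temporary file name: an array is dumped uncompressed and then loaded back as a read-only memory map rather than being read into memory.

```python
import os
import tempfile

import numpy as np
import joblib

# Hypothetical example data and file name, purely for illustration.
big = np.arange(1000, dtype=np.int64)
path = os.path.join(tempfile.mkdtemp(), "big.joblib")

# Dump without compression; compressed files cannot be memory-mapped.
joblib.dump(big, path)

# mmap_mode="r" asks for a read-only memory map instead of an in-memory copy.
view = joblib.load(path, mmap_mode="r")

# Only the parts that are actually touched need to be read from disk.
total = int(view[:10].sum())
```

The same mmap_mode values as numpy's own memmap apply here; "r" is the one that matches the read-only sharing use mentioned above.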

It has its own way of doing things, but also has a few options that let you bolt it onto existing code.

It's numpy-aware - and should deal okayish with large arrays, compressing them where that's easy.

Memoization

joblib.Memory is disk-backed memoization, which you

could decorate functions with

could apply more explicitly, wrapping a specific function by hand, to do the occasional checkpoint

could wrap around the functions you hand to every Parallel call
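The first two of these can be sketched as follows. This is a minimal example, not the only way to set it up: the cache directory is a throwaway temp dir, and slow_square, slow_cube and the calls list are made-up names used to show that the second call never runs the function body.

```python
import tempfile

from joblib import Memory

# Throwaway cache location; a real project would use a persistent directory.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)

calls = []  # records each time a function body really executes


@memory.cache
def slow_square(x):
    # Hypothetical stand-in for an expensive computation.
    calls.append(("square", x))
    return x * x


def slow_cube(x):
    calls.append(("cube", x))
    return x * x * x


# Explicit wrapping, for when you only want caching at particular points.
cached_cube = memory.cache(slow_cube)

first = slow_square(4)    # computed, then stored on disk
second = slow_square(4)   # read back from disk, body not executed
cube_a = cached_cube(3)
cube_b = cached_cube(3)
```

Since the results live on disk, they also survive between interpreter runs as long as the cache directory sticks around.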

Parallel execution

joblib.Parallel can use one of several backends:

threading

lower overhead, but not always faster (consider GIL stuff)

multiprocessing

more overhead, but can be safer.

loky[1]

like multiprocessing, with a few extra nice details; this is the default backend

(also capable of threading)
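Choosing between these is a matter of the backend argument. A small sketch, assuming a trivially cheap function (so there is no speedup to see here, only the mechanics):

```python
from math import sqrt

from joblib import Parallel, delayed

numbers = range(5)

# Threads: low overhead, but pure-Python work is still serialized by the GIL.
threaded = Parallel(n_jobs=2, backend="threading")(
    delayed(sqrt)(n) for n in numbers
)

# Separate worker processes via loky: more startup and pickling overhead,
# but no GIL contention between workers.
processed = Parallel(n_jobs=2, backend="loky")(
    delayed(sqrt)(n) for n in numbers
)
```

Both calls return results in input order, so the two lists should be identical. There is also a softer prefer="threads" / prefer="processes" argument, which states a preference rather than forcing a specific backend.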

joblib.dump() and joblib.load() help serialize numpy data. They build on pickle, so they handle what pickle handles, with special treatment (and optional compression) for large arrays.

This is however not a portable format: like pickles in general, compatibility between Python versions is not guaranteed, so it is safest to load files with the same Python (and library) versions that wrote them. (verify)
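A round trip looks roughly like this. It is a minimal sketch, assuming numpy is installed; the dictionary contents and the file name are made up for illustration.

```python
import os
import tempfile

import numpy as np
import joblib

# Hypothetical object mixing a numpy array with plain Python values.
data = {
    "weights": np.arange(6, dtype=np.float64).reshape(2, 3),
    "label": "demo",
}

path = os.path.join(tempfile.mkdtemp(), "data.joblib")

# compress is optional; 0-9 trades file size against speed.
joblib.dump(data, path, compress=3)

restored = joblib.load(path)
```

As with pickle, only load files you trust, since loading can execute arbitrary code.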

delayed() is basically a cleanish way to pass in the function and its arguments to Parallel, without accidentally calling it as a function and doing that work in the main interpreter.

For example, the example from [2] is trying to parallelize

[sqrt(i ** 2)  for i in range(10)]

which works out as something like

Parallel( n_jobs=2 )( delayed(sqrt)(i ** 2)  for i in range(10) )

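(Note that i ** 2 is still computed in the main process: the arguments are evaluated as the generator is consumed, and only the sqrt calls themselves are handed to workers.)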
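To make that concrete, here is a small sketch that first looks at what delayed() actually produces, then runs the example above with its imports in place:

```python
from math import sqrt

from joblib import Parallel, delayed

# delayed(sqrt)(...) does not call sqrt; it just packs up the call.
# The argument expression 3 ** 2 is evaluated right here, though.
task = delayed(sqrt)(3 ** 2)
func, args, kwargs = task  # a (function, args, kwargs) tuple

# The full example: workers receive (sqrt, (i ** 2,), {}) and do the calling.
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
```

If the argument computation is itself the expensive part, it would have to move inside the function being delayed for it to run in the workers.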

https://joblib.readthedocs.io/en/latest/
