Python notebook notes
IPython
IPython is an interactive shell more capable than python's own, and started the notebook thing that later became jupyter.
It e.g.
- has better history than py2's shell did (py3's is better that way), completion, and other interactivity
- introduced notebooks - served via browser, allows embedded code, text, plots, mathematical expressions
- integrates with some interactive data visualization
- integrates with GUI toolkits
- makes it easier to embed an interpreter into your own project
- hooks in some profiling, via its magic functions:
- %time - how much time (one run)
- %timeit - how much time (in a bunch of runs, at least a second's worth?(verify))
- %prun - how much time, per function
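- %lprun - how much time, per line (needs the line_profiler extension)
- %mprun, %memit - how much memory, per line of a function and for a single statement respectively (need the memory_profiler extension)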
- some tools for parallel computing (due to itself being abstracted out this way)
See also:
- http://ipython.org/
- http://en.wikipedia.org/wiki/IPython
- http://pynash.org/2013/03/06/timing-and-profiling
jupyter
Jupyter is a more backend-agnostic framework/protocol than it already was when it was still called IPython notebooks (it speaks largely JSON over 0MQ).
IPython is just one of its possible kernels/backends
- see this list for more.
- this includes some things that basically just expose an existing CLI (see wrapper kernels)
Jupyter is mostly known for carrying on the notebook thing.
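To make "JSON over 0MQ" a little more concrete, here is a rough sketch of the dict parts of an execute_request message as I understand the messaging spec. It is simplified: the real wire format also carries routing identities, a delimiter, and an HMAC signature, and it goes over a ZeroMQ socket, none of which is shown. The function name make_execute_request is made up for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="user"):
    """Build the four dict parts of a (simplified) execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,          # unique per message
        "session": session_id,               # unique per client session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                    # protocol version, an assumption here
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


message = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)

# Each of the four parts is serialized to JSON separately and sent as its own frame.
frames = [
    json.dumps(message[part]).encode("utf-8")
    for part in ("header", "parent_header", "metadata", "content")
]
```

Any kernel that answers messages shaped like this can sit behind the same notebook frontend, which is why IPython is only one backend among many.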
jupyter Qt console
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me) |
Most people care about notebooks, so skip to the next section.
The Qt console is a similar idea to notebooks, but is closer to integrating with (Qt) apps, which is sometimes very useful.
jupyter qtconsole
https://qtconsole.readthedocs.io/en/stable/
jupyter notebooks
Python notebooks mean you have a web interface, that is actually speaking to a backend, a.k.a. kernel, of (usually) python.
Compared to a plain CLI, notebooks...
- save the interactions you've submitted, in a more document-like way
- are more visual than the shell (there are various things rendered as more than text)
- are easier to work with remotely
- make it easier to persist the interpreter behind it (to a degree)
This means
- less typing and more prettiness while you're doing plotting, math
- you can copy the notebooks, so more easily write things like tutorials that others can load and use
- you can set up more complex code and bootstrap other people to do similar experiments
For some examples, see e.g. https://github.com/jupyter/jupyter/wiki
There are now also "publish notebook to site", "store/load to code repository",
and other additions.
Basic use
Notebooks get stored in the directory where you run:
jupyter notebook
The only keyboard shortcut you really need to know is Shift+Enter (Run, go to next cell), but you may like to skim over Help → keyboard shortcuts at least once.
Notes:
- Execution happens as the user that started it. Permissions are also tied to them.
How and where to run it
Running it locally
Basically the default.
It'll launch a browser for you.
By default it binds to 127.0.0.1 (port 8888), meaning nothing on other hosts can connect to it.
Running it remotely
If you are the only real user on another host (e.g. your own computer, your own server),
The integration in some development environments (e.g. VS code's) amounts to the same thing.
Some additions that you can't do on basic python
Specially rendered objects
When showing objects, jupyter will try to show things with more than a basic
Things that do this out of the box
Libraries may have implemented this already
- PIL itself implements _repr_png_
- pandas itself implements _repr_html_ (and _repr_latex_)
- ...and so on
matplotlib
Jupyter seems to itself special-case matplotlib objects, though the precise integration varies a little,
also because you may actually want to avoid that in some cases.
https://ipython.readthedocs.io/en/stable/interactive/plotting.html#id1
Things you can get with minimal suggestion
You can force such rendering modes yourself (for HTML, SVG, LaTeX, Markdown, Video) via cell magic like
%%html
<a href="http://example.com">link</a>
OR code-wise, using an existing class from IPython.display
from IPython.display import HTML, display
display(HTML('<a href="http://example.com">link</a>'))   # the display() is optional but apparently solves some interactivity
For Markdown, you would typically set the cell type to Markdown instead (Cell → Cell Type → Markdown).
Rolling your own
If an object has an attribute called _repr_pretty_, _repr_svg_, _repr_png_, _repr_jpeg_, _repr_html_, _repr_javascript_, _repr_markdown_, _repr_latex_, or _repr_mimebundle_, it will be called.
These are still pre-made renderers, in that you mostly just give it data in that format, and it gets rendered by existing IPython code for that type.
Similar for an attribute called _ipython_display_, (verify)
You can create your own wrappers with such a repr.
import io
import numpy
from PIL import Image

class NumpyImage(object):
    ''' Takes an array of values assumed to have useful values in 0..255 value range
        and returns it as a grayscale image.
        Each cell is represented by an 8-by-8 pixel square (by default) for visibility
    '''
    def __init__(self, ary, pixelsize=8):
        self.ary = numpy.array(ary)       # copy
        self.ary[self.ary < 0] = 0        # clamp (before type conversion)
        self.ary[self.ary > 255] = 255
        self.pixelsize = pixelsize

    def _repr_png_(self):
        im = Image.fromarray( numpy.uint8(self.ary) )
        if self.pixelsize != 1:
            im = im.resize( (im.size[0]*self.pixelsize, im.size[1]*self.pixelsize), resample=Image.NEAREST )
        by = io.BytesIO()
        im.save(by, 'png')
        return by.getvalue()

# test:
NumpyImage( numpy.random.rand(30,60)*255 )
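The same idea works without any image library if you return markup instead. A minimal sketch, assuming you just want a dict shown as a table; the class name KeyValueTable is made up for illustration, and outside a notebook you only get the plain __repr__.

```python
import html


class KeyValueTable:
    """Hypothetical wrapper: shows a dict as an HTML table in a notebook."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)  # copy

    def __repr__(self):
        # plain-text fallback, used by shells that do not look for _repr_html_
        return "KeyValueTable(%r)" % (self.mapping,)

    def _repr_html_(self):
        # escape keys and values so data cannot inject markup
        rows = "".join(
            "<tr><th>%s</th><td>%s</td></tr>"
            % (html.escape(str(key)), html.escape(str(value)))
            for key, value in self.mapping.items()
        )
        return "<table>%s</table>" % rows


table = KeyValueTable({"name": "<example>", "size": 3})
markup = table._repr_html_()
```

Putting table as the last expression in a cell should then show the table rather than the repr string.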
See also:
Widgets
magic
- cell magic: basically, "does the first line of the cell look like %%thing?" Then read the rest of that line as text, and the rest of the cell as text, and hand both to the registered function
- line magic: basically, "does the line look like %thing?" Then read the rest of that line as text and hand it to the registered function
- the rest of the cell is evaluated as usual(verify)
if automagic is enabled (which it is by default), you don't need to use the single percent for line magic.
- can be toggled via %automagic
- note that magic intentionally has lower priority than any registered variable names
- so that magic won't block you from being able to run code
- which instead means you can manage to mask out certain magic from being run
- (I prefer to use % to make it clear what is being invoked)
- https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-automagic
Note that magic means that text need not be python code at all (but often is, because it's easy to create "I no longer know how to do this in normal python" situations)
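The line/cell distinction described above can be sketched as a toy dispatcher. This is not how IPython implements it (IPython rewrites the input before running the cell as a whole); it only illustrates which text goes to which kind of function. All names here are made up.

```python
def dispatch(cell, line_magics, cell_magics, run_python):
    """Toy dispatcher: route a cell to cell magic, line magic, or plain code."""
    lines = cell.split("\n")
    first = lines[0]

    # Cell magic: only checked on the first line, and it swallows the whole cell.
    if first.startswith("%%"):
        name, _, rest_of_line = first[2:].partition(" ")
        body = "\n".join(lines[1:])
        return [cell_magics[name](rest_of_line, body)]

    # Otherwise go line by line: line magic gets the rest of its own line only.
    results = []
    for line in lines:
        if line.startswith("%"):
            name, _, rest_of_line = line[1:].partition(" ")
            results.append(line_magics[name](rest_of_line))
        else:
            results.append(run_python(line))
    return results


# Hypothetical magics, plus a stand-in for "evaluate as python"
line_magics = {"shout": lambda rest: rest.upper()}
cell_magics = {"count": lambda rest, body: (rest, len(body.split("\n")))}
echo = lambda source: ("python", source)

as_cell = dispatch("%%count lines\na\nb", line_magics, cell_magics, echo)
as_lines = dispatch("%shout hi\nx = 1", line_magics, cell_magics, echo)
```

Note that the cell magic receives its body as text without anyone checking that it is python, which is what makes things like %%html and %%bash possible.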
Built-in magic includes
Debugging, profiling
- %time - how much time (one run)
- %timeit - how much time (in a bunch of runs, at least a second's worth?(verify))
- %prun - how much time, per function
- -s controls what to sort on
- -r return the pstats.Stats object
- ...and more
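- %lprun - how much time, per line (needs the line_profiler extension)
- %mprun, %memit - how much memory, per line of a function and for a single statement respectively (need the memory_profiler extension)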
- %debug - interactive debugger
- %pdb - toggles automatically entering the debugger on an uncaught exception
- %tb - print last traceback
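If you want comparable numbers outside IPython, the standard library gets you most of the way. A rough sketch, with work() as a made-up stand-in for whatever you are measuring: timeit for the "run it a bunch of times" idea, and cProfile plus pstats for the per-function breakdown.

```python
import cProfile
import io
import pstats
import timeit


def work():
    """Stand-in for the code being measured."""
    return sum(i * i for i in range(1000))


# Roughly the %timeit idea: autorange() picks a loop count so that
# the whole run takes at least 0.2 seconds, then reports the total.
timer = timeit.Timer(work)
loops, total_seconds = timer.autorange()
seconds_per_loop = total_seconds / loops

# Roughly the %prun idea: profile one run, then sort and print per-function stats.
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
```

The sort_stats() argument plays the same role as %prun's -s, and the pstats.Stats object is what -r hands back.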
Input and output
- %pprint - whether to use pprint or repr
- %precision - precision used in pretty printing
- %config - lets you alter IPython's configuration
- %env, %set_env - get or set environment (mostly a slightly shorter version of altering os.environ)
System
- %pip - run the pip package manager, in the kernel's environment
- %conda - run that conda package manager
- ! and !! (%sx)
- cf. %system, %sc, %%bash, %%capture, %%script, %%pypy, %%sh
Shell-like things (and note that automagic applies, so you don't need the %) like
- %ls
- %pwd
- %cat
- %more
- %env
- %man
- %mkdir
- %cp
- %mv
- %rm
- %rmdir
IPython environment
- %logon, %logoff, %logstart - logging
- %recall, %rep -
- %reset
- %gui - GUI event loop integration stuff for Qt, gtk, wxPython, tk, and cocoa
- e.g. used for matplotlib windows if you use an interactive backend
- %pastebin
- specifier of what to save (defaults to 'everything so far'), can be
- input history range
- filename
- name of string/macro
- -d for a description
- -e argument for the timeout in days (defaults to 7)
https://ipython.readthedocs.io/en/stable/interactive/magics.html
custom magic
You can register your own named magic functions
https://ipython.readthedocs.io/en/stable/config/custommagics.html
Keep in mind that a kernel represents its own state - and can be restarted. Magic would not stay registered - but it's transparent because it's usually registered as part of library import (conditional on whether that library sees IPython).
The perhaps-cleaner, inversion-of-control way is to have a function called load_ipython_extension() in your extension, which takes one argument: the current InteractiveShell.
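A minimal sketch of such an extension module, assuming a made-up line magic called shout. The module itself does not need to import IPython, because the shell object is handed to it; register_magic_function() is a method on that InteractiveShell.

```python
def shout(line):
    """Line magic body: receives the rest of the line as a single string."""
    return line.upper()


def load_ipython_extension(ipython):
    """Called by IPython with the current InteractiveShell when the extension loads."""
    ipython.register_magic_function(shout, magic_kind="line", magic_name="shout")
```

If this lived in a module called, say, shoutmagic.py (hypothetical name), then %load_ext shoutmagic would register it, after which %shout hello should evaluate to 'HELLO'. After a kernel restart you would need to load it again.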
testing magic
https://pmbaumgartner.github.io/blog/testing-ipython-magics/
Shell access
You can run shell commands like:
!ls
and get their output, like
cwd = !pwd
files = !ls
...which are returned as an IPython.utils.text.SList which does whitespace splitting, and prints and acts like a list.
...but is actually an object that also has some convenience values/functions:
- .l or .list : value as list (the list itself).
- .n or .nlstr: basically '\n'.join(l) ? (verify)
- .s or .spstr: basically ' '.join(l) ? (verify) - so is the closest thing to the raw output(verify)
- .p or .paths: basically list(path.Path(v) for v in l) ? (verify)
- .grep(): returns .l grepped with a regex or callable
- .fields(): basically lets you return a ragged array - consider:
>>> cwd = !ls -l
>>> cwd.fields()
['total', '52'],
['drwxr-xr-x', '4', 'me', 'us', '4', 'Aug', '10', '17:38', 'file1'],
['drwxr-xr-x', '2', 'me', 'us', '3', 'Jun', '10', '20:29', 'file2'],
You can hand in variables like:
pyvar = '..'
!ls {pyvar}
or
!echo $pyvar
Notes:
- Somewhat confusingly, there are a few commands that seem to not need the ! (ls, cd, and more) --- because they are instead magic, and specifically automagic, meaning they don't need a %
- you might prefer these via ! if you want to capture their output.
- This is more a convenience than a real API
- it doesn't seem like you can get the raw output.
- if you want more control, you probably want to use subprocess anyway
- Don't expect interactive shell commands to work.
- Something that doesn't return without interaction will block the kernel.
- e.g. google colab lets you do this too (it's isolated to VMs anyway), which means it's actually quite flexible in what you can install
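For the "more control" case, a sketch of what the subprocess route looks like. You get the raw output as well as the list-of-lines and fields views that SList gives you. The command here is just python printing two lines, so that the example does not depend on which shell tools exist; run_lines is a made-up helper name.

```python
import subprocess
import sys


def run_lines(argv):
    """Run a command without a shell; return (raw stdout, list of lines)."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout, completed.stdout.splitlines()


# Stand-in for something like ['ls', '-l']
command = [sys.executable, "-c", "print('total 52'); print('file1  4 me')"]
raw, lines = run_lines(command)

# Roughly what .fields() does: whitespace-split each line
fields = [line.split() for line in lines]
```

check=True makes a non-zero exit status raise an exception rather than pass silently, and passing a list instead of a string avoids shell quoting issues.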
Security
Notebooks allow
- arbitrary code execution - whatever the kernel allows
- which includes shell access, as the starting user
- fairly arbitrary JS within the browser that's the client
As such
- you probably want to use auth
- never run notebooks you got from someone you don't trust
- jupyter and others may refuse to render HTML or JS in an existing notebook unless
- you execute/regenerate it yourself
- you started the notebook with 'trust'
- ...but those are easily done if you're just clicking things until they work.
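If you want to look before you click, note that a .ipynb file is JSON, so you can check what stored outputs it carries without opening it in a notebook at all. A rough sketch, assuming the version 4 notebook layout (cells, each with outputs, each with a data dict keyed by MIME type). The function names and the choice of which MIME types count as risky are mine, and this only looks at stored outputs, not at what the code would do when run.

```python
import json

# Assumption: these are the output types worth a second look
RISKY_MIME_TYPES = {"text/html", "application/javascript"}


def load_notebook(path):
    """Read a .ipynb file as plain JSON, without executing or rendering anything."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def risky_outputs(notebook):
    """Return (cell index, mime type) for stored outputs that a browser would render."""
    found = []
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):       # markdown cells have no outputs
            for mime in output.get("data", {}):      # stream/error outputs have no data
                if mime in RISKY_MIME_TYPES:
                    found.append((index, mime))
    return found


# Small in-memory example instead of a real file
example = {
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "hello"},
        {"cell_type": "code", "metadata": {}, "source": "x", "execution_count": 1,
         "outputs": [{"output_type": "display_data", "metadata": {},
                      "data": {"text/plain": "<obj>", "text/html": "<b>hi</b>"}}]},
    ],
}
flagged = risky_outputs(example)
```

This is no replacement for reading the code cells themselves, which is where the arbitrary code execution lives.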
See also:
In editors
Various editors can use notebooks, including:
Hydrogen
Jupyter wrapped in the Atom editor
https://atom.io/packages/hydrogen
Visual Studio Code
Can embed them, with an extension - that basically comes with the Python extension itself.
Keep in mind that Visual Studio Code has its own workspace trust concept, and if its restricted mode applies you can't execute anything and won't see much.
https://code.visualstudio.com/docs/datascience/jupyter-notebooks
"Install kernels for name.ipynb"
- My VS code got confused about whether python extension was installed or not.
- Uninstall and reinstall fixed that.
Error loading preloads: Could not find renderer
- after the above? do a Reload Window.
Multi-user, hosting, etc.
JupyterHub
A login service (e.g. PAM, OAuth) around keeping track of notebooks. No other changes - the notebooks are still single-user things as before.
It means only one person has to figure out install, so it's a low-threshold thing for things like
- classrooms (there are also homework/grading extensions)
- workshops (see what everyone's doing)
- academia / teams (share what everyone's doing)
...though there's no overly easy way of sharing?(verify)
See also:
JupyterLab
Can be seen as an online-hosted notebook thing that is more of an IDE, with more conveniences.
https://jupyterlab.readthedocs.io/en/stable/
google colab
A project derived from jupyter.
- No setup
- free access, though with resource limits
- allows GPU and TPU (also resource limits)
- easier to share than your own jupyter notebooks.
- easier access to things like google drive, google cloud storage [3]
Note that free colab has a number of limits:
- resources may be prioritized for interactive users and lower-resource users, rather than long-running things
- because it's intended for experimenting. If people used it as a bulk-compute platform, it could not be free.
- some types of bulk compute are specifically disallowed (e.g. hosting, cryptocurrency mining, (even if paid?))
- notebooks run on VMs, and these VMs will shut down after idle time and/or a maximum lifetime of something like 12 hours(verify)
- no guarantee that you access a GPU (except with Pro)
- due to the above, you may have some cooldown if you do things like beefy training
- no guarantee to the amount of memory the VM you get has (except with Pro)
Google seems to hope you like the setup, tie yourself to its setup,
and want to continue your project with Colab Pro/Pro+.
Note that if you like colab's setup but like to use your own compute resources, you can have it connect to a jupyter kernel on your host - see local runtimes
- WARNING: that's arbitrary code execution that you allowed, and a security risk. Do not run code you do not trust.
nbviewer
Seems to be static hosting for rendered notebooks. (verify)
mybinder