Python notebook notes



Ipython

IPython is an interactive shell more capable than python's own, and started the notebook thing that later became jupyter.


It e.g.

  • has better history than py2's shell did (py3's is better that way), completion, and other interactivity
  • introduced notebooks - served via browser, allows embedded code, text, plots, mathematical expressions
  • integrates with some interactive data visualization
  • integrates with GUI toolkits
  • makes it easier to embed an interpreter into your own project
  • hooks in some profiling, via its magic functions:
 %time - how much time (one run)
 %timeit - how much time (averaged over a number of runs; the loop count is chosen automatically)
 %prun - how much time, per function
 %lprun - how much time, per line (from the line_profiler extension)
 %mprun, %memit - how much memory: per line of a function (one run), or for a single statement (a few runs) (both from the memory_profiler extension)
  • some tools for parallel computing (due to itself being abstracted out this way)
[1] [2]



jupyter

Jupyter is a more backend-agnostic framework/protocol than it was when it was still called IPython notebooks (it largely speaks JSON over ZeroMQ).
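To make "JSON over ZeroMQ" a little more concrete, here is a rough sketch of what an execute request looks like as data. The field names follow the Jupyter messaging documentation as far as I recall it; the user name, the protocol version string and the helper name make_execute_request are assumptions for illustration, and real messages are additionally signed and sent as ZeroMQ frames, which this sketch leaves out.

```python
import json
import uuid
from datetime import datetime, timezone

def make_execute_request(code, session_id):
    """Build a dict shaped like a Jupyter 'execute_request' message (unsigned sketch)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": "demo",            # hypothetical user name
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",              # protocol version, assumed here
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}

msg = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)

# Each of the four parts is serialized to JSON on its own;
# a real client would send these as separate ZeroMQ frames, plus a signature.
frames = [json.dumps(msg[part]) for part in ("header", "parent_header", "metadata", "content")]
print(frames[3])
```

The point is mostly that a kernel only needs to understand messages like this, which is why kernels for languages other than Python are feasible.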


IPython is just one of its possible kernels/backends

see this list for more.
this includes some things that basically just expose an existing CLI (see wrapper kernels)

Jupyter is mostly known for carrying on the notebook thing.



jupyter Qt console

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Most people care about notebooks, so skip to the next section.

The Qt console is a similar idea to notebooks, but is closer to integrating with (Qt) apps, which is sometimes very useful.

jupyter qtconsole

https://qtconsole.readthedocs.io/en/stable/


jupyter notebooks

Python notebooks mean you have a web interface, that is actually speaking to a backend, a.k.a. kernel, of (usually) python.

Compared to a plain CLI, notebooks...

  • save the interactions you've submitted, in a more document-like way
  • are more visual than the shell (various things are rendered as more than text)
  • make it easier to work remotely
  • make it easier to persist the interpreter behind it (to a degree)


This means less typing and more prettiness while you're doing plotting or math. Because notebooks can be copied, it is easier to write things like tutorials that others can load and use, and to set up more complex code that bootstraps other people into doing similar experiments.

For some examples, see e.g. https://github.com/jupyter/jupyter/wiki


There are now also "publish notebook to site", "store/load to code repository", and other additions.



Basic use

Notebook files get stored in the directory where you run:

jupyter notebook


The only keyboard shortcut you really need to know is Shift+Enter (Run, go to next cell), but you may like to skim over Help → keyboard shortcuts at least once.




Notes:

  • Execution happens as the user that started it. Permissions are also tied to them.


How and where to run it

Running it locally

Basically the default.

It'll launch a browser for you.

By default it binds to 127.0.0.1 (port 8888), meaning other hosts cannot connect to it.


Running it remotely


If you are the only real user on another host (e.g. your own computer, your own server),

perhaps the safest way is to do the above, and use an SSH tunnel like
ssh -L localhost:8888:localhost:8888 workhost
(and pointing your browser at 127.0.0.1:8888 on the SSH-client side).

The integration in some development environments (e.g. VS Code's) amounts to the same thing.


On a trusted LAN you can consider running it with
--ip=0.0.0.0
so that it's easily reachable.

Some additions that you can't do on basic python

Specially rendered objects
When showing objects, jupyter will try to show things with more than a basic
repr()


Things that do this out of the box

Libraries may have implemented this already

PIL itself implements _repr_png_
pandas itself implements _repr_html_ (and _repr_latex_)
...and so on


matplotlib


Jupyter seems to itself special-case matplotlib objects, though the precise integration varies a little, also because you may actually want to avoid that in some cases.


https://ipython.readthedocs.io/en/stable/interactive/plotting.html#id1

Things you can get with minimal suggestion

You can force such rendering modes yourself (for HTML, SVG, LaTeX, Markdown, Video) via cell magic like

%%html
<a href="http://example.com">link</a>

OR code-wise, using an existing class from IPython.display

from IPython.display import HTML, display
display(HTML('<a href="http://example.com">link</a>'))  # the explicit display() is only needed when this is not the last expression in the cell


For Markdown, you would typically set the cell type to Markdown instead (Cell → Cell Type → Markdown)







Rolling your own

If an object has a method called _repr_pretty_, _repr_svg_, _repr_png_, _repr_jpeg_, _repr_html_, _repr_javascript_, _repr_markdown_, _repr_latex_, or _repr_mimebundle_, it will be called when the object is displayed.

These are still pre-made renderers, in that you mostly just give it data in that format, and it gets rendered by existing IPython code for that type.


Similar for an attribute called _ipython_display_, (verify)


You can create your own wrappers with such a repr.

import io
import numpy
from PIL import Image

class NumpyImage(object):
    ''' Takes an array of values assumed to have useful values in 0..255 value range and shows it as a grayscale image. 
        Each cell is represented by an 8-by-8 pixel square (by default) for visibility '''
    def __init__(self, ary, pixelsize=8):
        # numpy.clip returns a new array, so the caller's data is left alone; clamp before type conversion
        self.ary = numpy.clip(numpy.asarray(ary), 0, 255)
        self.pixelsize = pixelsize
    def _repr_png_(self):
        ''' Called by IPython/Jupyter when displaying the object; returns PNG data as bytes '''
        im = Image.fromarray( numpy.uint8(self.ary) )
        if self.pixelsize != 1:
             im = im.resize( (im.size[0]*self.pixelsize, im.size[1]*self.pixelsize), resample=Image.NEAREST)
        by = io.BytesIO()
        im.save(by,'png')
        return by.getvalue()
 
# test:
NumpyImage( numpy.random.rand(30,60)*255 )
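The text-based reprs work the same way, and need no imaging library. A minimal sketch, assuming a hypothetical wrapper class called KeyValueTable (not part of any library) that shows a dict as an HTML table:

```python
import html

class KeyValueTable:
    """Hypothetical wrapper that shows a dict as an HTML table in a notebook."""
    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def __repr__(self):
        # plain-text fallback, e.g. for the regular shell
        return "KeyValueTable(%r)" % (self.mapping,)

    def _repr_html_(self):
        # return an HTML string; escape keys and values so they cannot inject markup
        rows = "".join(
            "<tr><th>%s</th><td>%s</td></tr>" % (html.escape(str(k)), html.escape(str(v)))
            for k, v in self.mapping.items()
        )
        return "<table>%s</table>" % rows

table = KeyValueTable({"name": "<b>test</b>", "runs": 3})
print(table._repr_html_())
```

In a notebook, leaving table as the last expression of a cell should render the table; elsewhere you get the __repr__ text.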

See also:

Widgets
magic
%% is cell magic, and applies to the rest of the cell.
basically, "does the first line look like %%thing? Then read the rest of the line as text, and the rest of the cell as text, and hand both to the registered function"
% is line magic, and applies only to the rest of the line.
basically, "does the line look like %thing? Then read the rest of the line as text and hand it to the registered function"
the rest of the cell is evaluated as usual(verify)


if automagic is enabled (which it is by default), you don't need to use the single percent for line magic.

can be toggled via %automagic
note that magic intentionally has lower priority than any registered variable names
so that magic won't block you from being able to run code
which instead means you can manage to mask out certain magic from being run
(I prefer to use % to make it clear what is being invoked)
https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-automagic


Note that magic means that text need not be python code at all (but often is, because it's easy to create "I no longer know how to do this in normal python" situations)


Built-in magic includes


Debugging, profiling

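 %time - how much time (one run)
 %timeit - how much time (averaged over a number of runs; the loop count is chosen automatically unless you give -n, and -r sets the number of repeats)
 %prun - how much time, per function
-s controls what to sort on
-r return the pstats.Stats object
...and more
 %lprun - how much time, per line (not built in: comes from the line_profiler package, via %load_ext line_profiler)
 %mprun, %memit - how much memory, per line of a function (one run) and for a single statement (a few runs) (not built in: comes from the memory_profiler package, via %load_ext memory_profiler)
 %debug - interactive debugger
 %pdb - toggles whether the debugger is started automatically on an uncaught exception
 %tb - print last traceback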
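For a feel of what the timing magics measure, the following is a rough standard-library equivalent of %timeit and %prun. It is a sketch of the idea and not IPython's implementation; the function work and the repeat/number values are made up for illustration.

```python
import cProfile
import io
import pstats
import timeit

def work(n=2000):
    """Hypothetical function to measure."""
    return sum(i * i for i in range(n))

# Roughly the %timeit idea: several repeats of a loop, then a per-call time
timings = timeit.repeat(work, repeat=3, number=100)
best_per_call = min(timings) / 100

# Roughly the %prun idea: profile one run, then sort the per-function statistics
profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()

print("%.1f us per call" % (best_per_call * 1e6))
print(report)
```

The sort key handed to sort_stats plays the same role as %prun's -s option.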



Input and output

 %pprint - whether to use pprint or repr
 %precision - precision used in pretty printing


 %config - lets you alter IPython's configuration
 %env, %set_env - get or set environment (mostly a slightly shorter version of altering os.environ)


System

 %pip - run the pip package manager (within the kernel's environment)
 %conda - run the conda package manager


 ! and !! (%sx)
cf. %system, %sc, %%bash, %%capture, %%script, %%pypy, %%sh


Shell-like things (and note that automagic applies, so you don't need the %) like

 %ls
 %pwd
 %cat
 %more
 %env
 %man
 %mkdir
 %cp
 %mv
 %rm
 %rmdir



IPython environment

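 %logon, %logoff, %logstart - logging
 %recall, %rep - put an earlier command back on the input line, for editing and re-running
 %reset - clear the interactive namespace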


 %gui - GUI event loop integration stuff for Qt, gtk, wxPython, tk, and cocoa
e.g. used for matplotlib windows if you use an interactive backend


 %pastebin
specifier of what to save (defaults to 'everything so far'), can be
input history range
filename
name of string/macro
-d for a description
-e argument for the timeout in days (defaults to 7)


https://ipython.readthedocs.io/en/stable/interactive/magics.html


custom magic

You can register your own named magic functions

https://ipython.readthedocs.io/en/stable/config/custommagics.html


Keep in mind that a kernel represents its own state - and can be restarted. Magic would not stay registered - but it's transparent because it's usually registered as part of library import (conditional on whether that library sees IPython).

The perhaps-cleaner, inversion of control way is to have a function called load_ipython_extension() in your extension. With one argument, the current InteractiveShell.
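A minimal sketch of that extension style. The magic names shout and count_lines are made up for illustration; register_magic_function is a method on InteractiveShell, and the functions themselves are plain Python.

```python
def shout(line):
    """Line magic body: receives the rest of the line as one string."""
    return line.upper()

def count_lines(line, cell):
    """Cell magic body: receives the rest of the first line, and the cell text."""
    return len(cell.splitlines())

def load_ipython_extension(ipython):
    """Called by %load_ext, with the current InteractiveShell as its one argument."""
    ipython.register_magic_function(shout, magic_kind="line", magic_name="shout")
    ipython.register_magic_function(count_lines, magic_kind="cell", magic_name="count_lines")
```

Assuming this is saved as a module on the import path (say, a hypothetical mymagic.py), %load_ext mymagic registers both, after which %shout hello and %%count_lines should work. After a kernel restart you would need to load it again.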



testing magic

https://pmbaumgartner.github.io/blog/testing-ipython-magics/

Shell access

⚠ this is not portable between OSes

You can run shell commands like:

!ls

and get their output, like

cwd = !pwd
files = !ls

...which are returned as an IPython.utils.text.SList, which holds the output split into lines, and prints and acts like a list.

...but is actually an object that also has some convenience values/functions:

  • .l or .list : value as list (the list itself).
  • .n or .nlstr: the lines joined with newlines, basically '\n'.join(l) - so is the closest thing to the raw output
  • .s or .spstr: the lines joined with spaces, basically ' '.join(l)
  • .p or .paths: the lines as a list of path objects
  • .grep(): returns .l grepped with a regex or callable
  • .fields(): splits each line on whitespace, which basically lets you return a ragged array - consider:
>>> cwd = !ls -l
>>> cwd.fields()
['total', '52'],
['drwxr-xr-x', '4', 'me', 'us', '4', 'Aug', '10', '17:38', 'file1'],
['drwxr-xr-x', '2', 'me', 'us', '3', 'Jun', '10', '20:29', 'file2'],
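If it helps to see what those conveniences amount to, this is an approximation in plain Python, operating on a hand-written list of lines rather than a real SList. It mimics the behaviour described above and is not IPython's code.

```python
import re

# Stand-in for what  !ls -l  would have captured, one string per output line
lines = [
    "total 52",
    "drwxr-xr-x 4 me us 4 Aug 10 17:38 file1",
    "drwxr-xr-x 2 me us 3 Jun 10 20:29 file2",
]

as_newline_string = "\n".join(lines)                 # like .n / .nlstr
as_space_string = " ".join(lines)                    # like .s / .spstr
fields = [line.split() for line in lines]            # like .fields() without arguments
matching = [line for line in lines if re.search("Aug", line)]   # like .grep("Aug")

print(fields[1])
print(matching)
```

Note that the rows in fields have different lengths, which is what makes it a ragged array.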


You can hand in variables like:

pyvar = '..'
!ls {pyvar}

or

!echo $pyvar


Notes:

  • Somewhat confusingly, there are a few commands that seem to not need one (ls, cd, and more) --- because they are instead magic, and specifically automagic, meaning they don't need a %
you might prefer these via ! if you want to capture their output.
  • This is more a convenience than a real API
it doesn't seem like you can get the raw output.
if you want more control, you probably want to use subprocess anyway
  • Don't expect interactive shell commands to work.
  • Something that doesn't return without interaction will block the kernel.
  • e.g. google colab lets you do this too (it's isolated to VMs anyway), which means it's actually quite flexible in what you can install
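For the "more control" case, a minimal subprocess sketch. The child command here is just the Python interpreter printing two lines, chosen so the example does not depend on which OS or shell you have; the timeout value is arbitrary.

```python
import subprocess
import sys

# Run a child process without a shell; sys.executable keeps the example portable
result = subprocess.run(
    [sys.executable, "-c", "print('one'); print('two')"],
    capture_output=True,   # collect stdout and stderr instead of inheriting them
    text=True,             # decode bytes to str
    timeout=30,            # do not block the kernel forever
    check=True,            # raise CalledProcessError on a non-zero exit code
)

raw_output = result.stdout        # the unprocessed text, newlines included
lines = raw_output.splitlines()   # split into lines only when you want that
print(lines)
```

Unlike the ! syntax, this gives you the raw output, stderr separately, the exit code, and a timeout.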

Security



Notebooks allow

  • arbitrary code execution - whatever the kernel allows
  • which includes shell access, as the starting user
  • fairly arbitrary JS within the browser that's the client


As such

  • you probably want to use auth
  • never run notebooks you got from someone you don't trust
jupyter and others may refuse to render HTML or JS that is stored in an existing notebook unless
you execute/regenerate it yourself, or
you explicitly marked the notebook as trusted (e.g. with jupyter trust)
...but those are easily done if you're just clicking things until they work.



In editors

Various editors can use notebooks, including:

Hydrogen

Jupyter wrapped in the Atom editor

https://atom.io/packages/hydrogen


Visual Studio Code

Can embed them, with an extension - that basically comes with the Python extension itself.

Keep in mind that VS Code has its own workspace trust concept, and if its restricted mode applies you can't execute anything and won't see much.


https://code.visualstudio.com/docs/datascience/jupyter-notebooks


"Install kernels for name.ipynb"

My VS Code got confused about whether the Python extension was installed or not.
Uninstall and reinstall fixed that.

Error loading preloads: Could not find renderer

after the above? do a Reload Window.

Multi-user, hosting, etc.

JupyterHub

A login service (e.g. PAM, OAuth) around keeping track of notebooks. No other changes - the notebooks are still single-user things as before.


It means only one person has to figure out install, so it's a low-threshold thing for things like

classrooms (there are also homework/grading extensions)
workshops (see what everyone's doing)
academia / teams (share what everyone's doing)

...though there's no overly easy way of sharing?(verify)



JupyterLab

Can be seen as a hosted notebook interface that is more of an IDE, with more conveniences.

https://jupyterlab.readthedocs.io/en/stable/



google colab


A project derived from jupyter.

  • No setup
  • free access, though with resource limits
  • allows GPU and TPU (also resource limits)
  • easier to share than your own jupyter notebooks.
  • easier access to things like google drive, google cloud storage [3]


Note that free colab has a number of limits:

  • resources may be prioritized for interactive users and lower-resource users, rather than long-running things
because it's intended for experimenting. If people used it as a bulk-compute platform, it could not be free.
some types of bulk compute are specifically disallowed (e.g. hosting, cryptocurrency mining, (even if paid?))
  • notebooks run on VMs, and these VMs will shut down after idle time and/or a maximum lifetime of something like 12 hours(verify)
  • no guarantee that you access a GPU (except with Pro)
due to the above, you may have some cooldown if you do things like beefy training
  • no guarantee to the amount of memory the VM you get has (except with Pro)


Google seems to hope you like the setup, tie yourself to its setup, and want to continue your project with Colab Pro/Pro+.


Note that if you like colab's setup but like to use your own compute resources, you can have it connect to a jupyter kernel on your host - see local runtimes

WARNING: that's arbitrary code execution that you allowed, and a security risk. Do not run code you do not trust.


nbviewer

Seems to be static hosting for a rendered notebook. (verify)

http://nbviewer.org/

mybinder

https://mybinder.org/


JupyterLite