Python notebook notes




IPython

IPython is an interactive shell that is more capable than Python's own, and it started the notebook thing that later became Jupyter.


For example, it

  • has better history than py2's shell did (py3's is better that way), completion, and other interactivity
  • introduced notebooks - served via browser, allows embedded code, text, plots, mathematical expressions
  • integrates with some interactive data visualization
  • integrates with GUI toolkits
  • makes it easier to embed an interpreter into your own project
  • hooks in some profiling, via its magic functions:
%time - how much time (run once)
%timeit - how much time (averaged over a bunch of runs; the loop count is chosen automatically)
%prun - how much time, per function
%lprun - how much time, per line (comes from the line_profiler extension)
%mprun - how much memory, per line (run once; comes from the memory_profiler extension)
%memit - how much memory a single statement uses (over a few runs; comes from the memory_profiler extension)
  • some tools for parallel computing (due to itself being abstracted out this way)
[1] [2]



jupyter

Jupyter is a more backend-agnostic framework/protocol than it was when it was still called IPython notebooks (it largely speaks JSON over 0MQ).


IPython is just one of its possible kernels/backends

see this list for more.
this includes some things that basically just expose an existing CLI (see wrapper kernels)

Jupyter is mostly known for carrying on the notebook thing.
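
To give an idea of what "JSON over 0MQ" amounts to: each message between a frontend and a kernel is a handful of JSON dictionaries (header, parent header, metadata, content). Below is a minimal sketch of an execute_request using only the standard library. The helper name make_execute_request is made up for illustration, the field names follow my reading of the Jupyter messaging spec, and real messages are additionally signed and sent as multipart 0MQ frames, which this leaves out.

```python
import json
import uuid
from datetime import datetime, timezone

def make_execute_request(code, session_id=None, username="user"):
    """Build the dict parts of an 'execute_request' message (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,               # unique per message
        "session": session_id or uuid.uuid4().hex,
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                         # protocol version, an assumption here
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}

msg = make_execute_request("1 + 1")
# each part is serialized to JSON separately before it goes on the wire
wire_parts = [json.dumps(msg[key]) for key in ("header", "parent_header", "metadata", "content")]
```

In practice you would let jupyter_client do all of this; the point is only that there is nothing Python-specific in the messages, which is why other-language kernels are feasible.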



jupyter Qt console

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Most people care about notebooks, so skip to the next section.

The Qt console is a similar idea to notebooks, but is closer to integrating with (Qt) apps, which is sometimes very useful.

jupyter qtconsole

https://qtconsole.readthedocs.io/en/stable/


jupyter notebooks

You know how you can get an interactive python shell by typing python?


Compared to a plain interactive interpreter, a notebook is functionally very similar (you still talk to an interpreter, it still sticks around in memory until you quit)

  • ...but entirely from within a browser
  • you can save all the interactions you put in those cells in a document-like way - hence the name notebook - and re-open and re-run that later (in a new session)
    • can contain just text explanation alongside the code (You can e.g. send people tutorials and code to more easily play with)
    • People have since made "publish notebook to site", "store/load to code repository" to make certain reuse easier
  • it's more visual than the shell - things like images and plots are drawn nicely, not just shown as some text representation
  • It's (potentially) easier to work remotely
    • zero installation on the computer you're working from is sometimes also a useful detail


Which of those is significant to you depends on what you're doing.


For some examples, see e.g. https://github.com/jupyter/jupyter/wiki


To most users, the magic glue that makes that work is fairly important, but yeah, it's a relatively regular interpreter wrapped in some "this is how to have people communicate with you".
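
The "document-like" part is fairly literal: an .ipynb file is a JSON document with a list of cells. A minimal sketch of that structure, assuming the version 4 notebook format; the function name minimal_notebook is made up, and in practice you would use the nbformat library rather than building dicts by hand.

```python
import json

def minimal_notebook(markdown_text, code_text):
    """Return a dict shaped like a (format version 4) notebook with two cells."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # a text cell, shown rendered
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # a code cell that has not been run yet: no outputs, no execution count
            {"cell_type": "code", "metadata": {}, "source": code_text,
             "execution_count": None, "outputs": []},
        ],
    }

nb = minimal_notebook("# Demo\nSome explanation.", "print('hello')")
nb_text = json.dumps(nb, indent=1)   # this text is what would be written to e.g. demo.ipynb
```

Outputs (text, images, HTML) are stored in the same file once cells have run, which is why notebooks can be viewed without re-running them.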




On sessions

Basic use


Some code editors (like VS Code) understand what notebooks are, and will start interpreter sessions for you - also remotely.


If you do this more manually:

The actual notebook files get stored under the directory where you run:

jupyter notebook



The main keyboard shortcuts you really need to know are

  • Shift+Enter (run, go to next cell) and/or Ctrl+Enter (run, don't go to next cell, e.g. if your editor is annoying about scrolling)

You may like to skim over Help → keyboard shortcuts at least once.






Notes:

  • Execution happens as the user that started it. Permissions are also tied to them.


How and where to run it

Running it locally

Basically the default.

It'll launch a browser for you.

By default it binds to 127.0.0.1 (port 8888), meaning no one else can connect to it.


Running it remotely


If you are the only real user on another host (e.g. your own computer, your own server), perhaps the safest way is to do the above, and use an SSH tunnel like ssh -L localhost:8888:localhost:8888 workhost (and point your browser at 127.0.0.1:8888 on the SSH-client side).

The integration in some development environments (e.g. VS Code's) amounts to the same thing.


On a trusted LAN you can consider running it with --ip=0.0.0.0 so that it's easily reachable.
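
When a browser won't connect, it helps to first check whether anything is listening on the port at all (the server itself, or your end of the SSH tunnel). A small sketch using a plain TCP connect; the function name port_is_open is made up, and the defaults just mirror the 127.0.0.1:8888 mentioned above.

```python
import socket

def port_is_open(host="127.0.0.1", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("notebook port reachable:", port_is_open())
```

This only tells you that the port answers, not that it is Jupyter answering, and not whether it is reachable from other hosts.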

Some additions that you can't do on basic python

Specially rendered objects

When showing objects, jupyter will try to show things with more than a basic repr()


Things that do this out of the box

Libraries may have implemented this already

PIL itself implements _repr_png_
pandas itself implements _repr_html_ (and _repr_latex_)
...and so on


matplotlib


Jupyter seems to itself special-case matplotlib objects, though the precise integration varies a little, also because you may actually want to avoid that in some cases.


https://ipython.readthedocs.io/en/stable/interactive/plotting.html#id1

Things you can get with minimal suggestion

You can force such rendering modes yourself (for HTML, SVG, LaTeX, Markdown, Video) via cell magic like

%%html
<a href="http://example.com">link</a>

OR code-wise, using an existing class from IPython.display

from IPython.display import HTML, display
display(HTML('<a href="http://example.com">link</a>'))  # the display() is optional when this is the last expression in a cell, but needed elsewhere (e.g. inside a loop or function)


For Markdown, you would typically instead set the cell type to markdown (Cell → Cell Type → Markdown).







Rolling your own

If an object has an attribute called (one of)

_repr_pretty_
_repr_svg_
_repr_png_
_repr_jpeg_
_repr_html_
_repr_javascript_
_repr_markdown_
_repr_latex_
_repr_mimebundle_

...then that will be called as a function, and it should produce data in that format; IPython has renderers that show that data.


In other words, you're aiming at the most useful existing renderer that applies, and you write a method that produces data for it.

(There is also _ipython_display_ which is probably what you want if you're using ipywidgets to create forms, and may be useful for a lower-level thing for when you want more control and side effects(verify) - and accordingly is more work)


You can create your own wrappers with such a repr. For example:

import io

import numpy
from PIL import Image

class NumpyImage:
    ''' Takes an array of values, assumed to already have useful values in 0..255 value range, and visualizes it as a grayscale image. 
        Each cell is represented by an 8-by-8 pixel square (by default) for visibility.'''
    def __init__(self, ary, pixelsize=8):
        self.ary = numpy.array(ary) # copy
        self.ary[self.ary<0]   = 0 # clamp (before type conversion)
        self.ary[self.ary>255] = 255
        self.pixelsize = int(round(pixelsize))

    def _repr_png_(self):
        ' called by IPython/jupyter when it wants to show this object; returns PNG data as bytes '
        im = Image.fromarray( numpy.uint8(self.ary) )
        if self.pixelsize != 1:
             im = im.resize( (im.size[0]*self.pixelsize, im.size[1]*self.pixelsize), resample=Image.NEAREST)
        by = io.BytesIO()
        im.save(by,'png')
        return by.getvalue()

# test (as the last expression in a cell, so that it gets displayed):
NumpyImage( numpy.random.rand(30,60)*255 )
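
The same idea with _repr_html_ needs nothing beyond the standard library, which makes it a little easier to play with. A sketch, where KeyValueTable is a made-up example class rather than something from an existing library:

```python
import html

class KeyValueTable:
    """Shows a dict as a two-column HTML table in notebooks (hypothetical example class)."""
    def __init__(self, mapping):
        self.mapping = dict(mapping)   # copy, so later changes to the original don't show up

    def __repr__(self):
        # plain-text fallback, used wherever HTML cannot be rendered (e.g. a terminal)
        return "KeyValueTable(%r)" % (self.mapping,)

    def _repr_html_(self):
        # escape everything: the values are data, not markup
        rows = "".join(
            "<tr><th>%s</th><td>%s</td></tr>" % (html.escape(str(k)), html.escape(str(v)))
            for k, v in self.mapping.items()
        )
        return "<table>%s</table>" % rows

table = KeyValueTable({"name": "<example>", "size": 3})
```

Putting table as the last expression in a cell should show the table; in a plain python shell you would see the __repr__ text instead. Keeping a sensible __repr__ next to the rich variant is what makes the object usable in both.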


Widgets


Progress bars

import time

# I'm assuming you often have a list that amount to individual tasks
tasks = ['a','b','c','d','e']





import tqdm

#   Upside: less typing, works on console as well
#   Downside: has to be installed first, graphical variant takes a little more typing
for task in tqdm.tqdm( tasks ): # plain text bar, also when run in a notebook
    print(task)
    time.sleep(1)





import tqdm.autonotebook # or tqdm.notebook, but that doesn't have the console fallback

#   Upside: graphical bar in notebooks, text bar on console
#   Downside: has to be installed first
for task in tqdm.autonotebook.tqdm( tasks ): # prefer notebook mode, fall back to console
    print(task)
    time.sleep(1)

# If you make a _lot_ of output, it may matter that in console mode the bar goes under, and in notebook mode it goes above.


# IPython.display has some things like showing a progress bar, which seems to be written like an iterator, so you can use it like:
#   Upside: no need to install; part of notebooks
#   Downside: slightly clunkier

from IPython.display import ProgressBar
for index in ProgressBar( len(tasks) ):
    print(tasks[index])
    time.sleep(1)


# Similarly, ipywidgets also has a progress bar (among other things), that lets you do:
from ipywidgets import IntProgress
from IPython.display import display
 
# init
progress = IntProgress(max=len(tasks)) # instantiate the bar
display(progress) 

for task in tasks:
    print(task) 
    progress.value += 1
    time.sleep(1)


https://forums.fast.ai/t/progress-bars-in-ipython-notebooks/22826/15


https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html



magic

%% is cell magic, and applies to the rest of the cell.

basically: does the first line of the cell look like %%thing? Then read the rest of that line as text, and the rest of the cell as text, and hand both to the registered function

% is line magic, and applies only to the rest of the line.

basically: does the line look like %thing? Then read the rest of the line as text and hand it to the registered function
the rest of the cell is evaluated as usual(verify)


if automagic is enabled (which it is by default), you don't need to use the single percent for line magic.

can be toggled via %automagic
note that magic intentionally has lower priority than any registered variable names
so that magic won't block you from being able to run code
...which instead means you could mask out certain magic from being runnable this way
(I prefer to use % to make it clear what is being invoked)
https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-automagic


Note that magic means that text need not be python code at all (but often is, because it's easy to create "I no longer know how to do this in normal python" situations)


Built-in magic includes


Debugging, profiling

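%time - how much time (one run)
%timeit - how much time (averaged over a bunch of runs; the loop count is chosen automatically)
%prun - how much time, per function
-s controls what to sort on
-r return the pstats.Stats object
...and more
%lprun - how much time, per line (not built in; needs the line_profiler extension)
%mprun, %memit - how much memory, per line and per statement respectively (not built in; need the memory_profiler extension)
%debug - interactive debugger
%pdb - toggles automatically entering the debugger on an uncaught exception
%tb - print last traceback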
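
Outside IPython, the standard library gets you roughly the same information as %timeit and %prun. A sketch of that, with the caveat that this is the plain-stdlib route and not necessarily what the magics run internally; slow_sum is a made-up function to have something to measure.

```python
import cProfile
import io
import pstats
import timeit

def slow_sum(n):
    """Deliberately naive function, just to have something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# roughly %timeit: many runs per batch, a few batches, here taking the best batch
# (IPython reports mean and standard deviation instead)
per_run = min(timeit.repeat(lambda: slow_sum(1000), number=100, repeat=3)) / 100

# roughly %prun -s cumulative: profile one call, sort by cumulative time
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)   # the kind of object %prun -r returns
stats.sort_stats("cumulative").print_stats(5)   # show only the top 5 entries
report_text = report.getvalue()
```

The pstats.Stats object is the useful part if you want to sort or filter the results afterwards instead of reading one fixed printout.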


Input and output

%pprint - toggles pretty printing (whether to use pprint or repr)
%precision - precision used in pretty printing


%%capture varname will capture all output from a cell and store it in a variable. Without an argument, it throws it away. Useful to quieten spammy output
if you wanted that more selectively, consider a with IPython.utils.io.capture_output() as captured: block


System

%config - lets you alter IPython's configuration
%env, %set_env - get or set environment (mostly a slightly shorter version of altering os.environ)


%pip - run the pip package manager (for the environment the kernel runs in)
%conda - run the conda package manager


! and !! (%sx)
cf. %system, %sc, %%bash, %%capture, %%script, %%pypy, %%sh


Shell-like things (and note that automagic applies, so you don't need the %) like

%ls
%pwd
%cat
%more
%env
%man
%mkdir
%cp
%mv
%rm
%rmdir



IPython environment

%logon, %logoff, %logstart - logging
%recall, %rep - put an earlier command on the input line, for editing and re-running
%reset


%gui - GUI event loop integration stuff for Qt, gtk, wxPython, tk, and cocoa
e.g. used for matplotlib windows if you use an interactive backend


%pastebin
specifier of what to save (defaults to everything so far), can be
input history range
filename
name of string/macro
-d for a description
-e argument for the timeout in days (defaults to 7)


https://ipython.readthedocs.io/en/stable/interactive/magics.html


custom magic

You can register your own named magic functions

https://ipython.readthedocs.io/en/stable/config/custommagics.html


Keep in mind that a kernel represents its own state - and can be restarted. Magic would not stay registered - but it's transparent because it's usually registered as part of library import (conditional on whether that library sees IPython).

The perhaps-cleaner, inversion-of-control way is to have a function called load_ipython_extension() in your extension module, taking one argument: the current InteractiveShell. It gets called when someone does %load_ext on that module.
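
A minimal sketch of such an extension module. The magic name shout and the module name shout_ext (used below) are made up; register_magic_function is, as far as I know, the InteractiveShell method for registering a plain function as magic.

```python
def shout(line):
    """Line magic body: receives the rest of the line as a single string."""
    return line.upper()

def load_ipython_extension(ipython):
    """Called by IPython on %load_ext; 'ipython' is the current InteractiveShell."""
    ipython.register_magic_function(shout, magic_kind="line", magic_name="shout")

def unload_ipython_extension(ipython):
    """Optional counterpart, called on %unload_ext; nothing to clean up in this sketch."""
    pass
```

Saved as shout_ext.py somewhere importable, you would then do %load_ext shout_ext once per kernel session, after which %shout hello should give 'HELLO'. Because the module itself never imports IPython, it stays importable (and testable) outside notebooks.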



testing magic

https://pmbaumgartner.github.io/blog/testing-ipython-magics/

Shell access

⚠ this is not portable between OSes

You can run shell commands like:

!ls

and get their output, like

cwd = !pwd
files = !ls

...which are returned as an IPython.utils.text.SList, which splits the output into lines, and prints and acts like a list.

...but is actually an object that also has some convenience values/functions:

  • .l or .list : value as list (the list itself).
  • .n or .nlstr: basically '\n'.join(l) ? (verify)
  • .s or .spstr: basically ' '.join(l) ? (verify) - so is the closest thing to the raw output(verify)
  • .p or .paths: basically list(path.Path(v) for v in l) ? (verify)
  • .grep(): returns .l grepped with a regex or callable
  • .fields(): splits each line on whitespace, which basically gives you a ragged array - consider:
>>> listing = !ls -l
>>> listing.fields()
[['total', '52'],
 ['drwxr-xr-x', '4', 'me', 'us', '4', 'Aug', '10', '17:38', 'file1'],
 ['drwxr-xr-x', '2', 'me', 'us', '3', 'Jun', '10', '20:29', 'file2']]


You can hand in variables like:

pyvar = '..'
!ls {pyvar}

or

!echo $pyvar


Notes:

  • Somewhat confusingly, there are a few commands that seem to not need a ! (ls, cd, and more) - because they are instead magic, and specifically automagic, meaning they don't need a %
    • you might prefer these via ! if you want to capture their output.
  • This is more a convenience than a real API
    • it doesn't seem like you can get the raw output.
    • if you want more control, you probably want to use subprocess anyway
  • Don't expect interactive shell commands to work.
  • Something that doesn't return without interaction will block the kernel.
  • e.g. google colab lets you do this too (it's isolated to VMs anyway), which means it's actually quite flexible in what you can install
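
For the "use subprocess if you want more control" case, a minimal sketch that gets you the exit code and the raw, unsplit output. The helper name run_command is made up; it uses the running python as the command only so that the example does not depend on which shell tools exist.

```python
import subprocess
import sys

def run_command(args):
    """Run a command without a shell; return (exit code, raw stdout bytes, raw stderr bytes)."""
    done = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        stdin=subprocess.DEVNULL,   # a command that tries to ask for input gets EOF instead of hanging
    )
    return done.returncode, done.stdout, done.stderr

code, out, err = run_command([sys.executable, "-c", "print('a  b'); print('c')"])
lines = out.decode().splitlines()   # splitting is now your own choice, whitespace stays intact
```

Passing a list of arguments (rather than one string with shell=True) also sidesteps quoting problems when you hand in python variables.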

Security



Notebooks allow

  • arbitrary code execution - whatever the kernel allows
  • which includes shell access, as the starting user
  • fairly arbitrary JS within the browser that's the client


As such

  • you probably want to use auth
  • never run notebooks you got from someone you don't trust
    • jupyter and others may refuse to render HTML or JS in an existing notebook unless
      • you execute/regenerate it yourself
      • you explicitly marked the notebook as trusted (e.g. via jupyter trust)
    • ...but those are easily done if you're just clicking things until they work.
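
Since a notebook is just JSON, you can look at what a received notebook has stored before opening it in a browser. A sketch that lists stored outputs a browser would render actively; the function name risky_outputs and the choice of mimetypes are my own, and this is only a quick look, not a safety check (it says nothing about what the code does once run, and markdown cells can carry HTML too).

```python
import json

RISKY_MIMETYPES = ("text/html", "application/javascript")   # an assumption, not an official list

def risky_outputs(notebook_text):
    """Return (cell index, mimetype) pairs for stored outputs in actively rendered formats."""
    notebook = json.loads(notebook_text)
    found = []
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):      # markdown cells have no outputs
            for mimetype in output.get("data", {}):  # stream/error outputs have no data
                if mimetype in RISKY_MIMETYPES:
                    found.append((index, mimetype))
    return found

# a made-up two-cell notebook: one harmless text output, one stored HTML output
example = json.dumps({"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": [
    {"cell_type": "code", "metadata": {}, "source": "1", "execution_count": 1,
     "outputs": [{"output_type": "execute_result", "execution_count": 1, "metadata": {},
                  "data": {"text/plain": "1"}}]},
    {"cell_type": "code", "metadata": {}, "source": "show()", "execution_count": 2,
     "outputs": [{"output_type": "display_data", "metadata": {},
                  "data": {"text/html": "<script>alert(1)</script>"}}]},
]})
found = risky_outputs(example)
```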



In editors

Various editors can use notebooks, including:


Spyder

Visual Studio Code

Can embed them, with an extension - that basically comes with the Python extension itself.

Keep in mind that VS Code has its own workspace trust concept, and if its restricted mode applies you can't execute anything and won't see much.


"Install kernels for name.ipynb"

My VS Code got confused about whether the Python extension was installed or not.
Uninstall and reinstall fixed that.

Error loading preloads: Could not find renderer

after the above? do a Reload Window.


https://code.visualstudio.com/docs/datascience/jupyter-notebooks

Hydrogen

Jupyter wrapped in the Atom editor

https://atom.io/packages/hydrogen

Multi-user, hosting, etc.

JupyterHub

Adds a login service (e.g. PAM, OAuth) around keeping track of notebooks.

No other changes - the notebooks are still single-user things as before.


It means only one person has to figure out install, so it's a low-threshold thing for things like

classrooms (there are also homework/grading extensions)
workshops (see what everyone saved)
academia / teams (share what everyone's doing)

...though there's no overly easy way of sharing?(verify)


JupyterLab

Can be seen as a browser-based notebook interface that is more of an IDE, with more conveniences.

https://jupyterlab.readthedocs.io/en/stable/



google colab


A project derived from jupyter.

  • No setup
  • free access, though with resource limits
  • allows GPU and TPU (also resource limits)
  • easier to share than your own jupyter notebooks.
  • easier access to things like google drive, google cloud storage [3]


Note that free colab has a number of limits:

  • resources may be prioritized for interactive users and lower-resource users, rather than long-running things
    • because it's intended for experimenting. If people used it as a bulk-compute platform, it could not be free.
    • some types of bulk compute are specifically disallowed (e.g. hosting, cryptocurrency mining, (even if paid?))
  • notebooks run on VMs, and these VMs will shut down after idle time and/or a maximum lifetime of something like 12 hours(verify)
  • no guarantee that you get access to a GPU (except with Pro)
    • due to the above, you may have some cooldown if you do things like beefy training
  • no guarantee about the amount of memory the VM you get has (except with Pro)


Google seems to hope you like the setup, tie yourself to its setup, and want to continue your project with Colab Pro/Pro+.


Note that if you like colab's setup but like to use your own compute resources, you can have it connect to a jupyter kernel on your host - see local runtimes

WARNING: that's arbitrary code execution that you allowed, and a security risk. Do not run code you do not trust.


nbviewer

Seems to be static hosting for a rendered notebook. (verify)

http://nbviewer.org/

mybinder

https://mybinder.org/


JupyterLite