Python notebook notes: Difference between revisions

Latest revision as of 18:29, 12 April 2024


Ipython

IPython is an interactive shell more capable than python's own, and started the notebook thing that later became jupyter.


It e.g.

  • has better history than py2's shell did (py3's is better that way), completion, and other interactivity
  • introduced notebooks - served via browser, allows embedded code, text, plots, mathematical expressions
  • integrates with some interactive data visualization
  • integrates with GUI toolkits
  • makes it easier to embed an interpreter into your own project
  • hooks in some profiling, via its magic functions:
%time - how much time (run once)
%timeit - how much time (in a bunch of runs, at least a second's worth?(verify))
%prun - how much time, per function
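%lprun - how much time, per line (comes from the line_profiler extension)
%mprun - how much memory, per line of a function (run once; comes from the memory_profiler extension)
%memit - how much memory a statement takes (in a bunch of runs; also from memory_profiler)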
  • some tools for parallel computing (due to itself being abstracted out this way)
[1] [2]


See also:

jupyter

Jupyter is a more backend-agnostic framework/protocol than it already was when it was still called IPython notebooks (it largely speaks JSON over 0MQ).


IPython is just one of its possible kernels/backends

see this list for more.
this includes some things that basically just expose an existing CLI (see wrapper kernels)

Jupyter is mostly known for carrying on the notebook thing.



jupyter Qt console

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Most people care about notebooks, so skip to the next section.

The Qt console is a similar idea to notebooks, but is closer to integrating with (Qt) apps, which is sometimes very useful.

jupyter qtconsole

https://qtconsole.readthedocs.io/en/stable/


jupyter notebooks

You know how you can get an interactive python shell by typing python?


Compared to a plain interactive interpreter, a notebook is functionally very similar (you still talk to an interpreter, it still sticks around in memory until you quit)

  • ...but entirely from within a browser
  • you can save all the interactions you put in those cells in a document-like way - hence the name notebook - and re-open and re-run that later (in a new session)
    • can contain just text explanation alongside the code (You can e.g. send people tutorials and code to more easily play with)
    • People have since made "publish notebook to site", "store/load to code repository" to make certain reuse easier
  • it's more visual than the shell - things like images and plots are drawn nicely, not just shown as some text representation
  • It's (potentially) easier to work remotely
    • zero installation on the computer you're working from is sometimes also a useful detail
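
To make "document-like" a bit more concrete: a saved notebook (.ipynb file) is JSON that lists the cells. The following is a minimal sketch that builds such a structure by hand and reads it back using only the standard library. The cell ids and contents are made up for illustration, and a notebook saved by jupyter itself will carry more metadata and stored outputs than this.

```python
import json

# A minimal notebook in the nbformat 4 layout, written out by hand for illustration.
# (The ids and cell contents are invented; real notebooks carry more metadata.)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"id": "intro", "cell_type": "markdown", "metadata": {},
         "source": ["# Notes\n", "Some explanation."]},
        {"id": "calc", "cell_type": "code", "metadata": {},
         "execution_count": None, "outputs": [],
         "source": ["print(1 + 1)"]},
    ],
}

text = json.dumps(notebook, indent=1)   # roughly what would sit in an .ipynb file

# Reading it back: each cell is either text (markdown) or code.
loaded = json.loads(text)
summary = [(cell["cell_type"], "".join(cell["source"])) for cell in loaded["cells"]]
for cell_type, source in summary:
    print(cell_type, repr(source))
```

This is also why "publish notebook to site" tools exist: the file is a data representation of the cells, not something a browser shows directly.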


Which of those is significant to you depends on what you're doing.


For some examples, see e.g. https://github.com/jupyter/jupyter/wiki


To most users, the magic glue that makes that work is fairly important, but yeah, it's a relatively regular interpreter wrapped in some "this is how to have people communicate with you".




On sessions

Basic use

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Some code editors (like VS Code) understand what notebooks are, and will start interpreter sessions for you - also remotely.


If you do this more manually:

The actual notebook gets stored where you run:

jupyter notebook



The main keyboard shortcuts you really need to know are

  • Shift+Enter (Run, go to next cell) and/or Ctrl+Enter (Run, don't go to next cell, e.g. if your editor is annoying about scrolling)

You may like to skim over Help → keyboard shortcuts at least once.






Notes:

  • Execution happens as the user that started it. Permissions are also tied to them.


How and where to run it

Running it locally

Basically the default.

It'll launch a browser for you.

By default it binds to 127.0.0.1 (port 8888), meaning no one else can connect to it.


Running it remotely


If you are the only real user on another host (e.g. your own computer, your own server), perhaps the safest way is to do the above, and use an SSH tunnel like ssh -L localhost:8888:localhost:8888 workhost (and pointing your browser at 127.0.0.1:8888 on the SSH-client side).

The integration in some development environments (e.g. VS code's) amounts to the same thing.


On a trusted LAN you can consider running it with --ip=0.0.0.0 so that it listens on all interfaces and is easily reachable from other hosts.

Some additions that you can't do on basic python

Specially rendered objects

When showing objects, jupyter will try to show things with more than a basic repr()


Things that do this out of the box

Libraries may have implemented this already

PIL itself implements _repr_png_
pandas itself implements _repr_html_ (and _repr_latex_)
...and so on


matplotlib


Jupyter seems to itself special-case matplotlib objects, though the precise integration varies a little, also because you may actually want to avoid that in some cases.


https://ipython.readthedocs.io/en/stable/interactive/plotting.html#id1

Things you can get with minimal suggestion

You can force such rendering modes yourself (for HTML, SVG, LaTeX, Markdown, Video) via cell magic like

%%html
<a href="http://example.com">link</a>

Or code-wise, using an existing class from IPython.display

from IPython.display import HTML, display
display(HTML('<a href="http://example.com">link</a>'))  # the display() is optional but apparently solves some interactivity


For Markdown, you would typically just set the cell type to markdown instead (Cell → Cell Type → Markdown)







Rolling your own

If an object has an attribute called (one of)

_repr_pretty_
_repr_svg_
_repr_png_
_repr_jpeg_
_repr_html_
_repr_javascript_
_repr_markdown_
_repr_latex_
_repr_mimebundle_

...then that will be called as a method. It should produce data in that format; IPython has renderers that show that data.


In other words, you're aiming at the most useful existing renderer that applies, and you write a method that produces data for it.

(There is also _ipython_display_ which is probably what you want if you're using ipywidgets to create forms, and may be useful for a lower-level thing for when you want more control and side effects(verify) - and accordingly is more work)


You can create your own wrappers with such a repr. For example:

import io
import numpy
from PIL import Image

class NumpyImage:
    ''' Takes an array of values, assumed to already have useful values in 0..255 value range, and visualizes it as a grayscale image. 
        Each cell is represented by an 8-by-8 pixel square (by default) for visibility.'''
    def __init__(self, ary, pixelsize=8):
        self.ary = numpy.array(ary) # copy, so that the clamping does not alter the caller's data
        self.ary[self.ary<0]   = 0 # clamp (before type conversion)
        self.ary[self.ary>255] = 255
        self.pixelsize = int(round(pixelsize))
    def _repr_png_(self):
        ' Called by the notebook when it wants to show this object; returns PNG data as bytes '
        im = Image.fromarray( numpy.uint8(self.ary) )
        if self.pixelsize != 1:
             im = im.resize( (im.size[0]*self.pixelsize, im.size[1]*self.pixelsize), resample=Image.NEAREST)
        by = io.BytesIO()
        im.save(by,'png')
        return by.getvalue()

# test:
NumpyImage( numpy.random.rand(30,60)*255 )
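
A smaller variant of the same idea that needs no extra libraries, as a sketch: a wrapper whose _repr_html_ returns an HTML table. The class name KeyValueTable is made up for this example. In a notebook, ending a cell with such an object should show the table; outside of one, you can call the method yourself to see what would be handed to the renderer.

```python
import html

class KeyValueTable:
    """Wraps a dict and shows it as a two-column HTML table in a notebook."""
    def __init__(self, mapping):
        self.mapping = dict(mapping)  # copy

    def _repr_html_(self):
        # Escape keys and values so that stray < or & cannot break the markup.
        rows = "".join(
            "<tr><td>%s</td><td>%s</td></tr>" % (html.escape(str(k)), html.escape(str(v)))
            for k, v in self.mapping.items()
        )
        return "<table>%s</table>" % rows

    def __repr__(self):
        # Plain-text fallback, used by a regular shell.
        return "KeyValueTable(%r)" % self.mapping

table = KeyValueTable({"name": "test <1>", "size": 3})
rendered = table._repr_html_()
print(rendered)
```

Keeping a regular __repr__ next to it means the object still prints sensibly where no HTML renderer is available.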

See also:

Widgets
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Progress bars

import time

# I'm assuming you often have a list that amount to individual tasks
tasks = ['a','b','c','d','e']





import tqdm

#   Upside: less typing, works on console as well
#   Downside: has to be installed first, graphical variant takes a little more typing
for task in tqdm.tqdm( tasks ): # plain text bar, in notebooks as well as on the console
    print(task)
    time.sleep(1)





import tqdm.autonotebook # or notebook, but it doesn't have the console fallback

#   Upside: graphical bar in notebooks, still works on console
#   Downside: has to be installed first, takes a little more typing
for task in tqdm.autonotebook.tqdm( tasks ): # prefer notebook mode, fall back to console
    print(task)
    time.sleep(1)

# If you make a _lot_ of output, it may matter that in console mode the bar goes under, and in notebook mode it goes above.


# IPython.display has some things like showing a progress bar, which seems to be written like an iterator, so you can use it like:
#   Upside: no need to install; part of notebooks
#   Downside: slightly clunkier

from IPython.display import ProgressBar
for index in ProgressBar( len(tasks) ):
    print(tasks[index])
    time.sleep(1)


# Similarly, ipywidgets also has a progress bar (among other things), that lets you do:
# 
# 
from ipywidgets import IntProgress
from IPython.display import display
 
# init
progress = IntProgress(max=len(tasks)) # instantiate the bar
display(progress) 

for task in tasks:
    print(task) 
    progress.value += 1
    time.sleep(1)


https://forums.fast.ai/t/progress-bars-in-ipython-notebooks/22826/15


https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html



magic
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

%% is cell magic, and applies to the rest of the cell.

basically, "does the first line look like %%thing?" Then read the rest of that line as text, and the rest of the cell as text, and hand both to the registered function

% is line magic, and applies only to the rest of the line.

basically, "does the line look like %thing?" Then read the rest of the line as text and hand it to the registered function
the rest of the cell is evaluated as usual(verify)
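
As a mental model of the above, here is a toy sketch of that dispatch. It is not IPython's actual implementation: the function name transform_cell is made up, the two run_..._magic stand-ins just record their arguments, and it will happily misread a continuation line that starts with a % operator. The idea it tries to show is that magic lines get rewritten into ordinary function calls, after which the cell is run as regular python.

```python
def transform_cell(cell_text):
    """Toy model: rewrite magic syntax into plain function calls."""
    lines = cell_text.split("\n")
    if lines[0].startswith("%%"):
        # Cell magic: rest of the first line, and the rest of the cell, both as text.
        name, _, rest = lines[0][2:].partition(" ")
        body = "\n".join(lines[1:])
        return "run_cell_magic(%r, %r, %r)" % (name, rest, body)
    out = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("%"):
            # Line magic: only the rest of this line is handed over, as text.
            indent = line[:len(line) - len(stripped)]
            name, _, rest = stripped[1:].partition(" ")
            out.append("%srun_line_magic(%r, %r)" % (indent, name, rest))
        else:
            out.append(line)  # everything else stays regular code
    return "\n".join(out)

# Stand-ins for the registered magic functions; they only record what they were given.
calls = []
namespace = {
    "run_line_magic": lambda name, rest: calls.append(("line", name, rest)),
    "run_cell_magic": lambda name, rest, body: calls.append(("cell", name, rest, body)),
}

exec(transform_cell("x = 1\n%shout hello there\ny = x + 1"), namespace)
exec(transform_cell("%%html extra\n<b>bold</b>\n<i>it</i>"), namespace)
print(calls)
print(namespace["y"])
```

Note how in the first cell the lines around the magic are still evaluated as usual, while in the second nothing after the first line is treated as python at all.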


if automagic is enabled (which it is by default), you don't need to use the single percent for line magic.

can be toggled via %automagic
note that magic intentionally has lower priority than any registered variable names
so that magic won't block you from being able to run code
...which instead means you could mask out certain magic from being runnable this way
(I prefer to use % to make it clear what is being invoked)
https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-automagic


Note that magic means that text need not be python code at all (but often is, because it's easy to create "I no longer know how to do this in normal python" situations)


Built-in magic includes


Debugging, profiling

%time - how much time (one run)
%timeit - how much time (in a bunch of runs, at least a second's worth?(verify))
%prun - how much time, per function
-s controls what to sort on
-r return the pstats.Stats object
...and more
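%lprun - how much time, per line (needs the line_profiler extension)
%mprun, %memit - how much memory (once, a bunch; both need the memory_profiler extension)
%debug - interactive debugger
%pdb - toggles automatically entering the debugger on an uncaught exception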
%tb - print last traceback
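
If you want roughly the same measurements outside of a notebook, the standard library's timeit and cProfile modules get you most of the way. The following is only an approximation of what %timeit and %prun do, with a made-up work() function as the thing being measured, and a repeat count of 3 chosen just to keep the example quick.

```python
import cProfile
import io
import pstats
import timeit

def work():
    return sum(i * i for i in range(1000))

# Roughly the %timeit idea: pick a loop count automatically, then repeat a few times.
timer = timeit.Timer(work)
loops, _ = timer.autorange()   # increases the loop count until one run takes at least 0.2 seconds
per_loop = min(timer.repeat(repeat=3, number=loops)) / loops
print("%d loops, best of 3: %.1f us per loop" % (loops, per_loop * 1e6))

# Roughly the %prun -s cumulative idea: time per function, sorted on a column.
profiler = cProfile.Profile()
profiler.runcall(work)
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)   # a pstats.Stats object, the type that %prun -r returns
stats.sort_stats("cumulative").print_stats(5)   # show at most five entries
print(report.getvalue())
```

The magics are mostly convenience around this, so the Stats object can be sorted and filtered in the same ways.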


Input and output

%pprint - whether to use pprint or repr
%precision - precision used in pretty printing


%%capture varname will capture all output from a cell and store it in a variable. Without an argument, it throws it away. Useful to quieten spammy output
if you wanted that more selectively, consider a with io.capture_output() as captured: block (that io being IPython.utils.io, not the standard library's io)


System

%config - lets you alter IPython's configuration
%env, %set_env - get or set environment (mostly a slightly shorter version of altering os.environ)


%pip - run the pip package manager (within the kernel's environment)
%conda - run the conda package manager


! and !! (%sx)
cf. %system, %sc, %%bash, %%capture, %%script, %%pypy, %%sh


Shell-like things (and note that automagic applies, so you don't need the %) like

%ls
%pwd
%cat
%more
%env
%man
%mkdir
%cp
%mv
%rm
%rmdir



IPython environment

%logon, %logoff, %logstart - logging
%recall, %rep -
%reset


%gui - GUI event loop integration stuff for Qt, gtk, wxPython, tk, and cocoa
e.g. used for matplotlib windows if you use an interactive backend


%pastebin
specifier of what to save (defaults to 'everything so far'), can be
input history range
filename
name of string/macro
-d for a description
-e argument for the timeout in days (defaults to 7)


https://ipython.readthedocs.io/en/stable/interactive/magics.html


custom magic

You can register your own named magic functions

https://ipython.readthedocs.io/en/stable/config/custommagics.html


Keep in mind that a kernel represents its own state - and can be restarted. Magic would not stay registered - but it's transparent because it's usually registered as part of library import (conditional on whether that library sees IPython).

The perhaps-cleaner, inversion-of-control way is to have a function called load_ipython_extension() in your extension, which takes one argument: the current InteractiveShell.
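
A minimal sketch of such an extension, assuming it lives in an importable module of its own. The module name shoutmagic and the magic name shout are made up here. The registration goes through the shell's register_magic_function(); after a %load_ext shoutmagic you should then be able to write %shout some text.

```python
# Contents of a hypothetical module, say shoutmagic.py

def shout(line):
    """Line magic: gets the rest of the line as text, returns it in upper case."""
    return line.upper()

def load_ipython_extension(ipython):
    # Called on %load_ext shoutmagic, with the current InteractiveShell as argument.
    ipython.register_magic_function(shout, magic_kind="line", magic_name="shout")
```

Because a restarted kernel forgets registered magic, the %load_ext line has to be run again in each new session.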



testing magic

https://pmbaumgartner.github.io/blog/testing-ipython-magics/

Shell access

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
⚠ this is not portable between OSes

You can run shell commands like:

!ls

and get their output, like

cwd = !pwd
files = !ls

...which are returned as an IPython.utils.text.SList which does whitespace splitting, and prints and acts like a list.

...but is actually an object that also has some convenience values/functions:

  • .l or .list : value as list (the list itself).
  • .n or .nlstr: basically '\n'.join(l) ? (verify)
  • .s or .spstr: basically ' '.join(l) ? (verify) - so is the closest thing to the raw output(verify)
  • .p or .paths: basically list(path.Path(v) for v in l) ? (verify)
  • .grep(): returns .l grepped with a regex or callable
  • .fields(): basically lets you return a ragged array - consider:
>>> cwd = !ls -l
>>> cwd.fields()
['total', '52'],
['drwxr-xr-x', '4', 'me', 'us', '4', 'Aug', '10', '17:38', 'file1'],
['drwxr-xr-x', '2', 'me', 'us', '3', 'Jun', '10', '20:29', 'file2'],


You can hand in variables like:

pyvar = '..'
!ls {pyvar}

or

!echo $pyvar


Notes:

  • Somewhat confusingly, there are a few commands that seem to not need one (ls, cd, and more) --- because they are instead magic, and specifically automagic, meaning they don't need a %
you might prefer these via ! if you want to capture their output.
  • This is more a convenience than a real API
it doesn't seem like you can get the raw output.
if you want more control, you probably want to use subprocess anyway
  • Don't expect interactive shell commands to work.
  • Something that doesn't return without interaction will block the kernel.
  • e.g. google colab lets you do this too (it's isolated to VMs anyway), which means it's actually quite flexible in what you can install
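
For the "if you want more control" case, a sketch of the subprocess route, which gives you the raw output, the exit code, and no dependence on notebook syntax. To keep the example portable it runs the python interpreter itself as the command; you would put your own command list there.

```python
import subprocess
import sys

# The command is a stand-in; using the interpreter itself avoids depending on ls, pwd, etc.
command = [sys.executable, "-c", "print('first line'); print('second  line')"]

# check=True raises CalledProcessError on a nonzero exit code instead of silently continuing.
result = subprocess.run(command, capture_output=True, text=True, check=True)

raw_output = result.stdout        # the exact text, whitespace preserved
lines = raw_output.splitlines()   # split into lines yourself if that is what you want
print(repr(raw_output))
print(lines)
```

Handing the command over as a list also avoids shell quoting issues, which the ! syntax does not protect you from.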

Security

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Notebooks allow

  • arbitrary code execution - whatever the kernel allows
  • which includes shell access, as the starting user
  • fairly arbitrary JS within the browser that's the client


As such

  • you probably want to use auth
  • never run notebooks you got from someone you don't trust
jupyter and others may refuse to render HTML or JS in an existing notebook unless
you execute/regenerate it yourself
you started the notebook with 'trust'
...but those are easily done if you're just clicking things until they work.


See also:

In editors

Various editors can use notebooks, including:


Spyder

Visual Studio Code

Can embed them, with an extension - that basically comes with the Python extension itself.

Keep in mind that Visual Studio Code has its own workspace trust concept, and if its restricted mode applies you can't execute anything and won't see much.


"Install kernels for name.ipynb"

My VS Code got confused about whether the Python extension was installed or not.
Uninstall and reinstall fixed that.

Error loading preloads: Could not find renderer

after the above? do a Reload Window.


https://code.visualstudio.com/docs/datascience/jupyter-notebooks

Hydrogen

Jupyter wrapped in the Atom editor

https://atom.io/packages/hydrogen

Multi-user, hosting, etc.

JupyterHub

Adds a login service (e.g. PAM, OAuth) around keeping track of notebooks.

No other changes - the notebooks are still single-user things as before.


It means only one person has to figure out install, so it's a low-threshold thing for things like

classrooms (there are also homework/grading extensions)
workshops (see what everyone saved)
academia / teams (share what everyone's doing)

...though there's no overly easy way of sharing?(verify)

See also:

JupyterLab

Can be seen as an online-hosted notebook thing that is more of an IDE and more convenience.

https://jupyterlab.readthedocs.io/en/stable/



google colab

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A project derived from jupyter.

  • No setup
  • free access, though with resource limits
  • allows GPU and TPU (also resource limits)
  • easier to share than your own jupyter notebooks.
  • easier access to things like google drive, google cloud storage [3]


Note that free colab has a number of limits:

  • resources may be prioritized for interactive users and lower-resource users, rather than long-running things
because it's intended for experimenting. If people used it as a bulk-compute platform, it could not be free.
some types of bulk compute are specifically disallowed (e.g. hosting, cryptocurrency mining, (even if paid?))
  • notebooks run on VMs, and these VMs will shut down after idle time and/or a maximum lifetime of something like 12 hours(verify)
  • no guarantee that you get access to a GPU (except with Pro)
due to the above, you may have some cooldown if you do things like beefy training
  • no guarantee about the amount of memory the VM you get has (except with Pro)


Google seems to hope you like the setup, tie yourself to its setup, and want to continue your project with Colab Pro/Pro+.


Note that if you like colab's setup but like to use your own compute resources, you can have it connect to a jupyter kernel on your host - see local runtimes

WARNING: that's arbitrary code execution that you allowed, and a security risk. Do not run code you do not trust.


nbviewer

Seems to be static hosting for a rendered notebook. (verify)

http://nbviewer.org/

mybinder

https://mybinder.org/


JupyterLite