Python notes - semi-sorted

From Helpful
Various things have their own pages, see Category:Python. Some of the pages that collect various practical notes include:
  • semi-sorted



notebooks

Python notebooks let you work with python through a web interface. This means less typing and more prettiness while you're doing plotting, math, or anything else you can manage via the python shell.

You can copy the notebooks elsewhere, bootstrapping other people to do similar experiments to yours.

To see what people have done with it, see e.g. https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks


Technically, notebooks are an interactive backend (a kernel) accessed via a webpage frontend. They came from iPython, which has since merged into Jupyter (which supports more than just python).


To install

If your package manager has it, use that.

Otherwise you probably want to pip install jupyter


To use

The actual notebook gets stored in the current directory (so you may wish to organize notebooks into directories a bit) where you run:

jupyter notebook


By default it binds to 127.0.0.1:8888 and locally launches a browser.

When working remotely, consider using a SSH tunnel like ssh -L localhost:8888:localhost:8888 otherhost
On your own LAN you may like to use --ip=0.0.0.0 (and maybe --port=80) so that it's easily reachable.


You then probably want to look at Help → keyboard shortcuts. Most important to start with is probably Shift+Enter: Run, go to next cell



Ipython

iPython is a collection of:

  • an interactive shell, more featured than python's own.
http://ipython.org/ipython-doc/rel-0.12/interactive/tutorial.html


  • integrates with some interactive data visualization
  • integrates with GUI toolkit
...both of which are used in...
  • notebooks - served via browser, allows embedded code, text, plots, mathematical expressions


  • some tools for parallel computing (due to itself being abstracted out this way)
  • makes it easier to embed an interpreter into your own project


  • I like the way it eases hooking profiling into ipython, via its magic functions:
 %time - how much time (one run)
 %timeit - how much time (in a bunch of runs, at least a second's worth?(verify))
 %prun - how much time, per function
 %lprun - how much time, per line
 %mprun, %memit - how much memory per function (once, a bunch)

See also:



Python3 notes

I still code for python2 (almost all system pythons are 2, often ~2.7 as of this writing), but it's starting to become a good idea to learn to code for both.


Two things you want to know about:

  • this is one useful summary of differences
  • 2to3 is a tool that parses code and shows suggested changes as a diff


Notes to self:

  • type-related stuff
int and long are now the same thing, so you can use int everywhere
Division:
in py2, / keeps int operands int, and coerces to float if either operand is a float, e.g. 1/2==0, 1/2.0==0.5
in py2, // does a floor, and coerces to float if either operand is a float, e.g. 1//2==0, 1//2.0==0.0
in py3, / is always float division, e.g. 1/2==0.5
in py3, // works as in py2
  • strings and unicode:
in py2, no prefix means bytestring and u means unicode string
in py3, b means bytestring and no prefix means unicode string
≥py3.3 also accepts the u prefix (which does nothing, but makes porting easier for people already doing their py2 unicode correctly)
When you use strings to print text, things will mostly work as-is.
When you are explicitly handling conversions between these things, you'll need to rewrite that.
When you need to support both py2 and py3:
you can get py3 behaviour in ≥py2.6 via from __future__ import unicode_literals (verify)
in other cases you may wish to cheat, e.g. instantiate all strings via a function that does some conversion based on the python version
TODO: figure out how to best deal with libraries
  • buffer, memoryview and such changed.
largely but not fully compatible. If you use them, you'll need to read up.


  • you can't mix tabs and spaces anymore. Probably a good thing.
  • print is now a function (needs brackets)
adding brackets is usually perfectly backwards compatible with py2
the py2 comma trick to avoid a newline is gone; py3's equivalent, the end keyword argument, is not py2 compatible. You'll need to do some rewriting of your prints, or use sys.stdout.write() instead
if you can assume ≥py2.6(verify) you could use from __future__ import print_function


  • exception syntax: use 'as' instead of a comma
except Exception as e
except (ValueError, TypeError) as e

instead of py2's:

except Exception, e
except (ValueError, TypeError), e
Using as has been supported since py2.6 or py2.7 (verify), and that's getting good enough since system python is now often 2.7
  • iterators
    • more things return an iterator or view, rather than a list (changes things if you did some type testing. Changes nothing if you just use them as sequences)
    • map() and filter() now return iterators rather than lists
    • py2 iterators had a .next() method; py3 renames it __next__(), called via the next() built-in.


  • py3 has new-style classes only (that is, all classes inherit from object)
which most people were using already
  • py2 had both file() and open(). py3 has only open()
  • has_key is gone. The in keyword should handle all cases.
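A few of the division and print differences above can be checked directly; a minimal sketch (the __future__ imports give py3 semantics under ≥py2.6 as well, so this runs under both):

```python
from __future__ import division, print_function

# with py3 semantics, / is always true division
assert 1 / 2 == 0.5
# // floors; int operands stay int...
assert 1 // 2 == 0
# ...and it coerces to float if either operand is a float
assert 1 // 2.0 == 0.0

print('division behaves as described')  # print as a function
```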


See also:


For the most part, moving from py2 to py3 means reviewing all your code. Most of that review is trivial, and largely automatic (see 2to3).


Setting the process name

It's OS-specific stuff, so there is no short portable way.

The easiest method is to install/use the setproctitle module - it aims to be portable and try its best on various platforms. Example use:

import os, sys

try:
    import setproctitle
    setproctitle.setproctitle( os.path.basename(sys.argv[0]) )
except ImportError:
    pass

The above often means 'use the filesystem name that was used to run this' - but not always, so a hardcoded string can make sense.

Useful links

If you are new to Python, common suggestions for learning it include Dive Into Python, The python flavour of How to Think Like a Computer Scientist, or Thinking In Python.





Libraries and documentation



Semi-sorted:


Additional notes you may wish to have seen some time

array, deque

A list can be used as a sort of deque, like:

append(val)     # insert on right side 
pop()           # take from right side
insert(0,val)   # insert on left side
pop(0)          # take from left side

However, list is primarily efficient for stack-like use - list.pop(0) and list.insert(0, val) are O(n) operations,

...while those are O(1) operations on collections.deque (added in 2.4). deque also has appendleft(), extendleft(), and popleft(), and some others that can be convenient and/or more readable.
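A quick sketch of the difference in use (names as in the standard library):

```python
from collections import deque

d = deque()
d.append(2)        # insert on right side
d.appendleft(1)    # insert on left side - O(1), where list.insert(0, v) is O(n)
d.append(3)
assert list(d) == [1, 2, 3]
assert d.popleft() == 1   # take from left side - O(1), where list.pop(0) is O(n)
assert d.pop() == 3       # take from right side
```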

You may also wish to know about the queue module, a multi-producer, multi-consumer, thread-safe queue.

See also:

StringIO

You can write to StringIO objects, and ask them for the data they caught so far. They store this data only in memory.

It is mostly useful where a function wants to write to a file object, but you want to avoid the filesystem (for convenience, to avoid potential permission problems, or for speed by avoiding IO).

Two caveats:

  • you can write(), but you cannot read() -- you can only getvalue() the full contents so far
  • Once the StringIO object is close()d, the contents are gone
not usually a problem, as most save functions either take a filename and do an open()-write()-close() (in which case stringio is fairly irrelevant), or take a file object and just write() (in which case you're fine)


cStringIO is the faster, written-in-C drop-in. It is often useful to do:

try: # use the faster extension when we can
    import cStringIO as StringIO
except ImportError: # drop back to python's own when we must
    import StringIO

See also:

Date stuff

Speed, memory, debugging

Profiling

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

One easy way is to use cProfile.py, which I like in a wrapper script like:

#!/bin/bash
python -m cProfile -s time "$@"

...or -o to save the profile to a file, so that you can inspect it more flexibly.
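You can also profile from within python rather than via the command line. A py3-style sketch using cProfile plus pstats (work() is just a hypothetical placeholder for whatever you want profiled; py2 would use StringIO.StringIO instead of io.StringIO):

```python
import cProfile
import pstats
import io

def work():
    # placeholder for the code you actually care about
    return sum(i * i for i in range(100000))

pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# print the five most expensive entries, sorted by cumulative time
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats('cumulative').print_stats(5)
print(out.getvalue())
```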


For a little more control, you could add e.g. hotshot to your __main__ code. I've taken up the habit of putting it in a function like:

def profile():
    import hotshot
    prof = hotshot.Profile("hotprof") #file to write the profile to
    prof.runcall( main ) # or a wrapper function containing the things you want profiled
    prof.close()


...and use something like RunSnakeRun, a graphical profile viewer. In text mode, you'd probably want a simple script to view the results instead, for example:

import hotshot.stats
stats = hotshot.stats.load("hotprof")
stats.strip_dirs()
stats.sort_stats('time', 'calls')
stats.print_stats(20)

Memory-related notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Memory profilers (/leak helpers)

Include:

  • dowser [1] (gives graphical statistics - uses CherryPy, PIL)
  • heapy [2]
  • pysizer [3] (can use pyrex if available)
  • Python Memory Validator [4] (commercial)

Since Py3.4 there is also tracemalloc and tools that build on it, like stackimpact [7]
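A minimal tracemalloc sketch (≥py3.4; the allocation here is just a stand-in):

```python
import tracemalloc

tracemalloc.start()

data = [bytes(1000) for _ in range(100)]  # stand-in for real allocations

current, peak = tracemalloc.get_traced_memory()  # bytes traced now / at peak
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics('lineno')  # per-source-line allocation statistics

tracemalloc.stop()

assert peak >= current > 0
assert len(top) > 0
```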


See also:


If you want to look at an already-running process, consider pyrasite(-shell)

The Garbage collector, gc

Speed notes

Depending on what exactly you do, most of the crunch-time CPU may be spent in C implementations of important functions rather than in the interpreter. That makes python fast enough for most purposes, and is one reason python is slow at some things and faster than you might expect at others.


Number crunching can often be moved to something like NumPy or PyGSL.

You can also easily pull in C libraries you have source to with SWIG, and even those you don't with ctypes - see Python extensions. You'll often need some wrapper code to make this code work a little more pythonically.

In this way, python can be expressive in itself as well as pull together optimized C code that would have more overhead than strictly necessary. (the reason using psyco can speed python up rather a lot sometimes)


TODO: read up on things like http://www.python.org/doc/essays/list2str.html


Text file line reading

Reading all lines from a file can be done in a few different ways:


  • readlines()
is a read() followed by splitting, so reads everything before returning anything
pro: less CPU than e.g. readline, because it's fewer operations
con: memory use is proportional to file size
con: slower to start (for large files)
note: you can tell it to read roughly some amount of bytes. You'd have to change your logic, though.
  • readlines() with sizehint
basically "read a chunk of roughly as many lines as fit in this size"
pro: usually avoids the memory issue
pro: somewhat faster than bare readline()
con: needs a bit more code


Lazily:

  • iterate over the file object
generally the cleanest way to go
  • iterate over iter(f.readline, '') (basically wrapping readline into a generator)
  • individual readline()s
(pre-py2.3 there was an xreadlines(), which was deprecated in favour of iterating the file object)


These three are functionally mostly the same.

Sometimes one of these variants is slightly nicer, e.g. the brevity of for line in ... versus the more conditional control of individual readline() calls.
note: the EOF test varies:
the iterator handles it for you, whereas when calling readline() yourself
you test len(line)==0, because readline() leaves in the (possibly-translated) newline and only returns an empty string at EOF
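The variants above, side by side; a sketch using io.StringIO as a stand-in for an opened text file:

```python
import io

def make_file():
    return io.StringIO("one\ntwo\nthree\n")

# iterate over the file object - cleanest
a = [line.rstrip('\n') for line in make_file()]

# wrap readline in the two-argument iter(); '' is the EOF sentinel
b = [line.rstrip('\n') for line in iter(make_file().readline, '')]

# individual readline()s - EOF is an empty return value
c = []
f = make_file()
while True:
    line = f.readline()
    if len(line) == 0:   # non-final lines keep their newline, so only EOF is ''
        break
    c.append(line.rstrip('\n'))

assert a == b == c == ['one', 'two', 'three']
```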



Debugging

See the IDE section; some offer debugging features.


Failing that, try pdb, the python debugger. (Maybe through Stani's Python Editor? Have never used that.)


To get information about an exception such as the stack trace - without actually letting the exception terminate things - use the traceback module. Many people will instead want the richer formatting of the cgitb module, which gives more useful information; it can be used for web server/browser output, but can also be set to output plain text.
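A minimal sketch of catching an exception and keeping its formatted stack trace via traceback:

```python
import traceback

try:
    1 / 0
except ZeroDivisionError:
    tb_text = traceback.format_exc()   # the full traceback, as a string

assert 'ZeroDivisionError' in tb_text
assert 'Traceback' in tb_text
```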


See also pylint and PyChecker



Snippets

Detect OS and/or path style

Note that you do not need to know the OS to split and join paths elements correctly -- you can rely on os.path for that.

You can get the uname [8] fields or a good imitation:

os.uname       returns a 5-tuple: (sysname, nodename, release, version, machine)
platform.uname returns a 6-tuple: (sysname, nodename, release, version, machine, processor)

Note that the contents of these fields are relatively free-form. For example, example outputs of platform.uname:

('Linux',   'zeus',  '2.6.34-gentoo-r12', '#5 SMP Wed May 25 01:15:12 CEST 2011', 'i686', 'Pentium(R) Dual-Core CPU E5700 @ 3.00GHz')
('Windows', 'spork', '7',                 '6.1.7600',                             'x86',  'Intel64 Family 6 Model 23 Stepping 10, GenuineIntel')


You can detect the path style via the path separator (os.sep; os.path.sep is the same value, and neither is deprecated) to figure out what style of paths we should be using, and as a hint of what OS we are on. Don't use this for path string logic -- you can do things safely using os.path functions.

if os.sep=='/':
    print "*nix-style paths"
elif os.path.sep=='\\':
    print "Windows-style paths"
else:
    print "Very Weird Things (tm)"

Note that Windows CE has a single root instead of drive letters, but still uses backslashes. It is hard to completely unify path logic because of such details.


Python under windows is slightly smart about programmers mixing \ and /. That is to say, mixes will work when python processes path logic itself (e.g. open(), not values passed verbatim into subprocess/popen/system calls).

Also, note that this is a(nother) reason that string equality is not a good test for path equality, and that you shouldn't do path splitting and such with your own string operations (look to the os.path functions).

There may be other details, so don't be lazy - use the path splitting and joining functions instead of appending a character yourself.


You can also inspect:

  • sys.platform (e.g. linux2, darwin, win32, cygwin, sunos5, and various openbsd and netbsd strings)
  • os.name (e.g. posix, nt, ce)
  • or even os.environ

Some library notes

3D

PyGame

Win32 interface

pywin32, previously known as win32all, provides hooks into various parts of windows. Apparently with central module win32api. (see also its help file)

It is downloadable from sourceforge.


Some code for this:


GPGPU

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Reikna

Clyther

PyOpenCL


PyStream


cudamat

scikits.cuda


gnumpy

Theano


Unsorted

Creating ZIP files (in-memory)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

We can directly use in-memory data and StringIO objects here (so this is simpler than doing the same with tarfile).

You can add content with either:

  • ZipFile.write - takes filename it will open - useful when you just wanted more control over what files to add
  • ZipFile.writestr takes a filename/ZipInfo and a bytestring - the filesystem isn't touched, which is probably what you want when adding in-memory data to in-memory zip files.
# Just an example snippet (assumes import StringIO, zipfile)
zip_sio = StringIO.StringIO()
z = zipfile.ZipFile(zip_sio, "w", zipfile.ZIP_DEFLATED) # ZIP_DEFLATED compresses; ZIP_STORED does not
 
for filename,filedata in (('filename1.txt', 'foo'),
                          ('filename2.txt', 'bar')):
   z.writestr(filename,filedata)
z.close()
return zip_sio.getvalue()
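Reading such an in-memory zip back works the same way. A py3 sketch (py3's zipfile deals in bytes, hence io.BytesIO rather than StringIO):

```python
import io
import zipfile

# write an in-memory zip
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as z:
    z.writestr('filename1.txt', b'foo')
zip_bytes = buf.getvalue()

# read it back from memory
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
    assert z.namelist() == ['filename1.txt']
    assert z.read('filename1.txt') == b'foo'
```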


FFTs

There are a few implementations/modules, including:

  • fftpack: used by numpy.fft, scipy.fftpack


Speed-wise: FFTW is faster, numpy is slower, scipy is slower yet (not sure why the np/sp difference, given they use the same code). Think a factor of 2 or 3 (potentially), though small cases can be drowned in overhead anyway.

Also, FFTW planning matters.


It does matter how the coupling works - there are, for example, more and less direct (overhead-wise) ways of using FFTW.

TODO: figure out threading, MPI stuff.


See also:


Bytecode / resolve-related notes

  • .py - source text
  • .pyc - compiled bytecode
  • .pyo - compiled bytecode, optimized. Written when python is used with -O. The difference with pyc is currently usually negligible.
  • .pyd - a (windows) dll with some added conventions for importing
(and path-and-import wise, it acts exactly like the above, not as a linked library)
note: native code rather than bytecode


All of the above are searched for by python itself.

Python will generate pyc or pyo files

when modules are imported (not when the modules are run directly)
...with some exceptions: e.g. when importing from an egg or zip file it will not alter those
...which means it can, for speed reasons, be preferable to distribute those with pyc/pyo files in them


There is some talk about changing these, see e.g. PEP 488


See also: