Python usage notes - Matplotlib, pylab



Intro

Matplotlib is a graphing package that allows you to fairly easily create various types of graphs.

It is modelled after Matlab's plotting, hence the name.

It integrates into ipython notebooks, which you may like for interactive playing around.


Minimal example of plotting a line:
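A minimal sketch of such a line plot, in the OO style used on the rest of this page. The Agg backend and the output path are assumptions made so that it also runs on a machine without a display; they are not part of the original example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # file-only backend so this also runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3, 4], [5, 1, 3, 2])   # x values, then y values
ax.set_title("some data")

# hypothetical output location; with a GUI backend you would typically call plt.show() instead
out_path = os.path.join(tempfile.gettempdir(), "minimal_line.png")
fig.savefig(out_path)
```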

 



Related/alternative software you may like

  • Mayavi - 3D, similar interface. If you want interactive 3D plots, this is often much faster.
See also http://code.enthought.com/projects/mayavi/
  • PyQwt - provides Qt widgets with fast plotting


  • seaborn - easier to visualize analyses (so useful if you e.g. use pandas or similar), in more interesting graphs than just lines and bars
  • VisPy - more for visualisation

API choice

There is a choice between two major ways to call plot-related functions:

  • the object-oriented interface
keeps the "current state of your graph" in objects,
which keeps things clean and isolated when embedding, and around threading,
and tends to be a little cleaner for complex graphs.
This page uses only this.


  • pylab
a direct imitation of matlab, so if you're a matlab person this will be obvious to you.
Most simple graphs can be done in fewer lines.
The "current state of your graph" is stored in the module itself,
which also means
that you can't make more than one plot at a time, and
that threading/subinterpreters will bite you.

If you're new and want to learn just one style, I recommend the OO interface.

While for simple uses pylab may be simpler, once you run into issues you will probably need to learn the OO style and rewrite everything.


Examples (minus the imports, and 'actually show me it' command.)

OO style:

fig, ax = pyplot.subplots()     # actually a shorthand, see below
ax.plot([1,2,3,4], [5,1,3,2])   # plot some data
ax.set_title('some data')

Pylab equivalent:

#pylab.figure()   # omittable, the first plot() implies its creation. Later plots should clear it. 
pylab.plot([1,2,3,4], [5,1,3,2])
pylab.title('some data')

The difference becomes more pronounced when you have multiple axes, because the "I am currently drawing on this" is part of the state, not the calls.




Layout

Just one plot

This is shortest:

fig, ax = plt.subplots()   # its default is 1 row, 1 column


Adding figures on a grid as you go

Use figure() and figure.add_subplot()

fig = plt.figure(figsize=(8, 6))   # figsize in inches (with the default dpi); the values here are just an example
ax1 = fig.add_subplot(2, 3, 2)     # same as 232. lay out two-row, three-column grid, create and select top middle axis
ax2 = fig.add_subplot(2, 3, 4)     # ...fourth plot in the same grid, i.e. bottom left

Notes:

  • add_subplot only really uses those numbers to calculate position and size so each call is unrelated to others, and mixing grid sizes will make a mess
  • Keep in mind that the object representing a subplot is Axes and so many examples call their subplots 'axis' or such.
It may be better to call them plot or such, because it can get confusing that an Axes named axis has an Axis (the last as in the label-and-tick thing).

Spanning cells

A few ways (one alternative is subplot2grid()), I prefer the following :

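import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure()   # you may want to see the result before you try to read the coords
gs  = GridSpec(3, 3) # basically settles the overall granularity of...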
ax1 = fig.add_subplot( gs[0,  : ] )  # first row, all columns
ax2 = fig.add_subplot( gs[1, 0:2] )  # second row, left and middle cells
ax3 = fig.add_subplot( gs[1:, 2 ] )  # second and third row, last-column
ax4 = fig.add_subplot( gs[2,  0 ] )  # last row, first column              
ax5 = fig.add_subplot( gs[2,  1 ] )  # last row, second column             
plt.show()

Similar subplots

subplots() is also handy when you want a grid of similar plots. (To steal an example from here)

fig, axes = plt.subplots(3, 6, figsize=(12, 6))  # when row or col >1, axes becomes an array() instead
 
# makes it easier to do something like:
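What that array makes easy is looping over all the subplots. A minimal sketch, in which the sine data, the shared axes and the Agg backend are assumptions chosen only for illustration:

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(3, 6, figsize=(12, 6), sharex=True, sharey=True)

x = np.linspace(0, 2 * np.pi, 100)
for i, ax in enumerate(axes.flat):        # axes is a 3x6 numpy array of Axes objects
    ax.plot(x, np.sin((i + 1) * x))       # made-up data: a different frequency per subplot
    ax.set_title("freq %d" % (i + 1), fontsize=8)
```

axes[row, col] indexing works too, which is handy when the row or column itself means something.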


3D

or other slightly atypical handling will probably also need this somewhat longer style, e.g.

ax = fig.add_subplot(111, projection='3d')   # older versions also need: from mpl_toolkits.mplot3d import Axes3D

See also: http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html

Spacing

With the above most things are handled for you.

You can control spacing around all the subplots, and between them, via subplots_adjust, but little else.


The defaults seem to be:

fig.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.9,  wspace=0.2, hspace=0.2)

Note that this positions the part of the plot that data is drawn in, and ignores the axis labels, titles, etc. If you care about that, look at tight_layout


tight_layout

The default is nicely spacious, but sometimes (e.g. when producing images) you don't want wasted space.

fig.tight_layout()
alters the positions of the axes to remove most whitespace (note: it looks at ticklabels, axis labels, and titles, and does not measure anything else that may be present outside the plot area)


It's experimental -- and apparently will stay so. Sometimes it's very handy, but try not to rely on it.

Various backends lack a direct implementation, which is why if you ask for it you may see:

tight_layout : falling back to Agg renderer

This seems to mean matplotlib is redoing the plot in the Agg renderer to calculate the new positions (it needs to call get_renderer(), which not all backends provide), and then does the setting and drawing in your requested backend(verify). So you can ignore the warning if you don't care about the extra work(verify).


http://matplotlib.org/users/tight_layout_guide.html


free-style axes

Figure.add_axes( (left,bottom, width,height) ) will let you anchor axis edges at arbitrary fractions of the figure width/height.

This means you can e.g. place axes over others, have your own funky layouts.
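For instance, a small inset drawn over a main plot could be sketched as follows. The positions, data and Agg backend are arbitrary assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(6, 4))
main  = fig.add_axes((0.10, 0.10, 0.80, 0.80))   # left, bottom, width, height, all as figure fractions
inset = fig.add_axes((0.55, 0.55, 0.30, 0.30))   # added later, so drawn on top of the main axes

x = np.linspace(0, 10, 200)
main.plot(x, np.sin(x))
inset.plot(x[:40], np.sin(x[:40]), "r")          # zoomed view of the first part of the data
```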

Style notes

First check whether prettyplotlib does most of what you want :)


Drawing extra stuff

Grid

Goes a long way:

ax.grid(True) 


Optional keyword arguments:

  • which: 'major', 'minor', or 'both'
  • various style arguments, e.g. color='r', linestyle='-', linewidth=2, zorder=0

Keep in mind you can do major and minor differently like:

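# the first argument is passed positionally here: its keyword was b in older versions and is visible in newer ones
ax.grid(True, which='major', color='b', linestyle='-')   # solid blue
ax.grid(True, which='minor', color='r', linestyle='--')  # dashed red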


You can also style lines later by fetching the objects, via calls like

ax.get_xgridlines()
ax.get_ygridlines()


See also:


for more control, see http://matplotlib.org/examples/axes_grid/demo_axes_grid.html

Lines in the data

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Sometimes you want to draw to point out boundaries, changes, particular values on a plot, etc.

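axhline() and axvline() draw a horizontal or vertical line at a position given in data coordinates, with its extent given as a fraction of the axis size (by default the whole width or height). hlines() and vlines() take both the position and the extent in data coordinates.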



So when you want this in data coordinates, the simplest way may be to, within the same axis,

plot((x1, x2), (y1, y2), 'k-') 
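A short sketch that puts the variants side by side. The data, positions and Agg backend are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 4, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y)

vline = ax.axvline(2.0, color="r", linestyle="--")         # x in data coords, spans the full axes height
hline = ax.axhline(0.5, xmin=0.25, xmax=0.75, color="g")   # y in data coords, x extent as axes fractions
(seg,) = ax.plot((1, 3), (0, 1), "k-")                     # arbitrary segment, fully in data coords
```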



http://stackoverflow.com/questions/16930328/vertical-horizontal-lines-in-matplotlib

Annotations

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

In its basic form, annotations are placed text, optionally using arrows.


You can specify coordinates as

  • 'data': the data's coordinates
  • 'figure fraction': fraction of the figure
  • 'axes fraction': fraction of the axis
  • 'figure points', 'figure pixels', 'axes points', 'axes pixels'
  • 'polar'
  • 'offset points': an offset, in points, from the xy value (for textcoords)
  • a transform such as axisobj.transData: the data coordinates of that (possibly other) axis
  • a 2-tuple, specifying x and y coordinate systems differently, e.g. ("data", "axes fraction")
  • matplotlib.text.OffsetFrom(obj, (x,y))

When pointing from one axis into another, matplotlib.patches.ConnectionPatch is sometimes easier to use, though.

See also things like [1]


Boxes around text happen when you specify bbox. The value to bbox is a dict that specifies how to draw it, a dict including things like:

boxstyle:"rarrow,pad=0.3"
fc:"cyan"
ec:"b"
lw:2

Note that one alternative is instantiating AnchoredText, which is how legends are made. You can get anchored things more generic than that.



Arrows (other than a bbox with larrow/rarrow/darrow for its shape) happen when you give an xy, an xytext, and arrowprops: an arrow will be drawn from the latter position to the former (and the text that is the first, non-keyword argument will also be drawn at the latter). Without arrowprops you get only the text.

You can have them in different coordinate systems, even:

annotate('look at that',
  xy=(x1, y1),     xycoords='data',
  xytext=(x2, y2), textcoords='offset points',
  arrowprops=dict(arrowstyle='->') )
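A runnable sketch that combines the pieces above: text with a box, and an arrow pointing at a data point. The data, offsets, styles and Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

ann = ax.annotate(
    "look at that",
    xy=(2, 4), xycoords="data",                      # the point being pointed at
    xytext=(-60, 30), textcoords="offset points",    # text position, as an offset in points from xy
    arrowprops=dict(arrowstyle="->"),                # without arrowprops no arrow is drawn
    bbox=dict(boxstyle="round,pad=0.3", fc="cyan", ec="b", lw=2),
)
```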



plot axes

Custom tick labels

Rotation of ticklabels

When you have a bunch of labels, or long labels, you may care to rotate them.

  • rotation='vertical' may be simplest
  • you can also use arbitrary rotation, like rotation=45
  • with tightly packed labels (and particularly with arbitrary rotation) you may also care about where the label/rotation is anchored. Basically:
rotation_mode='anchor' means the text is first aligned according to ha and va, then rotated
rotation_mode=None (the default) means it is first rotated, then aligned [2]
'anchor' is sometimes nicer with tighter-packed labels, the default sometimes looks more regular/spacious
see also [3]

For example, compare the following (and try swapping these ha values to see what goes wrong):

rotation=-45, rotation_mode='anchor', ha='left',  va='center'
rotation= 45, rotation_mode='anchor', ha='right', va='center'
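To see where such arguments go: they are ordinary text properties, so one place to hand them in is set_xticklabels(). A sketch with made-up bar data and labels (the Agg backend is likewise an assumption):

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt

labels = ["a fairly long label %d" % i for i in range(8)]   # made-up labels

fig, ax = plt.subplots()
ax.bar(range(len(labels)), range(1, 9))
ax.set_xticks(range(len(labels)))
texts = ax.set_xticklabels(labels, rotation=45, rotation_mode="anchor",
                           ha="right", va="center")
```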



Minor ticks

Example:

axis.xaxis.set_major_locator( matplotlib.ticker.MultipleLocator(1) )
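The same call exists for minor ticks. A sketch that sets both, and styles the grid for each separately; the intervals, data and Agg backend are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3, 4, 5], [0, 1, 0, 1, 0, 1])

ax.xaxis.set_major_locator(MultipleLocator(1))       # a major tick at every multiple of 1
ax.xaxis.set_minor_locator(MultipleLocator(0.25))    # a minor tick at every multiple of 0.25

ax.grid(True, which="major", linestyle="-")
ax.grid(True, which="minor", linestyle=":", linewidth=0.5)
```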




Amount of ticklabels, content of ticklabels

You can control the amount of ticks you want to see (minimum, maximum) -- balancing clutter and information. You can also force them at specific points, but that's usually less handy.


You can also change the formatter. For most sorts of data you don't need to.

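I have one report graph that is always on the order of days at most, in which case the default date formatting is a little verbose, and I prefer:

locator   = matplotlib.dates.AutoDateLocator()
formatter = matplotlib.dates.AutoDateFormatter(locator, defaultfmt='%a %d\n%k:%M')
formatter.scaled = {   # if major tick interval is more than <key> days, call strftime with <value>
    1.0    : '%a %d\n%k:%M',          # > 1 day
    1./24. : '%a %d\n%k:%M',          # > 1 hour
    1. / (24. * 60.): '%H:%M:%S.%f',  # > 1 minute
}
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
# the day/hour one which prints something like:  Thu 17
#                                                 22:30
# (%k is a glibc extension; use %H where it is not supported)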



Size of ticklabels

To set the default for all axes, see #Font_size

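To alter it for just one axis, try fontsize= on the call that sets the labels (e.g. set_xticklabels()), or ax.tick_params(axis='x', labelsize=14) (often simpler), OR:

for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_fontsize(14)  # possibly make that mybasesize-2 or such. (tick.label in older versions)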

Less labeling

Sometimes the values on an axis are redundant, and you can do a:

ax.get_xaxis().set_visible(False)

and/or

ax.get_yaxis().set_visible(False)


Drawing less

Removing the axis, as just mentioned, will remove the text and the tickmarks.


To remove only the lines of the main plot box: these are called the spines, so you can e.g.:

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)


The background color is the frame.

You can remove it by passing frameon=False when creating the figure.
savefig() can have its own overriding defaults, so sometimes you may need to force e.g. transparent=True on that call.

Second axis

hostaxis.twinx() gives you a new axis based on the host.

If you want a single legend, it seems better to have two derived axes and not use the host.
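A commonly used alternative for getting a single legend (not the two-derived-axes approach mentioned above) is to collect the line handles from both axes and make one legend() call. A sketch with made-up data and labels; the Agg backend is also an assumption.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt

fig, host = plt.subplots()
other = host.twinx()    # shares the x axis, gets its own y scale on the right

(l1,) = host.plot([0, 1, 2], [10, 20, 15], "b-", label="temperature")
(l2,) = other.plot([0, 1, 2], [0.2, 0.5, 0.9], "r-", label="humidity")

lines = [l1, l2]
# put the legend on the axes that is drawn last, so the other axes' lines don't cover it
legend = other.legend(lines, [l.get_label() for l in lines], loc="upper left")
```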


http://stackoverflow.com/questions/5484922/secondary-axis-with-twinx-how-to-add-to-legend

Colors

Can be one of:

  • Single letter: 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), 'y' (yellow), 'k' (black), 'w' (white)
  • names (from these, e.g. 'forestgreen'; seems to be the CSS set(verify))
  • float-as-string: gray shades, e.g. '0.4'
  • CSS style: e.g. '#eeefff'
  • RGB tuple, e.g. (0.5, 0.5, 0.8)
  • RGBA tuple, e.g. (0.5, 0.5, 0.8, 0.2)


Changing line color

  • Hand it into plot (c= or color=)
  • You can also alter the Line2D object returned by plt.plot()




Multi-colored lines:

Most solutions come down to plotting separate parts of the line, each with their own color.


There's:

not very readable, but more flexible in theory


  • figure out start and end indices in the data, and do a series of plot() calls, like:
plot(x[start:end], y[start:end], lw=2, c=segment_color[i]) 


  • numpy mask arrays, separate plot() calls
more readable if it's a single split based on x or y value.
The below example is for two segments. You'd need to generalize it to be more flexible.
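A sketch of the mask approach for a single split on y value. The data, the threshold and the Agg backend are assumptions; this is not the original example.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
y = np.sin(x)
threshold = 0.0                                   # hypothetical split value

above = np.ma.masked_where(y < threshold, y)      # hides the points below the threshold
below = np.ma.masked_where(y >= threshold, y)     # hides the points at or above it

fig, ax = plt.subplots()
ax.plot(x, above, "r", lw=2)                      # masked points are skipped, which breaks the line there
ax.plot(x, below, "b", lw=2)
```

There will be a small gap at each crossing, because the segment between the last unmasked point of one color and the first of the other is drawn by neither call.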
Note: the masking won't work when the data are plain python lists rather than numpy arrays.


Font size

ourfontsize = 10
matplotlib.pyplot.rcParams["font.size"] = ourfontsize
 
# The following are by default relative to font.size, but can be set to a fixed size
matplotlib.pyplot.rcParams['axes.titlesize']   = 'large'
matplotlib.pyplot.rcParams['axes.labelsize']   = 'medium'
matplotlib.pyplot.rcParams['figure.titlesize'] = 'medium'
matplotlib.pyplot.rcParams['legend.fontsize']  = 'large'
matplotlib.pyplot.rcParams['xtick.labelsize']  = 0.6*ourfontsize # if you want more control
matplotlib.pyplot.rcParams['ytick.labelsize']  = 0.6*ourfontsize
# (the last because you cannot hand in a scale, only a specific size)

These are factor scales. For reference (taken from font_manager.py):

'xx-small' : 0.579,
'x-small'  : 0.694,
'small'    : 0.833,
'smaller'  : 0.833,
'medium'   : 1.0,
'large'    : 1.200,
'larger'   : 1.2,
'x-large'  : 1.440,
'xx-large' : 1.728,


For more structural reuse, see styles: http://matplotlib.org/users/style_sheets.html#style-sheets


For the finest-grained (and fully-OO) control, either

  • use the fontsize argument on axis.set_title(), axis.set_xlabel(), axis.legend(), etc.
  • use setters, like
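A sketch of the setter route: the title, axis labels and legend texts are all Text objects with a set_fontsize(). The sizes, data and Agg backend are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 1, 3], label="series")
ax.set_title("a title")
ax.set_xlabel("x")

ax.title.set_fontsize(16)             # the title's Text object
ax.xaxis.label.set_fontsize(9)        # the x axis label's Text object
ax.tick_params(axis="both", labelsize=8)

legend = ax.legend()
for text in legend.get_texts():       # one Text object per legend entry
    text.set_fontsize(7)
```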
 


http://matplotlib.org/examples/pylab_examples/fonts_demo.html

Custom font

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)




3D plots

http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html

Interaction

Changing the status bar

This is updated by the current axis's format_coord function.

(The default function formats these values via the axis formatters)

It gets x and y, which are coordinates within the actual plot area, and should return a string.

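def format_coord(x, y):
    # cx, cy: hypothetical names for the center of your data, assuming you know them for your data
    radius = ((x - cx)**2 + (y - cy)**2) ** 0.5
    return ' radius=%.3f x=%4d y=%4d' % (radius, x, y)
ax.format_coord = format_coord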

I regularly find I want to use state beyond x,y -- e.g. fetching a value from the data. You can hardcode a reference to a global -- or put your plotting into classes to cleanly keep the state.

Performance

Picking

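def onclick(event):
    # note: xdata/ydata are None when the click is outside any axis
    print('button=%d, x=%d, y=%d, xdata=%f, ydata=%f' % (
        event.button, event.x, event.y, event.xdata, event.ydata))

cid = fig.canvas.mpl_connect('button_press_event', onclick)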


You can make arbitrary artist objects report events when you click near enough.
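A sketch of that, using the picker argument and the pick_event hook. The data, the 5-point tolerance, the handler name and the Agg backend are assumptions; with a non-GUI backend like Agg nothing will actually be clicked, so this only shows the wiring.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend; use a GUI backend to actually click things
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(points,) = ax.plot([1, 2, 3], [4, 5, 6], "o", picker=5)   # pickable when clicked within 5 points

def on_pick(event):
    # event.artist is the artist that was hit, event.ind the indices of the data points near the click
    xdata, ydata = event.artist.get_data()
    for i in event.ind:
        print("picked point", xdata[i], ydata[i])

cid = fig.canvas.mpl_connect("pick_event", on_pick)
```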

On figure/axis reuse

If you create a figure, then display it, then create a figure, etc., you'll probably have run into:

RuntimeWarning: More than 20 figures have been opened.


If you went the pyplot route, you can use:

pyplot.close([figure]) # the current figure (default), by reference, index, name, or 'all'


However, you can be faster and more resource-efficient by reusing what you have:

  • pyplot.clf() clears the current figure
  • pyplot.cla() clears the current axis
  • set_data() on an artist already in the axis, e.g. the line that plot() returned (faster, often fast enough for realtime animation)
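A sketch of the set_data() route: the figure, axis and line are created once, and only the data is swapped out. The data and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: file-only backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax.plot(x, np.sin(x))                 # create the artist once

for step in range(5):
    line.set_data(x, np.sin(x + 0.3 * step))    # replace the data, keep figure, axis and line
    ax.relim()                                  # recompute the data limits from the new data
    ax.autoscale_view()                         # ...and rescale the view to them
    fig.canvas.draw()                           # redraw
```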


Blocking and non-blocking plot windows

The following is irrelevant for the non-interactive backends, because they just create a file and move on.


The sensible cases

Use from an ipython notebook
Entering %matplotlib inline within a notebook means that from then on it will pick up on created plot objects, and render them as an inline image (a static image in an interactive context).

To get interaction with the plot, use:

  • %matplotlib nbagg
    (ipython 2)
  • %matplotlib notebook
    (since ipython 3)


From an interactive python prompt

Use ipython rather than plain python.

After a %matplotlib [4] it will notice when you produce a figure object(verify) and fire off a window for it. (Without an argument to the magic function it will choose a GUI backend for you. You can specify a specific one if you want.)


from one-off CLI scripts

If you have a script in the "gather data, show the results, done" style, then it is actually useful to block program flow while the plots are shown. (in that it means you don't have to ensure some sort of threading, only to have to listen for the figure close to postpone the program's exit)

...which is more or less the default when you use pyplot.show() with non-embedded interactive GUI backends: they'll create their own window with their own event loop, which is the thing that blocks until the plot window(s) it created are closed.


Plots with their own interaction

Everything you want to happen can be hooked into interactive events (or a timer), and thereby just the figure's mainloop.

See e.g. some examples here: http://matplotlib.org/examples/widgets/

There are in-betweens with more control, if you don't mind coding for a specific backend, see e.g. the example here


Preprogrammed animations

Matplotlib has some built-in, on-a-timer style animation.

These are not interactive (although you can combine this, see the previous point).

See the examples: http://matplotlib.org/examples/animation/


Embedded in an existing GUI window

That is, you already created a GUI, and happen to want to put a plot somewhere in a window.

You would want to integrate it into your GUI program's event loop. There are examples, see http://matplotlib.org/examples/user_interfaces/

The interesting cases

from mostly-CLI programs (or the basic python shell)


I frequently write a script that gathers some data, shows it, and updates it later (Be it iterations done within the same program, or watching files, a database, or whatnot)

This means a few different details

  • leaving the originating program interactive
  • leaving the plot window interactive
  • ensuring draws-to-screen happen
  • having the originating program wait for the window to close before quitting itself

The exact solution depends on what exactly you are doing. (TODO: examples)


Interactive mode is a good part of the answer (but also a bit confusing: details vary a bit with backend, and parts are a bit experimental). Interactive mode refers more to how it leaves your shell than to the plot itself.(verify) It

  • means a show() will not block
  • means things will not draw() on every state change (verify)

It does not directly imply that a GUI window will be independent and drawing. Says http://matplotlib.org/faq/usage_faq.html#what-is-interactive-mode

Use of an interactive backend (see What is a backend?) 
permits–but does not by itself require or ensure–plotting to the screen. 

Whether and when plotting to the screen occurs, and whether a script or shell session
continues after a plot is drawn on the screen, depends on the functions and methods that are called,
and on a state variable that determines whether matplotlib is in “interactive mode”.

The default Boolean value is set by the matplotlibrc file, and may be customized
like any other configuration parameter (see Customizing matplotlib). 
It may also be set via matplotlib.interactive(), and its value may be
queried via matplotlib.is_interactive().

pyplot.pause:

If there is an active figure it will be updated and displayed,
and the GUI event loop will run during the pause.

If there is no active figure, or if a non-interactive backend
is in use, this executes time.sleep(interval).

This can be used for crude animation.


Notes so far:

  • setting interactive mode:
on: matplotlib.interactive(True) and/or pyplot.ion()
off: matplotlib.interactive(False) and/or pyplot.ioff()
toggling it in the middle of your program is probably more confusing than useful
  • in interactive mode, OO-style calls (or altering plot state manually) will not imply a draw() (pylab-style calls still do)
(seems to be for performance(verify))
so do an explicit pause() or draw()
  • calling draw() does not actually force a draw on-screen (when and because the backend is separate from us), so...
  • pyplot.pause(0.0001) roughly makes sure the redraw actually happens nowish
(apparently can be used instead of draw())
was meant to pace animations that are non-interactive, it just turns out we can abuse it.
is basically draw(), show(block=False), and running a matplotlib-side event loop so that our interaction can make it to the GUI-side event loop(verify) (see also canvas.start_event_loop())
  • to avoid script-exit meaning closing everything
it's close enough to do: pyplot.pause(10000000)
(almost four months. Add more zeroes if you care)
show(block=True) will probably effectively be the same (verify)
  • (there are many footnotes that I don't necessarily mention here)
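Put together, the notes above suggest a loop roughly like the following sketch. The data, step count and intervals are made up, and the Agg backend is forced only so that the sketch runs anywhere: with Agg, pause() merely sleeps, so remove that line to see a live window.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: remove this line to get an actual, updating window
import matplotlib.pyplot as plt
import numpy as np

plt.ion()                                   # interactive mode: show() will not block
fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 100)
(line,) = ax.plot(x, np.sin(x))

for step in range(3):                       # stands in for "new data arrived"
    line.set_ydata(np.sin(x + step))        # OO-style change, does not by itself redraw
    plt.pause(0.01)                         # redraw and let the GUI event loop run briefly

plt.ioff()                                  # back to blocking behavior...
plt.show()                                  # ...so that a GUI window stays open until it is closed
```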


The problem cases

GUI window from GUI shells

That is, from IDEs like IDLE and such.

Short answer: When it specifically cooperates with matplotlib, it can work. If not, you have two GUI event loops in the same interpreter(verify), which is messy at best.


Says http://matplotlib.org/users/shell.html#other-python-interpreters

Gui shells are at best problematic, because they have to run a mainloop,
but interactive plotting also involves a mainloop.  
Ipython has sorted all this out for the primary matplotlib backends.
There may be other shells and IDEs that also work with matplotlib in interactive mode,
but one obvious candidate does not: the python IDLE IDE is a Tkinter gui app
that does not support pylab interactive mode, regardless of backend.


matplotlib in browsers

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

mpld3 notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Digging deeper, necessary hacks, lower level notes, etc.

Dates

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The easiest way to deal with dates is to hand in datetime objects.

Matplotlib will do everything for you, primarily choosing an appropriate axis tick formatter (AutoDateFormatter) (and also converting the dates to its internal format).


When you want more manual or lower-level control (e.g. update the plot via set_data), then you probably have to deal with its internal format, and possibly set an AutoDateFormatter yourself.

(You can hand in numbers representing dates and e.g. use plot_date() over plot(), but there are no such alternatives for various other plot functions.)


There is:

  • matplotlib's internal format - floating point, days since 0001-01-01 00:00:00 (plus one). Since matplotlib 3.3 the default epoch is 1970-01-01 instead (see the date.epoch rcParam)
you tend to not have to care about the actual values, until you care about interactive plots or such
  • datetime
matplotlib.dates.date2num takes a datetime and converts it to such a number
matplotlib.dates.num2date takes such a number and returns a datetime
(will handle a single value or a sequence)
  • unix timestamp
matplotlib.dates.epoch2num takes unix time
matplotlib.dates.num2epoch produces unix time
(will handle a single value or a sequence)
(these two were deprecated and then removed in newer matplotlib versions; there, convert via datetime instead)
  • mx2num and num2mx

As always, keep on the watch for timezone stuff (e.g. when converting from datetime or mx). Google around.
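A small sketch of the datetime conversions, with made-up dates. Note that the number you get depends on the matplotlib version's epoch, so only differences and round trips are compared here.

```python
from datetime import datetime

import matplotlib.dates as mdates

dt = datetime(2020, 1, 2, 3, 4, 5)

num = mdates.date2num(dt)       # a float: days since matplotlib's epoch
back = mdates.num2date(num)     # a timezone-aware datetime (UTC unless configured otherwise)

# sequences work too; two datetimes exactly one day apart differ by 1.0
nums = mdates.date2num([dt, datetime(2020, 1, 3, 3, 4, 5)])
```

This also shows one of the timezone details to watch for: a naive datetime goes in, an aware one comes back.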



Running from minimal environments (like a web server)

To run without X, choose a backend that does not use it, before a call that implicitly chooses a backend for you (pylab, also pyplot).

Choices include Cairo (usually looks nicest), Agg, or things like ps or pdf, and more. For example:

import matplotlib
matplotlib.use('Cairo')


matplotlib wants a HOME directory, and it should be writeable. This usually only matters when embedding in web environments (apache, mod_wsgi, CGI) where HOME is typically not set. You should alter this before you import matplotlib, like:

import os
if 'HOME' not in os.environ:
    os.environ['HOME'] = '/tmp/'   # a unique, empty subdir of /tmp would probably be slightly safer

Writing to images (memory-only)

This will be somewhat specific to the backend you prefer.


Cairo

Cairo's savefig() can take file-like objects such as io.BytesIO ((c)StringIO objects in python 2).

Which is easy, and more memory-efficient than going via an additional uncompressed raster.

By default it writes PNGs, and you can make it write other formats by handing along a format parameter to savefig, such as savefig(sio,format='PNG'), or one of 'PDF', 'PS', 'EPS', 'SVG'.


For example, we might wrap our plot-generating code like:

import io
sio = io.BytesIO()
# plot code
fig.savefig(sio, format='PNG')
pngdata = sio.getvalue()


Agg

You can ask Agg's figure/canvas for RGB pixel data (note: easily large). Can be useful when you want to send it straight to PIL.


import io
from PIL import Image

# If you do things at the figure level (i.e. pylab), get the underlying canvas,
canvas = fig.canvas
# and make sure the figure is drawn on the canvas (you may well get an empty image if it isn't)
canvas.draw()
# Get the canvas' pixel size as a (x,y) tuple; you'll need to hand this to PIL along with the raw data.
# (You could probably do this at canvas level, but I prefer to use the underlying renderer so that you don't have to remember the dpi)
renderer = canvas.get_renderer()
size = (int(renderer.width), int(renderer.height))  # apparently this became floats at some point,
                                                    # which the below does not like.
# Export raw image (also possible at canvas level)
# (tostring_rgb() is deprecated in recent matplotlib versions in favor of buffer_rgba(), which gives RGBA)
raw = renderer.tostring_rgb()
# then e.g. read it into a PIL image.
im = Image.frombytes('RGB', size, raw)  # You can use transparency, but ARGB is larger and takes a little more code to explain to PIL
# Make PIL save the image and store it into a string:
pngdata = io.BytesIO()
im.save(pngdata, 'PNG')




Backend list

These lists will change over time

Image backends:

  • Cairo
  • Agg
  • PS
  • PDF
  • SVG
  • EMF
  • pgf (referring to pgfplots(verify))

UI/Interactive backends:

  • TkAgg (probably the most portable)
  • WX
  • WXAgg
  • QtAgg
  • Qt4Agg
  • Qt5Agg
  • gdk
  • GTK
  • GTKCairo
  • GTKAgg
  • CocoaAgg
  • MacOSX (verify)
  • FltkAgg

Special/unsorted cases:

  • WebAgg - used in ipython(verify), also independently in browsers
  • nbAgg - used in ipython(verify)


User-made:

  • matascii


Notes:

  • Most backends can savefig() to a file.
  • The UI backends also react to show() and are often interactive.
  • Producing raster images for web use may be easiest with Cairo, because:
It allows StringIO-based saving, which most others do not
You do not have to switch backends to produce PNG, PDF, EPS, SVG -- which is useful because you cannot switch backends (an imposed restriction because not all backends react well to this) when you use persisting interpreters, which might apply to WSGI, mod_python and more.

