Python notes - semi-sorted

From Helpful



IPython

IPython is a collection of:

  • an interactive shell, more featureful than Python's own.
http://ipython.org/ipython-doc/rel-0.12/interactive/tutorial.html


  • integration with interactive data visualization
  • integration with GUI toolkits
...both of which are used in...
  • notebooks - served via browser; allow embedding code, text, plots, and mathematical expressions


  • some tools for parallel computing (enabled by IPython itself being abstracted out this way)
  • easier embedding of an interpreter into your own project


  • I like the way it eases hooking profiling into your session, via its magic functions:
 %time - how much time (one run)
 %timeit - how much time (in a bunch of runs, at least a second's worth?(verify))
 %prun - how much time, per function
 %lprun - how much time, per line (needs the line_profiler extension)
 %mprun, %memit - how much memory per function (once, a bunch; needs the memory_profiler extension)
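Outside IPython, the stdlib timeit module does the same kind of measurement that %time/%timeit wrap; a minimal sketch (statement and run count are arbitrary examples):

```python
import timeit

# total time for 1000 runs of the statement, like %timeit but by hand
total = timeit.timeit("sorted(range(1000))", number=1000)
print("%.6f s per run" % (total / 1000))
```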

See also:


notebooks and jupyter

Python notebooks let you play with code through a web interface.

Notebooks are a webpage frontend (to an interactive backend), which makes it

easier to play with code visually than in the shell
easier to use remotely
easier to persist the notebook
easier to persist the interpreter behind it (to a degree)


This means less typing and more prettiness while you're doing plotting, math, or anything else you can manage via the python shell.

You can copy the notebooks elsewhere, bootstrapping other people to do similar experiments to yours.

For some examples, see e.g. https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks


This used to be part of ipython (and was called ipython notebooks), but it has since widened, becoming more of a protocol (it already was fairly agnostic, being largely JSON over ZeroMQ) to whatever sort of backend you want.

That framework is called jupyter, and ipython is just one of its possible kernels - see this list for more.

You can build your own, for existing languages, or basically build your own DSL to play with.

If you like the technical details, see e.g.

https://ipython.org/ipython-doc/3/development/how_ipython_works.html
https://ipython.org/ipython-doc/3/development/messaging.html

The below mostly focuses on python.



Basic use

The actual notebook gets stored in the current directory (so you may wish to organize notebooks into directories a bit) where you run:

jupyter notebook


By default it binds to 127.0.0.1:8888 and locally launches a browser.

When working remotely
consider using an SSH tunnel, like
ssh -L localhost:8888:localhost:8888 workhost
(and pointing your browser at 127.0.0.1:8888 on the SSH-client side)
On a trusted LAN you can consider doing
--ip=0.0.0.0
(and maybe
--port=80
) so that it's easily reachable.


You then probably want to look at Help → keyboard shortcuts.

Most important to start with is probably Shift-Enter: run cell, go to next cell



JupyterLab

Basically a more extended/extensible frontend - more integration and more convenience.


JupyterHub

Basically, it's a login service (e.g. PAM, OAuth) around keeping track of notebooks.

The notebooks are still single-user things as before.

It means only one person has to figure out the install, so it's a low-threshold thing for things like

classrooms (there are also homework/grading extensions)
workshops (see what everyone's doing)
academia / teams (share what everyone's doing)

...though there's no overly easy way of sharing?(verify)

Setting the process name

It's OS-specific stuff, so there is no short portable way.

The easiest method is to install/use the setproctitle module - it aims to be portable and tries its best on various platforms. Example use:

import os, sys

try:
    import setproctitle
    setproctitle.setproctitle( os.path.basename(sys.argv[0]) )
except ImportError:
    pass  # setproctitle not installed; keep the default process name

The above often means 'use the filesystem name that was used to run this' - but not always, so a hardcoded string can make sense.

Additional notes you may wish to have seen some time

array, deque

A list can be used as a sort of deque, like:

append(val)     # insert on right side 
pop()           # take from right side
insert(0,val)   # insert on left side
pop(0)          # take from left side

However, list is primarily efficient for stack-like use - list.pop(0) and list.insert(0, val) are O(n) operations,

...while those are O(1) operations on collections.deque (added in 2.4). deque also has appendleft(), extendleft(), and popleft(), and some others that can be convenient and/or more readable.

You may also wish to know about the queue module, a multi-producer, multi-consumer, thread-safe queue.
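The deque operations mentioned above, in a quick sketch:

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)     # O(1) insert on the left side
d.append(4)         # O(1) insert on the right side
print(list(d))      # [1, 2, 3, 4]
print(d.popleft())  # 1 - O(1) take from the left
print(d.pop())      # 4 - O(1) take from the right
```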

See also:

Date stuff

Snippets

Detect OS and/or path style

Note that you do not need to know the OS to split and join paths elements correctly -- you can rely on os.path for that.
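For example, os.path picks the right separator for the current OS, so joining and splitting stays portable:

```python
import os.path

# join and split portably, without ever hardcoding '/' or '\\'
p = os.path.join("data", "raw", "file.txt")
head, tail = os.path.split(p)
print(p)
print(head, "|", tail)
```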

You can get the
uname
[1] fields or a good imitation:
os.uname       returns a 5-tuple: (sysname, nodename, release, version, machine) - Unix only
platform.uname returns a 6-tuple: (sysname, nodename, release, version, machine, processor) - portable

Note that the contents of these fields are relatively free-form. For example, example results for platform.uname:

('Linux',   'zeus',  '2.6.34-gentoo-r12', '#5 SMP Wed May 25 01:15:12 CEST 2011', 'i686', 'Pentium(R) Dual-Core CPU E5700 @ 3.00GHz')
('Windows', 'spork', '7',                 '6.1.7600',                             'x86',  'Intel64 Family 6 Model 23 Stepping 10, GenuineIntel')


You can detect the path style via the path separator (os.sep and os.path.sep are the same value, so either works; neither is deprecated) to figure out what style of paths we should be using, and as a hint of what OS we are on. Don't use this for path string logic -- you can do things safely using os.path functions.

if os.sep == '/':
    print("*nix-style paths")
elif os.sep == '\\':
    print("Windows-style paths")
else:
    print("Very Weird Things (tm)")

Note that Windows CE has a single root instead of drive letters, but still uses backslashes. It is hard to completely unify path logic because of such details.


Python under Windows is slightly smart about programmers mixing \ and /. That is to say, mixes will work when python processes the path logic itself (e.g. open()), but not in values passed verbatim into subprocess/popen/system calls.

Also, note that this is a(nother) reason that string equality is not a good test for path equality, and that you shouldn't do path splitting and such with your own string operations (look to the os.path functions).

There may be other details, so don't be lazy - use the path splitting and joining functions instead of appending a character yourself.


You can also inspect:

  • sys.platform (e.g. linux2 ('linux' since Python 3.3), darwin, win32, cygwin, sunos5, and various openbsd and netbsd strings)
  • os.name (e.g. posix, nt, ce)
  • or even os.environ
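A quick sketch of inspecting the first two (values shown are examples, not exhaustive):

```python
import sys, os

# coarse platform detection - useful as a hint, not for path string logic
print(sys.platform)   # e.g. 'linux', 'darwin', 'win32'
print(os.name)        # 'posix' on Unix-likes, 'nt' on Windows
```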


Some library notes

pytest notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Pytest wants to find tests itself - this is an inversion-of-control thing; the point is that no matter how complex tests get, pytest can do everything for us without us having to hook it in specifically.


What does pytest pick up as files to be tested?

  • the filenames specified
  • if none specified:
filenames like test_*.py or *_test.py in the directory tree under the current dir(verify)
You can control this discovery, see e.g. https://docs.pytest.org/en/6.2.x/example/pythoncollection.html


What does pytest pick up as tests to be run?

  • functions prefixed test at module scope (verify)
  • classes prefixed Test, and then functions prefixed test inside them
...but only if that class does not have an __init__.
These classes are not intended to be classes with state - they exist to group test functions and pollute the namespace less (pytest instantiates the class freshly for each test method(verify))
  • classes subclassed from unittest.TestCase (see unittest)
  • by marker, e.g. pytest -m slow picks up things decorated with @pytest.mark.slow
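A minimal sketch of what that discovery picks up (the filename and names are invented examples):

```python
# contents of test_example.py - picked up via the test_*.py pattern

def test_addition():             # picked up: module scope, test-prefixed
    assert 1 + 1 == 2

class TestStrings:               # picked up: Test-prefixed class, no __init__
    def test_upper(self):        # test-prefixed method inside it
        assert "abc".upper() == "ABC"
```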



What does pytest actually consider success/failure?

Roughly, each test function will

  • succeed if it returns, and, if it contains asserts, all of them pass
  • fail on the first failing assert
  • fail on the first uncaught exception

Expressing further things

  • testing that something throws an exception has a few alternatives, but using a context manager is probably the shortest form:
with pytest.raises(ValueError, match=r'.*BUG.*'):
    raise ValueError("BUG: tell programmer")
Another would be calling it in a try, and invoking pytest.fail() if the expected exception did not happen:
try:
    0/0
except ZeroDivisionError:
    pass
else:
    pytest.fail("expected ZeroDivisionError")


  • similarly, there is a pytest.warns to test that warnings happen
the context manager variant is probably preferred here


More detailed testing can be done like

  • anything else that does asserts, like unittest.TestCase's methods (available within TestCase subclasses):
self.assertEqual(a, b)
self.assertNotEqual(a, b)
self.assertTrue(x)
self.assertFalse(x)
self.assertIs(a, b)
self.assertIsNot(a, b)
self.assertIsNone(x)
self.assertIsNotNone(x)
self.assertIn(a, b)
self.assertNotIn(a, b)
self.assertIsInstance(a, b)
self.assertNotIsInstance(a, b)
...but most of those are shorter if you write the assert yourself.



Showing details

Pytest will try to pick up more to report for failed tests, e.g.:

comparing long strings: a context diff is shown
comparing long sequences: first failing indices
comparing dicts: different entries

There is more customization you can do - sometimes nice to get more actionable output from pytest runs.


On fixtures/mocking

Fixtures create reusable state/helpers for tests, and are great when many tests use the same data/objects.
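A hedged sketch of a pytest fixture (all names here are invented for illustration):

```python
import pytest

def make_sample_config():
    # plain helper, so the data-building logic is also callable outside pytest
    return {"host": "localhost", "port": 8080}

@pytest.fixture
def sample_config():
    # pytest calls this itself, and injects the result into any test
    # that names 'sample_config' as a parameter
    return make_sample_config()

def test_port(sample_config):
    assert sample_config["port"] == 8080
```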

For terminology, see Benchmarking,_performance_testing,_load_testing,_stress_testing,_etc.#Mocking.2C_monkey_patching.2C_fixtures

For what pytest helps with, see [2]

https://levelup.gitconnected.com/a-comprehensive-guide-to-pytest-3676f05df5a0


https://docs.pytest.org/en/latest/how-to/usage.html


3D

PyGame

Win32 interface

pywin32, previously known as win32all, provides hooks into various parts of Windows, with win32api as its central module. (see also its help file)

downloadable from sourceforge and with a homepage here.


Some code for this:


GPGPU

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Reikna

Clyther

PyOpenCL


PyStream


cudamat

scikits.cuda


gnumpy

Theano


Unsorted

Creating ZIP files (in-memory)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

We can use in-memory data and io.BytesIO objects here - so this is sometimes simpler than doing the same with tarfile.

You can add content with either:

  • ZipFile.write - takes a filename it will open - useful when you just want more control over which files to add
  • ZipFile.writestr takes a filename/ZipInfo and a bytestring - the filesystem isn't touched, which is probably what you want when adding in-memory data to in-memory zip files.
# Just an example snippet
import io, zipfile

zip_buf = io.BytesIO()
z = zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED)  # or another compression method
 
for filename, filedata in (('filename1.txt', b'foo'),
                           ('filename2.txt', b'bar')):
    z.writestr(filename, filedata)
z.close()
zip_bytes = zip_buf.getvalue()  # the finished ZIP file, as bytes

FFTs

There are a few implementations/modules, including:

  • fftpack: used by numpy.fft, scipy.fftpack


Speed-wise: FFTW is fastest, numpy is slower, scipy slower yet (not sure why the np/sp difference when they use the same underlying code). Think a factor 2 or 3 (potentially), though small cases can be drowned in overhead anyway.

Also, FFTW planning matters.


It does matter how the coupling works - there for example are more and less direct (overhead-wise) ways of using FFTW.

TODO: figure out threading, MPI stuff.
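Whichever backend you end up with, basic use looks much like numpy's interface; a quick sketch (assumes numpy is installed):

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0, 0.0])
X = np.fft.fft(x)        # the DFT of an impulse is flat: all ones
print(X)

xr = np.fft.ifft(X)      # inverse transform rounds the trip
print(np.allclose(xr, x))
```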


See also:


Bytecode / resolve-related notes

  • .py - source text
  • .pyc - compiled bytecode
  • .pyo - compiled bytecode, optimized. Written when python is used with -O. The difference with pyc is currently usually negligible.
  • .pyd - a (windows) dll with some added conventions for importing
(and path-and-import wise, it acts exactly like the above, not as a linked library)
note: native code rather than bytecode


All of the above are searched for by python itself.

Python will generate pyc or pyo files

when modules are imported (not when they are run directly)
...with some exceptions, e.g. when importing from an egg or zip file it will not alter those
...which means it can, for speed reasons, be preferable to distribute those with pyc/pyo files in them


There was some talk about changing these - see PEP 488, which (as of Python 3.5) removed .pyo files and folded the optimization level into the .pyc filename instead.
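Since Python 3.2 the generated bytecode lands in a __pycache__ directory with an interpreter-tagged name; a quick sketch using the stdlib py_compile:

```python
import py_compile, pathlib, tempfile

# compile a trivial module by hand and see where the bytecode lands
with tempfile.TemporaryDirectory() as d:
    src = pathlib.Path(d) / "mymod.py"
    src.write_text("x = 1\n")
    pyc_path = py_compile.compile(str(src))
    print(pyc_path)   # something like .../__pycache__/mymod.cpython-311.pyc
```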


See also:

Further notes

!=None versus is not None
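The short version: prefer is not None. The != operator goes through __eq__/__ne__, which a class can override, while is compares identity and cannot be intercepted. A quick demonstration:

```python
class AlwaysEqual:
    def __eq__(self, other):
        return True   # claims equality with anything, including None

a = AlwaysEqual()
print(a != None)      # False - the __eq__ override makes this lie
print(a is not None)  # True - identity check cannot be overridden
```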