Python notes - semi-sorted
functools notes
functools.partial
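A minimal sketch of what partial does - it freezes some arguments of a callable into a new callable:

from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)   # a new callable with exponent pre-filled
cube   = partial(power, exponent=3)

print(square(5))   # 25
print(cube(5))     # 125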
Setting the process name
This is OS-specific stuff, so there is no short portable way.
The easiest method is to install and use the setproctitle module - it aims to be portable and tries its best on various platforms. Example use:
import os, sys

try:
    import setproctitle
    setproctitle.setproctitle(os.path.basename(sys.argv[0]))
except ImportError:
    pass
The above often means 'use the filesystem name that was used to run this' - but not always, so a hardcoded string can make sense.
Temporary files
Python's standard library module tempfile has a number of choices:
- mkstemp[1]
- creates a temporary file
- returns (open_file_descriptor, abspath)
- the open file handle is fairly secure in the sense that others cannot easily get that handle(verify)
- but if you only wanted a unique file *name* you wouldn't care about that
- you can ask it to add a prefix and/or suffix (both default to none)
- you can ask for it to be placed in a specific directory (defaults to system default)
- you are responsible for cleanup
- (if you want that to be automatic, look to TemporaryFile)
- if the OS implements O_EXCL, that file is accessible only for the creating user (plus admins(verify))
- mkdtemp[2]
- creates a temporary directory
- with similar 'only accessible for creating user'
- returns the abspath
- you are responsible for cleanup
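For example, a minimal sketch of mkstemp use, covering the cleanup that is on us:

import os, tempfile

fd, path = tempfile.mkstemp(suffix='.txt')   # returns (open file descriptor, absolute path)
try:
    with os.fdopen(fd, 'w') as f:            # wrap the raw descriptor in a file object
        f.write('some data')
    # ... use path ...
finally:
    os.remove(path)                          # mkstemp leaves cleanup to us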
More help, and context managers
- TemporaryFile[3]
- acts much like mkstemp
- ...but returns a file-like object (rather than a file descriptor)
- that is destroyed on close / gc
- depending on OS/filesystem, the directory entry is either not created at all, or removed immediately after creation
- helps security a little (but is no hard guarantee)
- implies the filesystem will clean up the backing data only after you close (what is, behind the scenes, still) a file handle
- NamedTemporaryFile[4]
- like TemporaryFile but does specifically have a directory entry
- its filename is in .name
- SpooledTemporaryFile[5]
- like TemporaryFile, but keeps contents in memory until they grow past a size threshold (max_size), only then writing them to disk
- TemporaryDirectory[6]
- acts much like mkdtemp
- returns an object usable as a context manager (entering it hands you the directory path as a string)
- removes contents and directory when done (fundamentally best-effort - there are ways to break that)
These are all usable as context managers, so instead of e.g.:
tmp = tempfile.NamedTemporaryFile()
print('temp.name:', tmp.name)
tmp.write(...)
tmp.close()
you can write:
with tempfile.NamedTemporaryFile() as tmp:
    print(tmp.name)
    tmp.write(...)
Notes:
- since more is handled for you, there are some more hidden edge cases, and hidden platform-specific details
- gettempdir() - finds the directory we put temporary files in (basically what is used if you supply dir=None to the above)
Temporary files are created in tempfile.gettempdir() unless you
- hand in dir= to every call
- set tempfile.tempdir to a default directory you want the module to use (has some details, so not necessarily recommended)
array, deque
A list can be used as a sort of deque, like:
append(val)     # insert on right side
pop()           # take from right side
insert(0, val)  # insert on left side
pop(0)          # take from left side
However, list is primarily efficient for stack-like use - list.pop(0) and list.insert(0, val) are O(n) operations,
...while those are O(1) operations on collections.deque (added in 2.4). deque also has appendleft(), extendleft(), and popleft(), and some others that can be convenient and/or more readable.
You may also wish to know about the queue module, a multi-producer, multi-consumer, thread-safe queue.
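A quick illustration of those deque operations:

from collections import deque

d = deque([2, 3])
d.appendleft(1)   # O(1), where list.insert(0, 1) would be O(n)
d.append(4)       # O(1) on either end
d.popleft()       # -> 1, also O(1)
d.pop()           # -> 4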
Some library notes
pytest notes
Pytest wants to find tests itself - an inversion-of-control thing:
no matter how complex tests get, pytest can do everything for us without us having to hook it in specifically.
What files does pytest pick up to be tested?
- the filenames specified
- if none specified:
- filenames named like test_*.py or *_test.py on the directory tree under the current dir(verify)
- You can control this discovery, see e.g. https://docs.pytest.org/en/6.2.x/example/pythoncollection.html
How does pytest decide what code to run as tests?
- functions prefixed test at module scope (verify)
- classes prefixed Test, and then functions prefixed test inside them
- ...but only if that class does not have a constructor (__init__)
- these classes are not intended to carry state - pytest creates a fresh instance for each test(verify)
- they exist just to collect functions (and potentially pollute a namespace less)
- classes subclassed from unittest.TestCase (see unittest)
- by marker, e.g. pytest -m slow picks up things decorated with @pytest.mark.slow
- useful to define groups of tests, and run specific subsets
What does pytest actually consider success/failure?
Roughly: each test function will be a
- success:
- if it returns, AND
- if all asserts contained (if any) are successful
- failure: on the first failing assert
- failure: on the first exception
There are "assert for me" functions, including:
- unittest.assertEqual(a, b)
- unittest.assertNotEqual(a, b)
- unittest.assertTrue(x)
- unittest.assertFalse(x)
- unittest.assertIs(a, b)
- unittest.assertIsNot(a, b)
- unittest.assertIsNone(x)
- unittest.assertIsNotNone(x)
- unittest.assertIn(a, b)
- unittest.assertNotIn(a, b)
- unittest.assertIsInstance(a, b)
- unittest.assertNotIsInstance(a, b)
...but under pytest, many of those are shorter to write as a plain assert
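For contrast, the same check in both styles - pytest rewrites the plain assert so that it still reports the compared values on failure:

import unittest

def test_sum():
    assert sum([1, 2, 3]) == 6           # pytest style

class TestSum(unittest.TestCase):
    def test_sum(self):
        self.assertEqual(sum([1, 2, 3]), 6)   # unittest style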
How do I test that something throws an exception (or does a warning)?
There are a few alternatives, also varying a little with whether you're testing that it should raise an error or that it doesn't.
The context manager form seems a brief-and-more-flexible way to filter for a specific error type and specific error text:
import pytest

with pytest.raises(ValueError, match=r'.*found after.*'):
    raise ValueError('junk found after number')   # stand-in for code that whines about some value parsing
You could also e.g. catch the specific error as you normally would.
And if you need your test to fail in response, use pytest.fail:
try:
    0/0
except ZeroDivisionError as exc:
    pytest.fail()
For warnings, pytest.warns can be used as a context manager that works much the same,
- and the context object variant is probably easiest here
- note: warnings.warn() by default emits a UserWarning - see https://docs.python.org/3/library/warnings.html#warning-categories
import warnings

with pytest.warns(UserWarning, match=r'.*deprecated.*'):
    warnings.warn('this function is deprecated')   # stand-in for the code under test
Showing details
Pytest will try to give useful errors for failed tests, e.g. picking up the values that didn't compare as you wanted:
- comparing long strings: a context diff is shown
- comparing long sequences: first failing indices
- comparing dicts: different entries
This can be customized, which is sometimes worth it to get more useful output from pytest runs.
On fixtures/mocking
Fixtures create reusable state/helpers for tests, and are great if you use the same data/objects in multiple tests.
In pytest, they are functions that are called before your function.
pytest has a few different things you could call fixtures.
Some given fixtures
If a test function takes a parameter with one of a few specific names, you get some extra behaviour when pytest runs this test. Consider:
import os

def test_download_to_file(tmp_path):
    tofile_path = tmp_path / "testfile"   # this syntax works because tmp_path is a pathlib.Path object
    download('https://www.example.com', tofile_path=tofile_path)
    assert os.path.exists(tofile_path)
tmp_path means "we create a directory for you, hand it in for you to use, and clean it up afterwards", which is a great helper you would otherwise have to write yourself (and have to test in itself).
For some other given fixtures, see e.g.
- https://docs.pytest.org/en/6.2.x/fixture.html
- https://levelup.gitconnected.com/a-comprehensive-guide-to-pytest-3676f05df5a0
There is also the @pytest.fixture decorator, which marks a function as a fixture. To steal an example from [7], consider:
import pytest

@pytest.fixture
def hello():
    return 'hello'

@pytest.fixture
def world():
    return 'world'

def test_hello(hello, world):
    assert "hello world" == hello + ' ' + world
To keep this first example short, it only remembers some values for us and hands them into functions, which is fairly pointless.
In the real world this is probably mostly useful for setup and teardown.
Consider an example from [8]
@pytest.fixture
def app_without_notes():
    app = NotesApp()
    return app

@pytest.fixture
def app_with_notes():
    app = NotesApp()
    app.add_note("Test note 1")
    app.add_note("Test note 2")
    return app
...which comes from a basic "soooo I spend the first lines of every test just instantiating my application, can't I move that out?"
Teardown seems to be done by using a generator (this is a little creative syntax-wise, but lets pytest do most of the work for you):
@pytest.fixture
def app_with_notes(app):
    app.add_note("Test note 1")
    app.add_note("Test note 2")
    yield app              # state handed to the test
    app.notes_list = []    # clean up the test's data
See also https://docs.pytest.org/en/7.1.x/how-to/fixtures.html
On coverage
To do coverage checking at all, add
--cov=dirname/
This mostly gets you a summary.
To see which lines aren't covered, read https://pytest-cov.readthedocs.io/en/latest/reporting.html
If you want it to generate a browsable set of HTML pages, try:
--cov-report html:coverage-report
https://docs.pytest.org/en/latest/how-to/usage.html
On parallel tests
Tests take a while, so it would be nice if you could run them in parallel - they should be isolated things anyway, right?
It's not a standard feature, presumably so that you don't blame pytest
for bad decisions around threading and nondeterminism, whether your own or those in a library you use (consider e.g. that selenium isn't thread-safe).
That said there is:
- pytest-xdist
- pytest-parallel
typer
Rich
3D
PyGame
Win32 interface
pywin32, previously known as win32all, provides hooks into various parts of windows. Apparently with central module win32api. (see also its help file)
Downloadable from sourceforge (the project also has its own homepage).
Some code for this:
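A minimal sketch (assuming pywin32 is installed) - win32api offers, among much else, simple system queries:

import win32api

print(win32api.GetComputerName())     # this machine's NetBIOS name
print(win32api.GetSystemDirectory())  # e.g. C:\Windows\system32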
GPGPU
Reikna
Clyther
- Write C-ish code (python subset; similar to cython)
- further removed from OpenCL, but more pythonic so ought to make for easier prototyping
- http://gpgpu.org/2010/03/09/clyther-python-opencl
PyOpenCL
- Lowish-level bindings for existing OpenCL
- useful if you already know OpenCL and may want to port stuff later
- http://mathema.tician.de/software/pyopencl/
PyStream
- Tries to tie in CUBLAS, CUFFT, seamless transfer to numpy
- https://code.google.com/p/pystream/
cudamat
- "basic dense linear algebra computations on the GPU using CUDA"
- https://github.com/cudamat/cudamat
scikits.cuda
gnumpy
Theano
Unsorted
- http://gpgpu.org/tag/python
- http://code.activestate.com/pypm/search:gpu/?tab=name
- http://fastml.com/running-things-on-a-gpu/
Creating ZIP files (in-memory)
We can use in-memory data and file-like objects (io.BytesIO; StringIO in py2) here - so this is sometimes simpler than doing the same with tarfile.
You can add content with either:
- ZipFile.write - takes a filename it will open - useful when you just want more control over which files to add
- ZipFile.writestr - takes a filename/ZipInfo and a bytestring - the filesystem isn't touched, which is probably what you want when adding in-memory data to in-memory zip files
# Just an example snippet
import io, zipfile

def make_zip_bytes():
    zip_buf = io.BytesIO()   # ZIP data is binary, so BytesIO (older py2 code used StringIO)
    z = zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED)   # or another compression method
    for filename, filedata in (('filename1.txt', 'foo'),
                               ('filename2.txt', 'bar')):
        z.writestr(filename, filedata)
    z.close()
    return zip_buf.getvalue()
FFTs
There are a few implementations/modules, including:
- fftpack: used by numpy.fft, scipy.fftpack
- FFTW3:
- anfft - not updated anymore
- PyFFTW3
- PyFFTW
- Apparently PyFFTW is a little newer but performs the same as PyFFTW3 (see a comparison)
Speed-wise: FFTW is faster, numpy is slower, scipy is slower yet (not sure why the np/sp difference when they use the same code)
Think a factor 2 or 3 (potentially), though small cases can be drowned in overhead anyway.
Also, FFTW's plans and wisdom matter (see the Optimized number crunching notes).
It does matter how the coupling works - there are, for example, more and less direct (overhead-wise) ways of using FFTW.
TODO: figure out threading, MPI stuff.
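For reference, the most basic use of the numpy interface looks like:

import numpy as np

t = np.linspace(0, 1, 256, endpoint=False)
sig = np.sin(2 * np.pi * 5 * t)        # a 5 Hz sine, sampled 256 times over one second
spectrum = np.fft.rfft(sig)            # FFT variant for real-valued input
print(np.argmax(np.abs(spectrum)))     # 5 - the bin matching the frequency we put in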
Other extensions
- .py - source text
- .pyc - compiled bytecode
- .pyo - compiled bytecode, optimized. Written when python is used with -O. The difference with pyc is currently usually negligible.
- .pyd - a (windows) dll with some added conventions for importing
- (and path-and-import wise, it acts exactly like the above, not as a linked library)
- note: native code rather than bytecode
- (.pyi files are unrelated)
All of the above are searched for by python itself.
Python will generate pyc or pyo files
- when modules are imported (not when they are run directly)
- ...with some exceptions, e.g. when importing from an egg or zip file it will not alter those
- ...which means it can, for speed reasons, be preferable to distribute such archives with pyc/pyo files already in them
These have since changed: PEP 488 (Python 3.5) removed .pyo files, folding the optimization level into the .pyc filename instead.
pyi files
These are called stub files, and are part of type-checker tooling.
These should only be read for their function signatures. They are syntactically valid python, but should be considered metadata; expect no runtime behaviour from them.
IDEs have been known to store what they managed to infer in these files. (I looked them up because I wanted to know why vscode often doesn't show me the actual code when I ask for a definition (F12 / Ctrl-click). This seems to have been broken / half-fixed, in ways only some people understand, for a few years now.)
They are mentioned in the type hints PEP, PEP 484 (https://peps.python.org/pep-0484/).
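For a sense of what they contain - a hypothetical example.pyi describing an example.py holds signatures only, with bodies elided:

# example.pyi (hypothetical) - just signatures, no runtime behaviour
def fetch(url: str, timeout: float = ...) -> bytes: ...

class Cache:
    def get(self, key: str) -> bytes | None: ...
    def put(self, key: str, value: bytes) -> None: ...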
Documentation
docstrings
documentation generators
https://wiki.python.org/moin/DocumentationTools
docstring formats
Not standardized. Options include:
- restructuredtext, a.k.a. reST
- now more common
- see also PEP 287
- earlier?
- numpy[10]
- google[11]
http://daouzli.com/blog/docstring.html
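To give a feel for one of these, a google-style docstring looks like:

def scale(values, factor):
    """Multiplies each value by a factor.

    Args:
        values: an iterable of numbers.
        factor: what to multiply each by.

    Returns:
        A list of scaled numbers.
    """
    return [v * factor for v in values]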
Further notes
!=None versus is not None
tl;dr:
- 99% of the time it makes no difference
- ...but in some cases it does. If you want a singular good habit that covers those cases too, it's is
Why?
PEP8 says that comparisons to singletons should be done with is resp. is not, not with equality (==).
Part of the reason is roughly that a class is free to implement the equality operator (==) any way it wants,
whereas is is defined by the language and will always do what you think.
Another is that you may not have perfect overview of what coercion does
- consider that 1 == True but 1 is not True.
Whenever a class doesn't redefine the equality operator, then == is often just as good.
Even when it does redefine it, it is still not necessarily a problem, due to what people usually use the equality comparison operator for.
Still, in general practice, is is cleaner.
One real-world example is numpy.
Where you write
 if ary == None:
what you probably wanted to check is whether the variable has an array (or any object) assigned at all.
But numpy redefines the == comparison so that you can e.g. compare an array to an array, and an array to a scalar, evaluated element-wise -- and get an ndarray of bools, rather than a single python bool:
>>> numpy.array( [(1, 2), (2, 1)] ) == 2
array([[False, True],
[ True, False]])
And ary == None would almost always give an all-False array (...because numpy won't even be able to store None unless you use dtype=object, which is often specifically avoided in number crunching when you can).
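It gets worse: using that elementwise result in an if raises, because it has no single truth value - while the identity check does what you meant:

import numpy as np

ary = np.array([1, 2, 3])

if ary is not None:   # identity check - a single bool, what you meant
    print('have an array')

try:
    if ary == None:   # elementwise - gives array([False, False, False])...
        pass
except ValueError as e:
    print(e)          # ...whose truth value is ambiguous, so the if itself raises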
global
(We are skipping the "never use globals" topic here)
tl;dr:
When you want to assign to a name, and you want that name to be a global,
you need to say so explicitly
...basically to make the distinction between "I want to create a new local variable with that name" or "I want you to look for a global".
So python insists you make yourself clear.
Until told otherwise, variable writes go to local scope.
So if you want to write to global scope, you have to say so:
global nameatglobalscope
As this FAQ notes,
this can be a little surprising, because you can always read variables from global scope (and other outer scopes)(verify) without asking for it. Or the fact that you can is surprising in itself.
One argument is that yes, maybe reaching into other scopes would be clearer when always explicit, but then you would either be doing that a load, or would be treating variables differently from imported modules and their functions, built-ins, and more - those are all effectively global too.
Instead, it may have more function to have to say 'global' only when you are doing something unusual -- so that it has specific debugging value.
You can argue that even if you only read, you can still use global to indicate you are accessing globals. Not because it does anything, but because it communicates to others you are mixing scopes. (eager linters consider this meh, though)
At a more technical level, global has two possible effects: (verify)
- if the variable did not exist in global scope yet, assignment will now create it there (rather than creating a local)
- if the variable already existed in global scope, reads and writes in this scope now refer to that one
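For example:

counter = 0

def increment():
    global counter   # without this line, counter += 1 would raise UnboundLocalError
    counter += 1

increment()
increment()
print(counter)   # 2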
nonlocal
If you understand the need for global, you may have also run into cases where that is crude, in the sense that you e.g. might just want a local function to reach the outer function's state, and didn't want to do that via an actual global.
Python 3 introduced nonlocal, which binds to the nearest containing scope that has the name defined.
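For example:

def make_counter():
    count = 0
    def increment():
        nonlocal count   # binds to make_counter's count - not a global, not a new local
        count += 1
        return count
    return increment

counter = make_counter()
counter()
print(counter())   # 2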
https://docs.python.org/3/reference/simple_stmts.html#the-nonlocal-statement
https://stackoverflow.com/questions/5218895/python-nested-functions-variable-scoping