Python notes - semi-sorted
functools notes
functools.partial
partial on positional arguments
Partial takes a function and one or more argument values, and creates a new callable that 'freezes' those arguments:
- the existing function is always called with the supplied values,
- and the new function will have fewer positional arguments
For example:
from functools import partial
# if you are given
def power(b, e):
    return b ** e

# then you can declare
two_to_the = partial(power, 2)   # freezes the first positional argument, b=2

# which is a new function
two_to_the(4) == 16   # equivalent to power(2, 4)

(Note that partial can only freeze leading positional arguments this way -- to freeze the exponent instead, you would use the keyword form described below.)
"Can't I just write that as functions, or lambdas?"
Absolutely. Consider:
two_to_the = lambda e: power(2, e)
# and
def two_to_the(e):
    return power(2, e)
"Then why?"
It's sometimes cleaner.
For example, if the function has further arguments, you don't have to mention them at all -- they are still there, and you've made a specific-purpose function in a brief piece of code.
You also don't have to list all the further arguments, or worry about the difference between positional and keyword arguments, which you would have to think about more when writing the same thing as a lambda or a fully written-out function.
Note that it has the effect of removing initial positional arguments
- ...so any positional arguments left over still need values.
- and this is sometimes the point (for similar reasons to why lambdas are sometimes used to change the signature of a callable you hand in)
def f(a, b, c, d):
    print(a, b, c, d)

g = partial(f, 1, 2)   # freezes a=1, b=2
g(3, 4)                # prints 1 2 3 4
partial on keyword arguments
Partial was arguably made for positional arguments.
It works on keyword arguments too, but does something different.
In particular, freezing a keyword argument effectively just changes the default on that named parameter: it alters the value that gets used when the caller doesn't specify that argument -- and the argument is not removed, so a caller can always still override it.
This is sometimes great, usually if your purpose is "I wanted to make a more specialized variant of a function"
Consider for example:
parse_binary = partial(int, base=2)
# which you can then use like
parse_binary('10010') == 18 # because that is now equivalent to int('10010', base=2)
Pretty clean in that example use, right?
(And yet it also it still allows parse_binary('10010', base=4), which is mostly just confusing)
To dive a little deeper on the how and why, consider:
def f(a, b=2, c=3, d=4, e=5):
    print( a, b, c, d, e )

g_partial = partial(f, c=8)

# inspect.signature(f) gives <Signature (a, b=2, c=3, d=4, e=5)>
# inspect.signature(g_partial) gives <Signature (a, b=2, *, c=8, d=4, e=5)>
Note that c is still in there.
As to the different-default part, you can think of partial as doing something like:
def g(*args, **kwargs):
    kwargs.setdefault("c", 8)
    return f(*args, **kwargs)
To continue the 'not removed' part: partial couldn't remove the parameter without messing up any call that passes it in by position.
- the new signature adds *, meaning everything from the first parameter you mentioned to partial onwards is keyword-only
- this doesn't prevent you from calling g_partial with many positional arguments, it just breaks if you try:
    g_partial(6, 7, 8)
    # TypeError: f() got multiple values for argument 'c'
- This is arguably good, in that it forces you to explicitly use keywords, which avoids ambiguity.
"Can I have partial that removes keywords?"
No.
If you want that, you have to do it yourself again:
def g_func(a, b=2, *, d=4, e=5):   # the * is not required, it may just be a good idea
    return f(a, b, c=8, d=d, e=e)

# if you didn't know, lambdas can have keywords and defaults as usual, so you ''can'' do the same:
g_lambda = lambda a, b=2, *, d=4, e=5: f(a, b, c=8, d=d, e=e)
"What do people use it for?"
Example: passing things through without kwargs
Example: composing functions
Example: Set-and-remove arguments for interface reasons
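The 'pass things through' and 'specialize an interface' uses can be sketched like this (the function names here are made up for illustration):

```python
from functools import partial

# partial can preset arguments of a general function so the result
# fits an interface that expects a one-argument callable
def scale(factor, value):
    return factor * value

double = partial(scale, 2)            # freeze factor=2
print(list(map(double, [1, 2, 3])))   # [2, 4, 6]

# presetting a keyword argument, 'set-and-forget' style
def report(message, *, prefix=""):
    return prefix + message

warn = partial(report, prefix="WARNING: ")
print(warn("disk almost full"))       # WARNING: disk almost full
```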
functools.cache
functools.cache is a decorator that adds memoization at the level of the function and its parameters: it caches return values for unique argument values.
It goes without saying this is only a good idea for deterministic functions without side effects.
The classic example might be the fibonacci sequence when written in the readable-but-otherwise-dumb recursive style:
import functools

@functools.cache
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)   # recursion
Without that decorator adding memoization, most time is spent in explosively many redundant calls, which starts to get very slow around n=35..40.
With that cache of everything done before, you still run into Python's maximum recursion depth around n=1500, but the calculation itself takes negligible time.
It would not be hard to write that cache yourself (and for more complex cases you may have to),
but the above is equivalent, and reads cleaner.
It will cache all distinct calls, which for some functions might grow a lot. If that is a problem, look to functools.lru_cache
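The bounded variant looks almost identical; a minimal sketch:

```python
import functools

# lru_cache evicts least-recently-used entries once maxsize is
# reached, so memory use stays capped
@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(30))            # 832040
print(fibonacci.cache_info())   # hits/misses/maxsize/currsize statistics
```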
https://docs.python.org/3/library/functools.html#functools.cache
Setting the process name
It's OS-specific and not something Python itself covers, so there is no short portable way.
The easiest method is to install and use a module that has figured this out for you, e.g. setproctitle. Example use:

import os, sys

try:
    import setproctitle
    setproctitle.setproctitle( os.path.basename(sys.argv[0]) )
except ImportError:
    pass

The above often means 'use the filesystem name that was used to run this' - but not always, so a hardcoded string can make sense.
Temporary files
Python's standard library module tempfile has a number of choices:
- mkstemp[1]
- creates a temporary file
- returns (open_file_descriptor, abspath)
- the open file handle is fairly secure in the sense that others cannot easily get that handle (verify)
- but if you only wanted a unique file *name* you wouldn't care about that
- you can ask it to add a prefix and/or suffix (both default to none)
- you can ask for it to be placed in a specific directory (defaults to system default)
- you are responsible for cleanup
- (if you want that to be automatic, look to TemporaryFile)
- if the OS implements O_EXCL, that file is accessible only for the creating user (plus admins(verify))
- mkdtemp[2]
- creates a temporary directory
- with similar 'only accessible for creating user'
- returns the abspath
- you are responsible for cleanup
More help, and context managers
- TemporaryFile[3]
- acts much like mkstemp
- ...but returns a file-like object (rather than a file descriptor)
- that is destroyed on close / gc
- depending on OS/filesystem, the directory entry is either not created at all, or removed immediately after creation
- helps security a little (but is not a hard guarantee)
- implies the filesystem will clean up the backing data only after you close (what behind the scenes is still) the file handle
- NamedTemporaryFile[4]
- like TemporaryFile but does specifically have a directory entry
- its filename is in .name
- SpooledTemporaryFile[5]
- like TemporaryFile, but makes an attempt to buffer contents in memory a bit longer
- TemporaryDirectory[6]
- acts much like mkdtemp
- usable as a context manager; entering it gives you the directory's path
- removes contents and directory on cleanup (fundamentally best-effort - there are ways to break that)
These are all usable as context managers, so you can write:
with tempfile.NamedTemporaryFile() as tmp:
    print(tmp.name)
    tmp.write(...)
instead of
tmp = tempfile.NamedTemporaryFile()
print(tmp.name)
tmp.write(...)
tmp.close()
Notes:
- since more is handled for you, there are some more hidden edge cases, and hidden platform-specific details
- gettempdir() - finds the directory we put temporary files in (basically what is used if you supply dir=None to the above)
Temporary files are created in tempfile.gettempdir() unless you
- hand in dir= to every call
- set tempfile.tempdir to a default directory you want the module to use (has some details, so not necessarily recommended)
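As a quick runnable illustration of the context-manager behaviour (the prefix here is arbitrary):

```python
import os, tempfile

# TemporaryDirectory removes the directory and everything in it
# when the block exits
with tempfile.TemporaryDirectory(prefix="demo-") as tmpdir:
    path = os.path.join(tmpdir, "scratch.txt")
    with open(path, "w") as f:
        f.write("hello")
    print(os.path.exists(path))   # True while inside the block

print(os.path.exists(tmpdir))     # False: directory and contents are gone
```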
array, deque
A list can be used as a sort of deque, like:
    append(val)     # insert on right side
    pop()           # take from right side
    insert(0, val)  # insert on left side
    pop(0)          # take from left side
However, list is primarily efficient for stack-like use - list.pop(0) and list.insert(0, val) are O(n) operations,
...while those are O(1) operations on collections.deque (added in 2.4). deque also has appendleft(), extendleft(), and popleft(), and some others that can be convenient and/or more readable.
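A quick sketch of the deque operations just mentioned:

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)     # O(1), unlike list.insert(0, ...)
d.append(4)         # O(1), same as on a list
print(d.popleft())  # 1
print(d.pop())      # 4
print(list(d))      # [2, 3]
```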
You may also wish to know about the queue module, a multi-producer, multi-consumer, thread-safe queue.
See also:
Some library notes
typer
Rich
3D
PyGame
Win32 interface
pywin32, previously known as win32all, provides hooks into various parts of windows. Apparently with central module win32api. (see also its help file)
downloadable from sourceforge.
GPGPU
Reikna
Clyther
- Write C-ish code (python subset; similar to cython)
- further removed from OpenCL, but more pythonic so ought to make for easier prototyping
- http://gpgpu.org/2010/03/09/clyther-python-opencl
PyOpenCL
- Lowish-level bindings for existing OpenCL
- useful if you already know OpenCL and may want to port stuff later
- http://mathema.tician.de/software/pyopencl/
PyStream
- Tries to tie in CUBLAS, CUFFT, seamless transfer to numpy
- https://code.google.com/p/pystream/
cudamat
- "basic dense linear algebra computations on the GPU using CUDA"
- https://github.com/cudamat/cudamat
scikits.cuda
gnumpy
Theano
Unsorted
- http://gpgpu.org/tag/python
- http://code.activestate.com/pypm/search:gpu/?tab=name
- http://fastml.com/running-things-on-a-gpu/
zipfile notes
Notes on the standard-library zipfile module.
Creating ZIP files (in-memory)
We can take in-memory data and write it to a BytesIO object, so we never need to touch the filesystem.
For context, you can add content with either:
- ZipFile.write - takes a filename; it will open and read that file itself.
- ZipFile.writestr - takes a filename (or ZipInfo) and a bytestring
- the filesystem isn't touched, which is probably what you want when adding in-memory data to in-memory zip files
# Just an example snippet
import io, zipfile

zip_bio = io.BytesIO()
z = zipfile.ZipFile(zip_bio, "w", zipfile.ZIP_DEFLATED)   # default compression level, you can override on each entry
for filename, filedata in (('filename1.txt', b'foo'),
                           ('filename2.txt', b'bar')):
    z.writestr(filename, filedata)
z.close()

zip_bio.getvalue()   # zip file contents as bytes
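Reading such an in-memory zip back is symmetric; a small self-contained round-trip sketch:

```python
import io, zipfile

# write and read back an in-memory zip, never touching disk
bio = io.BytesIO()
with zipfile.ZipFile(bio, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("filename1.txt", b"foo")

zf = zipfile.ZipFile(io.BytesIO(bio.getvalue()))
print(zf.namelist())              # ['filename1.txt']
print(zf.read("filename1.txt"))   # b'foo'
```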
FFTs
There are a few implementations/modules, including:
- fftpack: used by numpy.fft, scipy.fftpack
- FFTW3:
- anfft - not updated anymore
- PyFFTW3
- PyFFTW
- Apparently PyFFTW is a little newer but performs the same as PyFFTW3 (see a comparison)
Speed-wise: FFTW is faster, numpy is slower, scipy is slower yet (not sure why the np/sp difference when they use the same code)
- Think a factor 2 or 3 (potentially), though small cases can be drowned in overhead anyway.
Also, Optimized_number_crunching#On_plans_and_wisdom matters.
It does matter how the coupling works - there are, for example, more and less direct (overhead-wise) ways of using FFTW.
TODO: figure out threading, MPI stuff.
See also:
Other extensions
- .py - source text
- .pyc - compiled bytecode
- .pyo - compiled bytecode, optimized. Written when python is used with -O. The difference with pyc is currently usually negligible.
- .pyd - a (windows) dll with some added conventions for importing
- (and path-and-import wise, it acts exactly like the above, not as a linked library)
- note: native code rather than bytecode
- (.pyi files are unrelated)
All of the above are searched for by python itself.
Python will generate pyc or pyo files
- when modules are imported (not when they are run directly)
- ...with some exceptions, e.g. when importing from an egg or zip file it will not alter those
- ...which means it can, for speed reasons, be preferable to distribute such eggs/zips with pyc/pyo files already in them
There is some talk about changing these, see e.g. PEP 488
PYTHONDONTWRITEBYTECODE (or parameter -B) asks python not to create pyc and such, which is potentially useful...
- ...around docker, to reduce image size, and let containers be read-only
- ...when you know the pyc files are unlikely to match (e.g. distribution to a different python version)
- or when the same directory is run by different python versions, which would overwrite the pyc files all the time (would not apply to dist-packages)
- ...when you edit the code continuously, because there are cases where py and pyc race to mismatch
- except this is not common, and py3 seems to be much better about this anyway
See also:
pyi files
These are called stub files, and are part of type-checker tooling.
These should only be read for their function signatures. They are syntactically valid Python, but they should be considered metadata, and should have no runtime behaviour.
IDEs have been known to store what they managed to infer in these files. (I looked them up because I wanted to know why vscode often doesn't show me the actual code when I ask for a definition (F12 / Ctrl-click). This seems to have been broken / half-fixed, in ways only some people understand, for a few years now.)
They are mentioned in the type hints PEP (https://peps.python.org/pep-0484/ PEP-484).
Documentation
docstrings
documentation generators
https://wiki.python.org/moin/DocumentationTools
docstring formats
Not standardized. Options include:
- restructuredtext, a.k.a. reST
- now more common
- see also PEP 287
- earlier?
- numpy[8]
- google[9]
http://daouzli.com/blog/docstring.html
Further notes
!=None versus is not None
tl;dr:
- 99% of the time it makes no difference
- ...but in some cases it does. If you want a singular good habit that covers those cases too, it's is
Why?
PEP8 says that comparisons to singletons should be done with is resp. is not, not with equality (==).
Part of the reason is roughly that a class is free to implement the equality operator (==) any way it wants,
whereas is is defined by the language and will always do what you think.
Another is that you may not have perfect overview of what coercion does
- consider that 1 == True but 1 is not True.
Whenever a class doesn't redefine the equality operator, then == is often just as good.
Even when it does redefine it, it is still not necessarily a problem, due to what people usually use the equality comparison operator for.
Still, in general practice, is is cleaner.
One real-world example is numpy
Where you write
    if ary == None:
you probably wanted to check whether the variable has an array (or any object) assigned at all.
But numpy redefines the == comparison so that you can e.g. compare an array to an array, and an array to a scalar, evaluated element-wise -- and get an ndarray of bools, rather than a single python bool:
>>> numpy.array( [(1, 2), (2, 1)] ) == 2
array([[False,  True],
       [ True, False]])
And ary==None would almost always give an all-False matrix (...because numpy won't even be able to store None unless you use dtype=object, which is often specifically avoided in number crunching if you can).
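The same kind of surprise is easy to reproduce without numpy. Here is a contrived class (made up purely for illustration) whose elementwise __eq__ makes == None misleading:

```python
class Vec:
    """Toy container with numpy-style elementwise equality."""
    def __init__(self, *items):
        self.items = items
    def __eq__(self, other):
        # returns a list of bools, not a single bool
        return [x == other for x in self.items]

v = Vec(1, 2, 3)
print(v == None)   # [False, False, False] -- a non-empty list, so truthy!
print(v is None)   # False -- the check you actually wanted
```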
global
(We are skipping the "never use globals" topic here)
tl;dr:
- you can always read from global variables
- until told otherwise, variable writes go to local scope.
- ..so when you want to assign to a global variable, you need to say explicitly that you do
- basically just to make it unambiguous that you do not want to create a variable in local scope by that same name.
Other languages may not allow such access at all, or may not split reading and writing like this,
so as this FAQ notes,
it can be a little surprising that you can always read variables from global scope (and other outer scopes)(verify) without asking for it.
Some arguments for this seem to be that
- when you alter other scopes, that should be explicit
- but to do that for everything might be cluttered -- remember that imported modules and their functions, built-ins, and more are also all effectively global.
- in a sense, you have to say 'global' only when you are doing something slightly less usual -- so that it has specific debugging value to see that written out.
- it might be your own choice to use global even if you only read, but want to signal to other coders that you are using globals. (Not because it does anything, but because it communicates to others you are mixing scopes. eager linters consider this meh, though)
At a more technical level, global has two possible effects: (verify)
- if the variable did not exist in global scope, it now declares it there, (and binds it in your more local scope as it always would)
- if the variable already existed in global scope, it just binds it in your more local scope
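A minimal sketch of the read-versus-write rule described above:

```python
counter = 0

def increment():
    global counter   # without this line, 'counter += 1' would raise UnboundLocalError
    counter += 1

increment()
increment()
print(counter)   # 2
```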
https://docs.python.org/3/reference/simple_stmts.html#the-global-statement
nonlocal
If you understand the need for global, you may have also run into cases where that is crude, in the sense that you e.g. might just want a local function to reach the outer function's state, and didn't want to do that via an actual global.
Python 3 introduced nonlocal, which searches outwards,
and binds to the nearest containing scope that has the mentioned name defined.
Note that it does not consider global scope (so effectively it considers only function scopes).
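A minimal sketch of the common use, a counter closure:

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count   # bind to make_counter's 'count', not a new local
        count += 1
        return count
    return increment

c = make_counter()
print(c(), c(), c())   # 1 2 3
```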
Note that sometimes, nonlocal interacts weirdly with decorators,
in the sense that it only cares what eventually got defined.
So nonlocal happily finds functions that a decorator defined, which won't be the one you wrote down. That is often correct, but still potentially confusing, e.g. in stack traces, and to some tools.
https://docs.python.org/3/reference/simple_stmts.html#the-nonlocal-statement
https://stackoverflow.com/questions/5218895/python-nested-functions-variable-scoping