Python notes - semi-sorted




Setting the process name

It's OS-specific stuff, so there is no short portable way.

The easiest method is to install/use the setproctitle module - it aims to be portable and tries its best on various platforms. Example use:

import os
import sys

try:
    import setproctitle
    setproctitle.setproctitle( os.path.basename(sys.argv[0]) )
except ImportError:
    pass

The above often means 'use the filesystem name that was used to run this' - but not always, so a hardcoded string can make sense.


Temporary files

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)
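In the meantime, a minimal sketch of the standard library's tempfile module, which is probably the first thing to reach for here (the names and data below are just illustrative):

import tempfile

# a temporary file, removed when closed (pass delete=False to keep it around)
with tempfile.NamedTemporaryFile(suffix='.txt') as f:
    f.write(b'scratch data')
    print( f.name )            # the path it was given, e.g. somewhere under /tmp

# a temporary directory, removed together with its contents when the block ends
with tempfile.TemporaryDirectory() as dirname:
    print( dirname )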

array, deque

A list can be used as a sort of deque, like:

lst.append(val)      # insert on right side
lst.pop()            # take from right side
lst.insert(0, val)   # insert on left side
lst.pop(0)           # take from left side

However, list is primarily efficient for stack-like use: list.pop(0) and list.insert(0, val) are O(n) operations, while those are O(1) operations on collections.deque (added in 2.4). deque also has appendleft(), extendleft(), and popleft(), and some others that can be convenient and/or more readable.
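For example, using a deque as a FIFO queue (a minimal sketch):

from collections import deque

d = deque()
d.append('a')          # add on the right
d.append('b')
print( d.popleft() )   # take from the left: 'a'  (O(1), unlike list.pop(0))
d.appendleft('z')      # add on the left
print( d )             # deque(['z', 'b'])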

You may also wish to know about the queue module, a multi-producer, multi-consumer, thread-safe queue.

See also:

Some library notes

pytest notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)


Pytest wants to find tests itself - this is an inversion-of-control thing, the point being that no matter how complex tests get, pytest can do everything for us without us having to hook it in specifically.


What files does pytest pick up to be tested?

  • the filenames specified
  • if none specified: filenames named like test_*.py or *_test.py in the directory tree under the current dir(verify)
You can control this discovery, see e.g. https://docs.pytest.org/en/6.2.x/example/pythoncollection.html


How does pytest decide what to run as tests?

  • functions prefixed test at module scope (verify)
  • classes prefixed Test, and then functions prefixed test inside them
...but only if that class does not have an __init__.
These classes are not intended to be classes with state (and pytest does not actually instantiate the class(verify)), just to group test functions and pollute the namespace less.
  • classes subclassed from unittest.TestCase (see unittest)
  • by marker, e.g. pytest -m slow picks up things decorated with @pytest.mark.slow - useful to define groups of tests
(see the sketch just below for what each of these looks like)
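A minimal sketch of a file that this discovery would pick up (the file name and test contents here are made up):

# test_example.py  - matches the test_*.py pattern
import pytest

def test_addition():                  # module-level function prefixed 'test'
    assert 1 + 1 == 2

class TestStrings:                    # class prefixed 'Test', no __init__
    def test_upper(self):
        assert 'abc'.upper() == 'ABC'

@pytest.mark.slow                     # selected by e.g.  pytest -m slow
def test_big_computation():           # (registering the marker in pytest.ini avoids a warning)
    assert sum(range(10**6)) == 499999500000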


What does pytest actually consider success/failure?

Roughly: each test function will be a

  • success: if it returns, AND all asserts it contains (if any) are successful
  • failure: fails on the first failing assert
  • failure: fails on the first exception


Expressing further things

  • testing that something throws an exception has a few alternatives, but using a context manager is probably the shortest form:
with pytest.raises(ValueError, match=r'.*BUG.*'):
    raise ValueError("BUG: tell programmer")
Another is catching it yourself and invoking pytest.fail() when it does not raise:
try:
    0/0
except ZeroDivisionError:
    pass
else:
    pytest.fail("expected a ZeroDivisionError")


  • similarly, there is pytest.warns, and the context-manager variant is probably preferred here too.
Note that warnings.warn() by default emits a UserWarning - see https://docs.python.org/3/library/warnings.html#warning-categories
with pytest.warns(UserWarning, match=r'.*deprecated.*'):
    warnings.warn("this function is deprecated")

More detailed testing can be done with

  • anything else that does asserts, like unittest's TestCase methods (called on self within a TestCase subclass):
self.assertEqual(a, b)
self.assertNotEqual(a, b)
self.assertTrue(x)
self.assertFalse(x)
self.assertIs(a, b)
self.assertIsNot(a, b)
self.assertIsNone(x)
self.assertIsNotNone(x)
self.assertIn(a, b)
self.assertNotIn(a, b)
self.assertIsInstance(a, b)
self.assertNotIsInstance(a, b)
...but most of those are shorter if you write the assert yourself.



Showing details

Pytest will try to pick up more to report for failed tests, e.g.:

  • comparing long strings: a context diff is shown
  • comparing long sequences: first failing indices
  • comparing dicts: different entries

There is more customization you can do - sometimes nice to get more actionable output from pytest runs.


On fixtures/mocking

Fixtures create reusable state/helpers for tests, and are great when several tests need the same data/objects.

For terminology, see Benchmarking,_performance_testing,_load_testing,_stress_testing,_etc.#Mocking.2C_monkey_patching.2C_fixtures

For what pytest helps with, see e.g. https://levelup.gitconnected.com/a-comprehensive-guide-to-pytest-3676f05df5a0
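A minimal sketch of a pytest fixture (the names here are made up): a test asks for the fixture by naming it as a parameter, and pytest calls the fixture function to supply it.

import pytest

@pytest.fixture
def sample_rows():
    # runs for each test that asks for it; it could also yield and clean up afterwards
    return [{'id': 1, 'name': 'a'},
            {'id': 2, 'name': 'b'}]

def test_row_count(sample_rows):
    assert len(sample_rows) == 2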



On coverage


To do coverage checking at all, add

--cov=dirname/

This mostly gets you a summary.


To see which lines aren't covered, read https://pytest-cov.readthedocs.io/en/latest/reporting.html

If you want it to generate a browsable set of HTML pages, try:

--cov-report html:coverage-report
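For example (the package and directory names here are made up), the following would run the tests under tests/, measure coverage of mypackage/, and produce both a terminal summary with missed lines and that HTML report:

pytest --cov=mypackage --cov-report=term-missing --cov-report=html:coverage-report tests/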




See also:
https://docs.pytest.org/en/latest/how-to/usage.html

typer

Rich

3D

PyGame

Win32 interface

pywin32, previously known as win32all, provides hooks into various parts of Windows, apparently with win32api as its central module (see also its help file).

It is downloadable from SourceForge.


Some code for this:
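A minimal sketch, assuming pywin32 is installed - just a couple of win32api calls to show the flavour:

import win32api

print( win32api.GetComputerName() )   # NetBIOS name of this machine
print( win32api.GetUserName() )       # name of the logged-in user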


GPGPU

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

Reikna

Clyther

PyOpenCL


PyStream


cudamat

scikits.cuda


gnumpy

Theano


Unsorted

Creating ZIP files (in-memory)

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

We can use in-memory data and in-memory file objects (io.BytesIO; StringIO in py2) here - so this is sometimes simpler than doing the same with tarfile.

You can add content with either:

  • ZipFile.write - takes a filename that it will open - useful when you mainly want more control over which files to add
  • ZipFile.writestr - takes a filename/ZipInfo and a bytestring - the filesystem isn't touched, which is probably what you want when adding in-memory data to in-memory zip files.
# Just an example snippet
import io
import zipfile

def make_zip_bytes():
    zip_buf = io.BytesIO()   # py2: StringIO.StringIO()
    z = zipfile.ZipFile(zip_buf, 'w', zipfile.ZIP_DEFLATED)  # or another compression method

    for filename, filedata in (('filename1.txt', 'foo'),
                               ('filename2.txt', 'bar')):
        z.writestr(filename, filedata)
    z.close()
    return zip_buf.getvalue()

FFTs

There are a few implementations/modules, including:

  • fftpack: used by numpy.fft, scipy.fftpack


Speed-wise: FFTW is faster, numpy is slower, and scipy is slower yet (not sure why the np/sp difference when they use the same underlying code). Think a factor of 2 or 3 (potentially), though small cases can be drowned in overhead anyway.
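For a rough idea on your own machine, something like the following (a minimal sketch; assumes scipy is installed, and the numbers vary a lot with size and dtype):

import timeit
import numpy
import scipy.fft

a = numpy.random.rand( 2**16 )

print( 'numpy.fft:', timeit.timeit(lambda: numpy.fft.fft(a), number=200) )
print( 'scipy.fft:', timeit.timeit(lambda: scipy.fft.fft(a), number=200) )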

Also, FFTW planning matters.


It also matters how the coupling works - there are, for example, more and less direct (overhead-wise) ways of using FFTW.

TODO: figure out threading, MPI stuff.


See also:

Other extensions

Bytecode / resolve-related notes

  • .py - source text
  • .pyc - compiled bytecode
  • .pyo - compiled bytecode, optimized. Written when python is used with -O. The difference with pyc is usually negligible, and .pyo files were removed in Python 3.5 (see PEP 488 below)
  • .pyd - a (windows) dll with some added conventions for importing
(and path-and-import wise, it acts exactly like the above, not as a linked library)
note: native code rather than bytecode
  • (.pyi files are unrelated)


All of the above are searched for by python itself.

Python will generate pyc or pyo files

when modules are imported (not when they are run directly)
...with some exceptions: e.g. when importing from an egg or zip file it will not alter those
...which means it can, for speed reasons, be preferable to distribute those with pyc/pyo files already in them
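If you want to see where the cached bytecode for a module would end up, the standard library can tell you (a small sketch; the module name is made up):

import importlib.util

print( importlib.util.cache_from_source('mymodule.py') )
# e.g.  __pycache__/mymodule.cpython-312.pyc   (the tag depends on your interpreter)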


There was some talk about changing these; PEP 488 (implemented in Python 3.5) removed .pyo files in favour of .pyc files carrying an opt- tag in their name.


See also:

pyi files

These are called stub files, and are part of type-checker tooling.

These should only be read for their function signatures. They are syntactically valid Python, but should be considered metadata - you should not expect any runtime behaviour from them.
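For example, a hypothetical greeter.pyi describing a greeter.py module might look like:

# greeter.pyi - a stub file; bodies are just '...', only the signatures matter
from typing import Optional

def greet(name: str) -> str: ...

class Greeter:
    default_name: str
    def __init__(self, default_name: str = ...) -> None: ...
    def greet(self, name: Optional[str] = None) -> str: ...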

IDEs have been known to store what they managed to infer in these files. (I looked them up because I wanted to know why vscode often doesn't show me the actual code when I ask for a definition (F12 / Ctrl-click); this seems to have been broken / half-fixed, in ways only some people understand, for a few years now)



They are mentioned in the type hints PEP, PEP 484 (https://peps.python.org/pep-0484/).

Documentation

docstrings

documentation generators

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)

https://wiki.python.org/moin/DocumentationTools


Further notes

!=None versus is not None

tl;dr:

  • 99% of the time it makes no difference
  • ...but for the cases where it does, it's good practice to use is.
  • that reasoning only holds for singletons like None, and not necessarily for booleans, because you are probably used to having things coerced to bool: 1 == True, but 1 is not True


Why?

PEP8 says that comparisons to singletons should be done with is / is not, and not equality (==). The reason is roughly that a class is free to implement the equality operator (==) any way it wants, whereas is is defined by the language and will always do what you think.


Whenever a class doesn't redefine the equality operator, == is often just as good.

Even when it does redefine it, it is still not necessarily a problem, due to what people usually use the equality comparison operator for.

But as general practice, is is cleaner.


One real-world example is numpy.

Where you write

if ary == None:

what you probably wanted was to check whether the variable has an array (or any object) assigned at all.

But numpy defines the == comparison so that you can e.g. compare an array to a scalar, evaluated element-wise, like

>>> numpy.array( [(1, 2), (2, 1)]  ) == 2
array([[False,  True],
       [ True, False]])
And ary == None would almost always give an all-False array, because numpy won't even be able to store None unless you use dtype=object, which is often specifically avoided in number crunching.
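So, as a small illustration of the difference (a sketch; the exact behaviour of == against None has also shifted between numpy versions):

import numpy

ary = numpy.array( [1, 2, 3] )

print( ary is None )    # False - identity check, always a single bool
print( ary == None )    # element-wise: [False False False]

# ...and 'if ary == None:' is worse still: an array with more than one element
# has no single truth value, so that raises a ValueError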