Import related notes

Specifying import fallbacks

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

You've probably seen fallback import tricks like:

import StringIO
try:
    import cStringIO as StringIO
except ImportError:
    pass

or

try:
    import cElementTree as ElementTree
except ImportError:
    import ElementTree

or

try:
    set
except NameError:
    from sets import Set as set

(For ElementTree you may want something fancier; see Python notes - XML, section "Lineage, variants, and importing")
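On Python 3 the specific cases above are mostly moot (io.StringIO, xml.etree.ElementTree and the built-in set are always there), but the same pattern still applies to optional third-party modules. A minimal sketch, assuming you would like to use simplejson when it happens to be installed and the standard library otherwise (the choice of simplejson is only an illustration):

```python
# Prefer an optional third-party module, fall back to the standard library.
try:
    import simplejson as json   # optional; may not be installed
except ImportError:
    import json                 # always available

# Either way, the rest of the code only refers to the name 'json'.
data = json.loads('{"answer": 42}')
print(data["answer"])
```

The point of binding both to the same name is that nothing below the try/except needs to know which one was picked.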

A reference to the module you're coding in

There are a few ways of getting a reference to the current module object (which is rarely truly necessary, and note that if you need only the names of the members, you can use dir() without arguments).

Apparently the generally preferred way is to evaluate sys.modules[__name__], because this needs no knowledge of where you put that code, and can be copy-pasted directly. (The variable __name__ is defined in each module and package; it will be '__main__' if the python file itself is run as a script, or if you are running python interactively.)

Another way is to import the current module by its own name, which (because by then the module is already loaded) has the net effect of just returning that reference (and binding it to a name).

There are a few details to this, including:

you shouldn't do this at module-global scope(verify), since the module won't be loaded at that point

will work for packages, by its name as well as by __init__, but there is a difference between those two (possible confusion you may want to avoid): the former will only be a bind, while the latter is a new name so may cause a load, which might pick the pyc file that Python created, so while it should be the same code it may not be id()-identical (...in case that matters to your use)

Importing and binding, runtime-wise

In general, importing may include:

explicit module imports: you typing import something in your code
implicit module imports:

site-specific details

anything imported by modules you import

and package-specific details (see */__all__)

binding the module, or some part of it, as a local name

Module imports are recorded in sys.modules, which allows Python to import everything only once.

All later imports fetch the reference to the module from that cache and only bind it in the importing scope.
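A small sketch of that caching, using the standard library's json module purely as an example: the second import does not load anything, it just binds another name to the object already in sys.modules.

```python
import sys

import json                      # first import: loads the module and caches it
cached = sys.modules["json"]     # the cache entry is the module object itself

import json as json_again        # later import: fetched from the cache, bound to a new name
same_object = json_again is cached and json is cached
print(same_object)
```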

Binding specific names from a module

You import a whole module at a time.

Optionally, you can take a few things from within that module, and bind it in another scope. This is mostly personal taste - it does not change what gets evaluated during import.

Say you are interested in the function comma() from format.lists (package format, module lists).

You can do:

import format as fm             # people seem to like this for things like numpy, because it saves typing for something you reference a lot
# binds 'fm', so usable like:
fm.lists.comma()                # only works if format's __init__.py (or an earlier import) already imported lists


import format.lists
# binds 'format', so usable like:
format.lists.comma()


from format import lists
# binds lists (and not format), so:
lists.comma()


from format import lists as L
# locally binds lists as L (and neither format nor lists), so:
L.comma()
 

import format.lists as L
# same as the last
L.comma()


from format.lists import *
# binds all public names from lists, so:
comma()      # and who knows what else is now in your scope


from format.lists import comma
# binds only a specific member
comma()


from format.lists import comma as C
# like the last, but binds to an alias you give it
C()
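The format package above is hypothetical, so here is the same set of variations against a package that always exists (os, with its os.path submodule), as a sketch you can run as-is. All forms end up referring to the same objects; only the local names differ.

```python
import os.path                   # binds 'os'
from os import path              # binds 'path'
from os import path as P         # binds 'P'
import os.path as P2             # binds 'P2'
from os.path import join as J    # binds 'J'

# Every name refers to the one module object, and J to its join function.
all_same_module = path is os.path and P is os.path and P2 is os.path
same_function = J is os.path.join
print(all_same_module, same_function)
```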

Packages

For context, a module is a python file that can be imported, and any file can be a module. The requirements amount to

the filesystem name must not be a syntax error when mentioned in code
needs a .py extension

(the import system is more complex and you can get other behaviour, but the default behaviour mostly looks for .py)

Packages are a little extra structure on top of modules, an optional way to organize modules.

A package is a directory with an __init__.py file, and that directory will typically contain some module files it is organizing.

A package doesn't have much special status (it's mostly just a module with a funny name) except in how the import system handles further importing and name resolution.

What should be in __init__.py

If you are only using packages to collect modules in a namespacing tree structure sort of way, you can have __init__.py be empty

however, sometimes this only means more typing and not more structure. A lot of people like keeping things as flat as sensible

(you might still like to have a docstring, an __author__, maybe __version__ (a quasi-standard) in there)

If you like to run some code when the package is first imported (usually for some initial setup), that can go in __init__.py

however, as this amounts to module state, this is sort of a shared-global situation that most libraries intentionally avoid

You could put all code in __init__.py -- because that's just an awkward way to make what amounts to a module. With a weird internal name.

If you want to use a package as an API to more complex code, and/or selective loading of package contents, read on

You can drag in some PEPs about clean style, but in the end these are mostly just suggestions.

The most convincing argument I've seen is centered around "think about what API(s) you are providing".

If you leave __init__.py empty, that means users have to explicitly import all modules.

upsides

very selective loading

downsides

lots of typing

users have to know module names already, as the package won't tell them

__init__.py itself mostly imports modules' contents into the package scope

upsides

doesn't require you to import parts before use.

Lets that package be an API of sorts, cleaner in that it shields some private code users don't care about

for the same reason, makes it easier to reorganize without changing that API

downsides

doesn't allow user to load only a small part

__init__.py imports parts from submodules

upsides

can be cleaner, e.g. letting you mask 'private' parts from the package scope

downsides

can be messier, in that it's less

__init__.py uses __all__

see below, but roughly:

upsides: lets users have some control over whether to load submodules or not

downsides: a little black-box-magical before you understand the details

Relative imports

importing *, and __all__

Using from something import * asks python to import all names from a module or package, and bind them in the scope you said that in.

Within user code, importing * may be bad style, mostly because you cannot easily know what names you are importing into your scope (they are not mentioned in your code, and may change externally), so in the long run you won't know what name collisions that may lead to. This problem is not actually common, but that itself is why it can lead to very confusing, well hidden bugs once it happens (particularly as same-named things will often do similar things).

It should be much less of an issue for packages to import * from their modules, because it should be well known to the package's programmer what names that implies.

Side note: the package's code doesn't tell you what names will be there at runtime. This can be annoying when users are trying to e.g. find a particular function's code.

import * around modules

if there is no __all__ member in a module, all the module-global names that do not start with an underscore (_) are bound

which means that basic modules can use underscore-prefixed variables to have global-like things that won't pollute others

if there is an __all__ member in a module, it will bind those things

lets a programmer minimize namespace cluttering from their own modules. If people want to throw a lot of names onto a pile, they have to work for it

import * around packages

if there is no __all__ member, it only picks up things already bound in this package scope (except underscore things)

if there is an __all__ member, it seems to go through that list and

if that name is not an attribute bound in the package, try to import it as a module.

if that name is an attribute already bound in the module, don't do anything extra

binds only names in that list

You can mix and match, if you want to confuse yourself.

For example, consider a package like:

__all__ = ['sub1', 'myvar']
__version__ = '0.1'
import sys
myvar    = 'foo'
othervar = 'bar'
from . import sub2   # a bare 'import sub2' here only works in Python 2 (implicit relative import)

...will

always import sub2 (it's global code)
import sub1 only when you do an import * from this package, not when you import the package itself
bind sub1 and myvar, but not othervar or sub2 or sys

if __all__ includes a name that is neither bound nor importable, it's an AttributeError

This ties into the "are you providing distinct things" discussion above in that you probably don't want explicit imports then.
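To check the behaviour described for the example package, here is a sketch that writes a throwaway copy of it to a temporary directory and looks at what gets loaded and bound. The package name demo_all_pkg is made up, and the sub1/sub2 modules are empty files; the star import is run through exec in a fresh dict so that the bound names are easy to list.

```python
import os
import sys
import tempfile

# Build a throwaway package on disk (the name demo_all_pkg is made up).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_all_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(
        "__all__ = ['sub1', 'myvar']\n"
        "myvar = 'foo'\n"
        "othervar = 'bar'\n"
        "from . import sub2\n"
    )
for name in ("sub1", "sub2"):
    open(os.path.join(pkg_dir, name + ".py"), "w").close()

sys.path.insert(0, root)

# A plain import runs __init__.py, which loads sub2 but not sub1.
import demo_all_pkg
sub1_loaded_by_plain_import = "demo_all_pkg.sub1" in sys.modules
sub2_loaded_by_plain_import = "demo_all_pkg.sub2" in sys.modules

# Run the star import in a fresh namespace to see exactly what it binds.
scope = {}
exec("from demo_all_pkg import *", scope)
bound = sorted(name for name in scope if not name.startswith("__"))

print(sub1_loaded_by_plain_import, sub2_loaded_by_plain_import, bound)
```

On Python 3 this should report that only sub2 was loaded by the plain import, and that the star import bound just myvar and sub1.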

importing from packages

The examples below assume a package named format with a module in it called lists.

To experiment yourself to see when things happen, try:

mkdir format
echo 'print("format, __init__")'  > format/__init__.py
echo 'print("format, lists")'     > format/lists.py

In an import, everything up to the last dot has to be a package/subpackage, and the last part must be a module (or itself a package).

The package itself can also be imported, because an __init__.py file is a module that gets imported when you import the package (or something from it) and aliased as the directory name. With the test package created above:

>>> import format.lists
format, __init__
format, lists

The import above bound 'format' at local scope, within which a member 'lists' was also bound:

>>> format
<module 'format' from 'format/__init__.py'>
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__', 'lists']
>>> format.lists
<module 'format.lists' from 'format/lists.py'>

Modules in packages are not imported unless you (or its __init__ module) explicitly do so, so:

>>> import format
format, __init__
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__']

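...which did not import lists.

Note that when you create subpackages in Python 2, inter-package references are resolved first from the context of the importing package's directory, and if that fails from the top package. Python 3 treats a plain import as absolute, so references within a package there need the explicit relative form (from . import name).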
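The same experiment can be done without touching your working directory. This is a sketch under the assumption that a made-up package name (demo_fmt_pkg, standing in for format) is acceptable; it checks whether the submodule is visible as an attribute before and after importing it explicitly.

```python
import os
import sys
import tempfile

# Throwaway package with an empty __init__.py and one module, lists.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_fmt_pkg")   # made-up name
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "lists.py"), "w") as f:
    f.write("def comma(items):\n    return ', '.join(items)\n")
sys.path.insert(0, root)

import demo_fmt_pkg
has_lists_before = hasattr(demo_fmt_pkg, "lists")   # submodule not imported yet

import demo_fmt_pkg.lists
has_lists_after = hasattr(demo_fmt_pkg, "lists")    # now bound inside the package
result = demo_fmt_pkg.lists.comma(["a", "b"])
print(has_lists_before, has_lists_after, result)
```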

where import looks

import looks in all of sys.path.

Which is only a partial answer, because sys.path is altered by a few things during interpreter startup.

That happens before your code gets control, and your code might add some more.

When you do that matters, though, because you rarely control the absolute order in which all that you import gets imported.

Once you do have control, you can use sys.path.append() (docs: "A program is free to modify this list for its own purposes."[1]) -- but this is too late to do some things cleanly.

The startup steps are mostly:

it tries to put the directory of the script being run at the front of sys.path (i.e. sys.path[0]) so that it takes precedence

otherwise left as '' which signifies the current directory(verify)

both meaning you can put supporting modules alongside your script

it adds entries from PYTHONPATH

this seems to come after the above, before anything else (verify)

Intended for private libraries, when you have reason to not install them into a specific python installation (...or you can't), or perhaps to override with a specific version(verify)

avoid using this to switch between python installations - that is typically better achieved by calling the specific python executable (to get its configured overall path config)

avoid using this to import specific modules from other python installations (just install it into both is likely to be cleaner and lead to less confusion)

it does an import site, which (unless -s or -S blocks things) leads to roughly four things:

add site packages (via site.addsitedir)

combines the compiled-in values of sys.prefix and sys.exec_prefix (often both /usr/local) with a few suffixes, roughly ending up on lib/pythonversion/site-packages (verify) (except on debian/ubuntu, see note below. There may be distros that customise site.py)

setting PYTHONHOME overrides that compiled-in value with the given value.[2]

fully isolated (see virtualenv) or embedded[3] pythons want to do this. Most other people do not, in part because of what it doesn't mean for subprocess calls

add user site packages (via site.addsitedir) (see also PEP 370; since roughly py2.6(verify), introduced to allow installation into homedirs without requiring virtualenv)

similar, but looking within ~/.local (platform-specific, actually) so ending up with something like ~/.local/lib/pythonversion/site-packages

which is added only if it exists and has an appropriate owner.

that user base (~/.local) can be overridden with PYTHONUSERBASE

import sitecustomize

e.g. for site-wide development tools like profiling, coverage, and such

note that 'site' is still a specific python installation

import usercustomize (after sitecustomize)

e.g. for user-specific development tools like profiling, coverage, and such

can be disabled (e.g. for security), e.g. -s parameter avoids adding the user site directory to sys.path (and also sets site.ENABLE_USER_SITE to False)
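To see what those startup steps produced for the interpreter you are actually running, you can simply ask it. A minimal sketch using only the sys and site modules; the getattr guard is there because some older virtualenv-provided site modules reportedly lacked getsitepackages().

```python
import site
import sys

# The final search path, in lookup order.
for entry in sys.path:
    print("path entry:", repr(entry))

print("prefix / exec_prefix:", sys.prefix, sys.exec_prefix)
print("user site enabled:", site.ENABLE_USER_SITE)
print("user base:", site.getuserbase())
print("user site-packages:", site.getusersitepackages())

# Guarded lookup, in case this site module has no getsitepackages().
get_site = getattr(site, "getsitepackages", None)
system_site = get_site() if get_site else []
print("site-packages:", system_site)
```

Running this with and without -s, or inside and outside a virtual environment, is a quick way to see which of the steps above changed.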

Notes:

If you customize some of this, you need to think hard about whether scripts run via subprocess will or won't get the same alterations.

pip installs into
- if a virtualenv is enabled: your virtualenv's site-packages
- if using --user, your user site-packages
- otherwise (i.e. default): the system site-packages

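if run without sudo, older versions should fail and suggest --user, while newer versions fall back to a user install ("Defaulting to user installation because normal site-packages is not writeable")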

site.addsitedir() amounts to
- sys.path.append() plus
- looking for and handling *.pth files

.pth files are intended to include further paths, one per line (allows namespace-ish things).

lines that start with 'import' are executed rather than treated as paths; this is apparently intended for slightly more intelligent search path alteration, where necessary

...but since it allows semicolons you can abuse this for varied inline code

There is a move to design this away - once all the useful things it is used for (package version selection, virtual environment chaining, and some others) exist properly

e.g. take a peek at your site-package's .pth files
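A sketch of what site.addsitedir() does with a .pth file, using a temporary directory as a stand-in for a site-packages directory. The names extra_libs and demo.pth are made up for the example.

```python
import os
import site
import sys
import tempfile

site_dir = tempfile.mkdtemp()                      # stands in for a site-packages directory
extra_dir = os.path.join(site_dir, "extra_libs")   # hypothetical extra directory
os.mkdir(extra_dir)

# A .pth file lists one path per line; relative paths are taken relative to
# the directory the .pth file is in, and paths that do not exist are skipped.
with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write("extra_libs\n")

site.addsitedir(site_dir)

site_dir_added = site_dir in sys.path
extra_dir_added = extra_dir in sys.path
print(site_dir_added, extra_dir_added)
```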

dist-packages is not standard python, it is a debian/ubuntu convention

these say that python packages installed via apt go into dist-packages, and the site-packages directory that would normally be the target is not used

pip will be installed that way, and also install there

this is intended so that admins installing things with apt or pip go to a specific system directory (dist-packages), while e.g. your own custom compiled python won't know about this and install into its site-packages.

Which is considered better isolated. But not necessarily very clear to understand.

dist-packages is baked into the system python's site.py(verify)

Freezing

(note: this is unrelated to package managers freezing a package, which is basically just listing packages and their versions, usually to duplicate elsewhere)

Freezing means wrapping your code so that it does not depend on things, and runs anywhere. This usually means bundling a copy of a python interpreter and all external modules (it's vaguely analogous to static linking), plus some duct tape to make it work (and make it independent from whatever pythons you have installed).

It often creates a relatively large directory, and doesn't really let you alter it later.

The main reason to do this is to have a self-contained copy that should run anywhere (in particular, it does not rely on an installed version of python) so is nice for packaging a production version of your desktop app.

To do this yourself, you can read things like https://docs.python.org/2/faq/windows.html#how-can-i-embed-python-into-a-windows-application

It's easier to use other people's tools.

Options I've tried:

cx_freeze

lin, win, osx

http://cx-freeze.sourceforge.net/

PyInstaller

lin, win, osx

can pack into single file

http://pyinstaller.python-hosting.com/

See also http://bytes.com/forum/thread579554.html for some get-started introduction

Untried:

bbfreeze[4]

lin, win, osx

py2exe [5] (a distutils extension)

windows (only)

can pack into single file

inactive project now?

Python's freeze.py (*nix) (I don't seem to have it, though)

py2app [6]

mac OSX (only)

Gordon McMillan's Installer (discontinued, developed on into PyInstaller)

Installation in user environments, and in dev environments; packaging; projects

TODO: merge with Isolating_shell_environments#Python

Doing package installs

tl;dr

for system installs

pip (or similar) will install into the same dist-packages as your system package manager

system package manager should mix decently with pip installs

but it can get confusing when you have one install things the other isn't aware of. So you might want to prefer using just one as much as possible

for distinct stacks (dev, fragile apps)

consider virtualenv

consider pipenv, conda, and similar -- having a tool do something virtualenv-like for you is often simpler/cleaner

creating and uploading packages

look at setuptools (distribute was basically a nicer setuptools, and has since been merged back into it)

Making python packages

Python's packaging history is a bit of a mess.

Packaging was initially a bit minimal, and pasted on.

And it initially covered just installation, which is cool and useful and all.

...yet in development (and rolling-delivery production), you want updateability, clean dependency management, and more.

This led to people making a few too many alternatives, leading to some confusion.

💤 History we can now mostly forget about

We had

distutils (2000ish)

standard library

PEP-273 introduced zip imports (2001)

can be copied into place

is then mostly equivalent to having that thing unzipped in the same location

...with some footnotes related to import's internals.

PyPI (2003)

meant as a central repository

initially just a repository of links to zips elsewhere, which you would manually download, unpack, and either setup.py install (distutil stuff) (or sometimes just copy the contents to site-packages)

setuptools (2004)

introduced eggs

introduced easy_install (which these days is no longer used)

egg (2004, see previous point. Never put into a PEP)

eggs are zip imports that adhere to some extra details, mostly for packaging systems, e.g. making them easier to discover, their dependencies resolved, and installed.

there are some variants. A good readup involves the how and why of setuptools, pkg_resources, EasyInstall / pip, and more

Ideally, you can now skip eggs

distribute (2008)

fork of setuptools, so also provides setuptools

had a newer variant of easy_install (from distribute, so ~2008)

(how relevant is this one?)

distutils2 (~2010) - made useful contributions, apparently not interesting as its own thing[7]

pip (2008)

easy_install replacement

more aware of dependencies (verify)

can uninstall; easy_install could not

downsides:

cannot install eggs (seemingly because we wanted to replace egg with wheel?(verify))

doesn't isolate different versions (verify)

limited to python - C dependencies are still ehhh

wheel format is introduced (2013; PEP-427, PEP-491) as replacement for egg format.

intended to be a cleaner, better defined thing, for just installs - see On wheels

PEP-518 introduced pyproject.toml (2016)

specifies what build tools you require to build a wheel

better defined than what setup.cfg did(verify), and than relying on whatever version of setuptools someone happened to have installed, which you would have no control over

(and using TOML format to be easier than ini/configparser?)

On wheels

any of this and docker

Creating packages

Manual

Flit

Tries to make it easier for you to publish to PyPI

https://flit.pypa.io/

PDM

https://pdm.fming.dev/

Hatchling / Hatch

Hatch is a project manager.

Hatchling is its build backend.

https://pypi.org/project/hatchling/

Poetry

Projects

Project here in a vague sense, of 'directory which contains a single project that you may want to handle in a specific way'

This suggests things like

having a user environment (more specific than for the entire user)

to isolate to

to run in

to install into

possibly with specific package dependency management

building packages for distribution

pyproject.toml

PEP-518 (https://peps.python.org/pep-0518/) specifies how python software specifies build dependencies.
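As a rough illustration of what that looks like, here is a sketch that parses a small [build-system] table and pulls out the build requirements. The setuptools/wheel values are just a common choice picked for the example, not something this page prescribes; the build-backend key comes from the related PEP-517. The tomllib module is only in the standard library from Python 3.11, so the sketch skips parsing on older versions.

```python
try:
    import tomllib              # standard library since Python 3.11
except ImportError:
    tomllib = None              # older Python: this sketch just skips parsing

# Example pyproject.toml content (values are illustrative).
PYPROJECT = """
[build-system]
requires = ["setuptools>=61", "wheel"]
build-backend = "setuptools.build_meta"
"""

def build_requirements(text):
    """Return the build-system 'requires' list, or None if tomllib is unavailable."""
    if tomllib is None:
        return None
    return tomllib.loads(text)["build-system"]["requires"]

print(build_requirements(PYPROJECT))
```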

pip notes

Install

python -m pip

The advice to use python -m pip instead of pip comes mostly from it being more obvious which of the installed python versions you're referencing.

It's otherwise identical.
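The same idea applies when you call pip from a Python script: using sys.executable ties the call to the interpreter that is running the script. A minimal sketch; pip_command is a made-up helper name, and the actual run is left commented out because it needs pip to be installed for that interpreter.

```python
import sys

def pip_command(*args):
    """Build a pip command line bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip"] + list(args)

cmd = pip_command("--version")
print(cmd)

# To actually run it:
#   import subprocess
#   subprocess.run(cmd, check=True)
```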

pip search is dead, long live the alternatives

pip search never did much more than a substring search in name and summary, but the API it relied on was always considered experimental, and there was a long-term (and possibly accidental) DDoS going on that made hosting costs high. So they shut that down (See https://github.com/pypa/pip/issues/5216 and https://status.python.org/incidents/grk0k7sz6zkp for details)

pip may be working on a local replacement based on pip index (verify), but in the meantime, alternatives include:

searching on the pypi website

pip_search (seems to scrape pypi website)

pip install pip_search
pip_search scikit

pypisearch (seems to require py>=3.8, though)

git clone https://github.com/shidenko97/pypisearch && cd pypisearch && pip install .
python -m pypisearch scikit

Install from git

Can you update all packages?

There are a few hacks, but perhaps easiest is a third party tool:

pip install pip-review
pip-review --interactive

User installs

pip run with --user installs into site.USER_SITE, which python's importing will pick up - see e.g. PEP 370

pip run as a non-root user seems to act as if --user was specified

pip and dependencies

showing package dependencies

For installed packages,

pip show spacy

...will show something like:

Name: spacy
Version: 3.4.1
Summary: Industrial-strength Natural Language Processing (NLP) in Python
Home-page: https://spacy.io
Author: Explosion
Author-email: contact@explosion.ai
License: MIT
Location: /usr/local/lib/python3.8/dist-packages
Requires: typer, langcodes, spacy-loggers, catalogue, packaging, requests, setuptools, thinc, spacy-legacy, wasabi, numpy, pydantic, 
          cymem, tqdm, srsly, preshed,  jinja2, pathy, murmurhash
Required-by: spacy-transformers, spacy-fastlang, nl-core-news-sm, nl-core-news-md, nl-core-news-lg, en-core-web-trf, en-core-web-sm, 
             en-core-web-md, en-core-web-lg, collocater

Note that the Required-by only lists things that require it and that you have installed, not all the possible things, so will vary between installations.

A user installed package will show a different location, e.g.:

Name: jedi
Version: 0.18.1
Summary: An autocompletion tool for Python that can be used for text editors.
Home-page: https://github.com/davidhalter/jedi
Author: David Halter
Author-email: davidhalter88@gmail.com
License: MIT
Location: /home/me/.local/lib/python3.8/site-packages
Requires: parso
Required-by: ipython
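Much of the same information is available from within Python via importlib.metadata (standard library since Python 3.8). A minimal sketch; describe is a made-up helper name, and the Required-by part of pip show is not reproduced here.

```python
from importlib import metadata   # Python 3.8+

def describe(name):
    """Return name, version and declared requirements, or None if not installed."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        "requires": dist.requires or [],   # requires is None when nothing is declared
    }

# Prints a dict if pip is installed for this interpreter, else None.
print(describe("pip"))
```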

Development

Reproducing the same set of packages elsewhere

One convention that has grown strong enough to almost be a de facto standard is to create a file, usually called requirements.txt, that contains the package-and-version specs for each library you want.

Each (non-comment) line is essentially the arguments to a separate call to the pip CLI tool, and is parsed by pip; it is e.g. the pip documentation that notes that yes, you can add comments

So options include

FooProject
FooProject >= 1.2
FooProject >= 1.2 --global-option="--no-user-cfg"

As this documentation mentions, the last is roughly equivalent to going into FooProject and running:

python setup.py --no-user-cfg install

requirements.txt combines well with virtual environments

Say, if you just created a venv, you can now do:

pip install -r requirements.txt

...and it will install everything this project needs.

Similarly, if you are currently within a venv, you can create a requirements.txt like

pip freeze > requirements.txt

💤 This will, however, be overly specific. That is, it will probably list the precise version you have installed, such as:

 webencodings==0.5.1
 WebOb==1.8.7
 websocket-client==0.53.0
 websockets==10.4
 Werkzeug==2.2.2

and you might actually want to edit that to be more accepting, if you want it to accept updates from each library.
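If you would rather not do that editing by hand, a few lines of Python can turn exact pins into minimum versions. This is a deliberately naive sketch (loosen is a made-up helper): it only handles plain name==version lines and would mangle anything more exotic, such as === pins or lines with extra options, so check the result before using it.

```python
def loosen(line):
    """Turn an exact pin into a minimum-version spec; leave other lines alone."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "==" not in stripped:
        return stripped
    name, version = stripped.split("==", 1)
    return "%s>=%s" % (name.strip(), version.strip())

frozen = ["webencodings==0.5.1", "WebOb==1.8.7", "# a comment", "Werkzeug==2.2.2"]
loosened = [loosen(line) for line in frozen]
print("\n".join(loosened))
```

Whether >= is the right relaxation depends on how much you trust each library not to break compatibility; it is used here only as the simplest example.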

pipenv originated in part from trying to make things even simpler than those manual steps of creating and picking up requirements.txt

Editable installs

Unsorted

DBus error on python package installs

No such interface “org.freedesktop.DBus.Properties” on object at path /org/freedesktop/secrets/collection/login

This can come up when you use something like pip, or something more complex like poetry or twine.

You'll probably see packages like keyring and secretstorage.

If you didn't actually need auth storage, then prepending

PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring

to your command should be a good test whether keyring is the problem - and be a good temporary workaround.

https://unix.stackexchange.com/questions/684854/free-desktop-dbus-error-while-installing-any-package-using-pip

Python usage notes - importing, modules, packages, packaging
