Python usage notes - importing, modules, packages

From Helpful


Import related notes

Import fallbacks

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

You've probably seen fallback import tricks like:

try:
    import cStringIO as StringIO
except ImportError:
    import StringIO


try:
    set
except NameError:
    from sets import Set as set


try:
    import cElementTree as ElementTree
except ImportError:
    import ElementTree

For ElementTree you may want something fancier; see Python notes - XML#Importing
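The same pattern shows up on the py2-to-py3 boundary, where StringIO moved into io (a sketch; on a modern python only the first branch runs):

```python
# Prefer the py3 location; fall back to the old py2 module name.
try:
    from io import StringIO        # py3 (and available in later py2)
except ImportError:
    from StringIO import StringIO  # py2 fallback

buf = StringIO()
buf.write(u'hello')
assert buf.getvalue() == u'hello'
```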

Reference to current module

There are a few ways of getting a reference to the current module object (which is rarely truly necessary, and note that if you need only the names of the members, you can use dir() without arguments).

The generally preferred way is to evaluate sys.modules[__name__], because this needs no knowledge of where you put that code, and can be copy-pasted directly. (The variable __name__ is defined in each module and package; it will be '__main__' if the python file itself is run as a script, or if you are running python interactively.)

Another way is to import the current module by its own name, which actually just binds the by-then-already-loaded module to a name that happens to be in its own scope (this also works for __main__).

There are a few details to this, including:

  • you shouldn't do this at module-global scope(verify), since the module won't be loaded at that point
  • it will work for packages, by the package's name as well as via __init__, but there is a difference between those two (possible confusion you may want to avoid): the former only binds, while the latter is a new name so may cause a load, which might pick up the pyc file that Python created, so while it should be the same code it may not be id()-identical (in case that matters to your use)
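A minimal sketch of the sys.modules[__name__] approach:

```python
import sys

# Fetch the already-loaded module object from the import cache.
# __name__ is defined in every module ('__main__' when run as a script).
this_module = sys.modules[__name__]

assert this_module.__name__ == __name__
```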

Importing and binding, runtime-wise

In general, importing may include:

  • explicit module imports: you typing
    import something
    in your code
  • implicit module imports: anything imported by modules, and package-specific details (see */__all__)
  • binding the module, or some part of it, as a local name

Module imports are recorded in sys.modules, which allows Python to import everything only once.

All later imports fetch the reference to the module from that cache and only bind it in the importing scope.
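You can observe that caching directly (using the stdlib json module as an arbitrary example):

```python
import sys
import json

# The first import loaded json and recorded it in sys.modules.
assert 'json' in sys.modules

# Any later import is just a cache lookup plus a name binding.
import json as j2
assert j2 is sys.modules['json']
assert j2 is json
```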

Binding specific names from a module

You can also specify that you want to bind a few names from within a module. Say you are interested in the function comma() from format.lists (package format, module lists). You can do:

import format.lists
# binds 'format', so usable like: format.lists.comma()
from format import lists
# binds 'lists' (and not 'format'), so: lists.comma()
from format import lists as L
# locally binds 'lists' as L (and neither 'format' nor 'lists'), so: L.comma()
import format.lists as L
# same as the last
from format.lists import *
# binds all public names from lists, so: comma()
from format.lists import comma
# binds only a specific member
from format.lists import comma as C
# like the last, but binds it to an alias you give it

None of this changes the importing; it only differs in what names get bound, and is mostly personal taste. (e.g. I like to avoid from and as, forcing my own code to mention exactly where it gets its functions)


For context, any python file can be a module,

except where its filesystem name would be a syntax error in python code,
and usually you want the .py extension (you can import things that don't have it, but you have to dig into the loading code)

Packages are a little extra structure on top, an optional way to organize modules.

Where modules correspond to files, packages correspond to directories containing modules.

A package is a directory with an __init__.py file.

In the real world, many __init__.py files are empty or contain only an informative docstring, because the most basic use of packages is just namespacing your modules in a useful way.

Beyond grouping things under a common name, it allows things like

  • running code when the package is first imported, usually for some initial setup
    ...put it in the package's __init__.py file
  • allowing selective import of the package's modules
    forcing people to ask for modules explicitly, rather than loading everything

importing from packages
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The examples below assume a package named format with a module in it called lists.

To experiment yourself to see when things happen, try:

mkdir format
echo 'print "format, __init__"'  > format/__init__.py
echo 'print "format, lists"'     > format/lists.py

In an import, everything up to the last dot has to be a package/subpackage, and the last part must be a module.

The package itself can also be imported, because its __init__.py file is a module, one that gets imported when you import the package (or something from it), aliased to the directory name. With the test modules from the last section:

>>> import format.lists
format, __init__
format, lists

The import above bound 'format' at local scope, within which a member 'lists' was also bound:

>>> format
<module 'format' from 'format/__init__.py'>
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__', 'lists']
>>> format.lists
<module 'format.lists' from 'format/lists.py'>

Modules in packages are not imported unless you (or the package's __init__ module) explicitly do so, so:

>>> import format
format, __init__
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__']

...which did not import lists.

Note that when you create subpackages, inter-package references are resolved first from the context of the importing package's directory, and if that fails from the top package.

importing *, and __all__

Using from something import * is a special case.

It's also generally considered bad style, because it'll pollute the namespace you do this from.

But that's also why you get some control over what gets bound when you do this.

For modules:

  • if there is no __all__ member, all the module-global names that do not start with an underscore (_) are bound
  • if there is an __all__ member, only the names in that list are bound

So __all__ is useful when a programmer likes to minimize namespace cluttering from their own modules.
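A self-contained sketch of both rules, constructing a throwaway module object at runtime (the name demo_mod is made up for this example):

```python
import sys
import types

# Build a module with one public and one underscore-prefixed name.
mod = types.ModuleType('demo_mod')  # 'demo_mod' is an arbitrary example name
mod.public = 1
mod._private = 2
sys.modules['demo_mod'] = mod

# Without __all__: underscore names are skipped by import *.
ns = {}
exec('from demo_mod import *', ns)
assert 'public' in ns and '_private' not in ns

# With __all__: only the listed names are bound, underscore or not.
mod.__all__ = ['_private']
ns = {}
exec('from demo_mod import *', ns)
assert '_private' in ns and 'public' not in ns
```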

For packages, * is different; if __all__ were not present, python could only determine members based on filenames, and this would be unreliable on platforms such as Windows (which deals with capitals on the filesystem somewhat creatively).

As consistent behaviour was preferred, for packages the process of importing only looks at __all__ in the package. If this is not present, there is no implicit binding.

For the example above, no __all__ means from format import * will only import format. When you add something like __all__ = ['lists'] to the package's __init__.py, it will import the package as well as the modules listed there, and bind those modules in the package object.

Note that some people prefer to not use import * at all, since it clutters one's namespaces and makes name collisions more likely than when you bind each name explicitly.

If you have helper functions, it may be well received to place those in a separate module, so that import * on such a module only brings in a set of well-named helper functions.

Note that the import statement is mostly a wrapper around existing functions (__import__, and the importlib module), which in a few cases you may want to use directly.
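For example, importlib.import_module is the functional counterpart of the import statement, useful when the module name is only known at runtime:

```python
import importlib

# Equivalent to 'import json', but the name can be a computed string.
name = 'json'
mod = importlib.import_module(name)
assert mod.loads('[1, 2]') == [1, 2]

# It returns the same cached object the import statement would bind.
import json
assert mod is json
```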

Relative imports


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Freezing

(note: this is unrelated to package managers freezing a package, which is basically just listing packages and their versions, usually to duplicate them elsewhere)

Freezing means your code is wrapped in an independently executable way, usually including a copy of the python interpreter and all external modules (it's analogous to static linking), plus some duct tape to make it work (and be independent from whatever pythons you have installed).

It often creates a relatively large directory, and doesn't really let you alter it later.

The main reason to do this is to have a self-contained copy that should run anywhere (in particular, it does not rely on an installed version of python) so is nice for packaging a production version of your desktop app.

You can read up on how to do this yourself, but it's easier to use other people's tools.

Options I've tried:

  • cx_freeze
    lin, win, osx
  • PyInstaller
    lin, win, osx
    can pack into single file
    See also for some get-started introduction
  • py2exe [2] (a distutils extension)
    windows (only)
    can pack into single file
    inactive project now?
  • Python's (*nix) (I don't seem to have it, though)
    mac OSX (only)
  • Gordon McMillan's Installer (discontinued, developed on into PyInstaller)

See also:

TODO: read:

Installation in user environments, and in dev environments

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

TODO: merge with Isolating_shell_environments#Python


  • for system installs
you may want to prefer your distro's package manager, even though it's generally not as up to date. It should mix decently with pip, though (verify)
pip is generally easiest
for things available only as egg (not as wheel, and not on PyPI), you'll still need easy_install
  • for distinct stacks (dev, fragile apps)
consider pipenv, conda, and similar -- and sometimes doing virtualenv yourself is simpler/cleaner
  • creating and uploading packages
look at distribute (basically a nicer setuptools)

Packaging was initially a bit minimal, and pasted on. And initially it was just installation, which is cool and useful and all.

...yet in development (and rolling-delivery production), you want updateability, clean dependency management, and more.

This led to people making a few too many alternatives, leading to some confusion.

Some attempt at a historical summary:

We had

  • distutils
standard library (2000ish)
  • setuptools (2004)
introduced eggs, easy_install
  • distribute (2008)
fork of setuptools, so also provides setuptools
If installing a thing involves running
    python setup.py install
(typically the given instruction), that's this. And if the things were pure python it'd work great; in other cases you become the package manager.

PyPI is the python package index.

It's been the central store since around 2003.

Submitting to PyPI

  • generally, setuptools(/distribute) is still useful

Installing things from PyPI

  • generally use pip, or pipenv if you prefer

(Not to be confused with PyPA (the Python Packaging Authority), created in 2011 to simplify the mess made up to that point)

Initially PyPI was a repository of links to zips elsewhere, which you would have to manually download, unpack, and either install (distutils stuff), or sometimes just copy the contents to site-packages.

Then came

  • easy_install (from setuptools, so ~2004)
  • easy_install (from distribute, so ~2008)
Name searches go to PyPI; you can also install a downloaded egg.

Note that from the PyPI side, the executable nature of setup.py made it awkward to build package management (which is mostly metadata in the end) on top. (verify)

  • pip (2008)
easy_install replacement
can uninstall
cannot install eggs (still can't, but can install wheel since shortly after wheel's introduction)
doesn't isolate different versions (verify)
limited to python - C dependencies are still ehhh

The wheel format was introduced (2013) as a replacement for the egg format. (TODO: figure out details) [4]

Also relevant is virtualenv (2007), because it allowed easy_install and pip to install into a separated environment.

Which is great for dev and (rolling-update) production, because it's one of the few ways to get decent reproducibility.
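A minimal sketch of those mechanics using the stdlib venv module (python3; virtualenv behaves similarly, and the directory name myenv is arbitrary):

```shell
# Create an isolated environment in ./myenv
python3 -m venv myenv

# Commands now resolve to the environment's python and pip,
# so installs land in myenv, not in the system site-packages.
. myenv/bin/activate
python -c 'import sys; print(sys.prefix)'
deactivate
```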

For users, if you can get your system install to accept all the libraries you want, it's arguably easier to avoid virtualenv, because it's not quite as friendly as its developer presents it.

pipenv, while quite recent (2017?), is (roughly) a cleaner, more integrated variant of pip + virtualenv + some of its support.

But devs want more separated software stacks. Initially people did this with a pip feature:

pip freeze > requirements.txt

on one host, and on another host (typically within a virtualenv) do:

pip install -r requirements.txt

Sensibly, people wanted this to be more practical and automated, so various things appeared that help you integrate virtualenv, often with some package-dependency-metadata tooling around that. In these cases you want separated software stacks, which makes things more interesting yet, because ideally you want to install things into your specific project, not your system.

Buildout seems a little more focused on web dev, so there are some others aimed at a more applied audience, mostly academia (allowing e.g. non-python things like compilers, and binaries like matlab), including:

  • buildout (2006) was initially designed for more repeatable installs
  • hashdist (2014?)
  • conda (2014?)
packaging core of anaconda, miniconda
separate from virtualenv, pip, etc though offering similar features
only binaries (doesn't build things itself)
not python-only

There are a bunch more footnotes to this, like details of helper libraries and underlying tools, but most of that you don't have to worry about, or only when putting fine polish on creating/uploading packages.

See also:

More notes on

You may care about isolating shell environments

where import looks

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
import looks in all of sys.path, which is altered by a few steps during interpreter startup.

The point of describing this is that this happens before your code gets control. Once you do have control, you can call sys.path.append() (docs: "A program is free to modify this list for its own purposes."[5]) -- but this is too late to do some things cleanly.
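For instance, sys.path is an ordinary list you can inspect and extend at runtime (the directory below is hypothetical):

```python
import sys

# import searches these directories in order.
assert isinstance(sys.path, list)

extra = '/tmp/my_extra_modules'  # hypothetical directory
if extra not in sys.path:
    sys.path.append(extra)

# Only imports that happen after this point will see it.
assert extra in sys.path
```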

These steps are mostly:

  • sys.path[0] is set to the directory that contains the thing that is invoking python (python executable if being run directly, more usually a script that invokes that)[6]
    • meaning you can always put supporting modules alongside your script
    • note that this will be '' (empty string) if python is invoked interactively, or if the current directory cannot be determined (verify)
  • imports site, which roughly does four things:
    • add site packages (via site.addsitedir)
      • combines the compiled-in values of sys.prefix and sys.exec_prefix (often both /usr/local) with a few suffixes, roughly ending up at lib/pythonversion/site-packages (verify) (except on debian/ubuntu, see note below; there may be other distros that customise this)
      • setting PYTHONHOME overrides that compiled-in value with the given value.[7]
      • fully isolated (see virtualenv) or embedded[8] pythons want to do this. Most other people do not, in part because it won't automatically carry over to subprocess calls
    • add user site packages (via site.addsitedir) (see also PEP370, since roughly py2.6(verify), introduced to allow installation into homedirs without requiring virtualenv)
      • similar, but looking within ~/.local (platform-specific, actually), so ending up with something like ~/.local/lib/pythonversion/site-packages
      • which is added only if it exists and has an appropriate owner.
      • that user base (~/.local) can be overridden with PYTHONUSERBASE
    • import sitecustomize
    • import usercustomize
      • these two are mainly meant for (site-wide or user-specific) development tools like profiling, coverage, and such
Intended for private libraries, when you have reason to not install them into a specific python installation (...or you can't)
avoid using this to switch between python installations - that is typically better achieved by calling the specific python executable
avoid using this to import specific modules from other python installations - just installing it into both is likely to be cleaner
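You can inspect where these steps would put things via the site module (a sketch; exact paths vary per platform and per python):

```python
import site

# The per-user base (what PYTHONUSERBASE overrides) ...
print(site.getuserbase())
# ... and the user site-packages directory under it.
print(site.getusersitepackages())
# Whether user site-packages handling is enabled for this interpreter.
print(site.ENABLE_USER_SITE)
```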


  • site.addsitedir() is sys.path.append() plus looking for and handling *.pth files
  • .pth files are intended to include further paths (allows namespace-ish things).
there is also a hack that allows code (apparently any line starting with import), apparently intended for slightly more intelligent search path alteration where necessary. There is a move to design this away (once all the useful things it is used for (package version selection, virtual environment chaining, and some others) exist properly)
  • site's base paths vary between *nix, win, osx

  • doing homedir installs will in practice often be done via pip --user (verify)
  • If you customize some of this, you need to think harder about how subprocess calls will or won't get the same alterations.
  • debian/ubuntu [9] say that python packages installed via apt go into dist-packages, not site-packages (dist-packages is baked into the system python's site setup; things like pip will pick it up via that)
this is intended to lessen conflicts between system python and non-system-pythons (but also potentially confusing)

  • pip installs into
    • if a virtualenv is enabled: your virtualenv's site-packages
    • if using --user, your user site-packages
    • otherwise (i.e. default): the system site-packages
if run without sudo, it should fail and suggest --user

See also:

eggs, zips, eggdirs, wheels

any of this and docker

Creating packages

See also

pip notes

Editable installs