Python usage notes - importing, modules, packages

Various things have their own pages; see Category:Python for pages that collect various practical notes.


Import related notes

Import fallbacks

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

You've probably seen the fallback import trick, such as:

try:
    import cStringIO as StringIO   # faster C implementation, if present
except ImportError:
    import StringIO                # pure-python fallback

or

try:
    set
except NameError:
    from sets import Set as set

or

try:
    import cElementTree as ElementTree
except ImportError:
    import ElementTree

For ElementTree you may want something fancier; see Elementtree#Importing
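The same pattern still applies whenever there is an optional faster implementation; a sketch (ujson is just an example of an optional third-party package - the fallback branch runs when it isn't installed):

```python
try:
    import ujson as json   # optional faster drop-in, may not be installed
except ImportError:
    import json            # stdlib fallback, always available

assert json.loads('[1, 2]') == [1, 2]
```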


Reference to current module

There are a few ways of getting a reference to the current module (which is rarely actually necessary, and note that if you need only the names of the members, you can use dir() without arguments).


The generally preferred way is to evaluate sys.modules[__name__], because this needs no knowledge of where you put that code, and can be copy-pasted directly. (The variable __name__ is defined in each module and package; it will be '__main__' if the python file itself is run as a script, or if you are running python interactively.)


Another way is to import the current module by its own name, which actually just binds the by-then-already-loaded module to a name that happens to be in its own scope (this also works for __main__).

There are a few details to this, including:

  • you shouldn't do this at module-global scope(verify), since the module won't be fully loaded at that point
  • it will work for packages, by name as well as by __init__, but there is a difference between those two (possible confusion you may want to avoid): the former will only be a bind, while the latter is a new name so may cause a load, which might pick up the pyc file that Python created - so while it should be the same code, it may not be id()-identical (...in case that matters to your use)
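A runnable sketch of the self-import approach; the module name selfref and the tempfile scaffolding are only there to make the demo self-contained:

```python
import os, sys, tempfile

# Write a throwaway module that imports itself by name inside a function.
d = tempfile.mkdtemp()
with open(os.path.join(d, 'selfref.py'), 'w') as f:
    f.write(
        "def me():\n"
        "    import selfref   # by now selfref is in sys.modules; this only binds it\n"
        "    return selfref\n")

sys.path.insert(0, d)
import selfref
assert selfref.me() is selfref   # same module object, no second load
```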

Importing and binding, runtime-wise

In general, importing may include:

  • explicit module imports: you typing
    import something
    in your code
  • implicit module imports: anything imported by modules, and package-specific details (see */__all__)
  • binding the module, or some part of it, as a local name


Module imports are recorded in sys.modules, which allows Python to import everything only once.

All later imports fetch the reference to the module from that cache and only bind it in the importing scope.
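A quick demonstration of that caching:

```python
import sys

import json                     # first import: loads the module and records it
assert 'json' in sys.modules    # ...in the module cache

import json as j                # later import: no reload, just another binding
assert j is sys.modules['json'] # same object fetched from the cache
assert j is json
```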



Binding specific names from a module

You can also specify that you want to bind a few names from within a module. Say you are interested in the function comma() from format.lists (package format, module lists). You can do:

import format.lists
# binds 'format', so usable like:
format.lists.comma()
 
 
from format import lists
# binds lists (and not format), so:
lists.comma()
 
 
from format import lists as L
# locally binds lists as L (binding neither the name format nor lists), so:
L.comma()
 
 
import format.lists as L
# same as the last
L.comma()
 
 
from format.lists import *
# binds all public names from lists, so:
comma()
 
 
from format.lists import comma
# binds only a specific member
comma()
 
 
from format.lists import comma as C
# like the last, but binds to an alias you give it
C()


None of this changes the importing itself; it only differs in what names get bound, and is all just personal taste. (e.g. I like to avoid from and as, forcing my own code to mention exactly where it gets its functions)

Packages

For context, modules are almost any python file (where its filesystem name is not a syntax error in python code).

It's good practice to be clean in the files you'll import from, but there is no special status.


Packages are a little extra structure on top, an optional way to organize modules.

While modules correspond to files, packages correspond to directories containing modules.


A package is a directory with an __init__.py file. In the real world, many are empty or contain only an informative docstring, because the most basic use of packages is just namespacing your modules in a useful way.

Beyond grouping things under a common name, it allows things like

  • running code when the package is first imported, usually for some initial setup
...put it in the __init__.py file
  • allowing selective import of the package's modules (forcing people to ask for some things explicitly)
in part just syntax and convenience, like from image import fourier


importing from packages
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

The examples below assume a package named format with a module in it called lists, which you can quickly reproduce with:

mkdir format
echo 'print "format, __init__"'  > format/__init__.py
echo 'print "format, lists"'     > format/lists.py


In an import, everything up to the last dot has to be a package/subpackage, and the last part a module (or subpackage).

The package itself can also be imported, because an __init__.py file is a module that gets imported when you import the package (or something from it), aliased as the directory name. With the test modules from the last section:

>>> import format.lists
format, __init__
format, lists

The import above bound 'format' at local scope, within which a member 'lists' was also bound:

>>> format
<module 'format' from 'format/__init__.py'>
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__', 'lists']
>>> format.lists
<module 'format.lists' from 'format/lists.py'>


Modules in packages are not imported unless you (or the package's __init__ module) explicitly do so, so:

>>> import format
format, __init__
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__']

...which did not import lists.


Note that when you create subpackages, inter-package references are resolved first from the context of the importing package's directory, and if that fails from the top package.

importing *, and __all__

Using import * is a special case.

(Also generally discouraged for namespace-pollution reasons)


For modules:

  • if there is no __all__ member, all the module-global names that do not start with an underscore are bound
  • if there is an __all__ member, only the names in that list are bound

So __all__ is useful when a programmer likes to minimize namespace cluttering from their own modules.
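A minimal sketch of that (the module name demo_all and its members are invented for the demo):

```python
import os, sys, tempfile

# Write a throwaway module that exports only 'public' via __all__.
d = tempfile.mkdtemp()
with open(os.path.join(d, 'demo_all.py'), 'w') as f:
    f.write("__all__ = ['public']\n"
            "def public(): return 'yes'\n"
            "def also_public_looking(): return 'no'\n")

sys.path.insert(0, d)
from demo_all import *   # binds only the names listed in __all__

assert public() == 'yes'
assert 'also_public_looking' not in globals()   # excluded despite no underscore
```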


For packages, * is different; if __all__ were not present, python could only determine members based on filenames, but this would be unreliable on platforms such as Windows (as Windows deals with capitals on the filesystem somewhat creatively).

As consistent behaviour was preferred, for packages the process of importing only looks at __all__ in the package. If this is not present, there is no implicit binding.

For the example above, no __all__ means from format import * will only import format. When you add something like __all__= ['lists'] to the __init__.py, it will import the package as well as the modules listed there, and bind those modules in the package object.
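The same, sketched with a throwaway package (pkgdemo is an invented stand-in for the format example, to avoid clobbering anything real):

```python
import os, sys, tempfile

# Build a package whose __init__.py lists one module in __all__.
d = tempfile.mkdtemp()
pkg = os.path.join(d, 'pkgdemo')
os.mkdir(pkg)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write("__all__ = ['lists']\n")
with open(os.path.join(pkg, 'lists.py'), 'w') as f:
    f.write("def comma(seq): return ', '.join(seq)\n")

sys.path.insert(0, d)
from pkgdemo import *   # __all__ makes this import and bind the 'lists' module

assert lists.comma(['a', 'b']) == 'a, b'
```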


Note that some people prefer to not use import * at all, since it clutters one's namespaces and makes name collisions more likely than if you had to bind those names explicitly.

If you have helper functions, it may be well received to place those in a separate module, so that import * on such a module will only bind a set of well-named helper functions.


Note that the import keyword is mostly a wrapper around existing functions (such as __import__()), which in a few cases you may want to use directly.
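For example, __import__ and importlib.import_module (the latter is usually the one to reach for):

```python
import importlib

# 'import os.path' is roughly __import__('os.path') plus binding the top name.
top = __import__('os.path')               # returns the top-level package, 'os'
sub = importlib.import_module('os.path')  # returns the submodule itself

assert top.__name__ == 'os'
assert sub is top.path                    # same cached module object
```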


Freezing

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(note: this is unrelated to package managers freezing a package, which is basically just listing packages and their versions, usually to duplicate elsewhere)


Freezing means your code is wrapped in an independently executable way, usually including a copy of a python interpreter and all external modules (it's analogous to static linking), plus some duct tape to make it work (and be independent from whatever pythons you have installed).


It often creates a relatively large directory, and doesn't really let you alter it later.

The main reason to do this is to have a self-contained copy that should run anywhere (in particular, it does not rely on an installed version of python) so is nice for packaging a production version of your desktop app.



To do this yourself, you can read things like https://docs.python.org/2/faq/windows.html#how-can-i-embed-python-into-a-windows-application


It's easier to use other people's tools.

Options I've tried:

  • cx_freeze
lin, win, osx
http://cx-freeze.sourceforge.net/
  • PyInstaller
lin, win, osx
can pack into single file
http://pyinstaller.python-hosting.com/
See also http://bytes.com/forum/thread579554.html for some get-started introduction

Untried:

  • py2exe [2] (a distutils extension)
windows (only)
can pack into single file
inactive project now?
  • Python's freeze.py (*nix) (I don't seem to have it, though)
  • Gordon McMillan's Installer (discontinued, developed on into PyInstaller)


See also:


TODO: read:




Installation in user environments, and in dev environments

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

tl;dr

  • for system installs
you may want to prefer your distro's package manager - though it's generally not as up to date. It should mix decently with pip (verify)
pip is generally easiest
for things available only as egg (not as wheel, and not on PyPI), you'll still need easy_install
  • for distinct stacks (dev, fragile apps)
consider pipenv, conda, and similar
sometimes doing virtualenv yourself is simpler/cleaner
  • creating and uploading packages
look at distribute (basically a nicer setuptools)



Packaging was initially a bit minimal, and pasted on - and initially just installation, which is cool and useful and all.

...yet in development (and rolling-delivery production), you want updateability, clean dependency management, and more.

This led to people making a few too many alternatives, leading to some confusion.



Some attempt at a historical summary:


We had

  • distutils
standard library (2000ish)
  • setuptools (2004)
introduced eggs, easy_install
  • distribute (2008)
fork of setuptools, so also provides setuptools
If installing a thing involves running setup.py (typically the instruction is to run python setup.py install), that's this.



PyPI is the python package index.

It's been the central store since around 2003

Submitting to PyPI

  • generally, setuptools(/distribute) is still useful

Installing things from PyPI

  • generally use pip, or pipenv if you prefer

(Not to be confused with PyPA (the Python Packaging Authority), created in 2011 to simplify the mess made up to that point)


Initially PyPI was a repository of links to zips elsewhere, which you would have to manually download, unpack, and either setup.py install (distutil stuff), or sometimes just copy the contents to site-packages.

Then came

  • easy_install (from setuptools, so ~2004)
  • easy_install (from distribute, so ~2008)
Name searches go to PyPI, you can also install downloaded egg.

Note that from the PyPI side, the executable nature of setup.py was awkward to build package management (which is mostly metadata in the end) on top of. (verify)


  • pip (2008)
easy_install replacement
can uninstall
downsides:
cannot install eggs (still can't, but can install wheel since shortly after wheel's introduction)
doesn't isolate different versions (verify)



The wheel format was introduced (2013) as a replacement for the egg format. (TODO: figure out details) [4]


Also relevant is virtualenv (2007), because it allowed easy_install and pip to install into a separated environment.

Which is great for dev and (rolling-update) production, because it's the only way to get any reproducibility.

For users, if you can get your system install to accept all the libraries you want, it's arguably easier to avoid virtualenv, because it's not quite as friendly as its developer presents it.


pipenv, while quite recent (2017?), is (roughly) a cleaner, more integrated variant of pip + virtualenv + some of its support.



But devs want more separated software stacks. Initially people did this with a pip feature:

pip freeze > requirements.txt

on one side and on another host, and typically within a virtualenv, do:

pip install -r requirements.txt

Sensibly, people wanted this more practical and automated. So there are various things that help you integrate virtualenv - often with some package-dependency-metadata handling and tooling around that. In these cases you want separated software stacks, which makes things more interesting yet, because ideally you want to install things into your specific project, not your system.



Buildout seems a little more focused on web dev; there are some others aimed at a more applied audience, mostly academia (allowing e.g. non-python things like compilers, and binaries like matlab), including:

  • buildout (2006) was initially designed for more repeatable installs
  • hashdist (2014?)
  • conda (2014?)
packaging core of anaconda, miniconda
separate from virtualenv, pip, etc though offering similar features
only binaries (doesn't build things itself)
not python-only



There are a bunch more footnotes to this, like details of helper libraries and underlying tools, but most of that you don't have to worry about, or only when putting the final polish on creating/uploading packages.



See also:



More notes on

You may care about isolating shell environments


where import looks

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


import looks in all of sys.path, which is constructed as:

  • sys.path[0] is the directory that contains the thing that is invoking python (the python executable if run directly, more usually a script that invokes it) [5]
...or '' (the empty string) if python is invoked interactively, or if the current directory cannot be determined(verify)


  • entries via the implicit import of site [6] (which is imported around interpreter startup)
does a few different things, including
combining the compiled-in values of sys.prefix and sys.exec_prefix (often both /usr/local) with a few suffixes, roughly lib/pythonversion
setting PYTHONHOME overrides that compiled-in value with the given value.[7]
fully isolated (see virtualenv) or embedded[8] pythons may want to do this. Most other people do not.
...in part because of what it doesn't mean for subprocesses calls
use of virtualenv overrides these


  • entries from PYTHONPATH
https://docs.python.org/release/3.2.3/using/cmdline.html#envvar-PYTHONPATH
Intended for private libraries, when you have reason to not install them into a specific python installation (...or you can't)
avoid using this to
switch between python installations - that is typically better achieved by calling the specific python executable
import specific modules from another python installation - just install it into both


  • entries you add in code (before the relevant import), be it from
    • simple sys.path.append() (docs: "A program is free to modify this list for its own purposes."[9])
    • site.addsitedir() (see notes on site below)
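A sketch of the sys.path.append() case (extra_mod is an invented module name, written to a temp dir just for the demo):

```python
import os, sys, tempfile

# Make a directory importable at runtime by appending it to sys.path.
d = tempfile.mkdtemp()
with open(os.path.join(d, 'extra_mod.py'), 'w') as f:
    f.write("VALUE = 42\n")

sys.path.append(d)        # must happen before the relevant import
import extra_mod
assert extra_mod.VALUE == 42
```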


On site

The site module is handy and flexible for packaged stuff.

site.addsitedir() mostly just looks for and processes .pth files (which site does for site-packages/site-python dirs when it is imported), and is otherwise pretty equivalent to adding to sys.path.


https://pymotw.com/2/site/

site's behaviour includes:

  • .pth files (from site-packages only)
see https://docs.python.org/2/library/site.html
  • using site.addsitedir()
like the last, but considers .pth files where applicable, and sitecustomize
  • sitecustomize.py, which site looks for (in each sys.path entry) and imports
was intended for platform-specific things, development tools
  • usercustomize.py
like sitecustomize, but in a user's dir, and loaded after
you'd do something like
site.addsitedir( os.path.expanduser(os.path.join('~', 'mypy')) )
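A runnable sketch of addsitedir() with a .pth file (both directories are temp dirs created just for the demo):

```python
import os, site, sys, tempfile

# addsitedir() adds a directory to sys.path and processes .pth files inside it.
d = tempfile.mkdtemp()
extra = tempfile.mkdtemp()
with open(os.path.join(d, 'extra.pth'), 'w') as f:
    f.write(extra + '\n')      # a .pth file lists one path per line

site.addsitedir(d)
assert d in sys.path
assert extra in sys.path       # added via the .pth file
```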


Other considerations

You often need to choose whether to isolate yourself from the system python, or not.


If you customize some of this, you need to think harder about subprocess calls to your own python scripts.

os.environ['PYTHONPATH'] = ':'.join(sys.path)
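A fuller sketch of that idea (using os.pathsep for portability, and dropping empty entries):

```python
import os, subprocess, sys

# Hand the parent's effective sys.path to a child python via PYTHONPATH.
env = dict(os.environ)
env['PYTHONPATH'] = os.pathsep.join(p for p in sys.path if p)

out = subprocess.check_output(
    [sys.executable, '-c', 'import os; print(os.environ["PYTHONPATH"])'],
    env=env)
assert out.decode().strip() == env['PYTHONPATH']
```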


This technically means the sys.path behaviour varies between:

  • invoked with -m
  • invoked with -c
  • invoked interactively
  • invoked as script


https://stackoverflow.com/questions/15208615/using-pth-files

https://docs.python.org/3/library/site.html

https://docs.python.org/release/3.2.3/using/cmdline.html#envvar-PYTHONHOME

https://leemendelowitz.github.io/blog/how-does-python-find-packages.html

eggs, zips, eggdirs, wheels

any of this and docker

Creating packages

See also