Python usage notes - import related stuff
Importing and binding, runtime-wise
When you do:
from module import item as name
then, binding-wise, it amounts to:
import module as _tempname
name = _tempname.item
del _tempname
As far as the mechanics of that first line go:
- see if the name is already in sys.modules. If so, return that
- If not...
- have the import system's finders (see sys.meta_path) find it
- most modules come from disk,
- but some come from built-ins, and you can also do things like zip files, network, and more
- create a module object (a ModuleType), annotated with e.g. where it came from
- run the module's code, which puts whatever it defines into that module object's namespace (its globals)
- adds the result to sys.modules (a dict from name to these things)
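The steps above can be approximated by hand with importlib. This is a minimal sketch assuming Python 3.5 or later, not the real import machinery (which also handles packages, locking, and more). import_by_hand is a hypothetical helper name, and colorsys is just a small standard-library module picked for illustration. One detail: following the recipe in the importlib documentation, the module is registered in sys.modules just before its code runs, rather than after.

```python
import importlib.util
import sys


def import_by_hand(name):
    """Roughly what 'import name' does for a top-level module (simplified)."""
    # 1. Already imported? Then hand back the cached module object.
    if name in sys.modules:
        return sys.modules[name]
    # 2. Ask the finders on sys.meta_path where this module lives.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named %r" % name)
    # 3. Create an empty module object, annotated with where it came from.
    module = importlib.util.module_from_spec(spec)
    # 4. Register it, then run the module's code inside its namespace.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # A failed import should not leave a half-initialized module behind.
        del sys.modules[name]
        raise
    return module


colorsys = import_by_hand("colorsys")
again = import_by_hand("colorsys")   # second call is served from sys.modules
print(colorsys is again)
```

The caller still has to bind the returned module to a name, which is the part that a plain import statement does for you.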
In general, importing may include:
- explicit module imports: you typing import something in your code
- implicit module imports:
- site-specific details
- anything imported by modules you import
- and package-specific details (see */__all__)
- binding the module, or some part of it, as a local name
Module imports are recorded in sys.modules, which allows Python to import everything only once.
All later imports fetch the reference to the module from that cache and only bind it in the importing scope.
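A small experiment to see that caching at work, as a sketch assuming Python 3. once_demo is a hypothetical module written to a temporary directory. Its only content creates a fresh object each time the file is run, so comparing that object tells you whether the code ran again.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module 'once_demo', written to a temporary directory.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "once_demo.py"), "w") as f:
    f.write("TOKEN = object()   # a fresh object every time this file is run\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import once_demo               # finds, runs and caches the module
first_token = once_demo.TOKEN

import once_demo as again      # cache hit: only binds another name
cached = again is once_demo and again.TOKEN is first_token

importlib.reload(once_demo)    # explicitly asks for the code to be run again
rerun = once_demo.TOKEN is not first_token

print(cached, rerun)
```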
Stuff you can do as a result
Specifying import fallbacks
Because importing amounts to just a 'find file, run file, bind result', if that breaks off due to an exception, that mostly leaves things unaltered.
So you can do fallback import tricks like:
import StringIO
try:
    import cStringIO as StringIO
except ImportError:
    pass
Only if that try succeeds will StringIO be re-bound.
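The same pattern in Python 3 terms, as a sketch: StringIO and cStringIO no longer exist as modules there, so this assumes you would prefer the third-party simplejson when it happens to be installed and the standard library's json otherwise. The name json_impl is only for illustration.

```python
import json as json_impl              # always-available default

try:
    import simplejson as json_impl    # third-party; rebinds only if the import succeeds
except ImportError:
    pass                              # json_impl still refers to the standard library module

encoded = json_impl.dumps({"a": 1})
decoded = json_impl.loads(encoded)
print(json_impl.__name__, decoded)
```

Either way the rest of the code only refers to json_impl, which works as long as both modules offer the functions you use.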
A reference to the module you're in
Note that for some of the things you may want this for, you do not actually need it.
- Say, if you need only the names of the members, you can use dir() without arguments
If you do know you really need it, there are a few ways of getting that reference.
- Consider sys.modules[__name__], preferable in the sense that this doesn't need anything context-specific.
- The variable __name__ is defined in each module and package (it will be '__main__' if the Python file itself is run as a script, or if you are running Python interactively).
- Another way is to import the current module by its own name, which (because by then the module is already loaded) has the net effect of just returning that reference (and binding it to a name).
- There are a few details to this, including:
- you shouldn't do this before the module is done loading - primarily, don't do this at module-global scope(verify)
- may have specific edge cases when importing names from inside other things. Say...
- will work for packages, by the package's name as well as by __init__, but there is a difference between those two (possible confusion you may want to avoid): the former will only be a bind, while the latter is a new name, so may cause a load, which might pick the pyc file that Python created. So while it should be the same code, it may not be id()-identical (in case that matters to your use)
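A sketch of the sys.modules[__name__] approach, assuming Python 3. selfref_demo is a hypothetical module written to a temporary directory, so that the lookup happens inside a real imported module rather than in the script itself.

```python
import importlib
import os
import sys
import tempfile

# Source of the hypothetical module 'selfref_demo'.
source = '''
import sys

this_module = sys.modules[__name__]   # reference to the module this code lives in
ANSWER = 42

def lookup(name):
    # Look a name up on this module itself, at call time.
    return getattr(this_module, name)
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "selfref_demo.py"), "w") as f:
    f.write(source)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import selfref_demo

print(selfref_demo.this_module is selfref_demo)
print(selfref_demo.lookup("ANSWER"))
```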
Binding specific names from a module
You can import a module, which gets you a reference to the module object as a whole.
Optionally, you can take a few things from within that module, and bind it in another scope.
This is mostly personal taste - it does not change what gets evaluated during import.
As mentioned a section or two above,
from module import item as name
mostly just means
import module as _atempname
name = _atempname.item
del _atempname
The import is the same (including the 'oh just take it from sys.modules if it was already loaded' behavior), and instead of binding the module name, we bind only something from inside it.
There is nothing special about it, except maybe you just don't care to have some extra names littering the namespace you do this in.
You have more alternatives, that
in terms of importing and using, are practically the same,
but differ primarily in what names you get.
Say you are interested in the function comma() from format.lists (package format, module lists).
You can do:
import format as fm      # people seem to like this for things like numpy, because it saves typing for something you reference a lot
# binds 'fm', so usable like:
fm.lists.comma()         # works only once format.lists has been imported, e.g. by the package's __init__.py
import format.lists
# binds 'format', so usable like:
format.lists.comma()
from format import lists
# binds lists (and not format), so:
lists.comma()
from format import lists as L
# locally binds lists as L (and neither format nor lists), so:
L.comma()
import format.lists as L
# same as the last
from format.lists import *
# binds all public names from lists, so:
comma()                  # and who knows what else is now in your scope
from format.lists import comma
# binds only a specific member
comma()
from format.lists import comma as C
# like the last, but binds to an alias you give it
C()
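To check that these variants differ only in the names they bind, you can compare the resulting objects. The sketch below uses the standard library's json package and its json.decoder module as a stand-in, since the format.lists package from the text is hypothetical.

```python
import sys

import json.decoder                          # binds 'json'
from json import decoder                     # binds 'decoder'
from json import decoder as dec              # binds 'dec'
import json.decoder as dec2                  # same effect as the previous line
from json.decoder import JSONDecoder         # binds one member
from json.decoder import JSONDecoder as JD   # the same member under an alias

# All the module names refer to the single cached module object.
same_module = json.decoder is decoder is dec is dec2 is sys.modules["json.decoder"]
# Both member names refer to the same class object.
same_class = json.decoder.JSONDecoder is JSONDecoder is JD

print(same_module, same_class)
```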
For context, a module is a python file that can be imported, and any python file can be a module.
The requirements amount to
- the filename (minus its extension) must be a valid Python identifier, so that it is not a syntax error when mentioned in code
- needs a .py extension (you can alter the import system to get around it, but the default behaviour mostly looks for .py)
Packages are a little extra structure on top of modules, an optional way to organize modules.
A package is a directory with
- an __init__.py file
- typically, some module files that you are organizing into that package
A package doesn't have special status, except in how the import system handles further importing and name resolution.
What should be in __init__.py
If you are only using packages to collect modules in a namespacing-tree sort of way, you can have __init__.py be entirely empty
If you like to run some code when the package is first imported (usually for some initial setup), that can go in __init__.py
- however, as this amounts to module state, this can lead to a shared-global situation that many libraries may wish to intentionally avoid
You could put all code in __init__.py, but that is just an awkward way to make what amounts to a module
- ...with a weird internal name.
If you want to use a package as an API to more complex code, and/or selective loading of package contents, read on
- If you leave __init__.py empty, that means users have to explicitly import all modules inside the package.
- upsides
- very selective loading, which is good for users if loading of some modules is heavy and optional
- downsides
- lots of typing for users
- users have to know module names already, as the package won't tell them
- __init__.py itself mostly imports all modules' contents into the package scope
- upsides
- users don't have to import individual parts before they can use them
- downsides
- doesn't allow user to load only a small part; this changes naming/namespacing but little else
- __init__.py imports selected parts from submodules
- upsides
- Lets that package provide an interface, to modules that the user doesn't need to know about
- for the same reason, makes it easier to reorganize without changing that interface
- downsides
- can be messier, in that it's less
Note that __all__ is, instead, primarily about binding, not importing(verify)
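As an illustration of the last option (an __init__.py that imports selected parts from submodules), here is a sketch assuming Python 3. The package name shapes_demo, its private module _geometry and the function area are all hypothetical, and the package is written to a temporary directory only so that the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "shapes_demo")
os.mkdir(pkg_dir)

# The implementation lives in a module users don't need to know about...
with open(os.path.join(pkg_dir, "_geometry.py"), "w") as f:
    f.write("def area(width, height):\n    return width * height\n")

# ...and __init__.py re-exports the part that forms the package's interface.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from ._geometry import area\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import shapes_demo

result = shapes_demo.area(3, 4)   # no mention of _geometry needed
print(result)
```

If _geometry is later split up or renamed, only the import line in __init__.py has to change, and user code keeps calling shapes_demo.area().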
Relative imports
importing *, and __all__
Using from something import * asks Python to import all names from a module or package,
and bind them in the scope you said that in.
Within user code, importing * may be bad style,
mostly because you cannot easily know what names you are importing into your scope (they are not mentioned in your code, and may change externally),
so in the long run you won't know what name collisions that may lead to.
This problem is not actually common, but that itself is why it can lead to very confusing, well-hidden bugs once it happens (particularly as same-named things will often do similar things).
It should be much less of an issue for packages to import * from their own modules, because it should be well known to the package's programmer what names that implies.
- Side note: the package's code then doesn't tell you what names will be there at runtime. This can be annoying when users are trying to e.g. find a particular function's code.
import * around modules
- if there is no __all__ member in a module, all the module-global names that do not start with an underscore (_) are bound
- which means that basic modules can use underscore-prefixed variables to have global-like things that won't pollute others
- if there is an __all__ member in a module, it will bind those things
- lets a programmer minimize namespace cluttering from their own modules. If people want to throw a lot of names onto a pile, they have to work for it
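Both module cases can be checked with a small experiment. This is a sketch assuming Python 3. The two modules (star_plain_demo and star_all_demo) and the helper star_names are hypothetical, and exec() is used only because import * is not allowed inside a function body.

```python
import importlib
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
sources = {
    # No __all__: every name without a leading underscore gets bound.
    "star_plain_demo.py": "visible = 1\n_hidden = 2\n",
    # With __all__: only the listed names get bound.
    "star_all_demo.py": "__all__ = ['chosen']\nchosen = 1\nnot_listed = 2\n",
}
for filename, source in sources.items():
    with open(os.path.join(tmpdir, filename), "w") as f:
        f.write(source)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()


def star_names(module_name):
    """Return the names that 'from module_name import *' binds in an empty scope."""
    scope = {}
    exec("from %s import *" % module_name, scope)
    # exec() itself adds '__builtins__' to the scope; leave that out.
    return sorted(name for name in scope if name != "__builtins__")


plain_names = star_names("star_plain_demo")
all_names = star_names("star_all_demo")
print(plain_names, all_names)
```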
import * around packages
- if there is no __all__ member, it only picks up things already bound in this package scope (except underscore things)
- if there is an __all__ member, it seems to go through that list and
- if that name is not an attribute bound in the package, try to import it as a module.
- if that name is an attribute already bound in the package, don't do anything extra
- binds only names in that list
You can mix and match, if you want to confuse yourself.
For example, consider a package whose __init__.py looks like:
__all__ = ['sub1', 'myvar']
__version__ = '0.1'
import sys
myvar = 'foo'
othervar = 'bar'
import sub2    # Python 2 style; Python 3 would need: from . import sub2
This will:
- always import sub2 (it's global code)
- import sub1 only when you do an import * from this package, not when you import the package itself
- bind sub1 and myvar, but not othervar or sub2 or sys
- if __all__ includes a name that is neither bound nor importable, it's an AttributeError
This ties into the "are you providing distinct things" discussion above
in that you probably don't want explicit imports then.
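The package example above can be tried out as follows. This is a sketch assuming Python 3, which is why the __init__.py here uses the explicit relative form from . import sub2. The package name allpkg_demo is hypothetical, and sub1 and sub2 are empty modules.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "allpkg_demo")
os.mkdir(pkg_dir)
init_source = (
    "__all__ = ['sub1', 'myvar']\n"
    "__version__ = '0.1'\n"
    "import sys\n"
    "myvar = 'foo'\n"
    "othervar = 'bar'\n"
    "from . import sub2\n"
)
for filename, source in [("__init__.py", init_source), ("sub1.py", ""), ("sub2.py", "")]:
    with open(os.path.join(pkg_dir, filename), "w") as f:
        f.write(source)
sys.path.insert(0, root)
importlib.invalidate_caches()

import allpkg_demo
sub2_loaded_by_init = "allpkg_demo.sub2" in sys.modules       # global code in __init__
sub1_loaded_before_star = "allpkg_demo.sub1" in sys.modules   # nobody asked for it yet

# import * is only allowed at module level, so run it in a scratch namespace.
scope = {}
exec("from allpkg_demo import *", scope)
bound_names = sorted(name for name in scope if name != "__builtins__")
sub1_loaded_after_star = "allpkg_demo.sub1" in sys.modules

print(sub2_loaded_by_init, sub1_loaded_before_star, bound_names, sub1_loaded_after_star)
```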
importing from packages
The examples below assume a package named format with a module in it called lists.
To experiment yourself to see when things happen, try:
mkdir format
echo 'print("format, __init__")' > format/__init__.py
echo 'print("format, lists")' > format/lists.py
In an import, everything up to the last dot has to be a package/subpackage, and the last part must be a module.
The package itself can also be imported, because its __init__.py file is a module that gets imported when you import the package (or something from it), and is bound under the directory's name. With the test modules from above:
>>> import format.lists
format, __init__
format, lists
The import above bound 'format' at local scope, within which a member 'lists' was also bound:
>>> format
<module 'format' from 'format/'>
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__', 'lists']
>>> format.lists
<module 'format.lists' from 'format/'>
Modules in packages are not imported unless you (or its __init__ module) explicitly do so, so:
>>> import format
format, __init__
>>> dir(format)
['__builtins__', '__doc__', '__file__', '__name__', '__path__']
...which did not import lists.
Note that when you create subpackages, inter-package references are resolved first from the context of the importing package's directory, and if that fails from the top package.
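The experiment from the interpreter session above can also be scripted. This sketch assumes Python 3 and uses a hypothetical package named format_demo (rather than format, to avoid shadowing the built-in function of that name), with a lists module containing a comma() function made up for this example.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "format_demo")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")   # an empty __init__.py: the package imports nothing by itself
with open(os.path.join(pkg_dir, "lists.py"), "w") as f:
    f.write("def comma(items):\n    return ', '.join(items)\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

import format_demo
has_lists_before = hasattr(format_demo, "lists")   # submodule not imported yet

import format_demo.lists
has_lists_after = hasattr(format_demo, "lists")    # now bound as a member of the package

joined = format_demo.lists.comma(["a", "b"])
print(has_lists_before, has_lists_after, joined)
```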
Where import looks
import looks in all of sys.path.
Which is only a partial answer, because
- sys.path is altered during interpreter startup, by a few different things, before your code gets control.
- And your code might add some more.
- e.g. via sys.path.append() (docs: "A program is free to modify this list for its own purposes."[1])
- ...but this is too late to do some things cleanly - and frankly you rarely get exact control of the order in which all that you import gets imported
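A sketch of altering sys.path from your own code, assuming Python 3. late_path_demo is a hypothetical module placed in a temporary directory that is not on sys.path at first.

```python
import importlib
import os
import sys
import tempfile

extra_dir = tempfile.mkdtemp()
with open(os.path.join(extra_dir, "late_path_demo.py"), "w") as f:
    f.write("VALUE = 'found'\n")

# Before the directory is on sys.path, the import fails.
try:
    import late_path_demo
    found_before = True
except ImportError:
    found_before = False

# After appending it, the same import statement succeeds.
sys.path.append(extra_dir)
importlib.invalidate_caches()
import late_path_demo

found_in_extra_dir = os.path.samefile(
    os.path.dirname(late_path_demo.__file__), extra_dir
)
print(found_before, found_in_extra_dir, late_path_demo.VALUE)
```

Because the entry is appended, anything with the same name earlier on sys.path would still win. sys.path.insert(0, ...) changes that, at the cost of possibly shadowing other modules.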
The parts before you get control are mostly:
- it tries to put the directory of the script you invoked at the front of sys.path (i.e. sys.path[0]) so that it takes precedence
- otherwise left as '' which signifies the current directory(verify)
- both meaning you can put supporting modules alongside your script
- it adds entries from PYTHONPATH
- this seems to come after the above, before anything else (verify)
- Intended for private libraries, when you have reason to not install them into a specific python installation (...or you can't), or perhaps to override with a specific version(verify)
- avoid using this to switch between python installations - that is typically better achieved by calling the specific python executable (to get its configured overall path config)
- avoid using this to import specific modules from other python installations (just install it into both is likely to be cleaner and lead to less confusion)
- it does an import site[2], which (unless -s or -S blocks things) leads to roughly four things:
- add site packages (via site.addsitedir)
- combines the compiled-in values of sys.prefix and sys.exec_prefix (often both /usr/local) with a few suffixes, roughly ending up on lib/pythonversion/site-packages (verify) (except on debian/ubuntu, see note below; other distros may customise this as well)
- setting PYTHONHOME overrides that compiled-in value with the given value.[3]
- fully isolated (see virtualenv) or embedded[4] pythons want to do this. Most other people do not, in part because of what it does and does not mean for subprocess calls
- add user site packages (via site.addsitedir) (see also PEP 370; introduced around py2.6(verify) to allow installation into homedirs without requiring virtualenv)
- similar, but looking within ~/.local (platform-specific, actually) so ending up with something like ~/.local/lib/pythonversion/site-packages
- which is added only if it exists and has an appropriate owner.
- that user base (~/.local) can be overridden with PYTHONUSERBASE
- import sitecustomize
- e.g. for site-wide development tools like profiling, coverage, and such
- note that 'site' is still a specific python installation
- import usercustomize (after sitecustomize)
- e.g. for user-specific development tools like profiling, coverage, and such
- can be disabled (e.g. for security), e.g. -s parameter avoids adding the user site directory to sys.path (and also sets site.ENABLE_USER_SITE to False)
- If you customize some of this, you need to think hard about how scripts run via subprocess won't or will get the same alterations.
- pip installs into
- if a virtualenv is enabled: your virtualenv's site-packages
- if using --user, your user site-packages
- otherwise (i.e. default): the system site-packages
- if run without sudo, it should fail and suggest --user
- site.addsitedir() amounts to
- sys.path.append() plus
- looking for and handling *.pth files
- .pth files are intended to include further paths (allows namespace-ish things).
- lines that start with 'import' are instead executed as code, apparently intended for slightly more intelligent search path alteration where necessary
- ...but since it allows semicolons you can abuse this for varied inline code
- There is a move to design this away - once all the useful things it is used for (package version selection, virtual environment chaining, and some others) exist properly
- e.g. take a peek at your site-package's .pth files
- dist-packages is not standard python, it is a debian/ubuntu convention
- these say that python packages installed via apt go into dist-packages, and the site-packages directory that would normally be the target is not used
- pip will be installed that way, and also install there
- this is intended so that things an admin installs with apt or pip go to a specific system directory (dist-packages), while e.g. your own custom compiled python won't know about this and will install into its own site-packages.
- Which is considered better isolated, but is not necessarily very clear to understand.
- dist-packages is baked into the system python's
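To see what site.addsitedir() and its .pth handling do in practice, here is a sketch assuming Python 3. The directory layout and the file name demo.pth are made up for this example, and everything lives in a temporary directory.

```python
import os
import site
import sys
import tempfile

site_dir = tempfile.mkdtemp()
extra_dir = os.path.join(site_dir, "extra")
os.mkdir(extra_dir)

with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write("# comment lines are skipped\n")
    f.write("extra\n")            # a path relative to site_dir; it exists, so it is added
    f.write("does_not_exist\n")   # paths that do not exist are not added

# Appends site_dir itself to sys.path, then processes the .pth files found in it.
site.addsitedir(site_dir)

added = [entry for entry in sys.path if entry.startswith(os.path.abspath(site_dir))]
print(added)
```

A line starting with import in such a file would be executed rather than treated as a path, which is the mechanism the notes above describe as open to abuse.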
See also:
Packaging python packages
Probably see Python packaging, which is also contextualized around virtual environments.