Python usage notes/Filesystem stuff
Normalized, absolute, and real paths (and symlinks)
os.path.normpath(path)
- works only on the string
- does NOT do any syscalls / IO
- deals with multiple slashes
- deals with things like . and ..
- does NOT make the path absolute
- does NOT follow symlinks
- some behaviour is OS-specific
- Examples:
- normpath('') == '.'
- normpath('././/a/b') == 'a/b'
- normpath('./a/../..') == '..'
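The example results above can be checked directly. This sketch calls posixpath and ntpath, the modules os.path points to on POSIX and Windows respectively, so both behaviours show up on any OS. It also shows that normpath works purely lexically, which is why it cannot take symlinks into account.

```python
import ntpath
import posixpath

# Pure string work: none of these paths need to exist
examples = ['', '././/a/b', './a/../..', 'a/b/../c']
posix_results = [posixpath.normpath(p) for p in examples]
print(posix_results)                  # ['.', 'a/b', '..', 'a/c']

# OS-specific: the Windows flavour also turns forward slashes into backslashes
print(ntpath.normpath('a/b/../c'))    # a\c

# Lexical only: 'link/..' collapses to '.' even if 'link' is a symlink elsewhere
print(posixpath.normpath('link/..'))  # .

# Edge case: exactly two leading slashes are kept (POSIX leaves their meaning open)
print(posixpath.normpath('//a//b'))   # //a/b
```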
os.path.abspath(path)
- does a syscall, but only for relative paths (os.getcwd(), to get the interpreter's current working directory)
- combines that current working directory with the given relative path
- produces an absolute path, normalizing it in the process
- does NOT follow symlinks
- (...so for relative paths it mostly amounts to os.path.normpath( os.path.join(os.getcwd(), givenpath) ))
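To make that equivalence concrete, here is a small sketch. manual_abspath is a hypothetical helper written to roughly mirror what os.path.abspath does on POSIX. On Windows, the real abspath asks the OS instead, so the two may differ there in details.

```python
import os

def manual_abspath(path):
    """Hypothetical helper: roughly what os.path.abspath does on POSIX."""
    if not os.path.isabs(path):
        # Only relative paths need the current working directory (the syscall)
        path = os.path.join(os.getcwd(), path)
    # Pure string normalization; symlinks are never looked at
    return os.path.normpath(path)

print(manual_abspath('a/../b'))   # <cwd>/b
print(manual_abspath('/x/./y'))   # /x/y on POSIX, cwd not consulted
```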
os.path.realpath(path)
- produces an absolute path, normalizing it in the process (at least py2.7 and later do so)
- follows symlinks where the OS/filesystem has them (on Windows only since Python 3.8; before that, realpath was effectively abspath there)
- does do IO/syscalls (to follow symlinks, so basically lstat and readlink)
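- resolves relative symlink targets against the directory the symlink is in (not against the cwd), and follows chains of symlinks
If you resolve a symlink by hand with os.readlink(), you have to do that part yourself, since neither normpath nor abspath knows about symlinks:
- a relative result from os.readlink() is relative to the directory the symlink is in, not to the cwd, so abspath() on it is wrong
- so os.path.join() the symlink's directory with the os.readlink() result, then os.path.normpath() it. Note that normpath collapses '..' lexically, which can give the wrong answer if a directory along the way is itself a symlink; realpath avoids that.
There are still some edge cases, like paths starting with '//': normpath keeps exactly two leading slashes, because POSIX leaves their meaning implementation-defined.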
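The following is a POSIX-only sketch that resolves one relative symlink by hand, against the symlink's own directory, and compares the result with os.path.realpath() and with the naive abspath(readlink()). It builds a throwaway tree in a temporary directory. resolve_one_link and demo are hypothetical helpers for illustration. Creating symlinks on Windows typically needs extra privileges, so this sketch assumes a Unix-like system.

```python
import os
import tempfile

def resolve_one_link(link_path):
    """Hypothetical helper: resolve a single symlink by hand (POSIX sketch)."""
    target = os.readlink(link_path)
    if not os.path.isabs(target):
        # A relative target is relative to the directory containing the link
        target = os.path.join(os.path.dirname(link_path), target)
    # Lexical cleanup only; see the '..' caveat above
    return os.path.normpath(target)

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # The temp dir may itself sit behind a symlink (e.g. /tmp on macOS)
        base = os.path.realpath(tmp)
        os.mkdir(os.path.join(base, 'real'))
        os.mkdir(os.path.join(base, 'links'))
        open(os.path.join(base, 'real', 'file.txt'), 'w').close()
        link = os.path.join(base, 'links', 'file.txt')
        os.symlink('../real/file.txt', link)        # relative target, stored as-is
        naive = os.path.abspath(os.readlink(link))  # wrong: resolved against the cwd
        return base, resolve_one_link(link), os.path.realpath(link), naive

base, by_hand, by_realpath, naive = demo()
print(by_hand)
print(by_realpath)
print(naive)
```

On Linux with CPython, the first two print the same <tmp>/real/file.txt path. The naive version points somewhere relative to wherever you ran it from.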
os.walk()
os.walk() does a recursive directory walk, yielding a triple for each directory:
- that directory's path
- a list of the directories it contains
- a list of the files it contains (or rather, all nondirectory entries)
For example:
import os
for curdir, direntries, fileentries in os.walk('/proc/tty'):
    print(curdir, direntries, fileentries)
# on my system, this prints:
/proc/tty ['driver', 'ldisc'] ['drivers', 'ldiscs']
/proc/tty/driver [] ['serial']
/proc/tty/ldisc [] []
Notes:
- Both direntries and fileentries are relative to curdir.
- That is, entries from them have no slashes in them
- and should be joined using os.path.join(curdir, entry)
- You will often want to abspath() (or realpath()) the directory
- probably easiest done on the path you feed to walk()
- If you do not, the resulting paths may be relative, which can cause trouble in some situations (mostly related to the interpreter's cwd, e.g. after an os.chdir()).
- direntries partly tells you ahead of time what will be walked later, and a lot of code doesn't use it except for pruning (next point)
- You can tell walk not to descend into specific directories by remove()ing entries from the directory list (direntries above). This works with the default topdown=True.
- You have to alter the original list in place. Assigning a new list to the variable (direntries = [...]) only rebinds your local name and won't do what you want; slice assignment (direntries[:] = [...]) does alter the original.
- don't remove() from the list while iterating over it, or you'll skip entries; iterate over a copy instead
- You can make os.walk non-recursive by emptying the directory list in place right away (del direntries[:]; setting direntries = [] won't work). Note that you mostly have listdir() at this point.
- walk needs to know which entries are directories and which are files. Older versions stat() every entry for that; since Python 3.5, os.walk uses os.scandir(), which can often tell from the directory listing itself and so avoids most of those stat() calls.
- If you want "avoid statting previously handled stuff" logic (which always risks going out of sync with the filesystem!), you'll have to build your own, probably using os.scandir() (or os.listdir() on older versions). Hint: taking walk() from os.py is a good start.
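A minimal sketch of the in-place pruning described in the notes, plus the common shortcut for a non-recursive listing, which is to take only the first triple from os.walk. walk_pruned and list_one_level are hypothetical helper names. next() raises StopIteration if top does not exist or cannot be listed, since os.walk then yields nothing.

```python
import os
import tempfile

def walk_pruned(top):
    """Like os.walk(top), but never descends into dot-directories."""
    for curdir, dirs, files in os.walk(top):
        # Slice assignment changes the list object os.walk holds on to, so the
        # pruning takes effect; plain 'dirs = [...]' would only rebind the name.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        yield curdir, dirs, files

def list_one_level(top):
    """Non-recursive: just the first triple os.walk produces."""
    curdir, dirs, files = next(os.walk(top))
    return dirs, files

# Self-check on a throwaway tree with one normal and one hidden subdirectory
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('a', '.hidden'):
        os.mkdir(os.path.join(tmp, sub))
        open(os.path.join(tmp, sub, 'f.txt'), 'w').close()
    visited = sorted(os.path.relpath(d, tmp) for d, _dirs, _files in walk_pruned(tmp))
    top_dirs, top_files = list_one_level(tmp)

print(visited)            # ['.', 'a']
print(sorted(top_dirs))   # ['.hidden', 'a']
```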
For example, the following skips hidden directories and, per directory, prints the absolute paths of (at most) the first two files it sees.
import os
for curdir, dirs, files in os.walk( os.path.abspath('.') ):
    print("ENTERING %r" % curdir)
    # Example: don't recurse into dot-directories, e.g. when picking out real files from homedirs
    # and/or to ignore most version control metadata (.git, .svn, ...)
    # iterate over a copy of the list, because we (need to) alter the actual one
    for dirname in list(dirs):
        if dirname.startswith('.'):
            dirs.remove(dirname)
    for filename in files[:2]:
        fullpath = os.path.join(curdir, filename)
        print(' FILE: %r' % fullpath)
os.listdir
glob
Home directory (cross-OS)
The current user's home directory is probably the easiest directory you can get that is pretty much guaranteed to be writable.
It looks like the following 'get home directory of current user' code works on both Linux and Windows. On Windows it usually points to the user's profile folder (under C:\Users on Vista and later, under Documents and Settings on XP).
os.path.expanduser('~')
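On Python 3.5 and later, pathlib offers the same directory as a Path object. In this sketch, '.myapp.conf' is a made-up example file name and nothing is created on disk.

```python
import os
from pathlib import Path

home = Path.home()              # same directory as os.path.expanduser('~')
config = home / '.myapp.conf'   # hypothetical per-user file name; not created here
print(home)
print(config)
```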
If, on Windows, you care about the difference between local and roaming profile content (a distinction that mostly matters with network logins that use roaming profiles), you likely need to make API calls through PyWin32 to get these. If you specifically want a directory in the roaming profile, you could ask for the roaming 'Application Data' directory (CSIDL_APPDATA asks for the roaming one, CSIDL_LOCAL_APPDATA for the local one):
import os
try:
    from win32com.shell import shellcon, shell
    datapath = shell.SHGetFolderPath(0, shellcon.CSIDL_APPDATA, 0, 0)
except ImportError:  # semi-nasty fallback for non-Windows, or when win32com isn't installed
    datapath = os.path.expanduser("~")
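Without PyWin32, a rougher alternative is to read environment variables. This sketch assumes that APPDATA (roaming) and LOCALAPPDATA (local) are set, as they are on typical Windows installs, to the same folders CSIDL_APPDATA and CSIDL_LOCAL_APPDATA refer to. Users can change environment variables, so this is less authoritative than the API call. app_data_dir is a hypothetical helper name.

```python
import os

def app_data_dir(roaming=True):
    """Hypothetical helper: best-effort per-user application data directory.

    On Windows, reads APPDATA (roaming) or LOCALAPPDATA (local); elsewhere,
    or if the variable is missing, falls back to the home directory.
    """
    if os.name == 'nt':
        value = os.environ.get('APPDATA' if roaming else 'LOCALAPPDATA')
        if value:
            return value
    return os.path.expanduser('~')

print(app_data_dir())
print(app_data_dir(roaming=False))
```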
For the CSIDL constants, see MSDN. The list is also helpful for figuring out what all those directories in your profile actually are in the first place.
Start directory, current directory, module directory, path of started script
Said directories may all be the same, or may all be different, if you consider the main script as well as imported modules.
Observations:
- sys.argv[0] and __file__ may be relative (to cwd), so abspath is a good idea (since Python 3.9, the main script's __file__ is made absolute, but sys.argv[0] is still as typed)
- abspath/realpath solutions are usually equivalent, except in the presence of symlinks
- Windows does things a little differently, which seems to mean that abspath/realpath is a good idea anyway
- for eggs, executable-packaged (frozen) code and such, these may not carry the meaning you want
- An interactive shell behaves differently (but this is rarely relevant to module code)
To test with a moderately complicated case:
- given a /usr/local/bin/ptest.py
- which is actually a symlink to /usr/local/bin/foop/ptest.py
expression | invoked from /tmp by typing ptest.py (found via PATH) | invoked from /usr/local by typing bin/ptest.py |
---|---|---|
os.getcwd() | /tmp | /usr/local |
sys.argv[0] | /usr/local/bin/ptest.py | bin/ptest.py |
abspath(sys.argv[0]) | /usr/local/bin/ptest.py | /usr/local/bin/ptest.py |
dirname(abspath(sys.argv[0])) | /usr/local/bin | /usr/local/bin |
dirname(realpath(sys.argv[0])) | /usr/local/bin/foop | /usr/local/bin/foop |
sys.path[0] | /usr/local/bin/foop | /usr/local/bin/foop |
__file__ | /usr/local/bin/ptest.py | bin/ptest.py |
abspath(__file__) | /usr/local/bin/ptest.py | /usr/local/bin/ptest.py |
realpath(__file__) | /usr/local/bin/foop/ptest.py | /usr/local/bin/foop/ptest.py |
dirname(abspath(__file__)) | /usr/local/bin | /usr/local/bin |
dirname(realpath(__file__)) | /usr/local/bin/foop | /usr/local/bin/foop |
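The abspath/realpath split from the table can be wrapped up in a small helper. module_dir and its resolve_symlinks parameter are hypothetical names. With resolve_symlinks=True it gives the directory where the file really lives (the foop column above); with False it gives the directory it was reached through.

```python
import os

def module_dir(module_file, resolve_symlinks=True):
    """Hypothetical helper: absolute directory of a module or script file.

    resolve_symlinks=True  -> dirname(realpath(...)), where the file really lives
    resolve_symlinks=False -> dirname(abspath(...)), where it was reached through
    """
    if resolve_symlinks:
        full = os.path.realpath(module_file)
    else:
        full = os.path.abspath(module_file)
    return os.path.dirname(full)

# Typical use inside a module, e.g. to find a data file shipped next to it:
#   here = module_dir(__file__)
#   datafile = os.path.join(here, 'data.txt')   # 'data.txt' is just an example name
```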
Note: Things work differently in frozen apps, zipped eggs, embedded interpreters (e.g. mod_wsgi), and such.
In the case of importing from still-zipped .eggs, __file__ may look like /home/me/the.egg/the/__init__.py. That is the importer's way of telling you it loaded this from inside the /home/me/the.egg file. The path does not exist on disk as such, so you can't simply open() it, or files next to it.