Python usage notes/Filesystem stuff



Filesystem related

Normalized, absolute, and real paths (and symlinks)

os.path.normpath(path)

  • works only on the string
  • does NOT do any syscalls / IO
  • deals with multiple slashes
  • deals with things like . and ..
  • does NOT make the path absolute
  • does NOT follow symlinks
  • some behaviour is OS-specific
  • Examples:
    • normpath('') == '.'
    • normpath('././/a/b') == 'a/b'
    • normpath('./a/../..') == '..'
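
The points above can be checked with a short sketch. It calls posixpath directly (the module that os.path maps to on Linux and macOS), so the results are the same whatever OS it runs on; on Windows, os.path.normpath would hand back backslashes instead. The extra example paths are made up for illustration and do not need to exist.

```python
import posixpath  # os.path is this module on POSIX systems

# Pure string manipulation: none of these paths needs to exist.
examples = ['', '././/a/b', './a/../..', 'a//b/', '/a/./b/../c']
results = {p: posixpath.normpath(p) for p in examples}

for given, normalized in results.items():
    print('%-14r -> %r' % (given, normalized))

# A relative path stays relative, and '..' is collapsed textually,
# even where 'a' might be a symlink that points somewhere else.
```

That last point is the main reason to be careful with normpath on paths that may contain symlinked directories.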


os.path.abspath(path)

  • does do an IO syscall (to get the interpreter's current working directory)
  • combines that current working directory with the given relative path
  • produces an absolute path, normalizing in the process
  • does NOT follow symlinks
  • (...so mostly amounts to os.path.normpath( os.path.join(os.getcwd(), givenpath) ))
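
The 'mostly amounts to' equivalence above can be tried out as follows. This is a minimal sketch: abspath_by_hand is a hypothetical helper name, and the candidate paths are arbitrary and need not exist.

```python
import os

def abspath_by_hand(path):
    """Roughly what os.path.abspath() does: cwd + path, then normalize."""
    return os.path.normpath(os.path.join(os.getcwd(), path))

for candidate in ['.', 'a/../b', './/some/./file.txt']:
    print(candidate, '->', os.path.abspath(candidate))
    # The hand-rolled variant is expected to agree for ordinary relative paths
    print('   same as by hand:', abspath_by_hand(candidate) == os.path.abspath(candidate))
```

Because the result depends on os.getcwd() at the moment of the call, it is usually best to call abspath early, before anything changes the working directory.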


os.path.realpath(path)

  • produces an absolute path, normalizing in the process (some people note it doesn't necessarily, but the posixpath implementation has ended in an abspath() call since at least py2.7)
  • follows symlinks (if applicable to the OS/filesystem; on Windows only since Python 3.8)
  • does do IO/syscalls (to follow symlinks, so basically lstat and readlink)
  • resolves a relative link target against the directory the symlink is in, not against the cwd, so there is no need to change the cwd to follow along


If you resolve symlinks by hand instead (e.g. one level at a time), os.readlink() by itself is not enough:

  • os.readlink() returns the target exactly as stored in the link, which is often a relative path
  • You'll probably want to os.path.join() the dir the symlink is in with the result from os.readlink(), then os.path.normpath() it.
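
The join-then-normpath step above can be sketched like this. It builds a throwaway symlink in a temp directory, so it assumes an OS and user that may create symlinks (on Windows that can need extra privileges). resolve_one_link and the file names are hypothetical, and the helper handles only a single level of link. Printing realpath() next to it lets you compare the two on your own Python version.

```python
import os
import tempfile

def resolve_one_link(linkpath):
    """Resolve a single symlink by hand (one level only, no loop detection)."""
    target = os.readlink(linkpath)
    # A relative target is relative to the directory holding the link, not the cwd;
    # os.path.join() leaves an absolute target untouched.
    return os.path.normpath(os.path.join(os.path.dirname(linkpath), target))

# realpath() on the temp dir itself, in case the temp dir sits behind a symlink
base = os.path.realpath(tempfile.mkdtemp())
os.mkdir(os.path.join(base, 'real'))
open(os.path.join(base, 'real', 'data.txt'), 'w').close()

link = os.path.join(base, 'link.txt')
os.symlink(os.path.join('real', 'data.txt'), link)   # relative link target

by_hand = resolve_one_link(link)
by_realpath = os.path.realpath(link)
print(by_hand)
print(by_realpath)
```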


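There are still some edge cases, like a leading '//': normpath keeps exactly two leading slashes as they are (POSIX lets that prefix mean something implementation-specific), while three or more collapse to one.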

os.walk()

os.walk() does a recursive directory walk, yielding a triple for each directory:

  • that directory's path
  • a list of the directories it contains
  • a list of the files it contains (or rather, all nondirectory entries)

For example:

import os

for curdir, direntries, fileentries in os.walk('/proc/tty'):
    print(curdir, direntries, fileentries)

# on my system, this prints:
# /proc/tty ['driver', 'ldisc'] ['drivers', 'ldiscs']
# /proc/tty/driver [] ['serial']
# /proc/tty/ldisc [] []


Notes:

  • Both direntries and fileentries are relative to curdir. That is, entries from them have no slashes in them, and should be joined using os.path.join(curdir, entry).
  • You may often want to abspath() (or realpath()) the directory, which is probably easiest to do to the thing you feed to walk(). If you do not, the resulting paths may be relative, which can cause trouble in some situations (mostly related to the interpreter's curdir).
  • direntries is in part a 'know ahead of time what we will be walking later', and a lot of code doesn't use it.
  • You can tell it not to walk into specific directories by remove()ing entries from the directory list (direntries above). This only has an effect in the default top-down mode.
    • You want to alter the original list. Rebinding the name (e.g. direntries = []) won't do what you want; slice assignment (direntries[:] = ...) does alter the original list, so that works.
    • Don't remove while iterating over that same list, because it will skip entries. Iterate over a copy instead.
    • You can os.walk non-recursively by immediately emptying the directory list in place. Note that you mostly have listdir() at this point.
  • It needs to know which entries are directories and which are files. Older versions stat() every directory entry for this; since Python 3.5 walk() is built on os.scandir(), which can often take that from the directory listing itself. If you want "avoid statting previously handled stuff" logic (which always risks going out of sync with the filesystem!), you'll have to build your own, probably using os.listdir or os.scandir. Hint: taking walk() from os.py is a good start.


For example, the following avoids hidden directories and, per directory, prints the absolute paths of the first two files it sees.

import os

for curdir, dirs, files in os.walk(os.path.abspath('.')):
    print("ENTERING %r" % curdir)

    # Example: don't recurse into dot-directories, e.g. when picking out real files from homedirs
    #   and/or ignore most versioning system metadata.
    # Iterates a copied list, because we (need to) alter the actual one.
    for dirname in list(dirs):
        if dirname.startswith('.'):
            dirs.remove(dirname)

    for filename in files[:2]:
        fullpath = os.path.join(curdir, filename)
        print('  FILE: %r' % fullpath)
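
The same pruning can also be written with slice assignment, which changes the list in place just like remove() does. A minimal sketch, run against a small throwaway tree; visible_dirs, top_level_only and the directory names are hypothetical.

```python
import os
import tempfile

def visible_dirs(top):
    """Walk top, skipping dot-directories; returns the visited directory paths."""
    visited = []
    for curdir, dirs, files in os.walk(top):   # topdown=True is the default
        # Slice assignment changes the list object that walk() holds on to;
        # a plain 'dirs = [...]' would only rebind the local name.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        visited.append(curdir)
    return visited

def top_level_only(top):
    """Non-recursive use: empty the directory list in place on each yield."""
    entries = []
    for curdir, dirs, files in os.walk(top):
        entries.append((curdir, sorted(dirs), sorted(files)))
        del dirs[:]   # emptied in place, so walk() has nothing to descend into
    return entries

# Small throwaway tree to try it on
top = tempfile.mkdtemp()
for sub in ('.git/objects', 'src/pkg'):
    os.makedirs(os.path.join(top, sub))
open(os.path.join(top, 'README'), 'w').close()

print(visible_dirs(top))
print(top_level_only(top))
```

The second function yields a single entry for the top directory, which is about what os.listdir() plus an isdir() check would have given you.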


os.listdir

glob

Home directory (cross-OS)

The current user's home directory is probably the easiest directory you can get that is pretty much guaranteed to be writable.


It looks like the following 'get home directory of current user' code works on both Linux and Windows. On Windows it points to the user's profile folder (under Documents and Settings on XP, under C:\Users on later versions).

os.path.expanduser('~')
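
As a small sketch of the above, the following prints the home directory two ways and checks the 'writable' assumption rather than trusting it. pathlib's Path.home() is an addition that is not in the original note; it exists in Python 3.5 and later and normally gives the same directory.

```python
import os
from pathlib import Path

home_from_expanduser = os.path.expanduser('~')
home_from_pathlib = str(Path.home())        # Python 3.5+

print(home_from_expanduser)
print(home_from_pathlib)

# 'Pretty much guaranteed' is not 'guaranteed': check before relying on it
print('writable:', os.access(home_from_expanduser, os.W_OK))
```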



When, in Windows, you care about the difference between local and roaming profile content (a distinction that only applies when you use network logins), you likely need to do API calls through PyWin32 to get these. If you specifically want a directory in the roaming profile, you could ask for the roaming 'Application Data' directory (CSIDL_APPDATA asks for the roaming one, CSIDL_LOCAL_APPDATA for the local one):

import os

try:
    from win32com.shell import shellcon, shell
    datapath = shell.SHGetFolderPath(0, shellcon.CSIDL_APPDATA, 0, 0)
except ImportError:  # semi-nasty fallback for non-windows, or when you don't have win32com
    datapath = os.path.expanduser("~")

As to the CSIDL constants, see MSDN, which is also helpful for figuring out what all those directories in your profile actually are in the first place.

Start directory, current directory, module directory, path of started script

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Said directories may all be the same, or may all be different, if you consider the main script as well as imported modules.

Observations:

  • sys.argv[0] and __file__ may be relative (to cwd), so abspath is a good idea
  • abspath/realpath solutions are usually equivalent, except in the presence of symlinks
  • Windows does things a little differently, which seems to mean that abspath/realpath is a good idea anyway
  • for eggs, executable-packaged (frozen) code and such, these may not carry the meaning you want
  • An interactive shell behaves differently (but this is rarely relevant to module code)


To test with a moderately complicated case:

Given a /usr/local/bin/ptest.py which is actually a symlink to /usr/local/bin/foop/ptest.py, invoked in two ways:

  (A) from /tmp, by typing ptest.py and relying on PATH
  (B) from /usr/local, by typing bin/ptest.py

                                (A)                           (B)
os.getcwd()                     /tmp                          /usr/local
sys.argv[0]                     /usr/local/bin/ptest.py       bin/ptest.py
abspath(sys.argv[0])            /usr/local/bin/ptest.py       /usr/local/bin/ptest.py
dirname(abspath(sys.argv[0]))   /usr/local/bin                /usr/local/bin
dirname(realpath(sys.argv[0]))  /usr/local/bin/foop           /usr/local/bin/foop
sys.path[0]                     /usr/local/bin/foop           /usr/local/bin/foop
__file__                        /usr/local/bin/ptest.py       bin/ptest.py
abspath(__file__)               /usr/local/bin/ptest.py       /usr/local/bin/ptest.py
realpath(__file__)              /usr/local/bin/foop/ptest.py  /usr/local/bin/foop/ptest.py
dirname(abspath(__file__))      /usr/local/bin                /usr/local/bin
dirname(realpath(__file__))     /usr/local/bin/foop           /usr/local/bin/foop
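
The abspath/realpath difference in the test case above can be reproduced without touching /usr/local. This is a sketch that rebuilds the same layout under a temp directory, so it assumes symlinks can be created there. script_locations is a hypothetical helper; in a real script you would pass it __file__ or sys.argv[0].

```python
import os
import tempfile

def script_locations(script_path):
    """Directory of a script, with and without following symlinks."""
    return {
        'abspath_dir':  os.path.dirname(os.path.abspath(script_path)),
        'realpath_dir': os.path.dirname(os.path.realpath(script_path)),
    }

# Rebuild the layout from the test case under a temp dir (stand-in for /usr/local/bin)
bindir = os.path.realpath(tempfile.mkdtemp())
os.mkdir(os.path.join(bindir, 'foop'))
realfile = os.path.join(bindir, 'foop', 'ptest.py')
open(realfile, 'w').close()
os.symlink(realfile, os.path.join(bindir, 'ptest.py'))

locations = script_locations(os.path.join(bindir, 'ptest.py'))
print(locations['abspath_dir'])    # the directory the symlink lives in
print(locations['realpath_dir'])   # the directory the real file lives in
```

Which of the two you want depends on whether data files sit next to the symlink or next to the real script.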

Note: Things work differently in frozen apps, eggs (?), embedded interpreters (e.g. mod_wsgi), and such.

In the case of importing from still-packed .eggs, __file__ may look like /home/me/the.egg/the/__init__.py which is just the importer's way of telling you it unpacked this from the /home/me/the.egg file.

File reading