Python usage notes/Subprocess


subprocess module

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

tl;dr:

  • available in ≥py2.4
aims to replace most earlier things (os.popen, os.system, os.spawn, or the commands or popen2 modules), and is more predictable cross-platform than some of those.
  • You usually want to use the subprocess.Popen class
(It also has subprocess.call(), which is slightly shorter when you can wait for it to finish. It just creates a Popen object and wait()s on it)
  • shell=
False: an array of strings to be handled more execv style - a little leaner because it avoids that extra process, less bothersome with escaping, and more secure against injection attacks
True: hand in a single string to be parsed by the shell it is run in. Can be more predictable behaviour-wise, or just lazier in general
  • if you want stdin, stdout, and/or stderr to go to you rather than the terminal, you need to specify that
then you can read() and write(), and/or use the gathered results
  • if you can wait until it's done, use communicate()
handles stdin/stdout/stderr in a single call
most other options (waiting/polling, reading in chunks rather than a single blob) are also valid, but more work
  • if you need to interact with it, even just read output, then blocking calls are an issue. Read up.


command in single string or array, and shell=True/False

single string and shell=True

You are handing this string to the shell to parse and execute.
Can e.g. include multiple commands, pipes, and other shell features.
Gives the same escaping trouble as you would typing commands into a normal shell.
Be careful to sanitize strings. Someone can try to exploit the following example with, say, name = '"; cat /etc/shadow; echo "'
Lets you write:
p = subprocess.Popen('ps ax | grep "%s"' % name, shell=True)
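
If the command really does have to go through the shell, the interpolated value can be quoted first. A minimal sketch, assuming py3 on a unix-like system (shlex.quote; on py2 the same function lived at pipes.quote). The function name shell_print is made up for illustration, and printf is only a harmless stand-in for the ps/grep pipeline:

```python
import shlex
import subprocess

def shell_print(name):
    """Pass an untrusted value through sh as one literal word (hypothetical helper)."""
    # shlex.quote() wraps the value so the shell does not interpret ; " | etc.
    cmd = "printf '%s\\n' " + shlex.quote(name)
    return subprocess.check_output(cmd, shell=True)

# The injection attempt from the text comes back as plain data, nothing is executed
print(shell_print('"; cat /etc/shadow; echo "'))
```

Quoting like this only covers the shell's own parsing. The list form with shell=False avoids the question entirely.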


Array of strings and shell=False

Safer, but often a little more code.
Some things will notice they're not running in a shell and act differently.
(In a few cases you can only get sensible behaviour with the variant above)
The previous example in this style would be something like:
p1 = subprocess.Popen(['ps','ax'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep',name], stdin=p1.stdout, stdout=subprocess.PIPE)
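
A fuller sketch of chaining two processes without a shell, assuming py3 and a grep on the PATH. The producer is a small python child rather than ps so that the example is self-contained; the name grep_lines is made up for illustration:

```python
import subprocess
import sys

def grep_lines(pattern, lines):
    """Roughly `producer | grep -F pattern`, built from two Popen objects and no shell."""
    producer = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])",
         "\n".join(lines) + "\n"],
        stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        ["grep", "-F", "--", pattern],      # pattern is one argument, never shell-parsed
        stdin=producer.stdout, stdout=subprocess.PIPE)
    producer.stdout.close()                 # let the producer see SIGPIPE if grep exits early
    out, _ = consumer.communicate()
    producer.wait()
    return out.decode().splitlines()

print(grep_lines("py", ["python", "perl", "pypy"]))
```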


The other two combinations don't make sense

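A single string with shell=False is equivalent to placing that string into a single-item list: the whole string is taken as the program name, so (on unix) it only works for a single command without arguments.
A sequence with shell=True (on unix) hands args[0] to the shell as the command string; the remaining items become arguments to the shell itself ($0, $1, ...) rather than to the command, so they usually look ignored.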
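
Both odd combinations can be poked at directly. A small sketch, assuming py3 on a unix-like system (on windows both behave differently):

```python
import subprocess

# Single string, shell=False: "echo hello" is looked up as one program name
try:
    subprocess.Popen("echo hello")
    string_without_shell = "started"
except OSError:                      # FileNotFoundError on py3
    string_without_shell = "OSError"

# Sequence, shell=True: runs ['/bin/sh', '-c', args[0], args[1], ...],
# so the extra items end up as the shell's $0 and $1
out = subprocess.check_output(["echo $0 $1", "first", "second"], shell=True)

print(string_without_shell, out)
```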


Popen constructor arguments

  • args, which can be either a single string or a sequence of argument strings, see above
  • shell=False (the default). If True, the command is executed through the shell (/bin/sh on unix). Note that
    • many programs don't need shell=True, but it may be simpler for you when you use shell features like wildcards, pipes, here documents and whatnot.
    • shell=True means characters special to the shell are interpreted (may be a pain to escape them)


  • stdin=None, stdout=None, stderr=None, each of which can be
    • None: no redirection, usually meaning they stay tied to the shell that python was started from
    • subprocess.PIPE - meaning you get an object you can read() from (for stdout, stderr) or write() to (for stdin)
    • a file object, or file descriptor (integer)
    • also, you have the option of merging stderr and stdout (specify stderr=subprocess.STDOUT)
  • bufsize=0 (applies to the stdin, stdout, and stderr file objects that subprocess creates for PIPEs (verify))
    • 0 means unbuffered (the default on py2; since py3.3.1 the default is -1)
    • 1 means line buffering (on py3 only meaningful in text mode)
    • ≥2 means a buffer of (approximately) that size
    • -1 / negative values imply system default (which often means fully buffered)
  • env=None
    • None (the default) means 'inherit from the calling process'
    • You can specify your own dict, e.g. copy os.environ and add your own
  • cwd=None
    • if not None, is taken as a path to change to. Useful for cases where files must be present in the current directory.
    • Does not help in finding the executable
    • does not affect the running python program's cwd (verify)
  • executable=None
    • When shell=True, you can specify a shell other than the default (/bin/sh on unix, the value of COMSPEC on windows)
    • When shell=False, you can specify the real executable here and use args[0] for a display name.


  • universal_newlines=False
    • If True, '\n', '\r', and '\r\n' in stdout and stderr arrive in python as '\n' (on py3 this also makes the pipes text-mode, i.e. str rather than bytes)
    • (Depends on code that may, in fairly rare cases, not be compiled into python)
  • preexec_fn=None
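    • calls a python callable in the child process, just before the external program is executed. Unix only.
  • close_fds=False
    • If True, closes all file handles left open in the child (other than stdin/stdout/stderr (0/1/2))
    • False is the py2 default; on py3 (unix) the default is True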
  • startupinfo and creationflags
    • Windows-only
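
A sketch combining a few of these arguments (cwd, env, and merging stderr into stdout), assuming py3. The helper name run_in and the variable DEMO_VAR are made up for illustration, and the child is just another python so the example does not depend on external tools:

```python
import os
import subprocess
import sys

def run_in(directory, extra_env):
    """Run a small child in `directory` with extra environment variables (hypothetical helper)."""
    env = dict(os.environ)          # copy, so the parent's environment stays untouched
    env.update(extra_env)
    child_code = (
        "import os, sys\n"
        "print(os.getcwd())\n"
        "print(os.environ.get('DEMO_VAR'))\n"
        "sys.stdout.flush()\n"
        "sys.stderr.write('to stderr\\n')\n"
    )
    p = subprocess.Popen([sys.executable, "-c", child_code],
                         cwd=directory, env=env,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT,   # stderr lands in the stdout pipe
                         universal_newlines=True)
    out, err = p.communicate()      # err is None: stderr was not given its own PIPE
    return out.splitlines(), err, p.returncode
```

Calling run_in(some_dir, {'DEMO_VAR': 'hello'}) should give the child's working directory, the variable's value, and the stderr line, all read from the one merged pipe. The parent's own cwd is left alone.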

Popen object members

  • poll(): checks for child process completion without blocking. Handy when you want to watch several sub-processes, or do stuff asynchronously.
    • returns process return code if it's done,
    • returns None if it's not
  • wait(), for child process completion. Returns the process's return code.
  • communicate(input=None)
    • ...sends the input string, if specified, reads stdout and/or stderr into memory, and on process termination returns them as a (stdout, stderr) tuple of strings (None for streams that weren't PIPEd)
    • see below for more detail
  • stdin:
    • file object if you constructed Popen with stdin=PIPE
    • None if you didn't (default)
    • (note: using communicate() is often simpler if you need only a single interaction with the process)
  • stdout,stderr (note: can be ignored if you use communicate())
    • file objects if you constructed Popen with stdout=PIPE / stderr=PIPE
    • None if you didn't (default)


  • pid: child process Process ID
  • returncode:
    • After the child process has terminated, its return code.
    • Before that, None.
    • On unix, negative values signal termination by signal numbered abs() of this.
    • if shell=True and the shell couldn't find the executable, it will probably return 127 (see man sh, man bash) (if shell=False, it will have raised an OSError)
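
The returncode cases above can be seen with a short sketch, assuming py3 on a unix-like system. The command name is deliberately nonsense, and the sleeping child is only there to have something to kill:

```python
import signal
import subprocess
import sys

# shell=True: the shell itself starts fine, then reports "not found" via its exit status
rc_missing = subprocess.call("definitely_not_a_real_command_12345",
                             shell=True, stderr=subprocess.DEVNULL)

# Terminated by a signal: returncode is the negative signal number
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
before = p.poll()        # still running, so None
p.terminate()            # sends SIGTERM on unix
rc_killed = p.wait()

print(rc_missing, before, rc_killed, -signal.SIGTERM)
```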



On output

By default, the subprocess object's stdout and stderr streams are not touched, typically meaning they go to the underlying shell.

If you redirect with PIPE, you connect them to file handles in the calling python process(verify), exposed as file objects stderr and stdout on the relevant subprocess object.

And e.g. communicate() will read from them and return strings.


Beware if both stdout and stderr are PIPEd, in that blocking reads can create deadlock situations.

Not specific to python - actually largely because python's read() and readline() are eventually libc's read() which (at least by default) will block indefinitely until some data is present.

The potential deadlock lies in that you can be waiting on the wrong stream indefinitely. The situations are relatively rare and specific, e.g. the child process is stuck in a blocking write() to stdout because that pipe's buffer is full, while you are blocked reading its stderr, which won't produce data or EOF until the child can go on (because if the process isn't blocked, it's likely to say something, or finish).

wait() has a similar issue: if the child produces enough output on a PIPE to fill the pipe buffer while you are in wait() rather than reading, both sides block.


Your choices:

  • For a short-running program, communicate() is easiest, which just gives you the output as strings (and is free from the deadlock trouble).
  • If you want to react as things happen or know the output might be huge, then you want to stream.
    • one possible escape is merging the two streams using stderr=subprocess.STDOUT, stdout=subprocess.PIPE - though how cleanly that happens depends on how exactly the underlying process flushes. It's fine in most situations, but technically separating them is cleaner.
    • If you must have them separately, options include:
      • use threads (can be fairly brief and elegant, look at communicate()'s implementation)
      • you can use select(), though the logic is a little long (also, it is not cross-platform: on windows, select() only works on sockets, not pipes)
      • use O_NONBLOCK, though that changes how your logic should work
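
The threads option can be fairly short. A minimal sketch, assuming py3; run_collecting is a made-up name, and this only gathers the lines where a real program would probably react to them as they arrive:

```python
import subprocess
import threading

def run_collecting(args):
    """Run args, draining stdout and stderr in separate threads so neither pipe can fill up."""
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    chunks = {"stdout": [], "stderr": []}

    def drain(stream, key):
        # readline() blocks, but only this thread; the other stream keeps being read
        for line in iter(stream.readline, b""):
            chunks[key].append(line)     # or handle each line here
        stream.close()

    threads = [threading.Thread(target=drain, args=(p.stdout, "stdout")),
               threading.Thread(target=drain, args=(p.stderr, "stderr"))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # both streams hit EOF once the child is done
    return p.wait(), b"".join(chunks["stdout"]), b"".join(chunks["stderr"])
```

Appending to a list from two threads is fine here because each thread only touches its own list.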


Keep in mind that for text utilities, readline() waits for \n (python adheres to POSIX newlines), not \r. The easiest way around that seems to be universal_newlines=True (looks for the others, and translates them for you - see also PEP 278)
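
A quick way to see the translation, assuming py3 (where universal_newlines=True also means you get str rather than bytes). The child writes raw bytes with all three line endings:

```python
import subprocess
import sys

# The child emits a bare \r, a \r\n, and a plain \n
child = r"import sys; sys.stdout.buffer.write(b'one\rtwo\r\nthree\n')"
p = subprocess.Popen([sys.executable, "-c", child],
                     stdout=subprocess.PIPE, universal_newlines=True)
lines = p.stdout.readlines()     # each ending arrives as '\n'
p.stdout.close()
p.wait()
print(lines)
```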


On buffering

Keep in mind that (and most of this is not python-specific)

  • stdin is buffered
  • stdout is typically buffered, or line buffered on shells
  • stderr may not be buffered
  • any buffering at all means there is no true order between what comes in on stdout and stderr (unless you remove all buffering (usually hinders performance) and the program isn't threaded)
  • ...this applies within each process. You can often not control how a program buffers or flushes.
  • a pipe also represents a buffer


bufsize: see its mention in the argument list above. This is basically the buffering applied on python's side if and when it uses fdopen() on the underlying file descriptor. (verify)


It seems that iterating over a file object for its lines, i.e.

for line in fileobject:

adds its own buffering within the iterator(verify), so if you want more realtime feedback, either add your own while loop around readline - or get the same behaviour via:

for line in iter(fob.readline, ''):  # note: on py3 that should typically be b''
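
In context that looks something like the following sketch, assuming py3 (so the pipe yields bytes and the sentinel is b''). The name stream_lines and the handle callback are made up for illustration:

```python
import subprocess

def stream_lines(args, handle):
    """Call handle(line) for each stdout line as soon as the child has written it."""
    p = subprocess.Popen(args, stdout=subprocess.PIPE)
    for line in iter(p.stdout.readline, b""):    # b'' means EOF
        handle(line.rstrip(b"\n"))
    p.stdout.close()
    return p.wait()
```

How promptly lines arrive still depends on the child's own buffering; a python child, for example, can be started with -u to make its stdout unbuffered.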


Usage notes, examples

wait()ing

Handy when you want to block until the subprocess quits.

p = subprocess.Popen("ps ax | grep %s" % name, shell=True)
p.wait()


communicate()ing

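Handy convenience function when you want to block and handle input and output data: the communicate() function sends in data, wait()s for the process to finish, and returns stdout and stderr as strings (or None for a stream you did not PIPE). Example:

p = subprocess.Popen("sendmail -t -v", shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, err = p.communicate(message)  # message: the mail text, as a string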
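
A self-contained variant that can be run without a mail setup, assuming py3 (so communicate() takes and returns bytes unless text mode is asked for). The child is a python one-liner that upper-cases its stdin; to_upper is a made-up name:

```python
import subprocess
import sys

def to_upper(text):
    """Feed text to a child's stdin and collect its stdout and stderr in one call."""
    child = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    p = subprocess.Popen([sys.executable, "-c", child],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    # communicate() writes the input, closes stdin, reads both pipes, then waits
    out, err = p.communicate(text.encode("utf-8"))
    return p.returncode, out.decode("utf-8"), err.decode("utf-8")

print(to_upper("hello\n"))
```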


If you want to watch several sub-processes, you'll be interested in poll() (returns return code, or None if not yet finished(verify)).
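
A minimal polling loop for that, assuming py3. The name run_all and the sleep interval are arbitrary choices for the sketch:

```python
import subprocess
import time

def run_all(commands, interval=0.05):
    """Start every command at once, then poll until each has finished.

    Returns {index_in_commands: returncode}.
    """
    running = {i: subprocess.Popen(cmd) for i, cmd in enumerate(commands)}
    results = {}
    while running:
        for i, p in list(running.items()):
            rc = p.poll()              # None while the child is still running
            if rc is not None:
                results[i] = rc
                del running[i]
        if running:
            time.sleep(interval)       # don't spin at full speed
    return results
```

None of the children use PIPE here. If they did, their output would have to be read while polling, or a chatty child could block on a full pipe and never finish.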


PATH and environment

You can rely on PATH (and other inherited environment(verify)) whether shell=True or False -- unless it is explicitly cleared for some reason (sudo, some embedded interpreters).

On errors


The most common error to catch is probably 'command not found', which (with shell=False) arrives as an OSError exception.
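
A sketch of catching that, assuming py3 (where the exception is FileNotFoundError, a subclass of OSError, so the same except clause works on both). try_run is a made-up name:

```python
import errno
import subprocess

def try_run(args):
    """Start args with shell=False; return (returncode, stdout), or (None, reason) if it never started."""
    try:
        p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as e:
        if e.errno == errno.ENOENT:          # no such file: the usual 'command not found'
            return None, "command not found"
        return None, "could not start: %s" % e.strerror
    out, _err = p.communicate()
    return p.returncode, out

print(try_run(["definitely_not_a_real_command_12345"]))
```

With shell=True this exception does not happen for a missing command, since the shell itself starts fine; look at the return code instead.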



On signals

Replacing older styles with subprocess

For details, see http://docs.python.org/release/2.5.2/lib/node533.html


Summary of that:

  • Most of the previous styles rely on shell parsing, so the easiest method is to pass in the string as before and set shell=True
    • ...except os.spawn*, it's list-based. If you're using this, you probably want to read up on the details anyway.
    • ...and popen2.popen[234] in cases where you give it a list (it accepts either a string or a sequence, and picks shell-style or exec-style execution accordingly; with subprocess you make that choice via shell=)
  • redirect as you need to, get the file objects from the Popen object
  • hand along bufsize if you need it
  • You may want to check out differences in whether the call closes open file handles
  • You may want to check the way errors arrive back in python
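
A few rough equivalents as a sketch, assuming py3 on a unix-like system. The old calls appear only in comments, and the commands are trivial shell built-ins chosen so the example runs anywhere with /bin/sh:

```python
import subprocess

# os.system("exit 3")
# Note: os.system returns the encoded wait status on unix, call() the plain exit code
status = subprocess.call("exit 3", shell=True)

# os.popen("echo hello").read()
p = subprocess.Popen("echo hello", shell=True, stdout=subprocess.PIPE)
output = p.stdout.read()
p.stdout.close()
p.wait()

# commands.getoutput("echo one; echo two")  (py2 only)
# Roughly: stdout and stderr together, though check_output keeps the trailing newline
# and raises CalledProcessError on a non-zero exit
text = subprocess.check_output("echo one; echo two", shell=True,
                               stderr=subprocess.STDOUT)

print(status, output, text)
```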


Older stuff

Historically, there have been a number of system call methods, mostly:


  • os members
    • os.popen()
    • os.system()
    • os.spawn...
  • commands (a convenience wrapper around os.popen)
  • popen2 (2.4, 2.5; deprecated in 2.6)
    • popen2.popen2()
    • popen2.popen3()
    • popen2.popen4()
    • popen2.Popen3 class
    • popen2.Popen4 class