Python usage notes/pty and pexpect notes


pty

If you need to run a command that is barely interactive on its stdin/stdout, then subprocess may do everything you want.



When a program wants to be interactive, you may want to put a pseudoterminal in between that you can interact with. Python's pty module makes this a little easier than doing it bare-bones: it mostly just wraps existing C functions, yet adds some convenience (though it is not always friendly or well documented).


Before you write a custom pty-based class around your specific command -- which can make sense in itself -- you may want to check what pexpect can do for you - pexpect wraps pty with some pure-python convenience functions, geared to easier pattern-based interactions. Notes on pexpect are below those on pty.


communication

pty.fork() gives you a file descriptor tied to the child's controlling terminal: os.read() on it gives you the child's output, and os.write() on it feeds the child's input.

You may want to poll the file descriptor, to avoid blocking reads. See e.g. [1] -- and e.g. read pty.spawn()'s implementation, as it select()s on the fd and only calls the callback when there is something to do.
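A minimal sketch of that pattern, assuming Linux (where reading a pty master raises OSError with EIO once the child has exited):

```python
import os
import pty
import select

pid, fd = pty.fork()
if pid == 0:
    # child: its stdin/stdout/stderr are the slave end of the pty
    os.execvp('echo', ['echo', 'hello from child'])

# parent: select() on the fd so we never block indefinitely in os.read()
chunks = []
while True:
    readable, _, _ = select.select([fd], [], [], 5.0)
    if not readable:
        break                    # nothing happened within the timeout
    try:
        data = os.read(fd, 1024)
    except OSError:              # on Linux, EIO here means the child is gone
        break
    if not data:
        break
    chunks.append(data)

os.close(fd)
os.waitpid(pid, 0)
output = b''.join(chunks)
```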

the functions

  • pty.fork() - creates a child which sees the calling process as its pseudoterminal master
    • returns a pair (child_pid, fd)
      • in the master process, child_pid is the subprocess's PID, and fd is connected to the child's controlling terminal
        • you can use os.read() and os.write() on the fd. That covers stdout, stderr, and stdin, since the child's standard streams are all connected to the slave end(verify)
      • in the child process, child_pid is 0, and fd isn't valid
    • You'll probably use one of the os.exec* functions in the child
    • basically consists of pty.openpty() plus os.fork(), and fiddling with the standard streams


  • pty.spawn(argv_string[, master_read[, stdin_read]])
    • returns nothing
    • spawns a child process connected to a new pseudoterminal, and copies data between that pty and our own stdin/stdout
    • Not useful for programmatic interaction, but nice when you're looking just to fool a program into thinking it's in a pty
    • the master_read and stdin_read arguments are functions that take a file descriptor and read (by default) up to 1024 bytes from it. These arguments seem to exist to let you intercept the data and do something more with it (e.g. the example below logs the output to a file)


  • pty.openpty() - creates a new pty (like os.openpty(), but slightly more portable(verify))
    • returns a 2-tuple: the file descriptors for master and slave end(verify)
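A tiny illustration of the master/slave relation (assuming default termios settings, where the slave is in canonical mode so a read completes at the newline):

```python
import os
import pty

master, slave = pty.openpty()
# bytes written on the master end arrive as terminal input on the slave end
os.write(master, b'hi\n')
data = os.read(slave, 16)
os.close(master)
os.close(slave)
```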


Examples

#!/usr/bin/python3
# I have this in a script named  logbatch
import os
import pty
import sys

if len(sys.argv) < 2:
    print("Need a command to run")
    sys.exit(-1)

logfilename = '%s.log' % os.path.basename(sys.argv[1])  # executable name + '.log'
if os.path.exists(logfilename):
    print("\nRefusing to overwrite existing log %r\n\nRename or remove it and try again\n" % logfilename)
    sys.exit(-1)

logfile = open(logfilename, 'wb')  # bytes, since os.read() hands us bytes

def read(fd):
    data = os.read(fd, 1024)
    logfile.write(data)
    logfile.flush()  # so the log is current even if we stop uncleanly
    return data

sys.stdout.write('Writing log to %r\n' % logfilename)
sys.stdout.flush()

logfile.write(('Log file for command: %r\n---------------------\n' % ' '.join(sys.argv[1:])).encode())
pty.spawn(sys.argv[1:], read)

sys.stdout.write('Done with logging %r\n' % ' '.join(sys.argv[1:]))

If I run logbatch ls -l, this script itself runs ls -l and writes its output (and echoed input) to ls.log

(It was made for automatically generated batch scripts, and to avoid typing something like scriptname_1 >& scriptname_1.log &. And no, script [2] didn't do what I wanted).

See also (pty)

pexpect

pexpect lets you deal somewhat more cleverly with programs run under a pty, via pattern-based interaction.

It also has a module that eases remote interaction via SSH (though there are fancier ways of doing that).


Quick intro

Consider:

import sys
import pexpect

# The simplest interaction is a "wait until you see this, then send this" thing:
p = pexpect.spawn('ftp ftp.example.com')
p.expect('[Nn]ame')
p.sendline('anonymous')
p.expect('[Pp]assword')
p.sendline('noah@example.com')
# etc.
# But when responses are conditional or otherwise flexible, this stops being
# sensible very quickly - it'd easily just block.


# If you hand in a list, the return value is the index of the pattern that matched.
# This also makes it easier to deal with
#   errors
#   timeouts
#   and varying code paths - you could build a rule system, or state machine
p = pexpect.spawn('ftp ftp.example.com')

while True:
  i = p.expect([pexpect.EOF, pexpect.TIMEOUT, '[Nn]ame', '[Pp]assword', r'[#\$]', 'refused'])
  if   i==0: # the EOF
    print('program finished')
    sys.exit(0)
  elif i==1: # the TIMEOUT
    print('program did not respond (fast enough)')
    sys.exit(2)
  elif i==2:  # ...and so on
    print('sending username')
    p.sendline('foo')
  elif i==3:
    print('sending password')
    p.sendline('pw12')
  elif i==4:
    print('looks like a prompt')
    p.sendline('put myfile')
  elif i==5:
    print('connection refused')
    sys.exit(1)
# not a working example, but you get the idea


Timeout:

  • defaults to 30 seconds.
  • You can change that default
  • you can specify a timeout on each expect() call
  • By default, a timeout is raised as an exception.
  • If you added pexpect.TIMEOUT to the list of patterns, you get the according index instead of an exception.
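Both behaviours in one short sketch, using cat because it echoes lines back and never exits by itself:

```python
import pexpect

p = pexpect.spawn('cat', encoding='utf-8')
p.sendline('ping')
# the echoed line matches the first pattern, so we get index 0
i = p.expect(['ping', pexpect.TIMEOUT], timeout=5)
# nothing will ever match here, so after 1 second we get TIMEOUT's index
j = p.expect(['never-appears', pexpect.TIMEOUT], timeout=1)
p.close(force=True)
```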

spawn()

When you have arguments, you can put them in the single command string, or hand them in as a list of strings, e.g.

pexpect.spawn('ssh -X user@example.com')
pexpect.spawn('ssh',['-X','user@example.com'])

Notes:

  • This is different from the things pty.spawn() can take (!)
    (it's also different from how subprocess works)
  • You could use full paths, or (in the first style) use /usr/bin/env, but pexpect tries to imitate the behaviour of which anyway, so ideally will Do What You Want.
  • spawn() mostly just forks and execs, so if you want to use shell features in your command, your command should probably be something like 'bash -c "ls > test"' (and more escaping fun)
  • On the environment:
    • The process inherits the environment of the spawning python process - i.e. os.environ.
    • If you hand in spawn(env=yourdict), that is used instead of os.environ.
    • so:
      • The simplest way is to alter os.environ (though that also affects everything else in this process)
      • The cleanest way is probably to make your own dict (copying things from os.environ as you want) and hand that in.
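A sketch of that cleaner way (DEMO_VAR is a hypothetical variable, just for illustration):

```python
import os
import pexpect

env = dict(os.environ)        # a copy, so os.environ itself stays untouched
env['DEMO_VAR'] = 'hello'     # hypothetical variable, for illustration

p = pexpect.spawn('/bin/sh', ['-c', 'echo $DEMO_VAR'], env=env, encoding='utf-8')
p.expect(pexpect.EOF)
out = p.before                # everything up to EOF, i.e. the echoed value
p.close()
```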

expect()

# The arguments to expect:
p.expect(pattern, timeout=-1, searchwindowsize=None)


pattern can be:

  • a string type - will be compiled as a regular expression
  • an already-compiled regexp object, used as-is
  • pexpect.EOF - return this, rather than raise an exception on EOF
  • pexpect.TIMEOUT - return this, rather than raise an exception on timeout
  • a list of any of these
    • the return value will be the index of the matching pattern
    • if nothing matches, you get a timeout - as an exception, or, if pexpect.TIMEOUT was in the list, as its index (what happens if nothing matches and timeout=None?(verify))


A timeout value of -1 means fallback to the class default, which seems to be 30 (seconds). You could raise or lower it, or block indefinitely by using timeout=None.


searchwindowsize: how far back to search within the incoming data buffer. The default is to search the whole thing (since the last match), which can be unnecessarily slow if, say, you want to match the 'done' message after a large bulk of debug output.



Returns:

  • If you didn't hand in a list: 0
  • If you hand in a list: the index of the first pattern that matches (so if one of your patterns is a longer version of another, you probably want the longer one first)


After expect() returns, you can inspect the most recent bit of the input buffer, probably one or more of:

  • p.before - the text before the match
  • p.after - the text that the pattern matched (or the EOF/TIMEOUT class, when that is what matched)
  • p.match - the underlying re match object (so e.g. p.match.group() for the matched text and any groups)
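For example (a sketch; under a pty the output of echo ends in \r\n):

```python
import pexpect

p = pexpect.spawn('echo hello world', encoding='utf-8')
p.expect('wor')
before = p.before         # 'hello ' - the text before the match
after = p.after           # 'wor' - the text the pattern matched
group = p.match.group(0)  # same text, via the re match object
p.expect(pexpect.EOF)
p.close()
```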


Instead of using this general-purpose expect(), you can use some of the underlying functions:

  • expect_exact() - skips regexp stuff, so can be faster when just doing simple substring tests
  • expect_list() - when your list contains only compiled regexps (or EOF or TIMEOUT), you can skip some of the overhead and go straight to the matching

Reading data without expect()

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Instead of using expect(), you can do the work yourself, after one of:

  • read()
  • read_nonblocking() - Can do (per-character?) timeout (may raise eof and timeout exceptions)
  • readline() - reads a single line
  • readlines() - reads until EOF, returns lines.
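A small sketch of line-by-line reading (printf here just produces two known lines of output; under a pty, lines come back ending in \r\n):

```python
import pexpect

p = pexpect.spawn('printf', ['a\\nb\\n'], encoding='utf-8')
line1 = p.readline()   # first line, with the pty's \r\n line ending
line2 = p.readline()
p.expect(pexpect.EOF)
p.close()
```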

Sending in data

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

For line-based programs, you mostly only need to remember sendline().

The list of sending functions:

  • sendline(s) - sends a string/bytes, plus os.linesep
  • send(s) - send a string/bytes
  • sendeof() - send EOF character
  • sendcontrol(char) - send a control character
  • sendintr() - sends SIGINT
  • write() - like send(), but doesn't return a value. Exists to make the object usable as a generic python file object (see also read(), close())
  • writelines(seq) - basically for s in seq: write(s)
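A short sketch of sendline() and sendeof(), again using cat (echo=False turns off the pty's own echo, so we only see cat's output):

```python
import pexpect

p = pexpect.spawn('cat', encoding='utf-8', echo=False)
p.sendline('one')        # send a line; cat copies it back as output
p.expect('one')
p.sendeof()              # cat exits when its input reaches EOF
p.expect(pexpect.EOF)
p.close()
status = p.exitstatus    # cat exited normally
```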


interact()

Aside from a few potentially finicky details, this just hands over control completely:

Keystrokes are sent to the child, and the child's stdout and stderr are read and printed on our own stdout and stderr (and logfile, if set).


You can hand in an escape character to break out of this interactive mode - by default 0x1D, Ctrl-] (a telnet convention, apparently).


It looks like interact() expects the user to break out before the child process closes - see the section below.


Termination of the subprocess

...handling this cleanly takes a little thought.

It seems that in essentially all cases you are expected to close() the connection to the subprocess yourself. (multiple close()s are accepted so you don't have to do a lot of codepath checking)


Waiting for EOF

Many programs will, at some point, want no further input before they eventually decide to exit themselves.


If you know this is true of a specific program, the simplest solution is probably:

  • expect(pexpect.EOF, timeout=None) to wait indefinitely for the child to exit
  • close()
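That pattern, sketched with a short-lived command so it actually returns (and showing that a second close() is harmless):

```python
import pexpect

p = pexpect.spawn('echo done')
p.expect(pexpect.EOF, timeout=None)   # block until the child closes its side
p.close()
p.close()                             # multiple close()s are accepted
still_running = p.isalive()
```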


Waiting until it's really done:

  • is also one of the easiest sure ways to know the exit status is readable
  • avoids early child termination if the parent python process stops before the child
  • may also avoid the subprocess being write-blocked on a pipe ((verify) I don't know yet)

Not waiting indefinitely

If you don't explicitly wait indefinitely, a timeout applies - probably the default 30 seconds.

That often isn't enough when you just want to move on once the child is done, but explicit timeouts are preferable when you want to report how long the process has been at it, terminate() it when you know it's pointless, or stop a program that would otherwise run forever.


Details you may care about

  • isalive() - tests whether the child process is still there
  • wait() - blocking wait for the child to exit. Note that a child is considered alive until all its output is read, so wait() is only useful if you know there will be no more output.
    In most cases, expect(pexpect.EOF, timeout=None) is what you'd want instead of wait()(verify)
  • close() - closes the connection with the child. If the child process had not closed its streams, this will look like a broken pipe to it (if and when it tries to use them), which often means it doesn't exit cleanly.
    If you don't close, the handle stays open - if you run a lot of subprocesses via pexpect, you will run out.
  • kill(signum) - send a signal.
    Can also be done via close()'s optional force argument


Once a process is done, either self.exitstatus or self.signalstatus will have a value, the other None.

Which of the two it is mostly depends on whether the process was left to terminate by itself (usually with expect(pexpect.EOF)) or was terminated (by your code, or by something else).
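A sketch of the first case - a child that exits by itself, so exitstatus gets set and signalstatus stays None (a kill()ed child would be the other way around):

```python
import pexpect

p = pexpect.spawn('/bin/sh', ['-c', 'exit 3'])
p.expect(pexpect.EOF, timeout=10)   # let it terminate by itself
p.close()
es = p.exitstatus                   # the child's exit code
ss = p.signalstatus                 # None, since no signal killed it
```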


termination and interact()

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

It looks like interact() expects the user to break out before the child process closes.

In most examples, the user seems expected to know this.


If the child terminates while still under interact() control, you'll probably find it doesn't handle the EOF, and you'll get a stack trace ending like:

  File "/usr/lib/python2.4/site-packages/pexpect.py", line 1510, in __interact_read
    return os.read(fd, 1000)
OSError: [Errno 5] Input/output error


You could decide to catch and ignore the OSError - I expect that if it comes from interact(), it's typically this.

logging, echo

Unsorted

On matching end of line

Since expect() matches substrings of possibly-buffered output, matching against $ won't do what you might expect - it will always match, at the end of the smallish window it is currently looking at.

So you probably want to match [\r\n] instead(verify)
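A sketch of matching explicit line endings instead of $ (printf produces two known lines; under a pty they end in \r\n):

```python
import pexpect

p = pexpect.spawn('printf', ['one\\ntwo\\n'], encoding='utf-8')
i = p.expect(['one\r\n', pexpect.EOF])   # match the literal line ending, not $
j = p.expect(['two\r\n', pexpect.EOF])
p.expect(pexpect.EOF)
p.close()
```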