Python parsing stuff
Command line argument parsing
getopt is the simplest form and takes few lines of code, though it is not as helpful as...
optparse, which was historically the more flexible/helpful thing
- optparse (≥py2.3, no development since 2.7 because...)
- argparse is what development moved to (≥py2.7).
Note that these are mostly for command line options that adhere fairly closely to the POSIX-recommended argument syntax (short and long styles; getopt is the most basic form),
not necessarily your own creative definitions.
optparse
Example:
from optparse import OptionParser

p = OptionParser()  # Has some arguments, but we're ignoring them here
p.add_option("-o", "--output", dest="outputfile", action="store",
             help="write output to named file")
p.add_option("-s", "--show", dest="show", action="store_true", default=False,
             help="show output image in a window")
options, args = p.parse_args()  # defaults to parsing sys.argv[1:]
# note that errors in the arguments mean this call exits the program.
print('options: %r' % options)
print('args: %r' % args)
Basic notes:
- help generation, basic store-what logic, and basic error handling are done for you.
- ...so you mostly just specify how each argument should be handled
- typically you still want to do some checking of sensible values - e.g. in the example you may want to check whether the filename is valid, make it absolute, check that it doesn't exist
- you can specify a short (e.g. -s) and/or a long form (e.g. --show)
- -h and --help are registered by default
- the things you name in dest will sit in an attribute
- here options.outputfile and options.show
On add_option():
- you can specify a default value for each attribute to take when no storage action is taken.
- the default default value is None
- dest specifies the attribute on the options object that is returned.
- you can have the same dest from multiple options (e.g. for something that can have a handful of values)
- but frankly, in most cases it's more predictable/readable to do this in your own logic afterwards
- type requests conversion to a specific type.
- Built-in: "string", "int", "long", "float", "complex", "choice" (and you can specify your own)
- action can be one of the following: (the first few are probably most common)
- "store": takes the next string on the argument list and stores it
- "store_true", "store_false": specific cases of store_const for True and False. Useful for toggling things.
- "store_const": store a pre-set value (from argument called const). No value is taken from the user arguments
- "append": like store, but if a value was already present, we append instead of overwrite
- "append_const": append, but with a configured value instead of a user value
- "count": count the number of times something is mentioned. You can use it to handle something like -v, -vv, -vvv, etc. as different levels of verbosity.
- "callback": Call a named function. Mostly useful for hacking your own functionality on top, such as additional checks (e.g. checking whether one option was already set, used after another, reaction to ' -- ' meaning 'no more processing', etc.)
- "help": react to use of this argument by printing help (You'll probably rarely do this yourself, because it's registered under -h and --help by default)
- "version": react to use of this argument by printing a value handed along. You'll rarely do this yourself.
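A minimal sketch of a few of these actions together, to show what ends up on the options object. The option names (-v, -I, -n, --fast, --slow) and the helper build_parser() are made up for illustration, and the argument list is passed in literally rather than taken from sys.argv.

```python
from optparse import OptionParser


def build_parser():
    """Hypothetical parser exercising count, append, store with a type, and store_const."""
    p = OptionParser()
    # count: each mention of -v adds one, so -vvv gives 3
    p.add_option("-v", action="count", dest="verbosity", default=0,
                 help="increase verbosity (repeatable)")
    # append: collects values into a list (stays None if never given)
    p.add_option("-I", action="append", dest="include_dirs",
                 help="add a directory (repeatable)")
    # store with type: the string is converted to int for us
    p.add_option("-n", "--number", action="store", type="int", dest="number",
                 help="an integer value")
    # two options sharing one dest, each storing its own constant
    p.add_option("--fast", action="store_const", const="fast", dest="mode",
                 default="normal", help="use fast mode")
    p.add_option("--slow", action="store_const", const="slow", dest="mode",
                 help="use slow mode")
    return p


options, args = build_parser().parse_args(
    ["-vvv", "-I", "a", "-Ib", "-n", "42", "--slow", "file.txt"])

print(options.verbosity)     # 3
print(options.include_dirs)  # ['a', 'b']
print(options.number)        # 42
print(options.mode)          # slow
print(args)                  # ['file.txt']
```

The append option is deliberately left without a default: a default=[] list would be shared between parse_args() calls on the same parser.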
On errors:
- optparse's response to errors consists mostly of printing an error message and exiting. There is no dedicated parse exception to catch (only the SystemExit raised by the exit itself), and no option to ignore errors.
- If you want that, you'll have to subclass OptionParser and override its exit() and/or error().
- if you want to raise an error during your own sanitizing
- look at p.error()
- you may also want to play with p.print_help(), sometimes change p.usage, etc.
- this is a little finicky, and one reason to use argparse, or maybe docopt
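The subclassing route mentioned above can be sketched roughly as follows. ParseError, RaisingOptionParser and parse() are hypothetical names; only OptionParser and its error() method come from optparse itself.

```python
from optparse import OptionParser


class ParseError(Exception):
    """Hypothetical exception, raised instead of printing and exiting."""


class RaisingOptionParser(OptionParser):
    def error(self, msg):
        # The stock error() prints usage plus the message and then exits;
        # here we hand the message to the caller as an exception instead.
        raise ParseError(msg)


p = RaisingOptionParser()
p.add_option("-n", "--number", type="int", dest="number")


def parse(argv):
    """Return the parsed number, or an error string if parsing failed."""
    try:
        options, args = p.parse_args(argv)
    except ParseError as e:
        return "error: %s" % e
    return options.number


result_ok = parse(["-n", "5"])      # the int 5
result_bad = parse(["-n", "five"])  # a string starting with "error:"
print(result_ok)
print(result_bad)
```

Note that this only covers error(). Things like -h still go through exit(), so you would override that as well if you never want the parser to end the process.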
See also:
- http://www.gnu.org/prep/standards/standards.html#Command_002dLine-Interfaces
- http://www.faqs.org/docs/artu/ch10s05.html
getopt
Less powerful than optparse, also less code to write.
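Separates/extracts options without values, options with values, and things not part of the options (e.g. filenames meant to be passed in). Note that getopt.getopt() stops scanning at the first non-option argument, so it does not pick up options that come after arguments; getopt.gnu_getopt() does allow options and arguments to be intermixed.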
See http://docs.python.org/library/getopt.html
Example (mostly from python docs):
getopt is generally applied to sys.argv[1:], i.e. getopt.getopt(sys.argv[1:], ...).
For the code example's sake, a list is literally supplied here:
import getopt
#In the real world you would use sys.argv[1:]
example = ['-a', '-b', '-cfoo', '-d', 'bar', 'file1', 'file2']
optlist, args = getopt.getopt(example, 'abc:d:')
Now optlist is [('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')], and args is ['file1', 'file2']
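On options that appear after arguments: a small sketch comparing getopt.getopt() with getopt.gnu_getopt() on the same made-up argument list. The results in the comments assume the POSIXLY_CORRECT environment variable is not set, since gnu_getopt() falls back to the stricter behaviour when it is.

```python
import getopt

# Hypothetical argument list with an option (-b) after a non-option (file1)
argv = ['-a', 'file1', '-b', 'file2']

# getopt() stops at the first non-option, so -b is left among the arguments
opts_posix, args_posix = getopt.getopt(argv, 'ab')
print(opts_posix)  # [('-a', '')]
print(args_posix)  # ['file1', '-b', 'file2']

# gnu_getopt() keeps scanning, so options and arguments may be intermixed
opts_gnu, args_gnu = getopt.gnu_getopt(argv, 'ab')
print(opts_gnu)    # [('-a', ''), ('-b', '')]
print(args_gnu)    # ['file1', 'file2']
```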
Can also take long options. Example (mostly from python docs):
import getopt
import sys

# usage() is assumed to be defined elsewhere and to print a usage summary
try:
    opts, args = getopt.getopt(sys.argv[1:],
                               "ho:v",
                               ["help", "output="])
except getopt.GetoptError as err:
    print(str(err))  # will print something like "option -a not recognized"
    usage()
    sys.exit(2)

# default values, can be overwritten by the actual options
output = None
verbose = False

# iterate over the things we got
for o, a in opts:
    if o == "-v":
        verbose = True
    elif o in ("-h", "--help"):
        usage()
        sys.exit()
    elif o in ("-o", "--output"):
        output = a
    # ...
    else:
        raise RuntimeError("unhandled command line option")
argparse
TODO: detail it
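As a starting point, a minimal sketch of roughly the optparse example from earlier, written with argparse. The -v/--verbose option, the files positional and build_parser() are additions for illustration, and the argument list is passed in literally instead of coming from sys.argv.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the optparse example's -o and -s options."""
    p = argparse.ArgumentParser(description="example argument parser")
    p.add_argument("-o", "--output", dest="outputfile",
                   help="write output to named file")
    p.add_argument("-s", "--show", action="store_true",
                   help="show output image in a window")
    p.add_argument("-v", "--verbose", action="count", default=0,
                   help="increase verbosity (repeatable)")
    # Positional arguments are declared too, instead of coming back
    # as a separate leftover list the way they do in optparse.
    p.add_argument("files", nargs="*", help="input files")
    return p


ns = build_parser().parse_args(
    ["-o", "out.png", "-s", "-vv", "in1.png", "in2.png"])

print(ns.outputfile)  # out.png
print(ns.show)        # True
print(ns.verbose)     # 2
print(ns.files)       # ['in1.png', 'in2.png']
```

The main visible differences from optparse: add_argument() instead of add_option(), a single namespace object instead of an (options, args) pair, and positionals being part of the declaration.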
click
Lets you declare options and arguments with decorators on a function; the parsed values are then passed in as that function's parameters.
https://click.palletsprojects.com/
Byte parsing
Simpler
Very regular patterns may be most easily parsed with struct.
...or numpy if that's where you want it to end up anyway.
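A minimal sketch of the struct approach, assuming a made-up record layout: a 4-byte magic string, two unsigned 16-bit fields, and an unsigned 32-bit payload length, all little-endian, followed by the payload itself. The layout and the names HEADER and parse_record() are hypothetical.

```python
import struct

# "<" = little-endian, no padding; 4s = 4 raw bytes; H = uint16; I = uint32
HEADER = struct.Struct("<4sHHI")


def parse_record(data):
    """Parse one record of the hypothetical format from a bytes object."""
    magic, version, flags, length = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    return {"magic": magic, "version": version,
            "flags": flags, "payload": payload}


# Build a sample record with the same format, then parse it back
sample = HEADER.pack(b"DEMO", 1, 0, 5) + b"hello"
record = parse_record(sample)

print(HEADER.size)        # 12
print(record["magic"])    # b'DEMO'
print(record["payload"])  # b'hello'
```

Precompiling the format with struct.Struct is optional; struct.unpack_from("<4sHHI", data) does the same thing, and is fine when you only parse a few records.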
construct
construct is a library that takes a protocol description and can create a byte parser, as well as create data according to this format.
Going both ways means it is a declarative language of its own that can do complex things, once you grasp how it works.
See also: