Python notes - syntax and language
On language, setup, environment
(Major) Python implementations
CPython is the C implementation of Python, is the usual implementation used as a system Python, and is also the reference implementation of Python.
Jython implements Python in Java. Apparently it is only slightly slower than CPython, and it brings in the Java standard library to be used from Python code, though you lose C extensions(verify).
IronPython compiles python to IL, to run it on the .NET VM.
It performs similarly to CPython (some things are slower, a few things faster, even) but like any other .NET language, you get .NET interaction.
You lose the direct use of C extensions (unless you have fun with C++/CLI), though .NET itself often has some other library to the same effect.
Python for .NET is different from IronPython in that it does not produce IL or run on the .NET VM, but is actually a managed C interface to CPython(verify) (which also seems to work on Mono).
While somewhat hairier than IronPython, it means you can continue to use C extensions, as well as interact with .NET libraries; the .NET library can be directly imported, and you can load assemblies.
There is also PyPy [1] [2], which is an implementation of python in python. It seems this was originally for language hacking and such (since it's easier to experiment with the language when it is implemented in Python rather than in C), but it has since become a good JIT compiler (relying for a good part on RPython, a subset of Python that can be statically compiled) that can give speed improvements similar to the now-aging psyco.
Help / documentation
A pre-code(verify) unassigned string at module, class, or function level is interpreted as a docstring (stored in its __doc__ attribute).
Docstrings will show up in documentation that can be automatically generated based on just about anything. For example:
>>> class a:
...     "useless class"
...     def b():
...         "method b does nothing"
...         pass
... 
>>> help(a)
Help on class a in module __main__:

class a
 |  useless class
 |
 |  Methods defined here:
 |
 |  b()
 |      method b does nothing
Help exists on most builtins and system modules, and also on anything of yours that you've added docstrings to:
>>> help(id)
Help on built-in function id in module __builtin__:
id(...)
    id(object) -> integer

    Return the identity of an object. This is guaranteed to be unique among
    simultaneously existing objects. (Hint: it's the object's memory address.)
...sometimes providing nice overviews. For example, help(re) includes:
compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
escape(pattern)
    Escape all non-alphanumeric characters in pattern.
findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.
finditer(pattern, string, flags=0)
    Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.
help() is useful in interactive interpreters, and the same docstrings feed automatic documentation generators.
See for example:
- epydoc (HTML, result looks like [5], rather like the Java API docs)
- Docutils (HTML, LaTeX, more?)
- HappyDoc (HTML, XML, SGML, PDF)
- a filter for doxygen (not as clever)
- ROBODoc?
- TwinText?
- Natural Docs?
callable
(Note: this applies to a few languages beyond python)
You'll see this word where you might expect 'function', because you can call not only functions but also methods and classes (or, technically, types) - more specifically, any object with a __call__ method.
In many situations where you could pass a function, you can pass any callable, because most of the time all the backing code does is call the object.
Duck typing means you don't really need to care about what it's technically called either.
To test whether something can be called, you could use callable() (a built-in).
If you wish to test for more specific cases (callable class? function? method?), you can use the inspect module (see its help() for more details than some html documentation out there seems to give).
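For example, a quick sketch of both kinds of test:

import inspect

def f():
    pass

class C(object):
    def __call__(self):
        pass

print( callable(f) )        # True
print( callable(C) )        # True  - classes can be called (to instantiate)
print( callable(C()) )      # True  - instance with a __call__ method
print( callable(42) )       # False
print( inspect.isfunction(f) )           # True
print( inspect.ismethod(C().__call__) )  # True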
singularity on top of immutability
Some things in python are singular (same value implies same identity), on top of being immutable, by design.
- For example None, True, and False (and, in CPython, small integers) - most languages do something like that, because things get very weird otherwise.
Things that are immutable are not necessarily singular
- for example strings
- although there are cases where they seem to act that way, for example in string literals (are there further details?(verify))
- But, for example:
a = 'foo'
b = 'foot'[:3]
c = 'foo'
assert(a is c)
assert(a==c)
assert(a is not b)
assert(a==b)
You could say this messes with the identity abstraction, but it is primarily there to make life simpler, and generally does.
Identity is compared with is, which you can think of as comparing the built-in id() of both sides.
For example, you can test against types and None as if they are values, meaning you can use either is or == without having one of them mean something subtly but fatally different. In practice this seems better than having to know all the peculiarities of the typing system (if only because we tend to have to know several languages').
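For example:

x = None
print( x is None, x == None )                   # True True  ('is' is the idiomatic spelling)
print( type('s') is str, isinstance('s', str) ) # True True  (same idea for non-subclassed types)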
Calling superclass methods, super()
Firstly, the standard remark: if you're making inheritance diamonds, this is complexity any way you twist it, so ask whether it is necessary, rather than happy-go-lucky class modelling.
If you use super(), it should be used consistently, when you know the potential problems and can explain to other people why it won't fail. (Read the two things linked to, or something like it)
Many argue that it's more understandable and less error-prone to handle superclass calls explicitly.
Since superclassing is effectively part of a class's external interface anyway (and so is super, if you use it), you might as well be explicit, rather than have it be hidden by implied semantics.
While more verbose than super(), it's easier to follow, maybe less fragile for later changes, and some mistakes are probably easier to spot by being explicit rather than coming from magical implicit behaviour. (you can argue about the fragility - yes, it will cause errors quickly when you change arguments, but that's arguably preferable over the alternative)
One assumption here is that class inheritance is used for eliminating redundancy in your own codebase, not for flexibility.
However, when writing things like mixins (or abstract classes or interfaces), you may still need to know all about super()
Example of explicit superclass calls:
class Bee(object):
    def __init__(self):
        print( "<Bee/>" )

class SpecialBee(Bee):
    def __init__(self):
        print( "<SpecialBee>" )
        Bee.__init__(self)
        print( "</SpecialBee>" )

class VerySpecialBee(SpecialBee):
    def __init__(self):
        print( "<VerySpecialBee>" )
        SpecialBee.__init__(self)
        print( "</VerySpecialBee>" )

VerySpecialBee()
This goes for any method (the constructor isn't really a special case), but it's a common example of why arguments may get in the way of super() being particularly useful.
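For contrast, a minimal sketch of the same idea using super() (py3 call style); with single inheritance the effect is identical, the differences only show up in diamond-shaped hierarchies:

class Bee(object):
    def __init__(self):
        print( "<Bee/>" )

class SpecialBee(Bee):
    def __init__(self):
        print( "<SpecialBee>" )
        super().__init__()   # resolved via the MRO rather than named explicitly
        print( "</SpecialBee>" )

SpecialBee()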
See also:
'call at python exit'
Use the atexit module.
Avoid assigning a callable to sys.exitfunc yourself (py2 only; it was removed in py3), since you may be effectively removing something already set there (you could make it a function that also calls what the function was previously set to, but there are sometimes hairy details to that, like how you deal with exceptions(verify))
Note that there is never a hard guarantee that this code will get run, considering things like segfaults - which Python itself should be pretty safe from, but isn't too hard to create in a C extension.
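A minimal example:

import atexit

def cleanup():
    print( "interpreter is exiting" )

atexit.register(cleanup)   # can also be used as a decorator: @atexit.register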
Builtins
Built-ins are things accessible without specific imports. The following are the 2.4 built-ins, a mix of types and functions, roughly grouped by purpose.
- dir, help
- str, unicode (and their virtual superclass, basestring)
- oct, hex, ord, chr, unichr
- int, long, float, complex,
- abs, round, divmod, min, max, pow
- tuple, list
- len, sum
- filter, reduce, map, apply
- zip
- iter, enumerate
- reversed, sorted
- cmp
- range, xrange
- dict, intern
- set, frozenset
- bool
- coerce
- slice (used only for extended slicing - e.g. [10:0:-2])
- buffer (a hackish type convenient to CPython extensions and some IO)
- object
- hash, id
- str and repr -- see __repr__ and __str__
- hasattr, getattr, delattr
- type, isinstance, issubclass (variations on type comparison / subclass testing)(verify)
- e.g. type(x) is str is functionally much the same as isinstance(x, str), but isinstance is a little more flexible in that it also accepts subclassed cases
- staticmethod, classmethod
- super
- property
- exception
- callable
- locals, globals
- vars
- eval, compile
- execfile
- __import__: the function that the import statement uses
- reload
- file, open
- input, raw_input
Shallow and deep copy
General shallow/deep copies are possible (on top of the basic reference assignment).
The following demonstration uses a list as the container, but this applies to objects just as well. Objects and mutable structures like lists themselves contain references, so 'copying' such a thing is ambiguous - which is exactly why there is a distinction between shallow and deep copying.
>>> from copy import copy #shallow copy - but note there are easier ways for lists
>>> from copy import deepcopy
>>> a = [object(),object()] #original list
>>> b = copy(a)
>>> c = deepcopy(a)
>>> a
[<object object at 0xb7d21448>, <object object at 0xb7d21468>]
>>> b
[<object object at 0xb7d21448>, <object object at 0xb7d21468>]
>>> c
[<object object at 0xb7d21450>, <object object at 0xb7d21458>]
The shallow copy, b, is a new list object (id(a) != id(b)), into which references to the objects in the old collection are inserted.
The deep copy, c, is a new list object but also creates copies of the contained objects to insert into that new container.
With objects, or structures that contain objects, what you often mean to do is making a deep copy.
Note that deep copying only works cleanly when creating these objects has no peculiar side effects, and does not rely on administrative data or object references that would not mean the same thing in the copy.
Such issues limit deep copying in any language. There are usually partial fixes, often in the form of some way to optionally override deep-copy behaviour with your own functionality via an interface. Note that python's deepcopy does avoid circular-reference recursion problems.
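In python, that override hook is the __copy__ / __deepcopy__ protocol. A rough sketch (the Handle class, and its choice to share the resource on purpose, are made up for illustration):

import copy

class Handle:
    "wraps some resource that we do not want duplicated by deepcopy"
    def __init__(self, resource):
        self.resource = resource
        self.notes = []

    def __deepcopy__(self, memo):
        new = Handle(self.resource)                  # share the resource on purpose
        new.notes = copy.deepcopy(self.notes, memo)  # but copy our own mutable state
        return new

a = Handle(resource=object())
b = copy.deepcopy(a)
print( a.resource is b.resource )   # True  - shared by our own choice
print( a.notes is b.notes )         # False - copied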
String stuff
String formatting
% with a tuple
Those coming from C will probably appreciate the % operator, which
- acts like sprintf() and
- mostly matches the classical C format strings
- ...omits p (there are no pointers), adds r for repr()
Example:
"%d %5.1f"%( 1,2 ) == '1 2.0'
% with a mapping
If you pass it a mapping (dict or similar) you can access them by name rather than position:
"%(s)s %(foo)07o %(bar)5.1f"%{ 's':'yay', 'foo':1, 'bar':2 } == 'yay 0000001 2.0'
format()
format() seems to understand...
- positional and name arguments
- very similar (but not identical) conversion specifiers (effectively defaults to s)
but does everything else in a more flexible style.
There is a decent introduction in https://pyformat.info/
Some examples:
# {} enumerates by position by default, so e.g.
'{} {}'.format( 4,8 ) == '4 8'
# You can explicitly index
'{1} {0}'.format(4,8) == '8 4'
# You can use named indexes
'{foo} {bar}'.format(foo=1,bar=2) == '1 2'
# Similarly, you can use with dicts like
data = {'foo':1, 'bar':2}
'{foo} {bar}'.format( **data ) == '1 2'
# Also consider the ability to do:
'{data[foo]} {data[bar]}'.format( data={'foo':1, 'bar':2} ) == '1 2'
# alignment and pad like:
'|{:<10.1f}|{:^10.1f}|{:_>10.1f}|'.format( 3.14, 3.14, 3.14) == '|3.1       |   3.1    |_______3.1|'
# It understands strftime style datetime formatting
'{:%Y-%m-%d %H:%M}'.format(datetime(2001, 2, 3, 4, 5)) == '2001-02-03 04:05'
# You can pass in parameters into the formatting,
# by nesting, in this style (this would take multiple steps and be nasty and confusing to do with %):
'{:^{width}.{prec}}'.format( 3.14159265, width=10, prec=3) == '   3.14   '
Note that due to format() being a function, you can effectively make formatting functions, like:
tab_cols = '{count}\t{url}'.format
...which lets you later do
tab_cols(url="http://example.com", count=2)
f-string formatting
PEP 498 (implemented since py3.6) adds f-string formatting.
Introduction-by-example: where previously you might do
'{}'.format( time.time() )
you can now do
f'{time.time()}'
One difference to format() is that the braces take expressions, evaluated at runtime (basically in the current scope), instead of just names,
and you can basically put all your code inside the string (but may not want to).
This can be less typing and can make it clearer what is getting placed where in a string.
This follows format(), not sprintf, so if you hadn't learned that before, now's a better reason to.
(...and now is the time to learn that format() doesn't coerce nearly as much -- in particular, you can't shove everything into the quite-forgiving s formatter anymore. In debugging, you may care to use {str(foo)} or {repr(foo)}, or the !s / !r conversions.)
Upsides:
- can sometimes be a bunch more readable
- and shorter
- moving things inside can make it clearer what is getting placed where, and many-item cases are less likely to end up incorrect
- compare
- '%s %s (%s) %s %s %r'%(a, lot, of, items, between, lines)
- f'{a} {lot} ({of}) {items} {between} {repr(lines)}'
- ...particularly when you add or remove entries
Arguables/downsides:
- can sometimes be a bunch less readable
- and less structured
- the more code you move inside strings, the more you
- lose syntax highlighting
- lose editor autocompletion
- lose a linter being able to signal errors
- lose useful stack traces
- it's mostly like, but not actually equivalent to .format()
- format couldn't push code inside, just do selective code-like things(verify)
- if you hadn't already shifted to format(), and come from percent formatting, you have a new syntax to learn
- and a whole new set of dunders and their behaviours and idiosyncrasies. No peeking - do you know offhand which underlying dunder and PEP definition means that f'{datetime.datetime.now():<20s}' does not do what you'd expect, while f'{datetime.datetime.now()}' is fine? (the actual reason is implied by the following)
- It's not clear when it's better to do formatting ahead of time or not. Say, one would prefer
f'Updated {date.today().strftime("%d-%m-%Y")}'
- another would suggest it is probably cleaner to do
today_s = date.today().strftime("%d-%m-%Y")
f'Updated {today_s}'
- another would point out
f'Updated {date.today():%d-%m-%Y}'
Some people consider them to deprecate percent-formatting and format(), others think that's not great.
__repr__ and __str__, and __format__
__repr__ and __str__ support repr() and str().
e.g. str(x) is basically equivalent to x.__str__()
You should generally treat both str and repr (and their dunders) as useful for debugging only, and not for active functionality that other things rely on.
There is nothing stopping you from making either (or even both) actively useful and functional (while also not necessarily breaking from their intended purpose), but making functional assumptions on them is likely to be an area for exciting bugs.
__repr__ intends to be unambiguous, __str__ intends to be readable
- which can sometimes be the same, practically
- ...and code-wise, str() in fact seems to fall back to __repr__ if __str__ is not there(verify)
- repr should let you distinguish between distinct objects even if they are equivalent
- for custom-defined classes this is often <class name at memory location> because that is the inherited implementation from object.__repr__
- (even if classes override this, you can get that behaviour back calling that directly)
- for anything with singleton behaviour, just the value should be enough -- and python does that for e.g. str, int
- python docs suggest that if possible, you should make __repr__ look like a Python expression that creates an object with the same value.
- look like (probably because it's more useful to debugging than the default?), but it need not actually be possible to evaluate.
- Nor is it required that eval(repr(foo)) == foo - that would arguably be dangerous to require and do, and for a lot of objects it is impossible anyway.
- e.g. for uuid objects
- its __str__() gives something like "2d7fc7f0-7706-11e9-94ae-0242ac110002" and
- its __repr__() gives something like "UUID('2d7fc7f0-7706-11e9-94ae-0242ac110002')" - which in this case you can instantiate like that (given that UUID is bound in the relevant scope)
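A small sketch of the difference on a custom class (Point is made up for illustration):

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # aims to look like the expression that would recreate it
        return 'Point(%r, %r)' % (self.x, self.y)

    def __str__(self):
        # aims to be readable
        return '(%s, %s)' % (self.x, self.y)

p = Point(1, 2)
print( repr(p) )            # Point(1, 2)
print( str(p) )             # (1, 2)
print( '%s %r' % (p, p) )   # (1, 2) Point(1, 2)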
Because people really like f-strings and format(),
you will find that if you expect
f'{thing}'
to do the same thing as
'%s'%(thing)
then you will sometimes be really confused.
Quick, what does
f'{datetime.date.today():20s}'
display?
...correct, it displays '20s'. Yeah, WTF.
Turns out sometimes there are cases you specifically want to force str() behaviour. (Also, repr() can be simpler than the format string way).
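For example, the !s conversion (or an explicit str()) forces things through __str__ before the format spec is applied; using a fixed date to keep the output predictable:

import datetime

today = datetime.date(2001, 2, 3)
print( f'{today:20s}' )       # '20s'                   - the spec is handed to date.strftime()
print( f'{today!s:20s}' )     # '2001-02-03          '  - forced through str() first, then padded
print( f'{str(today):20s}' )  # same effect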
Functional-like things
Note on closures
See Closures for the basic concept. They were apparently introduced in python around 2.2.
Note that python closures are read-only, which means that trying to assign to a closed-over variable will actually mean creating a local variable with the same name (py3's nonlocal, shown below, is the escape hatch). This works at function level, so even:
def f():
    x = 3
    def g():
        y = x    # ...with the idea that this would create a local y
        x = y    # and this a local x...
        print( x )
    g()
f()
...won't work; which names are local is decided at function compilation time, so the assignment means "x is local to g", and the earlier line then uses it before anything is assigned to it (an UnboundLocalError).
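In py3 you can explicitly rebind an enclosing function's variable with nonlocal (PEP 3104):

def f():
    x = 3
    def g():
        nonlocal x   # rebind f's x instead of creating a local
        x = x + 1
        print( x )
    g()
    print( x )

f()   # prints 4, then 4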
Lambda, map, filter, reduce
Lambda expressions are functions that take the form lambda args: expression, for example lambda x: 2*x. They must be a single expression, so they cannot contain statements and therefore no complex code (unless they call functions).
They can be useful in combination with e.g. map(), filter() and such. For example:
Map gives a new list (in py3, a lazy iterable) that comes from applying a function to every element of an input list/iterable. For example:
>>> list(map( lambda x:4*x, ['a',3] ))
['aaaa', 12]
Filter keeps only the elements for which the function returns true. For example:
>>> list(filter( lambda x: x%7==0 and x%2==1, range(100) )) #odd multiples of 7 under 100
[7, 21, 35, 49, 63, 77, 91]
>>> list(filter(lambda x: len(x[1])>0, [[1,'one'],[2,''],[3,'three']] ))
[[1, 'one'], [3, 'three']]
Reduce does a nested bracket operation (e.g. ((((1+2)+3)+4)+5) when using the + function) on a list to reduce it to one value (in py3 it lives in functools, so from functools import reduce). For example:
>>> reduce(max, [1,2,3,4,5]) # note that for the specific case of max, max([1,2,3,4,5]) is shorter
5
>>> reduce( lambda x,y: str(x)+str(y), range(12) ) # (a slow way of constructing this string, actually)
'01234567891011'
Generators: yield and next
Generators are functions that keep their state (basically coroutines), and yield things one at a time, usually generated/processed on-the-fly.
For example, if you have
def twopowers():
    n = 1
    while True:
        yield n
        n *= 2
Then you can at any time e.g. ask for the next ten, evaluated exactly when you ask for them, and there's not really a point where this ends:
>>> t = twopowers()
>>> list( next(t) for _ in range(10) )
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
>>> list( next(t) for _ in range(10) )
[1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288]
>>> list( next(t) for _ in range(10) )
[1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728, 268435456, 536870912]
In terms of language theory, the construct this uses is closely related to coroutines and continuations.
It makes streaming data and other forms of lazy evaluation easy,
and encourages a more functional programming style and a more data-streaming one.
Generator expressions are a syntax that look like list comprehensions, but create and return a generator.
gen = (x for x in range(101))
next(gen)   # 0
next(gen)   # 1
next(gen)   # 2
...and so on.
Many are finite, but they don't have to be.
This is often done for lazy evaluation, and they can be a memory/CPU tradeoff, in that at the cost of a little more overhead you never have to store a full list.
Consider, for example (use range() in py3):
max( [x for x in xrange(10000000)] ) # Memory use spikes by ~150MB
max( list(x for x in xrange(10000000)) ) # (...because it is a shorthand for this)
max( (x for x in xrange(10000000)) ) # Uses no perceptible memory (inner expression is a generator)
max( x for x in xrange(10000000) ) # Uses no perceptible memory (inner expression is a generator)
Notes:
- Sometimes this makes for more brevity/readability (though I've seen a bunch of syntax-fu that isn't necessarily either).
- the last line illustrates that in various places the brackets aren't necessary (where omitting them is unambiguous)
- In python2, you often wanted xrange, a generator-based version, where range() returned a list.
- In python3 range() is a generator-like object so the distinction no longer exists.
- Don't use the profiler to evaluate xrange vs. range; it adds overhead to each function call, of which there will be ten million with the generator and only a few when using [] / list(). This function call is cheap in regular use, but not in a profiler.
- when iterating over data, enumerate is often clearer (and less typing) than range
Iterables
In general, iterators allow walking through iterables with minimal state, usually implemented by an index into a list or a hashmap's keys, or possibly a pointer if implemented in a lower-level language.
In python, most collection types are iterable on request. Iterating dicts implies their keys. I imagine this is based on __iter__.
Notes:
- Iterations won't take it kindly when you change the data you are iterating over, so something like:
a = {1:1, 2:2, 3:3}
for e in a:
    a.pop(e)
...won't work (it raises a RuntimeError about the dict changing size during iteration). The usual solution is to build a list of things to delete and do that after we're done iterating.
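For example, iterating over a snapshot of the keys, or collecting first and mutating afterwards:

a = {1:1, 2:2, 3:3}
for e in list(a):        # iterate over a snapshot of the keys
    a.pop(e)

b = {1:1, 2:2, 3:3}
to_delete = [k for k in b if k > 1]   # collect first...
for k in to_delete:                   # ...then mutate
    b.pop(k)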
Syntax things
lists
More things to do with lists...
range, xrange
>>> range(4)
[0,1,2,3]
>>> range(2,4)
[2,3]
>>> range(4,2,-1)
[4,3]
In py2, range() returns a list and xrange is the generator-like equivalent you would prefer when giving it very large values. In py3, range() returns a lazy range object, so the distinction no longer exists.
slicing
Slicing retrieves elements from lists, very similarly to how range would generate their indices.
>>> a=['zero','one','two','three','four','five']
>>> a[4:]
['four','five']
>>> a[:4]
['zero','one','two','three']
>>> a[3:5]
['three','four']
>>> a[4:0:-2] # I rarely find uses for this, but it exists
['four', 'two']
collections
defaultdict
defaultdict mostly acts like a dict, except that reading a key that does not exist creates an entry for it (instead of raising a KeyError).
You specify the factory - usually a type - whose result is used as the created value.
For example, when counting words, where you previously might have written something like:
wc = {}
for word in some_words:
    if word not in wc:
        wc[word] = 1
    else:
        wc[word] += 1
...or perhaps some clever syntax-fu to shorten that, like:
wc = {}
for word in some_words:
    wc[word] = wc.get(word, 0) + 1   # get() allows a fallback, which defaults to None but here we force to 0 instead
...with defaultdict, you can do:
from collections import defaultdict

wc = defaultdict(int)
for word in some_words:
    wc[word] += 1
(...though for this particular example, note also the existence of collections.Counter)
"Wait, wasn't that already covered by dict.setdefault()?"
For reference, setdefault[6] means something like "If the key exists, do nothing. If this key does not exist, set with this value. Either way, return the value", so you can write the above example like:
wc = {}
for word in some_words:
    wc.setdefault(word, 0)
    wc[word] += 1
This definitely covers similar ground, yes, and either one can be more elegant in a specific solution.
defaultdict and other types
What you hand into defaultdict is instantiated when it's not in there yet. That counting example works in part because int() happens to be 0.
Since you can also hand in things like list and dict, you can do things like:
# for example, assuming you have a functional guess_subnet_for()
# that e.g. outputs '192.168.1.0/24' for 192.168.1.1, 192.168.1.2, etc.
subnet_lister = defaultdict(list)
for ip in some_ips:
    subnet_lister[ guess_subnet_for(ip) ].append( ip )
You can even go on to do nested structures (nested defaultdicts), though that can quickly get more obfuscating.
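For example, counts per (category, word) as a defaultdict of defaultdicts:

from collections import defaultdict

nested = defaultdict(lambda: defaultdict(int))
nested['animals']['cat'] += 1
nested['animals']['cat'] += 1
nested['plants']['fern'] += 1
print( nested['animals']['cat'] )   # 2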
Counter
You can use defaultdict to count, but you can also get a slightly more capable counter (note: a subclass of dict):
Takes any iterable, counts what comes out of that. Taking characters out of a string may not be the most useful example, but it is a short one:
>>> from collections import Counter
>>> c = Counter('communication')
>>> c
Counter({'c': 2, 'o': 2, 'm': 2, 'n': 2, 'i': 2, 'u': 1, 'a': 1, 't': 1})
>>> c.update('communication')     # add counts from another iterable
>>> del c['u']
>>> c['c'] = 10
>>> c
Counter({'c': 10, 'o': 4, 'm': 4, 'n': 4, 'i': 4, 'a': 2, 't': 2})
>>> c.most_common(3)
[('c', 10), ('o', 4), ('m', 4)]
OrderedDict
Dictionaries
Dictionary views
You probably noticed that in python 3, keys(), values(), and items() (on dict and similar) do not return a list of all the data, or even a generator as such, but a new kind of thing.
These are dictionary view objects, which
- seem to implement __len__, which e.g. a generator would not.
- seem to implement __contains__, for in
- can be iterated over
- can be repeatedly iterated over(verify) (unlike a generator)
- it seems to reflect the data as it was when it started each new iteration(verify)
- will reflect changes of the underlying data (but changing it while still iterating over them has undefined behaviour - may work perfectly fine, may not)
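For example:

d = {'a': 1, 'b': 2}
ks = d.keys()
print( len(ks), 'a' in ks )   # 2 True
d['c'] = 3
print( list(ks) )             # ['a', 'b', 'c'] - the view reflects the change
print( list(ks) )             # and can be iterated again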
for-else
You can write:
for i in foo:
    if bar(i):
        break
else:
    baz()
That else block with baz() will execute only if the for loop terminates normally -- i.e. not by the break.
(This may not be entirely obvious, though -- you would be excused thinking it would execute if there were zero things to iterate on. Which it will, but not because of that.)
While this is an extra code path to think about, and extra syntax to know about, that extra code path is sometimes a clean and short way to write something.
...yet a surprising number of examples I've seen are cases where it is unnecessary, and as a use of lesser-known syntax, the potential confusion seems to weigh harder.
Note that it reads somewhat similarly to exceptions, in the sense that it lets you write the "handle it if something was iffy" code at the bottom.
breaking out of multiple layers of for
One way is to set a flag, and test for that flag at multiple layers:
problem = False
for doc in docs:
    for p in paragraph(doc):
        for noun in p:
            problem = True   # really: set this when whatever test you have fails
            break
        if problem:
            break
    if problem:
        break
Works, not the most obvious, though sometimes gives you better control of when exactly to stop (e.g. cleanup that is "okay forget this item then").
Another way would be other things that add code paths, such as
- an exception
- an (inner) function that returns (for few lines of code this is overkill, sometimes it ends up being cleaner anyway)
Another way is the above-mentioned for-else:
for doc in docs:
    for p in paragraph(doc):
        if problem:
            break
    else:
        continue   # inner loop finished without a break: move on to the next doc
    break          # inner loop broke: propagate that break to this level
...although with more nesting you might end up doing a probably-less-readable:
for doc in docs:
    for p in paragraph(doc):
        for noun in p:
            if problem:
                break
        else:         # without these else/continue clauses, the middle for would "swallow"
            continue  #  the inner break and the outer levels would never notice it
        break
    else:
        continue
    break
walrus operator
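The walrus operator := (PEP 572, py3.8+) binds a value within an expression, which mostly saves a line in assign-then-test patterns. A small sketch:

import re

line = 'name=value'
if (m := re.match(r'(\w+)=(\w+)', line)) is not None:   # bind and test in one go
    print( m.group(1), m.group(2) )   # name value

# also common for read-loops (f and process being whatever applies):
# while (chunk := f.read(4096)):
#     process(chunk)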
Fixed arguments, keyword arguments, and anonymous keyword arguments
All arguments have names, bound within the function scope.
Positional, keyword, or both
Arguments can be passed in two basic ways:
- via positional arguments
- via keyword arguments
Notes:
- you can do everything by name
- you can do everything by sequence
- you can mix the two
- you can mix the styles when declaring, and when calling
- a non-keyword argument after a keyword argument is SyntaxError - in both function definition and in calls.
- it's common enough to define a function with a few required arguments up front (without defaults) so that they can be used positionally (and a call will fail if you don't use them), followed by a bunch of keyword (with-default) arguments so that use of each is optional.
- passing in two values to the same argument is considered a TypeError
- every parameter with a default is optional, every parameter without a default value is required
- not passing required arguments is considered a TypeError
For an example of the mix:
def within(a, b, within=0.25):
    return max( 0.0, 1 - abs(a-b)/within )

# can be called like:
within(1, 1.1)
within(1, b=1.1)
within(a=1, b=1.1)
within(1, 1.1, 0.5)
within(1, 1.1, within=0.5)
People somewhat prefer keyword arguments, as it means you can change the set of arguments without existing calls suddenly having a different meaning
- ...except that they cannot be renamed
This can e.g. be useful to make the actual call to a big initializer function forwards and backwards compatible between versions: the call will not fail when you use keywords that don't exist yet in some version, or that have since been removed (assuming it accepts **kwargs, see below).
...whether that function will then actually do what you want is another matter. If you write your API like this, you may wish to spend more time thinking about ignoring, warning, and throwing errors.
args, kwargs
At a lower level, this is arranged through a sequence and a mapping.
In general you don't have to think about that at all.
...but in some cases it makes sense, instead of accepting a bulk of parameters, to accept them as a sequence and a mapping.
In particular when you may wish to manipulate them, which can make sense e.g.
- when you are mostly passing through a bulk of arguments, or
- when you want to use super(),
...which lets you have signature differences / changes over time without as much bother.
Basically, anything on a call that is not matched to existing positional or keyword arguments (not shown above) will go into args/kwargs. Example:
def one(thing, *args, **kwargs):
    print( f'thing: {repr(thing)}' )
    print( f'args: {repr(args)}' )
    print( f'kwargs: {repr(kwargs)}' )
>>> one(1,2,3,four=4, q='five')
thing: 1
args: (2, 3)
kwargs: {'four': 4, 'q': 'five'}
You can also call using a sequence and dict as args and kwargs.
For example:
def spoon(foo, bar='t', quu=None):
    print( foo, bar, quu )

my_args = (1,)
my_kwargs = {'quu':3}
spoon(*my_args, **my_kwargs)   # 1 t 3

my_args = (1,2,3)
my_kwargs = {}
spoon(*my_args, **my_kwargs)   # 1 2 3
Actually, the definition of a call is a little more complex. You can also e.g. do:
- spoon(1,*args,**kwargs)
- spoon(1,b=2,*args,**kwargs)
In general there is no reason to do the above,
but it's great to pass through a bulk of arguments when you accepted them that way.
Around which you may have some questions.
- Say that the reason you are creating a wrapping function is that you want to change a parameter:
def f(a, b, c=3, d=4):
    print( a, b, c, d )

def wrap_f(*args, **kwargs):
    f(c=9, *args, **kwargs)

f(1,2)        # 1 2 3 4
wrap_f(1,2)   # 1 2 9 4
Fine. Now say we do:
def f(a, b, c=3, d=4):
    print( a, b, c, d )

def wrap_f(*args, **kwargs):
    f(c=11, *args, **kwargs)
One problem is that if we hand c into wrap_f (e.g. wrap_f(1, 2, c=13)), the call inside wrap_f raises a TypeError because it hands in two values for c.
We can avoid that (and make it clearer which value gets precedence) by defining it like:
def wrap_f(*args, **kwargs):
    kwargs['c'] = 11
    f(*args, **kwargs)
...which sometimes ends up looking something like:
def wrap_f(one_override=2, *args, **kwargs):
    kwargs['one_override'] = one_override   # (assuming the wrapped function actually accepts this keyword)
    f(*args, **kwargs)
...and yes, you can confuse yourself more with args/kwargs mixes.
See also:
- http://docs.python.org/reference/expressions.html#calls
- http://www.python.org/dev/peps/pep-3102/
- http://docs.python.org/tutorial/controlflow.html#more-on-defining-functions
Member-absence robustness
More than once I've wanted to check whether an object has a particular member -- without accessing it directly, since that would throw an AttributeError if it wasn't there.
One way is to use the built-in function hasattr(object,name) (which is just a getattr that catches exceptions).
Another is 'membername' in dir(obj) (does this have edge cases?(verify)).
If you want to get the value and fall back on a default if the member wasn't there, you can use the built-in getattr(object, name[, default]), which allows you to specify a default to return instead of throwing an AttributeError.
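For example:

class Thing(object):
    color = 'blue'

t = Thing()
print( hasattr(t, 'color') )        # True
print( hasattr(t, 'size') )         # False
print( getattr(t, 'size', None) )   # None - fallback instead of an AttributeError
print( 'color' in dir(t) )          # True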
OO
Accessors (property)
What are known as accessors in other languages can be done in python too, by overriding attributes. This is largely syntactic sugar for creating the functions.
The property() function serves approximately the same purpose as, say, C#'s property syntax.
Example:
class NoNegative(object):
    def __init__(self):
        self.internal = 1

    def get_x(self):
        print( "Get function" )
        return self.internal

    def set_x(self, val):
        print( "Set function" )
        self.internal = max(0, val)   # make sure the value is never negative

    x = property( get_x, set_x )

# Testing in the interactive shell:
>>> nn = NoNegative()
>>> print( nn.x )
Get function
1
>>> nn.x = -4
Set function
>>> print( nn.x )
Get function
0
Notes:
- The signature is property(fget, fset, fdel, doc), all of which are None by default and assignable by keyword.
- You should use new-style objects (the class has to inherit from object); without the inheritance you make a more minimalistic class in which the above would use members instead(verify).
- the property itself lives on the class, not in the instance's __dict__ (which allows you to create shadow members and do other funky things)
- Note that this indirection makes this slower than real attributes
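These days you often see the decorator spelling instead; roughly the same thing as the example above (without the prints, and using a differently named backing attribute):

class NoNegative(object):
    def __init__(self):
        self._internal = 1

    @property
    def x(self):
        return self._internal

    @x.setter
    def x(self, val):
        self._internal = max(0, val)   # never negative

nn = NoNegative()
nn.x = -4
print( nn.x )   # 0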
self
Python's syntax doesn't hide the fact that class functions pass/expect a reference to the object they are working on. This takes a little getting used to if you come from a language where the syntax simplifies/hides that.
Class functions must be declared with 'self', and must be called with an instance. This makes it clear to both python and to you whether you're calling a class function or not.
Consider:
def f():
    print( 'non-class' )

class c(object):
    def __init__(self):
        f()        # refers to the function above. prints 'non-class'
        self.f()   # refers to the method below. Prints 'class', and the object
                   # (note that self-calls imply adding self as the first parameter)

    def f(self):
        print( 'class; ' + str(self) )

    def g():       # uncallable through the instance - see note below
        print( 'non-class function in class' )

o = c()
c.f(o)   # c.f() without an argument would fail; no object to work on
You can mess with this, but it's rarely worth the trouble.
Technically, you can add non-class functions to a class, for example g() above. However, you can't call it: self.g() and o.g() fail because python adds self as the first argument, which always fails because g() takes no arguments. It does this because functions looked up through the class or instance become bound methods (via the descriptor protocol), so even h = o.g; h() won't work - h is already bound.
You can stick non-class functions onto the object after the fact, but if you do this in a class-naive way, python considers these as non-class functions and will not add self automatically. You can call these, but you have to remember they're called differently.
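A sketch of that; types.MethodType is the class-aware way to bind a function to an instance after the fact:

import types

class C(object):
    pass

def standalone():
    print( 'no self here' )

def wants_self(self):
    print( 'self is', self )

o = C()
o.h = standalone            # stored on the instance; python will not add self
o.h()                       # works, but called like a plain function

o.m = types.MethodType(wants_self, o)   # explicitly bind to the instance
o.m()                       # self is <__main__.C object at 0x...>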
Details to objects, classes, and types
New-style vs. classic/old-style classes
New-style classes were introduced in py2.2, partly to make the type system more intuitive by making classes types.
Old-style classes stayed the default (interpretation for classes defined like class Name: and not class Name(object):) up to(verify) and excluding python 3. Py3k removed old-style classes completely and made what was previously known as new-style behaviour the only behaviour.
When the distinction is there, new-style classes are subclasses of the object class - indirectly by subclassing another new-style class or built-in type, or directly by subclassing object:
class Name(object):
    pass   # things.
Differences:
- old-style objects will report being of type() instance, new-style objects will report their class name (__class__)
- ...actually, just see http://www.google.com/search?q=python+new+style+classes
Initialization, finalization
See also:
- http://www.python.org/download/releases/2.2/descrintro/#__new__
- http://docs.python.org/reference/datamodel.html#customization
Metaclasses
Consider that classes are objects that are templates for instantiations.
Metaclasses are templates to instantiate classes.
Usually, they are used as a fancy sort of class factory.
Generally, they will cause a headache. Don't use them unless you know you must, or really, really want to.
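A minimal sketch of the class-factory flavour (names made up for illustration; py3 syntax) - a metaclass that registers every class created through it:

registry = {}

class RegisteringMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        registry[name] = cls
        return cls

class Plugin(metaclass=RegisteringMeta):
    pass

class FancyPlugin(Plugin):
    pass

print( sorted(registry) )   # ['FancyPlugin', 'Plugin']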
See also
- http://www.python.org/doc/2.5.2/ref/node33.html
- http://www.python.org/doc/2.6.2/reference/datamodel.html
%
% is an operator that imitates C's (s)printf. For example:
"%3d.%20s"%( 1,'quu' ) == ' 1. quu'
It expects a tuple (not a list). There is an extra feature in that you can refer to data in a dict:
d={'id':3,'name':'foo'}
"%(id)3d. %(name)s"%d
(Note: if something is not present, this will cause a KeyError)
The dynamic typing means things are slightly more flexible. You can e.g. do "%s"%(3)
__underscore__ things
Note: the __rsomething__ variants for operators were omitted from this list for readability. See the notes on them below.
Metadata:
- __doc__: docstring
- __class__: class/type
- __name__: name (of the module, class, or function/method)
- __module__: module of definition (for classes, functions/methods)
Helper functions for internal handling, some on specific types, and sometimes conveniently accessible variations on built-in functions:
- __hash__, __cmp__ (see also __eq__; these three interact in objects when you want them to be usable as dictionary keys [7])
- __len__
- __init__: constructor
- __del__: destructor
- __new__ allows you to create a new object as a subtype of another (mostly useful to allow subclasses of immutable types) (new-style classes only)
- __repr__: supports repr().
- __reduce__
- __reduce_ex__
- __coerce__ (not always the same as the built-in coerce()(verify))
Helpers for syntax and operator handling (various are overloadable) (see also the operator module):
- __contains__: supports in
- __call__: for ()
- __getattr__, __getattribute__, __setattr__, __delattr__: for member access (using .) (See notes below)
- __getitem__, __setitem__, __delitem__: for [key]-based access (and preferably also slices; use of __getslice__, __setslice__, __delslice__ is deprecated). See also things like [8]
- __int__, __long__, __float__, __complex__: supports typecasts to these
- __hex__, __oct__: supporting hex() and oct()
- __abs__: supports abs()
- __and__: for &
- __or__: for |
- __lt__: for <
- __le__: for <=
- __gt__: for >
- __ge__: for >=
- __eq__: for ==
- __ne__: for !=
- __add__: for +
- __pos__: for unary +
- __sub__: for binary -
- __neg__: for unary -
- __mul__: for *
- __div__: for /
- __truediv__: for / when __future__.division is in effect
- __floordiv__: for //
- __mod__: for %
- __divmod__: returns a tuple, (the result of __floordiv__ and that of __mod__)
- __pow__: for **
- __lshift__: for <<
- __rshift__: for >>
- __xor__,: for ^
- __invert__: for ~
Special-purpose:
- __all__: a list of names that should be considered public(verify) (useful e.g. to avoid exposing indirect imports)
- __nonzero__: used in truth value testing
__i*__ variants
All the in-place variations (things like +=) are represented too, by __isomething__. For example: __iadd__ (+=), __ipow__ (**=), __irshift__(<<=), __ixor__ (^=), __ior__ (|=) and so on.
Why += isn't the same as + except when it is
Consider:
a = [1,2,3]
c = a
a += [4]
a = a + [5]
print(a)
print(c)
What does it output? Well, list + list like a + [5] creates a new list.
And we learned
a += b
is equivalent to
a = a + b
so it does exactly the same thing, right?
So we created a new list twice and it would output [1,2,3,4,5] [1,2,3] right?
No. It's:
[1, 2, 3, 4, 5] [1, 2, 3, 4]
What gives?
So.
- + is short for __add__
- += is short for __iadd__
And, here's the crux:
- += checks for the presence of __iadd__ (in-place add). If it's there, we use that. If not, we fall back to evaluating with __add__ and assigning the result.
So the meaning of += is dynamic:
- if the left side is mutable, like list, the two are not the same.
- if the left side is immutable, like with str or int, both are the same: evaluate-new-value-and-assign
(if you replace the lists with string "123" and "4" and "5" you do get 12345 and 123)
This also means
- you really shouldn't implement iadd on immutable objects
My take-away is that, like in the C days, operator overloading is nasty because the semantics are hidden. Avoid it where possible.
__r*__ variants
The __rsomething__ variants are operations with swapped operands. Consider x - y. This would normally be evaluated as
x.__sub__(y)
If that operation isn't supported, python looks at whether y has an __rsub__, and whether it can instead evaluate this as:
y.__rsub__(x)
The obvious implementation of both makes their evaluation equivalent. This allows you to define manual types which can be used on both sides of each binary operator and do so non-redundantly, and with a self-contained definition.
(Only for binary operator use: this doesn't apply to the unary - or +, and three-argument pow() will not try __rpow__)
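For example (the Meters class is made up for illustration):

class Meters(object):
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        return Meters(self.n + float(other))
    def __radd__(self, other):
        # used for  other + Meters(...)  when other's __add__ doesn't know what to do with us
        return Meters(float(other) + self.n)
    def __repr__(self):
        return 'Meters(%r)' % self.n

print( Meters(2) + 1 )   # Meters(3.0)
print( 1 + Meters(2) )   # Meters(3.0), via __radd__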
__getattr__ and __getattribute__
For old-style classes, if normal member access doesn't find anything, the __getattr__ is called instead.
In new-style classes, __getattribute__(self, name) is used for all attribute access. __getattr__ will only be called if __getattribute__ raises an AttributeError (and, obviously, if __getattr__ is defined).
Since __getattribute__ is used unconditionally, it is possible to create infinite loops when you access members on self in the self.name style from inside it. This is avoided by explicitly using the base class's __getattribute__ for that.
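A sketch of that (the Proxy class is made up for illustration):

class Proxy(object):
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattribute__(self, name):
        # self._wrapped here would recurse forever; go via the base class instead
        wrapped = object.__getattribute__(self, '_wrapped')
        if hasattr(wrapped, name):
            return getattr(wrapped, name)
        return object.__getattribute__(self, name)

p = Proxy([1, 2, 3])
print( p.count(2) )   # 1 - forwarded to the wrapped list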
See also: