Python notes - syntax and language


On language, setup, environment

(Major) Python implementations

CPython is the C implementation of Python, is the usual implementation used as a system Python, and is also the reference implementation of Python.


Jython implements python in Java. Apparently it is only slightly slower than CPython, and it brings in the Java standard library to be used from python code, though you lose C extensions(verify).


IronPython compiles python to IL, to run it on the .NET VM. It performs similarly to CPython (some things are slower, a few things faster, even) but like any other .NET language, you get .NET interaction.

You lose the direct use of C extensions (unless you have fun with C++/CLI), though .NET itself often has some other library to the same effect.


Python for .NET is different from IronPython in that it does not produce IL or run on the .NET VM, but is actually a managed C interface to CPython(verify) (which also seems to work on Mono).

While somewhat hairier than IronPython, it means you can continue to use C extensions, as well as interact with .NET libraries; the .NET library can be directly imported, and you can load assemblies.


There is also PyPy, an implementation of python in python. It seems this was originally for language hacking and such (since it's easier to experiment with the language in Python than in C), but it has since become a good JIT compiler (relying for a good part on RPython, a subset of Python that can be statically compiled) that can give speed improvements similar to the now-aging psyco.

Help / documentation

An unassigned string appearing before other code(verify) at module, class, or function level is interpreted as a docstring (stored in its __doc__ attribute).

Docstrings will show up in documentation that can be automatically generated based on just about anything. For example:

>>> class a:
...   "useless class"
...   def b():
...     "method b does nothing"
...     pass
...  
>>> help(a)
Help on class a in module __main__:

class a
 |  useless class
 |
 |  Methods defined here:
 |
 |  b()
 |      method b does nothing


Help exists on most builtins and standard-library modules, and also on anything of yours that you've added docstrings to:

>>> help(id)
Help on built-in function id in module __builtin__:

id(...)
    id(object) -> integer

    Return the identity of an object.  This is guaranteed to be unique among
    simultaneously existing objects.  (Hint: it's the object's memory address.)

...sometimes providing nice overviews. For example, help(re) includes:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.

escape(pattern)
    Escape all non-alphanumeric characters in pattern.

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

finditer(pattern, string, flags=0)
    Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.


help() is useful in interactive interpreters, and the same docstrings feed automatic documentation generators.

See for example:

  • epydoc (HTML; the result looks rather like the Java API docs)
  • Docutils (HTML, LaTeX, more?)
  • HappyDoc (HTML, XML, SGML, PDF)
  • a filter for doxygen (not as clever)
  • ROBODoc?
  • TwinText?
  • Natural Docs?


callable

(Note: this applies to a few languages beyond python)


You'll see this word where you might expect 'function', because you can call a function, a method, or a class (or, technically, a type).

More specifically: any instance with a __call__ method.


In many situations where you could pass a function, you can pass any callable, because most of the time all the backing code does is call the object. Duck typing means you don't really need to care about what it's technically called either.


To test whether something can be called, you could use callable() (a built-in).

If you wish to test for more specific cases (callable class? function? method?), you can use the inspect module (see its help() for more details than some html documentation out there seems to give).
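A quick sketch (class and names made up for illustration) of a callable instance, and the related tests:

import inspect

class Adder:
    "instances act like functions because of __call__"
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return x + self.n

add3 = Adder(3)

print( add3(4) )                     # 7   - the instance is called like a function
print( callable(add3) )              # True
print( callable(42) )                # False
print( inspect.isfunction(add3) )    # False - a callable instance, not a function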

singularity on top of immutability

Identity is compared with is, which effectively compares what the built-in id() function returns.


Some things in python are singular (on top of being immutable), by design. You could say this messes with the identity abstraction, but is primarily used to make life simpler, and generally does.

For example, you can test against types and None as if they are values, meaning you can use either is or == without one of them meaning something subtly but fatally-buggily different. In practice this seems better than having to know all the peculiarities of the typing system (if only because we tend to have to know several languages').


Numbers are immutable; small integers are also effectively singular in CPython (it interns them), though larger numbers and floats are not guaranteed to be.


Strings are immutable but not singular -- although there are cases where they seem to act that way, for example identical string literals may be interned (are there further details?(verify)). For example:

a = 'foo'
b = 'foot'[:3]
c = 'foo'

assert(a is c)
assert(a==c)

assert(a is not b)
assert(a==b)

Calling superclass methods, super()

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Firstly, the standard remark: if you're making inheritance diamonds, this is complexity any way you twist it, so is it necessary rather than happy-go-lucky class modelling?


If you use super(), it should be used consistently, when you know the potential problems and can explain to other people why it won't fail. (Read the two things linked to, or something like it)


Many argue that it's more understandable and less error-prone to handle superclass calls explicitly. Since superclassing is effectively part of a class's external interface anyway (and so is super, if you use it), you might as well be explicit, rather than have it hidden by implied semantics.

While more verbose than super(), it's easier to follow and less magical, and mistakes are probably easier to spot because they don't come from implicit behaviour. (You can argue about the fragility - yes, explicit calls will cause errors quickly when you change arguments or base classes, but that's arguably preferable over the alternative.)


One assumption here is that class inheritance is used for eliminating redundancy in your own codebase, not for flexibility.

However, when writing things like mixins (or abstract classes or interfaces), you may still need to know all about super()


Example of explicit superclass calls:

class Bee(object):
    def __init__(self):
        print( "<Bee/>" )


class SpecialBee(Bee):
    def __init__(self):
         print( "<SpecialBee>" )
         Bee.__init__(self)
         print( "</SpecialBee>" )


class VerySpecialBee(SpecialBee):
    def __init__(self):
         print( "<VerySpecialBee>" )
         SpecialBee.__init__(self)
         print( "</VerySpecialBee>" )

VerySpecialBee()

This goes for any method (the constructor isn't really a special case), but it's a common example of why arguments may get in the way of super() being particularly useful.
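For comparison, a sketch of the same classes written with cooperative super() calls (following the MRO):

class Bee(object):
    def __init__(self):
        print( "<Bee/>" )


class SpecialBee(Bee):
    def __init__(self):
         print( "<SpecialBee>" )
         super(SpecialBee, self).__init__()        # in py3 this can be just super().__init__()
         print( "</SpecialBee>" )


class VerySpecialBee(SpecialBee):
    def __init__(self):
         print( "<VerySpecialBee>" )
         super(VerySpecialBee, self).__init__()
         print( "</VerySpecialBee>" )

VerySpecialBee()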


See also:

'call at python exit'

Use the atexit module.

Avoid assigning a callable to sys.exitfunc yourself (py2 only; it was removed in py3), since you may be effectively removing something already set there (you could make it a function that also calls whatever it was previously set to, but there are sometimes hairy details to that, like how you deal with exceptions(verify))
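A minimal sketch of atexit use:

import atexit

def cleanup():
    print( "running just before the interpreter exits" )

atexit.register(cleanup)

# register() also accepts arguments to hand to the callable later:
atexit.register(print, "registered with arguments")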


Note that there is never a hard guarantee that this code will get run, considering things like segfaults - which Python itself should be pretty safe from, but isn't too hard to create in a C extension.

Builtins

Built-ins are things accessible without specific imports. The following are the 2.4 built-ins, a mix of types and functions, roughly grouped by purpose.



  • dir, help


  • str, unicode (and their virtual superclass, basestring)
  • oct, hex, ord, chr, unichr
  • int, long, float, complex,
  • abs, round, divmod, min, max, pow


  • tuple, list
  • len, sum
  • filter, reduce, map, apply
  • zip
  • iter, enumerate
  • reversed, sorted
  • cmp
  • range, xrange
  • dict, intern
  • set, frozenset
  • bool
  • coerce
  • slice (used only for extended slicing - e.g. [10:0:-2])
  • buffer (a hackish type convenient to CPython extensions and some IO)


  • object
  • hash, id


  • hasattr, getattr, setattr, delattr
  • type, isinstance, issubclass (variations on simple type comparison / subclass testing)(verify)
e.g. type(x) is str is functionally much the same as isinstance(x, str), but isinstance is a little more flexible in that it also handles subclasses
  • staticmethod, classmethod
  • super
  • property


  • exception


  • callable
  • locals, globals
  • vars
  • eval, compile
  • execfile
  • __import__: the function that the import statement uses
  • reload


  • file, open
  • input, raw_input

Shallow and deep copy

General shallow/deep copies are possible (on top of the basic reference assignment).

The following demonstration uses lists as a container, but this also applies to objects. (Objects and mutable structures like lists themselves contain references, so 'copying' such things is ambiguous - which is exactly why there is a distinction between shallow and deep copying.)

>>> from copy import copy     #shallow copy - but note there are easier ways for lists
>>> from copy import deepcopy
>>> a = [object(),object()]     #original list
>>> b = copy(a)
>>> c = deepcopy(a)
>>> a
[<object object at 0xb7d21448>, <object object at 0xb7d21468>]
>>> b
[<object object at 0xb7d21448>, <object object at 0xb7d21468>]
>>> c
[<object object at 0xb7d21450>, <object object at 0xb7d21458>]

The shallow copy, b, is a new list object (id(a)!=id(b)), into which references to the objects in the old collection are inserted.

The deep copy, c, is a new list object but also creates copies of the contained objects to insert into that new container.


With objects, or structures that contain objects, what you often mean to do is making a deep copy.

Note that this only works well when creating/copying these objects has no peculiar side effects, and does not rely on administrative data or object references that shouldn't be duplicated.

Such issues limit deep copy in any language. There are usually partial fixes, often in the form of a way to optionally override deep-copy behaviour with your own functionality via an interface (in python, the __copy__ and __deepcopy__ methods). Note that python's deepcopy does avoid circular-reference problems.
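A sketch of such an override - deepcopy() looks for a __deepcopy__ method (the class and its attributes are made up for illustration):

import copy

class Holder:
    "hypothetical: holds plain data plus a handle that should not be duplicated"
    def __init__(self, settings, handle=None):
        self.settings = settings
        self.handle = handle        # pretend this is a live connection or similar

    def __deepcopy__(self, memo):
        # deep-copy the plain data, but share (not copy) the handle
        new = Holder( copy.deepcopy(self.settings, memo), self.handle )
        memo[ id(self) ] = new      # lets deepcopy deal with circular references
        return new

a = Holder( {'host':'example.com'}, handle=object() )
b = copy.deepcopy(a)

assert b.settings == a.settings  and  b.settings is not a.settings
assert b.handle is a.handle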


String stuff

String formatting

% with a tuple

Those coming from C will probably appreciate the % operator, which
  • acts like sprintf()
  • mostly matches the classical C format strings
  • ...omits p (there are no pointers), adds r for repr()

Example:

"%d %5.1f"%( 1,2 )   ==  '1   2.0'

% with a mapping

If you pass it a mapping (dict or similar) you can access them by name rather than position:

"%(s)s  %(foo)07o %(bar)5.1f"%{   's':'yay', 'foo':1, 'bar':2   }       == 'yay  0000001   2.0'

format()

format() seems to understand...

  • positional and named arguments
  • the same conversion specifiers (the type letter, which effectively defaults to s)

but does everything else in a more flexible style.


There is a decent introduction in https://pyformat.info/


Some examples:

# {} enumerates by position by default, so e.g.
'{} {}'.format( 4,8 ) == '4 8'

# You can explicitly index
'{1} {0}'.format(4,8) == '8 4'

# You can use named indexes
'{foo} {bar}'.format(foo=1,bar=2) == '1 2'

# Similarly, you can use with dicts like
data = {'foo':1, 'bar':2}
'{foo} {bar}'.format( **data ) == '1 2'

# Also consider the ability to do:
'{data[foo]} {data[bar]}'.format( data={'foo':1, 'bar':2} ) == '1 2'

# alignment and pad like:
'|{:<10.1f}|{:^10.1f}|{:_>10.1f}|'.format( 3.14, 3.14, 3.14) ==   '|3.1       |   3.1    |_______3.1|'

# It understands strftime style datetime formatting
'{:%Y-%m-%d %H:%M}'.format(datetime(2001, 2, 3, 4, 5)) == '2001-02-03 04:05'

# You can pass parameters into the formatting itself,
# by nesting them in the spec (this would take multiple steps, and be nasty and confusing, to do with %)
'{:^{width}.{prec}}'.format( 3.14159265, width=10, prec=3) == '   3.14   '


Note that due to format() being a function, you can effectively make formatting functions, like:

tab_cols = '{count}\t{url}'.format

...which lets you later do

tab_cols(url="http://example.com", count=2)

f-string formatting

PEP 498 (implemented since py3.6) adds f-string formatting.

Introduction-by-example: where previously you might do

'{}'.format( time.time() )

you can now do

f'{time.time()}'

One difference from format() is that the names are evaluated at runtime in the current scope, so you can basically put all your code inside the string (but may not want to).


This can be less typing, and it can make it clearer what is getting placed where in a string.

This follows format(), not sprintf, so e.g.
  • formatting details such as < ^ > for alignment
  • it doesn't coerce things nearly as much, so you need to get the type right ({} is roughly, but not quite, as forgiving as %s(verify))
  • you can't shove everything into the s formatter anymore(verify)
  • there is also shorthand like {foo!r} instead of {repr(foo)}


Upsides:

  • can sometimes be a bunch more readable
and shorter
  • moving things inside can make it clearer what is getting placed where, and many-item formats are easily less likely to be incorrect
compare
'%s %s (%s) %s %s %r'%(a, lot, of, items, between, lines)
f'{a} {lot} ({of}) {items} {between} {repr(lines)}'
...particularly when you add or remove entries


Arguables/downsides:

  • can sometimes be a bunch less readable
and less structured
  • the more code you move inside strings, the more you
    • lose syntax highlighting
    • lose editor autocompletion
    • lose a linter being able to signal errors
    • lose useful stack traces
  • it's mostly like, but not actually equivalent to .format()
format couldn't push code inside, just do selective code-like things(verify)
  • if you hadn't already shifted to format(), and come from percent formatting, you have a new syntax to learn
and a whole new set of dunders and their behaviours and idiosyncrasies. No peeking - do you know offhand what the underlying dunder and PEP definition is that means you cannot do f'{datetime.datetime.now():<20s}' while f'{datetime.datetime.now()}' is fine? (the actual reason is implied by the following)
  • It's not clear when it's better to do formatting ahead of time or not. Say, one would prefer
f'Updated {date.today().strftime("%d-%m-%Y")}'
another would suggest it is probably cleaner to do
today_s = date.today().strftime("%d-%m-%Y")
f'Updated {today_s}'
another would point out
f'Updated {date.today():%d-%m-%Y}'


Some people consider them to deprecate percent-formatting and format(), others think that's not great.



__repr__ and __str__, and __format__

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

__repr__ and __str__ support repr() and str().

e.g. str(x) is basically equivalent to x.__str__()


You should generally treat both str and repr (and their dunders) as useful for debugging only, and not for active functionality that other things rely on.

There is nothing stopping you from making either (or even both) actively useful and functional (while also not necessarily breaking from their intended purpose), but making functional assumptions about them is likely to be an area for exciting bugs.


__repr__ intends to be unambiguous, __str__ intends to be readable
  • which can sometimes be the same, practically
  • ...and code-wise, str() in fact seems to fall back to __repr__ if __str__ is not there(verify)


repr should let you distinguish between distinct objects even if they are equivalent
  • for custom-defined classes this is often <class name at memory location>, because that is the inherited implementation from object.__repr__
    (even if classes override this, you can get that behaviour back by calling that directly)
  • for anything with singleton behaviour, just the value should be enough -- and python does that for e.g. str, int


python docs suggest that, if possible, you should make __repr__ look like a Python expression that creates an object with the same value
  • look like (probably because it's more useful to debugging than the default?), but it need not actually be possible to evaluate
  • nor is it required that eval(repr(foo)) == foo - that would arguably be dangerous to require, and is impossible for a lot of objects anyway
  • e.g. for uuid objects
    • __str__() gives something like "2d7fc7f0-7706-11e9-94ae-0242ac110002"
    • __repr__() gives something like "UUID('2d7fc7f0-7706-11e9-94ae-0242ac110002')" - which in this case you can instantiate like that (given that UUID is bound in the relevant scope)
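A small made-up example of implementing both:

class Pair:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __repr__(self):
        # unambiguous, and looks like an expression that would recreate it
        return 'Pair(%r, %r)'%(self.a, self.b)

    def __str__(self):
        # readable
        return '(%s, %s)'%(self.a, self.b)

p = Pair(1, 'x')
print( repr(p) )           # Pair(1, 'x')
print( str(p) )            # (1, x)
print( '%s  %r'%(p, p) )   # (1, x)  Pair(1, 'x')
print( f'{p}  {p!r}' )     # same thing via f-string conversions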



Because people really like f-strings and format(), you will find that if you expect

f'{thing}'

to do the same thing as

'%s'%(thing)

then you will sometimes be really confused.

Quick, what does

f'{datetime.date.today():20s}'

display?

...correct, it displays '20s'. Yeah, WTF.


Turns out there are cases where you specifically want to force str() behaviour - in format() and f-strings that's the !s conversion (and !r for repr(), which can be simpler than the format-string way).
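A small sketch of those conversions (the exact date output varies, of course):

import datetime

today = datetime.date.today()

f'{today:20s}'       # '20s'  - the spec is handed to date.__format__, i.e. strftime
f'{today!s:>20}'     # the str() of the date, right-aligned to width 20
f'{today!r}'         # something like "datetime.date(2024, 1, 1)"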


Type stuff

Type annotation

Around Python 3.5 and 3.6, we got syntax and a helper module to annotate variables, as described in PEP 526 (though a bunch more PEPs are relevant - 18 total?).

The most central of these are PEP 484 and perhaps PEP 483.


Python type annotation looks like

def greeting(name: str) -> str:
    return 'Hello ' + name

You can also type variables, like

i:int = 1

...though there's not much point, because...


In practice, this is type annotation, not type checking - it has absolutely no effect at runtime.

It's basically a comment that is also parseable
  • by IDEs, so that they can show it to programmers
  • by documentation generators, to similar effect
  • by linters like mypy, so they can flag a few of the more egregious type-related mistakes (but fundamentally can never catch all of them)
...still useful, but do not depend on this, because...


It is dangerous to consider this type checking.

Even if you are using mypy, there are a number of things you can do at runtime that mypy cannot check - fundamentally.

If you want the safety of a statically typed language, use a statically typed language.



typing module

https://docs.python.org/3/library/typing.html
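A small sketch of annotating with typing constructs (names made up; none of this is enforced at runtime):

from typing import Dict, List, Optional, Union

def first_word(lines: List[str], default: Optional[str] = None) -> Optional[str]:
    for line in lines:
        parts = line.split()
        if parts:
            return parts[0]
    return default

counts: Dict[str, int] = {}
value: Union[int, str] = 3

# since py3.9 you can also use the built-in types directly, e.g. list[str], dict[str, int]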


ctypes module

https://docs.python.org/3/library/ctypes.html

Functional-like things

Note on closures

See Closures for the basic concept. They were apparently introduced in python around 2.2.

Note that python closures are read-only by default: trying to assign to a closed-over variable actually creates a local variable with the same name (py3 adds the nonlocal statement to rebind a variable in the enclosing function). This works at function level, so even:

def f():
   x=3
   def g():
      y=x   #...with the idea that this would create a local y
      x=y   #   and this a local x...
      print( x )
   g()
f()

...won't work; which variables are local is decided at function compilation time, so the assignment means "x is local to g", and you then try to use it before anything is assigned to it (an UnboundLocalError).
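In py3, nonlocal lets you rebind an enclosing function's variable. A minimal sketch:

def counter():
    count = 0
    def inc():
        nonlocal count    # rebind the enclosing variable instead of creating a local one
        count += 1
        return count
    return inc

c = counter()
print( c(), c(), c() )   # 1 2 3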

Lambda, map, filter, reduce

Lambda expressions are functions that take the form lambda args: expression, for example lambda x: 2*x. They must be a single expression, so they cannot contain statements and therefore no complex code (unless they call functions).

They can be useful in combination with e.g. map(), filter() and such.


Map gives a new list/iterable that comes from applying a function to every element of an input list/iterable (in py3 it is lazy, hence the list() around the examples). For example:

>>> list(map( lambda x:4*x,  ['a',3] ))
['aaaa', 12]


Filter keeps only the elements for which the function returns true. For example:

>>> list(filter( lambda x: x%7==0 and x%2==1,  range(100) )) #odd multiples of 7 under 100
[7, 21, 35, 49, 63, 77, 91]
>>> list(filter(lambda x: len(x[1])>0, [[1,'one'],[2,''],[3,'three']] ))
[[1, 'one'], [3, 'three']]


Reduce does a nested bracket operation (e.g. ((((1+2)+3)+4)+5) when using the + function) on a list to reduce it to one value (in py3 it moved to functools.reduce). For example:

>>> reduce(max, [1,2,3,4,5])                  # note that for the specific case of max, max([1,2,3,4,5]) is shorter
5
>>> reduce( lambda x,y: str(x)+str(y), range(12) )     # (a slow way of constructing this string, actually)
'01234567891011'

Generators: yield and next

Generators are functions that keep their state (basically coroutines), and yield things one at a time, usually generated/processed on-the-fly.

For example, if you have

def twopowers():
    n = 1
    while True:
        yield n
        n *= 2

Then you can at any time ask for e.g. the next ten values, evaluated exactly when you ask for them - and there's not really a point at which this particular generator ends:

>>> t = twopowers()
>>> list( next(t)  for _ in range(10) )
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
>>> list( next(t)  for _ in range(10) )
[1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288]
>>> list( next(t)  for _ in range(10) )
[1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728, 268435456, 536870912]


In terms of language theory, the construct this uses is known as a continuation. It makes streaming data and other forms of lazy evaluation easy, and encourages a more functional programming style and a more data-streaming one.



Generator expressions are a syntax that look like list comprehensions, but create and return a generator.

gen =  (x  for x in range(101))
next(gen) # 0
next(gen) # 1
next(gen) # 2

...and so on.

Many are finite, but they don't have to be.


This is often done for lazy evaluation, and they can be a memory/CPU tradeoff, in that at the cost of a little more overhead you never have to store a full list.

Consider, for example (use range() in py3):

max(     [x  for x in xrange(10000000)] )  # Memory use spikes by ~150MB
max( list(x  for x in xrange(10000000)) )  #  (...because it is a shorthand for this)

max(     (x  for x in xrange(10000000)) )  # Uses no perceptible memory (inner expression is a generator)
max(      x  for x in xrange(10000000)  )  # Uses no perceptible memory (inner expression is a generator)

Notes:

  • Sometimes this makes for more brevity/readability (though I've seen a bunch of syntax-fu that isn't necessarily either).
  • the last illustrates that in various places the extra brackets aren't necessary (omitting them is unambiguous)
  • In python2, you often wanted xrange, a generator-based version, where range() returned a list.
In python3 range() is a generator-like object so the distinction no longer exists.
  • Don't use the profiler to evaluate xrange vs. range; it adds overhead to each function call, of which there will be ten million with the generator and only a few when using [] / list(). This function call is cheap in regular use, but not in a profiler.
  • when iterating over data, enumerate is often clearer (and less typing) than range



Iterables

In general, iterators allow walking through iterables with minimal state, usually implemented by an index into a list or a hashmap's keys, or possibly a pointer if implemented in a lower-level language.


In python, most collection types are iterable on request. Iterating dicts implies their keys. I imagine this is based on __iter__.


Notes:

  • Iteration won't take it kindly when you change the data you are iterating over, so something like:
a={1:1, 2:2, 3:3}
for e in a:
  a.pop(e)

...won't work (it raises a RuntimeError). The usual solution is to build a list of things to delete and do the deleting after we're done iterating -- or to iterate over a copy, as sketched below.
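For example, iterating over a copy of the keys sidesteps the problem:

a = {1:1, 2:2, 3:3}

for e in list(a):    # list(a) is a snapshot of the keys, so mutating a is fine
    if e != 2:
        a.pop(e)

print( a )   # {2: 2}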

Syntax things

lists

More things to do with lists...

range, xrange

>>> range(4)
[0,1,2,3]
>>> range(2,4)
[2,3]
>>> range(4,2,-1)
[4,3]

In py2, range() returns a list and xrange is the generator-like equivalent you would prefer when giving it very large values. In py3, range() returns a lazy, generator-like object.


slicing

Slicing retrieves elements from lists very similarly to how range would generate their indices.

>>> a=['zero','one','two','three','four','five']
>>> a[4:]
['four','five']
>>> a[:4]
['zero','one','two','three']
>>> a[3:5]
['three','four']
>>> a[4:0:-2]  # I rarely find uses for this, but it exists
['four', 'two']


collections

defaultdict

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

defaultdict is like a dict, but on access to keys that do not exist, it creates an entry (note: on fetches, not just assignment).

Instantiation takes a callable (e.g. a type), which determines the value new entries start with.

For example, consider wanting to count word occurrences.

With a regular dict you might write:

wc = {}

for word in some_words:
    if word not in wc:
        wc[word] = 1
    else:
        wc[word] += 1

or perhaps some clever syntax-fu to shorten that, like:

wc = {}

for word in some_words:
    wc[word] = wc.get(word, 0) + 1    # get() allows a fallback, which defaults to None but here we force to 0 instead

With defaultdict, you can do:

wc = defaultdict(int)     # what you hand in is what is instantiated when it's not in there yet
                          # this works in part because int() happens to be 0.

for word in some_words:
    wc[word] += 1


...though for this particular case, note also the existence of collections.Counter



Since you can also hand in things like list and dict, you can use defaultdict for nested structures (though this can get somewhat obfuscated).

# for example, assuming you have a functional subnet_for() that e.g. outputs '192.168.1.0/24' for 192.168.1.109

subnet_lister = defaultdict(list)

for ip in some_ips:
   subnet_lister[ subnet_for(ip) ].append( ip )

Counter

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

You can use defaultdict to count, but you can also get a slightly more capable counter (note: it's a subclass of dict):

Takes any iterable, counts what comes out of that.

from collections import Counter

# Counter('communication')
Counter({'c': 2, 'o': 2, 'm': 2, 'n': 2, 'i': 2, 'u': 1, 'a': 1, 't': 1})

# c = Counter('communication')
# c.update('communication')      # update() adds counts
# del c['u']
# c['c'] = 10
# c
Counter({'c': 10, 'o': 4, 'm': 4, 'n': 4, 'i': 4, 'a': 2, 't': 2})
# c.most_common(3)
[('c', 10), ('o', 4), ('m', 4)]

OrderedDict

Dictionaries

Dictionary views

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

You probably noticed that in python 3, keys(), values(), and items() (on dict and similar) do not return all the data in a list, or even a generator as such, but a new thing.


These are dictionary view objects, which

  • seem to implement __len__, which e.g. a generator would not.
  • seem to implement __contains__, for in
  • can be iterated over
  • can be repeatedly iterated over(verify) (unlike a generator)
    • each new iteration reflects the data as it is at that point(verify)
    • views reflect changes to the underlying data (but changing it while still iterating over a view has undefined behaviour - it may work perfectly fine, it may not)
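A quick illustration:

d = {'a':1, 'b':2}
ks = d.keys()          # a view, not a list

print( len(ks) )       # 2
print( 'a' in ks )     # True

d['c'] = 3
print( list(ks) )      # ['a', 'b', 'c']  - the view reflects the change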


Fixed arguments, keyword arguments, and anonymous keyword arguments

All arguments have names, bound within the function scope.

Arguments can be passed in two basic ways: via positional arguments and via keyword arguments.
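A minimal sketch of a signature combining these, plus the catch-all *args and **kwargs (and the matching unpacking at call time):

def f(a, b=2, *args, **kwargs):
    print( a, b, args, kwargs )

f(1)                      # 1 2 () {}
f(1, 3, 4, 5, x=6)        # 1 3 (4, 5) {'x': 6}

# the mirror image: unpacking a sequence / mapping into a call
extra = {'x': 6}
f(1, *[7, 8], **extra)    # 1 7 (8,) {'x': 6}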


See also:


Some more notes

Accepting **kwargs can be useful to make the call to a big initializer function forwards and backwards compatible: the call will not fail when you use new or old keywords, and your code can choose to ignore, warn, or throw an error, as is practical.

Without this, the call itself may fail when you thought you were using a different version - and it's annoying when different versions of the same thing are not drop-in replacements.



One note related to defaults (not python-specific):

It sometimes makes sense to have a function fall back to its default when it is passed a value like None, rather than relying only on the default value in the function definition - pieces of pass-through code that may or may not receive a user/config value cannot easily rely on a function-definition default, and would otherwise have to strip the key out of **kwargs before the call, or if-then between slightly different calls of the same function (ew).
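A sketch of that pattern (names made up):

DEFAULT_TIMEOUT = 30

def connect(host, timeout=None):
    "treat None as 'use the default'"
    if timeout is None:
        timeout = DEFAULT_TIMEOUT
    return (host, timeout)

# pass-through code can hand the value along without knowing the default:
def connect_from_config(cfg):
    return connect( cfg['host'], timeout=cfg.get('timeout') )    # get() may well return None

print( connect_from_config({'host':'example.com'}) )                # ('example.com', 30)
print( connect_from_config({'host':'example.com', 'timeout':5}) )   # ('example.com', 5)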

Member-absence robustness

More than once I've wanted to check whether an object has a particular member -- without accessing it directly, since that would throw an AttributeError if it wasn't there.


One way is to use the built-in function hasattr(object,name) (which is just a getattr that catches exceptions).

Another is to do 'membername' in dir(obj).


If you want to get the value and fall back on a default if the member wasn't there, you can use the built-in getattr(object, name[, default]), which allows you to specify a default to return instead of throwing an AttributeError.
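For example:

class Thing:
    x = 1

t = Thing()

print( hasattr(t, 'x') )              # True
print( hasattr(t, 'y') )              # False
print( 'x' in dir(t) )                # True
print( getattr(t, 'y', 'fallback') )  # 'fallback', instead of an AttributeError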

OO

Accessors (property)

What are known as accessors in other languages can be done in python too, by overriding attribute access; property() is largely syntactic sugar for hooking up the functions.

The property() built-in serves approximately the same purpose as, say, C#'s property syntax. Example:

class NoNegative(object):
   def __init__(self):
      self.internal=1

   def get_x(self):
      print( "Get function" )
      return self.internal

   def set_x(self,val):
      print( "Set function" )
      self.internal=max(0,val)  # make sure the value is never negative

   x=property(get_x,set_x)

# Testing in the interactive shell:
>>> nn = NoNegative()
>>> print( nn.x )
Get function
1
>>> nn.x=-4
Set function
>>> print( nn.x )
Get function
0


Notes:

  • The signature is property(fget, fset, fdel, doc), all of which are None by default and assignable by keyword.
  • You should use new-style objects (the class has to inherit from object); without the inheritance you make a more minimalistic class in which the above would use members instead(verify).
  • these won't show up in the instance's __dict__ - they live on the class, not on the object (which allows you to create shadow members and do other funky things)
  • Note that this indirection makes this slower than real attributes
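The same thing is nowadays more commonly written with the decorator form of property - a sketch of the class above:

class NoNegative(object):
    def __init__(self):
        self._internal = 1

    @property
    def x(self):
        return self._internal

    @x.setter
    def x(self, val):
        self._internal = max(0, val)   # never negative

nn = NoNegative()
nn.x = -4
print( nn.x )   # 0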

self

Python doesn't hide the fact that class functions pass/expect a reference to the object they are working on. This takes getting used to if you've never used classes this way before.


Class functions must be declared with 'self', and must be called with an instance. This makes it clear to both python and coders whether you're calling a class function or not. Consider:

def f():
    print( 'non-class' )

class c(object):
    def __init__(self):
        f()       # refers to the function above. prints 'non-class'
        self.f()  # refers to the method below. Prints 'class', and the object
                  # note that self-calls imply adding self as the first parameter.
    def f(self):
        print( 'class; '+str(self) )

    def g():  # not callable through an instance - see note below
        print( 'non-class function in class' )

o=c()
c.f(o)   # calling c.f() without an instance would fail; no object to work on


You can mess with this, but it's rarely worth the trouble.

Technically, you can add non-class functions to a class, for example g() above. However, you can't call it through an instance: self.g() and o.g() fail because python adds self as the first argument, and g() takes no arguments. It does this because accessing a function through an instance produces a bound method, so even h=o.g; h() won't work.

You can stick non-class functions onto the object after the fact, but if you do this in a class-naive way, python considers these as non-class functions and will not add self automatically. You can call these, but you have to remember they're called differently.

Details to objects, classes, and types

New-style vs. classic/old-style classes

New-style classes were introduced in py2.2, partly to make typing make more intuitive sense by making classes types.


Old-style classes stayed the default (interpretation for classes defined like class Name: and not class Name(object):) up to(verify) and excluding python 3. Py3k removed old-style classes completely and made what was previously known as new-style behaviour the only behaviour.


When the distinction is there, new-style classes are subclasses of the object class, indirectly by subclassing another new-style class or built-in type, or directly by subclassing object:

 class Name(object):
     pass #things.


Differences:


Initialization, finalization

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


See also:

Metaclasses

Consider that classes are objects that are templates for instantiations.

Metaclasses are templates to instantiate classes.

Usually, they are used as a fancy sort of class factory.

Generally, they will cause a headache. Don't use them unless you know you must, or really, really want to.
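If you do want one anyway, a minimal sketch of a metaclass acting as a class registry (all names made up):

class Registry(type):
    classes = []
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        Registry.classes.append(cls)     # record every class created with this metaclass
        return cls

class Plugin(metaclass=Registry):
    pass

class FooPlugin(Plugin):
    pass

print( Registry.classes )   # [<class '__main__.Plugin'>, <class '__main__.FooPlugin'>]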

See also


%

% is an operator that imitates C's (s)printf. For example:

"%3d.%20s"%( 1,'quu' ) == '  1.                 quu'

It expects a tuple (not a list). There is an extra feature in that you can refer to data in a dict:

d={'id':3,'name':'foo'}
"%(id)3d. %(name)s"%d

(Note: if something is not present, this will cause a KeyError)


The dynamic typing means things are slightly more flexible. You can e.g. do "%s"%(3)


__underscore__ things

Note: the __rsomething__ variants for operators were omitted from this list for readability. See the notes on them below.


Metadata:

  • __doc__: docstring
  • __class__: class/type
  • __name__: name (of the module, class, or function/method)
  • __module__: module of definition (for classes, functions/methods)


Helper functions for internal handling, some on specific types, and sometimes conveniently accessible variations on built-in functions:

  • __hash__, __cmp__ (see also __eq__; these three interact in objects when you want them to be usable as dictionary keys [6])
  • __len__
  • __init__: constructor
  • __del__: destructor
  • __new__ allows you to create a new object as a subtype of another (mostly useful to allow subclasses of immutable types) (new-style classes only)
  • __repr__: supports repr().
  • __reduce__
  • __reduce_ex__
  • __coerce__ (not always the same as the built-in coerce()(verify))


Helpers for syntax and operator handling (various are overloadable) (see also the operator module):

  • __contains__: supports in
  • __call__: for ()
  • __getattr__, __getattribute__, __setattr__, __delattr__: for member access (using .) (See notes below)
  • __getitem__, __setitem__, __delitem__: for [key]-based access (and preferably also slices; use of __getslice__, __setslice__, __delslice__ is deprecated). See also things like [7]


  • __int__, __long__, __float__, __complex__: supports typecasts to these
  • __hex__, __oct__: supporting hex() and oct()
  • __abs__: supports abs()


  • __and__: for &
  • __or__: for |
  • __lt__: for <
  • __le__: for <=
  • __gt__: for >
  • __ge__: for >=
  • __eq__: for ==
  • __ne__: for !=


  • __add__: for +
  • __pos__: for unary +
  • __sub__: for binary -
  • __neg__: for unary -
  • __mul__: for *
  • __div__: for /
  • __truediv__: for / when __future__.division is in effect
  • __floordiv__: for //
  • __mod__: for %
  • __divmod__: returns a tuple, (the result of __floordiv__ and that of __mod__)
  • __pow__: for **


  • __lshift__: for <<
  • __rshift__: for >>


  • __xor__: for ^
  • __invert__: for ~


Special-purpose:

  • __all__: a list of names that should be considered public(verify) (useful e.g. to avoid exposing indirect imports)


  • __nonzero__: used in truth value testing


__i*__ variants

All the in-place variations (things like +=) are represented too, by __isomething__. For example: __iadd__ (+=), __ipow__ (**=), __irshift__ (>>=), __ixor__ (^=), __ior__ (|=) and so on.

Why += isn't the same as + except when it is

Consider:

a = [1,2,3]
c = a

a += [4]
a = a + [5]

print(a)
print(c)

What does it output? Well, list + list like a + [5] creates a new list.

And we learned

a += b

is equivalent to

a = a + b

so it does exactly the same thing, right?

So we created a new list twice and it would output [1,2,3,4,5] [1,2,3] right?

No. It's:

[1, 2, 3, 4, 5]
[1, 2, 3, 4]


What gives?

So.

  • + is short for __add__
  • += is short for __iadd__

And, here's the crux:

  • += checks for the presence of __iadd__ (in-place add). If it's there, we use that. If not, we fall back to evaluating with __add__ and assigning the result.


So the meaning of += is dynamic:

  • if the left side is mutable, like list, the two are not the same.
  • if the left side is immutable, like with str or int, both are the same: evaluate-new-value-and-assign

(if you replace the lists with the strings "123" and "4" and "5" you do get 12345 and 123)


This also means

  • you really shouldn't implement __iadd__ on immutable objects


My take-away is that, like in the C days, operator overloading is nasty because the semantics are hidden. Avoid it where possible.

__r*__ variants

The __rsomething__ variants are variations with swapped operands. Consider x-y. This would normally be evaluated as

x.__sub__(y)

If that operation isn't supported (it's missing, or returns NotImplemented), python looks at whether y has a __rsub__, and whether it can instead evaluate this as:

y.__rsub__(x)

The obvious implementation of both makes their evaluation equivalent. This allows you to define manual types which can be used on both sides of each binary operator and do so non-redundantly, and with a self-contained definition.

(Only for binary operator use: there is no swapped variant for the unary - or +, and the three-argument pow() does not try __rpow__)
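A minimal sketch (made-up class) of implementing both sides:

class Metres:
    def __init__(self, n):
        self.n = n
    def __sub__(self, other):       # for   Metres(...) - other
        return Metres( self.n - float(other) )
    def __rsub__(self, other):      # for   other - Metres(...), when other's __sub__ gives up
        return Metres( float(other) - self.n )
    def __repr__(self):
        return 'Metres(%r)'%self.n

print( Metres(5) - 2 )   # Metres(3.0)
print( 2 - Metres(5) )   # Metres(-3.0)  - int.__sub__ returns NotImplemented, so __rsub__ is tried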

__getattr__ and __getattribute__

For old-style classes, if normal member access doesn't find anything, the __getattr__ is called instead.


In new-style classes, __getattribute__(self, name) is used for all attribute access. __getattr__ will only be called if __getattribute__ raises an AttributeError (and, obviously, if __getattr__ is defined).

Since __getattribute__ is used unconditionally, it is possible to create infinite loops when you access members on self in the self.name style from within it. This is avoided by explicitly using the base class's __getattribute__ for that.
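A minimal sketch of both hooks (new-style / py3 behaviour):

class Fallback(object):
    def __init__(self):
        self.existing = 1

    def __getattribute__(self, name):
        # called for every attribute access; delegate to the base class to avoid infinite recursion
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # only reached when __getattribute__ raised an AttributeError
        return 'default for %s'%name

f = Fallback()
print( f.existing )   # 1
print( f.missing )    # default for missing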

See also: