Python extensions

From Helpful
Revision as of 15:28, 24 November 2010 by Helpful

These are primarily notes.
It won't be complete in any sense.
It exists to contain fragments of useful information.

Python allows reasonable interfacing to and from C (primarily CPython; other implementations can do so indirectly at best).


How you do it depends on what you want.


Much interfacing between C and Python is extending, that is, putting some extra C code under the control of Python, whether to wrap some nice scripting around C, to use a C library from Python, or to write some inner loop in C.

You can also embed a Python interpreter inside C, that is, run Python from within your own program. This is usually done to extend the program with scripting, and in practice it tends to involve fairly mutual interfacing (embedding plus extending).



Extending / bridging

ctypes

When you simply want to call functions in existing, compiled libraries (Windows DLLs, most modern Unix-style shared libraries), this is often what you want.

ctypes looks up function names directly in the library (it doesn't need a C header file or such). It works on Windows, Linux and more, though portable code usually needs some environment checking.

It cannot typecheck automatically, because a compiled library doesn't carry the argument and return types of its functions; you have to declare them yourself.

It is often advisable to create proxy functions that declare static types and convert as needed, because you're working at a fairly low level, and getting things wrong can easily lead to a segfault or corrupted program state.
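A minimal sketch of such a proxy, assuming Python 3 on a POSIX-like system where ctypes.util.find_library("c") can locate the C library (the CDLL(None) fallback only works on POSIX). The wrapper name strlen and its str-only policy are illustrative choices, not something ctypes prescribes:

```python
import ctypes
import ctypes.util

# Load the C library. If find_library() finds nothing, CDLL(None) (POSIX only)
# gives access to symbols already loaded into this process, which include libc.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C prototype once:  size_t strlen(const char *s);
_strlen = _libc.strlen
_strlen.argtypes = [ctypes.c_char_p]
_strlen.restype = ctypes.c_size_t


def strlen(text):
    """Hypothetical Pythonic proxy: takes a str, returns its UTF-8 byte length."""
    if not isinstance(text, str):
        raise TypeError("strlen() expects a str, got %s" % type(text).__name__)
    # Convert to the bytes (char *) that the C side expects.
    return _strlen(text.encode("utf-8"))


print(strlen("hello"))   # 5
print(strlen("héllo"))   # 6: 'é' is two bytes in UTF-8
```

Keeping the argtypes/restype declarations next to the proxy means callers never touch raw ctypes objects, and a wrong argument fails with a Python TypeError rather than a crash.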

C API

You can write a Python C extension more or less from scratch; it compiles to a shared object that Python can load and interface with. Since you write the whole C side yourself and it has to play by Python's rules, this means a lot of boilerplate code, data conversion, and awareness of some CPython implementation details (such as reference counting).

If you have no need for the interfacing code to be clean, minimal and readable, SWIG may be an interesting timesaver.

SWIG

SWIG bridges various languages in a fairly generic way. It is useful for wrapping APIs and libraries in general. It works well, though the interfacing code isn't always very readable.

It does little more than provide bindings; you probably want a wrapper around these bindings that makes the calls act pythonically. You probably also want to write that wrapper on the Python side: doing it on the C side is still error-prone work in terms of freeing memory and such.

C++ specific

SIP

SIP was developed for Qt, specifically PyQt. It's like SWIG, but made specifically for bridging C++ and Python, which allows it to be cleaner than SWIG.

Boost.Python

Boost.Python is another Python-C++ bridge that is apparently a little more technically advanced than SIP and SWIG.

Inlining

Weave and PyInline let you write C code inside Python strings, and handle compiling it and binding the resulting function into a Python module, so they hide all the necessary glue.

I believe this is mostly syntactic sugar that handles calling a compiler and loading the compiled object, which means a C compiler has to be installed wherever the code runs.

These don't scale much, in that you can't really organize large chunks of code well, so they're mainly useful for things like inner loop optimization. The upside is that they're probably the easiest way of doing a short bit of loop optimization.

Mixing

Pyrex is a language very close to Python, extended with C type declarations, that is compiled to C (as an extension module) instead of to Python bytecode.

Pyrex can also interface with other C code, though it doesn't parse header files, so you have to redeclare the functions and types you use from a library, which means some hoop-jumping.


Cython is a fork of Pyrex, largely compatible with it but (currently) more actively developed and more advanced.


Some usage notes

using ctypes

ctypes is a Python library (in the standard library since Python 2.5) that lets you interface with DLLs and shared objects in their compiled form.

This was written while I was reading the ctypes tutorial and playing with their examples.

Loading libraries

You need to know whether a library's functions use the cdecl or stdcall calling convention; they differ in who cleans up the stack (and, on Windows, in how symbol names are decorated). The Windows API generally uses stdcall; most other things use cdecl. There are three styles of loading that ctypes can use:

  • cdll should be used to load cdecl-style libraries
  • windll should be used to load stdcall-style libraries
  • oledll should be used to load stdcall-style libraries whose functions return HRESULT error codes; it adds a wrapper that raises WindowsError (OSError in Python 3) when a call fails.

If you don't get stdcall/cdecl right, you'll usually get errors like "ValueError: Procedure probably called with too many arguments (4 bytes in excess)". This only applies to 32-bit Windows; 64-bit Windows has a single calling convention, so cdll and windll behave the same there.

You'll also get errors like that when you pass arguments of the wrong size. You can also get a type wrong without getting its size wrong, in which case there is no error, just strange behaviour, so do check the respective headers or documentation.


If you are writing ctypes code for cross-platform software, you can inspect your environment (e.g. os.name and sys.platform) and act accordingly. If your code can assume it's on Linux or Windows, you can cheat a little:

import sys
from ctypes import cdll

if sys.platform == 'win32':
    libc = cdll.LoadLibrary('msvcrt.dll')
else:  # then assume a libc6-based *nix
    libc = cdll.LoadLibrary('libc.so.6')

Notes:

  • While Windows is usually stdcall, msvcrt is an exception: it's cdecl.
  • There is some attribute trickery that helps on Windows;
    libc=cdll.msvcrt
    has the same effect as the above. I just prefer using fewer cases, for consistency.
  • There is a ctypes.util.find_library() function (import ctypes.util to get it) that tries to locate a library by name; the details are system-dependent, and on Linux it consults tools like ldconfig and gcc. For example,
    find_library('c')
    would probably return 'libc.so.6' on Linux, while on Windows the result depends on the Python version and may be None. This is another way to check your platform / which DLLs should be loaded, and it lets you be nice about minor library version changes (though if function signatures change between library versions, doing that is dangerous).
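Putting these notes together, a small loader might look like the following. This is a sketch assuming Python 3; load_libc is a hypothetical helper name, and the CDLL(None) fallback (symbols already loaded into the process) only works on POSIX systems:

```python
import ctypes
import ctypes.util
import sys


def load_libc():
    """Hypothetical helper: return a CDLL for the platform's C runtime."""
    if sys.platform == "win32":
        # msvcrt is cdecl even on Windows, so cdll (not windll) is right.
        return ctypes.cdll.msvcrt
    path = ctypes.util.find_library("c")   # e.g. 'libc.so.6' on glibc Linux
    if path is None:
        # POSIX only: CDLL(None) exposes symbols already loaded in this process.
        return ctypes.CDLL(None)
    return ctypes.CDLL(path)


libc = load_libc()
# time() returns time_t, but without a restype ctypes reads it as a C int,
# which is fine for current dates (until 2038).
print(libc.time(None))
```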

Functions and types

Getting at functions
There is attribute trickery here too; ctypes wraps functions on attribute access, so with the library loaded as above,
libc.time(None)
already works.

This of course doesn't help if a function name isn't a valid Python identifier, so there are other ways of getting references to functions inside libraries, such as indexing the library object with the name as a string. Read the docs.
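A short sketch of the alternatives, assuming Python 3 on a POSIX-like system. Attribute access is cached on the library object, while indexing with a string returns a fresh function object each time and accepts names that aren't valid identifiers (strlen is just a convenient real symbol here):

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

by_attr = libc.strlen                 # cached: libc.strlen is this same object next time
by_index = libc["strlen"]             # any symbol name as a string; new object each time
by_getattr = getattr(libc, "strlen")  # plain attribute access, so also cached

by_index.restype = ctypes.c_size_t
print(by_index(b"hello"))             # 5
```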

Because of the to-be-explained typing as well as extra logic you may want to add, it can be useful to create an easy-to-use module to hide all the ctypes scariness.

Conversion types / wrapper objects

Some Python types are implicitly converted when passed to C functions, and show up as these C types on the C side:

  • None: passed as NULL
  • str (byte strings; bytes in Python 3): passed as char *
  • unicode strings (str in Python 3): passed as wchar_t *
  • int and long (int in Python 3): passed as the platform's C int, after masking to fit.
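A quick check of these implicit conversions, assuming Python 3 (where byte strings are bytes, text is str, and int covers long) on a POSIX-like system whose C library provides wcslen. No argtypes are set, so ctypes relies on the rules above plus the default int restype:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

n_bytes = libc.strlen(b"hello")   # bytes -> char *
n_wide = libc.wcslen("hello")     # str   -> wchar_t *
now = libc.time(None)             # None  -> NULL, so time() only returns the time
print(n_bytes, n_wide, now)
```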


The rest will have to be handled more explicitly. For example:

from ctypes import c_double
printed_characters = libc.printf(b'foo %f\n', c_double(3.4))  # b'...' -> char *

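Try that with c_float instead and you'll get garbage. It's not very forgiving: printf is variadic, and C's default argument promotions mean %f always reads a double, so the 4-byte float that c_float passes gets misinterpreted.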


Objects like those created by c_double() take care of the conversion for you. On the C side they are accessed as regular C values. On the Python side you can create them, read and change their .value, and they're garbage collected like anything else. This internal management, and not having to worry about it, is a good part of what makes ctypes neat.
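A small sketch of these wrapper objects on the Python side (Python 3 assumed). Note that ctypes does no overflow checking when a Python int doesn't fit, and that a c_float really stores a 4-byte float:

```python
from ctypes import c_double, c_float, c_int

d = c_double(3.4)
print(d, d.value)    # the wrapper object, and the Python float it holds
d.value = 1.5        # changed in place; C code holding a pointer to d sees 1.5

f = c_float(3.4)     # stored as a C float, so some precision is lost
print(f.value)       # close to, but not exactly, 3.4

i = c_int(2**31)     # no overflow checking: wraps on platforms with a 32-bit int
print(i.value)
```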


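The rest of the basic conversion types are listed in the "Fundamental data types" table of the ctypes documentation; they include c_bool, c_char, c_wchar, c_byte/c_ubyte, c_short/c_ushort, c_int/c_uint, c_long/c_ulong, c_longlong/c_ulonglong, c_size_t, c_float, c_double, c_longdouble, c_char_p, c_wchar_p and c_void_p.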

Notes:

  • Do know what you're doing with strings and particularly pointers. Read up.
  • These are wrapper objects. They are not the same as the values they store, and you should not assign to them as if they were plain values. Basically, read [1].

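A sketch of the wrapper-versus-value distinction (Python 3 assumed): assigning to a c_char_p's .value repoints it at another (immutable) bytes object, whereas a buffer from create_string_buffer is real mutable memory that gets changed in place, which is what you want when C code writes into it:

```python
from ctypes import c_char_p, create_string_buffer, sizeof

p = c_char_p(b"Hello")
before = p.value
p.value = b"Hi"          # p now points at a different bytes object;
                         # the memory holding b"Hello" was not modified

buf = create_string_buffer(b"Hello", 10)   # 10 bytes of mutable memory
buf.value = b"Hi"        # overwrites the start of the buffer in place
print(before, p.value, buf.value, sizeof(buf), buf.raw)
```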
Prototypes

Since things are wrapped anyway, you can add some type checking on the Python side by setting a function's argtypes. Arguments are then converted automatically where possible, and if you later violate this typing, the Python side raises ctypes.ArgumentError.

If you don't set restype, the result is assumed to be a C int, so anything else (such as the char * that strstr returns) won't be converted properly.

Example:

>>> strstr=libc.strstr
>>> print strstr('foo bar', 'o b') 
-1211344746

>>> strstr.argtypes = [c_char_p, c_char_p]
>>> strstr.restype  =  c_char_p
>>> print strstr('foo bar', 'o b') 
o bar
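The same prototype in Python 3, as a sketch assuming a POSIX-like system: byte strings must now be bytes, the c_char_p result comes back as bytes (or None for a NULL return), and passing a str violates the declared argtypes, so ctypes raises ArgumentError before the C function is ever called:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

strstr = libc.strstr
strstr.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
strstr.restype = ctypes.c_char_p

print(strstr(b"foo bar", b"o b"))    # b'o bar'
print(strstr(b"foo bar", b"xyz"))    # None: strstr returned NULL

try:
    strstr("foo bar", "o b")         # str, not bytes: rejected by c_char_p
except ctypes.ArgumentError as exc:
    print("rejected:", exc)
```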

You can make more advanced prototypes (e.g. with named parameters and default values); see this.
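For instance, a prototype made with ctypes.CFUNCTYPE can be bound to a library symbol together with a paramflags tuple, which gives the function named parameters with defaults. A sketch assuming Python 3 on a POSIX-like system; the default value b"default" is made up for illustration:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# size_t strlen(const char *s), described as a prototype instead of argtypes/restype
prototype = ctypes.CFUNCTYPE(ctypes.c_size_t, ctypes.c_char_p)

# paramflags: one tuple per parameter, (direction, name, default); 1 = input
strlen = prototype(("strlen", libc), ((1, "s", b"default"),))

print(strlen(b"abc"))      # positional: 3
print(strlen(s=b"hello"))  # by keyword: 5
print(strlen())            # uses the default b"default": 7
```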

Pointers

Various functions write to memory via pointers you give them; think sprintf (the resulting string) and scanf (the parsed values). This example uses sscanf, the variant that takes a string as input.

You need to create objects for sscanf to store into, then tell ctypes to pass pointers to them on the C side. The same 'a string is inherently a pointer' logic as in C applies: the string buffer is passed as-is, while the int and the float go through byref().

from ctypes import c_int, c_float, create_string_buffer, byref

i, f = c_int(), c_float()
s = create_string_buffer(32)    # 32-byte buffer filled with NULs
                                # (read back as a NUL-terminated C string)
num_matched_values = libc.sscanf(b"1 3.14 Hello",
                                 b"%d %f %s",
                                 byref(i), byref(f), s)   # libc loaded as above

print("stored %s values.\n'%s' '%s' '%s'" % (num_matched_values,
                                             i.value, f.value, s.value))

byref() is the syntax for passing by reference, and it hides the pointer details. ctypes also has a 'real' pointer type (see pointer() and POINTER()) that can point to ctypes objects (among other things).
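A brief sketch of the 'real' pointer type (Python 3 assumed): pointer() makes a pointer object to an existing ctypes instance, while POINTER() makes a pointer type, which is what argtypes and restype declarations want:

```python
from ctypes import POINTER, c_int, pointer

n = c_int(42)
p = pointer(n)            # pointer object referring to n
p.contents.value = 7      # write through the pointer
print(n.value, p[0])      # both 7: p[0] dereferences like *p in C

IntPtr = POINTER(c_int)   # pointer type, e.g. for argtypes/restype
null = IntPtr()           # a NULL pointer; NULL pointers are false
print(isinstance(p, IntPtr), bool(null))
```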


Arrays
This article/section is a stub: probably a pile of half-sorted notes, not well checked, so it may have incorrect bits. (Feel free to ignore, fix, or tell me.)

In this context, arrays are fixed-size sequences of elements of one specific type, and each combination of element type and length is its own array type. Luckily ctypes makes this fairly simple. For example:

from ctypes import c_int
ThreesomeType = c_int * 3
three = ThreesomeType(3, 1, 2)

...creates a new type, and then an object of that type containing three c_ints. You can access it from Python like a sequence, and in C it corresponds to an int[3] (passed to functions as an int *).
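A sketch of working with such an array (Python 3 assumed). ctypes.memset, a helper that ships with ctypes, stands in here for a C function receiving the array, since an array passed to a foreign function decays to a pointer to its first element:

```python
import ctypes

ThreesomeType = ctypes.c_int * 3      # a new array type, like int[3] in C
three = ThreesomeType(3, 1, 2)

three[1] = 10                         # elements behave like a fixed-size sequence
snapshot = list(three)
print(len(three), snapshot)           # 3 [3, 10, 2]

# Passed to a foreign function, the array decays to a pointer to its first
# element, so C code can write into it; here ctypes.memset zeroes all of it.
ctypes.memset(three, 0, ctypes.sizeof(three))
print(list(three))                    # [0, 0, 0]
```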

Structs

Structs and unions imitate the C compiler's packing (alignment) and use the native byte order. Both details can be overridden, with _pack_ and with the BigEndianStructure/LittleEndianStructure base classes. To steal an example from here, the C struct:

 struct passwd {
       char    *pw_name;       /* user name */
       char    *pw_passwd;     /* encrypted password */
       uid_t   pw_uid;         /* user uid */
       gid_t   pw_gid;         /* user gid */
       time_t  pw_change;      /* password change time */
       char    *pw_class;      /* user access class */
       char    *pw_gecos;      /* Honeywell login info */
       char    *pw_dir;        /* home directory */
       char    *pw_shell;      /* default shell */
       time_t  pw_expire;      /* account expiration */
       int     pw_fields;      /* internal: fields filled in */
 };

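can be interfaced using the class below. Note that this is the BSD layout of struct passwd; glibc on Linux has only pw_name, pw_passwd, pw_uid, pw_gid, pw_gecos, pw_dir and pw_shell, so there the _fields_ must follow that layout instead: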

import ctypes

class PASSWD(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p),
                ("passwd", ctypes.c_char_p),
                ("uid", ctypes.c_uint),     # uid_t: unsigned 32-bit on common platforms
                ("gid", ctypes.c_uint),     # gid_t
                ("change", ctypes.c_long),  # time_t, assumed to be a C long
                ("class", ctypes.c_char_p),
                ("gecos", ctypes.c_char_p),
                ("dir", ctypes.c_char_p),
                ("shell", ctypes.c_char_p),
                ("expire", ctypes.c_long),  # time_t
                ("fields", ctypes.c_int)]

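To copy the rest of the example too:

libc.getpwnam.argtypes = [ctypes.c_char_p]
libc.getpwnam.restype = ctypes.POINTER(PASSWD)

result = libc.getpwnam(b'root')   # returns a struct passwd *
if result:                        # a NULL pointer (no such user) is false
    entry = result.contents       # dereference to get the PASSWD struct
    print(entry.name, entry.dir)  # and such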
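To illustrate the packing and byte-order overrides mentioned above, a sketch assuming Python 3; the structure names are hypothetical. _pack_ plays the role of #pragma pack, and BigEndianStructure fixes the byte order regardless of the host:

```python
import ctypes


class Natural(ctypes.Structure):
    # Default layout: padded the way the platform C compiler would do it.
    _fields_ = [("flag", ctypes.c_char), ("count", ctypes.c_int32)]


class Packed(ctypes.Structure):
    _pack_ = 1   # like #pragma pack(1): no padding between fields
    _fields_ = [("flag", ctypes.c_char), ("count", ctypes.c_int32)]


class Wire(ctypes.BigEndianStructure):
    # Big-endian on every host, e.g. for file or network formats.
    _fields_ = [("count", ctypes.c_uint32)]


print(ctypes.sizeof(Natural), Natural.count.offset)   # typically 8 and 4
print(ctypes.sizeof(Packed), Packed.count.offset)     # 5 and 1
print(bytes(Wire(1)))                                 # b'\x00\x00\x00\x01'
```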

See also this and this

See also

This goes through all the steps of writing your own C code that is ctypes-interfaceable.