Python extensions


These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

Python allows reasonable interfacing to and from C (primarily CPython; other implementations can do so indirectly at best).


How you do it depends on what you want.


Much interfacing between C and Python is extending, that is, done with the purpose of putting some extra (C) code under the control of Python, whether it is to wrap some nice scripting around C, to use a C library inside python, or to code some inner loop in C.

You can also embed a Python interpreter inside a C program. This is usually done to extend your program with scripting, though in practice it tends to involve fairly mutual interfacing: embedding plus extending.



Extending / bridging

ctypes

When you want to simply call functions in existing, compiled libraries (windows DLLs, most modern unix-style shared libraries), this is often what you want.

ctypes reads function names straight from the library (doesn't need a C header file or such). You can get this on windows, linux and more, mostly needing some environment-checking.

It cannot typecheck automatically: a compiled library exports function names but no argument or return types, and calling conventions differ between platforms.

It is often advisable to create proxy functions that declare the argument and result types and convert values as needed. You are working at a fairly low level, and getting things wrong can easily lead to a segfault or corrupted program state.
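
A minimal sketch of such a proxy function, assuming Python 3 and a C runtime that exports strlen. The names _libc and c_strlen are made up for illustration; the point is that the ctypes declarations live in one place and callers only ever see a plain Python function.

```python
import ctypes
import ctypes.util
import sys

# Load the C runtime. On POSIX, find_library may return None, in which case
# CDLL(None) uses the symbols already loaded into the process (libc included).
if sys.platform == 'win32':
    _libc = ctypes.cdll.msvcrt
else:
    _libc = ctypes.CDLL(ctypes.util.find_library('c'))

# Declare the C signature once: size_t strlen(const char *s)
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t

def c_strlen(text):
    """Hypothetical proxy: accept str or bytes, hand C a byte string."""
    if isinstance(text, str):
        text = text.encode('utf-8')
    if not isinstance(text, bytes):
        raise TypeError('expected str or bytes, got %s' % type(text).__name__)
    return _libc.strlen(text)

print(c_strlen('hello'))   # 5
```

Doing the conversion and checking in Python means a bad argument raises an exception instead of reaching C.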

C API

You can write a Python C extension more or less from scratch, which compiles to a shared object that Python can load and interface with. Since the C code is entirely under Python's control and written by you, this implies a lot of necessary boilerplate code, data conversion, and awareness of some CPython implementation details.

If you have no need for the interfacing code to be clean, minimal and readable, SWIG may be an interesting timesaver.

SWIG

SWIG bridges various languages in a fairly generic way. It is useful for wrapping APIs and libraries in general. It works well, though the interfacing code tends not to be very readable.

It does little more than provide bindings; you probably want a wrapper around these bindings to provide python calls that act pythonically. You also probably want to write that on the python side - if you do so on the C side it's still error-prone work in terms of memory freeing and such.

C++ specific

SIP

SIP was developed for Qt, specifically PyQt. It's like SWIG, but made specifically for bridging C++ and Python, which allows it to be cleaner than the general-purpose SWIG.

Boost.Python

Boost.Python is another Python-C++ bridge that is apparently a little more technically advanced than SIP and SWIG.

f2py

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Compiles-and-uses FORTRAN code from python. Was adopted by numpy/scipy, so also integrates decently with it.


http://docs.scipy.org/doc/numpy-dev/f2py/ http://docs.scipy.org/doc/numpy-dev/f2py/getting-started.html


Inline C (weave, PyInline)

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Things like Weave and PyInline allow you to write C code inside Python strings, and handle (re)compiling and binding the result to a Python module. This hides all the necessary glue (beyond some basic "this is input, this is output" and some learning about e.g. indexing tricks).


Weave was adopted by numpy/scipy, so deals decently with it. Also deals well with blitz.


This sort of option doesn't scale much in that you can't really organize large chunks of code well (if you want that, you should probably write a proper extension), so this is probably mainly useful for things like inner loop optimization.

The upside is that it's probably one of the easiest ways of doing such inner-loop optimizations pretty efficiently.


Keep in mind that if you handle Python objects at all (approximately: anything that isn't a scalar value), you'll have to learn how to talk to Python objects from C. It's not horribly hard, but it takes some up-front reading.


Mixing (Cython, Pyrex)

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

The approach of Cython and Pyrex is that you can write python-like code using only types that translate well to C.

They handle the compilation to C (instead of to python bytecode), and loading it as a C extension.


The useful trick is that you can selectively step away from safe python towards bare know-what-you're-doing C.

Change nothing and you basically get the same performance you would get from pure python, but you can get much faster number crunching when you tell it e.g.

  • int and float variables can be done purely with C types
  • doing array indexing in C, and take out python's added behaviour, such as
    • bounds checking can be disabled
    • omitting the code that deals with negative indexes (python's index wraparound logic)

Cython seems (as of this writing) somewhat more actively developed and more advanced than pyrex.

You can also interface with other C code, though this can be... interesting.



The actual way you write code varies.

  • Cython's 'pure python mode':
    • you write python but annotate it with specific optimization
    • Running it as python works
    • Running it using cython makes it run faster
    • Useful to be portable, but doesn't give you as much speed increase as you can get.
  • the other way is basically:
    • write python-like language in .pyx files
    • run cython compiler, which outputs a .c file
    • compile that .c file into a python library
    • import that library into python


Some usage notes

using ctypes

Ctypes is a python library that lets you interface with existing DLLs and shared objects in their compiled form.

This was written while I was reading the ctypes tutorial and playing with their examples.


Loading libraries

You need to know whether a library uses the cdecl or the stdcall calling convention (and the matching symbol mangling). The Windows API generally uses stdcall; most other things use cdecl. There are three styles of loading that ctypes can use:

  • cdll should be used to load cdecl-style libraries
  • windll should be used to load stdcall-style libraries
  • oledll should be used to load stdcall-style libraries, and it assumes HRESULT style error codes, adding a wrapper to raise WindowsErrors for them.

If you don't get stdcall/cdecl right, you'll usually get errors like: "ValueError: Procedure probably called with too many arguments (4 bytes in excess)".

You'll also get errors like that when you are passing along things of the wrong size. You should generally look at the respective headers or documentation, because you can also screw up type without screwing up size, and just see strange behaviour.


If you are writing ctypes code for cross-platform software, you could inspect your environment (e.g. os.name and sys.platform) and act accordingly. If your code can assume it's on linux or windows, you can cheat a little:

import sys
from ctypes import cdll

if sys.platform == 'win32':
    libc = cdll.LoadLibrary('msvcrt.dll')
else:  # then assume a glibc-based *nix
    libc = cdll.LoadLibrary('libc.so.6')

Notes:

  • While windows is usually stdcall, msvcrt is an exception: It's cdecl.
  • There is some python attribute trickery that can help in windows; libc=cdll.msvcrt has the same function as the above. I just prefer using fewer cases, for consistency.
  • There is a ctypes.util.find_library() function that will allow you to search the filesystem for a library. For example, find_library('c') would probably return libc.so.6, or nothing under windows; this is another way to check your platform / which DLLs should be loaded, and it allows you to be nice about minor library version changes (though if function typing changes between versions of libraries, doing that is dangerous).
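
A small sketch of the find_library() route, assuming Python 3. The helper name load_by_name is hypothetical; find_library() and CDLL are the real ctypes calls. It returns None rather than raising when the library cannot be located, so the caller can decide what to fall back to.

```python
import ctypes
import ctypes.util

def load_by_name(name):
    """Hypothetical helper: return a CDLL for `name`, or None if it is not found."""
    # The name is given without 'lib' prefix or version suffix, e.g. 'c' or 'm'.
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    return ctypes.CDLL(path)

libc = load_by_name('c')
if libc is not None:
    print(libc.abs(-3))   # 3
else:
    print('no C library found by that name on this platform')
```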

Functions and types

Getting at functions

There is attribute trickery here too: ctypes will wrap as much as it can. With the library loaded as above, libc.time(None) already works.

This of course doesn't help if function names aren't valid python tokens, so there are other ways of getting references to the functions inside the libraries. Read the docs.
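
Two of those other ways, sketched under the assumption of Python 3 and a C runtime that exports strlen: getattr() with the name as a string, and indexing the library object. As far as I can tell, attribute-style lookups are cached on the library object while index lookups return a fresh function object each time, so declarations set on one index lookup do not carry over to the next.

```python
import ctypes
import ctypes.util
import sys

if sys.platform == 'win32':
    libc = ctypes.cdll.msvcrt
else:
    # CDLL(None) (when find_library finds nothing) uses the running process's symbols
    libc = ctypes.CDLL(ctypes.util.find_library('c'))

# Lookup by string: useful when the name comes from data or is not a valid identifier.
strlen_a = getattr(libc, 'strlen')
strlen_a.argtypes = [ctypes.c_char_p]
strlen_a.restype = ctypes.c_size_t

# Lookup by indexing: returns a new function object on every lookup.
strlen_b = libc['strlen']
strlen_b.argtypes = [ctypes.c_char_p]
strlen_b.restype = ctypes.c_size_t

print(strlen_a(b'abc'), strlen_b(b'abcd'))   # 3 4
```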

Because of the to-be-explained typing as well as extra logic you may want to add, it can be useful to create an easy-to-use module to hide all the ctypes scariness.

Conversion types / wrapper objects

Some python types will be implicitly wrapped in ctypes objects which can be seen as C types on the C side of things:

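  • None: passed as NULL
  • byte strings (str in Python 2, bytes in Python 3): passed as char *
  • unicode strings (unicode in Python 2, str in Python 3): passed as wchar_t *
  • int (and Python 2's long): passed as the platform's C int, after masking to its size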
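
A quick sketch of those implicit conversions, assuming Python 3 naming (bytes and str) and a C runtime that exports time, strlen, wcslen and abs. No argtypes are declared here on purpose, so everything relies on the default wrapping, and results come back as plain C ints.

```python
import ctypes
import ctypes.util
import sys

if sys.platform == 'win32':
    libc = ctypes.cdll.msvcrt
else:
    # CDLL(None) (when find_library finds nothing) uses the running process's symbols
    libc = ctypes.CDLL(ctypes.util.find_library('c'))

now = libc.time(None)            # None is passed as a NULL pointer
n_bytes = libc.strlen(b'hello')  # bytes is passed as char *
n_wide = libc.wcslen('hello')    # str is passed as wchar_t *
absval = libc.abs(-7)            # a Python int is passed as a C int

print(now, n_bytes, n_wide, absval)
```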


The rest will have to be handled more explicitly. For example:

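printed_characters = libc.printf(b'foo %f', c_double(3.4))

Try that with c_float too. It's not very forgiving: printf is variadic, and a C compiler would promote a float argument to double before the call. ctypes does no such promotion, so %f reads a double where only a float was passed, and prints a wrong value.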


Objects like those created by c_double() are geared towards automatic conversion. On the C side they can be accessed as regular C types. On the Python side you can create them, read their .value, and change them, and they are garbage collected like any other object. The internal management and no-worries result is a good part of what makes ctypes neat.
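
A short illustration of what these wrapper objects look like from the Python side (a sketch, assuming Python 3; nothing here touches a C library). It also shows the masking mentioned earlier: an int that does not fit is cut down to the C size rather than rejected.

```python
from ctypes import c_double, c_int, sizeof

d = c_double(3.4)
print(d)           # the wrapper object itself: c_double(3.4)
print(d.value)     # the plain Python float inside it: 3.4
d.value = 2.5      # changes the stored C value in place

i = c_int(5)
same_as_int = (i == 5)   # False: a wrapper is not the value it stores

# Too-large ints are masked to the size of the C type, without any error.
bits = 8 * sizeof(c_int)
wrapped = c_int(2 ** bits + 1).value
print(same_as_int, wrapped)   # False 1
```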


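The rest of the basic conversion types (c_char, c_short, c_long, c_float, c_char_p, c_void_p and so on) are listed in the fundamental data types table of the ctypes documentation.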

Notes:

  • Do know what you're doing with strings and particularly pointers. Read up.
  • These are wrapper objects. They are not the same as the values they indirectly store, and you should not assign them as if they were plain values. Basically, read [1].

Prototypes

You can add some type checking on the python side, since things are wrapped anyway. This means some automatic conversion, and if you try to violate this typing later, the python side will raise an ArgumentError.

If you don't set the result type, ctypes assumes the function returns a C int, so any other kind of return value (a pointer, a double) will not be converted properly.

Example:

>>> from ctypes import c_char_p
>>> strstr = libc.strstr
>>> print(strstr(b'foo bar', b'o b'))   # restype defaults to int: you get a mangled pointer
-1211344746

>>> strstr.argtypes = [c_char_p, c_char_p]
>>> strstr.restype = c_char_p
>>> print(strstr(b'foo bar', b'o b'))
b'o bar'
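
To see the type checking in action, here is a sketch (assuming Python 3 and a C runtime exporting strlen) that declares a prototype and then deliberately violates it. With argtypes set to c_char_p, passing a str instead of bytes is refused on the Python side with ctypes.ArgumentError, before anything reaches C.

```python
import ctypes
import ctypes.util
import sys

if sys.platform == 'win32':
    libc = ctypes.cdll.msvcrt
else:
    # CDLL(None) (when find_library finds nothing) uses the running process's symbols
    libc = ctypes.CDLL(ctypes.util.find_library('c'))

libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

try:
    libc.strlen('foo bar')        # str, not bytes: does not match c_char_p
    raised = False
except ctypes.ArgumentError as exc:
    raised = True
    print('refused:', exc)

print(libc.strlen(b'foo bar'))    # 7
```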

You can make more advanced (e.g. keywording) prototypes; see the function prototypes section of the ctypes documentation.

Pointers

Various functions write to memory via pointers you give them; think sprintf (the resulting string) and scanf (the actual values). This example uses sscanf, the variation that takes string input.

You need to create things sscanf will store into, then on the python side tell ctypes to hand the function pointers on the C side of things. The same 'a string is inherently a pointer' logic as in C applies.

from ctypes import byref, c_float, c_int, create_string_buffer

i, f = c_int(), c_float()
s = create_string_buffer(32)   # creates a 32-byte buffer filled with NULs
                               # (interpreted as a NUL-terminated C string)
num_matched_values = libc.sscanf(b"1 3.14 Hello",
                                 b"%d %f %s",
                                 byref(i), byref(f), s)

print("stored %s values.\n'%s' '%s' '%s'" % (num_matched_values,
                                             i.value, f.value, s.value))

This is the syntax for passing references, which hides most of the pointer detail. ctypes also has a 'real' pointer type that can point to ctypes objects (among other things).
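
A brief sketch of that 'real' pointer type, assuming Python 3 (no C library involved). pointer() creates a pointer object to an existing ctypes object, POINTER() creates the pointer type itself, and calling that type with no arguments gives a NULL pointer.

```python
from ctypes import POINTER, c_int, pointer

n = c_int(42)
p = pointer(n)               # a pointer object; its type is POINTER(c_int)
print(p.contents.value)      # dereference: 42
print(p[0])                  # indexing dereferences too: 42

p.contents.value = 7         # writes through the pointer, so n changes
print(n.value)               # 7

null_ptr = POINTER(c_int)()  # a NULL pointer, false in a boolean context
print(bool(p), bool(null_ptr))   # True False
```

Dereferencing a NULL pointer via .contents raises a ValueError on the Python side, so testing the pointer's truth value first is the cheap way to stay safe.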


Arrays
This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

In this context, arrays are fixed-size lists of specifically typed things. Each array has a type. Luckily ctypes makes this fairly simple. For example:

ThreesomeType = c_int * 3
three = ThreesomeType(3,1,2)

...creates a new type, and then an object containing three c_ints. You can access it from Python as an array, and this would show up in C as an int[3], that is, a pointer to the first of three ints when passed to a function.
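
The Python-side behaviour of such an array, as a small sketch (assuming Python 3; the name IntArray3 is arbitrary). It acts like a fixed-length sequence: you can index, iterate and take its length, but not grow it.

```python
from ctypes import c_int, sizeof

IntArray3 = c_int * 3          # a new array type: three C ints
arr = IntArray3(3, 1, 2)

print(len(arr), list(arr))     # 3 [3, 1, 2]
arr[0] = 10                    # elements are read and written as plain ints
print(sizeof(arr) == 3 * sizeof(c_int))   # True: stored contiguously, as in C

try:
    arr[3]                     # one past the end
    out_of_range = False
except IndexError:
    out_of_range = True

zeros = IntArray3()            # no initializers: all elements start at zero
print(out_of_range, list(zeros))   # True [0, 0, 0]
```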

Structs

Structs and unions imitate C packing, and use the native byte order. Both of these details can be overridden. To steal an example from here, the C struct:

struct passwd {
        char *  pw_name;        /* user name */
        char *  pw_passwd;      /* encrypted password */
        uid_t   pw_uid;         /* user uid */
        gid_t   pw_gid;         /* user gid */
        time_t  pw_change;      /* password change time */
        char *  pw_class;       /* user access class */
        char *  pw_gecos;       /* Honeywell login info */
        char *  pw_dir;         /* home directory */
        char *  pw_shell;       /* default shell */
        time_t  pw_expire;      /* account expiration */
        int     pw_fields;      /* internal: fields filled in */
};

can be interfaced using:

import ctypes

class PASSWD(ctypes.Structure):
    # field order and types have to match the C struct above
    _fields_ = [("name",   ctypes.c_char_p),
                ("passwd", ctypes.c_char_p),
                ("uid",    ctypes.c_int),
                ("gid",    ctypes.c_int),
                ("change", ctypes.c_long),
                ("class",  ctypes.c_char_p),   # a Python keyword: read it with getattr(entry, "class")
                ("gecos",  ctypes.c_char_p),
                ("dir",    ctypes.c_char_p),
                ("shell",  ctypes.c_char_p),
                ("expire", ctypes.c_long),
                ("fields", ctypes.c_int)   ]

To copy the rest of the example too:

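libc.getpwnam.argtypes = [ctypes.c_char_p]
libc.getpwnam.restype = ctypes.POINTER(PASSWD)
# Now, after you have called it, you can fetch entries on the Python side using:
entry = libc.getpwnam(b'root').contents   # a NULL result means no such user; check before using .contents
entry.name   # and such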
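
On overriding packing and byte order, mentioned at the start of this section: a sketch using made-up two-field structs (assuming Python 3 and a platform where a 32-bit int is aligned to 4 bytes, which is the usual case). _pack_ limits field alignment, and the BigEndianStructure / LittleEndianStructure base classes fix the byte order regardless of the host.

```python
import ctypes

class Default(ctypes.Structure):
    # native alignment: padding is inserted after 'tag' so 'value' is aligned
    _fields_ = [('tag', ctypes.c_char), ('value', ctypes.c_uint32)]

class Packed(ctypes.Structure):
    _pack_ = 1   # like #pragma pack(1): no padding between fields
    _fields_ = [('tag', ctypes.c_char), ('value', ctypes.c_uint32)]

class BigEndian(ctypes.BigEndianStructure):
    _fields_ = [('value', ctypes.c_uint32)]

class LittleEndian(ctypes.LittleEndianStructure):
    _fields_ = [('value', ctypes.c_uint32)]

print(ctypes.sizeof(Default), ctypes.sizeof(Packed))   # typically 8 and 5
print(bytes(BigEndian(1)))      # b'\x00\x00\x00\x01'
print(bytes(LittleEndian(1)))   # b'\x01\x00\x00\x00'
```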


See also

this goes through all the steps of how you could write your own C that is ctypes-interfacable.