Python usage notes - struct, buffer, array, bytes, memoryview

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

array.array and struct

struct.pack and struct.unpack convert between python values and their byte representation, and allow mixing several types in one format string.

It is useful for "(de)serialize this byte data this particular way", for example:

import struct
# b'' literals so this also runs on py3, where 'c' fields are length-1 bytes objects
struct.unpack( '<i2c', b'\x40\x00\x00\x00\x33\x54' ) == (64, b'3', b'T')
struct.pack( '<i2c', 64, b'3', b'T' ) == b'@\x00\x00\x003T'
# it may save you some typing to realize that if you have
t = (64, b'3', b'T')
# then you can equivalently do:
struct.pack( '<i2c', *t )

For struct's type specifiers, see e.g. Python_usage_notes_-_Numpy,_scipy#Data_types (alongside numpy's dtypes).

array.array reads binary data of a single type into an object that is functionally much like a list.

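Warning: The item sizes are platform-native (whatever the C compiler uses, with only a minimum size guaranteed per typecode) and you can't ask for a fixed size or byte order, so raw array data is not portable between platforms.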
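To see the difference on your own machine, here is a minimal sketch (assuming Python 3, where the 'q' typecode exists) that compares array's native item sizes with struct's standard sizes. The variable names are just for illustration.

```python
import array
import struct

# Item sizes of array typecodes are whatever the C compiler uses,
# so 'l' may be 4 bytes on one platform and 8 on another.
native_sizes = {code: array.array(code).itemsize for code in 'hilq'}

# struct's standard sizes (selected with '<', '>', '=' or '!') are fixed.
standard_sizes = {code: struct.calcsize('<' + code) for code in 'hilq'}

print(native_sizes)    # platform-dependent
print(standard_sizes)  # {'h': 2, 'i': 4, 'l': 4, 'q': 8}
```

If the two dicts differ (typically for 'l'), that is exactly the portability problem: data written with array.tobytes() on one platform may not read back the same elsewhere.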

Example code:

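import array
a = array.array( 'f', b'49smdffg' )   # typecodes take no byte-order prefix; the bytes are read in native order
list(a) == [4.7046257325851247e+27, 1.0880330647149406e+24]   # on a little-endian machine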

(I typically use numpy instead, more flexible)

On endianness (and platform size)

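 numpy dtype                                      struct
 < for little-endian,                             < for LE,
 > for big-endian,                                > for BE,
 = for native                                     = for standard-size native,
 | for not applicable                             @ for native-size native (default)
 (built-in types default to = or |)               ! for network order (same as >)

array.array has no byte-order codes: it always uses native byte order and native sizes (byteswap() flips the items in place).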
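A small sketch of what the struct prefixes do in practice, assuming Python 3. The value 64 is the same one used in the earlier example; the variable names are illustrative.

```python
import struct
import sys

value = 64
little = struct.pack('<i', value)   # b'@\x00\x00\x00'
big = struct.pack('>i', value)      # b'\x00\x00\x00@'
native = struct.pack('=i', value)   # native byte order, standard size

# '=' matches whichever order this machine uses
assert native == (little if sys.byteorder == 'little' else big)

# '@' (the default) also applies native alignment, so padding may appear
# between the char and the int; '=' never pads.
padded = struct.calcsize('@ci')     # platform-dependent, commonly 8
packed = struct.calcsize('=ci')     # always 5
print(padded, packed)
```

For file formats and network protocols you generally want an explicit '<' or '>' so the result does not depend on the machine.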


buffer protocol


The buffer protocol allows access to the memory underlying an object's interesting data, if the python object implements the buffer C API.

This can be useful for efficient copies (or zero-copy views) between C extensions, for sliced byte views on large things, and, when the view is read-write, for directly tweaking the underlying data.

Producers include:

  • numpy (verify)
  • PIL (verify)
  • bytestrings (e.g. str in py2, bytes (immutable) and bytearray (mutable) in py3)
  • mmap objects (but since mmap allows slicing it's redundant in many cases)
  • array.array objects (note that most typecodes hold multi-byte values, so a byte view cuts through items)

Consumers include:

  • buffer object (in py2; memoryview object in py3)
  • file write() can take bytes from a buffer-protocol object
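As a rough illustration of a producer and two consumers from the lists above, here is a minimal sketch assuming Python 3. It uses a bytearray as the producer, a memoryview as the consumer, and an in-memory io.BytesIO standing in for a real file.

```python
import io

data = bytearray(b'hello world')      # mutable producer
view = memoryview(data)               # consumer; no bytes are copied

tail = view[6:]                       # zero-copy slice, still backed by data
tail[0:5] = b'WORLD'                  # writes through to data

sink = io.BytesIO()                   # stand-in for a file opened in binary mode
sink.write(view[:5])                  # write() accepts buffer-protocol objects

print(data)             # bytearray(b'hello WORLD')
print(sink.getvalue())  # b'hello'
```

The slice assignment has to keep the length the same, since a view cannot resize the object it points into.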


buffer object

(py2. For py3, see memoryview)

A python object that allows an indexed view on a buffer-protocol object.

By default it views the entire given object, but it can be a (zero-copy) slice if you use the offset and/or size parameter.

Since buffer can allow read-write access, this can mean compact storage and/or fast access. Consider for example:

import numpy
a = numpy.zeros(1000000)   # float64 by default, so 8000000 bytes; you often want to give a dtype
b = numpy.getbuffer(a)     # py2-only; optionally takes an offset and size, in bytes
b[4] = '\xff'              # write a single byte into the data underlying a
b       # would print <read-write buffer for 0x1d7b410, size 8000000, offset 0 at 0x1e353b0>
b[:8]   # would print '\x00\x00\x00\x00\xff\x00\x00\x00'
# to see the whole thing, try str(b) or perhaps b[:]

...and a will have one weird value because you just set some bytes underlying an IEEE double (numpy's default dtype is float64).

memoryview object

(py3, backported to py2.7. For earlier, see buffer object)

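Basically py3's rewrite of the py2 buffer object (and of the protocol underneath it). It is mostly equivalent in use, but also knows about item format and shape rather than only exposing raw bytes.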
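A minimal sketch of what that looks like, assuming Python 3.3 or later (where memoryview.cast exists) and an array.array of doubles as the underlying object. The names are illustrative.

```python
import array

values = array.array('d', [1.0, 2.0, 3.0])
view = memoryview(values)

# The view knows the element layout of the object it wraps.
print(view.format, view.itemsize, view.nbytes)   # d 8 24
print(view[1])                                   # 2.0

# cast('B') reinterprets the same memory as unsigned bytes, without copying.
raw = view.cast('B')
print(len(raw))                                  # 24

view[0] = 10.0          # writes through to the array
print(values[0])        # 10.0
```

Slicing a memoryview gives another view on the same memory, which covers what the offset and size parameters did for the py2 buffer object.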
