Python usage notes - struct, buffer, array, bytes, memoryview

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


struct

struct.pack and struct.unpack convert between python types and their byte representation, and allow mixing of types in one format string.

It is useful for "(de)serialize this byte data this particular way", for example:

import struct

# b'' literals make this work in both py2.7 and py3 (in py3, 'c' fields are length-1 bytes)
struct.unpack( '<i2c', b'\x40\x00\x00\x00\x33\x54') == (64, b'3', b'T')
struct.pack( '<i2c', 64, b'3', b'T' ) == b'@\x00\x00\x003T'
 
# it may save you some typing to realize that if you have
t = (64, b'3', b'T')  
# then you can equivalently do:
struct.pack( '<i2c', *t )


For struct's type specifiers, see e.g. Python_usage_notes_-_Numpy,_scipy#Data_types (alongside numpy's dtypes).
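As a slightly larger sketch of the same idea, the following packs and parses a small record header. The layout and the names (HEADER, magic, version, length) are made up for illustration; only struct.Struct, unpack_from and calcsize are actual standard-library calls.

```python
import struct

# Hypothetical record layout: 4-byte magic, uint16 version, uint32 payload length.
# '<' means little-endian, standard sizes, no alignment padding.
HEADER = struct.Struct('<4sHI')

payload = b'hello'
record = HEADER.pack(b'DEMO', 1, len(payload)) + payload

# unpack_from reads at an offset without slicing the data first
magic, version, length = HEADER.unpack_from(record, 0)
body = record[HEADER.size:HEADER.size + length]

# Native mode ('@', the default) may add alignment padding, so sizes can differ
packed_size = struct.calcsize('<4sHI')   # 4 + 2 + 4 bytes
native_size = struct.calcsize('@4sHI')   # platform-dependent, may be larger
```

Giving an explicit byte-order prefix such as '<' or '>' is probably what you want for file formats and network data, because it also fixes the field sizes and disables padding.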


array.array

array.array stores values of a single C type compactly (and can be filled from raw binary data), in an object that is functionally much like a list.

Warning: The item sizes are those of the platform's C types (check a.itemsize) and the byte order is always the native one, so you can't rely on the byte layout in a portable way. Use struct or numpy instead.

Example code:

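import array
a = array.array( 'f', b'49smdffg' )   # typecodes take no byte-order prefix; 'f' is a native C float
list(a) == [4.7046257325851247e+27, 1.0880330647149406e+24]   # on a little-endian machine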
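To make the portability warning concrete, here is a minimal py3 sketch (the variable names are just for illustration) that inspects the item size, swaps the byte order in place, and compares against struct, which spells out the byte order explicitly.

```python
import array
import struct
import sys

a = array.array('f', [1.0, 2.5])

# itemsize reports how many bytes each element takes on this platform
item_bytes = a.itemsize          # 4 for 'f' on common platforms, not guaranteed

raw_native = a.tobytes()         # bytes in the machine's own byte order

# byteswap() flips the byte order of every element in place, so work on a copy
swapped = array.array('f', a)
swapped.byteswap()
raw_swapped = swapped.tobytes()

# struct lets you state the byte order, here big-endian
raw_big_endian = struct.pack('>2f', *a)

# which of the two array outputs matches depends on the machine
big_from_array = raw_swapped if sys.byteorder == 'little' else raw_native
```

So array can produce a specific byte order, but only if you check sys.byteorder yourself, which is the kind of bookkeeping struct and numpy dtypes do for you.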


buffer protocol


The buffer protocol allows access to the memory underlying an object's interesting data.

It works when the python object implements the buffer C API.

This can be useful for efficient copies (or zero-copy views) between C extensions, for sliced byte views on large objects, and for directly tweaking the underlying data when the view is read-write.

Producers include:

  • numpy (verify)
  • PIL (verify)
  • bytestrings (e.g. str in py2, bytes (immutable) and bytearray (mutable) in py3)
  • mmap objects (but since mmap allows slicing it's redundant in many cases)
  • array.array objects (note that most typecodes hold multi-byte values, so the byte view is longer than the element count)

Consumers include:

  • buffer object (in py2; memoryview object in py3)
  • file write() can take bytes from a buffer-protocol object
(a sensible optimization, as without this it or you would probably need to convert to string/bytes first)
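A small py3 sketch of the producer/consumer idea, using only standard-library producers from the list above. io.BytesIO stands in for a real binary file here, and the names (producers, info, out) are just for illustration.

```python
import array
import io

# Three standard-library producers of the buffer protocol
producers = [b'abcd', bytearray(b'abcd'), array.array('H', [1, 2])]

# memoryview is a consumer: it wraps the underlying memory without copying it
info = []
for obj in producers:
    view = memoryview(obj)
    info.append((type(obj).__name__, view.nbytes, view.readonly))

# write() on a binary file-like object is also a consumer,
# so none of these need converting to bytes first
out = io.BytesIO()
for obj in producers:
    out.write(obj)
written = out.getvalue()
```

Note that the view on bytes is read-only while the ones on bytearray and array.array are writable, which follows from the mutability of the producer.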



buffer object

(py2. For py3, see memoryview)

A python object that allows an indexed view on a buffer-protocol object.

By default it views the entire given object, but it can be a (zero-copy) slice if you use the offset and/or size parameter.


Since buffer can allow read-write access, this can mean compact storage and/or fast access. Consider for example:

import numpy
a = numpy.ndarray(8000000) 
b = a.data   # which is similar to  b=numpy.getbuffer(a,0,len(a)) though you often want to give a dtype
b # would print <read-write buffer for 0x1d7b410, size 8000000, offset 0 at 0x1e353b0>
 
b[4]='\xff'
b[:8]   # would print '\x00\x00\x00\x00\xff\x00\x00\x00'
# to see the whole thing, try str(b) or perhaps b[:]

...and a will have one weird value because you just set some bytes underlying an IEEE double (numpy's default dtype is float64).


memoryview object

(py3, backported to py2.7. For earlier, see buffer object)

Basically the py3 replacement for the py2 buffer object (and for the old buffer protocol), with mostly equivalent functionality.
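A rough py3 counterpart of poking bytes underneath floating-point data, sketched with array.array rather than numpy so that it only needs the standard library (the choice of array and the variable names are mine, for illustration). It assumes python 3.3 or later for memoryview.cast.

```python
import array

a = array.array('d', [0.0] * 4)     # four C doubles

# cast('B') reinterprets the same memory as unsigned bytes, without copying
b = memoryview(a).cast('B')

b[4] = 0xff                          # py3 memoryviews of bytes index to ints, not 1-char strings
first8 = bytes(b[:8])                # b'\x00\x00\x00\x00\xff\x00\x00\x00'

# a[0] is no longer 0.0, because a byte inside its IEEE double was overwritten
changed = a[0] != 0.0

# slicing a memoryview gives another zero-copy view on the same memory
tail = b[8:]
tail[0] = 1                          # this lands in the first byte of a[1]
```

Use bytes(view) or view.tobytes() when you do want an independent copy of the data.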
