Python usage notes - struct, buffer, array, bytes, memoryview

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


struct

struct.pack and struct.unpack convert between Python values and their packed byte representation, and a single format string can mix types.

It is useful for "(de)serialize this byte data this particular way", for example:

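import struct

# on py3 the data and the 'c' fields are bytes (b'...'); on py2 the b prefix is accepted and changes nothing
struct.unpack( '<i2c', b'\x40\x00\x00\x00\x33\x54') == (64, b'3', b'T')
struct.pack( '<i2c', 64, b'3', b'T' )                == b'@\x00\x00\x003T'


# it may save you some typing to realize that if you have
t = (64, b'3', b'T')
# then you can equivalently do:
struct.pack( '<i2c', *t )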


For struct's type specifiers, see e.g. Python_usage_notes_-_Numpy,_scipy#Data_types (alongside numpy's dtypes).
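The byte-order prefix is the part of the format string that makes struct useful for fixed layouts. A minimal sketch of what it changes, written for py3 (the name record is made up for this example):

```python
import struct

# '<' = little-endian, '>' = big-endian; both use standard sizes and no padding
little = struct.pack('<i', 64)   # b'@\x00\x00\x00'
big    = struct.pack('>i', 64)   # b'\x00\x00\x00@'

# with a byte-order prefix there is no alignment padding: 1 + 4 bytes
packed_size = struct.calcsize('<ci')   # 5

# a precompiled Struct avoids re-parsing the format string on every call
record = struct.Struct('<i2c')
fields = record.unpack(b'\x40\x00\x00\x00\x33\x54')   # (64, b'3', b'T')
```

Without a prefix, struct uses native sizes and alignment, so the same format string may then give a different calcsize on different platforms.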

array.array

array.array reads binary data of a single type into an object that is functionally much like a list.

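Warning: Item sizes follow the platform's C types and the byte order is always native, and you cannot specify either (you can only inspect a.itemsize), so raw array data is not portable between platforms.(verify) For a fixed layout, use struct or numpy instead.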

Example code:

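import array

# the typecode is a single character; there is no byte-order prefix like struct's
a = array.array( 'f', b'49smdffg' )
# on a little-endian machine:
list(a) == [4.7046257325851247e+27, 1.0880330647149406e+24]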
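To make the portability warning concrete, here is a rough sketch (py3; on py2 tobytes() is called tostring()) comparing an array's native bytes with an explicitly little-endian struct packing. All variable names are just for illustration.

```python
import array
import struct
import sys

a = array.array('f', [1.0, 2.5])

# itemsize reports how many bytes one item takes on this platform
item_bytes = a.itemsize

# array data is always in native byte order
native = a.tobytes()

# struct can produce a fixed byte order regardless of platform
portable = struct.pack('<%df' % len(a), *a)

# the two only agree when the machine happens to be little-endian
same_on_little_endian = (sys.byteorder != 'little') or (native == portable)

# byteswap() flips the byte order of every item in place
swapped = array.array('f', a)
swapped.byteswap()
```

So array.array is fine for in-process work, while anything written to disk or sent over a network is safer going through struct (or numpy with an explicit dtype).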


buffer protocol


The buffer protocol allows access to the memory underlying an object's interesting data.

It works when the Python object implements the buffer C API.

This can be useful for efficient copies (or zero-copy views) between C extensions, for sliced byte views on large things, and for directly tweaking the underlying data when the view is read-write.

Producers include:

  • numpy (verify)
  • PIL (verify)
  • bytestrings (e.g. str in py2, bytes (immutable) and bytearray (mutable) in py3)
  • mmap objects (but since mmap allows slicing it's redundant in many cases)
  • array.array objects (note that most typecodes have multi-byte items)

Consumers include:

  • buffer object (in py2; memoryview object in py3)
  • file write(), which can take bytes straight from a buffer-protocol object
(a sensible optimization, as otherwise you or the library would probably need to convert to string/bytes first)
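A small py3 sketch of a few of these producers and consumers together, using only the standard library. io.BytesIO stands in for a real binary file here, and the variable names are illustrative.

```python
import array
import io

# producers: bytes gives a read-only view, bytearray a writable one
ro = memoryview(b'abcdef')
rw = memoryview(bytearray(b'abcdef'))
flags = (ro.readonly, rw.readonly)     # (True, False)

# writing through the view changes the underlying bytearray, no copy involved
backing = bytearray(b'abcdef')
view = memoryview(backing)
view[0] = ord('X')                     # backing is now bytearray(b'Xbcdef')

# consumer: a binary file-like write() accepts buffer-protocol objects directly
sink = io.BytesIO()
sink.write(array.array('B', [1, 2, 3]))
sink.write(view[3:])                   # a zero-copy slice of backing
written = sink.getvalue()              # b'\x01\x02\x03def'
```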



buffer object

(py2. For py3, see memoryview)

A python object that allows an indexed view on a buffer-protocol object.

By default it views the entire given object, but it can be a (zero-copy) slice if you use the offset and/or size parameter.


Since buffer can allow read-write access, this can mean compact storage and/or fast access. Consider for example:

import numpy
a = numpy.zeros(8000000)   # float64 by default (you often want to give a dtype); zeros() rather than ndarray(), which leaves memory uninitialized
b = a.data   # on py2 this is similar to  b=numpy.getbuffer(a,0,a.nbytes)
b # would print something like <read-write buffer for 0x1d7b410, size 64000000, offset 0 at 0x1e353b0>

b[4]='\xff'
b[:8]   # would print '\x00\x00\x00\x00\xff\x00\x00\x00'
# to see the whole thing, try str(b) or perhaps b[:]

...and a will have one weird value because you just set some bytes underlying an IEEE double (numpy's default dtype is float64).
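py3 has no buffer object, but roughly the same byte-level poking can be sketched with memoryview.cast. This version uses array.array instead of numpy purely to stay within the standard library, and writes a whole double rather than a single byte so the result does not depend on byte order.

```python
import array
import struct

a = array.array('d', [0.0, 0.0, 0.0])    # three C doubles, 8 bytes each on common platforms

raw = memoryview(a).cast('B')             # writable byte-level view on the same memory
raw[8:16] = struct.pack('d', 1.5)         # overwrite the bytes under a[1], in native byte order

values = a.tolist()                       # [0.0, 1.5, 0.0]
```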


memoryview


(py3, backported to py2.7. For earlier, see buffer object)


memoryview is like py2's buffer object, but built on the revised buffer protocol (PEP 3118), and a little fancier in that you can have typed memoryviews.

Consider:

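import numpy
a = numpy.zeros(8000000)   # zeros() rather than ndarray(), which leaves memory uninitialized

# typed
b = memoryview(a)
b[:2].tolist() == [0.0, 0.0]   # a slice is itself a memoryview, so convert before comparing

# get at the bytes like:
list( b[:2].tobytes() ) == [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]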


While working purely in Python code, memoryview is primarily useful to represent a slice of something else (e.g. a bytes object) without copying the data, which is what slicing the object itself would do.

One place where this is useful is parsing a large file: take a memoryview on the bytes you read, and hand slices of it to struct.
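A minimal sketch of that parsing pattern on py3. The record layout (a uint16 id followed by a uint32 value, little-endian) and the names RECORD, data and body are made up for illustration, and data stands in for the contents of a large file.

```python
import struct

# hypothetical record layout: uint16 id, uint32 value, little-endian, no padding
RECORD = struct.Struct('<HI')

# stand-in for the bytes read from a large file
data = b''.join(RECORD.pack(i, i * 1000) for i in range(4))

view = memoryview(data)
body = view[RECORD.size:]          # skips the first record without copying

# iter_unpack walks fixed-size records straight out of the buffer (py3.4+)
records = list(RECORD.iter_unpack(body))   # [(1, 1000), (2, 2000), (3, 3000)]

# unpack_from reads a single record at a byte offset
second = RECORD.unpack_from(view, RECORD.size)   # (1, 1000)
```

Slicing data directly would give the same results, but each slice would copy its bytes first, which adds up on large inputs.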


