Python usage notes - struct, buffer, array, bytes, memoryview
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, or tell me)
struct
struct.pack and struct.unpack convert between python types and their byte representation, and allow mixing of types within one format string.
It is useful for "(de)serialize this byte data this particular way", for example:
    import struct

    # bytes literals: required in py3, and the same thing as a plain str in py2
    struct.unpack('<i2c', b'\x40\x00\x00\x00\x33\x54') == (64, b'3', b'T')
    struct.pack('<i2c', 64, b'3', b'T') == b'@\x00\x00\x003T'

    # it may save you some typing to realize that if you have
    t = (64, b'3', b'T')
    # then you can equivalently do:
    struct.pack('<i2c', *t)
For struct's type specifiers, see e.g. Python_usage_notes_-_Numpy,_scipy#Data_types (alongside numpy's dtypes).
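To make the byte-order prefix and sizes a little more concrete, here is a small py3 sketch. The format string is the one from the example above; the variable names are just for illustration.

```python
import struct

# '<' = little-endian, standard sizes, no padding
# 'i' = 4-byte signed int, '2c' = two single-byte chars
fmt = '<i2c'
size = struct.calcsize(fmt)                # total packed size in bytes

packed = struct.pack(fmt, 64, b'3', b'T')
values = struct.unpack(fmt, packed)

# The same integer in the two byte orders: only the order of the bytes differs
big = struct.pack('>i', 64)
little = struct.pack('<i', 64)

# Precompiled form, handy when the same layout is read many times
record = struct.Struct(fmt)
roundtrip = record.unpack(record.pack(*values))
```

Giving an explicit '<' or '>' is what makes the result independent of the machine it runs on; without a prefix, struct uses native size and alignment.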
array.array
array.array reads binary data of a single type into an object that is functionally much like a list.
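Warning: array's typecodes map to C types, so item sizes are platform-native (check a.itemsize) and data is always interpreted in native byte order; unlike struct there is no byte-order prefix. That makes it hard to use in a portable way. Use struct or numpy instead.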
Example code:
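    import array

    # the typecode is a single character; unlike struct formats it takes no byte-order prefix
    a = array.array('f', b'49smdffg')
    # on a little-endian machine:
    list(a) == [4.7046257325851247e+27, 1.0880330647149406e+24]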
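A sketch of what the portability warning means in practice (py3, standard library only). It reads the same eight bytes with array in native order, with array plus byteswap(), and with struct using explicit byte order; the variable names are made up for this example.

```python
import array
import struct
import sys

raw = b'49smdffg'

a = array.array('f', raw)        # interpreted in the machine's native byte order
item_size = a.itemsize           # 4 for 'f' on common platforms, but that is up to the C compiler

# To interpret the data in the non-native byte order, swap in place
swapped = array.array('f', raw)
swapped.byteswap()

# struct lets you state the byte order explicitly, independent of the platform
little = list(struct.unpack('<2f', raw))
big = list(struct.unpack('>2f', raw))

native_is_little = (sys.byteorder == 'little')
```

On a little-endian machine list(a) matches little and list(swapped) matches big; on a big-endian machine it is the other way around, which is exactly why the struct form is easier to reason about.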
See also:
- http://docs.python.org/2/library/array.html
- http://docs.python.org/2/library/struct.html#format-characters (types)
- http://docs.python.org/2/library/struct.html#byte-order-size-and-alignment (byte order)
buffer protocol
The buffer protocol allows access to the memory underlying an object's interesting data.
It works when the python object implements the buffer C API.
This can be useful for efficient copies (or zero-copy views) between C extensions, for sliced byte views on large things, and for directly tweaking the underlying data when the view is read-write.
Producers include:
- numpy arrays
- PIL (verify)
- bytestrings (e.g. str in py2, bytes (immutable) and bytearray (mutable) in py3)
- mmap objects (but since mmap allows slicing it's redundant in many cases)
- array.array objects (note that most typecodes are multi-byte items, so one item is several bytes)
Consumers include:
- buffer object (in py2; memoryview object in py3)
- file write() can take bytes from a buffer-protocol object
- (a sensible optimization: without it, either write() or you would probably need to convert to string/bytes first, which means a copy)
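A minimal py3 sketch of a producer and a few consumers, using only the standard library. A bytearray is the producer, and io.BytesIO stands in for a real file here (an assumption for the sake of a self-contained example).

```python
import io

data = bytearray(b'hello buffer protocol')   # producer: mutable bytes

view = memoryview(data)          # consumer: no copy is made here
part = view[6:12]                # slicing a memoryview is also zero-copy

# Writing through the view changes the underlying bytearray
part[0] = ord('B')

# write() accepts any buffer-protocol object, so no bytes() conversion is needed
sink = io.BytesIO()
sink.write(part)

# readinto() fills an existing buffer in place instead of returning new bytes
source = io.BytesIO(b'HELLO')
count = source.readinto(view[:5])
```

After this, data reads b'HELLO Buffer protocol', even though data itself was never assigned to directly.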
buffer object
(py2. For py3, see memoryview)
A python object that allows an indexed view on a buffer-protocol object.
By default it views the entire given object, but it can be a (zero-copy) slice if you use the offset and/or size parameter.
Since buffer can allow read-write access, this can mean compact storage and/or fast access.
Consider for example:
    import numpy

    a = numpy.zeros(8000000)   # zeros() rather than ndarray(), which leaves the memory uninitialized
    b = a.data                 # which is similar to b=numpy.getbuffer(a,0,len(a)) though you often want to give a dtype
    b          # would print something like <read-write buffer for 0x1d7b410, size 64000000, offset 0 at 0x1e353b0>
    b[4] = '\xff'
    b[:8]      # would print '\x00\x00\x00\x00\xff\x00\x00\x00'
    # to see the whole thing, try str(b) or perhaps b[:]
...and a[0] will now hold a weird value, because you just set one of the bytes underlying an IEEE double (numpy's default dtype is float64).
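For py3, roughly the same experiment can be done with the standard library only. This is a sketch, assuming an array.array of doubles as a stand-in for the numpy array, and memoryview in place of buffer:

```python
import array

a = array.array('d', [0.0] * 4)      # four doubles, zero-initialized
raw = memoryview(a).cast('B')        # view the same memory as unsigned bytes

raw[4] = 0xff                        # poke one byte inside the first double

first_bytes = bytes(raw[:8])         # the eight bytes underlying a[0]
first_value = a[0]                   # no longer 0.0: a tiny denormal number
```

The other items are untouched, since only one byte inside the first item was written.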
memoryview
(py3, backported to py2.7. For earlier, see buffer object)
Memoryview is like py2 buffer objects (and built on the same protocol),
but a little fancier in that you can have typed memoryviews.
Consider:
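    import numpy

    a = numpy.zeros(8000000)   # float64 by default; zeros() rather than ndarray(), which is uninitialized
    b = memoryview(a)          # typed: items are floats, not bytes
    b[:2].tolist() == [0.0, 0.0]
    # get at the bytes like (py3, where iterating bytes gives ints):
    list(b[:2].tobytes()) == [0] * 16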
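What "typed" means can also be seen without numpy. A small py3 sketch, assuming an array.array of C ints as the underlying object:

```python
import array

a = array.array('i', [1, 2, 3, 4])
m = memoryview(a)

fmt = m.format                 # 'i', taken from the array's typecode
item_size = m.itemsize         # bytes per item, same as a.itemsize
first_two = m[:2].tolist()     # items come back as ints, not as bytes

as_bytes = m.cast('B')         # untyped (byte) view of the same memory
byte_count = len(as_bytes)     # equals m.nbytes
```

Indexing m gives whole items, while indexing as_bytes gives the individual bytes of those items.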
While working purely in python code, memoryview is primarily useful to represent a slice of something else (e.g. a bytes object) without copying the data - which is what slices would usually be required to do.
- One place where this is useful is parsing a large file: wrap the bytes object in a memoryview, and hand slices of that view to struct for unpacking
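A sketch of that parsing pattern in py3. The record layout ('<Hf', a uint16 id plus a float32 value) is hypothetical and only there for illustration, and the data is generated in place of reading a real file.

```python
import struct

# Hypothetical record layout: little-endian uint16 id + float32 value
record = struct.Struct('<Hf')

# Stand-in for the contents of a large file
data = b''.join(record.pack(i, i * 0.5) for i in range(1000))

view = memoryview(data)

# unpack_from reads at an offset without slicing, so nothing is copied
tenth = record.unpack_from(view, 10 * record.size)

# A memoryview slice selects a sub-range without copying; slicing the bytes object would copy
window = view[100 * record.size : 103 * record.size]
rows = list(record.iter_unpack(window))
```

For a few small records the copying would not matter, but on multi-gigabyte data sliced repeatedly it can add up.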