Difference between revisions of "Python usage notes  Numpy, scipy"
m (→Correlation) 
m (→Semisorted) 

Line 1,582:  Line 1,582:  
===Semisorted===  ===Semisorted===  
+  
+  <!  
+  
+  http://wiki.scipy.org/Tentative_NumPy_Tutorial  
+  
+  
+  http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf  
+  
+  
+  http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html  
+  
+  
+  >  
+  
====Correlation====  ====Correlation====  
Line 1,592:  Line 1,606:  
covariance(X, Y) / (stdv(X) * stdv(Y)  covariance(X, Y) / (stdv(X) * stdv(Y)  
: which makes a number of assumptions about the distribution, basically that it's gaussian  : which makes a number of assumptions about the distribution, basically that it's gaussian  
−  
* Spearman’s correlation  * Spearman’s correlation  
: similar idea, based on rank of each sample, basically to not assume much about the distribution {{verify}}  : similar idea, based on rank of each sample, basically to not assume much about the distribution {{verify}}  
−  
* Kendall's tau is something you'ld use on [[ordinal data]]  * Kendall's tau is something you'ld use on [[ordinal data]]  
Line 1,605:  Line 1,617:  
>  >  
−  
<!  <!  
−  +  numpy.corrcoef  pearson's r  
−  +  
−  numpy.corrcoef  +  
* if running on matrices, it will give a correlation matrix  * if running on matrices, it will give a correlation matrix  
* if you meant overall correlation piecewise, then .flatten() may be the simplest solution  * if you meant overall correlation piecewise, then .flatten() may be the simplest solution  
+  
+  
+  numpy.correlate  is crosscorrelation  
+  : \Sum a[n+k] * conj(v[n])  
Latest revision as of 16:03, 5 December 2019
Various things have their own pages, see Category:Python. Some of the pages that collect various practical notes include:

Contents
 1 Intro
 2 numpy
 2.1 Some concepts you may care about
 2.2 Filtering, searching, counting
 2.3 Iterating
 2.4 Sorting
 2.5 Serializing
 2.6 Going between python objects and numpy
 2.7 Manipulating shape, size, and axes
 2.8 Histograms, Binning as in dealing with values
 2.9 Semisorted
 2.10 Warnings and errors
 2.10.1 RuntimeWarning: invalid value encountered in less_equal
 2.10.2 ValueError: data type must provide an itemsize
 2.10.3 ValueError: zerosize array to some function without identity
 2.10.4 RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
 2.10.5 underflow encountered in exp
 2.10.6 The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
 2.10.7 Using a nontuple sequence for multidimensional indexing is deprecated
 2.10.8 elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future
 3 scipy
 4 On speed
 5 See also
Intro
numpy is a C extension that does ndimensional arrays  a relatively generic basis that other things can build on.
Numpy itself mostly does basic matrix operations, and some linear algebra, and interfaces with BLAS and LAPACK, so is fairly fast (certainly much preferable ver number crunching in purepython code. It can still help to build against a specific BLAS).
It also defines the array interface, which allows sharing of data without knowing each other's APIs, so there are various other libraries that build on it with more specific goals, or just cooperate nicely, including:
 pandas  for various data analysis, for a good part easing the pragmatic sides of loading and sellecting and summarizing and other massaging
 PIL / Pillow does a bunch of image loading, saving and altering  and lets you view images as arrays, making it easy and fast to go back and forth between other libraries mentioned here
 matplotlib and mayavi for visualization (see also matplotlib notes)
 Just because you may be interested, others include:
...etc.
There will be overlap between these (and they will often have subtle differences in behaviour or capabilities), e.g.
between numpy and scipy's SVD,
between FFTs in different places,
between scipy's and PIL's image functions.
numpy
Some concepts you may care about
On sizes/shapes and axes
It helps to have a mental model of the way numpy deals with dimensions, yet it's also too flexible to have one true interpretation.
So this axis stuff is partially just your own selfconsistent convention, but it often pays to have the same conventions as other people, and that of numpy functions ('s default axis behaviour) where it has them.
As such, unless you have a reason otherwise, you may wish to think of:
 axis 0 as rows
 axis 1 as columns
 axis 2 as depth in a 3D box, slices in a stack of samesized images, or such.
Doing this can avoid confusion later, and sometimes helps performance.
If you dig deeper e.g. because you want efficient transfers to other software, you also end up having to know about C order versus Fortran order and such. In general use you often wouldn't care about this.
This is also why shape is a little funny to some people, depending on whether you come from a math background or not.
Math often describes matrices as ybyx ,rowbycolumn), so:
array([ [1, 2, 3], [2, 4, 5]] )
would be a 2by3 matrix, and its shape is (2,3)
Many other fields (e.g. image processing) may prefer widthbyheight, and are going to be more confused.
 Reshaping
Since array data is a contiguous block, the shape is actually just metadata thae control how you and functions view/use that data.
Which means you can change the shape as long as the total size stays constant. This has few direct use cases, mostly flattening away and introducing dimensions.
a=np.arange(15) # now a is array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]) # and a.shape is (15,) a.reshape(3,5) # means a is array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]) # while a.reshape(3,5,1) # would give an array like array([[[ 0], [ 1], [ 2], [ 3], [ 4]], [[ 5], [ 6], [ 7], [ 8], [ 9]], [[10], [11], [12], [13], [14]]])
Indexing / slicing
It's not just you, this stuff takes getting used to.
It's quite consistent, though, so eventually becomes natural enough.
To steal and shorten some examples from here, given:
q = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Numpy's array slicing syntax is similar to python's.
q[1:7] == array([1, 2, 3, 4, 5, 6]) # with a non1 step size: q[1:7:2] == array([1, 3, 5]) # negative indices interpreted as counts from the end (i.e. length plus it). q[3:4] == array([3, 4, 5]) # negative stepping is also allowed q[4:3:1] == array([6, 5, 4])
 Moredimensional slicing:
Perdimension wishes are separated with commas and yields a sameorlowerdimensional array, depending on whether you ask for a range or specific index.
# Say you have: >>> r = np.array([ [[1],[2],[3]], [[4],[5],[6]]]) # Which has shape (2, 3, 1)
For example, selecting the first row:
r[0,:,:] == array([[1], [2], [3]]) # To go from shape (2,3,1) to (2,3), you would probably go for: s = r[:,:,0]
There are some shorthands you may care to know about.
Aside from views on the data, slicing is also useful to selectively apply operations (ufuncs) to a matrix. For example:
a[: , 1] /= 2 # divide the second column of a 2D matrix by two a[0::3 , :] /= 2 # divide every third row by two
...and if the array was 3D, you effectively omitted the third dimensional wish, so you selected all of it.
ufuncs and vectorization
Consider you can e.g. add or multiply by scalars:
>>> a = np.array([1,2,3,4]) >>> a * 2 array([2, 4, 6, 8])
This actually requires some glue to work (particularly because it mixes a python scalar with a numpy array), but it's so basic and useful that many such things are specialcased to just work (and that glue is often relatively simple, because...).
Numpy models the concept of a perelement function, called a 'universal function', better known as a ufunc.
This lets you mix many different things (generating, transforming, etc) and allows for things like:
>>> S = np.array([.25, .5, .75, 1]) >>> np.sin( np.pi * S ) array([ 7.07106781e01, 1.00000000e+00, 7.07106781e01, 1.22464680e16 ]) >>> B = np.linspace(1,10, 10) # (which is array([1,2,3,4,5,6,7,8,9,10])) >>> 1 + np.exp( np.cos( np.pi * B ) )
Notes:
 Applying ufuncs is usually faster than alternatives  not because it's parallel or such, but because the iteration and evaluation can often happen in C (sometimes with vectorized computation, where possible)
 ufuncs include arithmetic, algebra, comparison logic, bitwise logic, things like conjugates.
 combinations of ufuncs are effectively also ufuncs(verify)
Generating 1D arrays
Your options to generate such matrices include:
 numpy.arange(), similar to python's range() but produces a numpy array
 evenly spaced
 Your basic start,stop,step
 deals with floats
>>> numpy.arange(1, 5, 0.3) array([ 1. , 1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.1, 3.4, 3.7, 4. , 4.3, 4.6, 4.9])
 numpy.linspace()[1]
 Evenly spaced
 start, stop, and amount_of_steps in which to get there
 avoids rounding errors at the endpoints (verify) so the last value will always be stop
>>> numpy.linspace(1,5,14) array([ 1. , 1.30769231, 1.61538462, 1.92307692, 2.23076923, 2.53846154, 2.84615385, 3.15384615, 3.46153846, 3.76923077, 4.07692308, 4.38461538, 4.69230769, 5. ])
Generating higherD grids
It is regularly useful to evaluate functions for a given set of values, e.g. on a grid. Options include:
 reshape something, often one of the above, e.g.
# numpy.arange(1,10).reshape(3,3) array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
 numpy.meshgrid()
 ndimensional coordinate matrices, from 1D coordinate vectors
 modelled after matlab's meshgrid
 intuitively (for two): assume these two things need to be in the X and Y axis, so broadcast them according to the shape of the other
# e.g. a 2D x = numpy.array([1, 2, 3, 4]) # using linspace here is often more flexible y = numpy.array([10, 20, 30]) XX, YY = numpy.meshgrid(x, y) XX # array([[1, 2, 3, 4], # [1, 2, 3, 4], # [1, 2, 3, 4]]) YY # array([[10, 10, 10, 10], # [20, 20, 20, 20], # [30, 30, 30, 30]]) XX + YY # array([[11, 12, 13, 14], # [21, 22, 23, 24], # [31, 32, 33, 34]]) # This is helpful e.g. when trying to write cleanish ufunc combinations that use both, e.g x = numpy.array([0.1, 0.2, 0.3, 0.4, 0.5]) y = numpy.array([0.1, 0.2, 0.3, 0.4, 0.5]) XX, YY = numpy.meshgrid(x, y) numpy.sqrt( XX*XX + YY*YY ) #array([[0.141, 0.224, 0.316, 0.412, 0.51 ], # [0.224, 0.283, 0.361, 0.447, 0.539], # [0.316, 0.361, 0.424, 0.5 , 0.583], # [0.412, 0.447, 0.5 , 0.566, 0.64 ], # [0.51 , 0.539, 0.583, 0.64 , 0.707]])
When it is enough to use integer based matrices, the above use has some shorter alternatives:
XX, YY = numpy.mgrid[10:40:10, 1:5] # same as the first XX, YY above
XX, YY = numpy.ogrid[10:40:10, 1:5] XX # array([[10], # [20], # [30]]) YY # array([[1, 2, 3, 4]]) XX+YY # array([[11, 12, 13, 14], # [21, 22, 23, 24], # [31, 32, 33, 34]])
This can be useful when you're working with images, or other pixel/voxel style data. For example, say you want the radius from an arbitrary center point.
# Some setup (you may want to inspect intermediates yourself to see what's happening) height, width = (5,5) # subtract half: fractional pixels and zerobased numbers # intuitively: draw a 3x3 and 4x4 matrix and figure out where the center really is cx = 0.5 + (0.5*width) cy = 0.5 + (0.5*height) yarr,xarr = np.indices([height,width]) # this creates incrementing indices in each direction, so: # xarr is [0 1 2 3 4], [0 1 2 3 4], ... # yarr is [0 0 0 0 0], [1 1 1 1 1], ... # ...which makes the following easy to express (note: in a vectorized way): numpy.sqrt((xarrcx)**2+(yarrcy)**2) #array([[2.828, 2.236, 2. , 2.236, 2.828], # [2.236, 1.414, 1. , 1.414, 2.236], # [2. , 1. , 0. , 1. , 2. ], # [2.236, 1.414, 1. , 1.414, 2.236], # [2.828, 2.236, 2. , 2.236, 2.828]])
# Or more interestingly, something like a gaussian (here intended for a fourierspace filter) import matplotlib.pyplot, numpy as np h = 10 # in pixels height, width = (20,20) cx = 0.5 + (0.5*width) cy = 0.5 + (0.5*height) yarr,xarr = np.indices([height,width]) # this creates incrementing indices in each direction, so: gaussian_lowpass = np.exp( ( ( (xarrcx)**2 / h**2 ) +((yarrcy)**2 / h**2)) ) matplotlib.pyplot.imshow(gaussian_lowpass, cmap='gray') matplotlib.pyplot.show()
There is also:
 scitools.numpyutils.meshgrid extended version of numpy.meshgrid to 1D, 2D and 3D grids, with sparse or dense coordinate arrays and matrix or grid indexing
 scitools.numpyutils.ndgrid same as calling meshgrid with indexing=’ij’ (matrix indexing)
 scitools.UniformBoxGrid
 scitools.BoxGrid
...and a number of tools around them.
Broadcasting
When combining matrices with operations that are elementwise, the most obvious requirement would be identical shapes.
Combining nonidentical shapes works by following some specific rules.
Broadcasting basically means one array will be repeated to fit (much as if there's some code looping it for you)
a = array([[ 1., 2.], [ 3., 4.], [ 5., 6.]]) b = array([1., 3.]) a + b # will yield: array([[ 0., 5.], [ 2., 7.], [ 4., 9.]]) # ...because b was repeated downwards
This works when relevant axis sizes match, or are 1. For example:
array([[1, 2, 3], [4, 5, 6]]) + array([10]) array([[1, 2, 3], [4, 5, 6]]) + array([[10], [100]]) array([[1, 2, 3], [4, 5, 6]]) + array([10,20,30])
Cases may intuitively feel somewhat ambiguous when there are several axes onto which you could combine.
For example:
a = array([[ 1., 2.], [ 3., 4 ]]) b = array([1., 3.]) a + b # array([[ 0., 5.], # [ 2., 7.]]) # It's not that it chooses the first axis rather than the second, # it's that b's shape effectively a row vector # If you want it to apply to the second axis a + b[:,np.newaxis] array([[ 0., 1.], [ 6., 7.]]) # Transposing wouldn't work  b is 1D with shape (2,). # you could introduce the new dimension (now shape (1,2)) and then transpose it, yes. a + b[numpy.newaxis,:].T
See also:
 http://scipylectures.github.io/intro/numpy/numpy.html#broadcasting
 http://www.scipy.org/EricsBroadcastingDoc
Structured arrays / record arrays
You can load/save/use mixed data types, in a Cstructlike way.
You can also give fields names, which lets you access them that way. (This is handled within the numpy.recarray class, a subclass of numpy.ndarray).
Some notes:
 There are (confusingly) many different ways of telling numpy about the format. See: http://docs.scipy.org/doc/numpy/user/basics.rec.html#definingstructuredarrays
 You can mix them, but that's just confusing
 I prefer the longnameswithbitsize variant (e.g. float32, uint16) for readability
 each type can be prefixed by a repetition, or a shape, e.g. 3int8, (2,3)float64
 You can define a data type with names, which gives you an array that you can also address with those names:
pt = np.dtype([ ('chunk_id', 'a1'), ('chunk_good', 'bool'), # size of a bool is one byte ('chunk_used', 'bool'), ('chunk_size', '<u4'), # littleendian unsigned 32bit integer ('format', 'S4'), # 4byte string ]) a = np.frombuffer('A\x00\x01\x01\x02\x03\x04qqhrB\x01\x00\x03\x06\x11\x12qqlr', dtype=pt) a[0]['format'] # will be 'qhrr' print repr(a) # will show something like: # array([('A', False, True, 67305985L, 'qqhr'), # ('B', True, False, 303105539L, 'qqlr')], # ...plus the dtype
See also:
Data types
NOTE: much of this needs to be cleaned, completed, or at least doublechecked.
The below oversimplifies things, omits some details on implicit conversions, some serialized forms (such as 16bit floats), and more. For completeness, check the actual documentation.
Most of the interesting predefined types you want to hand to dtype arguments are on the left:
numpy dtype object numpy string spec string spec for python module struct (only when using the nondefault "standard mode"!) (see note below) numpy.float32 f4 or float32 f numpy.float64 f8 or float64 d numpy.int64 i8 or int64 q numpy.uint64 u8 or uint64 Q numpy.int32 i4 or int32 i or l numpy.uint32 u4 or uint32 I or L numpy.int16 i2 or int16 h numpy.uint16 u2 or uint16 H numpy.int8 i1 or int8 b numpy.uint8 u1 or uint8 B numpy.bool (uses a byte) b1 or bool ? numpy.complex64 (two 32bit floats) c4 or complex64 numpy.complex128 (two 64bit floats) c8 or complex128 < for littleendian > for bigendian = for native  for not applicable (default is often = or (verify)) fixedsize listlike things: # being the length you want 'S#' / 'a#' bytestring s or p 'U#' unicode string (Py_UNICODE, so apparently UCS2 or UCS4 depending on the platform) 'V#' raw data
Notes:
 Defaults:
 The default float is often f8 (float64) (also when converting from python float)
 'complex' defaults to complex128.
 the default integer may be i4 or i8 (platformsized?(verify))
 I mention the struct module, I left out array.
 array's int sizes are platformnative, and there's no ability to control it like there is with struct, so it doesn't really fit in the above table.
 struct is better, but still artuably more bother than necessary if you can use numpy
 for the struct column, these are only valid when not using standard size (when using '<', '>', '!' or '='), not in native size (most int types can vary) See types. When you want portability, use standard mode.
Type hierarchy, testing
numpy has a hierarchy to its dtypes, which is useful when you e.g. want to test something in broader strokes  is it any variant of integer, is it signed or not, is it any sort of inexact number, etc.
Note that
 the supertype of integer types is numpy.integer, which is the supertype of numpy.unsignedinteger and numpy.signedinteger
 np.int and np.uint exist but are concrete types, so will not be a supertype of anything. Specifically, they are nativesized ints/uints
 the supertype of float and complex is numpy.inexact
 the supertype of float is numpy.floating
 the superclass of string types and void* is numpy.flexible
 For the tree, look for the docstring of numpy.core.numerictypes
So, for example, to see if something is any sort of integer you can use:
np.issubdtype(some_dtype, np.integer) # or use the kind # (most commonly you are interested in f, u, i, or c) ary.dtype.kind in ('u','i')
Or going further:
numpy.issubclass_(numpy.float32, numpy.inexact) == True numpy.issubclass_(numpy.complex64, numpy.inexact) == True numpy.issubclass_(numpy.int32, numpy.integer) == True numpy.issubclass_(numpy.int32, numpy.signedinteger) == True numpy.issubclass_(numpy.int32, numpy.unsignedinteger) == False numpy.issubclass_(numpy.uint32, numpy.integer) == True numpy.issubclass_(numpy.uint32, numpy.signedinteger) == False numpy.issubclass_(numpy.uint32, numpy.unsignedinteger) == True
On byte order (endianness)
Data's endianness is listed in dtype.byteorder, but that'll often be dtype.descr seems to be more useful (verify)
changing endianness
ary.byteswap(inplace=True) # inplace meaning the backing bytes are reordered ary.byteswap(inplace=False) # new array?(verify) # as a view: ( alternative: arr.view(arr.dtype.newbytorder(thatchar)) ) swapped = o.newbyteorder('S') # swapped from its current endianness as_little = o.newbyteorder('<') as_big = o.newbyteorder('>') as_native = o.newbyteorder('=')
To find platform endianness', look at sys.byteorder, or do something like: (this should be clear, there are briefer ways)
oneone = numpy.ones(1, dtype='=i2') littleoneone = oneone.newbyteorder('<') native_is_little = (oneone[0] == littleoneone[0])
See also:
 http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html
 http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
Filtering, searching, counting
Iterating
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
Sorting
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
Sorting along an axis, independently
To do a sort along an axis, each separately, use sort() without order.
# new array: numpy.sort(a, [axis, kind]) # inplace a.sort([axis, kind])
 axis  the axis to sort over
 defaults to 1, the last axis (which seems to be the only variant that does not keep temporary data(verify), so is faster)
 axis=None flattens the array
 kind is one of: [2]
 'quicksort' (fast, not stable)
 'mergesort' (stable, extra work space)
 'heapsort' (not stable)
Examples:
# given: a = np.array( [[3, 1, 2], [4, 9, 6], [0, 2, 5]] ) # To sort within rows: (implied axis=1, here the columns, and also equivalent to axis=1) np.sort(a) == array( [[1, 2, 3], [4, 6, 9], [0, 2, 5]]) # To sort within columns: np.sort(a, axis=0) == array([[0, 1, 2], [3, 2, 5], [4, 9, 6]]) # Flatten (to give a 1D sorted array of all values) np.sort(a, axis=None) == array([1, 1, 2, 2, 3, 3, 4, 6, 9])
Sorting a table
...like you're sorting a table by one of its columns: reorder all columns the same way, by the values in one.
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
If you do not have fields defined (not a structured array), then argsort() is probably the most flexible but it's not inplace, which can matter to you for large arrays. If it does, see below.
("argsort returns an array of indices of the same shape as a that index data along the given axis in sorted order."[3])
# sort columns, by the values in the second column >>> a[ a[:,1].argsort() ] array([[3, 1, 2], [1, 2, 3], [4, 9, 6]]) # sort rows, by the values in the third row >>> a[ a[2,:].argsort() ] array([[3, 1, 2], [4, 9, 6], [1, 2, 3]])
When sorting a structured array' (i.e. when you have fields defined):
>>> a=np.array([[(3, 2, 1)], [(9, 7, 8)], [(4, 5, 6)]], dtype=[('FOO', '<i8'), ('BAR', '<i8'), ('QUU', '<i8')])
...then you can sort by those fields (columns), inplace or not:
>>> a.sort(order=['FOO'], axis=0) # it seems you have to explicitly mention the right axis >>> a # even though it's the only one that makes sense array([[(3, 2, 1)], [(4, 5, 6)], [(9, 7, 8)]], dtype=[('FOO', '<i8'), ('BAR', '<i8'), ('QUU', '<i8')])
You can order by one field, then another for cases where the first are equal. (no example; the data above does not have such a case).
To columnsort fieldless arrays inplace (can make sense for very big arrays which you are using as tables), you have to fake fields.
You can use a view to add fields onto an existing array. Then call the view's .sort() to sort the backing data inplace.
Caveats:
 you can still only really sort by columns.
 You have to get the dtype right. Getting a view via a hardcoded dtype, e.g. a.view('i4,i4,i4')(or i, or similar) won't be remotely portable, unless you also created the array with that same hardcoded type (below, numpy chooses a's type, which could be i4 or i8 depending on context)
>>> a = np.array([(3, 1, 2), (4, 9, 6), (1, 2, 3)]) >>> b = a.view( ','.join( [str(a.dtype)]*a.shape[1]) ) # TODO: check this >>> b array([[(3, 1, 2)], [(4, 9, 6)], [(1, 2, 3)]], dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')]) >>> b.sort(order=['f0'], axis=0) >>> a array([[(1, 2, 3)] [(3, 1, 2)] [(4, 9, 6)]])
See also
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html#numpysort
 http://stackoverflow.com/questions/2828059/sortingarraysinnumpybycolumn
 http://stackoverflow.com/questions/5047407/pythonnumpyarraysorting
Serializing
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
....often to/from disk, sometimes memory.
Note that if instead of a local backandforth to storage you want to use these for interchange, then in most cases you have to do the extra bookkeeping/metadata stuff related to shape and data type (including signedness and endianness).
 pickle
You can use pickle on numpy objects.
You probably want to explicitly use pickle.HIGHEST_PROTOCOL as it's smaller on disk and writes and reads a lot faster than the simpleandstupid default. I regularly see a factor 10 or 20 faster for writing (more when using cPickle over pickle) and reading (with little difference between pickle/cPickle).
 Text files
 Raw bytestrings (specify dtype)(verify)
Conversion to and from bytestrings:
 tostring()basically constructs a python bytestring copy of just the array's byte storage
 you'll have to keep track of shape and data type (including endianness) yourself, meaning this isn't hugely portable.
 There is a similar numpy.fromstring(a,type), though chances you'll only pair it with numpy.tostring(a)
This can be a convenient way to read in known raw structures, avoiding allocation when backing files are large. Note, however, that these are limited to 2GB even on 64bit archs.
When compared to fromstring it can help avoid double allocation.
 Matlab matrix data
Again, this can be nice for interchange with Matlab (or other things that know this format). (the saving speed, loading speed, and space used is pretty good.)
 npy binary format
which is one of the few options that does store enough metadata for portability(verify), allows Corder and Fortran order, and allows multiple arrays.
For details of the file format used, look for npy format
Note that you can save to a StringIO/BytesIO, which can be nice when passing this into a database.
Going between python objects and numpy
In general, work in numpy, and minimize the amount of conversions you do.
Numpy to python
 You're probably looking for tolist(), which creates a python list(oflists) filled with conversions to the closest applicable types.
 Avoid going to python until/unless necessary  the copy to python is slowish, and working in python is slow compared to numpy
Python to numpy
 Since numpy allocates its own contiguous array, this will also be a copy.
 ...although objects that implement the array protocol (and satisfy some other requirements?{{verify}) may be viewed instead.
Manipulating shape, size, and axes
Reshaping, reordering
These two do not change the backing data, it just looks at the same data differently (and so requires that new size fits exactly).
It returns a differentshaped view on the same backing data.
For example, say that you have:
>>> a = array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ]) # e.g. from linspace >>> a.reshape( (6,2) ) array([[ 1, 2 ], [ 3, 4 ], [ 5, 6 ], [ 7, 8 ], [ 9, 10 ], [ 11, 12 ]]) >>> a.reshape( (2,6) ) array([[ 1, 2, 3, 4, 5, 6 ], [ 7, 8, 9, 10, 11, 12 ]])
Notes:
 One of the lengths can be 1, which means "it's an exact multiple and you figure it out".
 Note that when you reshape an array, the order= argument does not mean 'output should be this ordering' (as it usually does), but (?). If a copy is necessary, it will be Cordered. If it can be done without a copy, the ordering depends on other factors. If you need the output in a specific ordering, you may need a .copy() with order= argument.
 1D flattening
 numpy.ravel() is a reshape into 1D, and is essentially reshape(1, order=order)
 ndarray.flatten() seems to always returns a copy (verify)
 ndarray.flat() gives a numpy iterator (relatively like a numpy iterator)
 numpy.ravel() is a reshape into 1D, and is essentially
For certain specific conversions, allocation as (or readout as, which can be differnt) Fortran order or C order indexing can be relevant.
Reminder:
 Cstyle index ordering: the last axis index changing fastest, back to the first axis index changing slowest.
 Fortranstyle index order: with the first index changing fastest, and the last index changing slowest
ndarray.flags tells you the array's current properties
 numpy.asfortranarray()</tt>
 numpy.ascontiguousarray()</tt> (C order)
numpy.require() can require ordering among other things (alignment, writeability, that we own the data) and will make a copy if necessary
http://stackoverflow.com/questions/17558388/reorderingofnumpyarrays
Notes:
 Iteration is always / defaults to C order (verify)
Resizing (in the resampling/interpolation sense)
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
(Note that most of these will return copied data)
Integerfactor reductions
 decimating by taking every nth pixel via indexing, and throwing away the rest
 size down only, integer multiples only
 e.g. image[::2,::2]
 with default arguments, this is 'block means', which in image processing contexts this is sometimes called binning or coarsening
 size down only, integer multiples only
 does something per nonoverlapping rectangle, e.g.
block_reduce(image, block_size=(2,2), func=np.mean) # default func is np.sum block_reduce(image, block_size=(2,3,1), func=np.max)
 Presumably the code here does the same thing (haven't really looked yet)
 size down only, integer factor only
 uses interpolation, specifically order8 Chebyshev type I, or 30 point FIR
Not very sorted yet
 one of the functions in scipy.interpolate
 e.g. map_coordinates(), which is actually way more general
 scipy.signal.resample  resample using fourier space (zeropadding in frequency domain)
 in one dimension only?(verify)
 scipy.signal.resample_poly  polyphase and FIR filter
Somewhat imagespecific
 nearest, bilinear, bicubic, cubic, lanczos
 meant for PIL interaction so coerces results to integer(verify)
 e.g. zoom(), using spline interpolation of, by default, order 3
combining data
splitting
chunk up along a particular axis, return them as separate arrays (views?). (Sort of the counterpart of stacking)
 numpy.split(axis=something) splits an array into a list of multiple subarrays of equal size.
 numpy.hsplit() is split() with axis=1, i.e. horizontally, columnwise
 numpy.vsplit() is split() with axis=0, i.e. vertically, rowwise
 numpy.dsplit() is split() with axis=2, i.e. depthwise
 numpy.array_split() is similar, but when splitting into a given amount of items it does not require equal division.
>>> c = numpy.array([[ 1., 2.], [ 3., 4.], [ 5., 6.], [ 7., 8.]]) >>> numpy.vsplit(c, 2) [array([[ 1., 2.], [ 3., 4.]]), array([[ 5., 6.], [ 7., 8.]])] >>> numpy.hsplit(c, 2) [array([[ 1.], [ 3.], [ 5.], [ 7.]]), array([[ 2.], [ 4.], [ 6.], [ 8.]]) ]
>>> x = np.arange(9.0) >>> np.split(x, 3) [array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7., 8.])] >>> >>> np.split(x, [3, 5, 6, 10]) [array([ 0., 1., 2.]), array([ 3., 4.]), array([ 5.]), array([ 6., 7., 8.]), array([], dtype=float64)] >>> np.array_split(x, 4) [array([ 0., 1., 2.]), array([ 3., 4.]), array([ 5., 6.]), array([ 7., 8.])]
See also:
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html#numpy.split
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.array_split.html#numpy.array_split
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.hsplit.html#numpy.hsplit
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.vsplit.html#numpy.vsplit
 http://docs.scipy.org/doc/numpy/reference/generated/numpy.dsplit.html#numpy.dsplit
Growing
You can't really change the size of a numpy array without reallocating and copying.
 a basic implication of it being backed contiguously in memory, which is good for speed
 (could the realloc expand the existing allocation?(verify))
 yes, this implies that numpy arrays that are ~30% of memory may not make you happy. (You may wish to consider betterinformed C code, or some wellinformed numpy calls)
If you know the eventual size of the array, allocate that size immediately and assign data as you go.
 This is probably the most memoryefficient way to go
If you don't need numpy operations on the axis of growth, consider making a python list of numpy matrices.
 (pretty much because that isn't contiguous)
 If you don't know the final size, but do want to end up with a single numpy array,
 ...consider first making the python list of numpy arrays, and eventually join or stack them
 it will, for a short time, mean you use twice the memory that the data needs.
 ...but so will attempting to grow an existing array
Padding
delete, insert
newaxis
More axis manipulation
Unsorted
0D array, scalars
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
Histograms, Binning as in dealing with values
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
(cf. binning as in blockmeans coarsening  see resizing)
numpy.digitize [4]
scipy.stats.binned_statistic [5]
numpy.bincount [6]
bin_means = numpy.histogram(data, bins, weights=data)[0] / numpy.histogram(data, bins)[0]
http://stackoverflow.com/questions/23594262/binningvaluesofafunctioninpythonnumpy
Semisorted
Correlation
distance calculation
None
Converting a python array containing None
 becomes NaN under float/double
 is a TypeError casting to int
 without explicit dtype can force the array to object
Bits
Bitmasks or other booleans
numpy.unpackbits converts 8bitsinanuint8 to 1bitperuint8
>>> np.unpackbits(array([[2], [7], [23]], dtype=np.uint8), axis=1) array([[0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 0, 1, 1, 1]], dtype=uint8)
numpy.packbits converts 1bitpersomeinteger to uint8, 8 bits at a time
 padded to 8 with zeroes on the right side per element
 so when you specify fewer than 8 you actually specify the leftmost bits  a bit odd perhaps, and meaning you sometimes need to do trickery like ([0]*(8len(a)%8))+a
 For example:
>>> np.packbits(array([ [ [1,0,1], [0,1,0] ], [ [1,1,0], [0,0,1] ] ]) array( [ [ [160],[64] ] , [ [192],[32]] ], dtype=uint8)
integers with unusual edges
Say you want to load from unusuallysized integers that imply packing across byte boundaries, often raw image data, e.g. 4, 10, 12, or 14 bits per pixel, 24 bits per sample.
Since there is no native data format for that, you will probably be loading that into uint8, uint16, uint16, uint16, and uint32 respectively.
If the data isn't huge, and this isn't a bottleneck in your app, then it's probably easiest to load it into a covnenientlysized integer type, then bitmassage it into multiple arrays with the dtype you want, and probably merge them. (As much in numpy code as possible so it's native)
If data is large, you probably want to avoid the intermediate allocations, and do the described arraytoarray conversion in one step.
See e.g. the cython example here.
Numpy and native code isn't quite as scary as it sounds.
 For example, cython is aware of numpy
 write a small numpy C extension
 use an existing library via ctypes
 Consider Fortran, in simpler cases it's somewhat less hairy than
To avoid any intermediate storage (relevant when working on huge data),
you may to do the conversion during file load, probably bypassing numpy's loading completely.
(Note you'll have to consider endianness and such yourself)
Spherical/polar coordinates
Making masks
Warnings and errors
RuntimeWarning: invalid value encountered in less_equal
...or in another comparison function (less, greater, etc)
This is most likely about np.nan or nan.infinity.
Those values can be avoidable when they come from
 a div by zero, or log of zero
 conversion from python None
In other cases they are useful (e.g. it's a matrix or timeseries missing data points), but are doing statistics on them, then
 check whether there are nanaware functions (e.g. nanmedian, nanpercentile, nanmedian)
 failing that, make a (flat) array with just the nonNAN values like:
x[numpy.logical_not(numpy.isnan(x))]
or the shorthand
x = x[~numpy.isnan(x)]
ValueError: data type must provide an itemsize
Means the type you handed in can't be coerced to the given dtype (which my well be the default, often double).
For example, you handed in an array of integerasstrings,
or an array mixing types where just one fails.
ValueError: zerosize array to some function without identity
Really just means you're handing along an empty array.
There are a bunch of utility functions that assume they won't be handed an empty array, usually because it makes no sense.
In my case, I had an offbyone error in some slicing logic.
RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility
These warnings come from something compiled against an older numpy than is installed, and signaling that the ABI might have changed.
It often comes from scipy.
Which is typically built against the lowest supported numpy version, in part to be ABIcompatible, so it's typically fine.
If your distro packaging doesn't fix this,
you can usually also fix this by one of:
 downgrading numpy
 getting pip to compile scipy against your installed numpy.
 filter out these warnings.
underflow encountered in exp
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Using a nontuple sequence for multidimensional indexing is deprecated
More fully:
FutureWarning: Using a nontuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
The actual warning comes from numpy.
This actually is about the behaviour of slicing, and specifically numpy ~1.15.0 deprecating (multidimensional) indexing/slicing with anything other than tuple arguments.
As https://github.com/numpy/numpy/pull/9686 mentions, this because there used to be distinct behaviours depending on how you handed the samevalued arguments in, which is was probably a confusing/bad idea; your code probably should be changed updated if only to check whether or not it relied on the old behaviour.
If triggered from scipy, it's an older version of scipy, and not an issue.
Since it has been fixed in scipy since, upgrading it gets rid of the warning.
elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future
https://github.com/numpy/numpy/issues/9210
scipy
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
Scipy builds more interesting stuff on top of the generalpurpose numpy.
Scipy and the scikits may make it easier to quickly manipulate to and to visualize data (also via matplotlib). There are other packages you may like for visualisation, such as mayavi.
See also:
scikits (scipy toolkits)
Separately installable scipy toolsets, often because they are specific specializations that just happen to build on scipy/numpy, or cannot be part of scipy because of licensing details.
See also:
Includes:
 sound, DSP, and speech things:
 audiolab [7] (a.k.a. pyaudiolab, previously pyaudio), which wraps sndfile for various file format reading and writing
 samplerate (a.k.a. pysamplerate)
 Talkbox (signal/speech processing)
 (Unsorted):
 (Sci)pytoMatlab bridge (mlabwrap) [8]
 pymachine, machine learning (unfinished?) [9]
 Numerical optimization frameworks ([10] and [11])
 Approximate Nearest Neighbor Library (a wrapper for ANN)
 delaunay (...trangulation; see also voronoi)
 TimeSeries [12]
 RSFormats (reads common remotesensing data formats)
As all Wiki pages, this page is editable, this means that you can modify the contents of this page simply by using your webbrowser. Simply click on the "Edit this page" link at the bottom of the page. WikiFormatting will give you a detailed description of available Wiki formatting commands.
TracGuide is a good place to start.
http://scipy.org/scipy/scikits/wiki
Duplication between numpy and scipy
This article/section is a stub — probably a pile of halfsorted notes, is not wellchecked so may have incorrect bits. (Feel free to ignore, fix, or tell me) 
Since numpy already has some useful functionality, there is occasionally some duplication between numpy and scipy, e.g. in some linear algebra, some FFT.
In general, choose the scipy code. The stuff in numpy is often there for historical and backwardscompatibility reasons.
The scipy code will often be a little more fleshed out, and sometimes faster.
In a few cases it's just different implementations of the same idea.
See also:
Some specifics:
 Scipy FFT versus numpy FFT
 note that pyfftw.interfaces provides dropin functions for both
 numpy.fft and scipy.fftpack both use FFTPACK so should perform comparably
 in practice scipy can be cleverer, e.g. runs checks whether your data is real and uses rfft if so. Yes you can do that yourself, but lazier programmers may find scipy isn't "randomly" slower.
 scipy interleaves the result (verify)'s real and imaginary parts, though, which is slower to work with
 See also http://stackoverflow.com/questions/6363154/whatisthedifferencebetweennumpyfftandscipyfftpack
 It seems that packaged numpy and scipy are typically built against ATLAS
 If you want to build them against e.g. OpenBLAS, you will need some work.
 In theory it's little more than a pip install numpy/scipy after you have openblas, in practice it is more.
http://osdf.github.io/blog/numpyscipywithopenblasforubuntu1204.html http://stackoverflow.com/questions/7496547/doespythonscipyneedblas
On speed
See also
The array interface used by numpy and related:
Related in some way or other:
 http://wiki.python.org/moin/NumericAndScientific (some other scientificstyle)
 http://dirac.cnrsorleans.fr/plone/software/scientificpython/