Linear algebra library notes
BLAS (Basic Linear Algebra Subprograms)
- a specification for numerical linear algebra - matrix and vector stuff
- abstract, so allows optimization on specific platforms
- ...and BLAS-using code doesn't have to care which implementation ends up doing the calculations
- (origins lie in a Fortran library)
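To make the "abstract spec" point concrete: a sketch of calling the level-3 BLAS routine gemm (which computes C = alpha*A*B + beta*C) through SciPy's BLAS wrappers. This assumes SciPy is installed; which BLAS implementation actually does the work depends on what that SciPy build was linked against, and the calling code looks the same either way.

```python
import numpy as np
from scipy.linalg import blas

# Level-3 BLAS 'gemm' computes C = alpha*A*B + beta*C.
# SciPy exposes the double-precision variant as dgemm; the
# implementation that runs is whatever SciPy was linked against.
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

c = blas.dgemm(alpha=1.0, a=a, b=b)
print(c)  # same result as a @ b
```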
There are quite a few implementations. Some of the better known ones:
- Netlib reference implementation
  - not optimized for speed
- ATLAS, 'Automatically Tuned Linear Algebra Software'
  - the AT is for 'Automatically Tuned' - it basically compiles many variants and finds which is fastest on the host it's compiling on
  - which means it makes less sense as a binary package
  - but makes it portable - a works-everywhere, reasonably-fast-everywhere implementation
  - apparently has improved recently, now closer to OpenBLAS/MKL
- OpenBLAS
  - specifically tuned for a set of modern processors
  - (note: also covers the most common LAPACK calls(verify))
  - also quite portable -- sometimes easier to deal with than ATLAS
- GotoBLAS
  - specific to ~2002-2008 processors. Very good at the time, since merged into OpenBLAS?
- MKL, Intel Math Kernel Library
  - covers BLAS, LAPACK, and some other things that are sometimes very convenient to have
  - known to be quite fast on the Intel processors it is tuned for (best of this list on some operations)
- ACML, AMD Core Math Library
  - comparable to MKL, but for AMD
  - and free
  - (apparently does not scale up to multicore as well as MKL?)
- Apple Accelerate Framework
  - includes BLAS, LAPACK, and various other things
  - an easy choice when programming only for Apple, because it's already installed (verify)
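Since the calling code doesn't care which implementation it gets, finding out which one a given NumPy build actually uses takes a bit of introspection. `numpy.show_config()` prints the BLAS/LAPACK libraries the build was linked against (output format varies a little between NumPy versions):

```python
import numpy as np

# Prints which BLAS/LAPACK libraries this NumPy build was linked
# against (e.g. OpenBLAS, MKL, Accelerate); exact output format
# varies between NumPy versions.
np.show_config()
```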
LAPACK (Linear Algebra PACKage)
Functionally: in comparison with BLAS, LAPACK solves some more complex, higher-level problems - things like linear systems, least squares, and eigenvalue problems.
Also, LAPACK is a specific implementation (built on top of BLAS), not an abstract spec.
It contains some rather clever algorithms (in some cases the only open-source implementation of said algorithm).
In other words, for some applications you are happy with just BLAS; in some cases you want to add LAPACK (or similar).
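For instance, solving a general linear system Ax = b is LAPACK territory (the gesv driver routine), while the matrix-level arithmetic underneath it is BLAS territory. A minimal sketch through SciPy (assuming it is installed):

```python
import numpy as np
from scipy import linalg

# Solving Ax = b is a LAPACK-level problem (the ?gesv driver);
# scipy.linalg.solve dispatches to it.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = linalg.solve(A, b)
print(x)  # x = [2, 3], since 3*2 + 3 = 9 and 2 + 2*3 = 8
```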
FFTPACK
Pragmatically: a slightly-slower and slightly-more-portable alternative to FFTW and others.
As far as I can tell FFTPACK does not use AVX, which means that in some conditions (mostly larger transforms), FFTW (≥3.3), MKL, and such can be more than a little faster.
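The interface looks much the same regardless of backend (the speed difference only shows up in benchmarks). A sketch using NumPy's FFT, which historically derives from FFTPACK:

```python
import numpy as np

# Forward and inverse transform; a round trip recovers the input
# (up to floating-point error) whichever FFT backend does the work.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.fft.fft(x)
x_back = np.fft.ifft(X).real
print(x_back)  # ≈ [1. 2. 3. 4.]
```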