Multi-dimensional array ordering

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

When you have to serialize matrix data onto linear storage, you can choose between:

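  • Row-major: elements logically adjacent within a row are adjacent in memory, so each row is stored contiguously, one row after another.
  • Column-major: elements logically adjacent within a column are adjacent in memory, so each column is stored contiguously, one column after another.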
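As a minimal sketch of the two choices (the function names here are made up for illustration, and indices are assumed to be zero-based), the offset of element (row, col) in the flat storage could be computed like this:

```python
def row_major_offset(row, col, n_rows, n_cols):
    # Rows are stored one after another, so skip `row` full rows of n_cols each.
    return row * n_cols + col

def col_major_offset(row, col, n_rows, n_cols):
    # Columns are stored one after another, so skip `col` full columns of n_rows each.
    return col * n_rows + row

matrix = [[1, 2, 3],
          [4, 5, 6]]
n_rows, n_cols = 2, 3

row_major = [None] * (n_rows * n_cols)
col_major = [None] * (n_rows * n_cols)
for r in range(n_rows):
    for c in range(n_cols):
        row_major[row_major_offset(r, c, n_rows, n_cols)] = matrix[r][c]
        col_major[col_major_offset(r, c, n_rows, n_cols)] = matrix[r][c]

print(row_major)  # [1, 2, 3, 4, 5, 6]
print(col_major)  # [1, 4, 2, 5, 3, 6]
```

The same six values end up in a different linear order, which is all the distinction amounts to.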

You should only need to think about this when:

  • saved data is effectively a memory dump, and can come from a source whose ordering differs from your code's or your current platform's
  • you work with both orderings from the same code, e.g. when mixing C and Fortran
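To illustrate the memory-dump case, here is a rough sketch that decodes a raw buffer of doubles into a list of rows. The helper name `unpack_matrix`, the little-endian float64 format and the 'C'/'F' labels are assumptions chosen for this example, not a description of any particular file format:

```python
import struct

def unpack_matrix(raw, n_rows, n_cols, order):
    """Decode little-endian float64 values into a list of rows.

    order is 'C' (row-major dump) or 'F' (column-major dump).
    """
    flat = struct.unpack(f"<{n_rows * n_cols}d", raw)
    if order == "C":
        return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
    if order == "F":
        return [[flat[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]
    raise ValueError("order must be 'C' or 'F'")

# The 2x3 matrix [[1, 2, 3], [4, 5, 6]], dumped column by column.
raw = struct.pack("<6d", 1.0, 4.0, 2.0, 5.0, 3.0, 6.0)

as_fortran = unpack_matrix(raw, 2, 3, "F")  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
as_c = unpack_matrix(raw, 2, 3, "C")        # [[1.0, 4.0, 2.0], [5.0, 3.0, 6.0]]
```

Reading with the wrong assumption raises no error. It silently gives a scrambled matrix, which is why the ordering of such a dump has to be known rather than guessed.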

Most of the time you don't really have to think about this:

  • Within a single language, the indexing convention takes care of it for you
  • One program reading its own data from disk usually implies much the same

The potential issue comes when you communicate data to libraries/languages that may have an opinion different than your own language's.

Historically this was C (row-major) versus Fortran (column-major), which is why this is sometimes called C ordering and Fortran ordering.

Opinions differ even now for efficiency reasons, depending on which operations on (multi-dimensional) arrays are considered most likely. The idea is that iterating over contiguous memory will be faster, because spatial locality feeds the cache with a sort of implied readahead (read up on how hardware caches work).
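A small sketch of why traversal order matters. It only lists which offsets of a row-major buffer get touched, in visiting order, and does not measure speed; the function name `offsets_visited` is made up for this example:

```python
def offsets_visited(n_rows, n_cols, outer):
    """Offsets into a row-major buffer, in the order a nested loop visits them.

    outer='rows' loops over rows in the outer loop, outer='cols' over columns.
    """
    if outer == "rows":
        return [r * n_cols + c for r in range(n_rows) for c in range(n_cols)]
    return [r * n_cols + c for c in range(n_cols) for r in range(n_rows)]

by_rows = offsets_visited(3, 4, "rows")
by_cols = offsets_visited(3, 4, "cols")

print(by_rows)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(by_cols)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

The first traversal walks memory sequentially, while the second jumps by a full row each step. On large arrays the jumping pattern tends to use the cache less well, though how much that costs depends on the hardware and array size.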

Most modern general-purpose languages use row-major, while things like MATLAB, Octave, and statistical packages might use column-major.

Some things support or at least consider both, e.g. numpy.
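For numpy specifically, a short sketch of how both orderings show up. It assumes numpy is installed and uses int64 so that the strides are predictable (8 bytes per element):

```python
import numpy as np

a_c = np.arange(6, dtype=np.int64).reshape(2, 3)  # numpy defaults to C (row-major) order
a_f = np.asfortranarray(a_c)                      # same values, column-major in memory

# Logically the two arrays are identical...
same_values = np.array_equal(a_c, a_f)            # True

# ...but the layout flags and strides (bytes to step per axis) differ.
c_flag = a_c.flags["C_CONTIGUOUS"]                # True
f_flag = a_f.flags["F_CONTIGUOUS"]                # True
c_strides = a_c.strides                           # (24, 8)
f_strides = a_f.strides                           # (8, 16)

# order="K" reads the elements in the order they sit in memory.
c_memory = a_c.ravel(order="K").tolist()          # [0, 1, 2, 3, 4, 5]
f_memory = a_f.ravel(order="K").tolist()          # [0, 3, 1, 4, 2, 5]

# A transpose is just a view with swapped strides, so it is F-contiguous.
t_flag = a_c.T.flags["F_CONTIGUOUS"]              # True
```

Indexing is `a[row, col]` for both arrays. Only the memory layout differs, which mostly matters for performance and when handing the buffer to code that expects one particular order.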

Note that the way you index such arrays, e.g. m[column][row], does not always reflect the memory layout (though frequently it does), which makes detailed explanations a lot more confusing than the concept really is.