Endianness

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Common meaning

Endianness usually refers to the in-memory order of the parts in a larger structure.


In practice, endianness usually refers to byte order. When it refers to other concepts, such as bit order or word order, this is usually mentioned explicitly or clear from context.


Byte order is the order of the bytes within a multi-byte structure - how it is actually laid out if you look at that memory byte by byte (which, at a low enough level, is how you almost always see it).


The compiler takes care of this for primitive/built-in types in memory, letting us deal with numbers, masks, and such of larger sizes, without having to think about endianness at all.

Whenever you store or communicate data (file, network, RPC, whatnot) in an interoperable way, you'll have to choose a byte order and stick to it.
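
A minimal sketch of that choice (in C, for a hypothetical 4-byte field in some file or wire format): serializing with shifts and masks depends only on the numeric value, not on the host's memory layout, so the same code produces the same bytes on any machine.

  #include <stdint.h>

  /* Write a 32-bit value into buf in little-endian order,
     regardless of the host's own byte order. */
  void put_u32_le(uint8_t *buf, uint32_t value) {
      buf[0] = (uint8_t)(value);        /* least significant byte first */
      buf[1] = (uint8_t)(value >> 8);
      buf[2] = (uint8_t)(value >> 16);
      buf[3] = (uint8_t)(value >> 24);
  }

  /* Read it back the same way. */
  uint32_t get_u32_le(const uint8_t *buf) {
      return (uint32_t)buf[0]
           | ((uint32_t)buf[1] << 8)
           | ((uint32_t)buf[2] << 16)
           | ((uint32_t)buf[3] << 24);
  }

(put_u32_le/get_u32_le are illustrative names, not a standard API; picking big-endian instead just means reversing the indices.)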


Endianness sometimes refers to bit order - the storage/transfer order of bits within bytes or larger sequences. Consider serial communication, the order of data lines in parallel communication (e.g. PCI's data lines), and low-level bitwise calculations.

However, most of those are usually abstracted away at hardware, driver, or sometimes API level, so even where they matter you rarely have to deal with this as an everyday programmer.


In unusual architectures, endianness can take on yet other meanings. For example, if a machine addresses memory in 16-bit units (instead of the much more usual 8-bit octets/bytes), the concept is the same, but the units that can appear in different orders are 16-bit words.

(You can even argue about the order of bytes/bits within those 16-bit units -- however, this is usually moot, as in most cases you will only ever see values through operations that give you 16-bit values.)


Little-endian, Big-endian, LSB, MSB

Little-endian: least significant part first, i.e. lowest in memory, with numeric significance increasing as memory addresses increase (or, in networking, as time goes on)

  • Little-endian architectures store the least significant part first (in the lowest memory location)
  • They include the x86 line of processors (x86, AMD64 a.k.a. x86-64)
  • In byte architectures, little-endian is also known as LSB, referring to the Least Significant Byte coming first.


Examples: consider a 32-bit integer

  • 12345 would, shown in hexadecimal, be 0x39 0x30 0x00 0x00
  • 287454020 would be 0x44 0x33 0x22 0x11


Big-endian: most significant part first, with numeric significance decreasing as memory addresses increase

  • Big-endian architectures store the most significant part first (in the lowest memory location).
  • Includes the Motorola 68000 line of processors (e.g. pre-Intel Macintosh), PowerPC G5 a.k.a. PowerPC 970
  • In byte architectures, big-endian is also known as MSB, referring to the Most Significant Byte coming first.


Examples: consider a 32-bit integer

  • 12345 would, shown in hexadecimal, be 0x00 0x00 0x30 0x39
  • 287454020 would be 0x11 0x22 0x33 0x44
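
As a quick way to see this on your own machine (a sketch in C; the output depends on the host you run it on), copy an integer into a byte array and print the bytes in address order:

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  int main(void) {
      uint32_t value = 0x11223344;          /* 287454020 */
      uint8_t bytes[4];
      memcpy(bytes, &value, sizeof value);  /* the in-memory representation */

      printf("%02x %02x %02x %02x\n",
             (unsigned)bytes[0], (unsigned)bytes[1],
             (unsigned)bytes[2], (unsigned)bytes[3]);
      /* Prints "44 33 22 11" on a little-endian machine (e.g. x86),
         "11 22 33 44" on a big-endian one. */
      return 0;
  }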


Network byte order is a term used by various RFCs and refers to big-endian (with a few exceptions[1]).
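
For example (a sketch in C, assuming the POSIX socket headers), the htonl()/ntohl() functions convert a 32-bit value between host order and network (big-endian) order; on a big-endian host they are effectively no-ops.

  #include <stdint.h>
  #include <arpa/inet.h>   /* htonl(), ntohl() -- POSIX */

  void wire_example(void) {
      uint32_t host_value = 287454020;          /* 0x11223344 in host order */
      uint32_t wire_value = htonl(host_value);  /* big-endian, ready to send */
      uint32_t back       = ntohl(wire_value);  /* host order again on receipt */
      (void)back;  /* wire_example is just an illustrative name */
  }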



Regardless of endianness, the memory address of a multi-byte variable is the address of its lowest-in-memory byte. This can be a helpful mental aid (because, without that reference point, 'big' and 'little' are fairly arbitrary terms): MSB starts big, LSB starts small.

Less usual cases



Some architecture designs (e.g. ARM and Intel Itanium) are bi-endian -- they can handle both endiannesses.

In some cases, this is handled by transparent hardware conversion, meaning that the non-native order is a smidgen slower than the native order as the (very simple, but still present) conversion has to happen.

In rarer cases, the hardware will have real, equal support for both, to avoid that minor speed hit.


A separate issue comes in with hardware. For example, when Apple added PCI slots to its computers, most graphics cards would not actually work in them, because Apple basically ignored the endianness of the PCI cards' implementations -- meaning the only cards that worked were those implemented to support Macs. (Whether this was strategic or stupid is an arguable point)


Mixed endianness can also be said to exist -- although this is not a strictly defined or agreed-on term.

It usually describes architectures which deal with non-byte-sized units in memory addressing. For example, storing a 32-bit int 0x11223344 in a 16-bit-word architecture could lead to 0x11 0x22 0x33 0x44 -- or 0x33 0x44 0x11 0x22 (verify), depending on architecture details.


Lack of endianness also exists.

That is, while endianness is usually a platform thing, 8-bit platforms don't have endianness when they have no registers larger than a byte, which also typically means they have few or no opcodes handling data larger than a byte.

For example, AVR microcontrollers deal almost entirely with bytes. On this platform, larger variables exist only because the compiler adds the extra work, which means it's the compiler's choice how to lay things out in memory. When using avr-gcc you get little-endian variables.

(Actually, AVRs may technically be called mixed-endian, in that there are a few low-level things happening in big-endian, and a few register-related things that are fairly little-endian. But neither of these is likely to affect your programming much unless you work at assembly level)
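
As an illustration of that "extra work" (a sketch in plain C, not AVR assembly), a 16-bit variable on a byte-only machine is handled as two separate bytes; with a little-endian layout like avr-gcc's, that amounts to something like:

  #include <stdint.h>

  /* Compose a 16-bit value from two bytes, least significant byte at
     the lower address -- roughly what the compiler does, one byte at
     a time, for 16-bit loads and stores on a byte-oriented target. */
  uint16_t load_u16_le(const uint8_t *p) {
      return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
  }

  void store_u16_le(uint8_t *p, uint16_t value) {
      p[0] = (uint8_t)(value & 0xFF);  /* small end first */
      p[1] = (uint8_t)(value >> 8);
  }

(load_u16_le/store_u16_le are illustrative names, not part of avr-libc.)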


See also