Endianness

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Common meaning

Endianness usually refers to the in-memory order of the parts in a larger structure.

In code we typically don't have to care (certainly not for built-in types). As long as you only work with values inside your program, the language specs and/or compiler handle it consistently for you.

But when you store or communicate data (file, network, RPC, whatnot), it is often handled as an ordered stream of bytes. If the machines on either end use different endianness, the receiver would interpret those bytes differently from how the sender meant them.

So to be interoperable, you should pick one, and use it for all translations to and from bare bytes.
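A minimal Python sketch of that idea, using the standard struct module: the format prefix ('>' for big-endian, '<' for little-endian) pins the byte order, so the bytes come out the same on any host. The record layout (a 16-bit ID followed by a 32-bit value) and the function names are made up for the example.

```python
import struct

# Hypothetical record: 16-bit unsigned ID, then 32-bit unsigned value.
# '>' = big-endian, standard sizes, no padding: same bytes on every host.
RECORD_FORMAT = ">HI"

def encode_record(record_id: int, value: int) -> bytes:
    return struct.pack(RECORD_FORMAT, record_id, value)

def decode_record(data: bytes) -> tuple:
    return struct.unpack(RECORD_FORMAT, data)

data = encode_record(1, 12345)
print(data.hex(" "))        # 00 01 00 00 30 39
print(decode_record(data))  # (1, 12345)

# Reading the same bytes with the other byte order gives different numbers:
print(struct.unpack("<HI", data))  # (256, 959447040)
```

Which order you pick matters less than picking one and using it on both ends.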

Byte order is the order of bytes in a multi-byte structure. This usually implies an architecture where the smallest addressable unit is a byte, while we often also deal with larger units (e.g. 16, 32, 64 bits).

Bit order refers to the order of bits, within bytes or larger sequences. This turns up in any sort of serial communication.
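To illustrate bit order: a serial link sends a byte one bit at a time, and both ends have to agree whether the most or the least significant bit goes first (classic UART serial, for example, sends the least significant bit first). A small sketch; the function names are hypothetical.

```python
def bits_msb_first(byte: int) -> list:
    """Bits of one byte in wire order, most significant bit first."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]

def bits_lsb_first(byte: int) -> list:
    """Bits of the same byte, least significant bit first."""
    return [(byte >> i) & 1 for i in range(8)]

def reverse_bits(byte: int) -> int:
    """The value a receiver gets if it assumes the other bit order."""
    result = 0
    for i in range(8):
        result = (result << 1) | ((byte >> i) & 1)
    return result

print(bits_msb_first(0x41))     # [0, 1, 0, 0, 0, 0, 0, 1]
print(bits_lsb_first(0x41))     # [1, 0, 0, 0, 0, 0, 1, 0]
print(hex(reverse_bits(0x41)))  # 0x82
```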

There are others, like word order, but you'll rarely run into them.

Endianness usually refers to byte order, because that is the variant programmers most commonly run into. When it refers to bit order or word order, this is usually mentioned explicitly, or understood from context.

Little-endian (LSB) means we start with the least significant part in the lowest address.

Big-endian (MSB) means we start with the most significant part in the lowest address.

For example, the 16-bit integer 0x1234 would be stored in bytes as 0x34 0x12 (LSB) or 0x12 0x34 (MSB).
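The same split can be done with shifts and masks. These work on the numeric value, not on memory, so they give the same result on any host. A minimal Python sketch; the function names are just for illustration, and Python's built-in int.to_bytes does the same job.

```python
def to_little_endian_16(value: int) -> bytes:
    # Least significant byte first (at the lowest address).
    return bytes([value & 0xFF, (value >> 8) & 0xFF])

def to_big_endian_16(value: int) -> bytes:
    # Most significant byte first (at the lowest address).
    return bytes([(value >> 8) & 0xFF, value & 0xFF])

print(to_little_endian_16(0x1234).hex(" "))  # 34 12
print(to_big_endian_16(0x1234).hex(" "))     # 12 34

# Same results as the built-in conversion:
assert to_little_endian_16(0x1234) == (0x1234).to_bytes(2, "little")
assert to_big_endian_16(0x1234) == (0x1234).to_bytes(2, "big")
```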

Again, standards mean this is usually handled for you, and you don't have to care until you're serializing data yourself or coding for low-level hardware.

In unusual architectures, endianness can mean something slightly different again. For example, if a machine's smallest memory unit is 16 bits (instead of the much more usual 8-bit octets/bytes), the endianness concept is the same, but the units that can appear in different orders are those 16-bit words.

(You can even argue about the order of bytes/bits within those 16-bit units. However, this is usually moot, as in most cases you will only ever see values through platform operations that give you whole 16-bit values.)

Little-endian, Big-endian, LSB, MSB

Little-endian: least significant part first (leftmost/lowest in memory), with numeric significance increasing with memory address (or, in networking, with time)

Little-endian architectures store the least significant part first (in the lowest memory location).
They include the x86 line of processors (x86, AMD64 a.k.a. x86-64)
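To check which order the machine you're running on uses, Python exposes it as sys.byteorder, and the struct format prefix '=' packs in the host's native order. A small sketch; on x86/AMD64 it should report little-endian.

```python
import struct
import sys

print(sys.byteorder)  # 'little' on x86 / AMD64

# '=' means native byte order (with standard sizes and no alignment padding).
native = struct.pack("=I", 0x11223344)
if sys.byteorder == "little":
    assert native == bytes([0x44, 0x33, 0x22, 0x11])
else:
    assert native == bytes([0x11, 0x22, 0x33, 0x44])
print(native.hex(" "))
```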

In byte architectures, little-endian is also known as LSB, referring to the Least Significant Byte coming first.

Examples: consider a 32-bit integer

12345 would, shown in hexadecimal, be 0x39 0x30 0x00 0x00
287454020 would be 0x44 0x33 0x22 0x11
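The two little-endian examples can be checked with Python's struct module, where '<I' means an unsigned 32-bit integer in little-endian order, regardless of the host:

```python
import struct

for n in (12345, 287454020):
    print(n, struct.pack("<I", n).hex(" "))
# 12345 39 30 00 00
# 287454020 44 33 22 11

# And back again:
assert struct.unpack("<I", bytes([0x44, 0x33, 0x22, 0x11]))[0] == 287454020
```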

Big-endian: most significant part first, with numeric significance decreasing as memory address increases

Big-endian architectures store the most significant part first (in the lowest memory location).
They include the Motorola 68000 line of processors (used in early Macintoshes) and the PowerPC processors of later pre-Intel Macs, e.g. the G5 a.k.a. PowerPC 970

In byte architectures, big-endian is also known as MSB, referring to the Most Significant Byte coming first.

Examples: consider a 32-bit integer

12345 would, shown in hexadecimal, be 0x00 0x00 0x30 0x39
287454020 would be 0x11 0x22 0x33 0x44
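The same check for big-endian uses '>I'. One handy property shows up here: a big-endian hex dump reads in the same order as the number written out in hex.

```python
import struct

for n in (12345, 287454020):
    print(n, struct.pack(">I", n).hex(" "))
# 12345 00 00 30 39
# 287454020 11 22 33 44

# The dump matches the written-out hex value, digit for digit:
assert struct.pack(">I", 287454020).hex() == "11223344"
assert hex(287454020) == "0x11223344"
```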

Network byte order is a term used by various RFCs, and refers to big-endian (with a few exceptions?[1])
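In Python, the struct format prefix '!' means network byte order (big-endian), and the socket module has the classic htons/htonl/ntohs/ntohl helpers for converting between host and network order. A sketch; the port number is just an example value.

```python
import socket
import struct
import sys

port = 8080  # 0x1F90

# '!' means network byte order, i.e. big-endian; same bytes as '>'.
packed = struct.pack("!H", port)
print(packed.hex(" "))  # 1f 90

# socket.htons converts a 16-bit value from host to network order.
# The resulting number depends on the host: on a little-endian host
# the two bytes get swapped, on a big-endian host nothing changes.
converted = socket.htons(port)
print(sys.byteorder, hex(converted))
assert converted == (0x901F if sys.byteorder == "little" else 0x1F90)
assert socket.ntohs(converted) == port
```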

Regardless of endianness, the memory address of a multi-byte variable is the address of its lowest-in-memory byte. This can be a helpful mental aid (because without a reference point, 'big' and 'little' are somewhat arbitrary terms): at that address, MSB starts with the big end, LSB with the little end.
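One way to see this from Python is through ctypes: take a 32-bit integer variable, read the memory starting at its address, and look at the first (lowest-addressed) byte. A sketch; what it prints depends on the host.

```python
import ctypes
import sys

value = ctypes.c_uint32(0x11223344)

# Read the 4 bytes of memory starting at the variable's address.
raw = ctypes.string_at(ctypes.addressof(value), 4)

# raw[0] is the byte at the variable's own (lowest) address: the little
# end (0x44) on a little-endian host, the big end (0x11) on a big-endian one.
first = raw[0]
print(sys.byteorder, hex(first), raw.hex(" "))
assert first == (0x44 if sys.byteorder == "little" else 0x11)
```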

Less usual cases


Some architecture designs (e.g. ARM and Intel Itanium) are bi-endian, meaning they can handle both endiannesses.

In some cases, this is handled by transparent hardware conversion, meaning that the non-native order is a smidgen slower than the native order as the (very simple, but still present) conversion has to happen.

In rarer cases, the hardware will have real, equal support for both, to avoid that minor speed hit.
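Whatever does it, hardware or software, converting a value between the two orders comes down to a byte swap. A minimal sketch of a 32-bit swap using shifts and masks; the name bswap32 echoes GCC's __builtin_bswap32, but this is plain Python, not a binding to that intrinsic.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit unsigned value."""
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)
    )

print(hex(bswap32(0x11223344)))  # 0x44332211

# Swapping twice gives the original value back.
assert bswap32(bswap32(0x11223344)) == 0x11223344
# Equivalent to reading little-endian bytes as if they were big-endian:
assert bswap32(12345) == int.from_bytes((12345).to_bytes(4, "little"), "big")
```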

A separate issue comes in with hardware. For example, when Apple added PCI slots to its computers, most graphics cards would not actually work in them because Apple basically ignored the endianness of the PCI cards' implementations, meaning the only cards that would work were those implemented to support Macs. (I'm not sure whether this was strategic or stupid.)

Mixed endianness is not a strictly defined or agreed-on term.

When used, it usually describes architectures which deal with differently-sized units in memory addressing. For example, storing the 32-bit int 0x11223344 on an architecture built around 16-bit words could lead to 0x22 0x11 0x44 0x33 (as on the PDP-11) or 0x33 0x44 0x11 0x22, depending on architecture details.
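As an illustration, here is a sketch of one well-known mixed order, the PDP-11 layout for 32-bit values: the more significant 16-bit half comes first, but each half is itself stored little-endian. The function names are made up for this example.

```python
import struct

def pack_pdp_endian_32(value: int) -> bytes:
    high, low = value >> 16, value & 0xFFFF
    # Word order: more significant half first.
    # Byte order within each 16-bit half: little-endian ('<H').
    return struct.pack("<HH", high, low)

def unpack_pdp_endian_32(data: bytes) -> int:
    high, low = struct.unpack("<HH", data)
    return (high << 16) | low

print(pack_pdp_endian_32(0x11223344).hex(" "))  # 22 11 44 33
```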

Lack of endianness can also be said to exist.

That is, while endianness is usually a platform thing, 8-bit platforms don't have endianness when they have no registers larger than a byte, which also typically means they have few or no opcodes handling data larger than a byte.

For example, AVR microcontrollers deal almost entirely with bytes. On this platform, larger variables exist because the compiler generates the extra code to handle them, which means it's the compiler's choice how to lay them out in memory. When using avr-gcc you get little-endian variables.

(Actually, AVRs may technically be called mixed-endian, in that there are a few low-level things happening in big-endian, and a few register-related things that are fairly little-endian. But neither of these is likely to affect your programming much unless you work at assembly level)
