UUID, GUID notes

From Helpful

Revision as of 19:33, 30 August 2012

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

A UUID (universally unique identifier) is a 128-bit identifier generated according to a standardized scheme. (There are a few variants; see versions and variants.)

You may have seen these in Windows, Mozilla, various RPC mechanisms, and other places.


Why

You use UUIDs when it's useful to put an identifier on something, you can't do a thorough uniqueness check, and you still want a high probability of avoiding collisions.


The most robust way to get unique identifiers is to check with a central authority that knows about everything previously identified.

It may be inconvenient to have to use such a service, intractable to host one, or it could be a distributed system's primary bottleneck (though there are ways of making it scale rather better than the simple-and-stupid case).


In many situations it can be more convenient (and scalable) to have independent generation of a large (enough) and randomish (enough) number in a way that makes it extremely unlikely that the same identifier will ever be created again. This is the major case that UUID is for.
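To put a rough number on "extremely unlikely", here is a minimal sketch using the standard birthday approximation. It assumes identifiers with 122 random bits, which is what a version 4 UUID has once its fixed version and variant bits are subtracted; the function name is made up for illustration.

```python
import math

# Assumption: 122 random bits, i.e. a version 4 UUID
# (128 bits minus 6 fixed version/variant bits).
RANDOM_BITS = 122


def collision_probability(n_ids: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance that at least two of n_ids random identifiers coincide."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n^2 / (2 * space)).
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-(n_ids ** 2) / (2 * space))


for n in (10 ** 6, 10 ** 9, 10 ** 12):
    print(f"{n:>16,d} identifiers: p ~= {collision_probability(n):.3e}")
```

The approximation only holds if the random source is actually good; a weak generator makes collisions far more likely than this estimate suggests.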


There are a few different ways of ensuring a value is unlikely to be generated twice.

One idea is to use your network card's MAC address (itself meant to be unique), add the current time (preferably in high resolution), and perhaps a (pseudo)random number or incremental counter for good measure.

This is pretty simple to generate, but also makes it easy to extract information about its place and time of generation, which isn't ideal in some situations.
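As an illustration of how much such an identifier gives away, the sketch below reads the node and timestamp back out of a version 1 UUID using Python's uuid module. The node value and the helper name describe_v1 are hypothetical; the epoch offset is the number of 100 ns ticks between 1582-10-15 (the UUID epoch) and 1970-01-01.

```python
import datetime
import uuid

# 100 ns ticks between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def describe_v1(u: uuid.UUID) -> dict:
    """Recover node, creation time and clock sequence from a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
    return {
        "node": f"{u.node:012x}",
        "created": datetime.datetime.fromtimestamp(
            unix_seconds, tz=datetime.timezone.utc
        ),
        "clock_seq": u.clock_seq,
    }


# Hypothetical node value, chosen only to make the output recognisable.
example = uuid.uuid1(node=0x001122334455, clock_seq=0x1234)
print(example)
print(describe_v1(example))
```

Anyone who sees the identifier can do the same, which is the privacy concern mentioned above.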


UUIDs refer to a mostly-standardized way of generating such identifiers -- a few different methods, some more anonymous (adding hashing) than others.


Layout

A UUID is a 128-bit number, conventionally grouped and dashed as 32-16-16-16-48 bits (8-4-4-4-12 hex characters, 4-2-2-2-6 octets), which reflects the way it is parsed.

The bit layout (from RFC 4122):

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                          time_low                             |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |       time_mid                |         time_hi_and_version   |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                         node (2-5)                            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
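A small sketch of how the textual grouping maps onto these fields, using Python's uuid module. The UUID used is the example value from RFC 4122; the variable names simply mirror the field names in the diagram.

```python
import uuid

# Example value taken from RFC 4122; any UUID string would do.
u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

# The dashed form splits into 8-4-4-4-12 hex characters.
groups = str(u).split("-")
print([len(g) for g in groups])

# The same 128 bits, split into the six fields from the diagram.
(time_low, time_mid, time_hi_and_version,
 clk_seq_hi_res, clk_seq_low, node) = u.fields
print(f"time_low            = {time_low:08x}")
print(f"time_mid            = {time_mid:04x}")
print(f"time_hi_and_version = {time_hi_and_version:04x}")
print(f"clk_seq_hi_res      = {clk_seq_hi_res:02x}")
print(f"clk_seq_low         = {clk_seq_low:02x}")
print(f"node                = {node:012x}")
```

Note that the fourth dashed group holds two one-octet fields (clk_seq_hi_res and clk_seq_low), so five text groups correspond to six fields.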


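Note that various fields are multi-purpose: the field names describe version 1. In version 4, all bits other than the version and variant bits are (pseudo)random, and in versions 3 and 5 they are filled from a hash of a name. In version 1 the node field is normally a MAC address, but may be a random value instead.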



Versions and variants, a few layout details


The version specifies the means of generation, and is stored in the most significant four bits of time_hi_and_version (octet 6).

It seems there are currently:

  • 0001: Version 1: Based on time + node (MAC address of any of the cards present)
  • 0010: Version 2: DCE Security version (with embedded POSIX UIDs)
  • 0011: Version 3: Name-based (MD5 hash)
  • 0100: Version 4: Random
  • 0101: Version 5: Name-based (SHA1 hash)

Versions 1, 3, 4, and 5 are defined in RFC 4122. If the UUID's variant is not the RFC 4122 one, the version value need not comply with the above.
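Reading the version back out is a one-liner. The sketch below does it by hand from octet 6 and checks the result against the version attribute that Python's uuid module already provides; the helper name version_of and the name "example.org" are only illustrative.

```python
import uuid


def version_of(u: uuid.UUID) -> int:
    """Return the version nibble of a UUID."""
    # Octet 6 is the high byte of time_hi_and_version;
    # its most significant four bits hold the version.
    return u.bytes[6] >> 4


samples = [
    uuid.uuid1(),
    uuid.uuid3(uuid.NAMESPACE_DNS, "example.org"),
    uuid.uuid4(),
    uuid.uuid5(uuid.NAMESPACE_DNS, "example.org"),
]
for u in samples:
    print(u, "version", version_of(u), "(library says", u.version, ")")
```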


The variant is specified by the most significant bits (up to three) of clk_seq_hi_res (octet 8), and controls the meaning/layout of the rest of the number. According to RFC 4122 (x means the bit is not part of the variant):

  • 0xx: NCS (reserved for backward compatibility)
  • 10x: Current variant (the one RFC 4122 specifies)
  • 110: Microsoft (reserved for backward compatibility)
  • 111: Reserved for future use
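A sketch of decoding the variant by hand, following the bit patterns in RFC 4122 (0xx, 10x, 110, 111, where x is a don't-care bit). The helper name variant_of and its return strings are made up here; Python's uuid module exposes the same information through the variant attribute.

```python
import uuid


def variant_of(u: uuid.UUID) -> str:
    """Classify a UUID by the top bits of octet 8 (clk_seq_hi_res)."""
    octet8 = u.bytes[8]
    if octet8 & 0x80 == 0x00:   # 0xx
        return "NCS"
    if octet8 & 0xC0 == 0x80:   # 10x
        return "RFC 4122"
    if octet8 & 0xE0 == 0xC0:   # 110
        return "Microsoft"
    return "reserved for future use"  # 111


u = uuid.uuid4()
print(u, variant_of(u), u.variant)
```

Because the variant is a prefix code, the number of significant bits differs per variant, which is why the tests are done with progressively longer masks.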


How

Use a UUID library. In most cases that is easier than writing the generation yourself while still sticking to the standard.

The examples below generate random UUIDs, i.e. version 4, either explicitly or because that's usually the default.


In Python: Use the uuid module (in the standard library since 2.5)

import uuid
random_uuid = uuid.uuid4()  # Still an object; you can str() it, or read attributes such as .hex, .bytes or .int
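The other versions are available from the same module. A short sketch, in which the name "example.org" is an arbitrary illustration and uuid.NAMESPACE_DNS is one of the namespaces the module predefines:

```python
import uuid

by_time = uuid.uuid1()  # version 1: time + node
by_md5 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.org")   # version 3: MD5 of namespace + name
by_sha1 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")  # version 5: SHA-1 of namespace + name

for u in (by_time, by_md5, by_sha1):
    print(u.version, u)
```

Name-based UUIDs are deterministic: the same namespace and name always give the same identifier, so they suit cases where independent parties should arrive at the same ID for the same thing.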

In Java, use java.util.UUID. For example:

UUID random_uuid = UUID.randomUUID(); // (quite possibly followed by toString())

In .NET, use System.Guid. To create one, as a string:

string random_uuid = System.Guid.NewGuid().ToString();

In JavaScript: Historically there was no standard-library function (newer browsers and Node.js releases provide crypto.randomUUID()). Without one, you can generate a large random number and, assuming you care about RFC 4122 compliance, twiddle the bits so that it is a valid version 4 UUID. A real problem is that Math.random() gives no randomness guarantees, which is why you might as well use one of the libraries that handle this better.
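The bit-twiddling itself is the same in any language. A minimal sketch, written in Python to match the earlier example rather than in JavaScript: take 16 random bytes from a cryptographic source, then overwrite the version and variant bits. The function name make_v4_uuid is made up, and in practice uuid.uuid4() already does this for you.

```python
import secrets
import uuid


def make_v4_uuid() -> uuid.UUID:
    """Build a version 4 UUID by hand from 16 random bytes."""
    raw = bytearray(secrets.token_bytes(16))
    # Octet 6: force the top four bits to 0100 (version 4).
    raw[6] = (raw[6] & 0x0F) | 0x40
    # Octet 8: force the top two bits to 10 (RFC 4122 variant).
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))


u = make_v4_uuid()
print(u, u.version, u.variant)
```

The six overwritten bits are why a version 4 UUID carries 122 random bits rather than 128.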


See also

  • RFC 4122 (A Universally Unique IDentifier (UUID) URN Namespace)