Difference between revisions of "UUID, GUID notes"

Revision as of 17:27, 1 December 2021

For UUIDs used in filesystems, see Fstab#Device.


What

A UUID is a standardish way of generating and formatting a probably-universally-unique 128-bit number.


There are a few variants, some more anonymous (e.g. adding hashing, or simply being random) than others (e.g. containing a network MAC address and the time).


There are a few different ways of ensuring a value is unlikely to be generated twice.


Version 1's idea is to use your network card's MAC address (itself meant to be unique), add the current time (preferably in high resolution), and perhaps a (pseudo)random number or incremental counter for good measure. This is pretty simple and cheap to generate, and will almost certainly never collide with a value generated later (the 60-bit timestamp counts 100-nanosecond intervals since October 1582, so it does not wrap for several thousand years) or generated elsewhere (MAC reuse is rare).


...but it also makes it possible to extract information about the place and time of generation, which isn't ideal in some situations, so there are versions of UUID that are more anonymous.

In those cases you generally end up on version 4, random, which is basically just a 122-bit random integer, and you're just counting on collisions being extremely rare.
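
To make the privacy point concrete, here is a minimal Python sketch (assuming the standard uuid module) that reads the generation time and node back out of a version 1 UUID. The helper name describe_v1 and the constant name are made up for illustration; the sample value is the version 1 example used elsewhere on this page.

```python
import uuid
from datetime import datetime, timezone

# 100-nanosecond intervals between 1582-10-15 (the UUID epoch)
# and 1970-01-01 (the Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def describe_v1(u):
    """Return (generation time in UTC, node as MAC-style hex) for a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp and node")
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
    when = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    # Format the 48-bit node value as six colon-separated octets.
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return when, mac


example = uuid.UUID("6e8bc430-9c3a-11d9-9669-0800200c9a66")
when, mac = describe_v1(example)
print(when.isoformat(), mac)
```

A version 4 UUID has no such fields to read back, which is the whole point of preferring it when anonymity matters.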

Why

UUIDs are usually used as identifiers.


One of the use cases is when you want to add identifiers to a lot of things, on varied computers, and to be able to generate them without checking with a central authority (which would also need to store everything it generated/saw before).

Such a central service may be inconvenient to use, intractable to host, and/or a distributed system's primary bottleneck (though there are ways of making it scale better than the simple-and-stupid implementation).


If instead you generate numbers independently, in a space large enough to have a very very high probability of being unique, you eliminate the need for centralization, at negligible risk of ever having colliding IDs (consider that 2^122 is 5316911983139663491615228241121378304).


(And yes, generating 128-bit numbers yourself is functionally the same - though not directly a valid UUID because of the version and variant bits)
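
To put a rough number on "negligible", the sketch below uses the usual birthday-bound approximation for the chance of at least one collision among n uniformly random 122-bit values. It assumes the generator is actually uniform; the function name is made up for illustration.

```python
import math

RANDOM_BITS = 122  # random bits in a version 4 UUID


def collision_probability(count, bits=RANDOM_BITS):
    """Approximate chance of at least one collision among `count` random values."""
    space = 2 ** bits
    # p ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when p is tiny.
    return -math.expm1(-count * (count - 1) / (2 * space))


for count in (10**9, 10**12, 2**61):
    print(f"{count:.3e} UUIDs -> p ~= {collision_probability(count):.3e}")
```

Only around 2^61 generated values does the estimate become substantial, which is far beyond what most systems will ever produce.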



How

Use a UUID library. Many languages have one in their standard library.

It's easier, and more likely to stick to the standard than 5 minutes of DIY.


For example:

In Python, use the uuid module (in the standard library since 2.5), for example
import uuid
myuuid = uuid.uuid4()
...which is still an object. You can use str(myuuid) for the hexadecimal string with dashes.
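
Besides str(), the UUID object exposes a few other representations. A short sketch, using the version 4 example value from this page so the output is reproducible:

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

print(str(u))      # hexadecimal string with dashes
print(u.hex)       # hexadecimal string, no dashes
print(u.int)       # the value as a single 128-bit integer
print(u.bytes)     # 16 bytes, fields big-endian
print(u.bytes_le)  # 16 bytes, first three fields little-endian
print(u.urn)       # URN form, urn:uuid:...
```

Any of these can be fed back into the constructor, e.g. uuid.UUID(bytes=u.bytes) or uuid.UUID(int=u.int), to get an equal object.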


In Java, use java.util.UUID, for example
UUID random_uuid = UUID.randomUUID();
quite possibly followed by toString().


In .NET, use System.Guid, for example (in C#):
System.Guid random_uuid = System.Guid.NewGuid();
quite possibly followed by a .ToString()


When you have to, or want to, do it yourself, the simplest (and probably most common) to generate are random UUIDs, i.e. version 4. You can generate a large random number, then twiddle the bits to have it be an RFC-compliant version 4 UUID.

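For example, JavaScript long had no standard-library function (recent browsers and Node.js versions do provide crypto.randomUUID()). Perhaps the simplest implementation to generate a random UUID is something like this (http://ajaxian.com/archives/uuid-generator-in-javascript) or this (https://jsfiddle.net/briguy37/2MVFd/), but there is a potential problem in Math.random() not giving randomness guarantees (http://web.archive.org/web/20101106000458/http://baagoe.com/en/RandomMusings/javascript/), which is why there are better libraries (with their own PRNG) which you might as well use. They include:

  • https://github.com/broofa/node-uuid
  • http://frugalcoder.us/post/2012/01/13/javascript-guid-uuid-generator.aspx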
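
The "generate a random number, then twiddle the bits" approach can be sketched as follows. It is written in Python only to keep this page's examples in one language; in practice uuid.uuid4() already does this. The function name make_uuid4 is made up, and the use of the secrets module as the randomness source is an assumption of this sketch.

```python
import secrets
import uuid


def make_uuid4():
    """Build a version 4 UUID string by hand from 16 random bytes."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # top four bits of octet 6: version 0100
    raw[8] = (raw[8] & 0x3F) | 0x80  # top two bits of octet 8: variant 10
    h = raw.hex()
    # Group as 8-4-4-4-12 hex characters.
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"


candidate = make_uuid4()
parsed = uuid.UUID(candidate)  # raises ValueError if the string is malformed
print(candidate, parsed.version)
```

Six of the 128 bits are fixed by those two masks, which is where the figure of 122 random bits comes from.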

Layout

A UUID is a 128-bit number, conventionally grouped and dashed as 32-16-16-16-48 bits (8-4-4-4-12 hex characters), which reflects the way it is parsed.

The bit layout (from RFC 4122):

0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          time_low                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       time_mid                |         time_hi_and_version   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         node (2-5)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Note that the purpose of the fields varies between versions.

For example, in version 4 the bits of the 60-bit time value hold (pseudo)random data instead, and in the name-based versions the fields (including node) are filled from a hash rather than from a timestamp and a MAC.
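
As an illustration of the layout, this sketch splits a UUID into the six fields named in the diagram, relying on the fields attribute of Python's uuid.UUID. The dictionary keys follow the diagram's names; the helper name split_fields is made up.

```python
import uuid

FIELD_NAMES = (
    "time_low",
    "time_mid",
    "time_hi_and_version",
    "clk_seq_hi_res",
    "clk_seq_low",
    "node",
)


def split_fields(text):
    """Map the diagram's field names to the integer values found in a UUID string."""
    u = uuid.UUID(text)
    # u.fields is a 6-tuple in the same order as the diagram.
    return dict(zip(FIELD_NAMES, u.fields))


for name, value in split_fields("6e8bc430-9c3a-11d9-9669-0800200c9a66").items():
    print(f"{name:20} {value:#x}")
```

Note that the fourth dash-separated group holds two fields (clk_seq_hi_res and clk_seq_low), and the last group is the node.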


Versions and variants, a few layout details

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


The version specifies the means of generation, and is stored in the most significant four bits of time_hi_and_version (octet 6).

...which also means that, in hex representation, you can see the version right after the second dash. For example:
6e8bc430-9c3a-11d9-9669-0800200c9a66 is a version 1 UUID,
550e8400-e29b-41d4-a716-446655440000 is a version 4.
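
A small sketch of both ways of reading the version: from the hex digit after the second dash, and from the top four bits of octet 6. It assumes the canonical dashed 36-character form; the function names are made up for illustration.

```python
def version_from_text(text):
    """Read the version from the hex digit right after the second dash."""
    # In the 8-4-4-4-12 form, index 14 is the first digit of the third group.
    return int(text[14], 16)


def version_from_octets(text):
    """Read the version from the most significant four bits of octet 6."""
    octets = bytes.fromhex(text.replace("-", ""))
    return octets[6] >> 4


for sample in (
    "6e8bc430-9c3a-11d9-9669-0800200c9a66",
    "550e8400-e29b-41d4-a716-446655440000",
):
    print(sample, version_from_text(sample), version_from_octets(sample))
```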


It seems there are currently:

  • 0001: Version 1: Based on time + node (MAC address of any of the cards present)
  • 0010: Version 2: DCE Security version (with embedded POSIX UIDs)
  • 0011: Version 3: Name-based (MD5 hash)
  • 0100: Version 4: Random
  • 0101: Version 5: Name-based (SHA1 hash)

Versions 1, 3, 4, and 5 are defined in RFC 4122. If the UUID variant is not the RFC 4122 one, the version value need not comply with the above.
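
The practical difference between the name-based and random versions is that the former are deterministic. A brief sketch with Python's uuid module; the name "example.org" is just a placeholder:

```python
import uuid

# Name-based: the same namespace and name always give the same UUID.
first = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")
second = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")
md5_based = uuid.uuid3(uuid.NAMESPACE_DNS, "example.org")

# Random: every call gives a new value.
random_a = uuid.uuid4()
random_b = uuid.uuid4()

print(first == second, first.version, md5_based.version)
print(random_a == random_b, random_a.version)
```

This makes versions 3 and 5 useful when several parties need to derive the same identifier from the same name without communicating.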


The variant is specified by the most significant one to three bits of clk_seq_hi_res (octet 8), and controls the meaning/layout of the rest of the number. Per RFC 4122 (x means either value):

  • 0xx NCS (reserved for backward compatibility)
  • 10x Current variant (the one RFC 4122 describes)
  • 110 Microsoft (reserved for backward compatibility)
  • 111 Reserved for future use

...which is why, in a current-variant UUID, the first hex digit after the third dash is always 8, 9, a, or b.
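
A sketch of decoding the variant from octet 8, following the bit patterns RFC 4122 gives (0xx, 10x, 110, 111). The function name and the returned labels are made up for illustration; Python's uuid.UUID also exposes this as its variant attribute.

```python
def variant_of(text):
    """Classify a UUID string by the top bits of octet 8 (clk_seq_hi_res)."""
    octet8 = bytes.fromhex(text.replace("-", ""))[8]
    if (octet8 & 0x80) == 0x00:  # 0xx
        return "NCS (backward compatibility)"
    if (octet8 & 0xC0) == 0x80:  # 10x
        return "RFC 4122"
    if (octet8 & 0xE0) == 0xC0:  # 110
        return "Microsoft (backward compatibility)"
    return "reserved for future use"  # 111


print(variant_of("6e8bc430-9c3a-11d9-9669-0800200c9a66"))
print(variant_of("550e8400-e29b-41d4-a716-446655440000"))
```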





See also

  • RFC 4122 (A Universally Unique IDentifier (UUID) URN Namespace)


And perhaps:

  • ISO/IEC 9834-8:2005, Information technology — Procedures for the operation of object identifier registration authorities — Part 8: Generation of universally unique identifiers (UUIDs) and their use in object identifiers
  • ISO/IEC 11578:1996 (Information technology -- Open Systems Interconnection -- Remote Procedure Call (RPC))
  • ITU-T Rec. X.667 (Procedures for the operation of OSI Registration Authorities: Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components)