UUID, GUID notes

Latest revision as of 01:01, 20 May 2024


What

A UUID is a standardish way of generating and formatting a probably-universally-unique 128-bit number (in practice a few bits fewer, because some bits are fixed).


There are a few different ways of generating the value:

  • some are intentionally less anonymous (e.g. containing the time and a network card identifier),
  • others are more opaque (e.g. hashing something volatile, or simply being a random number).


For example, version 1 is the idea to use your network card's MAC address (itself meant to be unique), add the current time (preferably in high resolution), and if you can, a (pseudo)random number or counter for good measure.

This is cheap to generate and (except for the counter) simple and stateless, and these two ingredients should mean it will almost certainly

never collide with a value generated at a different time (until the timestamp field rolls over, around AD 3400 according to RFC 4122)
never collide with values generated elsewhere (MAC reuse is rare, except in some counterfeit hardware and some DIY setups).

On the flipside, the ability to extract information about time of generation, and perhaps match it to a computer and place, really isn't ideal in some situations.
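
As a rough illustration of what can be read back out of a version 1 UUID, here is a minimal sketch using Python's standard uuid module on the version 1 example value that also appears further down this page. The helper name describe_v1 is made up for this example, and note that the node field is not guaranteed to be a real MAC address (implementations may substitute a random value).

```python
import uuid
from datetime import datetime, timedelta

# UUID timestamps count 100-nanosecond intervals since 1582-10-15.
UUID_EPOCH = datetime(1582, 10, 15)

def describe_v1(text):
    """Return (generation time, node as a MAC-style string) for a version 1 UUID."""
    u = uuid.UUID(text)
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time is the 60-bit timestamp; divide by 10 to get microseconds.
    generated = UUID_EPOCH + timedelta(microseconds=u.time // 10)
    # u.node is the 48-bit node field; print it one octet at a time.
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
    return generated, mac

when, node = describe_v1("6e8bc430-9c3a-11d9-9669-0800200c9a66")
print(when.isoformat(), node)
```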


In those cases you generally end up with version 4 (random), which is basically just a 122-bit random integer, and you are counting on collisions being extremely rare due to the very large space of values.


Why use UUIDs

UUIDs are usually used as identifiers.

Particularly when

  • you want a fairly large number of them,
  • you want to be able to generate them without checking with a central authority
      note that a central authority would probably need to store everything it generated/saw before,
      it may be inconvenient or intractable to host such a service, and/or
      it may become a larger distributed system's primary bottleneck (though there are certainly ways of making it scale better than a one-big-database implementation).


Instead, you can generate numbers independently, in a space large enough that they have a very, very high probability of being unique (consider that 2^122 is 5316911983139663491615228241121378304). If the risk of ever having colliding IDs is that small, you can perfectly well get away with removing that centralization.
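
To put a number on "very, very high probability", here is a back-of-the-envelope sketch using the usual birthday-problem approximation. It assumes all 122 bits are uniformly random (as in version 4), and the function name collision_probability is just for illustration.

```python
import math

SPACE = 2 ** 122  # number of distinct values with 122 random bits

def collision_probability(n, space=SPACE):
    """Approximate chance that n uniformly random values contain a duplicate."""
    # Birthday bound: 1 - exp(-n(n-1) / (2*space)).
    # expm1 keeps precision when the result is very close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))

for n in (10**9, 10**12, 10**15):
    print(f"{n:.0e} UUIDs -> collision probability about {collision_probability(n):.3g}")
```

Even at a trillion generated values, the estimate stays far below anything you would notice in practice; a weak random source is a more realistic risk than the size of the space.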


(Yes, generating 128-bit numbers yourself is functionally almost the same (as v4) - the largest difference being that it probably won't be a valid UUID (v4 or otherwise) because of a few special bits)


How

Use a UUID library.

It's easier, more likely to stick to the standard than 5 minutes of DIY, and recent languages may well have one in their standard library anyway.


When you do have to, or want to, do it yourself, the simplest (and probably most common) kind to generate is the random UUID, i.e. version 4. You can generate a large random number, then twiddle the bits to have it be an RFC-compliant version 4 UUID. (see the JavaScript mention below)
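
The bit twiddling can be sketched as follows. This is a minimal illustration in Python rather than a recommendation (the uuid module already does this for you), and the function name diy_uuid4 is made up. It takes 128 random bits and overwrites the version and variant bits described in the layout sections below.

```python
import secrets
import uuid

def diy_uuid4():
    """Build a version 4 UUID by hand from 128 random bits."""
    value = secrets.randbits(128)
    # Version: top four bits of octet 6 (bits 76..79 of the integer) become 0100.
    value = (value & ~(0xF << 76)) | (0x4 << 76)
    # Variant: top two bits of octet 8 (bits 62..63) become 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)

u = diy_uuid4()
print(u, u.version, u.variant)
```

Six bits end up fixed, which is where the figure of 122 random bits comes from.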


For example:

In Python: Use the uuid module (in the standard library since 2.5), for example myuuid = uuid.uuid4()

...which is still an object. You can use str(myuuid) for the hexadecimal string with dashes.
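
A slightly fuller sketch of the Python case, showing the common representations you might store or transmit. The variable names are arbitrary.

```python
import uuid

myuuid = uuid.uuid4()      # a uuid.UUID object
as_text = str(myuuid)      # 36 characters: hex digits with dashes
as_hex = myuuid.hex        # 32 hex characters, no dashes
as_bytes = myuuid.bytes    # 16 raw bytes, e.g. for a binary column

# All three forms parse back to an equal object.
assert uuid.UUID(as_text) == uuid.UUID(hex=as_hex) == uuid.UUID(bytes=as_bytes) == myuuid
print(as_text, as_hex, len(as_bytes))
```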


In Java, use java.util.UUID, for example UUID random_uuid = UUID.randomUUID();, quite possibly followed by toString().


In .NET, use System.Guid, for example: random_uuid = System.Guid.NewGuid(), quite possibly followed by a .ToString()


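JavaScript long had no standard-library function for this (current browsers and Node.js do provide crypto.randomUUID()). Perhaps the simplest DIY implementation of a random UUID is something like http://ajaxian.com/archives/uuid-generator-in-javascript or https://jsfiddle.net/briguy37/2MVFd/, but there is a potential problem in Math.random() not giving randomness guarantees (see http://web.archive.org/web/20101106000458/http://baagoe.com/en/RandomMusings/javascript/), which is why there are better libraries (with their own PRNG) which you might as well use. They include:

  • https://github.com/broofa/node-uuid
  • http://frugalcoder.us/post/2012/01/13/javascript-guid-uuid-generator.aspx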


Layout

A UUID is a 128-bit number, conventionally grouped and dashed as 32-16-16-16-48 bits (8-4-4-4-12 hex characters), which reflects the way it is parsed.

The bit layout (from RFC 4122):

0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          time_low                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       time_mid                |         time_hi_and_version   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         node (2-5)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


Note that the purpose of the fields varies between versions, so their names are slightly messy.

For example, the 60 bits of time value can also be used to store a pseudorandom number, while the node field can be a MAC or a hash of something. (verify)
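
To see how a concrete value maps onto these fields, here is a small sketch using the fields attribute of Python's uuid.UUID. The field names printed are the ones from the diagram above; Python's own attribute names differ slightly.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

names = ("time_low", "time_mid", "time_hi_and_version",
         "clk_seq_hi_res", "clk_seq_low", "node")
widths = (8, 4, 4, 2, 2, 12)  # hex digits per field

# u.fields is a 6-tuple in the same order as the diagram.
for name, value, width in zip(names, u.fields, widths):
    print(f"{name:20} {value:0{width}x}")
```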


Versions and variants, a few layout details

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


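The version is not a revision number of the standard; it identifies the type of UUID, i.e. the way it was generated. (It is separate from the variant, described further down.)

It is stored in the most significant four bits of time_hi_and_version (octet 6).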


...which also means that, in hex representation, you can see the version right after the second dash. For example:
6e8bc430-9c3a-11d9-9669-0800200c9a66 is a version 1 UUID,
550e8400-e29b-41d4-a716-446655440000 is a version 4.
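
The same check in code, as a small sketch: reading the version both from the hex string position and from the bits of the number. The helper name version_of is made up for this example.

```python
import uuid

def version_of(text):
    """Read the version from the canonical 8-4-4-4-12 hex string."""
    return int(text[14], 16)  # first hex digit after the second dash

for text in ("6e8bc430-9c3a-11d9-9669-0800200c9a66",
             "550e8400-e29b-41d4-a716-446655440000"):
    u = uuid.UUID(text)
    # Same value from the integer: top four bits of octet 6.
    assert version_of(text) == (u.int >> 76) & 0xF == u.version
    print(text, "-> version", version_of(text))
```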


It seems there are currently:

  • 0001: Version 1: Based on time + node (MAC address of any of the cards present)
  • 0010: Version 2: DCE Security version (with embedded POSIX UIDs)
  • 0011: Version 3: Name-based (MD5 hash)
  • 0100: Version 4: Random
  • 0101: Version 5: Name-based (SHA1 hash)
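
The name-based versions behave differently from the others in that they are deterministic. A minimal sketch with Python's uuid module, where "example.org" is just an illustrative name:

```python
import uuid

# Name-based UUIDs hash a namespace UUID plus a name, so the same inputs
# always give the same UUID.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")  # SHA-1 based
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")
c = uuid.uuid3(uuid.NAMESPACE_DNS, "example.org")  # MD5 based

print(a, a == b)
print(c)
```

This is useful when independent parties need to arrive at the same identifier for the same thing, but it also means such a UUID is only as unique as the name you feed it.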


If the UUID variant is not RFC 4122 based, the version value need not comply with the above, possibly because UUID itself was inspired by some things before it.

Versions 1, 3, 4, and 5 are defined in RFC 4122 (from 2005). Versions 6, 7, and 8 are defined in RFC 9562 (from 2024).


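The variant is specified by the most significant bits of clk_seq_hi_res (octet 8), up to three of them, and controls the meaning/layout of the rest of the number. According to RFC 4122:

  • 0xx (000 to 011): NCS (reserved for backward compatibility)
  • 10x (100 and 101): Current variant (the one RFC 4122 describes)
  • 110: Microsoft (reserved for backward compatibility)
  • 111: Reserved for future use

...which also means that, in hex representation, the first digit after the third dash is 8, 9, a, or b for the current variant.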
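
A small sketch of decoding the variant by hand, following the table in RFC 4122 (leading bits 0 for NCS, 10 for the current variant, 110 for Microsoft, 111 for future use). The function name variant_of and its short labels are made up here; Python's own u.variant property reports the same classification with longer descriptions.

```python
import uuid

def variant_of(u):
    """Classify a UUID by the leading bits of octet 8."""
    octet = u.bytes[8]
    if (octet & 0x80) == 0:
        return "NCS"        # 0xx
    if (octet & 0x40) == 0:
        return "RFC 4122"   # 10x
    if (octet & 0x20) == 0:
        return "Microsoft"  # 110
    return "future"         # 111

for text in ("550e8400-e29b-41d4-a716-446655440000",   # octet 8 is 0xa7
             "00000000-0000-0000-c000-000000000046"):  # octet 8 is 0xc0
    u = uuid.UUID(text)
    print(text, "->", variant_of(u), "/", u.variant)
```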




Limitations

See also

  • RFC 4122 (A Universally Unique IDentifier (UUID) URN Namespace)


And perhaps:

  • ISO/IEC 9834-8:2005, Information technology — Procedures for the operation of object identifier registration authorities — Part 8: Generation of universally unique identifiers (UUIDs) and their use in object identifiers
  • ISO/IEC 11578:1996 (Information technology -- Open Systems Interconnection -- Remote Procedure Call (RPC))
  • ITU-T Rec. X.667 (Procedures for the operation of OSI Registration Authorities: Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components)