C. Appendix: GUIDs and UUIDs Abstract
This appendix describes the format of UUIDs (Universally Unique IDentifier), which are also known as GUIDs (Globally Unique IDentifier). A GUID is 128 bits long, and if generated according to the one of the mechanisms in this document, is either guaranteed to be different from all other UUIDs/GUIDs generated until 3400 A.D. or extremely likely to be different (depending on the mechanism chosen). GUIDs were originally used in the Network Computing System (NCS) [1] and later in the Open Software Foundation's (OSF) Distributed Computing Environment [2].

This specification is derived from the latter specification with the kind permission of the OSF.

>Introduction This specification defines the format of UUIDs (Universally Unique IDentifiers), also known as GUIDs (Globally Unique IDentifiers). A GUID is 128 bits long, and if generated according to the one of the mechanisms in this document, is either guaranteed to be different from all other UUIDs/GUIDs generated until 3400 A.D. or extremely likely to be different (depending on the mechanism chosen).

E="c2">Motivation One of the main reasons for using GUIDs is that no centralized authority is required to administer them (beyond the one that allocates IEEE 802.1 node identifiers). As a result, generation on demand can be completely automated, and they can be used for a wide variety of purposes. The GUID generation algorithm described here supports very high allocation rates: 10 million per second per machine if you need it, so that they could even be used as transaction IDs. GUIDs are fixed-size (128 bits), which is reasonably small relative to other alternatives. This fixed, relatively small size lends itself well to sorting, ordering, hashing of all sorts, storing in databases, simple allocation, and ease of programming in general.

Specification A GUID is an identifier that is unique across both space and time, with respect to the space of all GUIDs. To be precise, the GUID consists of a finite bit space. Thus the time value used for constructing a GUID is limited and will roll over in the future (at approximately A.D. 3400, based on the specified algorithm). A GUID can be used for multiple purposes, from tagging objects with an extremely short lifetime, to reliably identifying very persistent objects across a network.

The generation of GUIDs does not require that a registration authority be contacted for each identifier. Instead, it requires a unique value over space for each GUID generator. This spatially unique value is specified as an IEEE 802 address, which is usually already available to network-connected systems. This 48-bit address can be assigned based on an address block obtained through the IEEE registration authority. This section of the GUID specification assumes the availability of an IEEE 802 address to a system desiring to generate a GUID, but if one is not available, Section 4 specifies a way to generate a probabilistically unique one that can not conflict with any properly assigned IEEE 802 address.

">C.1 Format< The following table gives the format of a GUID:

Field Data Type Octet # Note
time_low unsigned 32-bit integer 0-3 The low field of the timestamp.
time_mid unsigned 16-bit integer 4-5 The middle field of the timestamp.
time_hi_and_version unsigned 16-bit integer 6-7 The high field of the timestamp multiplexed with the version number.
Clock_seq_hi_and_reserved unsigned 8-bit integer 8 The high field of the clock sequence multiplexed with the version variant.
Clock_seq_low unsigned 8-bit integer 9 The low field of the clock sequence.
node character 10-15 The spatially unique node identifier.
The GUID consists of a record of 16 octets and must not contain padding between fields. The total size is 128 bits.

To minimize confusion about bit assignments within octets, the GUID record definition is defined only in terms of fields that are integral numbers of octets. The version number is multiplexed with the timestamp (time_high), and the variant field is multiplexed with the clock sequence (clock_seq_high).

The timestamp is a 60-bit value. For GUID version 1, this is represented by Coordinated Universal Time (UTC) as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582 (the date of Gregorian reform to the Christian calendar).

The version number is multiplexed in the 4 most significant bits of the time_hi_and_version field.

The following table lists currently defined versions of the GUID.

msb1 msb2 msb3 msb4 Version Description
0 0 0 1 1 DCE version, as specified herein.
0 0 1 0 2 DCE Security version, with embedded POSIX UIDs.

The variant field determines the layout of the GUID. The structure of DCE GUIDs is fixed across different versions. Other GUID variants may not interoperate with DCE GUIDs. Interoperability of GUIDs is defined as the applicability of operations such as string conversion, comparison, and lexical ordering across different systems. The variant field consists of a variable number of the MSBS of the clock_seq_hi_and_reserved field.

The following table lists the contents of the DCE variant field.

msb1 msb2 msb3 Description
0 - - Reserved, NCS backward compatibility.
1 0 - DCE variant
1 1 0 Reserved, Microsoft Corporation GUID
1 1 1 Reserved for future definition
The clock sequence is required to detect potential losses of monotonicity of the clock. Thus, this value marks discontinuities and prevents duplicates. An algorithm for generating this value is outlined in the "Clock Sequence" section below.

The clock sequence is encoded in the 6 least significant bits of the clock_seq_hi_and_reserved field and in the clock_seq_low field.

The node field consists of the IEEE address, which is usually the host address. For systems with multiple IEEE 802 nodes, any available node address can be used. The lowest addressed octet (octet number 10) contains the global/local bit and the unicast/multicast bit, and is the first octet of the address transmitted on an 802.3 LAN.

Depending on the network data representation, the multi-octet unsigned integer fields are subject to byte swapping when communicated between different endian machines.

The nil GUID is special form of GUID that is specified to have all 128 bits set to 0 (zero).

C.2 Algorithms for Creating a GUID
Various aspects of the algorithm for creating a GUID are discussed in the following sections. GUID generation requires a guarantee of uniqueness within the node ID for a given variant and version. Interoperability is provided by complying with the specified data structure. To prevent possible GUID collisions, which could be caused by different implementations on the same node, compliance with the algorithms specified here is required.

C.2.1 Clock Sequence
The clock sequence value must be changed whenever:

While a node is operational, the GUID service always saves the last UTC used to create a GUID. Each time a new GUID is created, the current UTC is compared to the saved value and if either the current value is less (the non-monotonic clock case) or the saved value was lost, then the clock sequence is incremented modulo 16,384, thus avoiding production of duplicate GUIDs.

The clock sequence must be initialized to a random number to minimize the correlation across systems. This provides maximum protection against node identifiers that may move or switch from system to system rapidly. The initial value MUST NOT be correlated to the node identifier.

The rule of initializing the clock sequence to a random value is waived if, and only if, all of the following are true:

In other words, the system constraints prevent duplicates caused by possible migration of the IEEE address, while the operational system itself can protect against non-monotonic clocks, except in the case of field service intervention. At manufacturing time, such a system may initialise the clock sequence to any convenient value.

C.2.2 System Reboot
There are two possibilities when rebooting a system:

If the state variables have been restored, the GUID generator just continues as normal. Alternatively, if the state variables cannot be restored, they are reinitialized, and the clock sequence is changed.

If the clock sequence is stored in non-volatile store, it is incremented; otherwise, it is reinitialized to a new random value.

C.2.3 Clock Adjustment
GUIDs may be created at a rate greater than the system clock resolution. Therefore, the system must also maintain an adjustment value to be added to the lower-order bits of the time. Logically, each time the system clock ticks, the adjustment value is cleared. Every time a GUID is generated, the current adjustment value is read and incremented atomically, and then added to the UTC time field of the GUID.

C.2.4 Clock Overrun
The 100-nanosecond granularity of time should prove sufficient even for bursts of GUID creation in the next generation of high-performance multiprocessors. If a system overruns the clock adjustment by requesting too many GUIDs within a single system clock tick, the GUID service may raise an exception, handled in a system or process-dependent manner either by:

If the processors overrun the GUID generation frequently, additional node identifiers and clocks may need to be added.

C.2.5 GUID Generation
GUIDs are generated according to the following algorithm:

C.3 String Representation of GUIDs
For use in human-readable text, a GUID string representation is specified as a sequence of fields, some of which are separated by single dashes.

Each field is treated as an integer and has its value printed as a zero-filled hexadecimal digit string with the most significant digit first. The hexadecimal values a to f inclusive are output as lowercase characters, and are case-insensitive on input. The sequence is the same as the GUID constructed type.

The formal definition of the GUID string representation is provided by the following extended BNF:

GUID = <time_low> <hyphen> <time_mid> <hyphen>
<time_high_and_version> <hyphen>
<clock_seq_and_reserved>
<clock_seq_low> <hyphen> <node>
time_low = <hexOctet> <hexOctet> <hexOctet> <hexOctet>
time_mid = <hexOctet> <hexOctet>
time_high_and_version = <hexOctet> <hexOctet>
clock_seq_and_reserved = <hexOctet>
clock_seq_low = <hexOctet>
node = <hexOctet><hexOctet><hexOctet>
<hexOctet><hexOctet><hexOctet>
hexOctet = <hexDiget><hexDiget>
hexDigit = <digit> | <a> | <b> | <c> | <d> | <e> | <f>
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
hyphen = "-"
a = "a" | "A"
b = "b" | "B"
c = "c" | "C"
d = "d" | "D"
e = "e" | "E"
f = "f" | "F"
The following is an example of the string representation of a GUID:

C.4 Comparing GUIDs
The following table lists the GUID fields in order of significance, from most significant to least significant, for purposes of GUID comparison. The table also shows the data types applicable to the fields.

Field Type
time_low Unsigned 32-bit integer
time_mid Unsigned 16-bit integer
time_hi_and_version Unsigned 16-bit integer
clock_seq_hi_and_reserved Unsigned 8-bit integer
clock_seq_low Unsigned 8-bit integer
node Unsigned 48-bit integer
Consider each field to be an unsigned integer as shown above. Then, to compare a pair of GUIDs, arithmetically compare the corresponding fields from each GUID in order of significance and according to their data type. Two GUIDs are equal if and only if all the corresponding fields are equal. The first of two GUIDs follows the second if the most significant field in which the GUIDs differ is greater for the first GUID. The first of a pair of GUIDs precedes the second if the most significant field in which the GUIDs differ is greater for the second GUID.

C.5 Node IDs when no IEEE 802 network card is available
If a system wants to generate GUIDs but has no IEE 802-compliant network card or other source of IEEE 802 addresses, then this section describes how to generate one.

The ideal solution is to obtain a 47-bit cryptographic quality random number, and use it as the low 47 bits of the node ID, with the high-order bit of the node ID set to 1. (The high-order bit is the unicast/multicast bit, which will never be set in IEEE 802 addresses obtained from network cards.)

If a system does not have a primitive to generate cryptographic quality random numbers, then in most systems there are usually a fairly large number of sources of randomness available from which one can be generated. Such sources are system-specific, but often include:

In addition, items such as the computer's name and the name of the operating system, while not strictly speaking random, will differentiate the results from those obtained by other systems.

The exact algorithm to generate a node ID using this data is system-specific, because both the data available and the functions to obtain them are often very system-specific. However, assuming that one can concatenate all the values from the randomness sources into a buffer, and that a cryptographic hash function such as MD5 [3] is available, the following code will compute a node ID:

Other hash functions, such as SHA-1 [4], can also be used (in which case HASHLEN will be 20). The only requirement is that the result be suitably random - in the sense that the outputs from a set uniformly distributed inputs are themselves uniformly distributed, and that a single bit change in the input can be expected to cause half of the output bits to change.

C.6 References
[1] Lisa Zahn, et.al. Network Computing Architecture. Englewood Cliffs, NJ: Prentice Hall, 1990

[2] OSF DCE Spec

[3] R. Rivest, RFC 1321, "The MD5 Message-Digest Algorithm," 04/16/1992.

[4] SHA Spec

TOC