Saturday 28 July 2012

What is a GUID " globally unique identifier"

What is a GUID? The acronym stands for "globally unique identifier"; GUIDs are also called UUIDs, which stands for "universally unique identifier". (It is unclear to me why we need two nigh-identical names for the same thing, but there you have it.) A GUID is essentially a 128 bit integer, and when written in its human-readable form, is written in hexadecimal in the pattern {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}.

  • GUIDs are guaranteed to be unique but not guaranteed to be random. Do not use them as random numbers.
  • GUIDs that are random numbers are not cryptographic strength random numbers.
  • GUIDs are only unique when everyone cooperates; if someone wants to re-use a previously-generated GUID and thereby artificially create a collision, you cannot stop them. GUIDs are not a security mechanism.
  • GUIDs have an internal structure; at least six of the bits are reserved and have special meanings.
  • GUIDs are allowed to be generated sequentially, and in practice often are.
  • GUIDs are only unique when taken as a whole.
  • GUIDs can be generated using a variety of algorithms.
  • GUIDs that are generated randomly are statistically highly unlikely to collide in the foreseeable future.
  • GUIDs could reveal information about the time and place they were created, either directly in the case of version one GUIDs, or via cryptanalysis in the case of version four GUIDs.
  • GUIDs might be generated by some entirely different algorithm in the future.


The purpose of a GUID is, as the name implies, to uniquely identify something, so that we can refer to that thing by its identifier and have confidence that everyone can agree upon what thing we are referring to. Think about this problem as it applies to, say, books. It is cumbersome to refer to a book by quoting it in its entirety every time you mention it. Instead, we give every book an identifier in the form of its title. The problem with using a title as an identifier is that there may be many different books with the same title. I have three different books all entitled "The C# Programming Language" on my desk right now; if I want to refer to one of them in particular, I'd typically have to give the edition number. But there is nothing (apart from their good sense) stopping some entirely different publisher from also publishing a book called "The C# Programming Language, fourth edition" that differs from the others.

Publishers have solved this problem by creating a globally unique identifier for each book called the International Standard Book Number, or ISBN. This is the 13-decimal-digit bar coded number you see on pretty much every book(*). How do publishers manage to get a unique number for each of the millions of books published? They divide and conquer; the digits of an ISBN each have a different meaning. Each country has been assigned a certain range of ISBN numbers that they can allocate; governments then further allocate subsets of their numbers to publishers. Publishers then decide for themselves how to assign the remaining digits to each book. The ISBNs for my three editions of the C# spec are 978-0-321-15491-6, 978-0-321-56299-9 and 978-0-321-74176-9. You'll notice that the first seven digits are exactly the same for each; they identify that this is a publishing industry code (978), that the book was published in a primarily English-speaking region (0), by Addison-Wesley (321). The next five digits are Addison-Wesley's choice, and the final digit is a checksum. If I wish to uniquely identify the fourth edition of the C# specification I need not state the ambiguous title at all; I can simply refer you to book number 978-0-321-74176-9, and everyone in the world can determine precisely which book I'm talking about.

An important and easily overlooked characteristic of the ISBN uniqueness system is that it only works if everyone who uses it is non-hostile. If a rogue publisher decides to deliberately publish books with the ISBN numbers of existing books so as to create confusion then the usefulness of the identifier is compromised because it no longer uniquely identifies a book. ISBN numbers are not a security system, and neither are GUIDs; ISBN numbers and GUIDs  prevent accidental collisions. Similarly, traffic lights only prevent accidental collisions if everyone agrees to follow the rules of traffic lights; if anyone decides to go when the light is red then collisions might no longer be avoided, and if someone is attempting to deliberately cause a collision then traffic lights cannot stop them.

The ISBN system has the nice property that you can "decode" an ISBN and learn something about the book just from its number. But it has the enormous down side that it is extraordinarily expensive to administer. There has to be international agreement on the general form of the identifier and on what the industry and language codes mean. In any given country there must be some organization (either a government body or private companies contracted by the government) to assign numbers to publishers. It can cost hundreds of dollars to obtain a unique ISBN.

No comments :