Creating a Time UUID (GUID) in .NET

Previously I wrote about how to set up Cassandra as a database on your Windows machine.  As I dug deeper into the subject, I realized that .NET lacks a type that Cassandra relies on for column comparison and sorting: TimeUUIDType.  TimeUUIDType is a version 1 UUID, and it is one of the values accepted by the CompareWith attribute in the storage config file.  A version 1 UUID is defined as follows:

Conceptually, the original (version 1) generation scheme for UUIDs was to concatenate the UUID version with the MAC address of the computer that is generating the UUID, and with the number of 100-nanosecond intervals since the adoption of the Gregorian calendar in the West. In practice, the actual algorithm is more complicated. This scheme has been criticized in that it is not sufficiently "opaque"; it reveals both the identity of the computer that generated the UUID and the time at which it did so.
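To make the "100-nanosecond intervals" part of that definition concrete, here is a minimal sketch of the timestamp calculation in C#. The class and method names are my own for illustration, and it assumes the DateTime passed in is already in UTC.

```csharp
using System;

public static class UuidTimestamp
{
    // 15 October 1582 (UTC): the adoption of the Gregorian calendar,
    // which is the epoch for version 1 UUID timestamps.
    public static readonly DateTime GregorianEpoch =
        new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);

    // One .NET tick is 100 nanoseconds, so the tick difference is
    // already expressed in the unit a version 1 UUID uses.
    public static long ToUuidTimestamp(DateTime utc)
    {
        return utc.Ticks - GregorianEpoch.Ticks;
    }
}
```

Because .NET ticks and UUID timestamps share the same unit, the only work is shifting the epoch.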

What are the Cassandra CompareWith Types

The CompareWith attribute tells Cassandra how to sort the columns for slicing operations.  The default is BytesType, which is a straightforward lexical comparison of the bytes in each column.  Other options are AsciiType, UTF8Type, LexicalUUIDType, TimeUUIDType, and LongType.  You can also specify the fully-qualified class name to a class of your choice extending org.apache.cassandra.db.marshal.AbstractType.

SuperColumns have a similar CompareSubcolumnsWith attribute.

  • BytesType: Simple sort by byte value.  No validation is performed.
  • AsciiType: Like BytesType, but validates that the input can be parsed as US-ASCII.
  • UTF8Type: A string encoded as UTF8
  • LongType: A 64bit long
  • LexicalUUIDType: A 128bit UUID, compared lexically (by byte value)
  • TimeUUIDType: a 128bit version 1 UUID, compared by timestamp

NOTE: The CompareWith types are roughly what we in the relational database world would call table keys.

How do these Types Relate to .NET Types

These types listed above have the following mapping to .NET types:

  • BytesType: Byte[]
  • AsciiType: String (generated by Encoding.ASCII)
  • UTF8Type: String (generated by Encoding.UTF8)
  • LongType: Int64
  • LexicalUUIDType: Guid (generated by Guid.NewGuid())
  • TimeUUIDType: Guid (no native way to generate from .NET Framework)

As you can see above, all of the types can easily be generated in .NET, except for TimeUUIDType.
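As a rough illustration of the easy cases, the sketch below produces a value for each type that .NET can generate natively. The class and method names are hypothetical, and the big-endian layout for LongType is my assumption (Java writes longs most-significant byte first), so check it against your client library.

```csharp
using System;
using System.Text;

public static class CassandraValues
{
    // AsciiType: the US-ASCII bytes of a string.
    public static byte[] Ascii(string s) { return Encoding.ASCII.GetBytes(s); }

    // UTF8Type: the UTF-8 bytes of a string.
    public static byte[] Utf8(string s) { return Encoding.UTF8.GetBytes(s); }

    // LongType: 8 bytes. Assumption: most-significant byte first,
    // so the bytes are reversed on little-endian machines.
    public static byte[] Long(long value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(bytes);
        }
        return bytes;
    }

    // LexicalUUIDType: any Guid will do, so a random one is enough.
    public static Guid LexicalUuid() { return Guid.NewGuid(); }
}
```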

What is the point of the TimeUUIDType

I will let Arin Sarkissian describe why you should care about TimeUUIDType:

Since we’re going to want to display lists of entries in chronological order we’ll make sure each Column’s name is a time UUID and set the ColumnFamily’s CompareWith to TimeUUIDType. This will sort the Columns by time satisfying our “chronological order” requirement. So doing stuff like “get the latest 10 entries tagged ‘foo’” is going to be a super efficient operation.

As Arin says, TimeUUIDType is a “super efficient” way to perform chronological sorting and pulling of data from the Cassandra database.  And since our needs as developers to store chronological data in a database don’t really differ by programming language, I have created a Time UUID generator that fits the data into a standard Guid object.

The Time UUID generator was pretty easy to create after I figured out the byte array structure and the differences between how Java and .NET generate byte arrays.  Below is all the code you need to generate a Time UUID or Time-Based Guid object in .NET.

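public static Guid GenerateTimeBasedGuid(DateTime dateTime)
{
    // The constants and the Node field used below are declared elsewhere in the same class.

    // .NET ticks are 100-nanosecond intervals, the same unit a version 1 UUID uses,
    // so only the epoch has to be shifted to the start of the Gregorian calendar.
    long ticks = dateTime.Ticks - GregorianCalendarStart.Ticks;

    byte[] guid = new byte[ByteArraySize];
    byte[] clockSequenceBytes = BitConverter.GetBytes(Convert.ToInt16(Environment.TickCount % Int16.MaxValue));
    byte[] timestamp = BitConverter.GetBytes(ticks);

    // copy the node (the machine identifier) into its slot
    Array.Copy(Node, 0, guid, NodeByte, Node.Length);

    // copy the clock sequence
    Array.Copy(clockSequenceBytes, 0, guid, GuidClockSequenceByte, clockSequenceBytes.Length);

    // copy the timestamp to the start of the array
    Array.Copy(timestamp, 0, guid, 0, timestamp.Length);

    // set the variant: mask the byte, then write the variant marker
    guid[VariantByte] &= (byte)VariantByteMask;
    guid[VariantByte] |= (byte)VariantByteShift;

    // set the version: mask the byte, then write the time-based version number
    guid[VersionByte] &= (byte)VersionByteMask;
    guid[VersionByte] |= (byte)((int)GuidVersion.TimeBased << VersionByteShift);

    return new Guid(guid);
}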
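The method above relies on constants and a Node field that are not shown in the listing. Below is a self-contained sketch that fills them in so the idea can be tried in isolation. Every value here is my assumption based on the RFC 4122 field layout, not a copy of the project source. The node is random bytes rather than a real MAC address, the clock sequence is passed in as a parameter, and the code assumes a little-endian machine. The GetDateTime method is an addition of mine that reads the timestamp back out.

```csharp
using System;

public static class TimeGuidSketch
{
    // Assumed values; the original declarations are not shown in the post.
    private static readonly DateTime GregorianCalendarStart =
        new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);
    private const int ByteArraySize = 16;
    private const int VersionByte = 7;       // high byte of the timestamp in Guid's byte layout
    private const int VersionByteMask = 0x0f;
    private const int VersionByteShift = 4;
    private const int TimeBasedVersion = 1;
    private const int VariantByte = 8;
    private const int VariantByteMask = 0x3f;
    private const int VariantBits = 0x80;
    private const int ClockSequenceByte = 8;
    private const int NodeByte = 10;

    // Hypothetical node: six random bytes standing in for a MAC address.
    private static readonly byte[] Node = CreateRandomNode();

    private static byte[] CreateRandomNode()
    {
        byte[] node = new byte[6];
        new Random().NextBytes(node);
        return node;
    }

    public static Guid Generate(DateTime utc, short clockSequence)
    {
        byte[] guid = new byte[ByteArraySize];
        byte[] timestamp = BitConverter.GetBytes(utc.Ticks - GregorianCalendarStart.Ticks);
        byte[] clock = BitConverter.GetBytes(clockSequence);

        Array.Copy(timestamp, 0, guid, 0, timestamp.Length);
        Array.Copy(clock, 0, guid, ClockSequenceByte, clock.Length);
        Array.Copy(Node, 0, guid, NodeByte, Node.Length);

        // Overwrite the top two bits of byte 8 with the variant marker.
        guid[VariantByte] = (byte)((guid[VariantByte] & VariantByteMask) | VariantBits);
        // Overwrite the top four bits of byte 7 with the version number.
        guid[VersionByte] = (byte)((guid[VersionByte] & VersionByteMask)
            | (TimeBasedVersion << VersionByteShift));

        return new Guid(guid);
    }

    // Reverse of Generate: strip the version bits and shift the epoch back.
    public static DateTime GetDateTime(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();
        bytes[VersionByte] &= (byte)VersionByteMask;
        long ticks = BitConverter.ToInt64(bytes, 0);
        return new DateTime(ticks + GregorianCalendarStart.Ticks, DateTimeKind.Utc);
    }
}
```

Round-tripping a date through Generate and GetDateTime is a quick way to check that the version bits are not corrupting the timestamp.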

You can find the actual code as part of the FluentCassandra project, which contains useful type generators and serialization types for working with Cassandra.  The actual file for generating the time-based Guid is located here: https://github.com/managedfusion/fluentcassandra/blob/master/src/GuidGenerator.cs
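On the byte array differences between Java and .NET mentioned earlier: Guid.ToByteArray() writes the first three fields of the Guid in little-endian order, while Java reads the 16 bytes of a UUID most-significant byte first. The sketch below shows one way to convert between the two layouts. The class and method names are hypothetical, and whether your client library already does this for you is something to verify before using it.

```csharp
using System;

public static class GuidByteOrder
{
    // Reverses the three little-endian fields of a Guid byte array.
    // Applying it twice returns the original order.
    public static byte[] SwapFieldOrder(byte[] source)
    {
        byte[] result = (byte[])source.Clone();
        Array.Reverse(result, 0, 4); // first field, 4 bytes
        Array.Reverse(result, 4, 2); // second field, 2 bytes
        Array.Reverse(result, 6, 2); // third field, 2 bytes (carries the version)
        return result;               // the last 8 bytes are left as they are
    }

    // Bytes in the order a Java UUID expects them.
    public static byte[] ToBigEndianBytes(Guid guid)
    {
        return SwapFieldOrder(guid.ToByteArray());
    }

    // Rebuilds a Guid from bytes in Java order.
    public static Guid FromBigEndianBytes(byte[] bytes)
    {
        return new Guid(SwapFieldOrder(bytes));
    }
}
```

With this layout the bytes appear in the same order as the hex digits in the Guid's string form.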

I hope this little tidbit helps in the adoption of Cassandra in the .NET community, because I believe NoSQL databases like Cassandra are where the cool jobs you probably want to be working in are heading.

NOTE: To use the above code to generate a time-based Guid for a specific date, whether for pulling data out of the database or for putting the current date into it, you just need to pass in a valid .NET DateTime object like this:

// generate a time-based Guid for now
Guid nowGuid = GenerateTimeBasedGuid(DateTime.Now);

// generate a time-based Guid for a specific date
Guid thenGuid = GenerateTimeBasedGuid(new DateTime(1980, 3, 14));

Nick Berardi

In charge of Cloud Drive Desktop at @Amazon, Entrepreneur, Microsoft MVP, ASPInsider, co-founder and CTO of @CaddioApp, Father, and @SeriouslyOpen host