Archive for FluentCassandra

05 Apr 2010

Creating a Time UUID (GUID) in .NET


Previously I had written about how to set up Cassandra as a database on your Windows machine. As I dove in deeper to learn more about the subject, I realized that .NET has no built-in way to generate a type that is critical to Cassandra for column comparison and sorting: TimeUUIDType. TimeUUIDType is a version 1 UUID, and it is one of the types you can use in the CompareWith attribute of the storage config file. A version 1 UUID is defined as follows:

Conceptually, the original (version 1) generation scheme for UUIDs was to concatenate the UUID version with the MAC address of the computer that is generating the UUID, and with the number of 100-nanosecond intervals since the adoption of the Gregorian calendar in the West. In practice, the actual algorithm is more complicated. This scheme has been criticized in that it is not sufficiently “opaque”; it reveals both the identity of the computer that generated the UUID and the time at which it did so.
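The timestamp part of that definition maps neatly onto .NET, because a DateTime tick is also 100 nanoseconds. Below is a minimal sketch that assumes the input should be treated as UTC, as RFC 4122 specifies. The `UuidTimestamp` class and `FromDateTime` method are names made up for illustration:

```csharp
using System;

public static class UuidTimestamp
{
    // Version 1 UUID timestamps count 100-nanosecond intervals since 15 Oct 1582 00:00 UTC.
    public static readonly DateTime GregorianCalendarStart =
        new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);

    // A .NET tick is also 100 nanoseconds, so the tick difference is the UUID timestamp.
    public static long FromDateTime(DateTime dateTime)
    {
        DateTime utc = dateTime.ToUniversalTime();
        return utc.Ticks - GregorianCalendarStart.Ticks;
    }
}
```

For example, midnight UTC on 1 January 1970 comes out as 122192928000000000, which is the well-known offset between the UUID epoch and the Unix epoch. Present-day values stay well under 2^60, so they fit in the 60 timestamp bits of a version 1 UUID.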

What are the Cassandra CompareWith Types

The CompareWith attribute tells Cassandra how to sort the columns for slicing operations.  The default is BytesType, which is a straightforward lexical comparison of the bytes in each column.  Other options are AsciiType, UTF8Type, LexicalUUIDType, TimeUUIDType, and LongType.  You can also specify the fully-qualified name of a class of your choice that extends org.apache.cassandra.db.marshal.AbstractType.

SuperColumns have a similar CompareSubcolumnsWith attribute.

  • BytesType: Simple sort by byte value.  No validation is performed.
  • AsciiType: Like BytesType, but validates that the input can be parsed as US-ASCII.
  • UTF8Type: A string encoded as UTF-8
  • LongType: A 64-bit long
  • LexicalUUIDType: A 128-bit UUID, compared lexically (by byte value)
  • TimeUUIDType: A 128-bit version 1 UUID, compared by timestamp
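Why have a separate TimeUUIDType at all? A version 1 UUID stores the least significant 32 bits of its timestamp (time_low) first. A byte-by-byte comparison is therefore dominated by bits that roll over roughly every seven minutes (2^32 × 100 ns is about 429 seconds). The sketch below only illustrates that effect and is not Cassandra's actual comparator code. It lays out the timestamp fields in RFC 4122 (big-endian) order, leaves the clock sequence and node at zero, and compares two UUIDs both ways. All names are hypothetical:

```csharp
using System;

public static class UuidOrdering
{
    // Lay out a version 1 UUID in RFC 4122 (big-endian) byte order.
    // Clock sequence and node are left as zero to keep the illustration small.
    public static byte[] ToRfcBytes(long timestamp)
    {
        uint timeLow = (uint)(timestamp & 0xFFFFFFFFL);
        ushort timeMid = (ushort)((timestamp >> 32) & 0xFFFF);
        ushort timeHiAndVersion = (ushort)(((timestamp >> 48) & 0x0FFF) | 0x1000); // version 1

        var b = new byte[16];
        b[0] = (byte)(timeLow >> 24);
        b[1] = (byte)(timeLow >> 16);
        b[2] = (byte)(timeLow >> 8);
        b[3] = (byte)timeLow;
        b[4] = (byte)(timeMid >> 8);
        b[5] = (byte)timeMid;
        b[6] = (byte)(timeHiAndVersion >> 8);
        b[7] = (byte)timeHiAndVersion;
        b[8] = 0x80; // RFC 4122 variant bits (10xxxxxx)
        return b;
    }

    // Reassemble the 60-bit timestamp, ignoring the version nibble.
    public static long Timestamp(byte[] b)
    {
        long timeLow = ((long)b[0] << 24) | ((long)b[1] << 16) | ((long)b[2] << 8) | b[3];
        long timeMid = ((long)b[4] << 8) | b[5];
        long timeHi = ((long)(b[6] & 0x0F) << 8) | b[7];
        return (timeHi << 48) | (timeMid << 32) | timeLow;
    }

    // Byte-value comparison, as described for LexicalUUIDType above.
    public static int CompareByBytes(byte[] x, byte[] y)
    {
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return x[i].CompareTo(y[i]);
        }
        return 0;
    }

    // Timestamp comparison, as described for TimeUUIDType above.
    public static int CompareByTime(byte[] x, byte[] y) => Timestamp(x).CompareTo(Timestamp(y));
}
```

Take two timestamps only 32 intervals apart, one on each side of a time_low rollover. They sort in the wrong order under the byte comparison but correctly under the timestamp comparison. That is exactly the "chronological order" problem TimeUUIDType is there to solve.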

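NOTE: Coming from the relational database world, it may help to think of the CompareWith type as the type of a sorted key. Strictly speaking, though, it governs how the column names within a row are sorted, not how rows are ordered.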

How do these Types Relate to .NET Types

These types listed above have the following mapping to .NET types:

  • BytesType: Byte[]
  • AsciiType: String (encoded with Encoding.ASCII)
  • UTF8Type: String (encoded with Encoding.UTF8)
  • LongType: Int64
  • LexicalUUIDType: Guid (generated by Guid.NewGuid())
  • TimeUUIDType: Guid (no native way to generate from .NET Framework)

As you can see above, all of these types can easily be generated in .NET except TimeUUIDType.
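Whatever the comparator, Cassandra only ever stores raw bytes, so each .NET value has to become a byte[] before it can be used as a column name. Below is a minimal sketch with hypothetical helper names. It assumes LongType expects 8 bytes in big-endian order (Java's default), which is why the little-endian output of BitConverter is reversed. TimeUUIDType is left out because, as noted above, .NET has no built-in way to generate one:

```csharp
using System;
using System.Text;

public static class ColumnNameBytes
{
    // BytesType: the bytes are used exactly as given.
    public static byte[] FromBytes(byte[] value) => value;

    // AsciiType: Encoding.ASCII would silently turn non-ASCII characters into '?',
    // so reject them up front instead.
    public static byte[] FromAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 0x7F)
                throw new ArgumentException("Value contains non-ASCII characters.", nameof(value));
        }
        return Encoding.ASCII.GetBytes(value);
    }

    // UTF8Type: standard UTF-8 encoding.
    public static byte[] FromUtf8(string value) => Encoding.UTF8.GetBytes(value);

    // LongType: 8 bytes, most significant byte first (assumed big-endian, as in Java).
    public static byte[] FromInt64(long value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        return bytes;
    }
}
```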

What is the point of the TimeUUIDType

I will let Arin Sarkissian describe why you should care about TimeUUIDType:

Since we’re going to want to display lists of entries in chronological order we’ll make sure each Columns name is a time UUID and set the ColumnFamilys CompareWith to TimeUUIDType. This will sort the Columns by time satisfying our “chronological order” requirement. So doing stuff like “get the latest 10 entries tagged ‘foo’” is going to be a super efficient operation.

As Arin says, TimeUUIDType is a “super efficient” way to sort chronological data and pull it from the Cassandra database.  And since our need as developers to store chronological data in a database doesn’t really differ by programming language, I have created a Time UUID generator that fits the data into a standard Guid object.

The Time UUID generator was pretty easy to create after I figured out the byte array structure and the differences between how Java and .NET generate byte arrays.  Below is all the code you need to generate a Time UUID or Time-Based Guid object in .NET.

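using System;
public static class GuidGenerator
{
	// byte offsets and masks for RFC 4122 fields within the array passed to new Guid(byte[])
	private const int ByteArraySize = 16;
	private const int VariantByte = 8, VariantByteMask = 0x3f, VariantByteShift = 0x80;
	private const int VersionByte = 7, VersionByteMask = 0x0f, VersionByteShift = 4;
	private const int GuidClockSequenceByte = 8, NodeByte = 10;
	private enum GuidVersion { TimeBased = 0x01 }

	// UUID timestamps count 100-nanosecond intervals (one .NET tick each) since 15 Oct 1582 UTC
	private static readonly DateTime GregorianCalendarStart = new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);
	// random 6-byte node instead of a MAC address; the multicast bit marks it as random (RFC 4122, 4.5)
	private static readonly byte[] Node = CreateRandomNode();

	private static byte[] CreateRandomNode()
	{
		byte[] node = new byte[6];
		new Random().NextBytes(node);
		node[0] |= 0x01;
		return node;
	}

	public static Guid GenerateTimeBasedGuid(DateTime dateTime)
	{
		// UUID timestamps are UTC, so convert before subtracting
		long ticks = dateTime.ToUniversalTime().Ticks - GregorianCalendarStart.Ticks;

		byte[] guid = new byte[ByteArraySize];
		byte[] clockSequenceBytes = BitConverter.GetBytes(Convert.ToInt16(Environment.TickCount % Int16.MaxValue));
		// little-endian on x86/x64, which matches the Data1/Data2/Data3 fields of a .NET Guid
		byte[] timestamp = BitConverter.GetBytes(ticks);

		// copy node
		Array.Copy(Node, 0, guid, NodeByte, Node.Length);

		// copy clock sequence
		Array.Copy(clockSequenceBytes, 0, guid, GuidClockSequenceByte, clockSequenceBytes.Length);

		// copy timestamp
		Array.Copy(timestamp, 0, guid, 0, timestamp.Length);

		// set the variant
		guid[VariantByte] &= (byte)VariantByteMask;
		guid[VariantByte] |= (byte)VariantByteShift;

		// set the version
		guid[VersionByte] &= (byte)VersionByteMask;
		guid[VersionByte] |= (byte)((int)GuidVersion.TimeBased << VersionByteShift);

		return new Guid(guid);
	}
}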
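The Java/.NET byte-array difference mentioned above comes from Guid.ToByteArray(). .NET stores the first three fields little-endian: Data1, Data2 and Data3, which correspond to time_low, time_mid and time_hi_and_version. RFC 4122 and Java's UUID use big-endian order for every field. If a Guid is sent to Cassandra as a raw 16-byte column name, those fields would need to be swapped first. Otherwise the server reads a different timestamp than the one you encoded. Below is a minimal sketch with hypothetical names; check whether your client library already does this swap for you:

```csharp
using System;

public static class GuidByteOrder
{
    // Swap the little-endian Data1/Data2/Data3 fields so the result is in RFC 4122 order.
    public static byte[] ToNetworkOrder(Guid guid)
    {
        byte[] b = guid.ToByteArray();
        SwapFields(b);
        return b;
    }

    // The same swap turns RFC 4122 bytes back into a .NET Guid (input is left untouched).
    public static Guid FromNetworkOrder(byte[] networkBytes)
    {
        byte[] b = (byte[])networkBytes.Clone();
        SwapFields(b);
        return new Guid(b);
    }

    private static void SwapFields(byte[] b)
    {
        Array.Reverse(b, 0, 4); // time_low / Data1
        Array.Reverse(b, 4, 2); // time_mid / Data2
        Array.Reverse(b, 6, 2); // time_hi_and_version / Data3
    }
}
```

After the swap, the bytes read in the same order as the Guid's string form (for example, "00112233-4455-6677-8899-aabbccddeeff" becomes 00 11 22 ... ff). That is the order Java's UUID uses.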

You can find the actual code as part of the FluentCassandra project, which contains useful type generators and serialization types for working with Cassandra.  The file that generates the time-based Guid is located at https://github.com/managedfusion/fluentcassandra/blob/master/src/GuidGenerator.cs

I hope this little tidbit helps the adoption of Cassandra in the .NET community, because I believe NoSQL databases like Cassandra are where all the cool jobs you probably want to be working in are headed.

NOTE: To use the above code to generate time-based Guids, whether for pulling data out of the database by date or for putting the current date into the database, you just need to pass in a valid .NET DateTime object like this:

// generate a time-based Guid for the current time (UUID timestamps are UTC)
Guid nowGuid = GenerateTimeBasedGuid(DateTime.UtcNow);

// generate a time-based Guid for a specific date
Guid thenGuid = GenerateTimeBasedGuid(new DateTime(1980, 3, 14));
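Going the other way is useful when reading columns back out. Below is a minimal sketch that recovers the DateTime from a Guid built with the byte layout used by GenerateTimeBasedGuid above. In that layout, bytes 0-7 hold the timestamp as a little-endian long, and the version number sits in the top four bits of byte 7. The `TimeBasedGuid.GetDateTime` name is hypothetical:

```csharp
using System;

public static class TimeBasedGuid
{
    private static readonly DateTime GregorianCalendarStart =
        new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);

    // Reverses the generator's layout: clear the version nibble, read the
    // 8 timestamp bytes back as a long, and add them to the Gregorian start.
    public static DateTime GetDateTime(Guid guid)
    {
        byte[] bytes = guid.ToByteArray();
        bytes[7] &= 0x0f; // drop the version number
        long ticks = BitConverter.ToInt64(bytes, 0);
        return new DateTime(GregorianCalendarStart.Ticks + ticks, DateTimeKind.Utc);
    }
}
```

The result is always a UTC DateTime; call ToLocalTime() on it if you want to display it in local time.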
30 Mar 2010

Cassandra Jump Start For The Windows Developer


Recently I have been exploring the NoSQL options for .NET and specifically a database called Cassandra.  In case you haven’t heard of Cassandra before, it is a decentralized, fault-tolerant, elastic database designed by Facebook for high availability.  As Wikipedia describes it:

Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project, as of February 17, 2010, designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.

I bet you have used data served by Cassandra without even realizing it. Here are some prominent users of Cassandra:

  • Facebook
  • Digg
  • Twitter
  • Reddit

Sounds interesting, or at least worth a look, right?  Well, I thought so. However, while getting the database set up I came to realize that there is almost no installation documentation for Linux, and even less for Windows.  So I am going to give you a jump start on installing Cassandra on your machine.  I am doing this so you don’t have to spend days jumping around the web, going down false paths, and pulling your hair out like I did, and can instead get on to what you really care about … development.

First Things First

The first thing you need to understand about Cassandra is that it is written in Java, so you can run it on any machine that supports Java 6 or later.  Before you go any further, make sure your Java JRE is updated to the latest version.

The next thing you need is a copy of Cassandra, which can be found here.  My setup is based on the latest stable release.

Running From Windows

As I said before, you can run Cassandra on any operating system that Java has a runtime for.  The first, and probably most obvious, option for a Windows developer is running Cassandra on Windows itself.  To install Cassandra on Windows, just follow these steps:

  1. Extract Cassandra to a directory of your choice (I used c:\development\cassandra)
  2. Set the following environment variables
    1. JAVA_HOME (to the directory where you installed the JRE; this should not be the bin directory)
    2. CASSANDRA_HOME (To the directory you extracted the files to in step 1)
  3. Modify your Cassandra config file as you like, and don’t forget to change the directory locations from UNIX-like paths to directories on your Windows machine (in my example the config file is located at c:\development\cassandra\conf\storage-conf.xml)
  4. Open cmd and run the cassandra.bat file (in my example the batch file is located at c:\development\cassandra\bin\cassandra.bat)
    cd c:\development\cassandra\bin\
    .\cassandra.bat
  5. You can verify that Cassandra is running by trying to connect to the server.  To do this, open a new cmd window and run the cassandra-cli.bat file from the bin directory.

    cd c:\development\cassandra\bin\
    .\cassandra-cli.bat
    connect localhost/9160
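If you'd rather check the setup from .NET, the sketch below covers steps 2 and 5. It confirms that JAVA_HOME and CASSANDRA_HOME are set and that JAVA_HOME doesn't end in bin. It also checks that something accepts TCP connections on port 9160, the port used with cassandra-cli above. A successful connection only shows that the port is open, not that Cassandra answered a Thrift call. The class and method names are hypothetical:

```csharp
using System;
using System.Net.Sockets;

public static class CassandraPreflight
{
    // Step 2: both variables must be set, and JAVA_HOME must not point at the bin directory.
    public static bool EnvironmentLooksRight()
    {
        string javaHome = Environment.GetEnvironmentVariable("JAVA_HOME");
        string cassandraHome = Environment.GetEnvironmentVariable("CASSANDRA_HOME");
        if (string.IsNullOrEmpty(javaHome) || string.IsNullOrEmpty(cassandraHome))
            return false;

        // Look at the last path segment, accepting either kind of slash.
        string trimmed = javaHome.TrimEnd('\\', '/');
        string lastSegment = trimmed.Substring(trimmed.LastIndexOfAny(new[] { '\\', '/' }) + 1);
        return !lastSegment.Equals("bin", StringComparison.OrdinalIgnoreCase);
    }

    // Step 5: something should accept TCP connections on the port cassandra-cli uses.
    public static bool PortIsOpen(string host = "localhost", int port = 9160)
    {
        try
        {
            using (var client = new TcpClient())
            {
                client.Connect(host, port);
                return true;
            }
        }
        catch (SocketException)
        {
            return false;
        }
    }
}
```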

This is easy to get running, but there is a manual step you have to go through each time you want the server running: in the future, when you want to start up the Cassandra database for development, just repeat step 4.

Running From A Linux Virtual Machine

The other way to run Cassandra is in a virtual machine running Linux.  This setup is just as easy as the Windows setup, as long as you have some experience with Linux.  I am going to start the install steps assuming that you have already installed Ubuntu Server.  That way the process is the same whether you want to run the server on a physical or a virtual machine.

  1. Make sure your system is up to date.

    sudo apt-get update
    sudo apt-get upgrade
  2. Open up your apt-get sources list with nano for editing.

    sudo nano /etc/apt/sources.list
  3. Add the following Apache Cassandra sources to the list.
    deb http://www.apache.org/dist/cassandra/debian unstable main
    deb-src http://www.apache.org/dist/cassandra/debian unstable main

    After you add these two lines, press CTRL+X to close Nano. It’ll ask “Save modified buffer?” Press Y. Press Enter when Nano asks “File Name to Write.”

  4. Update your apt-get sources to pull the latest information about Cassandra into your sources repository.

    sudo apt-get update
  5. At this point you are going to get an error.  Don’t freak out; this is totally expected.  It will look something like this:

    W: GPG error: http://www.apache.org unstable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F758CE318D77295D
  6. Use the following three commands to import the signature into your repository.

    gpg --recv-keys F758CE318D77295D
    sudo apt-key add ~/.gnupg/pubring.gpg
    sudo apt-get update

    NOTE: You must replace the key value ‘F758CE318D77295D’ with the key value you received in your error message.

  7. Now is the big moment to install Cassandra.

    sudo apt-get install cassandra
  8. Now just restart your machine.  To do it from the command line, run the following:

    sudo reboot

Unlike on Windows, there is no manual step to start this Cassandra database for development: just boot up the physical or virtual machine and it should be ready to go.

The Thrifty Way To Connect

Thrift, also developed by Facebook, is a framework for creating language-agnostic interfaces for all the common languages from a simple interface definition file.  The Thrift team describes their mission as:

Thrift is a software project spanning a variety of programming languages and use cases. Our goal is to make reliable, performant communication and data serialization across languages as efficient and seamless as possible.

As I found out, there wasn’t any good way to build the Thrift executable on Windows, which is another reason the Linux VM came in handy.  The Thrift executable is needed to generate the protocol interface for Cassandra.  You can read how to build it for your operating system here.

After you have the executable, you need to run it against the Cassandra interface definition.  I am not going to go into much depth here, because this part of the process is pretty well documented on the internet, but here is the command I ran to generate the Cassandra interface for C#.

thrift -gen csharp cassandra.thrift 

I am not going to leave you high and dry, though.  I have already generated the Cassandra Thrift interfaces for the following languages:

Note: these were generated from Cassandra 0.5.1, so they may not, and probably won’t, work with future releases of Cassandra.

You can find the necessary supporting libraries for each of the above languages by going to http://svn.apache.org/viewvc/incubator/thrift/trunk/lib/.

Conclusion

If you are like me and interested in using Cassandra from a .NET application, here is a quick demonstration class that will help you get started.

https://code.google.com/p/coderjournal/source/browse/trunk/Posts/2010/03/CassandraDemo.cs

But no matter the language you use, I hope this has provided you a good jump start with Cassandra.