Archive for FluentCassandra

02 Jun 2010

Your First Fluent Cassandra Application


As you are probably aware by now, if you follow my Twitter status or have looked into some of my recent posts, I am developing a library called FluentCassandra, a .NET library for using the Cassandra database in a .NETty way.  The project has progressed quite nicely in the last couple of months, and I am finally ready to start talking about it and giving examples of how it can be used in your applications.  So let's get started…

Step 1)

The first thing we need to do is make sure that your machine is properly set up to run Cassandra.  Back in March I put together a jump start for Windows developers to do just that.  So if you don’t have it running on your machine already, start there.

Step 2)

The next thing we need to do is locate and configure the database's storage-conf.xml file, which was referenced in the previous step's instructions.

  1. Open the storage-conf.xml in your favorite text editor.
  2. Add the following to the <Keyspaces /> tag in the file:
    <Keyspace Name="Blog">
        <ColumnFamily Name="Posts"
            ColumnType="Super"
            CompareWith="UTF8Type"
            CompareSubcolumnsWith="UTF8Type" />
    
        <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
        <ReplicationFactor>1</ReplicationFactor>
        <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
  3. Save it.

The above configuration creates one Column Family (or table in RDBMS speak) called Posts in a Keyspace (or database in RDBMS speak) called Blog.  We are going to use this column family in our code below.

Step 3)

Next grab a copy of FluentCassandra from http://github.com/managedfusion/fluentcassandra


Create your own console app, or use the FluentCassandra.Sandbox console app provided in the downloaded source.

Step 4)

Now for the fun part, the coding. 

The first thing we need to do is create a context for the database entities that we are going to save.  This is done with the CassandraContext.

using (var db = new CassandraContext(keyspace: "Blog", host: "localhost"))
{

The above code creates a Cassandra context for the Blog keyspace on our local Cassandra database.  After we have done this, we want to get a reference to the column family that we are going to execute our saves against.  This is done by getting the column family with the CompareWith and CompareSubcolumnsWith types we specified in the above storage-conf.xml.

var family = db.GetColumnFamily<UTF8Type, UTF8Type>("Posts");

In the above code the first generic parameter is the CompareWith parameter and the second generic parameter is the CompareSubcolumnsWith parameter.  This creates the family repository that can be used to execute CRUD commands against this column family. 

Now that we have all this set up, let's actually create a post record with a key called “first-blog-post”.

// create post
dynamic post = family.CreateRecord(key: "first-blog-post");

The easiest way to accomplish this is to use the method provided in the family object for creating the properly typed record for use. This object will be used in a little while but first we need to create two super columns with the details of our blog post and the tags associated with the blog post.  This is done by using the CreateSuperColumn method on the post object we just created.

// create post details
dynamic postDetails = post.CreateSuperColumn();
postDetails.Title = "My First Cassandra Post";
postDetails.Body = "Blah. Blah. Blah. about my first post on how great Cassandra is to work with.";
postDetails.Author = "Nick Berardi";
postDetails.PostedOn = DateTimeOffset.Now;

// create post tags
dynamic tags = post.CreateSuperColumn();
tags[0] = "Cassandra";
tags[1] = ".NET";
tags[2] = "Database";
tags[3] = "NoSQL";

This creates two super column objects, postDetails and tags, that each contain their own set of columns.  The post details hold the post's title, content body, author, and when it was posted.  The tags are written as an array, where each item in the array becomes a new column.  We will talk about why this works in a future post, but accept for now that it does work, even though one is used as an object with a bunch of properties and the other is used as an array with a bunch of elements.

Let's now add the details and tags to the post record that we created above.

// add properties to post
post.Details = postDetails;
post.Tags = tags;

Just like the details above, we are going to treat the post record as an object with properties.  This completes the entire record that we want to save to the database.  Now let's attach it and save our record to the database.

// attach the post to the database
Console.WriteLine("attaching record");
db.Attach(post);

// save the changes
Console.WriteLine("saving changes");
db.SaveChanges();

So we have now done our first Cassandra database insert.  But that is only half the fun; let's read it back out of the database.  As with the write, we are going to use the same family object to do the read from the database.  The first thing we need to do is get the record out of the database using the same key, “first-blog-post”.

// get the post back from the database
Console.WriteLine("getting 'first-blog-post'");
dynamic getPost = family.Get("first-blog-post").FirstOrDefault();

The above code uses a LINQ-like syntax to retrieve the record.  This LINQ-like syntax is started by calling the Get method on the family object, and it can then be executed with any LINQ operation; in our case above we are using the FirstOrDefault method.  The next thing we want to see is the details of the post, which can be easily retrieved using the same object structure that we used to put them in the database.

// show details
dynamic getPostDetails = getPost.Details;
Console.WriteLine(
    String.Format("=={0} by {1}==\n{2}", 
        getPostDetails.Title, 
        getPostDetails.Author, 
        getPostDetails.Body
    ));

And now for the tags, which we are going to query in a way more suitable for an array.

// show tags
Console.Write("tags:");
foreach (var tag in getPost.Tags)
    Console.Write(String.Format("{0}:{1},", tag.Name, tag.Value));

Finish it off with this code, and we will be ready to run our first Cassandra application.

}

Console.Read();

Step 5)

The first thing we need to do to run our application is to make sure the database is running.  This may sound like a no-duh moment, but if you are used to SQL Server development, you rarely have to make sure the database is running, so I just like to mention it.  If you don’t remember how to do this, go back to Step 1 and look at the instructions for starting the database.

Now let's run the application and see the results.  If everything ran correctly, you will receive the following output.

attaching record
saving changes
getting 'first-blog-post'
==My First Cassandra Post by Nick Berardi==
Blah. Blah. Blah. about my first post on how great Cassandra is to work with.
tags:0:Cassandra,1:.NET,2:Database,3:NoSQL,
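
To see why the tags print as name:value pairs, it can help to picture the saved record as nested sorted maps: super column name, then column name, then value. The sketch below is only a mental model written in plain Java, not FluentCassandra or Cassandra code. The class and method names are hypothetical, and it assumes the array indexes are stored as the column names "0" through "3", which is what the output above suggests.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mental model of one row in the "Posts" super column family.
final class BlogRecordSketch {

    // Builds the row as: super column name -> (column name -> value).
    // TreeMap keeps keys sorted, loosely mimicking the UTF8Type comparators.
    static Map<String, Map<String, String>> buildPost() {
        Map<String, String> details = new TreeMap<>();
        details.put("Title", "My First Cassandra Post");
        details.put("Author", "Nick Berardi");

        // Assumption: each array index becomes a column name.
        Map<String, String> tags = new TreeMap<>();
        String[] values = {"Cassandra", ".NET", "Database", "NoSQL"};
        for (int i = 0; i < values.length; i++) {
            tags.put(Integer.toString(i), values[i]);
        }

        Map<String, Map<String, String>> post = new TreeMap<>();
        post.put("Details", details);
        post.put("Tags", tags);
        return post;
    }

    // Formats the columns of the "Tags" super column the way the sample app prints them.
    static String formatTags(Map<String, String> tags) {
        StringBuilder sb = new StringBuilder("tags:");
        for (Map.Entry<String, String> column : tags.entrySet()) {
            sb.append(column.getKey()).append(':').append(column.getValue()).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> post = buildPost();
        // prints: tags:0:Cassandra,1:.NET,2:Database,3:NoSQL,
        System.out.println(formatTags(post.get("Tags")));
    }
}
```

Seen this way, an "object with properties" and an "array with elements" are the same shape: both are just a sorted set of named columns inside a super column.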

Step 6)

As a follow-up exercise, see if you can add comments.  Hint: you will need a new super column family as defined here:

<ColumnFamily Name="Comments"
    ColumnType="Super"
    CompareWith="TimeUUIDType"
    CompareSubcolumnsWith="UTF8Type" />
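
Because this column family compares super column names with TimeUUIDType, each comment would presumably need a version 1 (time-based) UUID as its name. As a rough illustration of what such a value contains, here is a minimal Java sketch that packs a timestamp into the version 1 layout. The class name, the method name, and the clock sequence and node values are all placeholders chosen for the example; a real generator would also have to guarantee uniqueness, which this sketch does not attempt.

```java
import java.util.UUID;

// Hypothetical helper: builds a version 1 UUID from a Unix timestamp.
final class TimeUuidSketch {

    // 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    static UUID timeUuid(long unixMillis, int clockSeq, long node) {
        // Version 1 timestamps count 100-ns intervals since 1582-10-15.
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET;

        long timeLow = ts & 0xFFFFFFFFL;         // low 32 bits of the timestamp
        long timeMid = (ts >>> 32) & 0xFFFFL;    // middle 16 bits
        long timeHi = (ts >>> 48) & 0x0FFFL;     // high 12 bits

        // Most significant half: time_low | time_mid | version (1) | time_hi.
        long msb = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHi;
        // Least significant half: variant bits "10" | 14-bit clock sequence | 48-bit node.
        long lsb = (0x2L << 62) | ((clockSeq & 0x3FFFL) << 48) | (node & 0xFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        // Placeholder clock sequence and node; not derived from a real machine.
        UUID commentId = timeUuid(System.currentTimeMillis(), 0, 0x10b96e4ef00dL);
        System.out.println(commentId + " version=" + commentId.version());
    }
}
```

Two comments created at different milliseconds get different timestamps, so a time-based comparator can sort them in posting order.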

Hope this was an interesting exercise, and if you see any way to improve the interface or want to help out on the project please start by going to http://github.com/managedfusion/fluentcassandra.

Don’t forget to check out part 2 of this series.

24 May 2010

That No SQL Thing: Column (Family) Databases


Just wanted to mention a very well written post that explains column family databases, like Cassandra, in the most straightforward way that I have found for explaining the concept to .NET developers.  I have no doubt that most of you who read Ayende’s blog have already seen this, but for those who might have missed the post, or don’t follow him, here it is:

http://ayende.com/Blog/archive/2010/05/14/that-no-sql-thing-column-family-databases.aspx

The fictitious fluent interface that he demonstrates in this blog post was a great inspiration to my own Fluent Querying that I have included in Fluent Cassandra.

16 May 2010

TimeUUID only makes sense with version 1 UUIDs


In a world where we are all used to dealing with objects, we often forget that everything gets reduced to ones and zeros before being transmitted over the wire to the destination.  Most times the destination easily handles converting these bytes back into an object on the other side that is easily understood and consumed.  The frustration comes when we run into a situation where the other side doesn’t understand our transmitted data.  This can often cause us to pull our hair out, become irritable, and throw our hands up in disgust.  Well, recently I have been doing all that while trying to solve what sounds like a simple problem on the surface: sending the bytes of a Type 1 UUID, or GUID, over the wire from .NET to a server running on Java.

The Problem

It is a simple one: I need to transmit a Guid from .NET over to a UUID in Java.  Mind you, Guid and UUID represent the same thing in the two different languages, so this isn’t a case of mismatched objects.

TimeUUID only makes sense with version 1 UUIDs

This is the error that came up each and every time I sent the GUID object over to the Java server using the following method.

Guid g = new Guid("38400000-8cf0-11bd-b23e-10b96e4ef00d");
var bytes = g.ToByteArray();

Server.Transmit(bytes); // this method call is an example

The part of this that really made me pull my hair out was the fact that the GUID I was sending over was a valid Type 1 UUID.  If you don’t believe me check it out here.  So after writing a bunch of unit tests and double and triple checking that the GUID I was sending over was a Type 1 GUID, it finally dawned on me that maybe the problem was that Java handled the byte order of the UUID differently from .NET.

So I wrote a quick Java program to verify this hypothesis.  And this is what I found.

UUID x = UUID.fromString("38400000-8cf0-11bd-b23e-10b96e4ef00d");
System.out.println(x.toString());
System.out.println(x.version());

// results:
// 38400000-8cf0-11bd-b23e-10b96e4ef00d
// 1

byte[] byteArray = { ... the bytes here ... };
ByteBuffer bb = ByteBuffer.wrap(byteArray);
UUID y = new UUID (bb.getLong(), bb.getLong());
System.out.println(y.toString());
System.out.println(y.version());

// results:
// 00004038-f08c-bd11-b23e-10b96e4ef00d
// 11

You might have noticed that, within the first 8 bytes of the UUID, the byte order of each section has been reversed.  So I reversed the bytes of each of those sections in my byteArray, and here is the output I received.

// results:
// 38400000-8cf0-11bd-b23e-10b96e4ef00d
// 1
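
For anyone who wants to reproduce the experiment, here is a self-contained Java sketch of it. The class and method names are made up for this example, and the byte values are written out by hand on the assumption that Guid.ToByteArray() emits the first three fields (4, 2, and 2 bytes) in little-endian order and leaves the last 8 bytes untouched.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical demo class: reads .NET-ordered GUID bytes the way Java would.
final class GuidByteOrderDemo {

    // Assumed layout of Guid.ToByteArray() for 38400000-8cf0-11bd-b23e-10b96e4ef00d.
    static final byte[] DOTNET_BYTES = {
        0x00, 0x00, 0x40, 0x38,                 // 38400000, little-endian
        (byte) 0xf0, (byte) 0x8c,               // 8cf0, little-endian
        (byte) 0xbd, 0x11,                      // 11bd, little-endian
        (byte) 0xb2, 0x3e, 0x10, (byte) 0xb9,   // remaining 8 bytes are unchanged
        0x6e, 0x4e, (byte) 0xf0, 0x0d
    };

    // Interprets 16 bytes as two big-endian longs, which is ByteBuffer's default order.
    static UUID readBigEndian(byte[] bytes) {
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        return new UUID(bb.getLong(), bb.getLong());
    }

    // Returns a copy with the 4-, 2-, and 2-byte timestamp fields reversed in place.
    static byte[] swapTimestampFields(byte[] in) {
        byte[] out = in.clone();
        reverse(out, 0, 4);
        reverse(out, 4, 2);
        reverse(out, 6, 2);
        return out;
    }

    private static void reverse(byte[] a, int offset, int length) {
        for (int i = offset, j = offset + length - 1; i < j; i++, j--) {
            byte t = a[i];
            a[i] = a[j];
            a[j] = t;
        }
    }

    public static void main(String[] args) {
        UUID raw = readBigEndian(DOTNET_BYTES);
        System.out.println(raw + " version=" + raw.version());     // version=11

        UUID fixed = readBigEndian(swapTimestampFields(DOTNET_BYTES));
        System.out.println(fixed + " version=" + fixed.version()); // version=1
    }
}
```

The version nibble lives in the seventh byte of a big-endian UUID, so swapping the last two-byte field is what turns the reported version from 11 back into 1.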

I jumped for joy when I saw this come out correctly, because it meant that I had solved the problem.

But Not So Fast

I still needed to write the code on the .NET side to handle reversing and unreversing these bytes when sending over to and receiving from the Cassandra server.  So to correct this I just created a simple byte array converter for my Cassandra type TypeConverter object for TimeUUID. 

private byte[] ReverseLowFieldTimestamp(byte[] guid)
{
    return guid.Skip(0).Take(4).Reverse().ToArray();
}

private byte[] ReverseMiddleFieldTimestamp(byte[] guid)
{
    return guid.Skip(4).Take(2).Reverse().ToArray();
}

private byte[] ReverseHighFieldTimestamp(byte[] guid)
{
    return guid.Skip(6).Take(2).Reverse().ToArray();
}

public override object ConvertTo(ITypeDescriptorContext context, System.Globalization.CultureInfo culture, object value, Type destinationType)
{
    if (!(value is Guid))
        return null;

    if (destinationType == typeof(byte[]))
    {
        var oldArray = ((Guid)value).ToByteArray();
        var newArray = new byte[16];
        Array.Copy(ReverseLowFieldTimestamp(oldArray), 0, newArray, 0, 4);
        Array.Copy(ReverseMiddleFieldTimestamp(oldArray), 0, newArray, 4, 2);
        Array.Copy(ReverseHighFieldTimestamp(oldArray), 0, newArray, 6, 2);
        Array.Copy(oldArray, 8, newArray, 8, 8);
        return newArray;
    }

    if (destinationType == typeof(Guid))
        return (Guid)value;

    return null;
}

And a little coding and some unit tests later, I was able to verify that the UUID (or GUID) would be correctly transmitted to Java and read back from Java with the changes I made above.  I don’t completely understand why the byte order is different, because UUID is a pretty solid standard, but I am sure it has something to do with endianness: the first three fields come out of .NET in little-endian order, while Java reads them as big-endian.
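
The same conversion can also be expressed on the Java side by telling ByteBuffer which byte order to use for each field, rather than reversing bytes by hand. The sketch below is not part of FluentCassandra or Cassandra; the class and method names are invented for illustration, and it assumes the .NET layout described above (three little-endian fields followed by 8 unchanged bytes).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Hypothetical codec between .NET's Guid byte layout and java.util.UUID.
final class DotNetGuidCodec {

    // Decodes 16 bytes in the assumed Guid.ToByteArray() layout into a UUID.
    static UUID fromDotNetBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes");
        }
        ByteBuffer bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long timeLow = bb.getInt() & 0xFFFFFFFFL;          // 4-byte field
        long timeMid = bb.getShort() & 0xFFFFL;            // 2-byte field
        long timeHiAndVersion = bb.getShort() & 0xFFFFL;   // 2-byte field
        long msb = (timeLow << 32) | (timeMid << 16) | timeHiAndVersion;
        // The last 8 bytes are already in big-endian (network) order.
        long lsb = bb.order(ByteOrder.BIG_ENDIAN).getLong();
        return new UUID(msb, lsb);
    }

    // Encodes a UUID into the assumed Guid.ToByteArray() layout.
    static byte[] toDotNetBytes(UUID uuid) {
        long msb = uuid.getMostSignificantBits();
        ByteBuffer bb = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt((int) (msb >>> 32));
        bb.putShort((short) (msb >>> 16));
        bb.putShort((short) msb);
        bb.order(ByteOrder.BIG_ENDIAN).putLong(uuid.getLeastSignificantBits());
        return bb.array();
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("38400000-8cf0-11bd-b23e-10b96e4ef00d");
        UUID roundTripped = fromDotNetBytes(toDotNetBytes(original));
        System.out.println(roundTripped.equals(original)); // prints true
    }
}
```

Whichever side does the swapping, the important design choice is to do it in exactly one place, so that a value is never swapped twice or not at all.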

In The End

This turned out to be more of an adventure than I thought it would be, and I had to break out some Java code to do it, but after about an hour of coding and testing, everything seems to be working and TimeUUID is now supported under .NET.

Note: By the way, I really didn’t want to install Eclipse to test out the Java snippet; luckily MonoDevelop supports Java development.