06 Jun 2010

Your First Fluent Cassandra Application (part 2)


Last time I demonstrated how to create your first Fluent Cassandra app. After we finished learning how to create records and save them to the database, I issued a challenge: implement comments for the command line blog app we created. I hinted at how I would have done it with this column family configuration:

<ColumnFamily Name="Comments"
    ColumnType="Super"
    CompareWith="TimeUUIDType"
    CompareSubcolumnsWith="UTF8Type" />

And this is what we are going to implement today.

Basic Structure

The information we need to keep for each of our blog’s comments is the standard information you would expect from any blog comment:

  • Name
  • Email
  • Website
  • Comment
  • Date

However, in Cassandra we aren’t going to use the flat table you might see in an RDBMS, where the comment row contains everything in the bullet list above plus a reference to the post identity, all stored under a comment identity. In a column-based database like Cassandra we would use a structure that looks like this:

key: “first-blog-post”

    super column name: 2010-6-3 12:43:00 AM (in Time UUID)
        name: “Nick Berardi”
        email: “nick@coderjournal.com”
        website: “www.coderjournal.com”
        comment: “Wow fluent cassandra is really neeto…”

    super column name: 2010-6-3 3:12:33 PM (in Time UUID)
        name: “Joe User”
        email: “joe@gmail.com”
        website: “”
        comment: “I agree with you Nick!”

The first thing you might notice is that the key for our comments family is the same as the key for our posts family. This ties the two column families together, so a post’s comments can be looked up with the post’s own key. The next thing you may notice is that the super column name isn’t a string; it is a Time UUID, or for you .NET people a System.Guid that stores the date and time, which is why there is no separate date column. The last thing is the property columns that hold the metadata we want to store about each comment.
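To make the Time UUID idea concrete, here is a minimal sketch of how a version 1 (time-based) UUID can carry a date inside a System.Guid. It is not FluentCassandra's GuidGenerator; the class name TimeUuidSketch and its two methods are made up for illustration, and the clock sequence and node bytes are simply supplied by the caller rather than derived from the machine.

```csharp
using System;

// Illustrative only: a hypothetical helper, not part of FluentCassandra.
public static class TimeUuidSketch
{
    // Version 1 UUID timestamps count 100-nanosecond ticks from 15 Oct 1582 (UTC).
    private static readonly DateTime GregorianEpoch =
        new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);

    // Packs the date into the three timestamp fields of a version 1 UUID.
    public static Guid FromDateTime(DateTime when, ushort clockSequence, byte[] node)
    {
        long ticks = (when.ToUniversalTime() - GregorianEpoch).Ticks;

        uint timeLow = (uint)(ticks & 0xFFFFFFFF);
        ushort timeMid = (ushort)((ticks >> 32) & 0xFFFF);
        // The top four bits of this field hold the version number (1).
        ushort timeHiAndVersion = (ushort)(((ticks >> 48) & 0x0FFF) | 0x1000);
        // The top two bits of the clock sequence hold the variant (binary 10).
        byte clockHi = (byte)(((clockSequence >> 8) & 0x3F) | 0x80);
        byte clockLow = (byte)(clockSequence & 0xFF);

        return new Guid(timeLow, timeMid, timeHiAndVersion, clockHi, clockLow,
            node[0], node[1], node[2], node[3], node[4], node[5]);
    }

    // Reads the timestamp fields back out and converts them to a UTC DateTime.
    public static DateTime ToDateTime(Guid guid)
    {
        // ToByteArray stores the first three fields in little-endian order.
        byte[] b = guid.ToByteArray();
        long timeLow = b[0] | ((long)b[1] << 8) | ((long)b[2] << 16) | ((long)b[3] << 24);
        long timeMid = b[4] | ((long)b[5] << 8);
        long timeHi = (b[6] | ((long)b[7] << 8)) & 0x0FFF; // drop the version bits

        long ticks = timeLow | (timeMid << 32) | (timeHi << 48);
        return GregorianEpoch.AddTicks(ticks);
    }
}
```

Calling TimeUuidSketch.ToDateTime on a value produced by TimeUuidSketch.FromDateTime gives back the original date. Since the date lives inside the column name, a column family compared with TimeUUIDType keeps the comments in date order.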

Coding The Comments

We are going to pick up where we left off in the last post.  If you want to follow along, open up your previous project from the last post, or use the file located here.

The first thing we need to do, as we did with the posts, is to get the repository for the comments column family.

// get the comments family
var commentsFamily = db.GetColumnFamily<TimeUUIDType, UTF8Type>("Comments");

Then we need to create the record that the comments will be added to, as we did for the tags and post details in the previous post:

dynamic postComments = commentsFamily.CreateRecord(key: "first-blog-post");

And this time let’s attach the postComments to the database ahead of time, so that it tracks the changes as they are made.

// let's attach it to the database before we add the comments
db.Attach(postComments);

Now let’s create 5 comments that are 5 seconds apart from each other, to give us some data to play with, and then save the changes to the database.

// add 5 comments
for (int i = 0; i < 5; i++)
{
    dynamic comment = postComments.CreateSuperColumn();
    comment.Name = i + " Nick Berardi";
    comment.Email = i + " nick@coderjournal.com";
    comment.Website = i + " www.coderjournal.com";
    comment.Comment = i + " Wow fluent cassandra is really great and easy to use.";

    postComments[GuidGenerator.GenerateTimeBasedGuid()] = comment;

    Console.WriteLine("Comment " + i + " Done");
    Thread.Sleep(TimeSpan.FromSeconds(5));
}

// save the comments
db.SaveChanges();

Now that we have 5 comments in the database stored for our blog post, we should probably query them out:

DateTime lastDate = DateTime.Now;

for (int page = 0; page < 2; page++)
{

Since comments are sometimes paged, we are going to query two pages of comments separately from the database for our blog post.  Our comments are stored by date, so we need to pull them out of the database by date.  This is done by starting at the current date and querying backwards.

// let's back the date off by a millisecond so we don't get paging overlaps
lastDate = lastDate.AddMilliseconds(-1D);

Console.WriteLine("Showing page " + page + " starting at " + lastDate.ToLocalTime());

var comments = commentsFamily.Get("first-blog-post")
    .Reverse()
    .Fetch(lastDate)
    .Take(3)
    .FirstOrDefault();

The above is a little more complex than our last query, but the descriptive fluent interface makes the basic premise easy to follow. Since we are querying by date, it is easiest to pull the comments out newest first, in LIFO (last-in-first-out) order. To do this we use a method called Reverse, which does exactly what it sounds like: it reverses the column order. Then we Fetch columns starting at our lastDate and Take 3 columns for our page. To finish it off, since we are only querying one key, we use the LINQ method FirstOrDefault to return the queried record back to us.

If the above query were SQL it would look something like this:

SELECT TOP(3) *
FROM comments
WHERE commented_on <= getdate()
ORDER BY commented_on DESC

Now that we have our comments, let’s display them as we did for the post in the previous article.

foreach (dynamic comment in comments)
{
    var dateTime = GuidGenerator.GetDateTime((Guid)comment.ColumnName);

    Console.WriteLine(String.Format("{0:T} : {1} ({2} - {3})",
        dateTime.ToLocalTime(),
        comment.Name,
        comment.Email,
        comment.Website
    ));

    lastDate = dateTime;
}

// close the paging loop opened by the for statement above
}

Nothing really mind-blowing is happening here: we use the column name (our Time UUID) to extract the date, and then we display the properties of the comment. There is a subtle part at the bottom of the foreach loop, where we assign the comment’s date to lastDate. This keeps track of the last date we pulled out of the database, so we can query from that date when we fetch the second page of comments. You may or may not have noticed this code in the listing above:

// let's back the date off by a millisecond so we don't get paging overlaps
lastDate = lastDate.AddMilliseconds(-1D);

It is there so we don’t pull back the same comment twice.
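The paging idea is independent of Cassandra, so here is a small sketch of it against an in-memory list of dates, using plain LINQ instead of the FluentCassandra query. The names PagingSketch, FetchPage and ReadAllPages are hypothetical. Unlike the walkthrough, which backs the date off before every query, this version backs off after each page, which amounts to the same thing from the second page on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: models "newest first, N at a time" over in-memory data.
public static class PagingSketch
{
    // Returns up to pageSize dates at or before 'start', newest first.
    public static List<DateTime> FetchPage(List<DateTime> stored, DateTime start, int pageSize)
    {
        return stored
            .Where(t => t <= start)
            .OrderByDescending(t => t)
            .Take(pageSize)
            .ToList();
    }

    // Walks backwards through the data one page at a time until nothing is left.
    public static List<List<DateTime>> ReadAllPages(List<DateTime> stored, DateTime start, int pageSize)
    {
        var pages = new List<List<DateTime>>();
        DateTime cursor = start;

        while (true)
        {
            List<DateTime> page = FetchPage(stored, cursor, pageSize);
            if (page.Count == 0)
                break;

            pages.Add(page);

            // Step just below the oldest item shown, so the next page
            // does not return it again.
            cursor = page[page.Count - 1].AddMilliseconds(-1);
        }

        return pages;
    }
}
```

With five dates five seconds apart and a page size of 3, ReadAllPages yields one page of three followed by one page of two. Without the one-millisecond step, the oldest item of the first page would show up again at the top of the second.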

Fun Part

The fun part for me is hitting the run button and waiting to see if everything is working as I intended.  If everything is working as expected this is what the output will look like for our new comments section.

Comment 0 Done
Comment 1 Done
Comment 2 Done
Comment 3 Done
Comment 4 Done
Showing page 0 starting at 6/6/2010 9:13:22 AM
9:13:17 AM : 4 Nick Berardi (4 nick@coderjournal.com - 4 www.coderjournal.com)
9:13:12 AM : 3 Nick Berardi (3 nick@coderjournal.com - 3 www.coderjournal.com)
9:13:07 AM : 2 Nick Berardi (2 nick@coderjournal.com - 2 www.coderjournal.com)
Showing page 1 starting at 6/6/2010 9:13:07 AM
9:13:02 AM : 1 Nick Berardi (1 nick@coderjournal.com - 1 www.coderjournal.com)
9:12:57 AM : 0 Nick Berardi (0 nick@coderjournal.com - 0 www.coderjournal.com)

We added our 5 comments and then pulled back 2 pages of up to 3 comments each.

Pretty neat huh?

02 Jun 2010

Your First Fluent Cassandra Application


As you are probably aware by now, if you follow my Twitter status or have looked into some of my recent posts, I am developing a library called FluentCassandra, which is a .NET library for using the Cassandra database in a .NETty way. The project has progressed quite nicely in the last couple of months, and I am finally ready to start talking about it and giving examples of how it can be used in your applications. So let’s get started…

Step 1)

The first thing we need to do is make sure that your machine is properly set up to run Cassandra. Back in March I put together a jump start for Windows developers to do just that. So if you don’t have it running on your machine already, start there.

Step 2)

The next thing we need to do is locate and configure the database’s storage-conf.xml file, which was referenced in the previous step’s instructions.

  1. Open the storage-conf.xml in your favorite text editor.
  2. Add the following to the <Keyspaces /> tag in the file:
    <Keyspace Name="Blog">
        <ColumnFamily Name="Posts"
            ColumnType="Super"
            CompareWith="UTF8Type"
            CompareSubcolumnsWith="UTF8Type" />
    
        <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
        <ReplicationFactor>1</ReplicationFactor>
        <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
  3. Save it.

The above configuration creates one Column Family (or table in RDBMS speak) called Posts in a Keyspace (or database in RDBMS speak) called Blog.  We are going to use this column family in our code below.
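If the table comparison feels loose, another way to picture a super column family is as a map of maps of maps: row key, then super column name, then column name, then value. The sketch below models that shape with ordinary .NET collections, filled with a few values from the post record built in Step 4. It is only a mental model under that assumption; DataModelSketch and BuildPosts are hypothetical names, and nothing here talks to Cassandra.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: an in-memory picture of the "Posts" super column family.
public static class DataModelSketch
{
    // row key -> super column name -> column name -> value
    public static Dictionary<string, SortedDictionary<string, SortedDictionary<string, string>>> BuildPosts()
    {
        // Columns inside one super column, kept sorted by column name.
        var details = new SortedDictionary<string, string>(StringComparer.Ordinal)
        {
            { "Author", "Nick Berardi" },
            { "Title", "My First Cassandra Post" }
        };

        var tags = new SortedDictionary<string, string>(StringComparer.Ordinal)
        {
            { "0", "Cassandra" },
            { "1", ".NET" }
        };

        // Super columns inside one row, kept sorted by super column name.
        var row = new SortedDictionary<string, SortedDictionary<string, string>>(StringComparer.Ordinal)
        {
            { "Details", details },
            { "Tags", tags }
        };

        // Rows inside the column family, looked up by key.
        return new Dictionary<string, SortedDictionary<string, SortedDictionary<string, string>>>
        {
            { "first-blog-post", row }
        };
    }
}
```

Reading a value is then three lookups in a row, for example BuildPosts()["first-blog-post"]["Details"]["Title"]. The two sorted levels stand in for the orderings declared by CompareWith and CompareSubcolumnsWith.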

Step 3)

Next grab a copy of FluentCassandra from http://github.com/managedfusion/fluentcassandra


Create your own console app, or use the FluentCassandra.Sandbox console app provided in the downloaded source.

Step 4)

Now for the fun part, the coding. 

The first thing we need to do is create a context for the database entities that we are going to save.  This is done with the CassandraContext.

using (var db = new CassandraContext(keyspace: "Blog", host: "localhost"))
{

The above code creates a Cassandra Context for the Blog Keyspace on our local Cassandra database. Next we want a reference to the column family that we are going to execute our saves against. This is done by getting the column family with the CompareWith and CompareSubcolumnsWith types we specified in the storage-conf.xml above.

var family = db.GetColumnFamily<UTF8Type, UTF8Type>("Posts");

In the above code the first generic parameter is the CompareWith parameter and the second generic parameter is the CompareSubcolumnsWith parameter.  This creates the family repository that can be used to execute CRUD commands against this column family. 

Now that we have all this set up, let’s actually create a post record with a key called “first-blog-post”.

// create post
dynamic post = family.CreateRecord(key: "first-blog-post");

The easiest way to accomplish this is to use the method provided in the family object for creating the properly typed record for use. This object will be used in a little while but first we need to create two super columns with the details of our blog post and the tags associated with the blog post.  This is done by using the CreateSuperColumn method on the post object we just created.

// create post details
dynamic postDetails = post.CreateSuperColumn();
postDetails.Title = "My First Cassandra Post";
postDetails.Body = "Blah. Blah. Blah. about my first post on how great Cassandra is to work with.";
postDetails.Author = "Nick Berardi";
postDetails.PostedOn = DateTimeOffset.Now;

// create post tags
dynamic tags = post.CreateSuperColumn();
tags[0] = "Cassandra";
tags[1] = ".NET";
tags[2] = "Database";
tags[3] = "NoSQL";

This creates two super column objects, postDetails and tags, that each contain their own set of columns. The post details hold the post’s title, content body, author, and the date it was posted. The tags hold an array where each item is a new column. We will talk about why this works in a future post, but accept for now that it does work, even though one is used as an object with a bunch of properties and the other as an array with a bunch of elements.
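Without getting ahead of that future post, the general .NET mechanism that allows both styles on one object can be sketched with System.Dynamic.DynamicObject. This is not FluentCassandra's code; ColumnBag is a hypothetical stand-in that turns both property names and indexes into column names. How the library really does it is an assumption left open here.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Globalization;

// Illustrative only: a hypothetical bag of columns usable through 'dynamic'.
public class ColumnBag : DynamicObject
{
    private readonly SortedDictionary<string, object> columns =
        new SortedDictionary<string, object>(StringComparer.Ordinal);

    public IDictionary<string, object> Columns
    {
        get { return columns; }
    }

    // bag.Title = "..." stores a column named "Title".
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        columns[binder.Name] = value;
        return true;
    }

    // bag.Title reads the column named "Title".
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return columns.TryGetValue(binder.Name, out result);
    }

    // bag[0] = "..." stores a column whose name is the index, here "0".
    public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
    {
        columns[Convert.ToString(indexes[0], CultureInfo.InvariantCulture)] = value;
        return true;
    }

    // bag[0] reads the column named "0".
    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        return columns.TryGetValue(Convert.ToString(indexes[0], CultureInfo.InvariantCulture), out result);
    }
}
```

With dynamic bag = new ColumnBag(), both bag.Title = "My First Cassandra Post" and bag[0] = "Cassandra" end up as entries in the same Columns dictionary. That matches the shape of the tag output later in this post, where each tag is printed as an index and a value.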

Let’s now add the details and tags to the post record that we created above.

// add properties to post
post.Details = postDetails;
post.Tags = tags;

Just like the details above, we are going to treat the post record as an object with properties. This completes the record that we want to save to the database. Now let’s attach it and save our record to the database.

// attach the post to the database
Console.WriteLine("attaching record");
db.Attach(post);

// save the changes
Console.WriteLine("saving changes");
db.SaveChanges();

So we have now done our first Cassandra database insert. But that is only half the fun; let’s read it back out of the database. As with the write, we are going to use the same family object to do the read. The first thing we need to do is get the record out of the database using the same key, “first-blog-post”.

// get the post back from the database
Console.WriteLine("getting 'first-blog-post'");
dynamic getPost = family.Get("first-blog-post").FirstOrDefault();

The above code uses a LINQ-like syntax to retrieve the record. The query is started with the Get method on the family object, and it can then be executed with any LINQ operation; in our case we are using the FirstOrDefault method. The next thing we want to see is the details of the post, which can be easily retrieved using the same object structure that we used to put them in the database.

// show details
dynamic getPostDetails = getPost.Details;
Console.WriteLine(
    String.Format("=={0} by {1}==\n{2}", 
        getPostDetails.Title, 
        getPostDetails.Author, 
        getPostDetails.Body
    ));

And now for the tags, which we are going to query in a way more suitable for an array.

// show tags
Console.Write("tags:");
foreach (var tag in getPost.Tags)
    Console.Write(String.Format("{0}:{1},", tag.Name, tag.Value));

Finish it off with this code, and we will be ready to run our first Cassandra application.

}

Console.Read();

Step 5)

The first thing we need to do to run our application is to make sure the database is running. This may sound like a no-duh moment, but if you are used to SQL Server development you rarely have to check that the database is running, so I like to mention it. If you don’t remember how to do this, go back to Step 1 and look at the instructions for starting the database.

Now let’s run the application and see the results. If everything ran correctly you will receive the following output.

attaching record
saving changes
getting 'first-blog-post'
==My First Cassandra Post by Nick Berardi==
Blah. Blah. Blah. about my first post on how great Cassandra is to work with.
tags:0:Cassandra,1:.NET,2:Database,3:NoSQL,

Step 6)

As a follow-up exercise, see if you can add comments. Hint: you will need a new super column family as defined here:

<ColumnFamily Name="Comments"
    ColumnType="Super"
    CompareWith="TimeUUIDType"
    CompareSubcolumnsWith="UTF8Type" />

Hope this was an interesting exercise, and if you see any way to improve the interface or want to help out on the project please start by going to http://github.com/managedfusion/fluentcassandra.

Don’t forget to check out part 2 of this series.

30 Mar 2010

Cassandra Jump Start For The Windows Developer


Recently I have been exploring the NoSQL options for .NET and specifically a database called Cassandra.  In case you haven’t heard of Cassandra before, it is a decentralized, fault-tolerant, elastic database designed by Facebook for high availability.  As Wikipedia describes it:

Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project, as of February 17, 2010, designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.

I bet you have used data that has been served by Cassandra and not even realized it, here are some prominent users of Cassandra:

  • Facebook
  • Digg
  • Twitter
  • Reddit

Sounds interesting, or at least worth a look, right? Well, I thought so. However, during my journey of getting the database set up I came to realize there is almost no documentation on installation for Linux, and even less for Windows. So I am going to provide you with a jump start to installing Cassandra on your machine. I am doing this so you don’t have to spend days jumping around the web, going down false paths, and pulling your hair out like I did, all so you can get on to what you really care about … development.

First Things First

The first thing you need to understand about Cassandra is that it is developed in Java, so you can run it on any machine that supports Java 6 or better. Before you go any further, make sure your Java JRE is updated to the latest version.

The next thing you need is a copy of Cassandra, which can be found here. My setup is going to be based off of the latest stable release.

Running From Windows

As I said before, you can run Cassandra on any operating system that Java has a runtime for. The first and probably most obvious choice for a Windows developer is running Cassandra on Windows. To install Cassandra on Windows, just follow these steps:

  1. Extract Cassandra to a directory of your choice (I used c:\development\cassandra)
  2. Set the following environment variables
    1. JAVA_HOME (To the directory where you install the JRE, this should not be the bin directory)
    2. CASSANDRA_HOME (To the directory you extracted the files to in step 1)
  3. Modify your Cassandra config file as you like and don’t forget to update the directory locations from a UNIX like path to something on your windows directory (in my example the config file is located at c:\development\cassandra\conf\storage-conf.xml)
  4. Open cmd and run the cassandra.bat file (in my example the batch file is located at c:\development\cassandra\bin\cassandra.bat)
    cd c:\development\cassandra\bin\
    .\cassandra.bat
  5. You can verify that Cassandra is running, by trying to connect to the server.  To do this open a new cmd and run the cassandra-cli.bat file from the bin directory.

    cd c:\development\cassandra\bin\
    .\cassandra-cli.bat
    connect localhost/9160

This is easy to get running, but there is some manual process that you have to go through each time to get the server running. In the future when you want to start up the Cassandra database for development, just repeat Step 4.
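If you would rather check from .NET than from cassandra-cli, a plain TCP connection attempt against the port used in step 5 tells you whether something is listening. The sketch below assumes the localhost/9160 address from that step; PortCheck and IsListening are hypothetical names, and a successful connect only shows that the port is open, not that the service behind it is healthy.

```csharp
using System;
using System.Net.Sockets;

// Illustrative only: checks whether a TCP port accepts connections.
public static class PortCheck
{
    public static bool IsListening(string host, int port, int timeoutMs)
    {
        try
        {
            using (var client = new TcpClient())
            {
                // Start the connection attempt and wait up to timeoutMs for it.
                IAsyncResult pending = client.BeginConnect(host, port, null, null);
                if (!pending.AsyncWaitHandle.WaitOne(timeoutMs))
                    return false; // nothing answered in time

                // Throws SocketException if the connection was refused.
                client.EndConnect(pending);
                return true;
            }
        }
        catch (SocketException)
        {
            return false;
        }
    }
}
```

For the setup above the call would be PortCheck.IsListening("localhost", 9160, 2000), run after cassandra.bat has had a moment to start.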

Running From A Linux Virtual Machine

The other way to run Cassandra is through a virtual machine running Linux. This setup is just as easy as the Windows setup, as long as you have some experience with Linux. I am going to start the install steps assuming that you have already installed Ubuntu Server. This way the process is the same whether you run the server on a physical or a virtual machine.

  1. Make sure you update.

    sudo apt-get update
    sudo apt-get upgrade
  2. Open up your apt-get sources list with nano for editing.

    sudo nano /etc/apt/sources.list
  3. Add the following Apache Cassandra sources to the list.
    deb http://www.apache.org/dist/cassandra/debian unstable main
    deb-src http://www.apache.org/dist/cassandra/debian unstable main

    After you add these two lines, press CTRL+X to close Nano. It’ll ask “Save modified buffer?” Press Y. Press Enter when Nano asks “File Name to Write.”

  4. Update your apt-get sources to get the latest information about Cassandra in to your sources repository.

    sudo apt-get update
  5. At this point you are going to get an error, don’t freak out, this is totally expected.  It will look something like this:

    W: GPG error: http://www.apache.org unstable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F758CE318D77295D
  6. Use the following three commands to import the signature in to your repository.

    gpg --recv-keys F758CE318D77295D
    sudo apt-key add ~/.gnupg/pubring.gpg
    sudo apt-get update

    NOTE: You must replace the key value ‘F758CE318D77295D’ with the key value you received in your error message.

  7. Now is the big moment to install Cassandra.

    sudo apt-get install cassandra
  8. Now just restart your machine, to do it from the command line run the following:

    sudo reboot

Unlike on Windows, starting this Cassandra database for development only requires booting the physical or virtual machine; it should then be ready to use.

The Thrifty Way To Connect

Thrift is a protocol also developed by Facebook to create agnostic interfaces for all the common languages based off a simple configuration file.  The Thrift team describes their mission as:

Thrift is a software project spanning a variety of programming languages and use cases. Our goal is to make reliable, performant communication and data serialization across languages as efficient and seamless as possible.

As I found out there wasn’t any good way to build the thrift executable in Windows, which is another reason the Linux VM came in handy.  The thrift executable is needed to generate the protocol interface for Cassandra.  You can read how to build the thrift executable here for your operating system.

After you have the executable you need to run it against the Cassandra interface definition.  I am not going to go much into depth with this, because this part of the process is pretty well documented on the internet, but here is the command I ran to create the generated Cassandra interface for C#.

thrift -gen csharp cassandra.thrift 

I am not going to leave you high and dry, I have already generated the Cassandra Thrift interfaces for the following languages for you:

Note: these were generated based on Cassandra 0.5.1, so they may not, and probably won’t, work with future releases of Cassandra.

You can find the necessary supporting libraries for each of the above languages by going to http://svn.apache.org/viewvc/incubator/thrift/trunk/lib/.

Conclusion

If you are like me and interested in using it with a .NET application here is a quick demonstration class that will help you get started. 

https://code.google.com/p/coderjournal/source/browse/trunk/Posts/2010/03/CassandraDemo.cs

But no matter the language you use, I hope this has provided you a good jump start with Cassandra.