20 Sep 2012

Sometimes a nanosecond makes all the difference

In the Cassandra database there is a type known as TimeUUID, which I have talked about a couple of times on my blog and for which I even created a pretty well-received generator class. This type is typically used for log data, because it gives you a unique value in the database that has an extractable timestamp. Because of how .NET produces DateTime.Now, you rarely get a resolution finer than a millisecond, even though DateTime supports the notion of a tick, which is 100 nanoseconds or 1/10,000 of a millisecond. So there is plenty of room for more resolution, and leaving that room unused causes pain when you need timestamps finer than a millisecond. Many high-performance logging situations demand exactly that, so that log entries stay in order even when more than one arrives within the same millisecond.
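To make the collision concrete, here is a minimal sketch that truncates timestamps to millisecond resolution. The class and method names are made up for illustration and are not part of FluentCassandra. Two events that are 3 ticks (300 ns) apart become indistinguishable once everything below a millisecond is dropped, even though their tick counts differ.

```csharp
using System;

public static class ResolutionDemo
{
    // Drops everything below one millisecond, mimicking a clock that only
    // advances in millisecond steps.
    public static DateTime TruncateToMillisecond(DateTime value)
    {
        long remainder = value.Ticks % TimeSpan.TicksPerMillisecond;
        return value.AddTicks(-remainder);
    }

    // True when two timestamps fall into the same millisecond bucket.
    public static bool CollideAtMillisecond(DateTime a, DateTime b)
    {
        return TruncateToMillisecond(a) == TruncateToMillisecond(b);
    }
}
```

Calling ResolutionDemo.CollideAtMillisecond(first, first.AddTicks(3)) for a timestamp near the start of a millisecond returns true, which is exactly the situation that produces duplicate time-based identifiers.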

So you may be asking yourself what this has to do with Cassandra and TimeUUID. Well, it was brought to my attention today that, because of the limitations of DateTime, duplicate TimeUUIDs were being created in FluentCassandra, which is obviously not something you want to see for a value that claims to be unique. So today I set out to give the DateTime type more resolution, so that FluentCassandra could create unique TimeUUIDs using tick resolution instead of millisecond resolution.

And I call my creation, aptly, DateTimePrecise.

using System;
using System.Diagnostics;

public class DateTimePrecise
{
	private const long TicksInOneSecond = 10000000L;

	private readonly double _syncSeconds;
	private readonly Stopwatch _stopwatch;
	private DateTimeOffset _baseTime;

	public DateTimePrecise(int syncSeconds)
	{
		_syncSeconds = syncSeconds;
		_stopwatch = new Stopwatch();

		Synchronize();
	}

	// Re-anchor the base time to the system clock and restart the stopwatch.
	private void Synchronize()
	{
		lock (_stopwatch) {
			_baseTime = DateTimeOffset.UtcNow;
			_stopwatch.Restart();
		}
	}

	public DateTimeOffset GetUtcNow()
	{
		// Hold the same (reentrant) lock while reading, so a concurrent
		// resync cannot swap _baseTime between the two reads below.
		lock (_stopwatch) {
			var elapsedSeconds = _stopwatch.ElapsedTicks / (double)Stopwatch.Frequency;

			if (elapsedSeconds > _syncSeconds) {
				Synchronize();
				return _baseTime;
			}

			// Convert elapsed seconds to 100 ns ticks and add them to the base time.
			var elapsedTicks = Convert.ToInt64(elapsedSeconds * TicksInOneSecond);
			return _baseTime.AddTicks(elapsedTicks);
		}
	}
}

DateTimePrecise uses the Stopwatch class, which is backed by the system's high-resolution performance counter when one is available (check Stopwatch.IsHighResolution), so it measures elapsed time at a much finer grain than DateTime.Now. That fine-grained measurement is exactly what we need for our precise times. To create the precise time, we record a base time when the Stopwatch is started, and when the time is requested we add the Stopwatch's elapsed time to that base time. This process is synchronized (in other words, reset) at defined intervals, so that the precise time doesn’t drift too far from the system time.

You can find the source code for the above DateTimePrecise class here, which also includes some static helper methods so that you can use the class without having to instantiate it, the same way you do with DateTime.Now. Special thanks go to James Michael Hare, whose blog post inspired this class, specifically the part where he shows the radical difference between the precision of TimeSpan and the actual resolution of the Stopwatch:

For those interested, you can convert from Stopwatch ticks to seconds by using the Stopwatch.Frequency static property, which tells you the ratio of Stopwatch ticks per second. Thus these two are (roughly, due to precision differences) the same:

// take the ElapsedTicks, then divide by Frequency to get seconds
Console.WriteLine("ElapsedTicks to sec:  {0}", sw.ElapsedTicks / (double)Stopwatch.Frequency);

// take the Elapsed property, and query total number of seconds it represents
Console.WriteLine("Elapsed.TotalSeconds: {0}", sw.Elapsed.TotalSeconds);

Which gives us results (on my machine) like:

ElapsedTicks to sec:  4.9998583024032
Elapsed.TotalSeconds: 4.9998583

Wow… look at those numbers. They represent the exact same value, but ElapsedTicks carries an extra 6 digits of resolution.
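The gap has a simple cause: TimeSpan stores whole 100 ns ticks, so Elapsed can never carry more than seven decimal places of a second, while ElapsedTicks is counted in units of 1/Stopwatch.Frequency seconds. How many of the extra digits are meaningful depends on the frequency of the machine. The following sketch shows the conversion from Stopwatch ticks to DateTime ticks; the StopwatchTicks class is a name I made up for illustration, not something from the framework or FluentCassandra.

```csharp
using System;
using System.Diagnostics;

public static class StopwatchTicks
{
    // Converts raw Stopwatch ticks into 100 ns DateTime/TimeSpan ticks.
    // frequency is the number of Stopwatch ticks per second.
    public static long ToDateTimeTicks(long stopwatchTicks, long frequency)
    {
        // Split into whole seconds and a remainder so the multiplication
        // does not overflow for long-running stopwatches.
        long wholeSeconds = stopwatchTicks / frequency;
        long remainder = stopwatchTicks % frequency;

        // Integer division truncates anything below 100 ns.
        return wholeSeconds * TimeSpan.TicksPerSecond
             + remainder * TimeSpan.TicksPerSecond / frequency;
    }

    // Convenience overload that uses the frequency of the current machine.
    public static long ToDateTimeTicks(long stopwatchTicks)
    {
        return ToDateTimeTicks(stopwatchTicks, Stopwatch.Frequency);
    }
}
```

With a frequency of 2,500,000 ticks per second, one Stopwatch tick is 400 ns, so ToDateTimeTicks(1, 2500000) yields 4 DateTime ticks. Feeding sw.ElapsedTicks through the single-argument overload and into TimeSpan.FromTicks gives a value comparable to sw.Elapsed.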

I hope you find DateTimePrecise useful. It will ship with the next release of FluentCassandra on NuGet.

17 Jun 2012

FluentCassandra Primer

Getting Started

To get started you have to understand the basic terminology of the Cassandra database. Unlike relational databases (e.g. SQL Server, MySQL, etc.), Cassandra is what is known as a key/value pair database. The Cassandra data model has 4 main concepts: cluster, keyspace, column family, and column.

  • Cluster (also called a ring) is several servers (or nodes) functioning together to act as a single Cassandra database instance. A cluster contains at least one node, and each cluster can contain many keyspaces.
  • Keyspace can contain many column families.
  • Column Family contains multiple columns referenced by record keys.
  • Column contains a name, a value, and a timestamp. The column name can be a static label (such as “name” or “email”) or it can be drawn from a wide range of values (e.g. the date of a log entry). The actual columns that make up a row can be determined by the client application or predefined in a more traditional manner using CQL.

To better understand what all this means, let's map the naming from relational databases to Cassandra.

Relational    Cassandra
----------    -------------
Database      Keyspace
Table         Column Family
Row           Record
Column        Column

There are more differences, but we will save those for a different time. The above is meant to give you a general working knowledge of the naming used in Cassandra and, by proxy, in FluentCassandra.
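If it helps to see the nesting as code, here is a rough in-memory model of the terminology. This is only a teaching sketch: the types ColumnModel, ColumnFamilyModel, and KeyspaceModel are invented for this example and have nothing to do with how Cassandra or FluentCassandra actually store data.

```csharp
using System;
using System.Collections.Generic;

// A column: a name, a value, and the timestamp of the write.
public sealed class ColumnModel
{
    public string Name { get; }
    public string Value { get; }
    public DateTimeOffset Timestamp { get; }

    public ColumnModel(string name, string value, DateTimeOffset timestamp)
    {
        Name = name;
        Value = value;
        Timestamp = timestamp;
    }
}

// A column family: record key -> (column name -> column).
public sealed class ColumnFamilyModel
{
    private readonly Dictionary<string, Dictionary<string, ColumnModel>> _records =
        new Dictionary<string, Dictionary<string, ColumnModel>>();

    // Adds or overwrites one column of one record.
    public void Insert(string recordKey, ColumnModel column)
    {
        if (!_records.TryGetValue(recordKey, out var columns)) {
            columns = new Dictionary<string, ColumnModel>();
            _records[recordKey] = columns;
        }
        columns[column.Name] = column;
    }

    // Column names stored for a record; empty if the record is unknown.
    public List<string> ColumnNames(string recordKey)
    {
        return _records.TryGetValue(recordKey, out var columns)
            ? new List<string>(columns.Keys)
            : new List<string>();
    }
}

// A keyspace: column family name -> column family.
public sealed class KeyspaceModel
{
    private readonly Dictionary<string, ColumnFamilyModel> _families =
        new Dictionary<string, ColumnFamilyModel>();

    // Returns the named column family, creating it on first use.
    public ColumnFamilyModel GetOrAddFamily(string name)
    {
        if (!_families.TryGetValue(name, out var family)) {
            family = new ColumnFamilyModel();
            _families[name] = family;
        }
        return family;
    }
}
```

The point to notice is that two records in the same column family do not have to share the same set of columns, which is the biggest departure from a relational table.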

First Query

using (var db = new CassandraContext(keyspace: "Testing", host: "localhost")) {
    var usersFamily = db.GetColumnFamily("Users");

    var adminUsers =
        from u in usersFamily.AsObjectQueryable<User>()
        where u.IsAdmin == true
        select u;

    Console.WriteLine("The following users are admins:");
    foreach(var u in adminUsers) {
        Console.WriteLine(u.UserName);
    }
}

One of the main goals of FluentCassandra was to produce easily readable code. Hopefully the above is somewhat easy to follow. But to be safe, let's step through the query line by line, to give you first-hand working knowledge of what's happening.

Explanation

using (var db = new CassandraContext(keyspace: "Testing", host: "localhost")) {

On the first line we are creating a database context. In this context we are setting the keyspace and host that we want to run the queries against. The database context implements IDisposable, so wrapping it in a using block ensures that all connections to the database are cleaned up when the context isn’t in use anymore.
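If the using pattern is new to you, this small sketch shows what it guarantees. DemoContext is a hypothetical stand-in that only records whether it was disposed; it is not the real CassandraContext.

```csharp
using System;

// Hypothetical stand-in for a database context; it only records disposal.
public sealed class DemoContext : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class UsingDemo
{
    // Returns the context after the using block has ended, so the caller
    // can observe that Dispose was called automatically.
    public static DemoContext OpenAndClose()
    {
        DemoContext captured;
        using (var context = new DemoContext()) {
            captured = context;
            // queries would run here while the context is still open
        }
        return captured;
    }
}
```

The context returned by UsingDemo.OpenAndClose() reports IsDisposed as true, and the same cleanup happens if an exception is thrown inside the block.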

  var usersFamily = db.GetColumnFamily("Users");

Next we are getting the column family reference. This reference is going to be the main way of interacting with column families in FluentCassandra. This object offers many different paths for accessing Cassandra; the one I am going to show you next is via LINQ. We will discuss the other ways in a different article.

  var adminUsers =
      from u in usersFamily.AsObjectQueryable<User>()
      where u.IsAdmin == true
      select u;

The above should be very familiar to anybody who has worked with any form of LINQ in the past. The one thing that you may notice right off the bat is the .AsObjectQueryable<User>() call stacked on to the end of the column family. This method is necessary for querying the column family using an object. Column families by default don’t have objects associated with them, because the columns in the column family can vary in number and type across records.
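The User type itself isn't shown in this post. As a rough guess, it only needs the two properties the query touches, IsAdmin and UserName; everything else in the sketch below is an assumption. To keep the example runnable without a Cassandra server, it applies the same query expression to an in-memory sequence (LINQ to Objects) rather than to a FluentCassandra column family.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the User type; only IsAdmin and UserName appear in the post.
public class User
{
    public string UserName { get; set; }
    public bool IsAdmin { get; set; }
}

public static class AdminQueryDemo
{
    // The same query expression as in the post, run against an in-memory
    // sequence instead of a column family.
    public static List<string> AdminNames(IEnumerable<User> users)
    {
        var adminUsers =
            from u in users
            where u.IsAdmin
            select u;

        return adminUsers.Select(u => u.UserName).ToList();
    }
}
```

Passing a list that contains one admin and one regular user returns only the admin's user name.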

    Console.WriteLine("The following users are admins:");
    foreach(var u in adminUsers) {
        Console.WriteLine(u.UserName);
    }
}

The last part doesn’t need much explaining. It just loops through all of the user objects pulled from the database and writes each user name out to the command line.

More detailed example

To see a more detailed example that sets up the keyspace and column families programmatically, and uses the different methods of querying, inserting, updating and deleting allowed by FluentCassandra, please go to: https://github.com/managedfusion/fluentcassandra/blob/master/test/FluentCassandra.Sandbox/Program.cs

17 Jun 2010

Run Cassandra As A Windows Service

One of the main issues that comes up over and over again for Cassandra is:

How do I run Cassandra as a Windows Service?

In this post I am going to answer that question and in the process demonstrate how to do it in less than 10 minutes.

Background

Cassandra is mainly developed by Linux developers, so very little attention has been paid to the Windows developer or administrator.  As Windows developers we have to jump through a few more hoops than just clicking on an install.exe file and letting it do all the work.  Luckily for us, those hoops are quick and easy to get through.

Step 1

If you haven’t done so already, please read my jump start for Windows users on installing Cassandra. That guide will get you ready for the next steps.

Step 2

The second step is also an easy one: you need to download a package called RunAsService, which provides the ability to run any program as a Windows Service.

After you have downloaded the file extract the contents to a directory of your choosing.  (I extracted it to c:\RunAsService)

Note: RunAsService was originally developed here, however I recompiled it to run on .NET 2.0.

Step 3

To install RunAsService, open up a command prompt with Administrative privileges and run these commands.

cd c:\RunAsService
install networkservice

This registers RunAsService as a Windows Service.  Make sure to keep your command prompt open, because you will need it for Step 5.

Step 4

To configure RunAsService for Cassandra, open up the RunAsService.exe.config file in your favorite text editor and replace the <service.settings> section so that it looks like this:

<!-- Services configuration -->
<service.settings>
    <!-- Run Cassandra as a service -->
    <!-- My Cassandra install path is C:\apache-cassandra\ -->
    <service>
        <name>Cassandra Database</name>
        <executable>C:\apache-cassandra\bin\cassandra.bat</executable>
        <parameters></parameters>
    </service>
</service.settings>

After you have finished, save the config file and exit your text editor.

Note: My Cassandra install is in c:\apache-cassandra\. If you installed it somewhere else, you will have to adjust the paths in the config above.

Step 5

The final step of this process is to start the RunAsService service.  You can either do it through the Services control panel or just type the following into your command prompt.

net start runasservice

You should see a response in the command line saying that the service has been successfully started.  To verify that Cassandra has been started you can use the cassandra-cli.bat file:

cd c:\apache-cassandra\bin\
cassandra-cli.bat
connect localhost/9160

It should report that it is connected to the server if the service is running.  And with that we are done. I told you it would only take about 10 minutes.
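If you would rather script that check than type it into the CLI, a plain TCP probe against the same port works as a rough substitute. This is a sketch under assumptions: PortCheck is a helper I made up for this post, and it only tells you that something is accepting connections on the port (9160 in the connect command above), not that the listener is actually Cassandra.

```csharp
using System;
using System.Net.Sockets;

public static class PortCheck
{
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    public static bool IsListening(string host, int port, int timeoutMilliseconds)
    {
        try {
            using (var client = new TcpClient()) {
                var connect = client.ConnectAsync(host, port);

                // Wait returns false on timeout and throws if the connect failed.
                return connect.Wait(timeoutMilliseconds) && client.Connected;
            }
        }
        catch (AggregateException) {
            // Wait wraps the underlying SocketException (e.g. connection refused).
            return false;
        }
        catch (SocketException) {
            return false;
        }
    }
}
```

For the setup in this post the call would be PortCheck.IsListening("localhost", 9160, 2000). A result of false right after starting the service may simply mean Cassandra is still starting up, so it is worth retrying after a few seconds.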