Archive for March, 2010

30 Mar 2010

Cassandra Jump Start For The Windows Developer


Recently I have been exploring the NoSQL options for .NET and specifically a database called Cassandra.  In case you haven’t heard of Cassandra before, it is a decentralized, fault-tolerant, elastic database designed by Facebook for high availability.  As Wikipedia describes it:

Cassandra is an open source distributed database management system. It is an Apache Software Foundation top-level project, as of February 17, 2010, designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powers their Inbox Search feature. Jeff Hammerbacher, who led the Facebook Data team at the time, has described Cassandra as a BigTable data model running on an Amazon Dynamo-like infrastructure.

I bet you have used data that was served by Cassandra and not even realized it. Here are some prominent users of Cassandra:

  • Facebook
  • Digg
  • Twitter
  • Reddit

Sounds interesting, or at least worth a look, right? Well, I thought so. However, during my journey of getting the database set up I came to realize there is almost no documentation on installation for Linux, and even less for Windows. So I am going to provide you with a jump start to installing Cassandra on your machine. I am doing this so you don’t have to spend days jumping around the web, going down false paths, and pulling your hair out like I did, all so you can get on to what you really care about … development.

First Things First

The first thing you need to understand about Cassandra is that it is developed in Java, so you can run it on any machine that supports Java 6 or better. Before you go any further, make sure your Java JRE is updated to the latest version.

The next thing you need is a copy of Cassandra, which can be found here. My setup is going to be based on the latest stable release.

Running From Windows

As I said before, you can run Cassandra on any operating system that has a Java runtime. So the first, and probably most obvious, option for a Windows developer is running Cassandra on Windows. To install Cassandra on Windows just follow these steps:

  1. Extract Cassandra to a directory of your choice (I used c:\development\cassandra)
  2. Set the following environment variables
    1. JAVA_HOME (the directory where you installed the JRE; this should not be the bin directory)
    2. CASSANDRA_HOME (the directory you extracted the files to in step 1)
  3. Modify your Cassandra config file as you like, and don’t forget to change the directory locations from UNIX-style paths to directories that exist on your Windows machine (in my example the config file is located at c:\development\cassandra\conf\storage-conf.xml)
  4. Open cmd and run the cassandra.bat file (in my example the batch file is located at c:\development\cassandra\bin\cassandra.bat)
    cd c:\development\cassandra\bin\
    .\cassandra.bat
  5. You can verify that Cassandra is running by trying to connect to the server. To do this, open a new cmd and run the cassandra-cli.bat file from the bin directory.

    cd c:\development\cassandra\bin\
    .\cassandra-cli.bat
    connect localhost/9160

This is easy to get running, but there is a manual step that you have to go through each time you want the server running. In the future, when you want to start up the Cassandra database for development, just repeat Step 4.
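If you would rather script the check from step 5 than type it into cassandra-cli, here is a minimal sketch in Python. It only confirms that something is accepting TCP connections on the port used in the cassandra-cli example (9160); it does not speak Thrift or prove that the listener is actually Cassandra. The function name is_port_open is my own, not part of any Cassandra tooling.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9160 is the port used in the cassandra-cli example above.
    reachable = is_port_open("localhost", 9160)
    status = "accepting connections" if reachable else "not reachable"
    print(f"localhost:9160 is {status}")
```

A check like this is handy at the top of a test script, so you get a clear message when you forgot to repeat Step 4.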

Running From A Linux Virtual Machine

The other way to run Cassandra is through a virtual machine running Linux. This setup is just as easy as the Windows setup, as long as you have some experience with Linux. I am going to start the install steps assuming that you have already installed Ubuntu Server. This way the process is the same whether you run the server on a physical or a virtual machine.

  1. Make sure you update.

    sudo apt-get update
    sudo apt-get upgrade
  2. Open up your apt-get sources list with nano for editing.

    sudo nano /etc/apt/sources.list
  3. Add the following Apache Cassandra sources to the list.
    deb http://www.apache.org/dist/cassandra/debian unstable main
    deb-src http://www.apache.org/dist/cassandra/debian unstable main

    After you add these two lines, press CTRL+X to close Nano. It’ll ask “Save modified buffer?” Press Y. Press Enter when Nano asks “File Name to Write.”

  4. Update your apt-get sources so that the latest information about the Cassandra packages is pulled into your package lists.

    sudo apt-get update
  5. At this point you are going to get an error. Don’t freak out, this is totally expected. It will look something like this:

    W: GPG error: http://www.apache.org unstable Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY F758CE318D77295D
  6. Use the following three commands to import the signing key and refresh your sources.

    gpg --recv-keys F758CE318D77295D
    sudo apt-key add ~/.gnupg/pubring.gpg
    sudo apt-get update

    NOTE: You must replace the key value ‘F758CE318D77295D’ with the key value you received in your error message.

  7. Now is the big moment to install Cassandra.

    sudo apt-get install cassandra
  8. Now just restart your machine. To do it from the command line, run the following:

    sudo reboot

Unlike on Windows, there is no manual step to start the Cassandra database here. You just need to boot up the physical or virtual machine and it should be ready for development.

The Thrifty Way To Connect

Thrift is a framework, also developed by Facebook, for generating language-agnostic interfaces for all the common languages from a simple interface definition file. The Thrift team describes their mission as:

Thrift is a software project spanning a variety of programming languages and use cases. Our goal is to make reliable, performant communication and data serialization across languages as efficient and seamless as possible.

As I found out, there wasn’t any good way to build the thrift executable on Windows, which is another reason the Linux VM came in handy. The thrift executable is needed to generate the protocol interface for Cassandra. You can read how to build the thrift executable for your operating system here.

After you have the executable, you need to run it against the Cassandra interface definition. I am not going to go into much depth on this, because this part of the process is pretty well documented on the internet, but here is the command I ran to generate the Cassandra interface for C#.

thrift -gen csharp cassandra.thrift 

I am not going to leave you high and dry. I have already generated the Cassandra Thrift interfaces for the following languages for you:

Note: these were generated based on Cassandra 0.5.1, so they may not, and probably won’t, work with future releases of Cassandra.

You can find the necessary supporting libraries for each of the above languages by going to http://svn.apache.org/viewvc/incubator/thrift/trunk/lib/.

Conclusion

If you are like me and interested in using Cassandra with a .NET application, here is a quick demonstration class that will help you get started.

https://code.google.com/p/coderjournal/source/browse/trunk/Posts/2010/03/CassandraDemo.cs

But no matter the language you use, I hope this has provided you a good jump start with Cassandra.

21 Mar 2010

Editable MVC Routes (Apache Style)


After writing yesterday’s post about what annoys me regarding the limited insight most web developers have into Routing vs Rewriting, it occurred to me that I might be able to make the differences between the two, and the benefits of each, clearer. What prompted this was remembering a post Phil Haack wrote about Editable MVC Routes.

By taking my company’s already production-ready URL Rewriter, which supports runtime editing of rewriter rules, and adding support for routes, I would essentially be merging Routing and Rewriting into the same configuration and making the routes just as editable as the rewriter rules. My hope is that this illustrates the benefits of having both a Rewriter and a Router in your web arsenal, because you can play with both in real time and start to see for yourself when one is more useful than the other.

I started with the latest release of my company’s URL Rewriter and created a contrib project on GitHub that extends the Apache support to also include System.Web.Routing configuration. The syntax looks similar to Apache mod_rewrite, but with directives specific to routes. Here is an example of what the config, with both routes and rewriting rules in it, might look like:

RewriteEngine On

#
# Start Rewrite
#

# force to HTTPS
RewriteCond %{HTTPS} (off) [NC]
RewriteRule ^(.*)$ https://somesite.com$1 [QSA,L]

# force non-www (stay on HTTPS so this rule and the one above do not redirect to each other)
RewriteCond %{HTTP_HOST} !^somesite\.com$ [NC]
RewriteRule ^(.*)$ https://somesite.com$1 [R=301,L]

# add a trailing slash
RewriteRule ^([^.?]+[^.?/])$ $1/ [R=301,L]

#
# Start Routes
#

RouteDefault controller Home
RouteDefault action Index
RouteUrl {controller}/{action}/{id} "Default"

Notice the standard route that comes by default with an ASP.NET MVC project at the bottom of the traditional rewriter config. As a planned side effect of using the Managed Fusion URL Rewriter, the routes are completely editable, so they can be changed without a recompile and without a restart of the application pool. I see those as two huge benefits of using a method like this, on top of illuminating the differences between System.Web.Routing and a URL Rewriter and the advantages of using both in your application.
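To show what the three Route lines at the bottom of that config express, here is a simplified sketch in Python of matching a URL path against a {controller}/{action}/{id} pattern with defaults. This is an illustration only: it is not the System.Web.Routing algorithm and not the code in the contrib project, the function name match_route is made up, and the empty default for id is my assumption (it mirrors the stock ASP.NET MVC project template rather than the config above, which only sets defaults for controller and action).

```python
from typing import Dict, Optional


def match_route(pattern: str, defaults: Dict[str, str], path: str) -> Optional[Dict[str, str]]:
    """Match a URL path against a {placeholder} pattern, filling in defaults.

    Returns the route values on a match, or None when the path does not fit.
    """
    pattern_parts = pattern.strip("/").split("/")
    path_parts = [p for p in path.strip("/").split("/") if p]

    if len(path_parts) > len(pattern_parts):
        return None  # more segments than the pattern allows

    values: Dict[str, str] = {}
    for index, part in enumerate(pattern_parts):
        is_placeholder = part.startswith("{") and part.endswith("}")
        name = part[1:-1] if is_placeholder else part
        if index < len(path_parts):
            if is_placeholder:
                values[name] = path_parts[index]
            elif part.lower() != path_parts[index].lower():
                return None  # literal segment does not match
        elif is_placeholder and name in defaults:
            values[name] = defaults[name]  # segment omitted, use the default
        else:
            return None  # segment omitted and no default available
    return values


# Hypothetical defaults: controller and action come from the config above,
# the empty id is an assumption borrowed from the MVC project template.
DEFAULTS = {"controller": "Home", "action": "Index", "id": ""}

if __name__ == "__main__":
    print(match_route("{controller}/{action}/{id}", DEFAULTS, "/"))
    print(match_route("{controller}/{action}/{id}", DEFAULTS, "/Products/Details/5"))
```

The point of the sketch is that a route only names a destination. Nothing in it inspects the host, the scheme, or decides to redirect, which is the rewriter's job.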

The code on GitHub provides the following commands that you can use to control your routes:

  • RouteUrl <url>  "<name>"
  • RouteDefault <url-part>  <default>
  • RouteConstraint <url-part>  <regex-constraint>
  • RouteNamespace <namespace>
  • RouteIgnoreUrl <url>

So I encourage you to check this code out and provide feedback, as I aim to keep growing this code base where I can, to make it easier to have your rewriter and your routes in the same place. Even though I created it for demonstration reasons, I realize that it may be of more use to others out there who need to support runtime-editable routes in their application.

Note to readers who have just read my last post: You may be wondering why I compared routes to unchangeable roads in my last post, and then in this one give people the ability to edit them in real time. Well, I did state at the beginning of this post that the main goal of this project was to illustrate the differences between routing and rewriting in real time, without the need to recompile. Also, Phil did concede there are times when it would be better to be able to change routes without a recompile:

Having said that, there are many situations in which the ability to change an application’s routes without having to recompile the application comes in very handy. This is the situation I find myself in as I build a blog engine where the folks who will install may want to tweak the routes without having to recompile the blog’s source code.

Update: Here is what you put in your Web.config file to get this working. I have included the whole snippet but the important part is the engineType attribute in the XML below:

<managedFusion.rewriter xmlns="http://managedfusion.com/xsd/managedFusion/rewriter">
    <rules engine="Other" engineType="ManagedFusion.Rewriter.Contrib.RoutingApacheEngine, ManagedFusion.Rewriter.Contrib">
        <apache defaultFileName="rewriter.txt"/>
    </rules>
</managedFusion.rewriter>

20 Mar 2010

The difference between Routing and Rewriting


As most of you are probably aware if you read my blog regularly, I am the sole developer of a URL Rewriter. I have tried to keep it extensible and relevant to the problems that modern web developers face when exposing their applications to the web, by giving them more control over the only interface that matters on the web … THE URL. The benefits of a URL Rewriter have been explained many times, by many people, so I am not going to add just another rant to the web about keeping your URLs clean for the search engines. I will just leave you with Jeff’s explanation of why you shouldn’t ignore the URL.

Having multiple URLs reference the same content is undesirable not only from a sanity check DRY perspective, but also because it lowers your PageRank. PageRank is calculated per-URL. If 50% of your incoming backlinks use one URL, and 50% use a different URL, you aren’t getting the full PageRank benefit of those backlinks. The link juice is watered down and divvied up between the two different URLs instead of being concentrated into one of them.

While Jeff only focuses on the reasons related to SEO, there are many other reasons to make the “look and feel” of your URLs a high priority. One that is often touted as a wonderful reason to use a URL Rewriter is producing pretty-looking URLs. Even though this is one of many reasons to use a rewriter, it is really a small part of why you want to have a URL Rewriter in your arsenal as a web developer. Other reasons include forcing your domain to a consistent www or non-www address, having helper URLs such as http://www.microsoft.com/sql that redirect to their actual location, and many others.
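The two non-SEO uses just mentioned, a consistent host name and helper URLs, can be sketched as a tiny rule engine. The Python below is only an illustration of the idea and not how any particular rewriter is implemented; the host somesite.com, the /sql helper mapping and its target path, and the function name rewrite are all hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "somesite.com"                   # hypothetical site
HELPER_URLS = {"/sql": "/products/sql-server/"}   # hypothetical short link


def rewrite(url: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) if the request should be redirected, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"

    # Rule 1: force a single canonical host (www vs non-www).
    if host != CANONICAL_HOST:
        return 301, urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))

    # Rule 2: helper URLs redirect to where the content really lives.
    target = HELPER_URLS.get(path.rstrip("/").lower())
    if target is not None:
        return 302, urlunsplit((parts.scheme, CANONICAL_HOST, target, parts.query, ""))

    return None  # nothing to rewrite; the request continues on unchanged


if __name__ == "__main__":
    print(rewrite("http://www.somesite.com/about?x=1"))
    print(rewrite("http://somesite.com/sql"))
    print(rewrite("http://somesite.com/Home/Index"))
```

Neither rule has anything to do with making a URL pretty. Both decide what happens to a request before the application ever sees it.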


However, since Microsoft released the System.Web.Routing framework, the benefits of using a URL Rewriter have been blurred, because the routing framework gives developers more ability to control the URL and thus create prettier URLs than has traditionally been possible. Because the router and the rewriter overlap in making URLs more readable, a misunderstanding has grown up about the functions and benefits that each provides to the modern web developer.

The first step is understanding the difference between routes and a rewriter. Phil Haack explains on his blog the reasons routes were not designed to be changeable without a recompile:

This is partly by design as routes are generally considered application code, and should have associated unit tests to verify that the routes are correct. A misconfigured route could seriously tank your application.

In other words, the route, which is compiled “application code”, is like a road. Like a road, it provides a way to get from a starting point to a destination; on the web, the client browser requesting a URL is the starting point and your action method is the destination. If this road could be changed without much thought, it would be possible to create a circumstance where your destination is no longer accessible by the road. The rewriter, on the other hand, can be looked at as the rules of the road, used to detour traffic, govern the speed, give direction, and really just provide flexibility on top of the rigid start and end points of the road.

When I try to explain this to fellow developers I often have a conversation that goes something like this:


[ME] Why are you not using a rewriter in your ASP.NET MVC application to give you better control over your URL routes, so that you provide a consistent domain, helper URLs, and more flexibility in the running of your web application?
[THEM] I don’t need a rewriter, I use ASP.NET MVC for creating pretty URLs and Routing rocks.
[ME] Sigh, I never said anything about pretty URLs. The benefits of a URL Rewriter go way beyond making your URL pretty.
[THEM] I don’t see how. The interweb always talks about pretty URLs and rewriters.
[ME] Well, Routing is like namespaces for your actions. Routes just provide a web-accessible name to get directly to your action method; they don’t act as a rule engine on what types of requests to let through, what types to redirect, and where the request should go. That is why you need a URL Rewriter in addition to Routing. Think of a route as a road, and the rewriter as the rules you use to drive on that road.
[THEM] I like driving fast in my Prius.
[ME] Double sigh. Let’s focus here for a minute. Let’s get back on topic.
[THEM] Yeah, but so what? I don’t need any of that mumbo jumbo, I just want pretty URLs because that is all that people talk about on the interweb, and that is how you get to #1 on Google.
[ME] Fine, good luck with your PageRank. Come back to me in a year when you are still at the exact same rank in Google and ready to listen.


It has really gotten to the point where I pick the people I want to have this conversation with based on whether they are actually willing to listen and understand enough of the basics of SEO and HTTP that my conversation is not lost on them.

If you have gotten this far into my rant on the differences between routing and rewriters, you are probably somebody who genuinely cares and either already understands the difference or wants to know more. If you are that person, I would love to talk to you about what kinds of enhancements to my company’s URL Rewriter would make your life easier as a web developer, as I start to line up the features for the 4.0 release.