09 Dec 2008

Creating an extension module for .NET URL Rewriter and Reverse Proxy

4 Comments Uncategorized

Wow, that is a long title. Recently I have been looking for quick posts that I can put out each day to keep my blog relevant, and also so I don’t feel like I am slacking off too much. Today I want to post about a little-known feature in my .NET URL Rewriter and Reverse Proxy (a.k.a. Managed Fusion URL Rewriter), which I have developed in my spare time, mostly out of necessity for this blog and other projects I have worked on. Here is a quick run-through of what it does.

Managed Fusion URL Rewriter is a powerful URL manipulation engine based on the Apache mod_rewrite extension. It is designed from the ground up to bring all the features of Apache mod_rewrite to IIS 6.0 and IIS 7.0. Managed Fusion Url Rewriter works with ASP.NET on Microsoft’s Internet Information Services (IIS) 6.0 and Mono XSP Server, and is fully supported, for all languages, in IIS 7.0, including ASP.NET and PHP. Managed Fusion Url Rewriter gives you the freedom to go beyond the standard URL schemes and develop your own scheme.

But one feature that I added that is not part of the official Apache mod_rewrite documentation is the ability to add custom modules to extend the use of the URL rewriter in non-traditional ways.  One great example of this was born out of wanting to clean up the SEO mess I created in the early days of this blog.  I had to support the following different types of URL patterns:

  1. http://www.coderjournal.com/?p=23
  2. http://www.coderjournal.com/2008/03/14/some-post.html
  3. http://www.coderjournal.com/2008/03/14/some-post

and transform them into the URL pattern that I finally settled on today:

  • http://www.coderjournal.com/2008/03/some-post

In the above list #2 and #3 were pretty easy to transform using the following rules:

RewriteRule ^(/[0-9]{4}/.*)\.html$    $1/ [NC,R=301]
RewriteRule ^(/[0-9]{4}/[0-9]{1,2}/)[0-9]{1,2}/(.*)$    $1$2 [R=301]
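
To see what these two rules do to a path, here is a minimal C# sketch that mimics them with plain .NET regular expressions. It is not the rewriter's own code: the class and method names are hypothetical, the dot before "html" is escaped so it only matches a literal dot, and the loop simply follows each redirect the way a browser would.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LegacyUrlRules
{
    // Mimics: RewriteRule ^(/[0-9]{4}/.*)\.html$  $1/  [NC,R=301]
    private static readonly Regex HtmlSuffix =
        new Regex(@"^(/[0-9]{4}/.*)\.html$", RegexOptions.IgnoreCase);

    // Mimics: RewriteRule ^(/[0-9]{4}/[0-9]{1,2}/)[0-9]{1,2}/(.*)$  $1$2  [R=301]
    private static readonly Regex DaySegment =
        new Regex(@"^(/[0-9]{4}/[0-9]{1,2}/)[0-9]{1,2}/(.*)$");

    // Returns the redirect target for a path, or null when neither rule matches.
    public static string NextRedirect(string path)
    {
        if (HtmlSuffix.IsMatch(path))
            return HtmlSuffix.Replace(path, "$1/");
        if (DaySegment.IsMatch(path))
            return DaySegment.Replace(path, "$1$2");
        return null;
    }

    // Follows redirects until no rule matches any more.
    public static string Resolve(string path)
    {
        string next;
        while ((next = NextRedirect(path)) != null)
            path = next;
        return path;
    }
}
```

Calling LegacyUrlRules.Resolve on a path in pattern #2 takes two hops: the first rule drops the .html suffix, and the second drops the day segment. A path in pattern #1 matches neither rule, which is the gap the rest of this post fills.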

Both rules work because those URLs already contain every element of my current URL. As you can imagine, problems arose when I had to support links that used #1’s syntax, which contains none of the elements I need to build my current URL. As a programmer, I believe each part of a system should gracefully handle the domain it was designed for; in this case, a URL rewriter should be able to handle any scenario that has to do with URL rewriting. So I added support that lets developers extend the URL rewriter to accomplish any URL rewriting task they can think of.

Setting Up the URL Rewriter Rules

In my case I needed to run the following SQL query every time I saw a URL that matched #1.

select concat('http://www.coderjournal.com/',year(post_date),'/',month(post_date),'/',post_name,'/') from wp_posts where ID = $1;

This query looks up a post by its ID in the WordPress table that contains all the posts and returns the absolute URL that should be displayed for that post. To do this I created a new directive for the mod_rewrite syntax called RewriteModule. I also had to extend the RewriteRule and RewriteCond directives to support these new module extensions. The RewriteModule, RewriteRule, and RewriteCond directives are defined by the following syntax:

RewriteModule <Reference Name> <Namespace>,<Assembly>
RewriteRule[([<Left Module>],[<Right Module>])] <Pattern> <Substitution>
RewriteCond[([<Left Module>],[<Right Module>])] <Test String> <Condition Pattern>

The module parts in square brackets above are optional when creating a rule. In my case, for this blog, the rewriter directives looked like the following:

RewriteModule PostQueryString CoderJournal.Rewriter.Rules.PostQueryStringRuleAction, CoderJournal.Rewriter.Rules
RewriteRule(,PostQueryString)   ^/\?p=([0-9]+)$    "select guid from wp_posts where ID = $1;" [R=301]

The important parts of the syntax are the reference name PostQueryString in the RewriteModule directive and the (,PostQueryString) suffix on the RewriteRule directive. The suffix indicates which custom module processor should be used for the rule and relates it back to the class defined in the RewriteModule directive.

Creating the Module

I have to warn you that I am not going to show every property and method on the interface that matters when creating a custom module. I am only going to show the actual meat of the module, which looks up the URL in the database.

public Uri Execute(int logLevel, string logCategory, HttpContext context, 
                   Pattern pattern, Uri url, string[] conditionValues, 
                   IDictionary<string, string> flags)
{
	string inputUrl = url.GetComponents(UriComponents.PathAndQuery, UriFormat.UriEscaped);
	string sqlCommand = pattern.Replace(inputUrl, Text, conditionValues); // Text is the rule's substitution string, here the SQL template
	string substitutedUrl = String.Empty;

	using (MySqlConnection connection = new MySqlConnection(Properties.Settings.Default.DatabaseConnection)) {
		using (MySqlCommand command = connection.CreateCommand()) {
			command.CommandText = sqlCommand;
			command.CommandType = CommandType.Text;

			try {
				connection.Open();
				substitutedUrl = command.ExecuteScalar() as string; // first column of the first row: the post's URL
			} finally {
				connection.Close(); // redundant with the using block, but harmless
			}
		}
	}

	return new Uri(url, substitutedUrl); // resolved against the incoming URL
}

It may not be clear right away what is going on, but on line 6 I am matching the URL against the regular expression (^/\?p=([0-9]+)$) and substituting the captured value into the SQL query (from above) to produce a query that will be run against the database. So if the following URL came into my server:

http://www.coderjournal.com/?p=372

It would produce a SQL query that looked like this:

select concat('http://www.coderjournal.com/',year(post_date),'/',month(post_date),'/',post_name,'/') from wp_posts where ID = 372;

Notice that the ID, 372, shows up in both the URL and the query. It is the only part of the URL that I am interested in, because it is all I need to query the database for the actual path of the post.
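
The substitution step can be tried on its own. The sketch below is a stand-in for the rewriter's pattern.Replace call, not a copy of it: it uses the standard .NET Regex and Match.Result APIs, and the QueryTemplate class is a hypothetical name introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QueryTemplate
{
    // Returns the expanded SQL text, or null when the URL does not match the pattern.
    public static string Expand(string pathAndQuery, string pattern, string template)
    {
        Match match = Regex.Match(pathAndQuery, pattern);

        // Match.Result expands $1, $2, ... in the template from the captured groups.
        return match.Success ? match.Result(template) : null;
    }
}
```

For example, QueryTemplate.Expand("/?p=372", @"^/\?p=([0-9]+)$", "select guid from wp_posts where ID = $1;") yields the query with 372 in place of $1. Because the pattern only accepts digits, nothing except a number can reach the SQL text; with a looser pattern, a parameterized command would be the safer choice.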

Now that we have the query we can execute it on the database, using lines 9 through 21, and create the resulting URL on line 23. The resulting URL is then passed back through the URL rewriter and processed using the flags defined. In my case, [R=301] indicates that I want to do a 301 Permanent Redirect on the URL, which tells browsers and search engines alike that they need to update their URL for this page.

You can test out the above conditions by using the following URLs, which all redirect back to this page:

  1. http://www.coderjournal.com/?p=372
  2. http://www.coderjournal.com/2008/12/9/creating-extension-module-net-url-rewriter-reverse-proxy.html
  3. http://www.coderjournal.com/2008/12/9/creating-extension-module-net-url-rewriter-reverse-proxy/

The code as always is available on my SVN server at Google Code.

I hope this comes in handy for those of you who have to support legacy URLs in your own product or in a project that you are working on. As always, if you have any questions or need anything clarified, please feel free to contact me or leave a comment below.

05 Mar 2008

Your Impressions of Coder Journal’s Design

No Comments Uncategorized

So today it was brought to my attention that the design of my blog needed work. Since good design is a very subjective term, much like good programming:

your program (n): a maze of non-sequiturs littered with clever-clever tricks and irrelevant comments. Compare MY PROGRAM.

my program (n): a gem of algorithmic precision, offering the most sublime balance between compact, efficient coding on the one hand, and fully commented legibility for posterity on the other. Compare YOUR PROGRAM.

Please tell me your impressions of my blog in the comments below. I would like to see constructive, actionable comments that I can work toward implementing, around the ease of reading, layout, and usability. That is what I am really interested in hearing about.

You can tell me what you think of the colors, but honestly, much like personal tastes in cars, food, and everything else, it is usually very superficial and relies on personal preferences more than industry-recognized usability problems. My personal preference, since it is my blog, is to use strong colors right next to each other to show strong lines, instead of gradients, because strong lines give a sense of strength and professionalism.

Honestly, if I were to break it down, I just like the look of Orange, Blue, and Brown. I believe they provide nice contrast to each other and have an almost academic look. If I were to sum up my style, I would say the PowerPoint theme Median, as seen below, is the closest I have ever seen to my personal style tastes.

[Images: PowerPoint Example, PowerPoint Example 2]

So please let me hear your comments about my blog on:

  1. Ease of Reading
  2. Layout
  3. Usability

I will take them all very seriously.

10 Feb 2008

How to use the .NET URL Rewriter and Reverse Proxy to run WordPress on IIS

4 Comments Uncategorized

First off, I would like to say that many of my readers are very intelligent. They picked up on a single sentence in my last post, about my new design, that mentioned Coder Journal switching from Linux to Windows.

I also moved hosts from GoDaddy’s shared Linux hosting. To GoDaddy’s virtual dedicated hosting on Windows. This proved difficult since URL Rewriting isn’t currently built in to IIS 6.0 like it is in Apache. I will talk a little about this setup in a later post.

Switching from Linux to Windows wasn’t the part that really intrigued many of them; it happens every day, so why would it? It was the fact that I was able to get the same level of URL Rewriting out of IIS 6.0 as I was out of Apache’s mod_rewrite, and still make WordPress look and function like it was running on Apache.

So to get started, I just want to say that I know there are other solutions out there to get WordPress hosted on IIS with the exact same outcome as what I am going to present below. I did this for the following reasons:

  1. I am a .NET guy and I love developing software that is popular on other platforms on .NET just to see if it can be done.
  2. I also believe in Eating One’s Own Dog Food, and the URL Rewriter and Reverse Proxy that I am presenting below, and that is used in Coder Journal, is my own creation.

What This Post Covers

This post is meant to provide an insight into a technology, Reverse Proxy, that many developers are unaware of. It is demonstrated through the eyes of my blog and how it works with WordPress on IIS 6.0. Some of the basics will be covered, such as the workings of a URL Rewriter and Reverse Proxy. This post will not cover how to code a URL Rewriter or Reverse Proxy in C#. The reader should also have a basic understanding of how RegEx, HTTP, and URL Rewriters work.

The Problem

IIS 6.0 and previous versions lack any standardized, built-in URL Rewriting process, so developers have to take nice visitor and SEO friendly URLs like this:

http://www.coderjournal.com/2008/02/10/sample-post/

And turn them into ugly, IIS 6.0 compatible URLs like the ones below, which may or may not be SEO friendly. Neither is as visitor friendly as the one above.

http://www.coderjournal.com/?p=123
http://www.coderjournal.com/index.php/2008/02/10/sample-post/

My Solution Used On Coder Journal

The solution I chose was influenced by a number of factors, a couple of which will change for the better when IIS 7.0 is released. The factors are:

  • I need to run PHP for WordPress.
  • I need to run FastCGI for IIS 6.0 to get the best performance out of PHP.
  • .NET and PHP run separate from each other, so I cannot use a .NET URL Rewriter to control which PHP file is chosen to run. (This changes in IIS 7.0 with Integrated Pipelines)
  • I need to pass all requests to www.coderjournal.com through .NET, which has a performance loss for rendering static files such as image, and text files. (This changes in IIS 7.0 with Integrated Pipelines)
  • I need to keep the URL’s friendly for visitors and SEO.

Because of the factors listed above, I needed to create two web servers to host www.coderjournal.com. One of the servers is the public interface to www.coderjournal.com, which I will call frontend. The other is the WordPress web server, which I will call backend; it is not public and only handles standard WordPress with the ugly URLs listed above. The picture will demonstrate the structure better than I can explain.

[Figure: Coder Journal Web Structure]

As you can see, from the above picture, all requests to WordPress are handled by the frontend server for this blog. This all happens through a technique known as Reverse Proxy.

A reverse proxy dispatches in-bound network traffic to a set of servers, presenting a single interface to the caller. For example, a reverse proxy could be used for load balancing a cluster of web servers. In contrast, a forward proxy acts as a proxy for out-bound traffic. For example, an ISP may use a proxy to forward HTTP traffic from its clients to external web servers on the internet; it may also cache the results to improve performance.

So, without going into a deep explanation of how I accomplished the reverse proxy: for every request that comes in to the frontend server and meets certain criteria, I make another HTTP web request to the backend server and then write its response back to the original frontend server request.
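
As a rough illustration of that request-forwarding idea, here is a minimal C# sketch. It is not how the rewriter itself is implemented: it uses the modern HttpClient class as a stand-in, handles only GET requests and the response body, and the ReverseProxySketch class and its method names are hypothetical. A real reverse proxy would also copy the HTTP method, headers, and status code in both directions.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class ReverseProxySketch
{
    // Builds the backend URL for the path and query of an incoming frontend request.
    public static Uri ToBackend(Uri backendBase, string pathAndQuery)
    {
        return new Uri(backendBase, pathAndQuery);
    }

    // Makes a second HTTP request to the backend and returns its body,
    // which the frontend would then write to the original response.
    public static async Task<string> ForwardAsync(HttpClient client, Uri backendBase, string pathAndQuery)
    {
        Uri target = ToBackend(backendBase, pathAndQuery);
        using (HttpResponseMessage response = await client.GetAsync(target))
        {
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The visitor only ever talks to the frontend; the backend host name (here the "backend" name used in the rules later in this post) never appears in the browser.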

Step 1 – Setting Up .NET to Process All Requests

Set up your frontend server to process everything through the .NET framework.

  1. Open IIS and right-click on the website and select Properties.
  2. Click the Configuration button under Application Settings section
  3. Click the Insert… button to create a new wildcard mapping
  4. Set the executable textbox to aspnet_isapi.dll file location.
    for .NET 2.0, 3.0, 3.5: C:\Windows\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll
  5. Make sure the checkbox Verify that file exists is not checked.
  6. Press OK to confirm and close all the windows.

Step 2 – Install PHP/WordPress

Just follow this article on IIS.NET for installing PHP/WordPress on IIS 6.0. You may also want to install FastCGI, I recommend this, but it is optional.

Step 3 – Setting Up the URL Rewriter and Reverse Proxy Rules

The criteria for the requests are put inside the URL Rewriter Rules files. But before the proxy request is made, I must check to make sure the file being requested doesn’t already exist on the frontend server. If it does exist on the frontend server I don’t want to make a reverse proxy request. The following is the code used to do that.

# any file that exists just return it
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
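
In C# terms, the "-f" condition boils down to a file-existence check. The sketch below is only an approximation under stated assumptions: the rewriter resolves REQUEST_FILENAME itself, whereas here a hypothetical ExistsLocally helper maps the URL path onto a site root directory passed in by the caller.

```csharp
using System;
using System.IO;

public static class LocalFileCheck
{
    // Returns true when the URL path maps to an existing file under siteRoot.
    public static bool ExistsLocally(string siteRoot, string urlPath)
    {
        // Normalize the root so it always ends with a directory separator.
        string fullRoot = Path.GetFullPath(siteRoot).TrimEnd(Path.DirectorySeparatorChar)
                          + Path.DirectorySeparatorChar;

        string relative = urlPath.TrimStart('/').Replace('/', Path.DirectorySeparatorChar);
        string candidate = Path.GetFullPath(Path.Combine(fullRoot, relative));

        // Refuse paths such as "/../secret.txt" that escape the site root.
        if (!candidate.StartsWith(fullRoot, StringComparison.Ordinal))
            return false;

        return File.Exists(candidate);
    }
}
```

A request for a file that exists on disk would be served by the frontend directly, while a WordPress permalink, which has no file behind it, falls through to the proxy rules.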

Then after I check to make sure the file doesn’t exist on the frontend server I make the request to the backend using the following rules.

# proxy all connections through to the backend server
RewriteRule ^(/[0-9]{4}/.*) http://backend/index.php$1 [P]
RewriteRule ^(/tags/.*) http://backend/index.php$1 [NC,P]
RewriteRule ^(/topics/.*) http://backend/index.php$1 [NC,P]
RewriteRule ^(/author/.*) http://backend/index.php$1 [NC,P]
RewriteRule ^(/comments/feed/.*) http://backend/index.php$1 [NC,P]
RewriteRule ^(/page/.*) http://backend/index.php$1 [NC,P]
RewriteRule ^(.*) http://backend$1 [P]
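
To make the rule order concrete, here is a small C# sketch that resolves a path against the same patterns, top to bottom, and returns the backend URL of the first match. It assumes, as in Apache mod_rewrite, that a [P] rule stops further rule processing; the ProxyRules class is a hypothetical name and not part of the rewriter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProxyRules
{
    // Same patterns and targets as the rules above; [NC] becomes RegexOptions.IgnoreCase.
    private static readonly (Regex Pattern, string Target)[] Rules =
    {
        (new Regex(@"^(/[0-9]{4}/.*)"), "http://backend/index.php$1"),
        (new Regex(@"^(/tags/.*)", RegexOptions.IgnoreCase), "http://backend/index.php$1"),
        (new Regex(@"^(/topics/.*)", RegexOptions.IgnoreCase), "http://backend/index.php$1"),
        (new Regex(@"^(/author/.*)", RegexOptions.IgnoreCase), "http://backend/index.php$1"),
        (new Regex(@"^(/comments/feed/.*)", RegexOptions.IgnoreCase), "http://backend/index.php$1"),
        (new Regex(@"^(/page/.*)", RegexOptions.IgnoreCase), "http://backend/index.php$1"),
        (new Regex(@"^(.*)"), "http://backend$1"),
    };

    // Returns the backend URL for the first rule that matches the path.
    public static string Resolve(string path)
    {
        foreach (var rule in Rules)
        {
            Match match = rule.Pattern.Match(path);
            if (match.Success)
                return match.Result(rule.Target); // expands $1 from the capture
        }
        return null;
    }
}
```

Friendly permalinks are routed through index.php on the backend, while everything else, such as theme files or the admin pages, is passed through unchanged by the final catch-all rule.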

Conclusions

To get the exact same setup as I have, you will need the following software, which is all free for download:

As always, if you have any questions about the setup or the performance, please post them below in the comments and I will answer them and/or update the post as needed.

Happy Coding.