Pretty Good URLs

Mar 13, 2009
by:   Tim Stanley

With a little forethought and planning, it’s possible to create URLs that can stay around the web for a long time; as Tim Berners-Lee pointed out, cool URIs don't change.  Choose wisely, though: you may have to live with your URLs for much longer than you think, and if you’re not careful, you may end up with too many URLs pointing to the same content.

Domains Without WWW

The first part of the URL is the domain, which may or may not include the www subdomain.  Leaving out the www makes the domain name easier for users to type.  However, removing the www subdomain has some technical side effects.  Cookies set on the bare domain are shared across all subdomains and are sent on every request, including requests for static content.  Large cookies can lead to slightly more traffic and slower content processing.

Whether or not you use the www subdomain, the non-preferred domain should be redirected to the preferred domain with a 301 redirect.  This keeps search engines clear on the preferred URL and avoids splitting inbound link counts between the two.

Examples:
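As a rough illustration of the redirect decision, here is a minimal Python sketch.  It assumes the bare domain is the preferred one; the names PREFERRED_HOST, ALTERNATE_HOSTS, and canonical_host_redirect are hypothetical, and a real site would normally do this in the web server's rewrite rules rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the bare domain is preferred. Swap the two values to prefer www.
PREFERRED_HOST = "example.com"
ALTERNATE_HOSTS = {"www.example.com"}


def canonical_host_redirect(url):
    """Return (301, new_url) if the host is a non-preferred one, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALTERNATE_HOSTS:
        return None
    # Only the host changes; scheme, path, query, and fragment are kept.
    # Ports and credentials in the URL are ignored in this sketch.
    return 301, urlunsplit(parts._replace(netloc=PREFERRED_HOST))
```

Calling canonical_host_redirect("http://www.example.com/post/Nikon-Lens-Rentals") yields a 301 to the same path on example.com, while a URL already on the preferred host returns None.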

URLs Without Extensions

File extensions don’t add value for users.  They are there for the server’s (and the programmer’s) benefit.  I believe that removing the extension from content-only URLs helps users and eases any long-term migration.  Content on an Apache server running PHP today may be on an IIS server running ASP.NET in the future, and vice versa.  Users shouldn’t have to know whether to type .htm, .php, or .aspx after a name.

The biggest problem with removing extensions is deciding what to do with old content and how to create rules that rewrite the URLs as needed.  For the longest time, Apache had an advantage over IIS 5/6 in this regard, but with IIS7 and the URL rewrite module, that has changed.

Don’t allow two URLs with and without extensions to point to the same content.  If you do decide to remove the required extensions, use a 301 redirect from the old to the new URL to ensure search engines only recognize one URL.

Examples:

  • http://example.com/post/Nikon-Lens-Rentals.aspx 
    becomes http://example.com/post/Nikon-Lens-Rentals
  • http://example.com/post/Nikon-Lens-Rentals.html 
    becomes http://example.com/post/Nikon-Lens-Rentals
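The old-to-new mapping can be sketched in a few lines of Python.  This is only an illustration: the LEGACY_EXTENSIONS set and the extensionless_redirect name are assumptions, and the same rule would usually be expressed as a server rewrite rule.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the extensions the site used to expose publicly.
LEGACY_EXTENSIONS = {".aspx", ".html", ".htm", ".php"}


def extensionless_redirect(url):
    """Return (301, new_url) for a URL ending in a legacy extension, else None."""
    parts = urlsplit(url)
    stem, ext = posixpath.splitext(parts.path)
    if ext.lower() not in LEGACY_EXTENSIONS:
        return None
    # Drop the extension from the path; everything else is unchanged.
    return 301, urlunsplit(parts._replace(path=stem))
```

Both example URLs above map to http://example.com/post/Nikon-Lens-Rentals, and a URL that is already extensionless returns None, so only one form is ever served.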

Rewriting URL Extensions

In the same way that extensions can be removed, extensions can be rewritten to a different file type.  .htm or .html is the most common target extension, and it masks the internal type (.php, .aspx, etc.) and the technology used to drive the site.

Examples:

  • http://example.com/post/Nikon-Lens-Rentals.aspx 
    becomes http://example.com/post/Nikon-Lens-Rentals.htm
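Unlike a redirect, this kind of rewrite happens inside the server: the visitor only ever sees the public extension.  The Python sketch below shows the two directions of the mapping, assuming .aspx handlers internally and .htm publicly; the constant and function names are hypothetical.

```python
import posixpath

# Assumptions: pages are served by .aspx handlers and published as .htm.
INTERNAL_EXT = ".aspx"
PUBLIC_EXT = ".htm"


def to_internal_path(public_path):
    """Map a requested public path to the handler path (internal rewrite)."""
    stem, ext = posixpath.splitext(public_path)
    if ext.lower() == PUBLIC_EXT:
        return stem + INTERNAL_EXT
    return public_path


def to_public_path(internal_path):
    """Map a handler path to the path that should appear in generated links."""
    stem, ext = posixpath.splitext(internal_path)
    if ext.lower() == INTERNAL_EXT:
        return stem + PUBLIC_EXT
    return internal_path
```

Requests for the old .aspx form would still need a 301 to the .htm form so that only one public URL exists.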

Distinct URLs

The URLs should be distinct.  A well-thought-out structure makes it much easier to rewrite URLs cleanly should the platform be moved in the future.  Each kind of content should have its own path prefix, and each URL should be distinct within that prefix.

Examples:

  • /page/*
  • /post/*
  • /category/*
  • /tag/*
  • /product/*
  • /authors/*
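One benefit of distinct prefixes is that a single pattern can tell every kind of content apart.  The Python sketch below uses the prefixes listed above; the CONTENT_TYPES tuple, the ROUTE pattern, and the resolve function are illustrative names, not part of any particular platform.

```python
import re

# The path prefixes from the list above.
CONTENT_TYPES = ("page", "post", "category", "tag", "product", "authors")

# One pattern covers every prefix: /<kind>/<slug> with an optional trailing slash.
ROUTE = re.compile(r"/(?P<kind>%s)/(?P<slug>[^/]+)/?" % "|".join(CONTENT_TYPES))


def resolve(path):
    """Return (kind, slug) for a recognized path, or None if nothing matches."""
    match = ROUTE.fullmatch(path)
    if match is None:
        return None
    return match.group("kind"), match.group("slug")
```

Because the prefix identifies the content type, a later platform move only needs one rewrite rule per prefix rather than one per URL.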

Adding Trailing Slashes

References to a site’s root path should always include a trailing slash.  When the trailing slash is missing from a reference to a directory, both Apache and IIS respond with a 301 redirect, which doubles the number of requests for that resource.

A trailing slash should not be added to a URL with a file extension (e.g. one should not use http://example.com/post/Nikon-Lens-Rentals.aspx/ ).  Beyond the root path, it is debatable whether a trailing slash should be added to URLs without extensions.  If trailing slashes are added for a site, the site needs to handle the cases where users or referrers leave the trailing slash off, and it needs to perform a 301 redirect to the appropriate URL.

Both removing the extensions and adding the trailing slashes require that the URL rewrite rules and handlers be configured properly.

In IIS 6, this was more difficult to do on a hosted platform because ISAPI rewrite components had to be added to the server, and hosting providers could rarely offer this support.  With IIS7, the URL rewrite module, and other configuration settings, a developer can write an HTTP module that handles and rewrites URLs without requiring any special permissions.

The search results from Yahoo and Microsoft Live search remove the slashes in the results displayed (although not from the actual destination links themselves) while Google search does not alter the slashes.

ASP.NET MVC allows links both with and without trailing slashes, so developers must code the 301 redirects to the preferred URL themselves or risk having duplicate content and hurting SEO rankings.

WordPress provides the ability to have links to all pages and posts with trailing slashes.  From what I’ve read, Drupal prefers to remove the trailing slashes.

After some extensive research, my preference is to combine removing extensions with appending trailing slashes to URLs.  Even though I’m an avid .NET lover, I’ve never really liked the fact that some-page.aspx displays for a site running ASP.NET.

My secondary preference would be to rewrite the file extension to .htm and not append the trailing slash.

Don’t switch a site developed without trailing slashes to trailing slashes.  There will always be a million and one internal references that never fully resolve correctly.  If trailing slashes are going to be appended, decide this up front, before the site goes live.

Examples:

  • http://example.com/post/Nikon-Lens-Rentals/
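For a site that has chosen trailing slashes, the redirect for slash-less requests might look like the Python sketch below.  The trailing_slash_redirect name is hypothetical, and the sketch assumes that any path whose last segment contains a dot is a file with an extension, so a slug such as version-2.0 would need special handling.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url):
    """Return (301, new_url) when a trailing slash should be appended, else None."""
    parts = urlsplit(url)
    path = parts.path
    # Assumption: a dot in the last path segment means a file extension.
    has_extension = posixpath.splitext(path)[1] != ""
    if path.endswith("/") or has_extension:
        return None
    # Append the slash to the path only; the query string is preserved.
    return 301, urlunsplit(parts._replace(path=path + "/"))
```

A request for http://example.com/post/Nikon-Lens-Rentals is redirected to the form shown above, while URLs that already end in a slash or carry an extension are left alone.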

Use Dashes Not Spaces For Words

When creating a title for a page, separate the words with something appropriate and readable.  Some approaches I’ve seen use the underscore character.  Underscores are difficult to discern because most URLs are displayed with some form of underline.

Other approaches I’ve seen use camel case.  Camel case becomes difficult to read when case is ignored (CamelCase becomes camelcase).  It can also lead to some unanticipated interpretations when words run together (think Speed Of Art).

The link with spaces http://example.com/post/Nikon%20Lens%20Rentals.aspx (%20 = spaces) is inherently more difficult to read and type by hand than one where words are separated by dashes.

Examples:

  • http://example.com/post/Nikon-Lens-Rental.aspx (dashes)
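Turning a page title into a dashed URL segment is simple to automate.  The Python sketch below is one possible approach; slugify is a hypothetical helper that keeps only ASCII letters and digits and preserves the title's case, as the examples in this post do.

```python
import re


def slugify(title):
    """Join the words of a title with dashes, dropping spaces and punctuation."""
    # Assumption: only ASCII letters and digits are kept; underscores,
    # spaces, and other punctuation all act as word separators.
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "-".join(words)
```

For example, slugify("Nikon Lens Rentals") returns "Nikon-Lens-Rentals", which avoids both the %20 escapes and the run-together words of camel case.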

Best Practices

These apply to externally facing URLs.  URLs used for administration or not available to the general public don’t require this level of URL planning.

  1. Be practical.  Changing URL strategies will likely break a lot of stuff, and it’s a lot of work to get every scenario 100% right.
  2. Choose URLs with distinct patterns (/category/*, /post/*, /page/*, etc.).
  3. URLs without extensions are preferred.  If extensions are needed, use .htm or .html.
  4. Use dashes, not spaces, to separate words in URLs.
  5. Pick a www subdomain strategy, stick with it, and redirect the alternative to the preferred domain.
  6. Always put a trailing slash on the domain reference (e.g. http://tim-stanley.com/ ).
  7. Always put the trailing slash on directories.
  8. Never include the index page in the URL (default.htm, index.html, default.asp, default.aspx).
  9. Pick a trailing slash strategy, stick with it, and redirect the alternative URL (with or without the trailing slash) to the proper target URL.
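Items 7 and 8 can be enforced together by redirecting any request that names an index page to its directory.  The Python sketch below uses the index page names from item 8; the INDEX_PAGES set and the index_page_redirect function are illustrative names only.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# The default documents named in item 8.
INDEX_PAGES = {"default.htm", "index.html", "default.asp", "default.aspx"}


def index_page_redirect(url):
    """Return (301, new_url) when the URL names an index page, else None."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    if filename.lower() not in INDEX_PAGES:
        return None
    # Redirect to the containing directory, with its trailing slash.
    if not directory.endswith("/"):
        directory += "/"
    return 301, urlunsplit(parts._replace(path=directory))
```

With this in place, http://example.com/blog/index.html and http://example.com/blog/ no longer compete as two URLs for the same content.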

Astute readers may note that this site does not yet have pretty URLs with all of these best practices in place.  The chief reason: it’s a lot of work to remove extensions and get 301 redirects for older inbound links working correctly for all features on a site.  I knew this when I started the site, but I couldn’t find a good URL rewriting solution on IIS6 at the time.  Now that the site has moved to IIS7, I have a solution, but I’m not confident about all the redirects and the impact they will have on searches.

However, if you’re starting a new site, in my opinion it’s good to start off with planned, clean, and pretty URLs; they will be easier to maintain down the road.
