One feature shared by many social networking sites is "vanity" short profile URLs. My Twitter page could easily have been the RESTfully predictable http://twitter.com/users/riscfuture, but thanks to short profile URLs it is http://twitter.com/riscfuture.
Even Facebook got in the game recently with their "Facebook Usernames" feature. Of course, in classic Facebook style, getting the vanity URL is a multi-step process with an application and the associated land-grab. At Scribd I kept it a little simpler, and I'm assuming you'd like to keep it simple for your Rails website as well.
In order for this system to work, we're going to have to lay down a few ground rules:
- No user whose username conflicts with a controller name can have a short URL. You can't sign up on Scribd with the username "documents" and prevent anyone from seeing their document list.
- No user whose username conflicts with another defined route can have a short URL. Remember that the routes file defines named or custom routes and resources, but with the default routes, normal controllers do not need an entry in that file.
- Users with reserved characters in their names must have these characters escaped or dealt with. If I sign up with the username "foo/bar", that slash can't be left unescaped, or the router will misunderstand the address.
- Usernames must be case-insensitively unique. Every user expects scribd.com/foo to be the same as scribd.com/FOO.
- Any user who cannot be given a short URL for the above reasons must have a fallback URL. This is where you fall back to your less pretty /users/123 URL. (Or perhaps /users/123-foo-bar for SEO purposes.)
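The fallback URL from the last rule can be sketched in plain Ruby. This is only an illustration: the fallback_profile_path and id_from_param helpers and the slug rules are assumptions made for this example, not Scribd's code. It relies on the fact that Ruby's String#to_i stops at the first non-digit, so an ID-prefixed slug can still be looked up by its ID.

```ruby
# Hypothetical helper: build an SEO-friendly fallback path such as
# /users/123-foo-bar from a numeric ID and a login.
def fallback_profile_path(id, login)
  # Collapse every run of non-alphanumeric characters into one dash,
  # then trim dashes from both ends.
  slug = login.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
  slug.empty? ? "/users/#{id}" : "/users/#{id}-#{slug}"
end

# Hypothetical helper: recover the numeric ID from the last path component.
# "123-foo-bar".to_i is 123 because to_i ignores everything after the digits.
def id_from_param(param)
  param.to_i
end

path = fallback_profile_path(123, 'foo/bar')  # "/users/123-foo-bar"
id   = id_from_param(path.split('/').last)    # 123
```

Because the slug is decorative, the lookup never depends on it; only the leading ID matters.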
Note that it's not enough to simply build a list of your controllers and stick them in a validates_exclusion_of validation. You want to be able to claim new routes for yourself even if users have already signed up with conflicting logins, and gracefully revert those users to a fallback profile URL.
Ultimately the question we need to answer is this: Given a user name, will a vanity URL conflict with an existing route? There are a lot of really hard ways of going about this, many of which will break over time. I opted to go with a reliable (if somewhat slow) way of doing this: I build a list of known routes, strip them down to their first path component, then build an array of these reserved names. A known route might be, for instance, /documents/:id; its first path component is "documents." Thus, a user whose login is "documents" cannot have a vanity URL.
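To make the idea concrete before involving Rails at all, here is a minimal sketch in plain Ruby. The route patterns and the helper names (first_path_component, reserved_names) are invented for this illustration; a real app would read its patterns from the router instead.

```ruby
# Hypothetical route patterns standing in for the app's real routing table.
ROUTE_PATTERNS = [
  '/documents/:id',
  '/documents/:id/edit',
  '/ratings.:format',
  '/upload',
  '/:login',
  '/'
]

# Return the first path component of a route pattern, or nil for the root route.
def first_path_component(pattern)
  component = pattern.split('/').reject(&:empty?).first
  return nil if component.nil?
  # Drop an optional format suffix such as "ratings.:format".
  component.sub(/\.:format\z/, '')
end

# Collect the reserved names, skipping the root route and any route whose
# first component is a parameter (:login) or a wildcard (*path).
def reserved_names(patterns)
  patterns.map { |p| first_path_component(p) }
          .compact
          .reject { |c| c.start_with?(':', '*') }
          .uniq
end

RESERVED = reserved_names(ROUTE_PATTERNS)  # ["documents", "ratings", "upload"]
```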
There are some points to note for this system:
- You'll get a few false positives. If /documents/:id is a valid route, but /documents is not (say you had no index action), this system would still disallow a user named "documents". You can easily solve this by tweaking the code below, though.
- No attention is paid to HTTP methods. Theoretically, if you had a route like /upload whose only acceptable method is POST, you could still use GET /upload to refer to a user named "upload". I have intentionally avoided doing this, however; good web design dictates that varying the HTTP method of a request only varies the manner in which you interact with the resource represented by the URL; a single URL should represent the same resource regardless of which method is used in the request.
In order to eke speed out wherever we can, we generate the list of reserved routes once, at launch, and cache it for the lifetime of the process. We do this in a module in lib/:
module FancyUrls
  def self.generate_cached_routes
    # Find all routes we have, take the first part (/xxx/) and remove some unwanted ones
    @cached_routes = ActionController::Routing::Routes.routes.map do |route|
      segs = route.segments.inject("") { |str, s| str << s.to_s }
      segs.sub! /^\/(.*?)\/.*$/, '\\1'
      # Some routes accept a :format parameter (ratings.:format).
      segs.sub! /\.:format$/, ''
      segs
    end
    # All possible controllers for /:controller/:action/:id route
    @cached_routes += ActionController::Routing.possible_controllers.map do |c|
      # Use only the first path component for controllers with multiple path components
      c.sub /^(.*?)\/.*$/, '\\1'
    end
    @cached_routes.uniq!
    # Remove routes whose first path component is a variable or wildcard
    @cached_routes.reject! { |route| route.starts_with?(':') or route.starts_with?('*') }
    # Remove the root route.
    @cached_routes.delete '/'
  end
  def self.cached_routes
    @cached_routes
  end
end
The top method combines two arrays: the first, a list of routes from the defined routes, and the second, a list of the app's controllers. It then filters out some non-applicable routes and stores the list in an instance variable. The list consists of only the first path component of a route.
The method is called generate_cached_routes because it's called when the server process starts, as part of the environment.rb file. The cached results are accessed with the cached_routes method.
So given this method, how do we test if a user is eligible for URL "vanitization?" It's simple:
module FancyUrls
  def user_name_valid_for_short_url?(login)
    not FancyUrls.cached_routes.include?(login)
  end
end
The method is simple: If the user's name is in our list of reserved routes, then it's not valid for URL shortening. Easy peasy.
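The check above compares the login exactly as typed. Given the earlier ground rule that usernames are case-insensitively unique, one possible variation (an assumption on my part, not what the method above does) is to lowercase both sides and keep the reserved names in a Set so each lookup avoids scanning an array. The reserved list and the vanity_eligible? name below are made up for the example.

```ruby
require 'set'

# Stand-in for FancyUrls.cached_routes; these names are invented for the example.
RESERVED_NAMES = %w[documents ratings users upload]

# Build the lookup table once, lowercased, so membership tests are
# case-insensitive and do not walk the whole list each time.
RESERVED_LOOKUP = Set.new(RESERVED_NAMES.map(&:downcase)).freeze

# Hypothetical predicate: true when the login does not collide with a route.
def vanity_eligible?(login)
  !RESERVED_LOOKUP.include?(login.downcase)
end

vanity_eligible?('riscfuture')  # true
vanity_eligible?('documents')   # false
vanity_eligible?('Documents')   # false, because the comparison ignores case
```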
So now we can reasonably quickly determine whether or not a user gets a vanity profile URL. The next step is to write a user_profile_url method that, given a user, returns either the vanity or full profile URL, as appropriate. To do this, first we will need to add our vanity URLs to the bottom of our routes.rb file:
# Install the non-vanity user profile route above the vanity route so people
# who don't have shortenable logins can still have a URL to their profile page.
map.long_profile 'users/:id', :controller => 'users', :action => 'show', :conditions => { :method => :get }
# Install the vanity user profile route above the default routes but below all
# resources.
map.short_profile ':login', :controller => 'users', :action => 'show', :conditions => { :method => :get }
# Install the default routes as the lowest priority.
map.connect ':controller/:action/:id'
map.connect ':controller/:action/:id.:format'
What's going on here? Well, at the very bottom of the routes.rb file, we are installing the old Rails standby, the :controller/:action routes. Newer Rails ideology is often to leave these routes out, so adjust your routes file as appropriate. Above those routes, but otherwise of the lowest priority, is our vanity route. Anywhere above that route is our traditional profile URL. (If you have a RESTful users controller, you could of course replace the top route with a resources call.)
At first glance there's a chicken-and-egg problem: We're checking if a user is "vanitizable" using the routes file, but now the routes file contains the vanity URL route. We solved this problem earlier in the generate_cached_routes method:
# Remove routes whose first path component is a variable or wildcard
@cached_routes.reject! { |route| route.starts_with?(':') or route.starts_with?('*') }
This line of code filters out any routes that start with a parameter or wildcard, among them the short_profile named route.
With the routes squared away, we move on to the problem of users with logins containing reserved characters. RFC 1738 defines what characters must be encoded in a URL:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
Characters aside from these in usernames must either be encoded or otherwise dealt with. Beyond RFC 1738, we should additionally consider the dollar sign and plus characters ("$" and "+") reserved because they often serve special roles in URLs as well. And because this is a Rails app, we should consider the period (".") reserved as well, as it is used by Rails to indicate the format parameter.
So if a user has any reserved character in his login, what do we do? The obvious solution is to percent-encode it, creating a string like "foo%2Fbar", but some might find that ugly. You could also replace these characters with dashes (or some other stand-in character), creating "foo-bar", but then you run into trouble if someone actually signs up with the username "foo-bar". If you're making a new website, you may opt to disallow these characters from usernames. At Scribd we use a combination of approaches: Some reserved characters (like spaces) are simply not allowed in usernames; others are allowed but by using one of these characters you "give up" your vanity URL, instead using the fallback profile URL.
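The first two options, and the collision that makes the second one risky, can be illustrated with a short sketch. CGI.escape comes from Ruby's standard library; the helper names encoded_login and dashed_login, and the choice of "safe" characters, are assumptions made for this example.

```ruby
require 'cgi'

# Option 1: percent-encode the login so reserved characters survive the router.
def encoded_login(login)
  CGI.escape(login)
end

# Option 2: replace anything outside a conservative safe set with a dash.
def dashed_login(login)
  login.gsub(/[^A-Za-z0-9_-]/, '-')
end

encoded_login('foo/bar')  # "foo%2Fbar"
dashed_login('foo/bar')   # "foo-bar"

# The dash approach maps two distinct logins onto the same URL component.
dashed_login('foo/bar') == dashed_login('foo-bar')  # true
```

Percent-encoding is reversible (CGI.unescape restores the original login), whereas the dash substitution loses information, which is why it needs some extra rule to break ties.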
If you choose to allow certain reserved characters in your usernames, but disallow those people vanity URLs, you will have to modify the user_name_valid_for_short_url? method like so:
def user_name_valid_for_short_url?(login)
  # Invalid if the login contains a period OR collides with a reserved route.
  not (login.include?('.') or FancyUrls.cached_routes.include?(login))
end
This example allows users to have periods in their login, but disallows those users their vanity URLs.
With our vanity routes defined, we can implement the user_profile_url method:
module FancyUrls
  def user_profile_url(person, options={})
    login = login_for_user(person)
    raise ArgumentError, "No such user #{person.inspect}" unless login
    if user_name_valid_for_short_url?(login) then
      # The short_profile route is ':login', so the login goes under that key.
      short_profile_url options.merge(:login => login)
    else
      long_profile_url options.merge(:id => person)
    end
  end
  private
  def login_for_user(user_or_id)
    return (if user_or_id.is_a?(User) then
      user_or_id.login
    else
      # fetch returns the cached login, or runs the block and caches its result.
      Rails.cache.fetch("login:#{user_or_id}") { User.find_by_id(user_or_id, :select => 'login').try(:login) }
    end)
  end
end
The method is simple enough: We check if the user can have a vanity URL, and if so, we return it; otherwise we return the standard profile URL. I included two small optimizations: We cache the login to avoid database lookups with each method call, and we only select the fields we care about from our users table.
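Stripped of Rails, caching, and named routes, the decision boils down to the following sketch. Everything here is a stand-in invented for illustration: the Member struct, the reserved list, and the short_url_allowed? and profile_path names.

```ruby
# Minimal stand-ins for the real model and reserved list (both hypothetical).
Member = Struct.new(:id, :login)
RESERVED_FOR_ROUTES = %w[documents ratings users]

# A login qualifies only if it has no period and does not collide with a route.
def short_url_allowed?(login)
  !(login.include?('.') || RESERVED_FOR_ROUTES.include?(login))
end

# Return the vanity path when allowed, otherwise the long fallback path.
def profile_path(member)
  if short_url_allowed?(member.login)
    "/#{member.login}"
  else
    "/users/#{member.id}"
  end
end

profile_path(Member.new(1, 'riscfuture'))  # "/riscfuture"
profile_path(Member.new(2, 'ratings'))     # "/users/2"
profile_path(Member.new(3, 'j.doe'))       # "/users/3"
```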
And with that, we've got our URLs! Simply include your module as a helper and call user_profile_url to generate profile URLs as opposed to url_for or the named resource routes or whatever else you might have been using.
We're not quite done yet, though. What happens when a user who haplessly registered the username "ratings" gets screwed because we just launched our ratings feature? With the system I've shown above, the moment we deploy our new feature, any links to that user's profile page would automatically revert to the normal profile URLs.
Good web practice teaches us that when we change the URL for a resource, we should respond with a 301 to any client that tries to access the old URL. Obviously, since the /ratings URL now points to a different web page, we can't do that. Any users who visit external web pages and click a link to that user's profile URL will find themselves on your brand new ratings page. I have implemented no particular fix for this problem, as I believe most websites add very, very few controllers and named routes in comparison to the number of users they have. In other words, the problem is small enough that it's probably not worth solving.
We can solve the flip side of this problem, though: Once a website launches its vanity URL feature, there will still be bunches of external links to the old, longer profile URLs. We can respond to these requests with 301s to inform people that those links are now outdated. This also helps with SEO, getting people's new profile URLs into the Google index and the old ones out of it.
We do this by including code in the profile page's controller action to redirect if necessary:
class UsersController < ApplicationController
  def show
    if params[:id] then
      # Long URL: redirect permanently if this user qualifies for a vanity URL.
      @user = User.find(params[:id])
      return head(:moved_permanently, :location => user_profile_url(@user)) if user_name_valid_for_short_url?(@user.login)
    elsif params[:login] then
      # Short URL: look the user up by login.
      @user = User.with_login(params[:login]).first or raise(ActiveRecord::RecordNotFound)
    else
      raise ActiveRecord::RecordNotFound
    end
  end
end
We have this if statement at the start of our show method because the method is doing double duty: It responds to both the long_profile and short_profile named routes. In the former, the variable portion of the URL is stored in the id parameter; in the latter, the login parameter. You could of course opt to dispatch the two URLs to two separate actions; either way, make sure you respond to unnecessarily long profile URLs with a 301.
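The branching can also be exercised outside a controller. The sketch below models it as a plain function that returns what the action should do; the Person struct, the in-memory PEOPLE list, and the names shortenable? and resolve_profile are all hypothetical, introduced only for this example.

```ruby
# Hypothetical stand-ins for the model, the user table, and the reserved list.
Person = Struct.new(:id, :login)
PEOPLE = [Person.new(1, 'riscfuture'), Person.new(2, 'ratings')]
TAKEN_BY_ROUTES = %w[documents ratings users]

def shortenable?(login)
  !TAKEN_BY_ROUTES.include?(login)
end

# Returns [:redirect, path], [:render, person], or [:not_found].
def resolve_profile(params)
  if params[:id]
    person = PEOPLE.find { |p| p.id == params[:id].to_i }
    return [:not_found] unless person
    # Long URL requested for someone who has a vanity URL: answer with a 301.
    return [:redirect, "/#{person.login}"] if shortenable?(person.login)
    [:render, person]
  elsif params[:login]
    person = PEOPLE.find { |p| p.login == params[:login] }
    person ? [:render, person] : [:not_found]
  else
    [:not_found]
  end
end

resolve_profile({ :id => '1' })  # [:redirect, "/riscfuture"]
resolve_profile({ :id => '2' })  # [:render, ...] since "ratings" has no vanity URL
```

Only the long route ever redirects; a request that arrives on the short route is already at its canonical address.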
And with that, you've got your vanity URLs. All it comes down to is a little bit of route-foo and some speed optimizations here and there. The solution here is tailored to the needs of Scribd; I've done my best to outline those needs and how they impacted our code. You should think about how you want to do vanity URLs on your website and take this code as a guide to implementing your own solution. Vanity URLs take a little extra time to implement, but in return you are rewarded with users who are more willing to share their profile pages, improved SEO, and that glowy feeling you get when you increase your site's Web 2.0-ishness.