08May / 2011

Clean Up Your Project

This post was written by Sam Soffes, an iOS developer at Scribd, and originally posted on his blog here.

Many of the apps I work on are 100% custom. There are rarely any system UI components visible to the user. Styling the crap out of apps like this makes for tons of images in my iOS projects to get everything the way the designer wants. I’m starting to drawRect: stuff more these days because it makes it easier to reuse, but anyway.

There are literally hundreds of images in the Scribd app I’ve been working on. Designers changing their mind plus everything custom leaves a lot of images behind that are no longer used. Our application was starting to be several megs and a lot of it was unused images. So… being the programmer I am, I wrote a script.

desc 'Remove unused images'
task :clean_assets do
  require 'set'

  all = Set.new
  used = Set.new
  unused = Set.new

  # White list
  used.merge %w{Icon Icon-29 Icon-50 Icon-58 Icon-72 Icon-114}

  # Escape the dot so only a literal ".png" extension matches.
  regex = /\[UIImage imageNamed:@"([a-zA-Z0-9\-_]+)\.png"\]/
  Dir.glob('Classes/*.m').each do |path|
    # File.read closes the file handle for us.
    used.merge File.read(path).scan(regex).flatten
  end

  Dir.glob('Resources/Images/*.png').each do |path|
    next if path.include? '@2x.png'
    all << path.gsub(/Resources\/Images\/([a-zA-Z0-9\-_]+)\.png/, "\\1")
  end

  unused = all - used
  unused.each do |key|
    `rm -f Resources/Images/#{key}.png Resources/Images/#{key}@2x.png`
  end

  puts "#{all.length} total found"
  puts "#{used.length} used found"
  puts "#{unused.length} deleted"
end


It basically searches all of your source files for references like [UIImage imageNamed:@"image_name_here.png"]. Then it looks at all of the images on disk and removes any you didn’t reference. I set up a whitelist for icons and other images I don’t reference directly. You might need to tweak the paths a bit to work for your setup.
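
To preview what the task would delete before running it, the same set arithmetic can be pulled into a plain function. This is a minimal sketch, not part of the original task: the unused_images helper and its arguments are hypothetical names, and it works on strings in memory so nothing on disk is touched.

```ruby
require 'set'

# Hypothetical helper (not part of the original rake task): given the text of
# your source files, the image base names on disk, and a whitelist, return the
# names that nothing references.
IMAGE_REFERENCE = /\[UIImage imageNamed:@"([a-zA-Z0-9\-_]+)\.png"\]/

def unused_images(sources, image_names, whitelist = [])
  used = Set.new(whitelist)
  sources.each { |source| used.merge(source.scan(IMAGE_REFERENCE).flatten) }
  Set.new(image_names) - used
end

sources = ['self.image = [UIImage imageNamed:@"logo.png"];']
puts unused_images(sources, %w[logo old_banner Icon], %w[Icon]).to_a.inspect
# ["old_banner"]
```

Printing the result first and deleting second is a cheap safeguard, since images referenced in other ways (nibs, string formatting) will not be caught by the regex.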

Hopefully this little rake task helps someone clean up their project too.


08May / 2011

How to Drastically Improve Your App with an Afternoon and Instruments

This post is by Sam Soffes, an iOS engineer at Scribd, and was originally posted on his blog here

Recently I managed to make the Scribd iOS application way better with some simple tweaks. I wanted to write a quick post about what I did that really helped that will probably help most people. This stuff is a bit application specific, but I think you’ll see parallels to your application.

Symptoms

The Scribd application pulls a ton of data from the network and puts it in Core Data when you log in for the first time. From using the application, I noticed that performance totally sucks at first and then goes back to normal. (My table views all scroll at 60fps, but I’ll save that for another post. Sorry. Had to throw that in there. I’m way proud.) This was troubling since it usually works really great (okay, now I’m done bragging about my cells), so I investigated.

Just so you know, I am doing all of my networking, data parsing, and insertion into Core Data on background threads via NSOperationQueue.

The Problems

After running Instruments with the object allocations instrument, I noticed that I was using about 22MB of memory while it was downloading all of this data. In my opinion, that is way too high. I’ll add that to the list of stuff to mess with.

I also noticed that my NSDate category for parsing ISO8601 date strings (standard way to put a date into JSON) was taking about 7.4 seconds according to the time profiler instrument. Totally unacceptable. Added to the list.

After messing around for a little while longer, I noticed that a lot of time was being spent in one of my NSString categories, specifically in NSRegularExpression. This sounds annoying, so I’ll save that for last.

The Solutions

Memory

I had a few guesses on how to cut memory usage while converting large amounts of JSON strings into NSManagedObjects. My best guess was that a ton of objects needed to be autoreleased but the NSAutoreleasePool wasn’t being drained until the operation finished. The simple solution for this is to add a well-placed NSAutoreleasePool around the problem code. This took a few tries to get in the right spot. I would put it where I thought most of the temporary objects were being created and then watch the object allocations instrument to make sure it got flatter.

Here was my first try:

First Try

See how it goes up and drops sharply down a bit and then builds up for awhile then finally drops off? That’s a sign there is another loop nested deeper down that should have a pool around it. For the first one, it did a little and then drained (probably because it did less stuff in that operation). Since the second giant hump (note the peak of that is 23MB or so) doesn’t drop off for awhile, I know to look for another loop deeper down. Hopefully that makes sense. Once you get in there, it will suddenly hit you after stumbling around for a bit. You’ll see.

After moving it to a more nested loop, here’s the result:

Second Try

Once I got it in the right spot, it was using under 2MB of memory for the entire process! Score! Next problem.

Date Stuff

The date stuff had me stumped for a while. I was using ISO8601Parser (a subclass of NSFormatter) which was working really, really well compared to NSDateFormatter. After looking at the time profiler instrument, I saw that most of that time was spent in system classes like NSCFCalendar. I assumed there was a better way. I tried switching back to NSDateFormatter, but that didn’t work well and still wasn’t great memory- and speed-wise.

As a disclaimer, I am all about Objective-C. I love it. I’m not one of those engineers who says “hey, we should rewrite this in C” all the time, but hey, we should rewrite this in C. I did… and the result was astounding!

Here’s the code:





#include <time.h>

+ (NSDate *)dateFromISO8601String:(NSString *)string {
    if (!string) {
        return nil;
    }
    
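    // Zero the struct so fields strptime does not set are not left as garbage.
    struct tm tm = {0};
    time_t t;

    // strptime returns NULL when the string does not match the format.
    if (strptime([string cStringUsingEncoding:NSUTF8StringEncoding], "%Y-%m-%dT%H:%M:%S%z", &tm) == NULL) {
        return nil;
    }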
    tm.tm_isdst = -1;
    t = mktime(&tm);
    
    return [NSDate dateWithTimeIntervalSince1970:t + [[NSTimeZone localTimeZone] secondsFromGMT]];
}


- (NSString *)ISO8601String {
    struct tm *timeinfo;
    char buffer[80];

    time_t rawtime = [self timeIntervalSince1970] - [[NSTimeZone localTimeZone] secondsFromGMT];
    timeinfo = localtime(&rawtime);

    strftime(buffer, 80, "%Y-%m-%dT%H:%M:%S%z", timeinfo);
    
    return [NSString stringWithCString:buffer encoding:NSUTF8StringEncoding];
}



See, it’s not too crazy. Using the C date stuff took my date parsing from 7.4 seconds to 300ms. Talk about a performance boost! (I updated SSToolkit’s NSDate category to use this new code.)

Regular Expression

I have several NSString categories in my application for doing various things. Some of them were called throughout the process I was trying to optimize. I drilled down in the time profiler instrument and realized that [NSRegularExpression regularExpressionWith...] was taking a ton of the time. This totally makes sense, since it compiles your regex to use later and I was doing it each time. Simple solution:

- (NSString *)camelCaseString {
    static NSRegularExpression *regex = nil;
    if (!regex) {
        regex = [[NSRegularExpression alloc] initWithPattern:@"(?:_)(.)" options:0 error:nil];
    }
    
    // Use regex...
    
    return string;
}


This was actually the easiest part 🙂

Conclusions

So using Instruments to track down slow or bad code is really easy once you get the hang of it. Start with the leaks instrument if you’re new. You shouldn’t have any (known) leaks in your application.

Once you get that down (or get so frustrated trying to track it down you give up and move to something else) do the object allocations instrument next. You can watch the graph and see how many objects you have alive. If you see a big spike that never goes down, you most likely have a ton of memory around that you probably don’t need but still have a reference to so it doesn’t show up in leaks. Adding autorelease pools around loops that do lots of processing always helps.

Finally, use the time profiler instrument to see what’s taking a long time and optimize the crap out of it. This is the most fun since it’s easy to see what’s happening and how much of an improvement your changes made. The key to making this instrument useful is the checkboxes on the left. Turning on Objective-C only or toggling the inverted stack tree is really useful.

This is Hard

Don’t feel bad, especially if you’re new to this. This stuff is hard. All of my solutions I listed above are pretty simple. I spent almost an entire day coming up with those few things. The majority of the time you spend will be tracking down problems. Fixing them is usually pretty simple, especially after you’ve done it a few times. This is hard. You’re smart. 🙂


01Sep / 2010

Vanity Profile URLs in Rails

One feature shared by many social networking sites is "vanity" short profile URLs. My Twitter page could have easily been the RESTfully predictable http://twitter.com/users/riscfuture, but thanks to short profile URLs it is http://twitter.com/riscfuture.

Even Facebook got in the game recently with their "Facebook Usernames" feature. Of course, in classic Facebook style, getting the vanity URL is a multi-step process with an application and the associated land-grab. At Scribd I kept it a little simpler, and I'm assuming you'd like to keep it simple for your Rails website as well.

In order for this system to work, we're going to have to lay down a few ground rules:

No user whose username conflicts with a controller name can have a short URL. You can't sign up on Scribd with the username "documents" and prevent anyone from seeing their document list.
No user whose username conflicts with another defined route can have a short URL. Remember that the routes file defines named or custom routes and resources, but with the default routes, normal controllers do not need an entry in that file.
Users with reserved characters in their names must have these characters escaped or dealt with. If I sign up with the username "foo/bar", that slash can't be left unescaped, or the router will misunderstand the address.
Usernames must be case-insensitively unique. Every browser expects scribd.com/foo to be the same as scribd.com/FOO.
Any user who cannot be given a short URL for the above reasons must have a fallback URL. This is where you fall back to your less pretty /users/123 URL. (Or perhaps /users/123-foo-bar for SEO purposes.)

Note that it's not enough to simply build a list of your controllers and stick them in a validates_exclusion_of validation. You want to be able to claim new routes for yourself even if users have already signed up with conflicting logins, and gracefully revert those users to a fallback profile URL.

Ultimately the question we need to answer is this: Given a user name, will a vanity URL conflict with an existing route? There are a lot of really hard ways of going about this, many of which will break over time. I opted to go with a reliable (if somewhat slow) way of doing this: I build a list of known routes, strip them down to their first path component, then build an array of these reserved names. A known route might be, for instance, /documents/:id; its first path component is "documents." Thus, a user whose login is "documents" cannot have a vanity URL.
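
As a rough illustration of the idea, independent of Rails, here is a minimal sketch that reduces route paths to their first path component. The route strings and the first_path_component and reserved_names helpers are made up for the example; they are not the code Scribd runs.

```ruby
# Hypothetical helper: "/documents/:id" -> "documents"
def first_path_component(path)
  path.sub(%r{\A/}, '').split('/').first.to_s.sub(/\.:format\z/, '')
end

# Build the reserved list, skipping components that are parameters (":login")
# or wildcards ("*path"), since those do not reserve a fixed name.
def reserved_names(paths)
  paths.map { |path| first_path_component(path) }
       .reject { |name| name.empty? || name.start_with?(':', '*') }
       .uniq
end

routes = ['/documents/:id', '/documents/:id/edit', '/ratings.:format', '/:login', '/']
reserved = reserved_names(routes)
puts reserved.inspect                 # ["documents", "ratings"]
puts reserved.include?('documents')   # true, so no vanity URL for "documents"
```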

There are some points to note for this system:

You'll get a few false positives. If /documents/:id is a valid route, but /documents is not (say you had no index action), this system would still disallow a user named "documents". You can easily solve this by tweaking the code below, though.
No attention is paid to HTTP methods. Theoretically, if you had a route like /upload whose only acceptable method is POST, you could still use GET /upload to refer to a user named "upload". I have intentionally avoided doing this, however; good web design dictates that varying the HTTP method of a request only varies the manner in which you interact with the resource represented by the URL; a single URL should represent the same resource regardless of which method is used in the request.

In order to eke speed out wherever we can, we generate the list of reserved routes once, at launch, and cache it for the lifetime of the process. We do this in a module in lib/:

 
module FancyUrls
  def self.generate_cached_routes
    # Find all routes we have, take the first part (/xxx/) and remove some unwanted ones
    @cached_routes = ActionController::Routing::Routes.routes.map do |route|
      segs = route.segments.inject("") { |str, s| str << s.to_s }
      segs.sub! /^\/(.*?)\/.*$/, '\\1'
 
      # Some routes accept a :format parameter (ratings.:format).
      segs.sub! /\.:format$/, ''
      segs
    end
 
    # All possible controllers for /:controller/:action/:id route
    @cached_routes += ActionController::Routing.possible_controllers.map do |c|
      # Use only the first path component for controllers with multiple path components
      c.sub /^(.*?)\/.*$/, '\\1'
    end
    @cached_routes.uniq!
    # Remove routes whose first path component is a variable or wildcard
    @cached_routes.reject! { |route| route.starts_with?(':') or route.starts_with?('*') }
    # Remove the root route.
    @cached_routes.delete '/'
  end
 
  def self.cached_routes
    @cached_routes
  end
end

The top method combines two arrays: the first, a list of routes from the defined routes, and the second, a list of the app's controllers. It then filters out some non-applicable routes and stores the list in an instance variable. The list consists of only the first path component of a route.

The method is called generate_cached_routes because it's called when the server process starts, as part of the environment.rb file. The cached results are accessed with the cached_routes method.
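
The generate-once, read-many pattern can be seen in isolation in the sketch below. It is plain Ruby with a hypothetical ReservedNames module and a hard-coded list standing in for the route scan; the counter exists only to show that the expensive step runs a single time.

```ruby
# Hypothetical stand-in for FancyUrls: the expensive scan runs once, at boot,
# and every later lookup reads the cached array.
module ReservedNames
  @generations = 0

  class << self
    attr_reader :cached, :generations

    def generate
      @generations += 1
      # Placeholder for walking the real routing table.
      @cached = %w[documents ratings upload].freeze
    end
  end
end

ReservedNames.generate                       # call once at process start
3.times { ReservedNames.cached.include?('ratings') }
puts ReservedNames.generations               # 1
```

The trade-off is that routes added while the process is running are not picked up until the next restart, which is usually fine because routes only change on deploy.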

So given this method, how do we test if a user is eligible for URL "vanitization?" It's simple:

 
module FancyUrls
  def user_name_valid_for_short_url?(login)
    not FancyUrls.cached_routes.include?(login)
  end
end

The method is simple: If the user's name is in our list of reserved routes, then it's not valid for URL shortening. Easy peasy.

So now we can reasonably quickly determine whether or not a user gets a vanity profile URL. The next step is to write a user_profile_url method that, given a user, returns either the vanity or full profile URL, as appropriate. To do this, first we will need to add our vanity URLs to the bottom of our routes.rb file:

 
# Install the non-vanity user profile route above the vanity route so people
# who don't have shortenable logins can still have a URL to their profile page.
map.long_profile 'users/:id', :controller => 'users', :action => 'show', :conditions => { :method => :get }
# Install the vanity user profile route above the default routes but below all
# resources.
map.short_profile ':login', :controller => 'users', :action => 'show', :conditions => { :method => :get }
 
# Install the default routes as the lowest priority.
map.connect ':controller/:action/:id'
map.connect ':controller/:action/:id.:format'

What's going on here? Well, at the very bottom of the routes.rb file, we are installing the old Rails standby, the :controller/:action routes. Newer Rails ideology is often to leave these routes out, so adjust your routes file as appropriate. Above those routes, but otherwise of the lowest priority, is our vanity route. Anywhere above that route is our traditional profile URL. (If you have a RESTful users controller, you could of course replace the top route with a resources call.)

At first glance there's a chicken-and-egg problem: We're checking if a user is "vanitizable" using the routes file, but now the routes file contains the vanity URL route. We solved this problem earlier in the generate_cached_routes method:

 
# Remove routes whose first path component is a parameter or wildcard
@cached_routes.reject! { |route| route.starts_with?(':') or route.starts_with?('*') }

This line of code filters out any routes that start with a parameter or wildcard, among them the short_profile named route.

With the routes squared away, we move on to the problem of users with logins containing reserved characters. RFC 1738 defines what characters must be encoded in a URL:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

Characters aside from these in usernames must either be encoded or otherwise dealt with. Beyond RFC 1738, we should additionally consider the dollar sign and plus characters ("$" and "+") reserved because they often serve special roles in URLs as well. And because this is a Rails app, we should consider the period (".") reserved as well, as it is used by Rails to indicate the format parameter.

So if a user has any reserved character in his login, what do we do? The obvious solution is to percent-encode it, creating a string like "foo%2Fbar", but some might find that ugly. You could also replace these characters with dashes (or some other stand-in character), creating "foo-bar", but then you run into trouble if someone actually signs up with the username "foo-bar". If you're making a new website, you may opt to disallow these characters from usernames. At Scribd we use a combination of approaches: Some reserved characters (like spaces) are simply not allowed in usernames; others are allowed but by using one of these characters you "give up" your vanity URL, instead using the fallback profile URL.
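
To make the trade-off concrete, here is a small sketch using Ruby's standard library. URI.encode_www_form_component is just one convenient way to percent-encode; the dash substitution is done with a hypothetical dasherize_login helper.

```ruby
require 'uri'

# Percent-encoding keeps the login unambiguous, at the cost of looks.
puts URI.encode_www_form_component('foo/bar')   # foo%2Fbar

# Hypothetical helper: swap reserved characters for dashes.
def dasherize_login(login)
  login.gsub(%r{[/.$+ ]}, '-')
end

# Prettier, but two different logins now collide on the same URL segment.
puts dasherize_login('foo/bar') == dasherize_login('foo-bar')   # true
```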

If you choose to allow certain reserved characters in your usernames, but deny those users vanity URLs, you will have to modify the user_name_valid_for_short_url? method like so:

 
def user_name_valid_for_short_url?(login)
  # Invalid if the login contains a period OR collides with a reserved route.
  not (login.include?('.') or FancyUrls.cached_routes.include?(login))
end

This example allows users to have periods in their login, but disallows those users their vanity URLs.
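
If more than one character should cost a user their vanity URL, the same check generalizes to a character whitelist. The sketch below is an assumption about how that might look, not Scribd's code: SAFE_LOGIN and the reserved argument are hypothetical, and the allowed set is deliberately conservative (alphanumerics, dash, and underscore).

```ruby
# Hypothetical pattern: only characters that are safe in a path segment and
# carry no special meaning to Rails.
SAFE_LOGIN = /\A[a-zA-Z0-9_-]+\z/

# `reserved` stands in for FancyUrls.cached_routes.
def short_url_allowed?(login, reserved)
  !!(login =~ SAFE_LOGIN) && !reserved.include?(login)
end

reserved = %w[documents ratings]
puts short_url_allowed?('riscfuture', reserved)   # true
puts short_url_allowed?('foo.bar', reserved)      # false (contains a period)
puts short_url_allowed?('documents', reserved)    # false (reserved route)
```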

With our vanity routes defined, we can implement the user_profile_url method:

 
module FancyUrls
  def user_profile_url(person, options={})
    login = login_for_user(person)
    raise ArgumentError, "No such user #{person.inspect}" unless login
 
    if user_name_valid_for_short_url?(login) then
      # The short_profile route's dynamic segment is :login, not :id.
      short_profile_url options.merge(:login => login)
    else
      long_profile_url options.merge(:id => person)
    end
  end
 
  private
 
  def login_for_user(user_or_id)
    return (if user_or_id.is_a?(User) then
      user_or_id.login
    else
      Rails.cache.fetch("login:#{user_or_id}") { User.find_by_id(user_or_id, :select => 'login').try(:login) }
    end)
  end
end

The method is simple enough: We check if the user can have a vanity URL, and if so, we return it; otherwise we return the standard profile URL. I included two small optimizations: We cache the login to avoid database lookups with each method call, and we only select the fields we care about from our users table.

And with that, we've got our URLs! Simply include your module as a helper and call user_profile_url to generate profile URLs as opposed to url_for or the named resource routes or whatever else you might have been using.

We're not quite done yet, though. What happens when a user who haplessly registered the username "ratings" gets screwed because we just launched our ratings feature? With the system I've shown above, the moment we deploy our new feature, any links to that user's profile page would automatically revert to the normal profile URLs.

Good web practice teaches us that when we change the URL for a resource, we should respond with a 301 to any client that tries to access the old URL. Obviously, since the /ratings URL now points to a different web page, we can't do that. Any users who visit external web pages and click a link to that user's profile URL will find themselves on your brand new ratings page. I have implemented no particular fix for this problem, as I believe most websites add very, very few controllers and named routes in comparison to the number of users they have. In other words, the problem is small enough that it's probably not worth solving.

We can solve the flip side of this problem, though: Once a website launches its vanity URL feature, there will still be bunches of external links to the old, longer profile URLs. We can respond to these requests with 301s to inform people that those links are now outdated. This also helps with SEO, getting people's new profile URLs on the Google index and getting the old ones off.

We do this by including code in the profile page's controller action to redirect if necessary:

 
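class UsersController < ApplicationController
  def show
    if params[:id] then
      @user = User.find(params[:id])
      # The validity check takes a login string, not a User.
      return head(:moved_permanently, :location => user_profile_url(@user)) if user_name_valid_for_short_url?(@user.login)
    elsif params[:login] then
      # "or" binds loosely enough for raise to take an argument without parentheses.
      @user = User.with_login(params[:login]).first or raise ActiveRecord::RecordNotFound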
    else
      raise ActiveRecord::RecordNotFound
    end
  end
end

We have this if statement at the start of our show method because the method is doing double duty: It responds to both the long_profile and short_profile named routes. In the former, the variable portion of the URL is stored in the id parameter; in the latter, the login parameter. You could of course opt to dispatch the two URLs to two separate actions; either way, make sure you respond to unnecessarily long profile URLs with a 301.
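
Stripped of Rails, the decision the action makes looks roughly like the sketch below. Everything here is hypothetical scaffolding (the User struct, the profile_response function, the in-memory user list, and the reserved list); it only illustrates which requests get a 301 and which get rendered.

```ruby
User = Struct.new(:id, :login)

USERS    = [User.new(1, 'riscfuture'), User.new(2, 'ratings')]
RESERVED = %w[documents ratings]

# Returns [status, location]: 301 when a long URL has a vanity equivalent,
# 200 when the requested URL is already the canonical one, 404 otherwise.
def profile_response(params)
  if params[:id]
    user = USERS.find { |u| u.id == params[:id].to_i }
    return [404, nil] unless user
    return [301, "/#{user.login}"] unless RESERVED.include?(user.login)
    [200, "/users/#{user.id}"]
  elsif params[:login]
    user = USERS.find { |u| u.login == params[:login] }
    user ? [200, "/#{user.login}"] : [404, nil]
  else
    [404, nil]
  end
end

p profile_response(:id => '1')         # [301, "/riscfuture"]
p profile_response(:id => '2')         # [200, "/users/2"]
p profile_response(:login => 'nobody') # [404, nil]
```

User 2 keeps the long URL because the login "ratings" collides with a reserved route, so there is nothing shorter to redirect to.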

And with that, you've got your vanity URLs. All it comes down to is a little bit of route-foo and some speed optimizations here and there. The solution here is tailored to the needs of Scribd; I've done my best to outline those needs and how they impacted our code. You should think about how you want to do vanity URLs on your website and take this code as a guide to implementing your own solution. Vanity URLs take a little extra time to implement, but in return you are rewarded with users who are more willing to share their profile pages, improved SEO, and that glowy feeling you get when you increase your site's Web 2.0-ishness.


coding@scribd

Posted by: Scribd