This is the second of a four-part series on the technology behind Scribd’s HTML viewing experience. You might like to read part 1, “Facing Fonts in HTML” and part 3, “Repolygonizing Fonts,” if you haven’t already. Part 4, “Plan B: Font Fallbacks” is coming soon.
A document page, unlike an image, isn’t really just a two-dimensional thing.
It’s not until you’ve been forced to dig into the internals of the PDF format that you come to appreciate the rich structure an innocent looking document page gives you. Vector fills, gradient patterns and semi-transparent bitmaps fight over dominance in the z-order stack, while clip-polygons slice away whole portions of the page, only to be faded into the background by an omnipotent hierarchical transparency group afterwards. So how does one convert this multitude of multi-layer objects into an html page, which is basically nothing more than a background image with a bit of text on top?
To understand the problem better, here’s a stacked diagram of a page:

At the bottom of the stack we have a bitmap (drawn first), then some text, followed by vector graphics, and finally another block of text on top. We don’t currently support vector graphics in HTML (stay tuned …); instead, we convert polygons to images which presents us with the challenge of finding a z-order of bitmaps and text elements that preserves the drawing order of the original page, while also simplifying the structure.
An optimal solution of transforming the above document page into a bitmap/text layering might look like this:

You see that here we merged two images into one even though they were not adjacent in the rendering stack, by using the fact that the text between the two images didn’t intersect with both of them.
This was a simple case where it’s actually enough to put one solitary bitmap at the background of a page. It also may happen that you have to put transparent images on top of the text (i.e, give them an higher z-index value). Notice that this requires the IE6 transparency hack.
In order to figure out whether or not an element on the page shares display space (i.e., pixels) with another element, we keep a boolean bitmap around during conversion:
![]() |
![]() |
| Element on page | Corresponding boolean bitmap |
This bitmap tells us which regions of a page currently have been drawn by e.g. polygon operations, and thus which pixels need to be checked against new text objects in preceding layers. In fact, we actually keep two of those bitmaps around; one for keeping track of the area currently occupied by the next bitmap layer we’re going to add to the display stack, and one for keeping track of the same thing for text objects.
There’s an interesting fact about this approach: As long as the things drawn so far onto a bitmap and the html text fields don’t overlap, we’re free to chose the order we draw them.
In other words, it’s not until we’ve drawn the first object intersecting with another layer that we decide which of those two layers to dump out to the page first.
Here’s an example of a document page being rendered step by step:
The background image is put on the lowermost layer. Notice that the background also contains graphical elements for the equations on this page:

This is the text layer, consisting of normal HTML markup using fonts extracted from the document:

The two layers are combined to produce the final output:

Take a look at the actual technical paper as converted. And of course, if you want to see your own docs htmlified, just upload them to Scribd.com!
—Matthias Kramm
Next: Repolygonizing Fonts




