This is the second of a four-part series on the technology behind Scribd’s HTML viewing experience. You might like to read part 1, “Facing Fonts in HTML” and part 3, “Repolygonizing Fonts,” if you haven’t already. Part 4, “Plan B: Font Fallbacks” is coming soon.
A document page, unlike an image, isn’t really just a two-dimensional thing.
It’s not until you’ve been forced to dig into the internals of the PDF format that you come to appreciate the rich structure an innocent-looking document page gives you. Vector fills, gradient patterns and semi-transparent bitmaps fight for dominance in the z-order stack, while clip-polygons slice away whole portions of the page, only to be faded into the background by an omnipotent hierarchical transparency group afterwards. So how does one convert this multitude of multi-layer objects into an HTML page, which is basically nothing more than a background image with a bit of text on top?
To understand the problem better, here’s a stacked diagram of a page:
At the bottom of the stack we have a bitmap (drawn first), then some text, followed by vector graphics, and finally another block of text on top. We don’t currently support vector graphics in HTML (stay tuned …); instead, we convert polygons to images. That presents us with the challenge of finding a z-order of bitmaps and text elements that preserves the drawing order of the original page while also simplifying the structure.
An optimal solution of transforming the above document page into a bitmap/text layering might look like this:
Here we merged two images into one even though they were not adjacent in the rendering stack. This works because the text between the two images didn’t intersect with both of them.
This was a simple case where it’s actually enough to put one solitary bitmap in the background of a page. It may also happen that you have to put transparent images on top of the text (i.e., give them a higher z-index value). Note that this requires the IE6 transparency hack.
In order to figure out whether or not an element on the page shares display space (i.e., pixels) with another element, we keep a boolean bitmap around during conversion:
[Figure: an element on the page (left) and its corresponding boolean bitmap (right)]
This bitmap tells us which regions of a page have been drawn so far (e.g. by polygon operations), and thus which pixels need to be checked against new text objects in preceding layers. In fact, we keep two of those bitmaps around: one to track the area currently occupied by the next bitmap layer we’re going to add to the display stack, and one to track the same thing for text objects.
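As a rough illustration of such a coverage mask, here is a minimal Python sketch. The class name `CoverageBitmap`, its methods, and the use of axis-aligned rectangles in place of real polygon or glyph outlines are all assumptions made for this example, not the converter's actual code.

```python
class CoverageBitmap:
    """One boolean per page pixel: True where something has been drawn.

    Hypothetical helper for illustration; elements are approximated by
    half-open rectangles (x0, y0, x1, y1).
    """

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.rows = [[False] * width for _ in range(height)]

    def _clip(self, x0, y0, x1, y1):
        # Keep the rectangle inside the page so indexing never goes out of range.
        return max(0, x0), max(0, y0), min(self.width, x1), min(self.height, y1)

    def mark(self, x0, y0, x1, y1):
        """Record that the given rectangle has been drawn."""
        x0, y0, x1, y1 = self._clip(x0, y0, x1, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                self.rows[y][x] = True

    def overlaps(self, x0, y0, x1, y1):
        """True if any pixel of the rectangle is already marked."""
        x0, y0, x1, y1 = self._clip(x0, y0, x1, y1)
        return any(self.rows[y][x]
                   for y in range(y0, y1)
                   for x in range(x0, x1))

    def clear(self):
        """Forget everything, e.g. after a layer has been written out."""
        for row in self.rows:
            row[:] = [False] * self.width


# Two masks, as described above: one for the pending bitmap layer,
# one for the pending text layer.
image_mask = CoverageBitmap(100, 100)
text_mask = CoverageBitmap(100, 100)
image_mask.mark(10, 10, 50, 50)
```

With the masks in this state, a new text box at `(40, 40, 60, 60)` would be reported as overlapping the pending bitmap layer, while one at `(50, 50, 60, 60)` would not, because the rectangles are half-open.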
There’s an interesting fact about this approach: As long as the things drawn so far onto a bitmap and the HTML text fields don’t overlap, we’re free to choose the order in which we draw them.
In other words, it’s not until we’ve drawn the first object intersecting with another layer that we decide which of those two layers to dump out to the page first.
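The deferred decision described here can be sketched in a few lines of Python. This is one possible reading of the idea, not the converter's actual algorithm: the function `plan_layers`, the `(kind, name, box)` tuples, and the use of pixel sets as stand-ins for the two boolean bitmaps are all hypothetical. Consecutive layers of the same kind are merged when they are written out, which is also an assumption of this sketch.

```python
from itertools import product


def pixels(box):
    """Set of pixels covered by a half-open box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return set(product(range(x0, x1), range(y0, y1)))


def plan_layers(ops):
    """Turn drawing operations into a bottom-to-top list of layers.

    ops: iterable of (kind, name, box) in drawing order, where kind is
    'image' or 'text'. Returns a list of [kind, [names]] entries.
    """
    layers = []
    pending = {"image": [], "text": []}        # drawn, but not yet written out
    covered = {"image": set(), "text": set()}  # stand-ins for the two bitmaps

    def flush(kind):
        if not pending[kind]:
            return
        if layers and layers[-1][0] == kind:
            # Same kind as the topmost written layer: merge instead of stacking.
            layers[-1][1].extend(pending[kind])
        else:
            layers.append([kind, list(pending[kind])])
        pending[kind].clear()
        covered[kind].clear()

    for kind, name, box in ops:
        other = "text" if kind == "image" else "image"
        area = pixels(box)
        if area & covered[other]:
            # First real overlap: the other pending layer must go underneath.
            flush(other)
        pending[kind].append(name)
        covered[kind] |= area

    # Whatever is left does not overlap, so either order is valid;
    # write the kind that merges with the topmost layer first.
    first = layers[-1][0] if layers else "image"
    flush(first)
    flush("text" if first == "image" else "image")
    return layers


page = [
    ("image", "bitmap", (0, 0, 10, 10)),
    ("text", "text1", (1, 1, 5, 3)),
    ("image", "vector", (0, 6, 10, 9)),
    ("text", "text2", (2, 7, 6, 8)),
]
result = plan_layers(page)
```

For this four-element page, modeled loosely on the stacked diagram earlier in the post, `result` holds a single image layer containing `bitmap` and `vector`, followed by a single text layer containing `text1` and `text2`.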
Here’s an example of a document page being rendered step by step:
The background image is put on the lowermost layer. Notice that the background also contains graphical elements for the equations on this page:
This is the text layer, consisting of normal HTML markup using fonts extracted from the document:
The two layers are combined to produce the final output:
Take a look at the actual technical paper as converted. And of course, if you want to see your own docs htmlified, just upload them to Scribd.com!
—Matthias Kramm
Next: Repolygonizing Fonts
When I upload a document to scribd, I am taken to the Flash version; not the HTML version. What gives?! Also, the technical paper you link to at the end of the article brings me to the flash version rather than the HTML version.