This is the third of a four-part series on the technology behind Scribd’s HTML viewing experience. You might like to read part 1, “Facing Fonts in HTML” and part 2, “The Perils of Stacking,” if you haven’t already. Part 4, “Plan B: Font Fallbacks” is coming soon.
Every single day, Scribd processes over 150,000,000 polygons in order to convert your uploaded documents into our new HTML format (among others).
So why on earth would something like this be necessary? In order to get to that, we first have to talk a bit about fonts. A font, in its most simplistic form, is just a bundle of polygons, with a Unicode index attached to each one. It’s these font polygons this post is about.
Of course, Truetype fonts and their descendants (.eot fonts, OpenType fonts etc.) store much more than that, with a typical font containing thousands of lines of vertex-adjusting bytecode programs (hence the name “font program”), multiple encoding tables, a vast amount of miscellaneous font metrics, etc. This is what this post is not about.
For our HTML conversion, we had to solve the following problem: How do you encode an arbitrary polygon into a font glyph for use in a @font-face declaration? The answer is font repolygonization, which is the process of optimizing the polygons in a given font. It turns out that to repolygonize a font properly, you first have to know a thing or two about fill-rules.
Polygons in computer graphic generally come in two “flavors”: even-odd-filled or nonzero-filled. These two schemes use different semantics to determine whether a given point is inside or outside of the glyph. For even-odd, every line in the polygon separates something on the inside from something on the outside:
Even-odd filled glyphs
For nonzero, on the other hand, the direction of the line matters. Two lines in the same direction mean the area after the lines is still filled; drawing a hole into a polygon needs to be done by a line going into the other direction:
Nonzero-filled glyphs (with direction indicators)
For even-odd polygons, the segment direction doesn’t matter, but for nonzero-filled polygons it’s also necessary to draw segments in the right direction. Truetype, and embedded OpenType (eot) fonts use the nonzero fill style. For .svg fonts, you get to choose (see the fill-rule property). So to properly encode any polygon as font glyph, we need to “fix” the fill-rule.
Another thing that’s important for properly encoding a Truetype font is that all segments need to be continuous. Depending on where the source polygon comes from, this isn’t necessarily the case. In some vector graphic formats (e.g. SWF), polygons can easily be drawn in an arbitrary order and still define a filled shape:
degenerate glyph (even-odd filled)
We found that while we’re recoding and fixing a polygon, a convenient way to store the end result is as a hybrid polygon which is, at the same time, even-odd as well as circular:
Self intersected, both even-odd as well as nonzero-filled glyphs
At Scribd, we greatly rely on this hybrid approach when repolygonizing fonts in order to be able to produce output files for both our Flash and HTML reader platforms at the same time. “But wait!” you cry, “you said you generate HTML fonts from fonts stored in PDF files? Why aren’t those already encoded with the right fill-rule and continuous segments?” Good question.
First of all, we simplify font polygons, thus potentially breaking the type of fill-rule. Secondly, some “fonts” in a PDF are no more than a vector drawing in font’s clothing. Finally, font characters might get overlapped, transformed, combined, intersected etc. during conversion.
So how do you convert a polygon to a different fill-rule? This gets us back to the polygon intersection of those 150,000,000 polygons per day. You intersect the polygon with itself and relabel them at the same time with a fill-rule of your choice. We actually not only intersect the polygon with itself but also with various elements like clipshapes in order to remove invisible or half-hidden elements on the page.
In theory, this is a rather simple approach: For every character element to be drawn on the page, intersect that character’s polygon with itself and all other page polygons it possibly interacts with, then store the result of that intersection in the font. There are two perhaps non-obvious problems, though:
- This polygon intersection needs to be fast. We process hundreds of thousands of pages every day.
- It needs to be very stable. We want to introduce zero display errors.
So while long processing speed can easily be countered by buying some more fancy hardware, getting a polygon intersection to be stable is something entirely different. For example, here are some classic pathological cases that need to be correctly handled. Notice that cases like these are actually the rule rather than the exception.
Horizontal lines throw off scanline algorithms, multiple intersections in a single point lead to subtle numeric errors, and intersections taking place on top of lines can introduce hairline fragments that will cause the rendering engine displaying these polygons to “bleed.”
While it’s possible to deal with each of these cases, floating point arithmetic can actually cause them to occur after or during the computation: For example, if you add a point to a polygon segment because of an intersection, the resulting small change in the segment slope can cause it to overlap another segment it was previously disjunct from.
There are basically three ways to deal with this precision problem:
- Introduce some random perturbation to all polygon endpoints. This nicely takes care of horizontal lines and three point intersections, and makes the probability of floating point errors introducing extraneous overlaps very small. However when concatenating intersections the perturbations tend to accumulate. It’s possible to counteract this with symbolic perturbation, but still, this is cheating. We wouldn’t be solving the problem, only make it harder to reproduce.
- Work with infinite precision. In other words, do away with floating point errors by introducing enough bits in the numeric representation of the coordinates so that we never have to do any rounding. This approach has been verified in practice (see e.g. Fortune & Wyk). The main problem is going back from the infinite precision representation to the finite precision final version (which we need to actually store the data).
- Work on an integer grid, and use snap-rounding. Don’t let the floating point numbers lure us into the security of false precision, and explicitly deal with all special cases that can occur.
Let’s look at the last item in more detail. So in the example of these glyphs:
A snap-rounded version would look like this:
You see that snap-rounding doesn’t just reposition all polygon endpoints, it also puts all the intersection points on the integer grid. Also all lines that come into the vicinity of a used grid point are routed through that grid point as well. This way, no polygon edges can ever come to lie too close to the vertex of another edge. And, while snap-rounding causes things to look blocky in the example above, this is because we choose a very coarse-grained grid for illustration. In our production system we operate with a grid spacing that is a tiny fraction of a pixel (or for glyphs, a tiny fraction of an em square).
Intersecting two polygons while at the same time grid-rounding the results also uses only slightly more computation time than standard scanline algorithms: while a Bentley & Ottman scanline sweep needs O(n log n + k log n) time, snap-rounding can be done in up to O(E log n), with E the description complexity of the crossing pattern which for typical polygons is of order (n+k). The nice thing about intersection using grid-rounding is that it’s rock-solid. There’s no corner-case that’s not explicitly handled, and overcomplicated intersection patterns automatically simplify themselves. If you’re interested in the nitty-gritty details of our implementation, we basically used the algorithm from Hobby with a few additional ideas from Hershberger and Bhattacharya et al in order to produce a grid-rounded glyph polygon intersection in a single scanline pass.
Also, even though we process those hundred million polygons a day, only 10% of document conversion time is actually spent on that. All the rest goes into image processing. We’ll talk about that in one of our next posts.
Coming Soon: Plan B: Font Fallbacks