Complex Text Rendering

Complex Text Rendering

Complex Text Rendering

What is the state of text rendering in Mapbox GL?

We currently do not render scripts that require bidirectional support or complex text shaping correctly in mapbox-gl-native or mapbox-gl-js. This ticket will track adding proper support to both projects.

What's missing?

We need to add the following functionality for proper text rendering:

  • Unicode Bidi Algorithm to cut labels into logical segments and flip the display order of RTL (right-to-left) text.
  • Necessary for proper rendering of RTL scripts and mixed script labels containing RTL scripts interspersed with LTR text runs like numerals.
  • Complex text shaping
  • Necessary for proper display of scripts where adjacent glyphs should be transformed into new glyphs or rendered as combined glyphs or ligatures.

In terms of scripts affected by this, Hebrew requires bidi support, Indic scripts like Hindi can require complex text shaping, and Arabic requires both bidi and complex text shaping support. Additionally, implementing the Unicode line-breaking algorithm should improve support for cases like smarter line breaking in Chinese.

How do we currently handle fontstack fallbacks?

Currently, the Protobuf-encoded "glyph tiles" we create with node-fontnik are a composited "fontstack" with missing glyphs in fonts higher in the stack being filled in by glyphs from fonts further down the stack and we therefore end up with a combined Helvetica, Arial Unicode fontstack with per-glyph fallbacks in rendered text.

Fontstack / Coverage
"Helvetica" / Latin
"Arial Unicode" / Latin, Arabic
"Helvetica, Arial Unicode" / Helvetica Latin, Arial Unicode Arabic

Why will this not work for complex text shaping?

Because shaping tables are specific to a font file, to apply shaping properly we will need to work exclusively with glyphs from a single font. Instead of using "fontstack" glyph tiles, we will need tiles which contain all the glyphs in a given range for a single font. This approach should also limit glyph atlas duplication for multiple fontstacks with a common fallback.

How will we do this?

We will first need to segment each label into text runs (splitting words into individual segments, and splitting Arabic text segments from numerical segments for example) with the Unicode bidi algorithm. Then, for each segment, we will attempt, with each font in the fontstack until a match is found, to shape the text segment with a single font's shaping table and check whether all characters in the shaped result can be rendered by that font (using a glyph coverage file). If coverage is incomplete, we will fall back to the next font in the stack.

(It's possible we could check glyph coverage first, but the necessary glyphs may change after shaping, and the glyph coverage check would have to be repeated. We should test performance to determine whether a possibly inaccurate initial coverage check is faster than redundant shaping passes for fonts lacking glyph coverage.)

Example

For the fontstack "Open Sans, Arial Unicode", no glyphs change when shaped with Open Sans/gsub.sfnt - do all characters in résumé exist in Open Sans.coverage.json? NO? Missing é? Reshape with Arial Unicode/gsub.sfnt, then check if all characters in résumé exist in Arial Unicode.coverage.json

Once a font with matching coverage has been determined, we can request glyph tiles from a single font containing the necessary glyphs, like Arial Unicode Regular/0-255.pbf.

How will we get/use these "shaping tables"?

Shaping tables are contained in font files as GSUB (glyph substitution), GPOS (glyph positioning) and KERN (kerning) tables, which can be read by the FreeType function FT_Load_Sfnt_Table. We will need to extract these tables from from uploaded font files, then request them from the client through an API. We've started work on extracted shaping tables but it isn't quite functional yet.

To use these shaping tables, we will need to pass them into HarfBuzz for mapbox-gl-native, or an emscripten port for mapbox-gl-js. I'm not sure if HarfBuzz currently has an interface for reading raw shaping tables (it generally works with full font files). If this interface doesn't currently exist, we'll need to add it.