Creating font subsets


Two of the most important parts of a web font are its characters and its OpenType features. Characters include letters (for English, A-Z and a-z), Arabic numerals (0-9), and punctuation (commas, spaces, and so on). The Unicode standard organizes characters into blocks based on their function and the scripts they support, and it assigns each character a code point that is usually written as a hexadecimal number (for example, U+0041 for A).

The OpenType features, on the other hand, may include variations of those characters (also known as glyphs) that are useful in different contexts. Numbers, for example, can appear as superscripts (x²), fractions (½), slashed zeros that distinguish them from the capital O, or old-style numbers (123456789) that blend into running text.
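To make the mapping between characters and hex numbers concrete, here is a minimal Python sketch. The helper name code_point is made up for this illustration; it only relies on the built-in ord function.

```python
def code_point(char: str) -> str:
    """Return the Unicode code point of a single character in U+XXXX notation."""
    return f"U+{ord(char):04X}"


# A few characters of the kind a Latin subset usually contains.
examples = {char: code_point(char) for char in "Az0,€"}
print(examples)
```

The U+XXXX notation it prints is the same one that the unicode-range descriptor and the --unicodes option use later in this post.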

Why bother with font subsets?

Subsetting fonts is the process of taking a large font file as input and creating other smaller files, with fewer characters or OpenType features. There are several reasons you may want to subset a font file:

  • You want to use advanced typographic features or characters that are not included in the Google Fonts subsets (a later section shows what those subsets include). Conversely, you may want fewer features or characters than the original font file has.
  • You want to create small subsets for two-stage font loading to improve the performance of your application. The goal of this method is to reduce or eliminate the time the browser shows the fallback fonts or the invisible fonts. You can achieve that by loading a small file first, and, at some later point, by swapping it with a richer font file.
  • You want to split a large font file into script (or language) specific subsets to improve performance without getting overly technical with two-stage font loading. Google already does this, but you may want to create your own, custom subsets, or create subsets for a font that’s not on Google Fonts.
  • You want to use the latest version of a typeface, and you don’t want to wait until Google updates it.

Subsetting is probably more useful for free fonts because paid fonts may require you to embed them in a specific way in your site (through a CDN), or the license you purchase may not permit subsetting.

In the next section, you’ll see what tools you can use to subset your fonts.

Available Tools

To create subsets, you can use online tools like the web font generator from Font Squirrel, but here I’ll show you how to use pyftsubset, a popular command-line tool written in Python that is part of fonttools. Apparently, Google Fonts uses this tool to create its subsets, but, because I’m not familiar with Python, I don’t know if this is still true. Another useful tool is glyphhanger, which uses pyftsubset under the hood and has a really cool feature: it can crawl your site and show you what characters you use. You can use glyphhanger to subset your fonts, but I prefer pyftsubset because it lets you select which OpenType features to keep.

To use glyphhanger (an npm package, by the way), check the project’s README on GitHub or a post by Zach Leatherman. I will use glyphhanger in one of the following examples.

Let’s now see how you can install fonttools and pyftsubset:

Install fonttools

To use pyftsubset, you need Python and the Python package manager (pip). Recent versions of Python include pip, so you don’t have to install it separately. When you install Python, don’t forget to add it to the PATH (the Windows installer gives you this option) because you want to use pip from the command line to install the required packages. To check that both are installed, run python --version and pip --version in your command line; both tools should print their versions.

Having installed Python and pip, you’ll use pip to install the following Python packages:

pip install fonttools brotli zopfli

Let’s see what those packages do:

  • You install the fonttools because you want to use the pyftsubset tool to subset your fonts.
  • You want Brotli to compress .woff2 files, which is the format used by modern browsers. You don’t have to run Brotli separately; it’s used by pyftsubset.
  • zopfli is used to compress .woff files, which is a format used by older browsers. zopfli is not the default option for compressing woff files, and, as a result, it’s optional. If you want to use it, you add the --with-zopfli option in your pyftsubset commands.

To subset your fonts, you’ll want to know what characters and OpenType features they contain. Luckily, there’s an excellent online tool that can help you with that.

What the Google font subsets include

Let’s now see what the subsets from Google fonts include. You’ll use an online tool by Roel Nieskens called wakamaifondue (What can my font do). Go to the Google fonts interface and select the regular version of Work Sans. To download the font file, go to the embed code that Google gives you, copy the URL from the href attribute of the <link>, and open it in your browser. To save you some time, this is the URL I’m talking about:

https://fonts.googleapis.com/css2?family=Work+Sans&display=swap

This URL contains a CSS file that has the font faces for the subsets. If you open the Latin subset (the URL from the src descriptor) in your browser, you’ll start downloading that file. This is the URL of the font file at the time of writing:

https://fonts.gstatic.com/s/worksans/v7/QGY_z_wNahGAdqQ43RhVcIgYT2Xz5u32K0nXBi8Jpg.woff2

Download the file above and drop it in wakamaifondue. The app tells me that the Latin subset has 225 characters and 9 layout features.

The Work Sans in the wakamaifondue interface

OpenType features

  • kern, liga, calt, rvrn, locl: Browsers enable those features by default. kern adjusts the distance between glyphs, liga and calt transform some glyph combinations to single glyphs, rvrn is specific to variable fonts, and locl is script related.
  • cswh: Contextual swashes (Reality) are used for decoration.
  • frac (fractions: 5/8), numr (numerators: 123), and dnom (denominators: 123) are all different ways to display numbers.

A Google font subset with 9 OpenType features is unusual; most of the time you’ll see subsets with 4 features.

Now do the same thing for the complete font file that you can find on the GitHub page of Work Sans. The app tells me that this file has 750 characters and 38 layout features. Although 750 characters compared to 225 seems like a big difference, most of the characters the Latin subset is missing will be inside the other subsets (see the CSS file with the font faces). So let’s focus first on the OpenType features and name some important ones that are not in the subsets:

  • onum, pnum: Old-style and proportional numbers are used to blend numbers inside text (365 days).
  • smcp, c2sc: Small caps are used for abbreviations and acronyms, and have the same purpose as old-style numbers: to blend capital letters inside text (CSS tricks).
  • subs, sups: Superscripts and subscripts for numbers.
  • dlig: Discretionary ligatures are used for decoration (station). You can use them together with contextual swashes to create an interesting logo, for example.
  • salt or ss01 to ss05: Stylistic alternates that are variations of existing characters (R).

For more information about the OpenType features and how to use them in your code, check an interactive post by Tim Brown called Caring about OpenType features. I use a CSS package I made for that reason.

Characters

You saw what OpenType features the Google fonts subsets include and what features they’re missing; let’s now focus on the characters. Open again the CSS file that contains the font faces, and, this time, pay attention to the unicode-range descriptor:

/* latin */
@font-face {
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,
    U+02DA, U+02DC, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193,
    U+2212, U+2215, U+FEFF, U+FFFD;
}

Let’s now see what characters this Unicode range contains:

  • U+0000-00FF (I’m omitting the control characters because they don’t render a character):
      U+0020-007E: (space) ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
      U+00A0-00FF: (no-break space) ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ (soft hyphen) ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
  • U+0131: ı
  • U+0152-0153: Œ œ
  • U+02BB-02BC: ʻ ʼ
  • U+02C6: ˆ
  • U+02DA: ˚
  • U+02DC: ˜
  • U+2000-206F: the General Punctuation block (spaces of various widths, dashes, quotation marks such as ‘ ’ “ ”, † ‡ • … ‰ ′ ″ ‹ ›, and several invisible formatting characters)
  • U+2074: ⁴
  • U+20AC: €
  • U+2122: ™
  • U+2191: ↑
  • U+2193: ↓
  • U+2212: − (minus sign)
  • U+2215: ∕ (division slash)
  • U+FEFF: the zero-width no-break space (byte order mark; invisible)
  • U+FFFD: the replacement character

As you can see, the Latin subset from Google Fonts contains a lot of characters that your font may not even support. Furthermore, there’s a chance that you don’t need all these characters, or that you need some characters not listed above. In the following examples, you’ll see how to address these issues by creating your own subsets.
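To get a feel for how many code points the Latin unicode-range actually declares, you could expand it with a few lines of Python. This is only a sketch: parse_unicode_range is a hypothetical helper written for this post, and it handles plain code points and start-end ranges, not the wildcard form (such as U+4??) that CSS also allows.

```python
def parse_unicode_range(value: str) -> set:
    """Expand a value such as "U+0000-00FF, U+0131" into a set of code points."""
    code_points = set()
    for token in value.split(","):
        token = token.strip().upper()
        if token.startswith("U+"):
            token = token[2:]
        if not token:
            continue
        start, _, end = token.partition("-")
        first = int(start, 16)
        last = int(end, 16) if end else first
        code_points.update(range(first, last + 1))
    return code_points


# The unicode-range of the Latin subset, copied from the CSS above.
LATIN = (
    "U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6, "
    "U+02DA, U+02DC, U+2000-206F, U+2074, U+20AC, U+2122, U+2191, U+2193, "
    "U+2212, U+2215, U+FEFF, U+FFFD"
)

latin_code_points = parse_unicode_range(LATIN)
print(len(latin_code_points))
```

The range declares 385 code points, while wakamaifondue reported 225 characters for the Latin file of Work Sans, which illustrates that a unicode-range can list more code points than the font supports.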

Examples

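It’s time to use pyftsubset to create some subsets. This is the list of the subsets you’ll create:

  • Keep all OpenType features and characters
  • Latin subset
  • English subset
  • Minimal English subset
  • Multiple subsets by script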

Before we begin, I want to point out that I'm running the examples on a Windows machine with the Git Bash command line.

Keep all OpenType features and characters

The first thing you’ll do is to convert a TrueType (.ttf) file to a compressed TrueType file that’s optimized for the web (.woff2). You’ll keep all the characters and OpenType features from the original file. The file you’re working with is the regular version of Work Sans because it has many OpenType features and characters. Download the font file, open your command line in the downloads folder, and paste the following command:

pyftsubset WorkSans-Regular.ttf \
           --output-file="WorkSans-Regular-all.woff2" \
           --flavor=woff2 \
           --layout-features="*" \
           --unicodes="*"

The first parameter is the original font file (WorkSans-Regular.ttf). With the --output-file option, you pass the name of the file you’ll create, and with the --flavor option, the format of that file. You specify what OpenType features you want with the --layout-features option; the asterisk means that you want them all. The --unicodes option is for the Unicode characters you want, with the asterisk meaning that you want all the characters from the original file. If you specify more characters than the font file has, the program doesn’t throw an error by default. Instead of passing Unicode character codes, you can directly pass characters with the --text or --text-file options. I avoid the --text option because it seems error-prone.

I’m using the backslash (\) to span the command over multiple lines and make it more readable. The alternative is to remove the backslashes and write the command in a single line.

I suggest pasting the examples into a text editor first so you can easily change them if you want. For example, the command above doesn’t work in the Windows CMD as written. To run multiline commands on the Windows command line, you’ll have to replace each backslash with a caret (^, Shift + 6 on your keyboard).

You can also use the --verbose option to view information while the tool creates the file. With pyftsubset --help, you get a list of all the available options. Alternatively, check the pyftsubset documentation page that explains all the options.

The new font file is around 62kb, while the original was 207kb. Along with the characters and the OpenType features, the font files also include hinting tables. These are instructions on how to render the font well on low-resolution screens (more specifically, screens with a low pixels-per-inch count, or ppi). If you want to remove the hinting, use the --no-hinting option along with the --desubroutinize option as shown in the following command:

pyftsubset WorkSans-Regular.ttf \
           --output-file="WorkSans-Regular-all-no-hinting.woff2" \
           --flavor=woff2 \
           --layout-features="*" \
           --unicodes="*" \
           --no-hinting \
           --desubroutinize

After removing the hinting, the file size goes down to 47kb, which means that hinting was taking up roughly a quarter (about 24%) of the 62kb file. While testing hinting on a Windows machine, with a screen that has a 23″ diagonal and 1920 × 1080 resolution, I was able to see some minor differences between the hinted and the unhinted version of Work Sans. On mobile phones (they have high ppi) and Macs, it shouldn’t matter much. As a result, I suggest keeping the hinting, unless you really want to save some space.
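If you want to check how much of a file the hinting accounts for, the arithmetic is simple. The sketch below uses the rounded sizes quoted above (62kb and 47kb), so the result is approximate; the variable names are made up for this illustration.

```python
hinted_kb = 62    # WorkSans-Regular-all.woff2, as quoted above
unhinted_kb = 47  # the same subset with --no-hinting and --desubroutinize

saved_kb = hinted_kb - unhinted_kb
share_of_hinted = saved_kb / hinted_kb * 100

print(f"Hinting accounts for {saved_kb}kb, about {share_of_hinted:.0f}% of the hinted file")
```

You can plug in the sizes of your own files to decide whether dropping the hinting is worth it.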

Latin subset

Now, you’ll create a subset based on the Google font Latin subset you saw earlier. This subset includes a ton of useful characters, so it’s a good starting point if you want to cover text in English, French, Spanish, or German. This is the command you’ll run:

pyftsubset \
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-google-fonts.woff2" \
  --flavor=woff2 \
  --layout-features=* \
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,\
  U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,\
  U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD"

The new file has a file size of 35kb with hinting and 26kb without hinting. For comparison, the Latin Google font subset is around 23kb with hinting but with a limited number of layout features. To reduce the file size, you can limit the layout features too:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-google-fonts-2.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,\
  U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,\
  U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD"

Let’s explain what the layout features do first (you already saw most of them):

  • In the first line, kern, liga, calt, and clig are important features that are on by default in CSS. ccmp, locl, mark, and mkmk are also always turned on by the browser; the difference is that you can’t turn them off with CSS.
  • In the second line, we have some essential features for body text like the onum, pnum, smcp, and c2sc that you saw in a previous section, and for representing numbers we have frac, lnum, tnum, sups, and subs.
  • In the third line, we have some decoration features like cswh, dlig, ss01, ss03, and zero. If you don’t plan to use these features, you can remove them because some take a lot of space (cswh).

By including only these features, we save 4kb and we go down to 31kb. We were not able to reduce the size much because small caps and contextual swashes add a lot of extra glyphs to the file.

If you open the new file in wakamaifondue, you’ll notice that only the locl feature is kept from the features that you can’t change in CSS, and mark, ccmp, mkmk were thrown away. You can find an explanation for that in wakamaifondue:

These are the required layout features: features that are always turned on by the system that renders the font. You can’t turn them off in CSS. They may be applicable only to certain language scripts or specific languages, or in certain writing modes.

Because this process can get messy, you may not want to risk removing essential features from the original file by specifying the layout features explicitly with the = operator. Instead, you can add to the default features with += or remove from them with -=. The default features are calt, ccmp, clig, curs, dnom, frac, kern, liga, locl, mark, mkmk, numr, rclt, rlig, rvrn, and all features required for script shaping. Type pyftsubset --layout-features=? to see all the default features. So a safer alternative to the previous command is the following:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-google-fonts-2.woff2" \
  --flavor=woff2 \
  --layout-features+="onum,pnum,smcp,c2sc,lnum,tnum,\
  subs,sups,cswh,dlig,ss01,ss03,zero"\
  --layout-features-="dnom,numr"\
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,\
  U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,\
  U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD"

Looking at the character table of the original file in wakamaifondue, I can see some cool characters we don’t have in the subset. I want to add all the arrows in U+2190-21BB (← ↑ → ↓ ↔ ↕ ↖ ↗ ↘ ↙ ↩ ↪ ↺ ↻) and a hedgehog symbol that sits at the private-use code point U+F8FF. To add those characters, change the unicodes option to the following:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-google-fonts-3.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,\
  U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,\
  U+2122,U+2190-21BB,U+2212,U+2215,U+F8FF,U+FEFF,U+FFFD"

This increases the file size to 32kb but you now have some cool characters to work with.

You saw how to create a Latin subset with fewer characters and OpenType features. In the next section, you’ll remove even more characters to create an English-only subset.

English subset

Let’s now assume that you display only English on your site. If that’s the case, you’ll want to remove the characters used by other languages. Don’t overdo it though because you may display names or places from other languages. In the next section, you’ll verify what characters you use with glyphhanger so stay tuned.

We want to remove a big chunk of the Latin-1 Supplement (U+0080-00FF), but we want to keep punctuation characters, symbols, and a couple of math characters: U+00D7 (×) and U+00F7 (÷). Long story short, from the U+0000-00FF range you saw earlier, you’ll remove the following characters:

  • U+00A1 (¡), U+00AA-00AB (ª «), U+00AF (¯), U+00B8 (¸), U+00BB (»)
  • U+00BF-00D6 (¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö)
  • U+00D8-00F6 (Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö)
  • U+00F8-00FF (ø ù ú û ü ý þ ÿ)

Outside of the Latin-1 supplement, you’ll also remove U+0131 (ı), U+0152-0153 (Œ œ), and the spacing modifier letters U+02B0-02FF.

This is the command that creates an English-only subset:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-english.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0000-00A0,U+00A2-00A9,U+00AC-00AE,U+00B0-00B7,\
  U+00B9-00BA,U+00BC-00BE,U+00D7,U+00F7,U+2000-206F,U+2074,U+20AC,\
  U+2122,U+2190-21BB,U+2212,U+2215,U+F8FF,U+FEFF,U+FFFD"
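Working out the remaining ranges by hand is error-prone, so you may prefer to compute them. The following sketch subtracts the removed code points from the Latin list (with the arrows and U+F8FF added) and compresses the result back into ranges. The expand and compress helpers are hypothetical names written for this post, and the REMOVED list is my reading of the ranges described above.

```python
def expand(ranges: str) -> set:
    """Expand "U+0000-00FF,U+0131" style ranges into a set of code points."""
    points = set()
    for token in ranges.replace(" ", "").split(","):
        start, _, end = token.upper().replace("U+", "").partition("-")
        points.update(range(int(start, 16), int(end or start, 16) + 1))
    return points


def compress(points: set) -> str:
    """Turn a set of code points back into a compact "U+XXXX-YYYY" list."""
    ordered = sorted(points)
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next code point is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            parts.append(f"U+{ordered[i]:04X}")
        else:
            parts.append(f"U+{ordered[i]:04X}-{ordered[j]:04X}")
        i = j + 1
    return ",".join(parts)


LATIN_PLUS_EXTRAS = (
    "U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,"
    "U+2000-206F,U+2074,U+20AC,U+2122,U+2190-21BB,U+2212,U+2215,"
    "U+F8FF,U+FEFF,U+FFFD"
)
REMOVED = (
    "U+00A1,U+00AA-00AB,U+00AF,U+00B8,U+00BB,U+00BF-00D6,U+00D8-00F6,"
    "U+00F8-00FF,U+0131,U+0152-0153,U+02B0-02FF"
)

english = compress(expand(LATIN_PLUS_EXTRAS) - expand(REMOVED))
print(english)
```

The printed string matches the value passed to --unicodes in the command above, so you can reuse the same approach when you decide to drop or keep different characters.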

With those changes, the file size goes from 31kb to 27kb. In the next section, you’ll see how to create a performance-focused subset.

Minimal English subset

The goal here is to create a tiny subset that you can use for two-stage font loading. As a result, you’ll want to keep the file size as small as possible; under 10kb would be ideal. The English subset in the previous section won’t do because it still has a lot of characters you probably won’t use. Instead, I suggest installing glyphhanger and using it to crawl your website to see what characters you use. If you want something more generic, but with fewer characters than in the previous section, crawl a popular English website with good typography.

First, you want to install glyphhanger globally:

npm install -g glyphhanger

You can also install it as a dev dependency in your project with yarn add -D glyphhanger and run it from there with yarn glyphhanger.

Then, crawl your development server (or your production site), after adding the correct port:

glyphhanger http://localhost:3000 --spider --spider-limit=5

I will now crawl my website to see what characters I’m using:

glyphhanger https://markoskon.com --spider --spider-limit=5

These are the Unicode ranges I get back after running the command above:

U+A,U+20,U+22,U+25-29,U+2B-3E,U+41-59,U+61-7A,U+F3,
U+3A4,U+2014,U+2019,U+201C,U+201D,U+2020,U+2021,
U+2026,U+20AC,U+2190,U+F8FF,U+1F426,U+1F4D7,U+1F4F0,U+1F525

  • To be sure you get all the characters back, you can crawl all your pages with --spider-limit=0, but this will take some time.
  • You can get the characters a specific font uses with the family option (e.g. --family="monospace") or specify multiple families with --family="Georgia,Consolas".
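The glyphhanger output usually needs some cleanup before you hand it to pyftsubset, for example to drop emoji that a text font doesn’t contain. Here is a minimal sketch of that cleanup. It assumes you want to drop everything outside the Basic Multilingual Plane and everything in the Private Use Area (which removes the four emoji and U+F8FF from the output above); first_code_point and is_wanted are hypothetical helpers, and a range is judged only by its first code point.

```python
GLYPHHANGER_OUTPUT = (
    "U+A,U+20,U+22,U+25-29,U+2B-3E,U+41-59,U+61-7A,U+F3,"
    "U+3A4,U+2014,U+2019,U+201C,U+201D,U+2020,U+2021,"
    "U+2026,U+20AC,U+2190,U+F8FF,U+1F426,U+1F4D7,U+1F4F0,U+1F525"
)


def first_code_point(token: str) -> int:
    """Return the first code point of a token like "U+25-29" or "U+A"."""
    return int(token.upper().replace("U+", "").split("-")[0], 16)


def is_wanted(token: str) -> bool:
    """Keep code points in the Basic Multilingual Plane, outside the Private Use Area."""
    cp = first_code_point(token)
    in_private_use = 0xE000 <= cp <= 0xF8FF
    beyond_bmp = cp > 0xFFFF
    return not (in_private_use or beyond_bmp)


kept = [token for token in GLYPHHANGER_OUTPUT.split(",") if is_wanted(token)]
unicodes_option = ",".join(kept)
print(unicodes_option)
```

If you do want to keep a private-use glyph such as the hedgehog, relax the in_private_use check.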

I will now pass the string I got to the --unicodes option (after removing the 4 emojis at the end, and U+F8FF along with them) to create an English subset:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-glyphhanger.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+A,U+20,U+22,U+25-29,U+2B-3E,U+41-59,U+61-7A,U+F3,\
U+3A4,U+2014,U+2019,U+201C,U+201D,U+2020,U+2021,\
U+2026,U+20AC,U+2190"

This results in a font file at around 20kb. This is still way over the 10kb goal. Let’s remove the hinting first because, in this case, size is more important:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-minimal.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+A,U+20,U+22,U+25-29,U+2B-3E,U+41-59,U+61-7A,U+F3,\
U+3A4,U+2014,U+2019,U+201C,U+201D,U+2020,U+2021,\
U+2026,U+20AC,U+2190"\
  --no-hinting\
  --desubroutinize

The font file without hinting is around 14kb which is still not good enough.

The goal of two-stage font loading is to reduce or eliminate the time the browser displays the invisible or the fallback font. To be more precise, we have 3 fonts: the fallback (e.g. sans-serif), the small fake (e.g. Work Sans Minimal), and the real font (e.g. Work Sans). We also have 2 transitions: from the fallback to the fake and from the fake to the real font. Here, we’re not concerned with the first transition, so we’ll focus on the second. In this transition, you’ll want to reduce the moving of the text, and, to do that, you’ll want to keep all the layout features that affect spacing. All the features we have here affect spacing in some way, so if you plan to use them all, you can’t reduce the size any further.

But there’s a good chance that you won’t use all of them, or that you’ll use them sparingly, and, as a result, they won’t cause a lot of text reflow. So you can keep only the default features (minus the swashes) along with lnum and tnum (which are the default number forms):

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-minimal-super-minimal.woff2" \
  --flavor=woff2 \
  --layout-features+="lnum,tnum"\
  --layout-features-="cswh"\
  --unicodes="U+A,U+20,U+22,U+25-29,U+2B-3E,U+41-59,U+61-7A,U+F3,\
U+3A4,U+2014,U+2019,U+201C,U+201D,U+2020,U+2021,\
U+2026,U+20AC,U+2190"\
  --no-hinting\
  --desubroutinize

The command above creates a file that’s 8kb in size which is under the 10kb goal.

Extra: Find what characters a language uses

This is what you can do when you want to cover a language, but you are not sure what characters to add in your subsets.

  • There is the Unicode Common Locale Data Repository (CLDR) that shows, among other things, what characters a language uses. For example, these are the characters for the French language. I also made an app a while ago that shows that data in a table, but it’s not 100% correct yet.
  • You can crawl a popular website that uses that language with glyphhanger, as you saw in this section.
  • After you create your subsets, you can see if you cover all the characters with the Firefox developer tools. Open the dev tools, go to the inspector tab, and, at the top right, select the Fonts tab that’s sitting next to Layout/Computed/Changes.

Multiple subsets by script

A repo with the examples of this section

Instead of creating a single subset and throwing away the rest of the characters, you can keep them all by chunking the initial file into multiple files. You can then take advantage of the unicode-range descriptor, which you can use inside your @font-face rules, to instruct the browser to download only the chunks it needs to render the text. The easiest way to chunk the file is to follow the Unicode blocks; for example, you can create the following subsets:

Set                         Range
Latin                       U+0000-00FF
Latin-extended-a            U+0100-017F
Latin-extended-b            U+0180-024F
Rest middle                 U+0259-03C0
Latin-extended-additional   U+1E00-1EFF
Rest                        U+2000-FB02

In this case, the pyftsubset commands will be:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-latin.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0000-00FF" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-latin-extended-a.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0100-017F" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-latin-extended-b.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0180-024F" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-rest-middle.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0259-03C0" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-latin-extended-additional.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+1E00-1EFF" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-rest.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+2000-FB02"
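Because the six commands differ only in the output name and the Unicode range, you could also generate them from a table. The sketch below only builds and prints the argument lists; build_command, SUBSETS, and FEATURES are names made up for this post, and the options are the same ones used in the commands above.

```python
import shlex

FEATURES = (
    "kern,liga,clig,calt,ccmp,locl,mark,mkmk,"
    "onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,"
    "cswh,dlig,ss01,ss03,zero"
)

# Subset name -> Unicode range, copied from the table above.
SUBSETS = {
    "latin": "U+0000-00FF",
    "latin-extended-a": "U+0100-017F",
    "latin-extended-b": "U+0180-024F",
    "rest-middle": "U+0259-03C0",
    "latin-extended-additional": "U+1E00-1EFF",
    "rest": "U+2000-FB02",
}


def build_command(font: str, name: str, unicodes: str) -> list:
    """Return the pyftsubset argument list for one subset of the given font."""
    stem = font.rsplit(".", 1)[0]
    return [
        "pyftsubset",
        font,
        f"--output-file={stem}-{name}.woff2",
        "--flavor=woff2",
        f"--layout-features={FEATURES}",
        f"--unicodes={unicodes}",
    ]


commands = [
    build_command("WorkSans-Regular.ttf", name, unicodes)
    for name, unicodes in SUBSETS.items()
]
for command in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(command))
```

With fonttools installed and the font file in the current folder, you could run each list with subprocess.run(command, check=True) instead of printing it.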

And this is the CSS code that uses the files, assuming you serve them from a fonts folder:

/* Regular */
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-latin.woff2") format("woff2");
  unicode-range: U+0000-00FF;
}
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-latin-extended-a.woff2") format("woff2");
  unicode-range: U+0100-017F;
}
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-latin-extended-b.woff2") format("woff2");
  unicode-range: U+0180-024F;
}
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-rest-middle.woff2") format("woff2");
  unicode-range: U+0259-03C0;
}
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-latin-extended-additional.woff2")
    format("woff2");
  unicode-range: U+1E00-1EFF;
}
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-rest.woff2") format("woff2");
  unicode-range: U+2000-FB02;
}

You’ll want to do the same thing for the rest of the weights you want to support and for the italics. This method doesn’t work on Internet Explorer because it downloads all the files. The site is still usable but it’s kind of a disaster performance-wise.
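Writing six rules for every weight and style by hand gets tedious, so you may want to generate the CSS instead. This is a sketch under a few assumptions: the italic and bold file stems (WorkSans-Italic, WorkSans-Bold) are hypothetical names that follow the pattern of the regular files, and font_faces is a helper made up for this post.

```python
FONT_FACE = """@font-face {{
  font-family: "{family}";
  font-display: swap;
  font-weight: {weight};
  font-style: {style};
  src: url("/fonts/{file}") format("woff2");
  unicode-range: {unicode_range};
}}"""

# Subset name -> Unicode range, the same chunks used in the commands above.
SUBSETS = {
    "latin": "U+0000-00FF",
    "latin-extended-a": "U+0100-017F",
    "latin-extended-b": "U+0180-024F",
    "rest-middle": "U+0259-03C0",
    "latin-extended-additional": "U+1E00-1EFF",
    "rest": "U+2000-FB02",
}

# (file stem, font-weight, font-style); the last two stems are assumed names.
VARIANTS = [
    ("WorkSans-Regular", 400, "normal"),
    ("WorkSans-Italic", 400, "italic"),
    ("WorkSans-Bold", 700, "normal"),
]


def font_faces(family: str) -> str:
    """Return one @font-face rule per variant and subset."""
    rules = []
    for stem, weight, style in VARIANTS:
        for name, unicode_range in SUBSETS.items():
            rules.append(
                FONT_FACE.format(
                    family=family,
                    weight=weight,
                    style=style,
                    file=f"{stem}-{name}.woff2",
                    unicode_range=unicode_range,
                )
            )
    return "\n".join(rules)


css = font_faces("Work Sans")
print(css)
```

The output for the regular variant is equivalent to the rules above; adjust VARIANTS to match the files you actually created.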

The following table shows the sizes of the subsets after running the commands:

Filename                                          Size
WorkSans-Regular-latin.woff2                      29kb
WorkSans-Regular-latin-extended-a.woff2           15kb
WorkSans-Regular-latin-extended-b.woff2           10kb
WorkSans-Regular-rest-middle.woff2                5kb
WorkSans-Regular-latin-extended-additional.woff2  13kb
WorkSans-Regular-rest.woff2                       12kb

This means that, for English content, the browser will download the Latin file (29kb) and maybe the rest file (12kb), which is 41kb in total for the regular weight. If you also have an italic and a bold version, the browser will download 123kb (41 × 3) in total, which is not ideal.

A different way to chunk the initial file is to use the subset you created earlier (the one with the English characters, the punctuation, and the symbols) and create subsets for the remaining blocks. See the following list for an example:

  • English/punctuation/symbols: U+0000-00A0, U+00A2-00A9, U+00AC-00AE, U+00B0-00B7, U+00B9-00BA, U+00BC-00BE, U+00D7, U+00F7, U+2000-206F, U+2074, U+20AC, U+2122, U+2190-21BB, U+2212, U+2215, U+F8FF, U+FEFF, U+FFFD
  • French/German/etc.: U+00A1, U+00AA-00AB, U+00AF, U+00B8, U+00BB, U+00BF-00D6, U+00D8-00F6, U+00F8-00FF, U+0131, U+0152-0153, U+02B0-02FF
  • Latin-extended-a: U+0100-0130, U+0132-0151, U+0154-017F
  • Latin-extended-b: U+0180-024F
  • Latin-extended-additional: U+1E00-1EFF
  • The rest: U+0259, U+0300-03C0, U+2070-2073, U+2075-20AB, U+20AD-2121, U+2123-218F, U+21BC-2211, U+2213-2214, U+2216-F8FE, U+FB01-FB02

These are the new commands:

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-english.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0000-00A0,U+00A2-00A9,U+00AC-00AE,U+00B0-00B7,\
  U+00B9-00BA,U+00BC-00BE,U+00D7,U+00F7,U+2000-206F,U+2074,U+20AC,\
  U+2122,U+2190-21BB,U+2212,U+2215,U+F8FF,U+FEFF,U+FFFD" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-rest-latin.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+00A1,U+00AA-00AB,U+00AF,U+00B8,U+00BB,U+00BF-00D6,\
  U+00D8-00F6,U+00F8-00FF,U+0131,U+0152-0153,U+02B0-02FF" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-latin-extended-a.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0100-0130,U+0132-0151,U+0154-017F" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-latin-extended-b.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0180-024F" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-latin-extended-additional.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+1E00-1EFF" &&

pyftsubset\
  WorkSans-Regular.ttf \
  --output-file="WorkSans-Regular-rest.woff2" \
  --flavor=woff2 \
  --layout-features="kern,liga,clig,calt,ccmp,locl,mark,mkmk,\
  onum,pnum,smcp,c2sc,frac,lnum,tnum,subs,sups,\
  cswh,dlig,ss01,ss03,zero"\
  --unicodes="U+0259,U+0300-03C0,U+2070-2073,U+2075-20AB,\
  U+20AD-2121,U+2123-218F,U+21BC-2211,U+2213-2214,U+2216-F8FE,\
  U+FB01-FB02"
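
The six commands differ only in the output name and the Unicode list, and you will need the same set again for the italic and bold files. One way to avoid the copy-pasting is to generate the commands. This is only a sketch: `buildCommand` is a hypothetical helper, it prints the commands rather than running them, and it uses only the pyftsubset options already shown above (two subsets are listed for brevity):

```javascript
const features =
  "kern,liga,clig,calt,ccmp,locl,mark,mkmk,onum,pnum,smcp,c2sc," +
  "frac,lnum,tnum,subs,sups,cswh,dlig,ss01,ss03,zero";

// Subset name -> Unicode list, copied from the commands above.
const subsets = {
  "latin-extended-a": "U+0100-0130,U+0132-0151,U+0154-017F",
  "latin-extended-b": "U+0180-024F",
};

// Build one pyftsubset command for the given input file and subset.
function buildCommand(input, name, unicodes) {
  const output = input.replace(/\.ttf$/, `-${name}.woff2`);
  return [
    "pyftsubset",
    input,
    `--output-file="${output}"`,
    "--flavor=woff2",
    `--layout-features="${features}"`,
    `--unicodes="${unicodes}"`,
  ].join(" ");
}

const commands = Object.entries(subsets).map(([name, unicodes]) =>
  buildCommand("WorkSans-Regular.ttf", name, unicodes)
);

console.log(commands.join(" &&\n"));
```

Paste the printed text into a terminal, or change the input file name to produce the commands for another style.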

And these are the file sizes:

Filename                                            Size
WorkSans-Regular-english.woff2                      27kb
WorkSans-Regular-rest-latin.woff2                   10kb
WorkSans-Regular-latin-extended-a.woff2             14kb
WorkSans-Regular-latin-extended-b.woff2             10kb
WorkSans-Regular-latin-extended-additional.woff2    13kb
WorkSans-Regular-rest.woff2                         11kb

Now the browser will download 27kb for English-only content and 81kb in total for regular, italic, and bold (3 × 27kb, assuming the italic and bold English subsets come out at about the same size).

A note on overlaps when using unicode-range

Figuring out which characters go to which file is time-consuming and error-prone, so you may prefer to use Unicode ranges that overlap. The following is a quote from the description of the unicode-range descriptor in the CSS Fonts Module specification.

If the Unicode ranges overlap for a set of @font-face rules with the same family and style descriptor values, the rules are ordered in the reverse order they were defined; the last rule defined is the first to be checked for a given character.

In other words, if your Unicode ranges overlap, put the most frequently used subset last, that is, the one most likely to be downloaded (e.g. English/punctuation/symbols), and the rest of the files above it. This way you can hand-pick the characters of the English subset and, for the rest of the characters, simply create one subset per Unicode block:

/* Regular */
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-latin.woff2") format("woff2");
  unicode-range: U+0000-00FF;
}
@font-face {
  /* … */
  src: url("/fonts/WorkSans-Regular-latin-extended-a.woff2") format("woff2");
  unicode-range: U+0100-017F;
}
@font-face {
  /* … */
  src: url("/fonts/WorkSans-Regular-latin-extended-b.woff2") format("woff2");
  unicode-range: U+0180-024F;
}
@font-face {
  /* … */
  src: url("/fonts/WorkSans-Regular-latin-extended-additional.woff2")
    format("woff2");
  unicode-range: U+1E00-1EFF;
}
@font-face {
  /* … */
  src: url("/fonts/WorkSans-Regular-rest.woff2") format("woff2");
  unicode-range: U+0259-03C0, U+2000-FB02;
}
/* Place the English subset last */
@font-face {
  font-family: "Work Sans";
  font-display: swap;
  font-weight: 400;
  font-style: normal;
  src: url("/fonts/WorkSans-Regular-english.woff2") format("woff2");
  unicode-range: U+0000-00A0, U+00A2-00A9, U+00AC-00AE, U+00B0-00B7,
    U+00B9-00BA, U+00BC-00BE, U+00D7, U+00F7, U+2000-206F, U+2074,
    U+20AC, U+2122, U+2190-21BB, U+2212, U+2215, U+F8FF, U+FEFF, U+FFFD;
}
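
To see how the ordering plays out, here is a simplified model of the rule quoted above. It is a sketch under assumptions, not how any browser is implemented: `pickFontFile` is a hypothetical function, it looks only at unicode-range (real font matching also considers other things, such as whether the font actually contains the glyph), and the English range list is abbreviated:

```javascript
// Rules in stylesheet order; the English rule is defined last.
const rules = [
  { file: "WorkSans-Regular-latin.woff2", ranges: [[0x0000, 0x00ff]] },
  { file: "WorkSans-Regular-latin-extended-a.woff2", ranges: [[0x0100, 0x017f]] },
  {
    file: "WorkSans-Regular-english.woff2",
    // Abbreviated: only the part of the English list below U+0100.
    ranges: [
      [0x0000, 0x00a0], [0x00a2, 0x00a9], [0x00ac, 0x00ae], [0x00b0, 0x00b7],
      [0x00b9, 0x00ba], [0x00bc, 0x00be], [0x00d7, 0x00d7], [0x00f7, 0x00f7],
    ],
  },
];

// Return the file whose unicode-range claims the character, or null.
function pickFontFile(rules, char) {
  const codePoint = char.codePointAt(0);
  // Walk the rules backwards: the last rule defined is checked first.
  for (let i = rules.length - 1; i >= 0; i--) {
    const matches = rules[i].ranges.some(
      ([start, end]) => codePoint >= start && codePoint <= end
    );
    if (matches) return rules[i].file;
  }
  return null;
}

pickFontFile(rules, "A"); // "WorkSans-Regular-english.woff2"
pickFontFile(rules, "©"); // "WorkSans-Regular-english.woff2" (overlap, last rule wins)
pickFontFile(rules, "é"); // "WorkSans-Regular-latin.woff2"
```

Both the latin and the English rules cover "©" (U+00A9), but because the English rule comes last, English-only pages never trigger a download of the latin file.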

Recap

Let’s now make a quick recap of the key points of this post:

  • You saw why you may want to subset your fonts.
  • Characters and OpenType features are two of the most important things font files include. You also saw what some OpenType features do.
  • wakamaifondue is an online tool that can help you analyze font files. You used it to see what the Google font files contain.
  • You saw how to install and use pyftsubset to create custom font subsets. More specifically:
    • How to create a web font file with all the characters and features from the original.
    • How to create a Latin subset based on the Google fonts character list but with more layout features.
    • How to create an English-only subset.
    • How to use glyphhanger to create a tiny subset for two-stage font loading.
    • Finally, how to keep all the characters by breaking up the initial file into chunks based on the Unicode block they belong to.

Feedback is welcome. This includes corrections, suggestions, and things to explain further.

Appendix

This section contains information that’s useful when working with subsets. You definitely don’t have to read it.

Conversions in JavaScript

  • See the character behind a Unicode hex number.
    1. Convert a hex string to a decimal (base-ten) number:
      Number.parseInt("A9", 16); // 169
    2. Convert that decimal number (the code point) to a character:
      String.fromCharCode(169); // returns the copyright symbol: ©
      // use String.fromCodePoint() for code points above U+FFFF
  • Get the Unicode hex number for a character.
    1. The other way around, get the char code for the copyright symbol:
      "©".charCodeAt(0); // 169
      // use codePointAt(0) for characters above U+FFFF
    2. Convert the char code to a hex string:
      (169).toString(16).toUpperCase(); // "A9", i.e. U+00A9
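
The two conversions can be wrapped into a pair of small helpers. This is a minimal sketch; `charToUnicode` and `unicodeToChar` are hypothetical names, and both use the code point methods so they also work above U+FFFF:

```javascript
// "©" -> "U+00A9"
function charToUnicode(char) {
  const hex = char.codePointAt(0).toString(16).toUpperCase();
  return "U+" + hex.padStart(4, "0");
}

// "U+00A9" -> "©"
function unicodeToChar(notation) {
  return String.fromCodePoint(parseInt(notation.replace(/^U\+/i, ""), 16));
}

charToUnicode("©"); // "U+00A9"
unicodeToChar("U+00A9"); // "©"
```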

Manual conversions

Convert a hex number to a decimal with pen and paper. The available digits are from 0 to F; A is equal to 10; F is equal to 15.

  • FF = 15×16¹ + 15×16⁰
       = 240    + 15
       = 255
  • BB8 = 11×16² + 11×16¹ + 8×16⁰
        = 2816   + 176    + 8
        = 3000
  • FFEF = 15×16³ + 15×16² + 14×16¹ + 15×16⁰
         = 61440  + 3840   + 224    + 15
         = 65519
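
The same pen-and-paper expansion can be written out in code, which is handy for checking your arithmetic. A minimal sketch; `expandHex` is a hypothetical name, and in practice `parseInt(hex, 16)` gives you the total directly:

```javascript
// Expand a hex string into its positional terms and their sum.
function expandHex(hex) {
  const digits = [...hex.toUpperCase()];
  const terms = digits.map((digit, index) => {
    const value = "0123456789ABCDEF".indexOf(digit); // A is 10, F is 15
    if (value === -1) throw new Error(`Not a hex digit: ${digit}`);
    const power = digits.length - 1 - index; // the rightmost digit has power 0
    return value * 16 ** power;
  });
  return { terms, total: terms.reduce((sum, term) => sum + term, 0) };
}

expandHex("BB8"); // { terms: [2816, 176, 8], total: 3000 }
```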

Unicode ranges to characters

JavaScript code that takes as input a bunch of Unicode ranges and outputs the characters. 100% not tested.

`U+20,U+27-29,U+2C-2E,U+30-3B,U+41-49,U+4B-50,U+52-56,U+59,
U+5A,U+61-70,U+72-7A,U+7C,U+A0,U+A9,U+107,U+10C,U+10D,U+111,
U+161,U+17E`
  // strip the "U+" prefixes and the line breaks inside the template literal
  .replace(/U\+|\s/g, "")
  .split(",")
  .reduce((res, item) => {
    if (/-/.test(item)) {
      const [first, last] = item.split("-").map((i) => parseInt(i, 16));
      let numbers = [];
      for (let i = first; i <= last; i++) {
        numbers.push(String.fromCharCode(i));
      }
      return [...res, { range: item, characters: numbers.join(" ") }];
    } else {
      return [
        ...res,
        {
          range: item,
          characters: String.fromCharCode(parseInt(item, 16)),
        },
      ];
    }
  }, []);

Create a Unicode table

Display the Unicode Basic Multilingual Plane (BMP) characters (U+0000-FFFF) in a few lines of code:

// FFFF is 65535 in decimal
new Array(65536)
  .fill(1)
  .map((one, index) => one * index)
  .map((codePoint) => String.fromCodePoint(codePoint));

// or with the code points embedded:
new Array(65536)
  .fill(1)
  .map((one, index) => one * index)
  .map((codePoint) => ({
    character: String.fromCodePoint(codePoint),
    codePoint,
    unicode: Number(codePoint)
      .toString(16)
      .padStart(6, "U+000")
      .toUpperCase(),
  }));

Unicode blocks & explanations

See also the Unicode Names List Charts. They present the Unicode blocks in tables with information for each character.

  • U+0000-00FF (256 characters). Includes control characters (U+0000-001F, 32 characters), basic Latin (U+0020-007F, 96 characters; includes English and general punctuation), the Latin-1 supplement (U+0080-00FF, 128 characters for French, Italian, Spanish, German, and Icelandic).

  • U+0100-017F (Latin Extended-A, 128 characters). Supports Maltese, Polish, Czech, Hungarian, Serbo-Croatian, Turkish, and more.

  • U+0180-024F (Latin Extended-B, 208 characters). Supports African languages, historic characters, and additions for Romanian, Croatian, and Slovenian.

  • U+0250-02FF = 176: Phonetic stuff, and more specifically: IPA Extensions (U+0250-02AF = 96) and spacing modifier letters (U+02B0-02FF = 80). Some languages use characters from these sets.

  • U+0300-036F = 112. Combining diacritical marks. Some languages use characters from this set—see the Unicode PDF. As I understand this, you use them to type text in many languages, including polytonic Greek. When you render text, though, these characters merge to form characters from the script Unicode blocks.

  • U+0370-03FF (Greek and Coptic = 144, neohellenic monotonic)

  • U+0400-04FF (Cyrillic = 256).

  • U+0500-052F (Cyrillic Supplement = 48).

  • U+1D00-1D7F (Phonetic Extensions = 128).

  • U+1D80-1DBF (Phonetic Extensions Supplement = 64).

  • U+1DC0-1DFF (Combining Diacritical Marks Supplement = 64).

  • U+1E00-1EFF (Latin Extended Additional = 256).

  • U+1F00-1FFF (Greek extended = 256, polytonic Greek)

  • U+2000-FEFF. The rest of the goodies. This includes:

    • General punctuation (U+2000-206F).
    • Superscripts/subscripts (U+2070-209F).
    • Currency symbols (U+20A0-20CF, or just the euro, U+20AC).
    • Letter-like symbols (U+2100-214F, includes the TM, Tel., and No. symbols).
    • Arrows (U+2190-21FF).
    • Mathematical operators (U+2200-22FF, includes the minus and division signs).
    • And many more.
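
The character counts in this list follow from the range boundaries: last minus first, plus one. A short sketch to double-check them (`blockSize` is a hypothetical helper name):

```javascript
// Number of code points in a range such as "U+0300-036F".
function blockSize(range) {
  const [first, last = first] = range.replace(/^U\+/i, "").split("-");
  return parseInt(last, 16) - parseInt(first, 16) + 1;
}

blockSize("U+0300-036F"); // 112
blockSize("U+0180-024F"); // 208
```

Keep in mind that this counts code points, not assigned characters; some blocks contain unassigned code points.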

Other Notes

  • Unicode property escapes in JavaScript regular expressions seem very useful, and more specifically: General_Category, Script, and Script_Extensions.
  • There are standardized subsets for Unicode but they are big.
  • There are other Latin sets later in the Unicode table, such as the Latin Extended Additional, and the Latin Extended-C/D/E.
  • You can find punctuation in ASCII, Latin-1 supplement, general punctuation, and supplemental punctuation (U+2E00-2E7F).
  • The “ugly” straight quote lives in ASCII, the angle quotes (« », used in Italian, among other languages) in Latin-1 supplement, and the curly quotes in general punctuation.
  • Unicode has small caps in IPA Extensions, Phonetic Extensions, and Latin Extended-D, but they are meant for use in the phonetic alphabet (?). Use OpenType small caps instead.
  • Some typefaces have superscripts and subscripts in Unicode while others have them in OpenType features. Prefer OpenType because they may also include letters. Subscripts and superscripts in Wikipedia.
  • There are blocks for ancient Greek numbers, linear A, and linear B.
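
As an illustration of the first note, Unicode property escapes can tell you which scripts a piece of text uses, which in turn hints at the subsets a page needs. A minimal sketch: `scriptsUsed` and the list of scripts are made up for this example, and the `u` flag is required for `\p{…}` to work:

```javascript
const scripts = ["Latin", "Greek", "Cyrillic"];

// Return the scripts from the list above that appear in the text.
function scriptsUsed(text) {
  return scripts.filter((script) =>
    new RegExp(`\\p{Script=${script}}`, "u").test(text)
  );
}

scriptsUsed("Hello κόσμε"); // ["Latin", "Greek"]
```

Punctuation, digits, and spaces belong to the Common script, so they do not count towards any entry in the list.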
