Reading TTF files and rasterizing them using a handmade approach, Part 1: TTF File reading

2022-08-26

I had been wanting to write this for over a year, yes you read that right a year, I will explain why at the end as its an optional read.

Introduction

I will write this in two parts first reading the TTF and then rasterization, a nice break point between the two processes. Lets start

Part 1: TTFs and what have you.

Notes:

the main reference im using is (https://developer.apple.com/fonts/Type-Reference-Manual/RM06/Chap6.html), i have done my best to write it in way that made sense to me and how i understood it, if you find any mistakens please do let me know.

all the code snippets im using in different sections are going to be ollected in one final single file that can be compiled and ran. ofc this epends on the platform you are using I will use the std libraries in C such s open and such so it becomes less painful to run on differnt platforms.

I will use the font Envy Code R (https://damieng.com/blog/2008/05/26/envy-code-r-preview-7-coding-font-released) for this post.

Most of the code here is for education and will contain oversight when it comes to memory management, type sizes, error checking if a font is well formed or not and other C related code. I assume C compiler on a little endian platform where the size of int is 32 bits, char is 8 bits, short is 16, I know there are standard libraries that contain those specific type sizes but im not going to use them.

Terminology

I'm not a font designer nor even remotely someone who knows too much about them but there are some terminology we should know before going forward as they might be confusing to some here are the terms which are important:

typeface: this is basically the design of the font and what it actually looks like for example Arial is a typeface Arial Regular is a variant of the Arial typeface which you call a font, Arial Bold is another font.
Font: a variant of a typeface such as a bold or italic variant of a specific typeface. a Font is made up of collection of glyphs.
Character: this is a specific symbol that represents a letter, number or some other sign (such as dollar sign $) in written language.
Glyph: a shape/symbol that represents a character in writing having a particular form. for example the glyph for the letter A in the Envy Code R typeface looks like this
Codepoint: an integer that represents a Character index in an encoding scheme, the two most common encoding schemes are (ASCII and Unicode) ASCII uses one byte for the Codepoint hence it only has 255 Codepoints, unicode uses multiple bytes for its Codepoints thus it can encode a much larger range.
Glyph Index: this is an interger that gives you the location of a Glyph inside a font, this changes depending on the font unlike Codepoints which are specific to the encoding scheme (Unicode or ASCII) rather than the font.

Our main point of knowing these terms is that we distinguish a Glyph from a Character in that a Glyph is a representation of a Character in a particular Typeface as in the uppercase A character doesn't change as there is one uppercase A letter in the english language while a Glyph for the uppercase A are plenty and depends on the Typeface

Intro

Truetype Fonts are files that contain information regarding a particular typeface, they contain each of the typeface glyph's outlines, these outlines are bezier curves. The file also includes specific information about grid-fitting a glyph also known as (hinting), its positioning and a host of other information about the typeface. Our main business will be reading the outline data and rasterize, this means this post won't include hinting and other font stuff like shaping and substitutions that is needed for non-english languages. I will do a seperate post on substitution and simple shaping at a later date.

TTFs files use a format that is called SFNT (Spline Font or Scalable Font) which is a file format that houses other fonts such as PostScript fonts, OpenType fonts, TrueType fonts, they have the general overall structure but differnt in certain parts. SFNT was supersceded by OpenType Collections which is an extended version of SFNT.

TTF files do not contain any complex compression thus making most of our work just byte reading, all the data is in big endian.

The adventure begins

before we even begin, lets write some small snippets of code to help us make life easier.

first we need to read an entire file:

char* read_file(char *file_name, int* file_size) { 
  if(strlen(file_name) > 0) {
    FILE* file = fopen(file_name, "rb");
    if(file) {
      fseek(file, 0, SEEK_END);
      int size = ftell(file);
      fseek(file, 0, SEEK_SET);

      if(file_size) { *file_size = size; }
      char *file_content = (char*)malloc(size+1);
      int read_amount = fread(file_content, size, 1, file);
      file_content[size] = '\0';
      if(read_amount) {
        fclose(file);
        return file_content;
      }
      free(file_content);
      fclose(file);
      return NULL;
    }
  }
  return NULL;
}

Now we need to read big endian values, we assume mem is char*, it can also be a uint8_t which is one of the standard int sizes but less dependency on libraries the better, we technically only need 16 and 32 byte reading as you don't have bigger than 32 bit sized types that we care about. the only 64 bit type is longdatetime which is the usual unix epoch timestamp.

we are also going to define some data types to make writing unsigned int a bit shorter

typedef unsigned char u8;
typedef char i8;

typedef unsigned short u16;
typedef short i16;

typedef unsigned int u32;
typedef int i32;

#define READ_BE16(mem) ((((u8*)(mem))[0] << 8) | (((u8*)(mem))[1]))
#define READ_BE32(mem) ((((u8*)(mem))[0] << 24) | (((u8*)(mem))[1] << 16) | (((u8*)(mem))[2] << 8) | (((u8*)(mem))[3]))
#define P_MOVE(mem, a) ((mem) += (a))

#define READ_BE16_MOVE(mem) (READ_BE16((mem))); (P_MOVE((mem), 2))
#define READ_BE32_MOVE(mem) (READ_BE32((mem))); (P_MOVE((mem), 4))

if you notice I haven't distinguished betwween signed and unsigned macros because the reading of the data is the same but the interpertation of the data is dependent on the type read into.

The font files are made up of tables that contain different information. each table has a 4 byte tag which can be seen as an uint32 or 4 characters, these tags are not stored with the table data rather they are stored in a special table called the table directory which comes at he start of the file and identifies the offsets of all the other tables along with their tag.

The file structure starts with a table called the font directoy and it must be the first table in the file, it consists of 2 parts; the offset subtable and table directory, offset subtable contains info on the amount of tables and provides some precalculated numbers to help with binary searching for a particular table, that is if the number of tables is huge or if you just want a faster search than linear, this binary search is very useful when you want to search for a table and you haven't extracted that tables information into a specific data structure and dont' read all the tables in one go. table directory is an array that contains the tags of the table and their location offsets in bytes, much like a directory of a book.

this is how we will define the struct for the font directory

typedef struct  {
  offset_subtable off_sub;
  table_directory* tbl_dir;
} font_directoy;

Offset Subtable

Here is the structure for the Offset subtable:

typedef struct {
  u32  scaler_type;
  u16  numTables;
  u16  searchRange;
  u16  entrySelector;
  u16  rangeShift;
} offset_subtable;

the most important piece of info we need is of course the numTables this will dicatet how many tables there.

the scaler_type is a tag that tells what this SFNT file contains

a value of 0x74727565 ('true' as chars) or 0x00010000 are considered TrueType fonts
a value of 0x4F54544F ('OTTO' as chars) means its an OpenType with PostScript outlines
a value of 0x74797031 ('typ1' as chars) is type 1 postscript font files which was made by adobe in the 80s

the focus of this post is about the first values, the other values are different and not as common.

the searchRange, entrySelector and rangeShift are all used for a binary search for a specific tag which I will not indulge here as a simple linear is fine for me.

Table Directory

Here is the structure:

typedef struct {
  union {
    char tag_c[4];
    u32  tag;
  };
  u32  checkSum;
  u32  offset;
  u32  length;
} table_directory;

the offset is taken as an offset from the start of the file and not from table_directory

what follows the Offset table is actually an array of (table_directory) the number of items in the array is numTables from offset_subtable. the union is for easier debugging as the 4 character version is easier to read as humans one small note about the tag if you read them as big endians they come out reversed so a tag of cmap becomes pamc

Now that we have at least two structs, lets read some data, we begin by reading font_directory which is nothing else than reading the offset_subtable and table_directory,

void read_font_directory(char** mem, font_directory* ft) {
  read_offset_subtable(mem, &ft->off_sub); 
  read_table_directory(mem, &ft->tbl_dir, ft->off_sub.numTables); 
}

the reason for the char** is that we are going to use it as a pointer inside the data stream of the file.

here is the read_offset_subtable:

void read_offset_subtable(char** mem, offset_subtable* off_sub) {
  char* m = *mem;
  off_sub->scaler_type = READ_BE32_MOVE(m);
  off_sub->numTables = READ_BE16_MOVE(m);
  off_sub->searchRange = READ_BE16_MOVE(m);
  off_sub->entrySelector = READ_BE16_MOVE(m);
  off_sub->rangeShift = READ_BE16_MOVE(m);
  *mem = m;
}

et voila we have the offset_table

very simple and straight to the point, then we read the table_directory:

void read_table_directory(char** mem, table_directory** tbl_dir, int tbl_size) {
  char* m = *mem;
  *tbl_dir = (table_directory*)calloc(1, sizeof(table_directory)*tbl_size);

  for(int i = 0; i < tbl_size; ++i) {
    table_directory* t = *tbl_dir + i;
    t->tag = READ_BE32_MOVE(m);
    t->checkSum = READ_BE32_MOVE(m);
    t->offset = READ_BE32_MOVE(m);
    t->length = READ_BE32_MOVE(m);
  }

  *mem = m;
}

we first get the size of the table_directory and the rest is easy reading.

to make it easier to see what we have read we are going to write a print function:

void print_table_directory(table_directory* tbl_dir, int tbl_size) {
  printf("#)\ttag\tlen\toffset\n");
  for(int i = 0; i < tbl_size; ++i) {
    table_directory* t = tbl_dir + i;
    printf("%d)\t%c%c%c%c\t%d\t%d\n", i+1,
        t->tag_c[3], t->tag_c[2],
        t->tag_c[1], t->tag_c[0],
        t->length, t->offset);
  }
}

(Envy Code R has 15 tables)

now that is much better to read.

the tables that we need

given that we are ready to read some more data we move on onto our main quest,

All TTF files (TrueType variants) are required to have the following tables in them.

cmap: this contains the mapping between a paricular Codepoint to Glyph index, i.e: given a character it tells us which Glyph is used to represent that Character in the Typeface
glyf: thie contains the actual Glyph outline data, all these are splines.
head: also called Font Header, it contains general information about a font.
hhea: horizontal header table, contains information about horizontal positioning of a font and its glyphs.
hmtx: an array that has horizontal metrics for each glyph, specifically advance width and left side bearing.
loca: glyph location offsets, this table contains the offset of each glyph's data, its used together with the 'glyf' table this table is used to locate the Glyph data inside the glyf table.
maxp: Maximum Profile, this table contains the extends of the font, i.e: the number of glyphs, biggest number of points in a Glyph, etc. basically it gives you the memory constraints of the font such as you can allocate a buffer to contain the Glyph data and this table tells you how big you should make it so it can contain the biggest Glyph.
name: this table contains the name of the font and human readable data.
post: we won't touch this table so we are going to skip it.

as you can see these tables all work together to get you the data, glyf table contains the actual data, loca contains the offsets into the glyf table which contain the Glyph outline data, cmap gives you the Codepoint to Glyph mapping so you can get a Glyph Index to use with the loca table, head, hhea, hmtx together give you information about the font layouting and positioning.

we start with cmap one of the more fun tables to work with

cmap entered the channel.

This table is a bit complicated if you come across it the first time so I'm gonna try to explain it first then we move on to the structure and reading.

this table is used when you want to map a specific Codepoint to a Glyph index, The cmap table contains a number of subtables which are called encoding subtables because different platforms have developed different conventions when dealing with fonts these encoding subtables help which encoding format should be used on which platform which just means there are different ways the Character to Glyph mapping is encoded and it depends on the platform the font is intended to be used on, a font that is intended for more than one platform would have multiple encoding subtables.

the encoding tables each have a specific format out of nine format that are part of the standard, we will look at one format which is format 4 for now, lets get some structures and begin reading.

cmap begins with a version number and then the number of encoding subtables

typedef struct {
  u16 version;
  u16 numberSubtables;
} cmap;

the encoding subtables have the following structure,

typedef struct {
  u16 platformID;
  u16 platformSpecificID;
  u32 offset;
} cmap_encoding_subtable;

the offset here is an offset from the start of the cmap table not from the start of the file. because the encoding subtables follow the cmap we are going to change the cmap struct into

typedef struct {
  u16 version;
  u16 numberSubtables;
  cmap_encoding_subtable* subtables;
} cmap;

the platformID and platformSpecificID tell you which platform this encoding subtable was intended for. platformID can have these values:

0 is called Unicode Platform and is used when the platfrom supports unicode
1 is to indicate Mac
2 is reserved and not used
3 is Microsoft encoding

both TrueType and OpenType specs recommend using the Unicode Platfrom as its more generally supported. most fonts will provide the Unicode Platform hence that would be our main focus.

when the platformID is 0 the platfromSpecificID can have the following values:

0, 1, 2 - these indicate Unicode standard version 1 and 1.1 respectively and a value of 2 is deprecated.
3 - Unicode 2.0 with Basic Multilingual Plane only (BMP this is called Plane 0 which contains the languages and a lot of symbols),
4 - Unicode 2.0 with non-BMP allowed
5 - Unicode Variation Sequences, these represent variation of a Glyph as an example (from unicode.com FAQ) ≠ which is a variation of =.
6 - very similar to (4) but can use more formats for the cmap than number 4

all we need to take away from all of this (which is still confusing) is that the most common platformID and platformSpecificID combination is (0, 3), so we are going to focus on that combination only, supporting the other combinations is just a practice in being more complete and doesn't contribute too much to the overall approach

all the encoding subtables point to a cmap which has a specific format. all the formats have some common parts, they all start with a u16 format field, followed by the length of this cmap table then the language code which is only used on Macintosh platformID which in return is discouraged to be used.

so the start of each format is this:

typedef struct {
  u16 format;
  u16 length;
  u16 language;
};

I have purposly left out the name for this struct because its not a complete as each format has a different structure.

Because there are nine formats available we will focus on what our font (envy.ttf) contains when it comes to encoding subtables

here is the function to read cmap:

void read_cmap(char* mem, cmap* c) {
  char *m = mem;
  c->version = READ_BE16_MOVE(m);
  c->numberSubtables = READ_BE16_MOVE(m);

  c->subtables = (cmap_encoding_subtable*) calloc(1, sizeof(cmap_encoding_subtable)*c->numberSubtables);
  for(int i = 0; i < c->numberSubtables; ++i) {
    cmap_encoding_subtable* est = c->subtables + i;
    est->platformID = READ_BE16_MOVE(m);
    est->platformSpecificID = READ_BE16_MOVE(m);
    est->offset = READ_BE32_MOVE(m);
  }

}

and one to print it

void print_cmap(cmap* c) {
  printf("#)\tpId\tpsID\toffset\ttype\n");
  for(int i = 0; i < c->numberSubtables; ++i) {
    cmap_encoding_subtable* cet = c->subtables + i;
    printf("%d)\t%d\t%d\t%d\t", i+1, cet->platformID, cet->platformSpecificID, cet->offset);
    switch(cet->platformID) {
      case 0: printf("Unicode"); break;
      case 1: printf("Mac"); break;
      case 2: printf("Not Supported"); break;
      case 3: printf("Microsoft"); break;
    }
    printf("\n");
  }
}

to begin our reading we first need to find the cmap offset in the table_directory:

int main(int argc, char** argv) {
  int file_size = 0;
  char* file = read_file("envy.ttf", &file_size);
  char* mem_ptr = file;

  font_directory ft = {0};
  read_font_directory(&mem_ptr, &ft);
  for(int i = 0; i < ft.off_sub.numTables; ++i){
    if(ft.tbl_dir[i].tag == READ_BE32("cmap")) { 
      cmap c = {0};
      char* tbl_ptr = file + ft.tbl_dir[i].offset;
      read_cmap(tbl_ptr, &c);
      print_cmap(&c);
    }
  }
  return 0;
}

After we find the offset from the table_directory we add it to the start of the file and that is wher our cmap begins and we just pass this pointer to read_cmap

here is our result after we read and print the cmap:

We are going to focus on what format the Unicode Platform has

so lets check it:

as you can see the format value is format 4

`cmap` format 4

format4 is one of the most common formats,

as we know so far cmaps are used to map Codepoints to Glyph indices this mapping of Codepoint to Glyph index is different depending on the format used

format 4 is used when you support the Unicode BMP but not all the Codepoints in the entire plane range (0x0000 to 0xFFFF) thus there are holes in range. this is also indicated by the platformSpecificID (id number 3) for the Unicode Platform.

Note:

The unicode standard is divided into 17 planes which are ranges of Codepoints for letters, symbols, etc of all the different languages, each plane has a size of 16 bits which means each plane contains 2^16 (65,536) Codepoints, the first plane (Plane 0) is the plane that contains the Codepoints of the languages and a lot of symbols most commonly used and its range is 0x0000 -> 0xFFFF

format 4's structure is very straight forward:

typedef struct {
  u16  format;
   u16  length;
   u16  language;
   u16  segCountX2;
   u16  searchRange;
   u16  entrySelector;
   u16  rangeShift;
  u16  reservedPad;
  u16  *endCode;
  u16  *startCode;
  u16  *idDelta;
  u16  *idRangeOffset;
  u16  *glyphIdArray;
 } format4;

the first 3 fields are the ones we mentioned before that are common across all the formats the searchRange, entrySelector, rangeShift are again used for a binary search (which we will also ignore and use a linear search) the reservedPad field must be zero per the specs.

the startCode, endCode, idDelta, idRangeOffset are 4 parallel arrays that are to be used together. they all have size of segCountX2 divided by 2. the glyphIdArray extends until the end of the table, we will calculate the size from the length field and how many bytes of that length remains after we read the 4 arrays

now then lets read our format 4

void read_format4(char* mem, format4** format) {
  char* m = mem;
  
  u16 length = READ_BE16(m + 2);
  format4* f = NULL;

  f = (format4*) calloc(1, length + sizeof(u16*)*5);
  f->format = READ_BE16_MOVE(m);
  f->length = READ_BE16_MOVE(m);
  f->language = READ_BE16_MOVE(m);
  f->segCountX2 = READ_BE16_MOVE(m);
  f->searchRange = READ_BE16_MOVE(m);
  f->entrySelector = READ_BE16_MOVE(m);
  f->rangeShift = READ_BE16_MOVE(m);

  f->endCode = (u16*) ((u8*)f  + sizeof(format4));
  f->startCode = f->endCode + f->segCountX2/2;
  f->idDelta = f->startCode + f->segCountX2/2;
  f->idRangeOffset = f->idDelta + f->segCountX2/2;
  f->glyphIdArray = f->idRangeOffset + f->segCountX2/2;

  char* start_code_start = m + f->segCountX2 + 2;
  char* id_delta_start = m + f->segCountX2*2 + 2;
  char* id_range_start = m + f->segCountX2*3 + 2;

  for(int i = 0; i < f->segCountX2/2; ++i) {
    f->endCode[i] = READ_BE16(m + i*2);
    f->startCode[i] = READ_BE16(start_code_start + i*2);
    f->idDelta[i] = READ_BE16(id_delta_start + i*2);
    f->idRangeOffset[i] = READ_BE16(id_range_start + i*2);
  }

  P_MOVE(m, f->segCountX2*4 + 2);  

  int remaining_bytes = f->length - (m - mem);
  f->glyph_id_count = remaining_bytes/2;
  for(int i = 0; i < remaining_bytes/2; ++i) {
    f->glyphIdArray[i] = READ_BE16_MOVE(m);
  }

  *format = f;
}

The code here is straight forward reading, we get the length first then allocate enough memory for the entire table taking into account memory needed for the pointers of the 5 arrays our format4 struct has, we need to do this because the data of the font doesn't take into account those pointers

We will read the 5 arrays into the end of the memory we allocated, we put the endCode array right after where the format4 struct is in the memory ends, then comes startCode and then the other three arrays after that.

here is a picture to futher clearify how the memory allocated is used

my drawing skills leave much to be desired

lets write a function to print the format4 too.

void print_format4(format4 *f4) {
  printf("Format: %d, Length: %d, Language: %d, Segment Count: %d\n", f4->format, f4->length, f4->language, f4->segCountX2/2);
  printf("Search Params: (searchRange: %d, entrySelector: %d, rangeShift: %d)\n",
      f4->searchRange, f4->entrySelector, f4->rangeShift);
  printf("Segment Ranges:\tstartCode\tendCode\tidDelta\tidRangeOffset\n");
  for(int i = 0; i < f4->segCountX2/2; ++i) {
    printf("--------------:\t% 9d\t% 7d\t% 7d\t% 12d\n", f4->startCode[i], f4->endCode[i], f4->idDelta[i], f4->idRangeOffset[i]);
  }
}

the picture doesn't show the entire printed output as it was too big

Now that we have read the cmap table we will begin our mapping by using the startCode, endCode, idRangeOffset, idDelta arrays which at the end will give us a pointer into the glyphIdArray which gives us the final Glyph Index.

Fantastic `Glyph`s and where to find them. (we are looking for glyph indices but that doesn't sound as nice)

Here is how we are going to do mapping of a Codepoint:

find the index of the first endCode which is bigger than or equal to the Codepoint. all the arrays are sorted by increasing endCode values.
check the corresponding startCode, if its smaller than or equal to the Codepoint then we move to step 3 else we return 0 which means glyph not found.
check the corrosponding idRangeOffset if its not 0 then go to step 4 otherwise go to step 7
take the value from step 3 and add it to the address of the value. (idRangeOffset[i] + (&idRangeOffset[i])
take the difference between the Codepoint and the startCode (code_point - startCode[i]) and add it to step 4.
the value from the dereferncing the pointer in step 5 is checked, if its 0 then the glyph index is 0 and the glyph is not available. if its not zero we add it to the corrosponding idDelta and get the glyph index. skip step 7.
if idRangeOffset is 0 then add the Codepoint to the corrosponding idDelta to get the glyph index.

seems a bit confusing but once we write the code for this its a bit more clear

int get_glyph_index(u16 code_point, format4 *f) {
  int index = -1;
  u16 *ptr = NULL;
  for(int i = 0; i < f->segCountX2/2; i++) {
    if(f->endCode[i] > code_point) {index = i; break;};
  }
  
  if(index == -1) return 0;

  if(f->startCode[index] < code_point) {
    if(f->idRangeOffset[index] != 0) {
      ptr = f->idRangeOffset + index + f->idRangeOffset[index]/2;
      ptr += code_point - f->startCode[index];
      if(*ptr == 0) return 0;
      return *ptr + f->idDelta[index];
    } else {
      return code_point + f->idDelta[index];
    }
  }

  return 0;
}

all right that is all there is to it for getting Glyph Indexs. you can go ahead and try a couple of different indexs, using the font Envy Code R

here is the Glyph Indexs of the uppercase english letters:

break

if you have come this far, take a small break its healthy.

'loca' and the boys

now that we have the Glyph index mapping working, we move onto getting the Glyph data which is stored in the glyf table, the offset of the Glyph in the glyf is stored in the loca table which is a very straight forward simple array of offsets indexed by the Glyph Index but in order to read the loca table we need to know which version of the loca table we are dealing with, there are two version a 16 bit one and a 32 bit one which are named short and long versions respectivly in the short version the offset divided by 2 is actually stored so we need to multiplie the offsets by 2, in the long version this isn't the case, in order to tell which version of the table we have we must read a field in the head table called indexToLocFormat.

the simpler version is:

read indexToLocFormat field in the head table to determine which version of loca we are dealing with.
depending on the version we either have a u16 bit or a u32 bit array of offsets.
after we read the offset from the loca table we will use this offset from the start of the glyf table to get the Glyph data.

the head table structure is as follows:

typedef struct {
  Fixed version;
   Fixed fontRevision;
  
   u32 checkSumAdjustment;
   u32 magicNumber;
  
   u16 flags;
   u16 unitsPerEm;
  
   i64 created;
   i64 modified;
  
   FWord xMin;
   FWord yMin;
   FWord xMax;
   FWord yMax;
  
   u16 macStyle;
   u16 lowestRecPPEM;
   i16 fontDirectionHint;
   i16 indexToLocFormat;
   i16 glyphDataFormat;
}

a whole lot of data and fields, first up the different data types we are seeing:

Fixed: this is a fixed-point 32 bit integer, 16 bits for the whole part (the upper 16 bits) and 16 bit is used for the fractional part (lower 16 bits)[1].
i64: this is the 64 bit sized type we mentioned in the beginning, its a datetime timestamp.
FWord: 16 bit signed integer that describes quantities in FUnits[2].

Now what the fields mean:

version and fontRevision are both self explanatory.
checkSumAdjustment is a checksum of the head tble
the magicNumber is set to 0x5F0F3CF5
we are going to ignore the flags field as it will introduce too much complexity and makes the article a bit longer than it already is.
unitsPerEm is how many FUnits are in 1 em usually power of 2 and most common is (2048)
created and modified are timestamps.
xMin, yMin, xMax, yMax defines a rectangle in FUnits which is the max bounding box for all the glyphs
macStyle, only the first lower 6 bits are used and in order if set they mean font has this style (Bold, Italic, Underline, Outline, Shadow, Condensed, Extended)
lowestRecPPEM is the smallest readable size in pixels
fontDirectionHint hints at what the font direction is suppose to be, its deprecated usually set to (2)
indexToLocFormat 0 is for short offsets (16 bits), 1 is for long offsets (32 bits).
glyphDataFormat set to 0

If you notice that I haven't given the struct a name, that is because reading this into a struct is very easy and straight to the point but because the only thing we need from this table for now is the indexToLocFormat we are going to just read that one field, its 50 bytes from the start of the struct.

for our loca, head and glyf tables we aren't going to read them into structs rather we will directly read the data we need so we will save the pointer to these tables in our font_directory.

while we are at it we know that this font Envy Code R has has a format4 cmap table so we will add the format4 and cmap structs to the font_directory thus making the font_directory into this:

typedef struct  {
  offset_subtable off_sub;
  table_directory* tbl_dir;
  format4* f4;
  cmap* cmap;
  char* glyf;
  char* loca;
  char* head;
} font_directoy;

and we will set these pointers and read the structs in the read_table_directory function thus making our read_table_directory and read_font_directory function into these:

void read_font_directory(char* file_start, char** mem, font_directory* ft) {
  read_offset_subtable(mem, &ft->off_sub); 
  read_table_directory(file_start, mem, ft);
}

void read_table_directory(char* file_start, char** mem, font_directory* ft) {
  char* m = *mem;
  ft->tbl_dir = (table_directory*)calloc(1, sizeof(table_directory)*ft->off_sub.numTables);

  for(int i = 0; i < ft->off_sub.numTables; ++i) {
    table_directory* t = ft->tbl_dir + i;
    t->tag = READ_BE32_MOVE(m);
    t->checkSum = READ_BE32_MOVE(m);
    t->offset = READ_BE32_MOVE(m);
    t->length = READ_BE32_MOVE(m);

    switch(t->tag) {
      case GLYF_TAG: ft->glyf = t->offset + file_start; break;
      case LOCA_TAG: ft->loca = t->offset + file_start; break;
      case HEAD_TAG: ft->head = t->offset + file_start; break;
      case CMAP_TAG: {
        ft->cmap = (cmap*) calloc(1, sizeof(cmap));
        read_cmap(file_start + t->offset, ft->cmap);
        read_format4(file_start + t->offset + ft->cmap->subtables[0].offset, &ft->f4);
      } break;
    }
  }

  *mem = m;
}

in order to make our switch case work, we will have these macros

#define FONT_TAG(a, b, c, d) (a<<24|b<<16|c<<8|d)
#define GLYF_TAG FONT_TAG('g', 'l', 'y', 'f')
#define LOCA_TAG FONT_TAG('l', 'o', 'c', 'a')
#define HEAD_TAG FONT_TAG('h', 'e', 'a', 'd')
#define CMAP_TAG FONT_TAG('c', 'm', 'a', 'p')

FONT_TAG basically recalculates the 32 bit unsigned integer from the characters at compile time making it a constants and useable with a switch.

given we are reading the cmap and format4 the moment we encounter them, it makes our main function a lot less cluttered and simple:

int main(int argc, char** argv) {
  int file_size = 0;
  char* file = read_file("envy.ttf", &file_size);
  char* mem_ptr = file;

  font_directory ft = {0};
  read_font_directory(file, &mem_ptr, &ft);
  return 0;
}

all of these changes make reading indexLocToFormat super simple:

int read_loca_type(font_directory* ft) {
  return READ_BE16(ft->head + 50);
}

so lets read it.

the value is 0 meaning we have the short 16 bit sized version of the loca table

now to read the offset from loca is also very simple we index the loca using the Glyph Index we get from get_glyph_index so we have this snippet.

u32 get_glyph_offset(font_directory *ft, u32 glyph_index) {
  u32 offset = 0;
  if(read_loca_type(ft)) {
    //32 bit
    offset = READ_BE32((u32*)ft->loca + glyph_index);
  } else {
    offset =  READ_BE16((u16*)ft->loca + glyph_index)*2;
  }
  return offset;
}

given that the output for this function isn't very exciting but just to check your code for the Glyph Index 622 which is the letter A the offset is 69700;

we will take a moment here to rewrite our get_glyph_index function to take a Codepoint and font_directory instead of taking the format4 directly.

int get_glyph_index(font_directory* ft, u16 code_point) {
  format4 *f = ft->f4;
  int index = -1;
  u16 *ptr = NULL;
  for(int i = 0; i < f->segCountX2/2; i++) {
    if(f->endCode[i] > code_point) {index = i; break;};
  }

  if(index == -1) return 0;

  if(f->startCode[index] < code_point) {
    if(f->idRangeOffset[index] != 0) {
      ptr = f->idRangeOffset + index + f->idRangeOffset[index]/2;
      ptr += code_point - f->startCode[index];
      if(*ptr == 0) return 0;
      return *ptr + f->idDelta[index];
    } else {
      return code_point + f->idDelta[index];
    }
  }

  return 0;
}

we are done with the loca table now, we move onto the glyf table which is our last section

Where the wild `glyf`s are.

This is the last section of our Part 1 and its extracting the Glyph data more commonly called Glyph Outline and as always we start with the structure,

typedef struct {
  u16 numberOfContours;
  i16 xMin;
  i16 yMin;
  i16 xMax;
  i16 yMax;
};

this isn't a complete struct, the data that follows depends on the numberOfContours field, if numberOfContours is bigger than zero it means its a simple Glyph if its less than zero it means its a componded Glyph meaning its made up of one or more Glyphs with a certain amount of transformations applied, we will have a gander at compound Glyphs in part 1.5 (which might be after Part 2).

so for simple Glyphs we have the following structure:

typedef struct {
  u16 numberOfContours;
  i16 xMin;
  i16 yMin;
  i16 xMax;
  i16 yMax;
  u16 instructionLength;
  u8* instructions;
  u8* flags;
  i16* xCoordinates;
  i16* yCoordinates;
  u16* endPtsOfContours;
} glyph_outline;

if you go ahead and check the reference we are using for this, you will see that the xCoordinates and yCoordinates could either be a u8 or i16 depending on the flags for making things easier we are just going to use the i16 because it will accommodate both sizes.

the numberOfContours tells us how many contours this shape has, it also gives us the size of endPtsOfContours, the endPtsOfContours array gives us the indicies of the end points of the contours i.e: endPtsOfContours[0] gives us the index into xCoordinates and yCoordinates array where the first contour ends and endPtsOfContours[1] gives us the index where the 2nd contour ends. this means the last value in endPtsContour array will give us the number of points all the contours combined have thus we can allocate enough memory for all for all of them.

the intructionLength and instructions together gives us the instructions that are needed to do grid fitting (hinting) for this Glyph we won't be doing this as it requires us to code a complete virtual machine to run the instructions but its a fun project to do so maybe for another time.

the flags array gives us information about each of the points that the Glyph has, each flag is 8 bits and each bit has the following meanings:

Bit 1: if set it means this point is on the Glyphs curve, otherwise the point is off curve.
Bit 2: if set the corrosponding x coordiante is 1 byte otherwise its 2 bytes.
Bit 3: if set the corrosponding y coordinate is 1 byte otherwise its 2 bytes.
Bit 4: (repeat) if set the next byte specifies how many timet his flag repeats. this is a small way to compress the flags array.
Bit 5, 6: these both relate to bit 1 and 2 respectivily and are better explained in a table.
Bit 7, 8: those are reserved and set to zero in TrueType.

the bits (2, 5) and (3, 6) both have the same functionality but one is for the x coordinate the other is for y. 2 bits give us 4 possibilities so here is the table:

Bit 2 (3)	Bit 5 (6)	Meaning
0	0	(0) The current coordinate is 16 bit signed delta change.
0	1	(1) The current coordinate is 16 bit, has the same value as the previous one.
1	0	(2) the current coordinate is 8 bit, value is negative.
1	1	(3) the current coordinate is 8 bit, value is positive.

we are going to use a union that has bitfields to help us make life easier when dealing with these.

typedef union {
  typedef struct {
    u8 on_curve: 1;
    
    u8 x_short: 1;
    u8 y_short: 1;
    
    u8 repeat: 1;
    
    u8 x_short_pos: 1;
    u8 y_short_pos: 1;
    
    u8 reserved1: 1;
    u8 reserved2: 1;
  };
  u8 flag;
} glyph_flag;

we will replace the type of the flags field in the glyph_outline struct to be of glyph_flag this will be easier for some coding.

this is all the information we need to read the Glyph outline correctly, we first read and uncompress the flags then we read the coordinates.

first the entire function:

glyph_outline get_glyph_outline(font_directory* ft, u32 glyph_index) {
  u32 offset = get_glyph_offset(ft, glyph_index);
  char* glyph_start = ft->glyf + offset;
  glyph_outline outline = {0};
  outline.numberOfContours = READ_BE16_MOVE(glyph_start);
  outline.xMin = READ_BE16_MOVE(glyph_start);
  outline.yMin = READ_BE16_MOVE(glyph_start);
  outline.xMax = READ_BE16_MOVE(glyph_start);
  outline.yMax = READ_BE16_MOVE(glyph_start);

  outline.endPtsOfContours = (u16*) calloc(1, outline.numberOfContours*sizeof(u16));
  for(int i = 0; i < outline.numberOfContours; ++i) {
    outline.endPtsOfContours[i] = READ_BE16_MOVE(glyph_start);
  }

  outline.instructionLength = READ_BE16_MOVE(glyph_start);
  outline.instructions = (u8*)calloc(1, outline.instructionLength);
  memcpy(outline.instructions, glyph_start, outline.instructionLength);
  P_MOVE(glyph_start, outline.instructionLength);

  int last_index = outline.endPtsOfContours[outline.numberOfContours-1];
  outline.flags = (glyph_flag*) calloc(1, last_index + 1);

  for(int i = 0; i < (last_index + 1); ++i) {
    outline.flags[i].flag = *glyph_start;
    glyph_start++;
    if(outline.flags[i].repeat) {
      u8 repeat_count = *glyph_start;
      while(repeat_count-- > 0) {
        i++;
        outline.flags[i] = outline.flags[i-1];
      }
      glyph_start++;
    }
  }

  outline.xCoordinates = (i16*) calloc(1, (last_index+1)*2);
  i16 prev_coordinate = 0;
  i16 current_coordinate = 0;
  for(int i = 0; i < (last_index+1); ++i) {
    int flag_combined = outline.flags[i].x_short << 1 | outline.flags[i].x_short_pos;
    switch(flag_combined) {
      case 0: {
        current_coordinate = READ_BE16_MOVE(glyph_start);
      } break;
      case 1: { current_coordinate = 0; }break;
      case 2: { current_coordinate = (*(u8*)glyph_start++)*-1; }break;
      case 3: { current_coordinate = (*(u8*)glyph_start++); } break;
    }

    outline.xCoordinates[i] = current_coordinate + prev_coordinate;
    prev_coordinate = outline.xCoordinates[i];
  }

  outline.yCoordinates = (i16*) calloc(1, (last_index+1)*2);
  current_coordinate = 0;
  prev_coordinate = 0;
  for(int i = 0; i < (last_index+1); ++i) {
    int flag_combined = outline.flags[i].y_short << 1 | outline.flags[i].y_short_pos;
    switch(flag_combined) {
      case 0: {
        current_coordinate = READ_BE16_MOVE(glyph_start);
      } break;
      case 1: { current_coordinate = 0; }break;
      case 2: { current_coordinate = (*(u8*)glyph_start++)*-1; }break;
      case 3: { current_coordinate = (*(u8*)glyph_start++); } break;
    }

    outline.yCoordinates[i] = current_coordinate + prev_coordinate;
    prev_coordinate = outline.yCoordinates[i];
  }

  return outline;
}

one of our bigger functions so lets break it down:

u32 offset = get_glyph_offset(ft, glyph_index);
char* glyph_start = ft->glyf + offset;
glyph_outline outline = {0};

outline.numberOfContours = READ_BE16_MOVE(glyph_start);
outline.xMin = READ_BE16_MOVE(glyph_start);
outline.yMin = READ_BE16_MOVE(glyph_start);
outline.xMax = READ_BE16_MOVE(glyph_start);
outline.yMax = READ_BE16_MOVE(glyph_start);

outline.endPtsOfContours = (u16*) calloc(1, outline.numberOfContours*sizeof(u16));
for(int i = 0; i < outline.numberOfContours; ++i) {
  outline.endPtsOfContours[i] = READ_BE16_MOVE(glyph_start);
}

outline.instructionLength = READ_BE16_MOVE(glyph_start);
outline.instructions = (u8*)calloc(1, outline.instructionLength);
memcpy(outline.instructions, glyph_start, outline.instructionLength);
P_MOVE(glyph_start, outline.instructionLength);

this part is pretty straight forward, we get the offset of the Glyph and position glyph_start to be at the start of that Glyphs data. we start a new glyph_outline struct and read the easier fields.

the instructions are just an arary of bytes so doing a memcpy is a lot faster than looping over memory.

now the part the reads the flags array

int last_index = outline.endPtsOfContours[outline.numberOfContours-1];
outline.flags = (glyph_flag*) calloc(1, last_index + 1);

for(int i = 0; i < (last_index + 1); ++i) {
  outline.flags[i].flag = *glyph_start;
  glyph_start++;
  if(outline.flags[i].repeat) {
    u8 repeat_count = *glyph_start;
    while(repeat_count-- > 0) {
      i++;
      outline.flags[i] = outline.flags[i-1];
    }
    glyph_start++;
  }
}

we already know how big the flags array will be based on endPtsOfContours which gives us the last index of the last contour's point meaning the size of array is last_index + 1.

we read the flags one by one and everytime we encounter a repeat we read the next byte which gives us the amount we have to repeat.

now that we have the flags we can determine the x and y coordinates:

outline.xCoordinates = (i16*) calloc(1, (last_index+1)*2);
i16 prev_coordinate = 0;
i16 current_coordinate = 0;
for(int i = 0; i < (last_index+1); ++i) {
  int flag_combined = outline.flags[i].x_short << 1 | outline.flags[i].x_short_pos;
  switch(flag_combined) {
    case 0: {
      current_coordinate = READ_BE16_MOVE(glyph_start);
    } break;
    case 1: { current_coordinate = 0; }break;
    case 2: { current_coordinate = (*glyph_start++)*-1; }break;
    case 3: { current_coordinate = (*glyph_start++); } break;
  }

  outline.xCoordinates[i] = current_coordinate + prev_coordinate;
  prev_coordinate = outline.xCoordinates[i];
}

this code is the same for both xCoordinates and yCoordinates, every coordinate point is relative to the previous one except the first which relative to 0 which means at each iteration the value for the current coordinate is the previous value summed with the data we just read. under case 1 its indicated that the value of the current coordinate is the same as previous thus we don't have any data to read. we repeat the same loop for the yCoordiantes and that is it.

now to print the outline.

void print_glyph_outline(glyph_outline *outline) {
  printf("#contours\t(xMin,yMin)\t(xMax,yMax)\tinst_length\n");
  printf("%9d\t(%d,%d)\t\t(%d,%d)\t%d\n", outline->numberOfContours,
      outline->xMin, outline->yMin,
      outline->xMax, outline->yMax,
      outline->instructionLength);

  printf("#)\t(  x  ,  y  )\n");
  int last_index = outline->endPtsOfContours[outline->numberOfContours-1];
  for(int i = 0; i <= last_index; ++i) {
    printf("%d)\t(%5d,%5d)\n", i, outline->xCoordinates[i], outline->yCoordinates[i]);
  }
}

and that is all she wrote. (for now)

Is it Over?

There is still Part 2 coming which will be how to rasierize this outline, it will include initializing OpengGL Core profile on Windows nativily without any libraries, loading the OpenGL functions we need, Scanline Raserization of our outline, some basic AA.

as a wrap-up here I'm going to include the things we haven't covered.

we didn't touch the head, hhea and hmtx and the other tables that relate to positioning of Glyphs I will try to include those in a seperate post so that this doesn't really become too big of a post.
We skipped reading any platformID, platformSpecificID pair other than (0, 3)
We skipped reading any cmap table format other than format 4
We skipped reading Comounded Glyph data and just focused on simple Glyphs
We skipped doing anything we the instructions we read for a Glyph

[1] in order to convert a Fixed number into our normal float types, we first need to figure what our fraction is which is the lower 16 bits divided by (2^16) we then simply add this to our whole part and we get the number in float. In OpenType this version is actually divided back into 2 16 bit fields, one called Major version and the other Minor version. [2] every font is designed on a 2D grid whose coordinates are in FUnits the size of this grid is 1 em, and how many FUnits are in this 1 em is defined by the unitsPerEm field of the head table, the higher this count the more precision font designers have.

Here is all the code combined:

#include <stdio.h>

#define READ_BE16(mem) ((((u8*)(mem))[0] << 8) | (((u8*)(mem))[1]))
#define READ_BE32(mem) ((((u8*)(mem))[0] << 24) | (((u8*)(mem))[1] << 16) | (((u8*)(mem))[2] << 8) | (((u8*)(mem))[3]))
#define P_MOVE(mem, a) ((mem) += (a))

#define READ_BE16_MOVE(mem) (READ_BE16((mem))); (P_MOVE((mem), 2))
#define READ_BE32_MOVE(mem) (READ_BE32((mem))); (P_MOVE((mem), 4))

#define FONT_TAG(a, b, c, d) (a<<24|b<<16|c<<8|d)
#define GLYF_TAG FONT_TAG('g', 'l', 'y', 'f')
#define LOCA_TAG FONT_TAG('l', 'o', 'c', 'a')
#define HEAD_TAG FONT_TAG('h', 'e', 'a', 'd')
#define CMAP_TAG FONT_TAG('c', 'm', 'a', 'p')


char* read_file(char *file_name, int* file_size) { 
  if(strlen(file_name) > 0) {
    FILE* file = fopen(file_name, "rb");
    if(file) {
      fseek(file, 0, SEEK_END);
      int size = ftell(file);
      fseek(file, 0, SEEK_SET);

      if(file_size) { *file_size = size; }
      char *file_content = (char*)malloc(size+1);
      int read_amount = fread(file_content, size, 1, file);
      file_content[size] = '\0';
      if(read_amount) {
        fclose(file);
        return file_content;
      }
      free(file_content);
      fclose(file);
      return NULL;
    }
  }
  return NULL;
}

typedef unsigned char u8;
typedef char i8;

typedef unsigned short u16;
typedef short i16;

typedef unsigned int u32;
typedef int i32;


typedef struct {
  u32  scaler_type;
  u16  numTables;
  u16  searchRange;
  u16  entrySelector;
  u16  rangeShift;
} offset_subtable;


typedef struct {
  u16 platformID;
  u16 platformSpecificID;
  u32 offset;
} cmap_encoding_subtable;


typedef struct {
  u16 version;
  u16 numberSubtables;
  cmap_encoding_subtable* subtables;
} cmap;

typedef struct {
  u16  format;
   u16  length;
   u16  language;
   u16  segCountX2;
   u16  searchRange;
   u16  entrySelector;
   u16  rangeShift;
  u16  reservedPad;
  u16  *endCode;
  u16  *startCode;
  u16  *idDelta;
  u16  *idRangeOffset;
  u16  *glyphIdArray;
} format4;

typedef struct {
  union { 
    char tag_c[4];
    u32  tag;
  };
  u32  checkSum;
  u32  offset;
  u32  length;
} table_directory;

typedef struct  {
  offset_subtable off_sub;
  table_directory* tbl_dir;
  format4* f4;
  cmap* cmap;
  char* glyf;
  char* loca;
  char* head;
} font_directory; 

typedef union {
  typedef struct {
    u8 on_curve: 1;
    
    u8 x_short: 1;
    u8 y_short: 1;
    
    u8 repeat: 1;
    
    u8 x_short_pos: 1;
    u8 y_short_pos: 1;
    
    u8 reserved1: 1;
    u8 reserved2: 1;
  };
  u8 flag;
} glyph_flag;


typedef struct {
  u16 numberOfContours;
  i16 xMin;
  i16 yMin;
  i16 xMax;
  i16 yMax;
  u16 instructionLength;
  u8* instructions;
  glyph_flag* flags;
  i16* xCoordinates;
  i16* yCoordinates;
  u16* endPtsOfContours;
} glyph_outline;



void read_offset_subtable(char** mem, offset_subtable* off_sub) {
  char* m = *mem;
  off_sub->scaler_type = READ_BE32_MOVE(m);
  off_sub->numTables = READ_BE16_MOVE(m);
  off_sub->searchRange = READ_BE16_MOVE(m);
  off_sub->entrySelector = READ_BE16_MOVE(m);
  off_sub->rangeShift = READ_BE16_MOVE(m);

  *mem = m;
}


void read_cmap(char* mem, cmap* c) {
  char *m = mem;
  c->version = READ_BE16_MOVE(m);
  c->numberSubtables = READ_BE16_MOVE(m);

  c->subtables = (cmap_encoding_subtable*) calloc(1, sizeof(cmap_encoding_subtable)*c->numberSubtables);
  for(int i = 0; i < c->numberSubtables; ++i) {
    cmap_encoding_subtable* est = c->subtables + i;
    est->platformID = READ_BE16_MOVE(m);
    est->platformSpecificID = READ_BE16_MOVE(m);
    est->offset = READ_BE32_MOVE(m);
  }

}

void print_cmap(cmap* c) {
  printf("#)\tpId\tpsID\toffset\ttype\n");
  for(int i = 0; i < c->numberSubtables; ++i) {
    cmap_encoding_subtable* cet = c->subtables + i;
    printf("%d)\t%d\t%d\t%d\t", i+1, cet->platformID, cet->platformSpecificID, cet->offset);
    switch(cet->platformID) {
      case 0: printf("Unicode"); break;
      case 1: printf("Mac"); break;
      case 2: printf("Not Supported"); break;
      case 3: printf("Microsoft"); break;
    }
    printf("\n");
  }
}

void read_format4(char* mem, format4** format) {
  char* m = mem;

  u16 length = READ_BE16(m + 2);

  format4* f = NULL;

  f = (format4*) calloc(1, length + sizeof(u16*)*5);
  f->format = READ_BE16_MOVE(m);
  f->length = READ_BE16_MOVE(m);
  f->language = READ_BE16_MOVE(m);
  f->segCountX2 = READ_BE16_MOVE(m);
  f->searchRange = READ_BE16_MOVE(m);
  f->entrySelector = READ_BE16_MOVE(m);
  f->rangeShift = READ_BE16_MOVE(m);

  f->endCode = (u16*) ((u8*)f  + sizeof(format4));
  f->startCode = f->endCode + f->segCountX2/2;
  f->idDelta = f->startCode + f->segCountX2/2;
  f->idRangeOffset = f->idDelta + f->segCountX2/2;
  f->glyphIdArray = f->idRangeOffset + f->segCountX2/2;

  char* start_code_start = m + f->segCountX2 + 2;
  char* id_delta_start = m + f->segCountX2*2 + 2;
  char* id_range_start = m + f->segCountX2*3 + 2;

  for(int i = 0; i < f->segCountX2/2; ++i) {
    f->endCode[i] = READ_BE16(m + i*2);
    f->startCode[i] = READ_BE16(start_code_start + i*2);
    f->idDelta[i] = READ_BE16(id_delta_start + i*2);
    f->idRangeOffset[i] = READ_BE16(id_range_start + i*2);
  }

  P_MOVE(m, f->segCountX2*4 + 2);  

  int remaining_bytes = f->length - (m - mem);
  for(int i = 0; i < remaining_bytes/2; ++i) {
    f->glyphIdArray[i] = READ_BE16_MOVE(m);
  }

  *format = f;
}

void print_format4(format4 *f4) {
  printf("Format: %d, Length: %d, Language: %d, Segment Count: %d\n", f4->format, f4->length, f4->language, f4->segCountX2/2);
  printf("Search Params: (searchRange: %d, entrySelector: %d, rangeShift: %d)\n",
      f4->searchRange, f4->entrySelector, f4->rangeShift);
  printf("Segment Ranges:\tstartCode\tendCode\tidDelta\tidRangeOffset\n");
  for(int i = 0; i < f4->segCountX2/2; ++i) {
    printf("--------------:\t% 9d\t% 7d\t% 7d\t% 12d\n", f4->startCode[i], f4->endCode[i], f4->idDelta[i], f4->idRangeOffset[i]);
  }
}


void read_table_directory(char* file_start, char** mem, font_directory* ft) {
  char* m = *mem;
  ft->tbl_dir = (table_directory*)calloc(1, sizeof(table_directory)*ft->off_sub.numTables);

  for(int i = 0; i < ft->off_sub.numTables; ++i) {
    table_directory* t = ft->tbl_dir + i;
    t->tag = READ_BE32_MOVE(m);
    t->checkSum = READ_BE32_MOVE(m);
    t->offset = READ_BE32_MOVE(m);
    t->length = READ_BE32_MOVE(m);

    switch(t->tag) {
      case GLYF_TAG: ft->glyf = t->offset + file_start; break;
      case LOCA_TAG: ft->loca = t->offset + file_start; break;
      case HEAD_TAG: ft->head = t->offset + file_start; break;
      case CMAP_TAG: {
        ft->cmap = (cmap*) calloc(1, sizeof(cmap));
        read_cmap(file_start + t->offset, ft->cmap);
        read_format4(file_start + t->offset + ft->cmap->subtables[0].offset, &ft->f4);
      } break;
    }
  }

  *mem = m;
}

void print_table_directory(table_directory* tbl_dir, int tbl_size) {
  printf("#)\ttag\tlen\toffset\n");
  for(int i = 0; i < tbl_size; ++i) {
    table_directory* t = tbl_dir + i;
    printf("%d)\t%c%c%c%c\t%d\t%d\n", i+1,
        t->tag_c[3], t->tag_c[2],
        t->tag_c[1], t->tag_c[0],
        t->length, t->offset);
  }
}

void read_font_directory(char* file_start, char** mem, font_directory* ft) {
  read_offset_subtable(mem, &ft->off_sub); 
  read_table_directory(file_start, mem, ft);
}


int get_glyph_index(font_directory* ft, u16 code_point) {
  format4 *f = ft->f4;
  int index = -1;
  u16 *ptr = NULL;
  for(int i = 0; i < f->segCountX2/2; i++) {
    if(f->endCode[i] > code_point) {index = i; break;};
  }
  
  if(index == -1) return 0;

  if(f->startCode[index] < code_point) {
    if(f->idRangeOffset[index] != 0) {
      ptr = f->idRangeOffset + index + f->idRangeOffset[index]/2;
      ptr += code_point - f->startCode[index];
      if(*ptr == 0) return 0;
      return *ptr + f->idDelta[index];
    } else {
      return code_point + f->idDelta[index];
    }
  }

  return 0;
}

int read_loca_type(font_directory* ft) {
  return READ_BE16(ft->head + 50);
}

u32 get_glyph_offset(font_directory *ft, u32 glyph_index) {
  u32 offset = 0;
  if(read_loca_type(ft)) {
    //32 bit
    offset = READ_BE32((u32*)ft->loca + glyph_index);
  } else {
    offset =  READ_BE16((u16*)ft->loca + glyph_index)*2;
  }
  return offset;
}


glyph_outline get_glyph_outline(font_directory* ft, u32 glyph_index) {
  u32 offset = get_glyph_offset(ft, glyph_index);
  unsigned char* glyph_start = ft->glyf + offset;
  glyph_outline outline = {0};
  outline.numberOfContours = READ_BE16_MOVE(glyph_start);
  outline.xMin = READ_BE16_MOVE(glyph_start);
  outline.yMin = READ_BE16_MOVE(glyph_start);
  outline.xMax = READ_BE16_MOVE(glyph_start);
  outline.yMax = READ_BE16_MOVE(glyph_start);

  outline.endPtsOfContours = (u16*) calloc(1, outline.numberOfContours*sizeof(u16));
  for(int i = 0; i < outline.numberOfContours; ++i) {
    outline.endPtsOfContours[i] = READ_BE16_MOVE(glyph_start);
  }

  outline.instructionLength = READ_BE16_MOVE(glyph_start);
  outline.instructions = (u8*)calloc(1, outline.instructionLength);
  memcpy(outline.instructions, glyph_start, outline.instructionLength);
  P_MOVE(glyph_start, outline.instructionLength);

  int last_index = outline.endPtsOfContours[outline.numberOfContours-1];
  outline.flags = (glyph_flag*) calloc(1, last_index + 1);

  for(int i = 0; i < (last_index + 1); ++i) {
    outline.flags[i].flag = *glyph_start;
    glyph_start++;
    if(outline.flags[i].repeat) {
      u8 repeat_count = *glyph_start;
      while(repeat_count-- > 0) {
        i++;
        outline.flags[i] = outline.flags[i-1];
      }
      glyph_start++;
    }
  }


  outline.xCoordinates = (i16*) calloc(1, (last_index+1)*2);
  i16 prev_coordinate = 0;
  i16 current_coordinate = 0;
  for(int i = 0; i < (last_index+1); ++i) {
    int flag_combined = outline.flags[i].x_short << 1 | outline.flags[i].x_short_pos;
    switch(flag_combined) {
      case 0: {
        current_coordinate = READ_BE16_MOVE(glyph_start);
      } break;
      case 1: { current_coordinate = 0; }break;
      case 2: { current_coordinate = (*glyph_start++)*-1; }break;
      case 3: { current_coordinate = (*glyph_start++); } break;
    }

    outline.xCoordinates[i] = current_coordinate + prev_coordinate;
    prev_coordinate = outline.xCoordinates[i];
  }

  outline.yCoordinates = (i16*) calloc(1, (last_index+1)*2);
  current_coordinate = 0;
  prev_coordinate = 0;
  for(int i = 0; i < (last_index+1); ++i) {
    int flag_combined = outline.flags[i].y_short << 1 | outline.flags[i].y_short_pos;
    switch(flag_combined) {
      case 0: {
        current_coordinate = READ_BE16_MOVE(glyph_start);
      } break;
      case 1: { current_coordinate = 0; }break;
      case 2: { current_coordinate = (*glyph_start++)*-1; }break;
      case 3: { current_coordinate = (*glyph_start++); } break;
    }

    outline.yCoordinates[i] = current_coordinate + prev_coordinate;
    prev_coordinate = outline.yCoordinates[i];
  }

  return outline;
}

void print_glyph_outline(glyph_outline *outline) {
  printf("#contours\t(xMin,yMin)\t(xMax,yMax)\tinst_length\n");
  printf("%9d\t(%d,%d)\t\t(%d,%d)\t%d\n", outline->numberOfContours,
      outline->xMin, outline->yMin,
      outline->xMax, outline->yMax,
      outline->instructionLength);

  printf("#)\t(  x  ,  y  )\n");
  int last_index = outline->endPtsOfContours[outline->numberOfContours-1];
  for(int i = 0; i <= last_index; ++i) {
    printf("%d)\t(%5d,%5d)\n", i, outline->xCoordinates[i], outline->yCoordinates[i]);
  }
}

int main(int argc, char** argv) {
  int file_size = 0;
  char* file = read_file("envy.ttf", &file_size);
  char* mem_ptr = file;

  font_directory ft = {0};
  read_font_directory(file, &mem_ptr, &ft);

  glyph_outline A = get_glyph_outline(&ft, get_glyph_index(&ft, 'A'));
  print_glyph_outline(&A);
  return 0;
}