I had been wanting to write this for over a year, yes you read that right a year, I will explain why at the end as its an optional read.
Introduction
I will write this in two parts first reading the TTF and then rasterization, a nice break point between the two processes. Lets start
Part 1: TTFs and what have you.
Notes:
- the main reference im using is (https://developer.apple.com/fonts/Type-Reference-Manual/RM06/Chap6.html), i have done my best to write it in way that made sense to me and how i understood it, if you find any mistakens please do let me know.
- all the code snippets im using in different sections are going to be ollected in one final single file that can be compiled and ran. ofc this epends on the platform you are using I will use the std libraries in C such s open and such so it becomes less painful to run on differnt platforms.
- I will use the font Envy Code R (https://damieng.com/blog/2008/05/26/envy-code-r-preview-7-coding-font-released) for this post.
- Most of the code here is for education and will contain oversight when it comes to memory management, type sizes, error checking if a font is well formed or not and other C related code. I assume C compiler on a little endian platform where the size of
int
is 32 bits,char
is 8 bits,short
is 16, I know there are standard libraries that contain those specific type sizes but im not going to use them.
Terminology
I'm not a font designer nor even remotely someone who knows too much about them but there are some terminology we should know before going forward as they might be confusing to some here are the terms which are important:
typeface
: this is basically the design of the font and what it actually looks like for exampleArial
is a typefaceArial Regular
is a variant of theArial
typeface which you call a font,Arial Bold
is another font.Font
: a variant of a typeface such as a bold or italic variant of a specific typeface. a Font is made up of collection of glyphs.Character
: this is a specific symbol that represents a letter, number or some other sign (such as dollar sign $) in written language.Glyph
: a shape/symbol that represents a character in writing having a particular form. for example the glyph for the letterA
in theEnvy Code R
typeface looks like thisCodepoint
: an integer that represents aCharacter
index in an encoding scheme, the two most common encoding schemes are (ASCII and Unicode) ASCII uses one byte for theCodepoint
hence it only has 255Codepoints
, unicode uses multiple bytes for itsCodepoints
thus it can encode a much larger range.Glyph Index
: this is an interger that gives you the location of aGlyph
inside a font, this changes depending on the font unlikeCodepoints
which are specific to the encoding scheme (Unicode or ASCII) rather than the font.
Our main point of knowing these terms is that we distinguish a Glyph
from a Character
in that a Glyph
is a representation of a Character
in a particular Typeface
as in the uppercase A
character doesn't change as there is one uppercase A
letter in the english
language while a Glyph
for the uppercase A
are plenty and depends on the Typeface
Intro
Truetype Fonts are files that contain information regarding a particular typeface, they contain each of the typeface glyph's outlines, these outlines are bezier curves. The file also includes specific information about grid-fitting a glyph also known as (hinting), its positioning and a host of other information about the typeface. Our main business will be reading the outline data and rasterize, this means this post won't include hinting and other font stuff like shaping and substitutions that is needed for non-english languages. I will do a seperate post on substitution and simple shaping at a later date.
TTFs files use a format that is called SFNT (Spline Font or Scalable Font) which is a file format that houses other fonts such as PostScript fonts, OpenType fonts, TrueType fonts, they have the general overall structure but differnt in certain parts. SFNT was supersceded by OpenType Collections which is an extended version of SFNT.
TTF files do not contain any complex compression thus making most of our work just byte reading, all the data is in big endian.
The adventure begins
before we even begin, lets write some small snippets of code to help us make life easier.
first we need to read an entire file:
char* read_file(char *file_name, int* file_size) {
if(strlen(file_name) > 0) {
FILE* file = fopen(file_name, "rb");
if(file) {
fseek(file, 0, SEEK_END);
int size = ftell(file);
fseek(file, 0, SEEK_SET);
if(file_size) { *file_size = size; }
char *file_content = (char*)malloc(size+1);
int read_amount = fread(file_content, size, 1, file);
file_content[size] = '\0';
if(read_amount) {
fclose(file);
return file_content;
}
free(file_content);
fclose(file);
return NULL;
}
}
return NULL;
}
Now we need to read big endian values, we assume mem is char*
,
it can also be a uint8_t
which is one of the standard int sizes but
less dependency on libraries the better, we technically only need 16 and 32 byte
reading as you don't have bigger than 32 bit sized types that we care about.
the only 64 bit type is longdatetime which is the usual unix epoch timestamp.
we are also going to define some data types to make writing unsigned int
a bit shorter
typedef unsigned char u8;
typedef char i8;
typedef unsigned short u16;
typedef short i16;
typedef unsigned int u32;
typedef int i32;
#define READ_BE16(mem) ((((u8*)(mem))[0] << 8) | (((u8*)(mem))[1]))
#define READ_BE32(mem) ((((u8*)(mem))[0] << 24) | (((u8*)(mem))[1] << 16) | (((u8*)(mem))[2] << 8) | (((u8*)(mem))[3]))
#define P_MOVE(mem, a) ((mem) += (a))
#define READ_BE16_MOVE(mem) (READ_BE16((mem))); (P_MOVE((mem), 2))
#define READ_BE32_MOVE(mem) (READ_BE32((mem))); (P_MOVE((mem), 4))
if you notice I haven't distinguished betwween signed and unsigned macros because the reading of the data is the same but the interpertation of the data is dependent on the type read into.
The font files are made up of tables that contain different information.
each table has a 4 byte tag which can be seen as an uint32
or 4 characters,
these tags are not stored with the table data rather they are stored in
a special table called the table directory which comes at he start of the file and identifies the offsets of all the other tables along with their tag.
The file structure starts with a table called the font directoy
and it must be the first table in the file,
it consists of 2 parts; the offset subtable
and table directory
,
offset subtable
contains info on the amount of tables and provides some precalculated numbers to help with binary searching for a particular table,
that is if the number of tables is huge or if you just want a faster search than linear,
this binary search is very useful when you want to search for a table and you haven't extracted that tables information into a specific data structure and dont' read all the tables in one go.
table directory
is an array that contains the tags of the table and their location offsets in bytes,
much like a directory of a book.
this is how we will define the struct for the font directory
typedef struct {
offset_subtable off_sub;
table_directory* tbl_dir;
} font_directoy;
Offset Subtable
Here is the structure for the Offset subtable
:
typedef struct {
u32 scaler_type;
u16 numTables;
u16 searchRange;
u16 entrySelector;
u16 rangeShift;
} offset_subtable;
the most important piece of info we need is of course the numTables
this will dicatet
how many tables there.
the scaler_type
is a tag that tells what this SFNT file contains
- a value of
0x74727565 ('true' as chars)
or0x00010000
are considered TrueType fonts - a value of
0x4F54544F ('OTTO' as chars)
means its an OpenType with PostScript outlines - a value of
0x74797031 ('typ1' as chars)
is type 1 postscript font files which was made by adobe in the 80s
the focus of this post is about the first values, the other values are different and not as common.
the searchRange
, entrySelector
and rangeShift
are all used for a binary search for a specific tag
which I will not indulge here as a simple linear is fine for me.
Table Directory
Here is the structure:
typedef struct {
union {
char tag_c[4];
u32 tag;
};
u32 checkSum;
u32 offset;
u32 length;
} table_directory;
the offset
is taken as an offset from the start of the file and not from table_directory
what follows the Offset table
is actually an array of (table_directory
)
the number of items in the array is numTables
from offset_subtable
.
the union
is for easier debugging as the 4 character version is easier to read as humans
one small note about the tag
if you read them as big endians
they come out reversed so a tag of cmap
becomes pamc
Now that we have at least two structs, lets read some data,
we begin by reading font_directory
which is nothing else than reading
the offset_subtable
and table_directory
,
void read_font_directory(char** mem, font_directory* ft) {
read_offset_subtable(mem, &ft->off_sub);
read_table_directory(mem, &ft->tbl_dir, ft->off_sub.numTables);
}
the reason for the char**
is that we are going to use it as a pointer inside the data stream of the file.
here is the read_offset_subtable
:
void read_offset_subtable(char** mem, offset_subtable* off_sub) {
char* m = *mem;
off_sub->scaler_type = READ_BE32_MOVE(m);
off_sub->numTables = READ_BE16_MOVE(m);
off_sub->searchRange = READ_BE16_MOVE(m);
off_sub->entrySelector = READ_BE16_MOVE(m);
off_sub->rangeShift = READ_BE16_MOVE(m);
*mem = m;
}
et voila we have the offset_table
very simple and straight to the point, then we read the table_directory:
void read_table_directory(char** mem, table_directory** tbl_dir, int tbl_size) {
char* m = *mem;
*tbl_dir = (table_directory*)calloc(1, sizeof(table_directory)*tbl_size);
for(int i = 0; i < tbl_size; ++i) {
table_directory* t = *tbl_dir + i;
t->tag = READ_BE32_MOVE(m);
t->checkSum = READ_BE32_MOVE(m);
t->offset = READ_BE32_MOVE(m);
t->length = READ_BE32_MOVE(m);
}
*mem = m;
}
we first get the size of the table_directory
and the rest is easy reading.
to make it easier to see what we have read we are going to write a print function:
void print_table_directory(table_directory* tbl_dir, int tbl_size) {
printf("#)\ttag\tlen\toffset\n");
for(int i = 0; i < tbl_size; ++i) {
table_directory* t = tbl_dir + i;
printf("%d)\t%c%c%c%c\t%d\t%d\n", i+1,
t->tag_c[3], t->tag_c[2],
t->tag_c[1], t->tag_c[0],
t->length, t->offset);
}
}
(Envy Code R has 15 tables)
now that is much better to read.
the tables that we need
given that we are ready to read some more data we move on onto our main quest,
All TTF files (TrueType variants) are required to have the following tables in them.
cmap
: this contains the mapping between a paricularCodepoint
toGlyph
index, i.e: given a character it tells us whichGlyph
is used to represent thatCharacter
in theTypeface
glyf
: thie contains the actualGlyph
outline data, all these are splines.head
: also called Font Header, it contains general information about a font.hhea
: horizontal header table, contains information about horizontal positioning of a font and its glyphs.hmtx
: an array that has horizontal metrics for each glyph, specificallyadvance width
andleft side bearing
.loca
: glyph location offsets, this table contains the offset of each glyph's data, its used together with the 'glyf' table this table is used to locate theGlyph
data inside theglyf
table.maxp
: Maximum Profile, this table contains the extends of the font, i.e: the number of glyphs, biggest number of points in aGlyph
, etc. basically it gives you the memory constraints of the font such as you can allocate a buffer to contain theGlyph
data and this table tells you how big you should make it so it can contain the biggestGlyph
.name
: this table contains the name of the font and human readable data.post
: we won't touch this table so we are going to skip it.
as you can see these tables all work together to get you the data, glyf
table contains the actual data,
loca
contains the offsets into the glyf
table which contain the Glyph outline
data,
cmap
gives you the Codepoint
to Glyph
mapping so you can get a Glyph Index
to use with the loca
table,
head
, hhea
, hmtx
together give you information about the font layouting and positioning.
we start with cmap
one of the more fun tables to work with
cmap entered the channel.
This table is a bit complicated if you come across it the first time so I'm gonna try to explain it first then we move on to the structure and reading.
this table is used when you want to map a specific Codepoint
to a Glyph index
,
The cmap
table contains a number of subtables which are called encoding subtables
because different platforms have developed different conventions when dealing with fonts
these encoding subtables
help which encoding format
should be used on which platform
which just means there are different ways the Character
to Glyph
mapping is encoded
and it depends on the platform the font is intended to be used on, a font that is
intended for more than one platform would have multiple encoding subtables
.
the encoding tables
each have a specific format out of nine format that are part of the standard,
we will look at one format which is format 4
for now, lets get some structures and begin reading.
cmap
begins with a version number
and then the number of encoding subtables
typedef struct {
u16 version;
u16 numberSubtables;
} cmap;
the encoding subtables
have the following structure,
typedef struct {
u16 platformID;
u16 platformSpecificID;
u32 offset;
} cmap_encoding_subtable;
the offset
here is an offset from the start of the cmap
table not from the start of the file.
because the encoding subtables
follow the cmap
we are going to change the cmap
struct into
typedef struct {
u16 version;
u16 numberSubtables;
cmap_encoding_subtable* subtables;
} cmap;
the platformID
and platformSpecificID
tell you which platform this encoding subtable
was intended for. platformID
can have these values:
0
is calledUnicode Platform
and is used when the platfrom supports unicode1
is to indicate Mac2
is reserved and not used3
is Microsoft encoding
both TrueType and OpenType specs recommend using the Unicode Platfrom
as its more generally supported.
most fonts will provide the Unicode Platform
hence that would be our main focus.
when the platformID
is 0
the platfromSpecificID
can have the following values:
0
,1
,2
- these indicate Unicode standard version 1 and 1.1 respectively and a value of 2 is deprecated.3
- Unicode 2.0 with Basic Multilingual Plane only (BMP this is called Plane 0 which contains the languages and a lot of symbols),4
- Unicode 2.0 with non-BMP allowed5
- Unicode Variation Sequences, these represent variation of aGlyph
as an example (from unicode.com FAQ) ≠ which is a variation of =.6
- very similar to (4) but can use more formats for thecmap
than number 4
all we need to take away from all of this (which is still confusing) is that
the most common platformID
and platformSpecificID
combination is (0, 3)
,
so we are going to focus on that combination only, supporting the other combinations
is just a practice in being more complete and doesn't contribute too much to the overall approach
all the encoding subtables
point to a cmap
which has a specific format.
all the formats have some common parts, they all start with a u16 format
field, followed by
the length
of this cmap
table then the language
code which is only used on Macintosh platformID
which in return is discouraged to be used.
so the start of each format is this:
typedef struct {
u16 format;
u16 length;
u16 language;
};
I have purposly left out the name for this struct because its not a complete as each format has a different structure.
Because there are nine formats available we will focus on what our font (envy.ttf
) contains
when it comes to encoding subtables
here is the function to read cmap
:
void read_cmap(char* mem, cmap* c) {
char *m = mem;
c->version = READ_BE16_MOVE(m);
c->numberSubtables = READ_BE16_MOVE(m);
c->subtables = (cmap_encoding_subtable*) calloc(1, sizeof(cmap_encoding_subtable)*c->numberSubtables);
for(int i = 0; i < c->numberSubtables; ++i) {
cmap_encoding_subtable* est = c->subtables + i;
est->platformID = READ_BE16_MOVE(m);
est->platformSpecificID = READ_BE16_MOVE(m);
est->offset = READ_BE32_MOVE(m);
}
}
and one to print it
void print_cmap(cmap* c) {
printf("#)\tpId\tpsID\toffset\ttype\n");
for(int i = 0; i < c->numberSubtables; ++i) {
cmap_encoding_subtable* cet = c->subtables + i;
printf("%d)\t%d\t%d\t%d\t", i+1, cet->platformID, cet->platformSpecificID, cet->offset);
switch(cet->platformID) {
case 0: printf("Unicode"); break;
case 1: printf("Mac"); break;
case 2: printf("Not Supported"); break;
case 3: printf("Microsoft"); break;
}
printf("\n");
}
}
to begin our reading we first need to find the cmap
offset in the table_directory
:
int main(int argc, char** argv) {
int file_size = 0;
char* file = read_file("envy.ttf", &file_size);
char* mem_ptr = file;
font_directory ft = {0};
read_font_directory(&mem_ptr, &ft);
for(int i = 0; i < ft.off_sub.numTables; ++i){
if(ft.tbl_dir[i].tag == READ_BE32("cmap")) {
cmap c = {0};
char* tbl_ptr = file + ft.tbl_dir[i].offset;
read_cmap(tbl_ptr, &c);
print_cmap(&c);
}
}
return 0;
}
After we find the offset
from the table_directory
we add it to the start of the file and that is wher our cmap
begins
and we just pass this pointer to read_cmap
here is our result after we read and print the cmap
:
We are going to focus on what format the Unicode Platform
has
so lets check it:
as you can see the format value is format 4
cmap
format 4
format4
is one of the most common formats,
as we know so far cmap
s are used to map Codepoints
to Glyph indices
this mapping of Codepoint
to Glyph index
is different depending on the format used
format 4 is used when you support the Unicode BMP
but not all the Codepoints
in the entire plane range (0x0000 to 0xFFFF) thus there are holes
in range.
this is also indicated by the platformSpecificID
(id number 3) for the Unicode Platform
.
Note:
The unicode standard is divided into 17 planes which are ranges of
Codepoints
for letters, symbols, etc of all the different languages, each plane has a size of 16 bits which means each plane contains 2^16 (65,536)Codepoints
, the first plane (Plane 0) is the plane that contains theCodepoint
s of the languages and a lot of symbols most commonly used and its range is0x0000 -> 0xFFFF
format 4
's structure is very straight forward:
typedef struct {
u16 format;
u16 length;
u16 language;
u16 segCountX2;
u16 searchRange;
u16 entrySelector;
u16 rangeShift;
u16 reservedPad;
u16 *endCode;
u16 *startCode;
u16 *idDelta;
u16 *idRangeOffset;
u16 *glyphIdArray;
} format4;
the first 3 fields are the ones we mentioned before that are common across all the formats
the searchRange
, entrySelector
, rangeShift
are again used for a binary search (which we will also ignore and use a linear search)
the reservedPad
field must be zero per the specs.
the startCode
, endCode
, idDelta
, idRangeOffset
are 4 parallel arrays that are to be used together.
they all have size of segCountX2
divided by 2.
the glyphIdArray
extends until the end of the table, we will calculate the size from
the length
field and how many bytes of that length remains after we read the 4 arrays
now then lets read our format 4
void read_format4(char* mem, format4** format) {
char* m = mem;
u16 length = READ_BE16(m + 2);
format4* f = NULL;
f = (format4*) calloc(1, length + sizeof(u16*)*5);
f->format = READ_BE16_MOVE(m);
f->length = READ_BE16_MOVE(m);
f->language = READ_BE16_MOVE(m);
f->segCountX2 = READ_BE16_MOVE(m);
f->searchRange = READ_BE16_MOVE(m);
f->entrySelector = READ_BE16_MOVE(m);
f->rangeShift = READ_BE16_MOVE(m);
f->endCode = (u16*) ((u8*)f + sizeof(format4));
f->startCode = f->endCode + f->segCountX2/2;
f->idDelta = f->startCode + f->segCountX2/2;
f->idRangeOffset = f->idDelta + f->segCountX2/2;
f->glyphIdArray = f->idRangeOffset + f->segCountX2/2;
char* start_code_start = m + f->segCountX2 + 2;
char* id_delta_start = m + f->segCountX2*2 + 2;
char* id_range_start = m + f->segCountX2*3 + 2;
for(int i = 0; i < f->segCountX2/2; ++i) {
f->endCode[i] = READ_BE16(m + i*2);
f->startCode[i] = READ_BE16(start_code_start + i*2);
f->idDelta[i] = READ_BE16(id_delta_start + i*2);
f->idRangeOffset[i] = READ_BE16(id_range_start + i*2);
}
P_MOVE(m, f->segCountX2*4 + 2);
int remaining_bytes = f->length - (m - mem);
f->glyph_id_count = remaining_bytes/2;
for(int i = 0; i < remaining_bytes/2; ++i) {
f->glyphIdArray[i] = READ_BE16_MOVE(m);
}
*format = f;
}
The code here is straight forward reading,
we get the length first then allocate enough memory for the entire table taking
into account memory needed for the pointers of the 5 arrays our format4
struct has,
we need to do this because the data of the font doesn't take into account those pointers
We will read the 5 arrays into the end of the memory we allocated, we put the
endCode
array right after where the format4
struct is in the memory ends,
then comes startCode
and then the other three arrays after that.
here is a picture to futher clearify how the memory allocated is used
my drawing skills leave much to be desired
lets write a function to print the format4
too.
void print_format4(format4 *f4) {
printf("Format: %d, Length: %d, Language: %d, Segment Count: %d\n", f4->format, f4->length, f4->language, f4->segCountX2/2);
printf("Search Params: (searchRange: %d, entrySelector: %d, rangeShift: %d)\n",
f4->searchRange, f4->entrySelector, f4->rangeShift);
printf("Segment Ranges:\tstartCode\tendCode\tidDelta\tidRangeOffset\n");
for(int i = 0; i < f4->segCountX2/2; ++i) {
printf("--------------:\t% 9d\t% 7d\t% 7d\t% 12d\n", f4->startCode[i], f4->endCode[i], f4->idDelta[i], f4->idRangeOffset[i]);
}
}
the picture doesn't show the entire printed output as it was too big
Now that we have read the cmap
table we will begin our mapping by
using the startCode
, endCode
, idRangeOffset
, idDelta
arrays which at the end
will give us a pointer into the glyphIdArray
which gives us the final Glyph Index
.
Fantastic Glyph
s and where to find them. (we are looking for glyph indices but that doesn't sound as nice)
Here is how we are going to do mapping of a Codepoint
:
- find the index of the first
endCode
which is bigger than or equal to theCodepoint
. all the arrays are sorted by increasingendCode
values. - check the corresponding
startCode
, if its smaller than or equal to theCodepoint
then we move to step 3 else we return 0 which means glyph not found. - check the corrosponding
idRangeOffset
if its not 0 then go to step 4 otherwise go to step 7 - take the value from step 3 and add it to the address of the value. (
idRangeOffset[i] + (&idRangeOffset[i])
- take the difference between the
Codepoint
and thestartCode
(code_point - startCode[i]
) and add it to step 4. - the value from the dereferncing the pointer in step 5 is checked, if its 0 then the glyph index is 0 and the glyph is not available.
if its not zero we add it to the corrosponding
idDelta
and get the glyph index. skip step 7. - if
idRangeOffset
is 0 then add theCodepoint
to the corrospondingidDelta
to get the glyph index.
seems a bit confusing but once we write the code for this its a bit more clear
int get_glyph_index(u16 code_point, format4 *f) {
int index = -1;
u16 *ptr = NULL;
for(int i = 0; i < f->segCountX2/2; i++) {
if(f->endCode[i] > code_point) {index = i; break;};
}
if(index == -1) return 0;
if(f->startCode[index] < code_point) {
if(f->idRangeOffset[index] != 0) {
ptr = f->idRangeOffset + index + f->idRangeOffset[index]/2;
ptr += code_point - f->startCode[index];
if(*ptr == 0) return 0;
return *ptr + f->idDelta[index];
} else {
return code_point + f->idDelta[index];
}
}
return 0;
}
all right that is all there is to it for getting Glyph Index
s.
you can go ahead and try a couple of different indexs, using the font Envy Code R
here is the Glyph Index
s of the uppercase english letters:
break
if you have come this far, take a small break its healthy.
'loca' and the boys
now that we have the Glyph index
mapping working, we move onto getting the Glyph
data
which is stored in the glyf
table, the offset of the Glyph
in the glyf
is
stored in the loca
table which is a very straight forward simple array of offset
s indexed by the Glyph Index
but in order to read the loca
table we need to know which version of the loca
table we are dealing with,
there are two version a 16 bit one and a 32 bit one which are named short
and long
versions respectivly
in the short
version the offset divided by 2 is actually stored so we need to multiplie the offsets by 2,
in the long
version this isn't the case, in order to tell which version of the table we have
we must read a field in the head
table called indexToLocFormat
.
the simpler version is:
- read
indexToLocFormat
field in thehead
table to determine which version ofloca
we are dealing with. - depending on the version we either have a u16 bit or a u32 bit array of offsets.
- after we read the offset from the
loca
table we will use this offset from the start of theglyf
table to get theGlyph
data.
the head
table structure is as follows:
typedef struct {
Fixed version;
Fixed fontRevision;
u32 checkSumAdjustment;
u32 magicNumber;
u16 flags;
u16 unitsPerEm;
i64 created;
i64 modified;
FWord xMin;
FWord yMin;
FWord xMax;
FWord yMax;
u16 macStyle;
u16 lowestRecPPEM;
i16 fontDirectionHint;
i16 indexToLocFormat;
i16 glyphDataFormat;
}
a whole lot of data and fields, first up the different data types we are seeing:
- Fixed: this is a fixed-point 32 bit integer, 16 bits for the whole part (the upper 16 bits) and 16 bit is used for the fractional part (lower 16 bits)[1].
- i64: this is the 64 bit sized type we mentioned in the beginning, its a datetime timestamp.
- FWord: 16 bit signed integer that describes quantities in FUnits[2].
Now what the fields mean:
version
andfontRevision
are both self explanatory.checkSumAdjustment
is a checksum of thehead
tble- the
magicNumber
is set to0x5F0F3CF5
- we are going to ignore the
flags
field as it will introduce too much complexity and makes the article a bit longer than it already is. unitsPerEm
is how many FUnits are in1 em
usually power of 2 and most common is (2048)- created and modified are timestamps.
- xMin, yMin, xMax, yMax defines a rectangle in FUnits which is the max bounding box for all the glyphs
macStyle
, only the first lower 6 bits are used and in order if set they mean font has this style (Bold, Italic, Underline, Outline, Shadow, Condensed, Extended)lowestRecPPEM
is the smallest readable size in pixelsfontDirectionHint
hints at what the font direction is suppose to be, its deprecated usually set to (2)indexToLocFormat
0 is forshort
offsets (16 bits), 1 is forlong
offsets (32 bits).glyphDataFormat
set to 0
If you notice that I haven't given the struct a name, that is because reading this into a struct is very easy and straight to the point
but because the only thing we need from this table for now is the indexToLocFormat
we are going to just read that one field, its 50 bytes from the start of the struct.
for our loca
, head
and glyf
tables we aren't going to read them into structs rather we will directly read the data we need
so we will save the pointer to these tables in our font_directory
.
while we are at it we know that this font Envy Code R
has has a format4 cmap
table so we will add the format4
and cmap
structs to the font_directory
thus making the font_directory
into this:
typedef struct {
offset_subtable off_sub;
table_directory* tbl_dir;
format4* f4;
cmap* cmap;
char* glyf;
char* loca;
char* head;
} font_directoy;
and we will set these pointers and read the structs in the read_table_directory
function
thus making our read_table_directory
and read_font_directory
function into these:
void read_font_directory(char* file_start, char** mem, font_directory* ft) {
read_offset_subtable(mem, &ft->off_sub);
read_table_directory(file_start, mem, ft);
}
void read_table_directory(char* file_start, char** mem, font_directory* ft) {
char* m = *mem;
ft->tbl_dir = (table_directory*)calloc(1, sizeof(table_directory)*ft->off_sub.numTables);
for(int i = 0; i < ft->off_sub.numTables; ++i) {
table_directory* t = ft->tbl_dir + i;
t->tag = READ_BE32_MOVE(m);
t->checkSum = READ_BE32_MOVE(m);
t->offset = READ_BE32_MOVE(m);
t->length = READ_BE32_MOVE(m);
switch(t->tag) {
case GLYF_TAG: ft->glyf = t->offset + file_start; break;
case LOCA_TAG: ft->loca = t->offset + file_start; break;
case HEAD_TAG: ft->head = t->offset + file_start; break;
case CMAP_TAG: {
ft->cmap = (cmap*) calloc(1, sizeof(cmap));
read_cmap(file_start + t->offset, ft->cmap);
read_format4(file_start + t->offset + ft->cmap->subtables[0].offset, &ft->f4);
} break;
}
}
*mem = m;
}
in order to make our switch case work, we will have these macros
#define FONT_TAG(a, b, c, d) (a<<24|b<<16|c<<8|d)
#define GLYF_TAG FONT_TAG('g', 'l', 'y', 'f')
#define LOCA_TAG FONT_TAG('l', 'o', 'c', 'a')
#define HEAD_TAG FONT_TAG('h', 'e', 'a', 'd')
#define CMAP_TAG FONT_TAG('c', 'm', 'a', 'p')
FONT_TAG
basically recalculates the 32 bit unsigned integer from the characters at compile time making it a constants and useable with a switch.
given we are reading the cmap
and format4
the moment we encounter them, it makes our main function a lot less cluttered and simple:
int main(int argc, char** argv) {
int file_size = 0;
char* file = read_file("envy.ttf", &file_size);
char* mem_ptr = file;
font_directory ft = {0};
read_font_directory(file, &mem_ptr, &ft);
return 0;
}
all of these changes make reading indexLocToFormat
super simple:
int read_loca_type(font_directory* ft) {
return READ_BE16(ft->head + 50);
}
so lets read it.
the value is 0
meaning we have the short
16 bit sized version of the loca
table
now to read the offset from loca
is also very simple we index the loca
using
the Glyph Index
we get from get_glyph_index
so we have this snippet.
u32 get_glyph_offset(font_directory *ft, u32 glyph_index) {
u32 offset = 0;
if(read_loca_type(ft)) {
//32 bit
offset = READ_BE32((u32*)ft->loca + glyph_index);
} else {
offset = READ_BE16((u16*)ft->loca + glyph_index)*2;
}
return offset;
}
given that the output for this function isn't very exciting but just to check your code
for the Glyph Index
622 which is the letter A
the offset is 69700
;
we will take a moment here to rewrite our get_glyph_index
function to take a Codepoint
and font_directory
instead of taking the format4
directly.
int get_glyph_index(font_directory* ft, u16 code_point) {
format4 *f = ft->f4;
int index = -1;
u16 *ptr = NULL;
for(int i = 0; i < f->segCountX2/2; i++) {
if(f->endCode[i] > code_point) {index = i; break;};
}
if(index == -1) return 0;
if(f->startCode[index] < code_point) {
if(f->idRangeOffset[index] != 0) {
ptr = f->idRangeOffset + index + f->idRangeOffset[index]/2;
ptr += code_point - f->startCode[index];
if(*ptr == 0) return 0;
return *ptr + f->idDelta[index];
} else {
return code_point + f->idDelta[index];
}
}
return 0;
}
we are done with the loca
table now, we move onto the glyf
table which is our last section
Where the wild glyf
s are.
This is the last section of our Part 1 and its extracting the Glyph
data
more commonly called Glyph Outline
and as always we start with the structure,
typedef struct {
u16 numberOfContours;
i16 xMin;
i16 yMin;
i16 xMax;
i16 yMax;
};
this isn't a complete struct, the data that follows depends on the numberOfContours
field, if numberOfContours
is bigger than zero it means its a simple Glyph
if its less than zero it means its a componded Glyph
meaning its made up of
one or more Glyph
s with a certain amount of transformations applied, we will have a gander at compound
Glyph
s in part 1.5 (which might be after Part 2).
so for simple Glyph
s we have the following structure:
typedef struct {
u16 numberOfContours;
i16 xMin;
i16 yMin;
i16 xMax;
i16 yMax;
u16 instructionLength;
u8* instructions;
u8* flags;
i16* xCoordinates;
i16* yCoordinates;
u16* endPtsOfContours;
} glyph_outline;
if you go ahead and check the reference we are using for this, you will see that
the xCoordinates
and yCoordinates
could either be a u8
or i16
depending on the flags
for making things easier we are just going to use the i16
because it will accommodate both sizes.
the numberOfContours
tells us how many contours this shape has, it also gives us the
size of endPtsOfContours
, the endPtsOfContours
array gives us the indicies of the end points of the contours
i.e: endPtsOfContours[0]
gives us the index into xCoordinates
and yCoordinates
array
where the first contour ends and endPtsOfContours[1]
gives us the index where the 2nd contour ends.
this means the last value in endPtsContour
array will give us the number of points all the contours combined have
thus we can allocate enough memory for all for all of them.
the intructionLength
and instructions
together gives us the instructions that are needed to do grid fitting (hinting) for this Glyph
we won't be doing this as it requires us to code a complete virtual machine to run the instructions but its a fun project to do so maybe for another time.
the flags
array gives us information about each of the points that the Glyph
has, each flag is 8 bits and each bit has the following meanings:
- Bit 1: if set it means this point is on the
Glyph
s curve, otherwise the point is off curve. - Bit 2: if set the corrosponding x coordiante is 1 byte otherwise its 2 bytes.
- Bit 3: if set the corrosponding y coordinate is 1 byte otherwise its 2 bytes.
- Bit 4: (repeat) if set the next byte specifies how many timet his flag repeats. this is a small way to compress the flags array.
- Bit 5, 6: these both relate to bit 1 and 2 respectivily and are better explained in a table.
- Bit 7, 8: those are reserved and set to zero in TrueType.
the bits (2, 5) and (3, 6) both have the same functionality but one is for the x coordinate the other is for y. 2 bits give us 4 possibilities so here is the table:
Bit 2 (3) | Bit 5 (6) | Meaning |
---|---|---|
0 | 0 | (0) The current coordinate is 16 bit signed delta change. |
0 | 1 | (1) The current coordinate is 16 bit, has the same value as the previous one. |
1 | 0 | (2) the current coordinate is 8 bit, value is negative. |
1 | 1 | (3) the current coordinate is 8 bit, value is positive. |
we are going to use a union that has bitfields to help us make life easier when dealing with these.
typedef union {
typedef struct {
u8 on_curve: 1;
u8 x_short: 1;
u8 y_short: 1;
u8 repeat: 1;
u8 x_short_pos: 1;
u8 y_short_pos: 1;
u8 reserved1: 1;
u8 reserved2: 1;
};
u8 flag;
} glyph_flag;
we will replace the type of the flags
field in the glyph_outline
struct to be of glyph_flag
this will be easier for some coding.
this is all the information we need to read the Glyph outline
correctly, we first read and uncompress the flags
then we read the coordinates.
first the entire function:
glyph_outline get_glyph_outline(font_directory* ft, u32 glyph_index) {
u32 offset = get_glyph_offset(ft, glyph_index);
char* glyph_start = ft->glyf + offset;
glyph_outline outline = {0};
outline.numberOfContours = READ_BE16_MOVE(glyph_start);
outline.xMin = READ_BE16_MOVE(glyph_start);
outline.yMin = READ_BE16_MOVE(glyph_start);
outline.xMax = READ_BE16_MOVE(glyph_start);
outline.yMax = READ_BE16_MOVE(glyph_start);
outline.endPtsOfContours = (u16*) calloc(1, outline.numberOfContours*sizeof(u16));
for(int i = 0; i < outline.numberOfContours; ++i) {
outline.endPtsOfContours[i] = READ_BE16_MOVE(glyph_start);
}
outline.instructionLength = READ_BE16_MOVE(glyph_start);
outline.instructions = (u8*)calloc(1, outline.instructionLength);
memcpy(outline.instructions, glyph_start, outline.instructionLength);
P_MOVE(glyph_start, outline.instructionLength);
int last_index = outline.endPtsOfContours[outline.numberOfContours-1];
outline.flags = (glyph_flag*) calloc(1, last_index + 1);
for(int i = 0; i < (last_index + 1); ++i) {
outline.flags[i].flag = *glyph_start;
glyph_start++;
if(outline.flags[i].repeat) {
u8 repeat_count = *glyph_start;
while(repeat_count-- > 0) {
i++;
outline.flags[i] = outline.flags[i-1];
}
glyph_start++;
}
}
outline.xCoordinates = (i16*) calloc(1, (last_index+1)*2);
i16 prev_coordinate = 0;
i16 current_coordinate = 0;
for(int i = 0; i < (last_index+1); ++i) {
int flag_combined = outline.flags[i].x_short << 1 | outline.flags[i].x_short_pos;
switch(flag_combined) {
case 0: {
current_coordinate = READ_BE16_MOVE(glyph_start);
} break;
case 1: { current_coordinate = 0; }break;
case 2: { current_coordinate = (*(u8*)glyph_start++)*-1; }break;
case 3: { current_coordinate = (*(u8*)glyph_start++); } break;
}
outline.xCoordinates[i] = current_coordinate + prev_coordinate;
prev_coordinate = outline.xCoordinates[i];
}
outline.yCoordinates = (i16*) calloc(1, (last_index+1)*2);
current_coordinate = 0;
prev_coordinate = 0;
for(int i = 0; i < (last_index+1); ++i) {
int flag_combined = outline.flags[i].y_short << 1 | outline.flags[i].y_short_pos;
switch(flag_combined) {
case 0: {
current_coordinate = READ_BE16_MOVE(glyph_start);
} break;
case 1: { current_coordinate = 0; }break;
case 2: { current_coordinate = (*(u8*)glyph_start++)*-1; }break;
case 3: { current_coordinate = (*(u8*)glyph_start++); } break;
}
outline.yCoordinates[i] = current_coordinate + prev_coordinate;
prev_coordinate = outline.yCoordinates[i];
}
return outline;
}
one of our bigger functions so lets break it down:
u32 offset = get_glyph_offset(ft, glyph_index);
char* glyph_start = ft->glyf + offset;
glyph_outline outline = {0};
outline.numberOfContours = READ_BE16_MOVE(glyph_start);
outline.xMin = READ_BE16_MOVE(glyph_start);
outline.yMin = READ_BE16_MOVE(glyph_start);
outline.xMax = READ_BE16_MOVE(glyph_start);
outline.yMax = READ_BE16_MOVE(glyph_start);
outline.endPtsOfContours = (u16*) calloc(1, outline.numberOfContours*sizeof(u16));
for(int i = 0; i < outline.numberOfContours; ++i) {
outline.endPtsOfContours[i] = READ_BE16_MOVE(glyph_start);
}
outline.instructionLength = READ_BE16_MOVE(glyph_start);
outline.instructions = (u8*)calloc(1, outline.instructionLength);
memcpy(outline.instructions, glyph_start, outline.instructionLength);
P_MOVE(glyph_start, outline.instructionLength);
this part is pretty straight forward, we get the offset of the Glyph
and position glyph_start
to be at the start of that Glyph
s data.
we start a new glyph_outline
struct and read the easier fields.
the instructions
are just an arary of bytes so doing a memcpy
is a lot faster
than looping over memory.
now the part the reads the flags
array
int last_index = outline.endPtsOfContours[outline.numberOfContours-1];
outline.flags = (glyph_flag*) calloc(1, last_index + 1);
for(int i = 0; i < (last_index + 1); ++i) {
outline.flags[i].flag = *glyph_start;
glyph_start++;
if(outline.flags[i].repeat) {
u8 repeat_count = *glyph_start;
while(repeat_count-- > 0) {
i++;
outline.flags[i] = outline.flags[i-1];
}
glyph_start++;
}
}
we already know how big the flags
array will be based on endPtsOfContours
which gives us the last index
of the last contour's point meaning the size of array is last_index + 1
.
we read the flags
one by one and everytime we encounter a repeat
we read the next byte
which gives us the amount we have to repeat.
now that we have the flags we can determine the x and y coordinates:
outline.xCoordinates = (i16*) calloc(1, (last_index+1)*2);
i16 prev_coordinate = 0;
i16 current_coordinate = 0;
for(int i = 0; i < (last_index+1); ++i) {
int flag_combined = outline.flags[i].x_short << 1 | outline.flags[i].x_short_pos;
switch(flag_combined) {
case 0: {
current_coordinate = READ_BE16_MOVE(glyph_start);
} break;
case 1: { current_coordinate = 0; }break;
case 2: { current_coordinate = (*glyph_start++)*-1; }break;
case 3: { current_coordinate = (*glyph_start++); } break;
}
outline.xCoordinates[i] = current_coordinate + prev_coordinate;
prev_coordinate = outline.xCoordinates[i];
}
this code is the same for both xCoordinates
and yCoordinates
,
every coordinate point is relative to the previous one except the first which relative to 0
which means at each iteration the value for the current coordinate is the previous value summed with the data we just read.
under case 1
its indicated that the value of the current coordinate is the same as previous thus we don't have any data to read.
we repeat the same loop for the yCoordiantes
and that is it.
now to print the outline.
void print_glyph_outline(glyph_outline *outline) {
printf("#contours\t(xMin,yMin)\t(xMax,yMax)\tinst_length\n");
printf("%9d\t(%d,%d)\t\t(%d,%d)\t%d\n", outline->numberOfContours,
outline->xMin, outline->yMin,
outline->xMax, outline->yMax,
outline->instructionLength);
printf("#)\t( x , y )\n");
int last_index = outline->endPtsOfContours[outline->numberOfContours-1];
for(int i = 0; i <= last_index; ++i) {
printf("%d)\t(%5d,%5d)\n", i, outline->xCoordinates[i], outline->yCoordinates[i]);
}
}
and that is all she wrote. (for now)
Is it Over?
There is still Part 2 coming which will be how to rasierize this outline, it will include initializing OpengGL Core profile on Windows nativily without any libraries, loading the OpenGL functions we need, Scanline Raserization of our outline, some basic AA.
as a wrap-up here I'm going to include the things we haven't covered.
- we didn't touch the
head
,hhea
andhmtx
and the other tables that relate to positioning ofGlyph
s I will try to include those in a seperate post so that this doesn't really become too big of a post. - We skipped reading any
platformID
,platformSpecificID
pair other than (0, 3) - We skipped reading any
cmap
table format other thanformat 4
- We skipped reading
Comounded Glyph
data and just focused on simpleGlyph
s - We skipped doing anything we the
instructions
we read for aGlyph
[1] in order to convert a Fixed number into our normal float
types, we first need to figure what our fraction is
which is the lower 16 bits
divided by (2^16) we then simply add this to our whole part and we get the number in float.
In OpenType this version is actually divided back into 2 16 bit fields, one called Major version and the other Minor version.
[2] every font is designed on a 2D grid whose coordinates are in FUnits the size of this grid is 1 em
,
and how many FUnits are in this 1 em
is defined by the unitsPerEm
field of the head
table,
the higher this count the more precision font designers have.
Here is all the code combined:
#include <stdio.h>
#define READ_BE16(mem) ((((u8*)(mem))[0] << 8) | (((u8*)(mem))[1]))
#define READ_BE32(mem) ((((u8*)(mem))[0] << 24) | (((u8*)(mem))[1] << 16) | (((u8*)(mem))[2] << 8) | (((u8*)(mem))[3]))
#define P_MOVE(mem, a) ((mem) += (a))
#define READ_BE16_MOVE(mem) (READ_BE16((mem))); (P_MOVE((mem), 2))
#define READ_BE32_MOVE(mem) (READ_BE32((mem))); (P_MOVE((mem), 4))
#define FONT_TAG(a, b, c, d) (a<<24|b<<16|c<<8|d)
#define GLYF_TAG FONT_TAG('g', 'l', 'y', 'f')
#define LOCA_TAG FONT_TAG('l', 'o', 'c', 'a')
#define HEAD_TAG FONT_TAG('h', 'e', 'a', 'd')
#define CMAP_TAG FONT_TAG('c', 'm', 'a', 'p')
char* read_file(char *file_name, int* file_size) {
if(strlen(file_name) > 0) {
FILE* file = fopen(file_name, "rb");
if(file) {
fseek(file, 0, SEEK_END);
int size = ftell(file);
fseek(file, 0, SEEK_SET);
if(file_size) { *file_size = size; }
char *file_content = (char*)malloc(size+1);
int read_amount = fread(file_content, size, 1, file);
file_content[size] = '\0';
if(read_amount) {
fclose(file);
return file_content;
}
free(file_content);
fclose(file);
return NULL;
}
}
return NULL;
}
typedef unsigned char u8;
typedef char i8;
typedef unsigned short u16;
typedef short i16;
typedef unsigned int u32;
typedef int i32;
typedef struct {
u32 scaler_type;
u16 numTables;
u16 searchRange;
u16 entrySelector;
u16 rangeShift;
} offset_subtable;
typedef struct {
u16 platformID;
u16 platformSpecificID;
u32 offset;
} cmap_encoding_subtable;
typedef struct {
u16 version;
u16 numberSubtables;
cmap_encoding_subtable* subtables;
} cmap;
typedef struct {
u16 format;
u16 length;
u16 language;
u16 segCountX2;
u16 searchRange;
u16 entrySelector;
u16 rangeShift;
u16 reservedPad;
u16 *endCode;
u16 *startCode;
u16 *idDelta;
u16 *idRangeOffset;
u16 *glyphIdArray;
} format4;
typedef struct {
union {
char tag_c[4];
u32 tag;
};
u32 checkSum;
u32 offset;
u32 length;
} table_directory;
typedef struct {
offset_subtable off_sub;
table_directory* tbl_dir;
format4* f4;
cmap* cmap;
char* glyf;
char* loca;
char* head;
} font_directory;
typedef union {
typedef struct {
u8 on_curve: 1;
u8 x_short: 1;
u8 y_short: 1;
u8 repeat: 1;
u8 x_short_pos: 1;
u8 y_short_pos: 1;
u8 reserved1: 1;
u8 reserved2: 1;
};
u8 flag;
} glyph_flag;
typedef struct {
u16 numberOfContours;
i16 xMin;
i16 yMin;
i16 xMax;
i16 yMax;
u16 instructionLength;
u8* instructions;
glyph_flag* flags;
i16* xCoordinates;
i16* yCoordinates;
u16* endPtsOfContours;
} glyph_outline;
void read_offset_subtable(char** mem, offset_subtable* off_sub) {
char* m = *mem;
off_sub->scaler_type = READ_BE32_MOVE(m);
off_sub->numTables = READ_BE16_MOVE(m);
off_sub->searchRange = READ_BE16_MOVE(m);
off_sub->entrySelector = READ_BE16_MOVE(m);
off_sub->rangeShift = READ_BE16_MOVE(m);
*mem = m;
}
void read_cmap(char* mem, cmap* c) {
char *m = mem;
c->version = READ_BE16_MOVE(m);
c->numberSubtables = READ_BE16_MOVE(m);
c->subtables = (cmap_encoding_subtable*) calloc(1, sizeof(cmap_encoding_subtable)*c->numberSubtables);
for(int i = 0; i < c->numberSubtables; ++i) {
cmap_encoding_subtable* est = c->subtables + i;
est->platformID = READ_BE16_MOVE(m);
est->platformSpecificID = READ_BE16_MOVE(m);
est->offset = READ_BE32_MOVE(m);
}
}
void print_cmap(cmap* c) {
printf("#)\tpId\tpsID\toffset\ttype\n");
for(int i = 0; i < c->numberSubtables; ++i) {
cmap_encoding_subtable* cet = c->subtables + i;
printf("%d)\t%d\t%d\t%d\t", i+1, cet->platformID, cet->platformSpecificID, cet->offset);
switch(cet->platformID) {
case 0: printf("Unicode"); break;
case 1: printf("Mac"); break;
case 2: printf("Not Supported"); break;
case 3: printf("Microsoft"); break;
}
printf("\n");
}
}
void read_format4(char* mem, format4** format) {
char* m = mem;
u16 length = READ_BE16(m + 2);
format4* f = NULL;
f = (format4*) calloc(1, length + sizeof(u16*)*5);
f->format = READ_BE16_MOVE(m);
f->length = READ_BE16_MOVE(m);
f->language = READ_BE16_MOVE(m);
f->segCountX2 = READ_BE16_MOVE(m);
f->searchRange = READ_BE16_MOVE(m);
f->entrySelector = READ_BE16_MOVE(m);
f->rangeShift = READ_BE16_MOVE(m);
f->endCode = (u16*) ((u8*)f + sizeof(format4));
f->startCode = f->endCode + f->segCountX2/2;
f->idDelta = f->startCode + f->segCountX2/2;
f->idRangeOffset = f->idDelta + f->segCountX2/2;
f->glyphIdArray = f->idRangeOffset + f->segCountX2/2;
char* start_code_start = m + f->segCountX2 + 2;
char* id_delta_start = m + f->segCountX2*2 + 2;
char* id_range_start = m + f->segCountX2*3 + 2;
for(int i = 0; i < f->segCountX2/2; ++i) {
f->endCode[i] = READ_BE16(m + i*2);
f->startCode[i] = READ_BE16(start_code_start + i*2);
f->idDelta[i] = READ_BE16(id_delta_start + i*2);
f->idRangeOffset[i] = READ_BE16(id_range_start + i*2);
}
P_MOVE(m, f->segCountX2*4 + 2);
int remaining_bytes = f->length - (m - mem);
for(int i = 0; i < remaining_bytes/2; ++i) {
f->glyphIdArray[i] = READ_BE16_MOVE(m);
}
*format = f;
}
void print_format4(format4 *f4) {
printf("Format: %d, Length: %d, Language: %d, Segment Count: %d\n", f4->format, f4->length, f4->language, f4->segCountX2/2);
printf("Search Params: (searchRange: %d, entrySelector: %d, rangeShift: %d)\n",
f4->searchRange, f4->entrySelector, f4->rangeShift);
printf("Segment Ranges:\tstartCode\tendCode\tidDelta\tidRangeOffset\n");
for(int i = 0; i < f4->segCountX2/2; ++i) {
printf("--------------:\t% 9d\t% 7d\t% 7d\t% 12d\n", f4->startCode[i], f4->endCode[i], f4->idDelta[i], f4->idRangeOffset[i]);
}
}
void read_table_directory(char* file_start, char** mem, font_directory* ft) {
char* m = *mem;
ft->tbl_dir = (table_directory*)calloc(1, sizeof(table_directory)*ft->off_sub.numTables);
for(int i = 0; i < ft->off_sub.numTables; ++i) {
table_directory* t = ft->tbl_dir + i;
t->tag = READ_BE32_MOVE(m);
t->checkSum = READ_BE32_MOVE(m);
t->offset = READ_BE32_MOVE(m);
t->length = READ_BE32_MOVE(m);
switch(t->tag) {
case GLYF_TAG: ft->glyf = t->offset + file_start; break;
case LOCA_TAG: ft->loca = t->offset + file_start; break;
case HEAD_TAG: ft->head = t->offset + file_start; break;
case CMAP_TAG: {
ft->cmap = (cmap*) calloc(1, sizeof(cmap));
read_cmap(file_start + t->offset, ft->cmap);
read_format4(file_start + t->offset + ft->cmap->subtables[0].offset, &ft->f4);
} break;
}
}
*mem = m;
}
void print_table_directory(table_directory* tbl_dir, int tbl_size) {
printf("#)\ttag\tlen\toffset\n");
for(int i = 0; i < tbl_size; ++i) {
table_directory* t = tbl_dir + i;
printf("%d)\t%c%c%c%c\t%d\t%d\n", i+1,
t->tag_c[3], t->tag_c[2],
t->tag_c[1], t->tag_c[0],
t->length, t->offset);
}
}
void read_font_directory(char* file_start, char** mem, font_directory* ft) {
read_offset_subtable(mem, &ft->off_sub);
read_table_directory(file_start, mem, ft);
}
int get_glyph_index(font_directory* ft, u16 code_point) {
format4 *f = ft->f4;
int index = -1;
u16 *ptr = NULL;
for(int i = 0; i < f->segCountX2/2; i++) {
if(f->endCode[i] > code_point) {index = i; break;};
}
if(index == -1) return 0;
if(f->startCode[index] < code_point) {
if(f->idRangeOffset[index] != 0) {
ptr = f->idRangeOffset + index + f->idRangeOffset[index]/2;
ptr += code_point - f->startCode[index];
if(*ptr == 0) return 0;
return *ptr + f->idDelta[index];
} else {
return code_point + f->idDelta[index];
}
}
return 0;
}
int read_loca_type(font_directory* ft) {
return READ_BE16(ft->head + 50);
}
u32 get_glyph_offset(font_directory *ft, u32 glyph_index) {
u32 offset = 0;
if(read_loca_type(ft)) {
//32 bit
offset = READ_BE32((u32*)ft->loca + glyph_index);
} else {
offset = READ_BE16((u16*)ft->loca + glyph_index)*2;
}
return offset;
}
glyph_outline get_glyph_outline(font_directory* ft, u32 glyph_index) {
u32 offset = get_glyph_offset(ft, glyph_index);
unsigned char* glyph_start = ft->glyf + offset;
glyph_outline outline = {0};
outline.numberOfContours = READ_BE16_MOVE(glyph_start);
outline.xMin = READ_BE16_MOVE(glyph_start);
outline.yMin = READ_BE16_MOVE(glyph_start);
outline.xMax = READ_BE16_MOVE(glyph_start);
outline.yMax = READ_BE16_MOVE(glyph_start);
outline.endPtsOfContours = (u16*) calloc(1, outline.numberOfContours*sizeof(u16));
for(int i = 0; i < outline.numberOfContours; ++i) {
outline.endPtsOfContours[i] = READ_BE16_MOVE(glyph_start);
}
outline.instructionLength = READ_BE16_MOVE(glyph_start);
outline.instructions = (u8*)calloc(1, outline.instructionLength);
memcpy(outline.instructions, glyph_start, outline.instructionLength);
P_MOVE(glyph_start, outline.instructionLength);
int last_index = outline.endPtsOfContours[outline.numberOfContours-1];
outline.flags = (glyph_flag*) calloc(1, last_index + 1);
for(int i = 0; i < (last_index + 1); ++i) {
outline.flags[i].flag = *glyph_start;
glyph_start++;
if(outline.flags[i].repeat) {
u8 repeat_count = *glyph_start;
while(repeat_count-- > 0) {
i++;
outline.flags[i] = outline.flags[i-1];
}
glyph_start++;
}
}
outline.xCoordinates = (i16*) calloc(1, (last_index+1)*2);
i16 prev_coordinate = 0;
i16 current_coordinate = 0;
for(int i = 0; i < (last_index+1); ++i) {
int flag_combined = outline.flags[i].x_short << 1 | outline.flags[i].x_short_pos;
switch(flag_combined) {
case 0: {
current_coordinate = READ_BE16_MOVE(glyph_start);
} break;
case 1: { current_coordinate = 0; }break;
case 2: { current_coordinate = (*glyph_start++)*-1; }break;
case 3: { current_coordinate = (*glyph_start++); } break;
}
outline.xCoordinates[i] = current_coordinate + prev_coordinate;
prev_coordinate = outline.xCoordinates[i];
}
outline.yCoordinates = (i16*) calloc(1, (last_index+1)*2);
current_coordinate = 0;
prev_coordinate = 0;
for(int i = 0; i < (last_index+1); ++i) {
int flag_combined = outline.flags[i].y_short << 1 | outline.flags[i].y_short_pos;
switch(flag_combined) {
case 0: {
current_coordinate = READ_BE16_MOVE(glyph_start);
} break;
case 1: { current_coordinate = 0; }break;
case 2: { current_coordinate = (*glyph_start++)*-1; }break;
case 3: { current_coordinate = (*glyph_start++); } break;
}
outline.yCoordinates[i] = current_coordinate + prev_coordinate;
prev_coordinate = outline.yCoordinates[i];
}
return outline;
}
void print_glyph_outline(glyph_outline *outline) {
printf("#contours\t(xMin,yMin)\t(xMax,yMax)\tinst_length\n");
printf("%9d\t(%d,%d)\t\t(%d,%d)\t%d\n", outline->numberOfContours,
outline->xMin, outline->yMin,
outline->xMax, outline->yMax,
outline->instructionLength);
printf("#)\t( x , y )\n");
int last_index = outline->endPtsOfContours[outline->numberOfContours-1];
for(int i = 0; i <= last_index; ++i) {
printf("%d)\t(%5d,%5d)\n", i, outline->xCoordinates[i], outline->yCoordinates[i]);
}
}
int main(int argc, char** argv) {
int file_size = 0;
char* file = read_file("envy.ttf", &file_size);
char* mem_ptr = file;
font_directory ft = {0};
read_font_directory(file, &mem_ptr, &ft);
glyph_outline A = get_glyph_outline(&ft, get_glyph_index(&ft, 'A'));
print_glyph_outline(&A);
return 0;
}