Vertex Buffers, Strides, Offsets and API Design

The stride of a vertex buffer is the number of bytes added to a vertex attribute's location to get to the next vertex's copy of that attribute. Encoding this in a graphics API is a straightforward endeavour: you just specify the stride in some form.

Vulkan defines vertex inputs via two structs, VkVertexInputBindingDescription and VkVertexInputAttributeDescription. The binding description defines a vertex buffer's binding slot, its stride, and its input rate, i.e. whether it advances per vertex or per instance. The attribute description defines the layout inside a vertex buffer: the attributes, their formats, their offsets, and which vertex buffer binding they belong to.
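For a concrete picture, here is roughly what describing a single interleaved buffer with a vec4 position and a vec2 uv looks like against the Vulkan C headers (a minimal sketch, not a full pipeline setup):

#include <vulkan/vulkan.h>

// One buffer bound at slot 0, advancing 24 bytes per vertex (vec4 position + vec2 uv).
VkVertexInputBindingDescription binding = {
    .binding   = 0,
    .stride    = sizeof(float) * 6,
    .inputRate = VK_VERTEX_INPUT_RATE_VERTEX, // per vertex, not per instance
};

// Two attributes inside that buffer: position at offset 0, uv right after it.
VkVertexInputAttributeDescription attributes[2] = {
    { .location = 0, .binding = 0, .format = VK_FORMAT_R32G32B32A32_SFLOAT, .offset = 0  },
    { .location = 1, .binding = 0, .format = VK_FORMAT_R32G32_SFLOAT,       .offset = 16 },
};

Both arrays then feed into VkPipelineVertexInputStateCreateInfo at pipeline creation time.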

Metal defines this via MTLVertexDescriptor. This class has two properties you need to fill out, layouts and attributes. layouts defines essentially the same thing as the binding description, i.e. a vertex buffer's stride, its stepFunction (whether it steps per vertex or per instance), and its stepRate. attributes defines essentially the same thing as the attribute description, i.e. each attribute's format, its offset, and which vertex buffer slot it comes from.

The above is sufficient to define any shape of vertex buffer input we want; however, I believe two structs is a bit of overkill, so I'm hoping to reduce this back down to one structure.


API Design

The main information that is needed by the vertex input is as follows:

  1. The vertex buffers and how many of them there are
  2. The number of attributes inside each vertex buffer
  3. The format/types of the attributes
  4. The rate of the vertex buffer, i.e per vertex or per instance
  5. The offsets of the attributes inside the vertex buffer
  6. The stride, or essentially how many bytes we should skip to get to the next vertex's attributes

It is possible to encode the first four without introducing a second struct and to do automatic stride calculation. However, 5 and 6 will introduce ambiguity.

I will introduce the struct sh_vertex_input_t as the carrier of our information:

struct sh_vertex_input_t {
    char *name; // For debugging
    sh_vertex_format_e type;
    b8 separate : 1; // a boolean bit-field (b8 is an unsigned byte); we only need one bit
};

This type I can add to my pipeline struct to encode the vertex inputs.
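The format enum and its byte sizes are not spelled out in this post; for the sketches further down, assume something along these lines (sh_vertex_format_size is a helper name I'm making up here):

typedef enum sh_vertex_format_e {
    SH_XY32_FLOAT,   // 2 x f32,  8 bytes
    SH_XYZ32_FLOAT,  // 3 x f32, 12 bytes
    SH_XYZW32_FLOAT, // 4 x f32, 16 bytes
    // ... whatever other formats the renderer needs
} sh_vertex_format_e;

static u32 sh_vertex_format_size(sh_vertex_format_e type) {
    switch (type) {
        case SH_XY32_FLOAT:   return  8;
        case SH_XYZ32_FLOAT:  return 12;
        case SH_XYZW32_FLOAT: return 16;
        default:              return  0;
    }
}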

First, I will use two separate fields for per-vertex and per-instance attributes: my pipeline type has a vertex_inputs field for per-vertex attributes and an instance_inputs field for per-instance attributes. This is what my pipeline will look like:

struct sh_vertex_input_array_t {
    sh_vertex_input_t *data;
    u32 count;
};

struct sh_pipeline_t {
    ...
    sh_vertex_input_array_t vertex_inputs;
    sh_vertex_input_array_t instance_inputs;
    ...
};

The reason I have introduced sh_vertex_input_array_t is that, with some macro help, we can statically create pipelines like this:

sh_pipeline_t pipeline = {
    .vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
        { .name = "position", .type = SH_XYZW32_FLOAT },
        { .name = "uv"      , .type = SH_XY32_FLOAT   },
        ...,
    })
};

This is much cleaner than two structs.
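For completeness, sh_array_input isn't doing anything clever; one possible definition is the sketch below. It has to be variadic because the commas inside the compound literal's braces would otherwise split it into separate macro arguments:

#define sh_array_input(...)                                             \
    ((sh_vertex_input_array_t){                                         \
        .data  = (__VA_ARGS__),                                         \
        .count = (u32)(sizeof((__VA_ARGS__)) / sizeof(*(__VA_ARGS__))), \
    })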

With the above, we have easily encoded 2, 3, and 4.

To encode 1, the vertex buffers and how many of them there are, we will use the separate field. Every vertex attribute that has separate set to true will introduce a new buffer:

.vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "position", .type = SH_XYZW32_FLOAT, .separate = true },
    { .name = "uv"      , .type = SH_XY32_FLOAT                     },
    { .name = "normal"  , .type = SH_XYZ32_FLOAT                    },
    ...,
}) 

This essentially translates to: Vertex Buffer 0 will hold the position data only, and Vertex Buffer 1 will hold uv and normal.
When we loop over our attributes and see separate = true, we close the current buffer after that attribute and create a new one; all subsequent attributes go into that new buffer until the next separate = true.
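A sketch of what that loop might look like (sh_assign_buffer_slots is a made-up name; it fills slot[i] with the buffer index of attribute i and returns the number of buffers):

// Derive each attribute's vertex buffer slot from the separate flag.
static u32 sh_assign_buffer_slots(sh_vertex_input_array_t inputs, u32 *slot) {
    u32 current = 0;
    for (u32 i = 0; i < inputs.count; ++i) {
        slot[i] = current;
        // separate = true closes the current buffer after this attribute,
        // so everything that follows lands in a fresh buffer.
        if (inputs.data[i].separate && i + 1 < inputs.count) {
            current += 1;
        }
    }
    return inputs.count ? current + 1 : 0;
}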

You could change .separate into a .slot field and set the vertex buffer location explicitly. This might be a better approach, but it has drawbacks in terms of verbosity.
With the separate field, setting it to true begins a new buffer, while false (the default zero-initialized value) continues the current buffer.
With .slot, writing the buffer explicitly forces us to specify the slot for either none or all of the attributes: if we skip it for an attribute, it zero-initializes to 0, which is ambiguous. Do we want to continue in the same buffer, or put this attribute in buffer 0? This might not be an issue in C++, which can give the field a default value other than 0, in which case the .slot design might be more viable.

So far, with separate included, we have encoded 1, 2, 3, and 4 without a second struct.

Stride and Offset

If we exclude the idea of offsets, then we have enough information to calculate 6 by simply summing up the sizes of the attributes' types and using that sum as the stride. This would work beautifully and cleanly.
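A sketch of that calculation, reusing the buffer slots from the earlier sketch and the assumed sh_vertex_format_size helper:

// With no offsets, a buffer's stride is just the sum of the sizes of the
// attribute formats assigned to it.
static u32 sh_buffer_stride(sh_vertex_input_array_t inputs, const u32 *slot, u32 buffer) {
    u32 stride = 0;
    for (u32 i = 0; i < inputs.count; ++i) {
        if (slot[i] == buffer) {
            stride += sh_vertex_format_size(inputs.data[i].type);
        }
    }
    return stride;
}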
We can take this a bit further by introducing another boolean called unused. This allows us to specify the stride without adding attributes that the shader actually consumes. Here is an example where this might be useful:

struct sh_model_vertex_t {
    vec4 position;
    vec3 normal;
    vec2 uv;
    vec4 color;
};

sh_buffer_t model_vertex_data = {
    ...
    .size = sizeof(sh_model_vertex_t)*<number_of_vertices>
    ...
};

sh_pipeline_t model_render_pipeline = {
    ...
    .vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
        { .name = "position", .type = SH_XYZW32_FLOAT },
        { .name = "normal"  , .type = SH_XYZ32_FLOAT  },
        { .name = "uv"      , .type = SH_XY32_FLOAT   },
        { .name = "color"   , .type = SH_XYZW32_FLOAT },
    })
    ...
};

sh_pipeline_t shadow_pass = {
    ...
    .vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
        { .name = "position", .type = SH_XYZW32_FLOAT },
        { .name = "normal"  , .type = SH_XYZ32_FLOAT  , .unused = sh_true },
        { .name = "uv"      , .type = SH_XY32_FLOAT   , .unused = sh_true },
        { .name = "color"   , .type = SH_XYZW32_FLOAT , .unused = sh_true },
    })
    ...
};
 

We define two pipelines: one for model rendering, and another for the shadow pass where we only ever use the position. At a glance we can infer what the shadow pass does with our data and which attributes it uses; however, it is not nice to have to type out three unused attributes just for a stride calculation.
If we skip providing these unused attributes, we are left with three potential avenues:

  1. We reconfigure our data to separate position into one buffer and move the rest of the attributes into another buffer.

    • For vertex position data this is actually an approach recommended for mobile devices by both Google and ARM; desktop GPUs might not benefit as much.
  2. We accept that we have reached the limit of using only one struct and go back to using two. However, in our case the second struct would only ever have a single field, stride, because we have already encoded the other bits of information.

  3. We encode the stride on the vertex buffer object itself. This actually makes sense, and some console platforms do this. It will make our buffer look like this:

sh_buffer_t model_vertex_data = {
   ...
   .size = sizeof(sh_model_vertex_t)*<number_of_vertices>,
   .stride = sizeof(sh_model_vertex_t)
   ...
}; 

This is actually perfectly valid and a nice approach. You can even go further and replace size with element_count, to drive home the idea that the stride and element_count together define the size, rather than the size being specified directly.
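That variant might look something like this (element_count is a hypothetical field name):

sh_buffer_t model_vertex_data = {
   ...
   .stride        = sizeof(sh_model_vertex_t),
   .element_count = <number_of_vertices>, // total size is implied: stride * element_count
   ...
};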

The third approach would be the nicest. However, in order for it to work, the API has two options:

  1. Pass the vertex buffer being used to pipeline creation, which couples the pipeline with the vertex buffer.
    Doing this puts you in a rough spot in terms of API design: it looks ugly and invites easy gotchas.

  2. Use dynamic stride binding in the platform graphics API (see the sketch after this list). Unfortunately, on Vulkan this requires 1.3, and on Metal it requires iOS 17.0+/macOS 14.0+, which limits the devices you can support; iOS 17 was released in 2023, for example.
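For option 2, here is roughly how dynamic stride looks on the Vulkan side (a sketch that assumes a VkCommandBuffer cmd and a VkBuffer vertex_buffer already exist; requires Vulkan 1.3 or VK_EXT_extended_dynamic_state): the pipeline declares the stride as dynamic state, and the real value is supplied at bind time.

// At pipeline creation: mark the vertex binding stride as dynamic state.
VkDynamicState dynamic_states[] = {
    VK_DYNAMIC_STATE_VERTEX_INPUT_BINDING_STRIDE,
};
VkPipelineDynamicStateCreateInfo dynamic_info = {
    .sType             = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
    .dynamicStateCount = 1,
    .pDynamicStates    = dynamic_states,
};

// At record time: the stride now comes from the buffer being bound, not the pipeline.
VkDeviceSize offset = 0;
VkDeviceSize stride = sizeof(sh_model_vertex_t);
vkCmdBindVertexBuffers2(cmd, /*firstBinding*/ 0, /*bindingCount*/ 1,
                        &vertex_buffer, &offset, /*pSizes*/ NULL, &stride);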

Introducing offset will add ambiguity. Much like the .slot field, offset becomes ambiguous if we depend on zero initialization, i.e. whether we mean "set this attribute's offset to 0" or "continue from the last position":

.vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "position", .type = SH_XYZW32_FLOAT , .offset = 12},
    { .name = "normal"  , .type = SH_XYZ32_FLOAT  }, // ambiguous, do we want offset to be at 0 or do we want to continue at 12?
    { .name = "uv"      , .type = SH_XY32_FLOAT   },
    { .name = "color"   , .type = SH_XYZW32_FLOAT },
}) 

The introduction of offset leaves us with 4 possibilities if we want to infer what the client intends. These possibilities come from trying to pattern match/detect, and we only need to consider two consecutive attributes:

  1. .offset = 0 followed by .offset = <non-zero> : Not ambiguous, first attribute is set to offset zero and the attribute after is set to an offset explicitly.

  2. .offset = 0 followed by .offset = 0 : Not ambiguous, the first attribute is at offset zero and the second can be assumed to follow it, i.e. its offset is just the size of the first attribute's type

  3. .offset = <non-zero> followed by .offset = 0 : Ambiguous, do we want to set the offset to zero? or do we want to follow the first offset?

  4. .offset = <non-zero> followed by .offset = <non-zero> : Not ambiguous, offsets for both attributes set explicitly

Of those 4 possibilities, only 3 is ambiguous. We can introduce a special value, SH_ATTRIBUTE_OFFSET_CONTINUE, set to -1, which breaks the ambiguity, but it also means we have to continue explicitly, hence more typing.
Unfortunately there is no clean fix for this. C++'s default field values might come in handy here, but combining them with designated struct initializers would require C++20.
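For illustration, the sentinel version of the earlier example would read like this; it is unambiguous, but every continuation has to be spelled out:

.vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "position", .type = SH_XYZW32_FLOAT , .offset = 12 },
    { .name = "normal"  , .type = SH_XYZ32_FLOAT  , .offset = SH_ATTRIBUTE_OFFSET_CONTINUE }, // explicitly continue right after position
    { .name = "uv"      , .type = SH_XY32_FLOAT   , .offset = SH_ATTRIBUTE_OFFSET_CONTINUE },
    { .name = "color"   , .type = SH_XYZW32_FLOAT , .offset = SH_ATTRIBUTE_OFFSET_CONTINUE },
})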

Even if we didn't have the ambiguity of the offset itself, introducing offsets will break the automatic stride calculation. We can fall back on the unused field to fully encode the offset and stride in one go:

.vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "pad"     , .type = SH_XYZW32_FLOAT , .unused = true}, // offset  0
    { .name = "pad"     , .type = SH_XYZW32_FLOAT , .unused = true}, // offset 16
    { .name = "position", .type = SH_XYZW32_FLOAT },                 // offset 32
    { .name = "normal"  , .type = SH_XYZ32_FLOAT  },                 // offset 48
    { .name = "pad"     , .type = SH_XYZ32_FLOAT  , .unused = true}, // offset 60
    { .name = "uv"      , .type = SH_XY32_FLOAT   },                 // offset 72
    { .name = "color"   , .type = SH_XYZW32_FLOAT },                 // offset 80
})

Of course, all of this is very verbose and not a very useful way of encoding the information.

I do not have a solution to the offset ambiguity; however, I can bite the bullet and move to Vulkan 1.3 to make the stride encoding dynamic.

Typical Vertex Buffers

Depending on who you ask and which platform you target, you have a few choices when it comes to vertex data layout. Let's assume V => Vertex Position, N => Vertex Normal, C => Vertex Color:

  1. [VNC,VNC,VNC]: One vertex buffer, interleaved vertex attributes. The design without offsets, combined with dynamic stride, can specify this.

  2. [VVV], [NC, NC]: One vertex buffer for position, one for interleaved everything else. The design so far can handle this too.

  3. [VVVNNNCCC]: One vertex buffer, non-interleaved. The design cannot handle this without specifying all the offsets and a dynamic stride.

  4. [VVV][NNN][CCC]: One vertex buffer per attribute. The design can handle this.

If I forgo allowing the 3rd choice and use dynamic strides, we can cover a lot of bases cleanly.

Conclusion

I think the best approach for now is to move to Vulkan 1.3 and the latest Metal, disallow option 3, and warn when the offsets of two consecutive attributes are both zero.