The stride of a vertex buffer is the number of bytes added to a vertex's attribute location to get to the next vertex's attribute location. Encoding this in a graphics API is a straightforward endeavour: you just specify the stride in some form.
Vulkan defines vertex inputs via two structs, VkVertexInputBindingDescription and VkVertexInputAttributeDescription.
BindingDescription defines a Vertex Buffer's location/slot and its stride, and also its rate, i.e. whether it is per vertex or per instance.
AttributeDescription defines the layout inside a Vertex Buffer, i.e. it defines the attributes, their types, their offsets, and which Vertex Buffer slot they belong to.
Metal defines this via MTLVertexDescriptor. This class has two properties you need to fill out, layouts and attributes.
layouts defines essentially the same thing as BindingDescription, i.e. it defines a Vertex Buffer's stride, stepFunction (whether it is per vertex or per instance), and stepRate.
attributes defines essentially the same thing as AttributeDescription, i.e. the format of each attribute inside the Vertex Buffer, its offset, and which Vertex Buffer slot it comes from.
The above are sufficient to define any shape of vertex buffer input we want. However, I believe two structs is a bit of an overkill, so I'm hoping to reduce this back down to one structure.
API Design
The main information that is needed by the vertex input is as follows:
1. The vertex buffers and how many of them there are
2. The number of attributes inside each vertex buffer
3. The format/types of the attributes
4. The rate of the vertex buffer, i.e. per vertex or per instance
5. The offsets of the attributes inside the vertex buffer
6. The stride, i.e. how many bytes we should skip to get to the next vertex's attributes
It is possible to encode the first four without introducing a second struct, and to calculate the stride automatically. However, items 5 and 6 will introduce ambiguity.
I will introduce the struct sh_vertex_input_t as the carrier of our information:
typedef struct sh_vertex_input_t {
    char *name;              // for debugging
    sh_vertex_format_e type; // format of the attribute, e.g. SH_XYZW32_FLOAT
    b8 separate : 1;         // a boolean on an unsigned byte type; we only need one bit
} sh_vertex_input_t;
I can add this type to my pipeline struct to encode the vertex inputs.
First, I will use two separate fields for per-vertex and per-instance attributes: my pipeline type has a vertex_inputs field for per-vertex attributes and an instance_inputs field for per-instance attributes. This is what my pipeline will look like:
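typedef struct sh_vertex_input_array_t {
    sh_vertex_input_t *data;
    u32 count;
} sh_vertex_input_array_t;
typedef struct sh_pipeline_t {
    ...
    sh_vertex_input_array_t vertex_inputs;   // per-vertex attributes
    sh_vertex_input_array_t instance_inputs; // per-instance attributes
    ...
} sh_pipeline_t;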
The reason I have introduced sh_vertex_input_array_t is that we can statically create pipelines like this with some macro help:
sh_pipeline_t pipeline = {
    .vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
        { .name = "position", .type = SH_XYZW32_FLOAT },
        { .name = "uv"      , .type = SH_XY32_FLOAT   },
        ...,
    })
};
This is much cleaner than two structs.
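For reference, here is a minimal sketch of what a macro like sh_array_input could look like. The macro body, the stand-in format enum, the u32 typedef and the sh_input_at helper are all assumptions made for illustration; the real definitions are not shown here.

```c
#include <assert.h>

typedef unsigned int u32; // stand-in for the u32 type used above

// Stand-in formats; the real sh_vertex_format_e is not shown in the post.
typedef enum { SH_XY32_FLOAT, SH_XYZ32_FLOAT, SH_XYZW32_FLOAT } sh_vertex_format_e;

typedef struct {
    const char *name;
    sh_vertex_format_e type;
} sh_vertex_input_t;

typedef struct {
    sh_vertex_input_t *data;
    u32 count;
} sh_vertex_input_array_t;

// Hypothetical definition. The macro is variadic because the commas inside
// the braces of the compound literal would otherwise split it into several
// macro arguments. sizeof does not evaluate its operand, so only the .data
// copy of the literal becomes an actual array.
#define sh_array_input(...)                                                 \
    {                                                                       \
        .data  = (__VA_ARGS__),                                             \
        .count = (u32)(sizeof((__VA_ARGS__)) / sizeof((__VA_ARGS__)[0])),   \
    }

// A file-scope compound literal has static storage, so this is a valid
// constant initializer.
static sh_vertex_input_array_t example_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "position", .type = SH_XYZW32_FLOAT },
    { .name = "uv"      , .type = SH_XY32_FLOAT   },
});

// Bounds-checked accessor, only here to show how the array would be consumed.
static const sh_vertex_input_t *sh_input_at(const sh_vertex_input_array_t *a, u32 i) {
    assert(i < a->count);
    return &a->data[i];
}
```

The count is derived from the literal itself, so adding or removing an attribute never requires updating a separate number.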
With the above we have easily encoded items 2, 3, and 4.
To encode item 1, the vertex buffers and how many of them there are, we will use the separate field. Every vertex attribute that has separate set to true will introduce a new buffer.
.vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "position", .type = SH_XYZW32_FLOAT, .separate = true },
    { .name = "uv"      , .type = SH_XY32_FLOAT  },
    { .name = "normal"  , .type = SH_XYZ32_FLOAT },
    ...,
})
This essentially translates to: Vertex Buffer 0 will have the position data only, and Vertex Buffer 1 will have uv and normal.
When we loop over our attributes and see separate = true, we create a new buffer, and all the subsequent attributes go into that buffer.
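The loop could be sketched roughly as follows. I am assuming the reading that matches the example: the flagged attribute stays in the current buffer and the new buffer starts with the attribute after it. attr_t and assign_buffers are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical trimmed-down version of sh_vertex_input_t.
typedef struct {
    const char *name;
    bool separate;
} attr_t;

// Writes the buffer index of every attribute into out_slots and returns the
// number of buffers that ended up holding at least one attribute.
static unsigned assign_buffers(const attr_t *attrs, unsigned count, unsigned *out_slots) {
    assert(count == 0 || (attrs && out_slots));
    unsigned buffer = 0; // buffer the next attribute will go into
    unsigned used = 0;   // number of non-empty buffers so far
    for (unsigned i = 0; i < count; ++i) {
        out_slots[i] = buffer;
        used = buffer + 1;
        if (attrs[i].separate) {
            buffer++; // assumption: everything after this attribute goes into a new buffer
        }
    }
    return used;
}
```

For the position/uv/normal example this yields slots 0, 1, 1 and a buffer count of 2.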
You can change .separate into .slot and explicitly set the vertex buffer location. This might be a better approach, but it has drawbacks in terms of verbosity.
With the separate field, true begins a new buffer, while false (the default zero-initialized value) continues the current buffer.
Using .slot and writing the buffer explicitly forces us to specify .slot for either none or all of the attributes. If we skip it for an attribute, it zero-initializes to 0, which is ambiguous: do we want to continue on the same buffer, or put this attribute in buffer 0?
This might not be an issue in C++, where a field can have a default value other than 0, in which case the .slot design might be more viable.
So far, with separate included, we have encoded items 1, 2, 3, and 4 without a second struct.
Stride and Offset
If we exclude the idea of offsets, then we have enough information to calculate item 6 by simply summing up the sizes of the attribute types and using the sum as the stride. This would work beautifully and cleanly.
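A minimal sketch of that summation, assuming tightly packed attributes and 32-bit floats. The enum values and the helper names sh_format_size and sh_compute_stride are stand-ins invented for this example.

```c
#include <assert.h>
#include <stddef.h>

// Stand-in formats; the real sh_vertex_format_e is not shown in the post.
typedef enum { SH_XY32_FLOAT, SH_XYZ32_FLOAT, SH_XYZW32_FLOAT } sh_vertex_format_e;

// Size in bytes of one attribute of the given format.
static size_t sh_format_size(sh_vertex_format_e format) {
    switch (format) {
        case SH_XY32_FLOAT:   return 2 * sizeof(float);
        case SH_XYZ32_FLOAT:  return 3 * sizeof(float);
        case SH_XYZW32_FLOAT: return 4 * sizeof(float);
    }
    assert(0 && "unknown vertex format");
    return 0;
}

// Stride of a buffer whose attributes are packed back to back:
// simply the sum of the attribute sizes.
static size_t sh_compute_stride(const sh_vertex_format_e *types, size_t count) {
    size_t stride = 0;
    for (size_t i = 0; i < count; ++i) {
        stride += sh_format_size(types[i]);
    }
    return stride;
}
```

For position (4 floats), normal (3), uv (2) and color (4) this gives 13 floats per vertex.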
We can take this a bit further by introducing another boolean called unused. This allows an attribute to count towards the stride without adding it to the attributes the pipeline actually uses. Here is an example where this might be useful:
typedef struct sh_model_vertex_t {
    vec4 position;
    vec3 normal;
    vec2 uv;
    vec4 color;
} sh_model_vertex_t;
sh_buffer_t model_vertex_data = {
    ...
    .size = sizeof(sh_model_vertex_t)*<number_of_vertices>
    ...
};
sh_pipeline_t model_render_pipeline = {
    ...
    .vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
        { .name = "position", .type = SH_XYZW32_FLOAT },
        { .name = "normal"  , .type = SH_XYZ32_FLOAT  },
        { .name = "uv"      , .type = SH_XY32_FLOAT   },
        { .name = "color"   , .type = SH_XYZW32_FLOAT },
    })
    ...
};
sh_pipeline_t shadow_pass = {
    ...
    .vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
        { .name = "position", .type = SH_XYZW32_FLOAT },
        { .name = "normal"  , .type = SH_XYZ32_FLOAT  , .unused = sh_true },
        { .name = "uv"      , .type = SH_XY32_FLOAT   , .unused = sh_true },
        { .name = "color"   , .type = SH_XYZW32_FLOAT , .unused = sh_true },
    })
    ...
};
We define two pipelines, one for model rendering and another for the shadow pass, where we only ever use the position.
At a glance we can infer what the shadow pass does with our data and which attributes it uses. However, it's not nice to have to type out three unused attributes just for a stride calculation.
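To make the role of unused concrete, here is a rough sketch of how a backend might walk such a list: every attribute advances the running offset, but only the used ones are emitted. The types and the resolve_layout function are hypothetical, and tight packing with 32-bit floats is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Stand-in formats; the real sh_vertex_format_e is not shown in the post.
typedef enum { SH_XY32_FLOAT, SH_XYZ32_FLOAT, SH_XYZW32_FLOAT } sh_vertex_format_e;

typedef struct {
    const char *name;
    sh_vertex_format_e type;
    bool unused;
} attr_t;

typedef struct {
    const char *name;
    size_t offset; // byte offset inside one vertex
} resolved_attr_t;

static size_t format_size(sh_vertex_format_e format) {
    switch (format) {
        case SH_XY32_FLOAT:   return 2 * sizeof(float);
        case SH_XYZ32_FLOAT:  return 3 * sizeof(float);
        case SH_XYZW32_FLOAT: return 4 * sizeof(float);
    }
    assert(0 && "unknown vertex format");
    return 0;
}

// Emits one entry per used attribute into out, stores the full stride in
// *out_stride, and returns the number of entries written.
static size_t resolve_layout(const attr_t *attrs, size_t count,
                             resolved_attr_t *out, size_t *out_stride) {
    size_t cursor = 0; // running byte offset, advanced by every attribute
    size_t used = 0;
    for (size_t i = 0; i < count; ++i) {
        if (!attrs[i].unused) {
            out[used].name = attrs[i].name;
            out[used].offset = cursor;
            used++;
        }
        cursor += format_size(attrs[i].type); // unused attributes still take space
    }
    *out_stride = cursor;
    return used;
}
```

For the shadow pass this produces a single attribute at offset 0 with the same stride as the full model pipeline.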
If we skip providing these unused attributes, we are left with three potential avenues:
1. We reconfigure our data to separate position into one buffer and move the rest of the attributes into another buffer.
   - For vertex position data this is actually a recommended approach on mobile devices by both Google and ARM; desktop GPUs might not benefit as much.
2. We have reached the limit of only ever using one struct and might need to go back to using two structs. However, in our case the second struct will only ever have one field, called stride, because we have encoded the other bits of information already.
3. We encode the stride on the Vertex Buffer object itself. This actually makes sense, and some console platforms do this. This will make our buffer look like this:
sh_buffer_t model_vertex_data = {
    ...
    .size = sizeof(sh_model_vertex_t)*<number_of_vertices>,
    .stride = sizeof(sh_model_vertex_t),
    ...
};
This is perfectly valid and a nice approach. You can even go further and replace size with element_count to drive home the idea that stride and element_count together define the size, and that the size isn't specified directly.
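A small sketch of that variant, where the byte size is derived instead of stored. buffer_desc_t and buffer_size are hypothetical names; the real sh_buffer_t has more fields than shown here.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for the relevant part of sh_buffer_t.
typedef struct {
    size_t stride;        // bytes between consecutive vertices
    size_t element_count; // number of vertices in the buffer
} buffer_desc_t;

// The size in bytes is never specified directly; it follows from the
// stride and the element count.
static size_t buffer_size(const buffer_desc_t *buffer) {
    // Guard against overflow of the multiplication.
    assert(buffer->stride == 0 || buffer->element_count <= (size_t)-1 / buffer->stride);
    return buffer->stride * buffer->element_count;
}
```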
The third approach would be the nicest. However, for it to work in terms of the API you have two options:
1. Pass the vertex buffer to pipeline creation, but this couples the pipeline with the vertex buffer. Doing this will put you in a rough situation in terms of API design, because it would look ugly and can cause easy gotchas.
2. Use dynamic stride binding in the platform graphics API. Unfortunately, on Vulkan this requires 1.3 and on Metal it requires iOS 17.0+/macOS 14.0+, which will limit the devices you can support. iOS 17 was released in 2023, for example.
Introducing offset will add ambiguity. Much like the .slot variable, the offset field would be ambiguous if we depend on zero initialization: do we mean "set this attribute to offset 0" or "please continue from the last position"?
.vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "position", .type = SH_XYZW32_FLOAT , .offset = 12},
    { .name = "normal"  , .type = SH_XYZ32_FLOAT  }, // ambiguous, do we want the offset to be 0 or do we want to continue after position?
    { .name = "uv"      , .type = SH_XY32_FLOAT   },
    { .name = "color"   , .type = SH_XYZW32_FLOAT },
})
The introduction of offset leaves us with four possibilities if we want to infer what the client wants to do.
These possibilities come from trying to pattern match/detect. We only need to consider two consecutive attributes:
1. .offset = 0 followed by .offset = <non-zero>: Not ambiguous. The first attribute is set to offset zero and the attribute after it is set to an offset explicitly.
2. .offset = 0 followed by .offset = 0: Not ambiguous. The first attribute is set to offset zero and we can assume the attribute after it follows, i.e. the offset of the second attribute is just the size of the type of the first attribute.
3. .offset = <non-zero> followed by .offset = 0: Ambiguous. Do we want to set the offset to zero, or do we want to follow the first offset?
4. .offset = <non-zero> followed by .offset = <non-zero>: Not ambiguous. The offsets for both attributes are set explicitly.
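The pattern match over consecutive attributes can be sketched like this. Only the third case (a non-zero offset followed by a zero offset) is flagged. Both function names are made up for this example, and the offsets are the raw values as written by the client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// True only for the ambiguous case: an explicit non-zero offset followed by
// an offset of 0, which could mean "offset 0" or "continue after the previous".
static bool offset_pair_is_ambiguous(size_t prev_offset, size_t next_offset) {
    return prev_offset != 0 && next_offset == 0;
}

// Returns the index of the first attribute whose offset is ambiguous,
// or count if there is none.
static size_t first_ambiguous_offset(const size_t *offsets, size_t count) {
    assert(count == 0 || offsets);
    for (size_t i = 1; i < count; ++i) {
        if (offset_pair_is_ambiguous(offsets[i - 1], offsets[i])) {
            return i;
        }
    }
    return count;
}
```

A check like this is enough to emit a warning at pipeline creation, even if it cannot tell which of the two meanings was intended.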
Of those four possibilities only the third is ambiguous. Introducing a special value SH_ATTRIBUTE_OFFSET_CONTINUE, set to -1, would break the ambiguity, but it also means we have to continue explicitly, hence more typing.
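A sketch of how offsets might be resolved with such a sentinel. The attr_t struct (with a precomputed byte size) and resolve_offsets are assumptions for illustration; only the SH_ATTRIBUTE_OFFSET_CONTINUE name and its value of -1 come from the idea above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SH_ATTRIBUTE_OFFSET_CONTINUE (-1)

// Hypothetical attribute: a signed offset so the sentinel fits, plus the
// attribute size in bytes.
typedef struct {
    int32_t offset;
    size_t size;
} attr_t;

// Writes the final byte offset of every attribute into out.
static void resolve_offsets(const attr_t *attrs, size_t count, size_t *out) {
    size_t cursor = 0; // end of the previous attribute
    for (size_t i = 0; i < count; ++i) {
        if (attrs[i].offset == SH_ATTRIBUTE_OFFSET_CONTINUE) {
            out[i] = cursor; // explicitly continue after the previous attribute
        } else {
            assert(attrs[i].offset >= 0);
            out[i] = (size_t)attrs[i].offset; // 0 now always means offset 0
        }
        cursor = out[i] + attrs[i].size;
    }
}
```

The cost is visible in the data: a zero-initialized offset no longer continues, so every attribute that should follow its predecessor has to say so.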
Unfortunately there is no clear fix for this. C++'s default field values might come in handy here, but combining them with designated initializers would require C++20.
Even if we didn't have the ambiguity of the offset itself, introducing offsets would break the automatic stride calculation.
We can fall back on the unused field to fully encode the offset and stride in one go:
.vertex_inputs = sh_array_input( (sh_vertex_input_t[]) {
    { .name = "pad"     , .type = SH_XYZW32_FLOAT , .unused = true}, // offset 0 (offsets here are counted in floats, not bytes)
    { .name = "pad"     , .type = SH_XYZW32_FLOAT , .unused = true}, // offset 4
    { .name = "position", .type = SH_XYZW32_FLOAT },                 // offset 8
    { .name = "normal"  , .type = SH_XYZ32_FLOAT  },                 // offset 12
    { .name = "pad"     , .type = SH_XYZ32_FLOAT  , .unused = true}, // offset 15
    { .name = "uv"      , .type = SH_XY32_FLOAT   },                 // offset 18
    { .name = "color"   , .type = SH_XYZW32_FLOAT },                 // offset 20
})
Of course, all of this is very verbose and not very useful in terms of encoding the information.
I do not have a solution to the offset ambiguity. However, I can bite the bullet and move to Vulkan 1.3 to fix the stride encoding and make it dynamic.
Typical Vertex Buffers
Depending on who you ask, and on which platform they work on, you have a few choices when it comes to vertex data layout:
Let's assume V => Vertex Position, N => Vertex Normal, C => Vertex Color
1. [VNC,VNC,VNC]: One vertex buffer, interleaved vertex attributes. The design without offset and dynamic stride can specify this.
2. [VVV], [NC, NC]: One vertex buffer for position, one for interleaved everything else. The design so far can handle this too.
3. [VVVNNNCCC]: One vertex buffer, non-interleaved. The design cannot handle it without specifying all offsets and dynamic stride.
4. [VVV][NNN][CCC]: One vertex buffer per attribute. The design can handle this.
If I forgo allowing the 3rd choice and have dynamic strides we can cover a lot of bases cleanly.
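To illustrate why the 3rd choice is the odd one out, the sketch below computes where each attribute starts and how far apart consecutive vertices are for the interleaved and the non-interleaved single-buffer layouts. stream_t and both functions are hypothetical names, and tight packing is assumed.

```c
#include <assert.h>
#include <stddef.h>

// Where an attribute starts in the buffer and how many bytes to skip to
// reach the same attribute of the next vertex.
typedef struct {
    size_t base_offset;
    size_t stride;
} stream_t;

// [VNC,VNC,VNC]: all attributes share one stride (the sum of the sizes),
// and each starts right after the previous one.
static void layout_interleaved(const size_t *sizes, size_t n, stream_t *out) {
    assert(n == 0 || (sizes && out));
    size_t stride = 0;
    for (size_t i = 0; i < n; ++i) stride += sizes[i];
    size_t cursor = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i].base_offset = cursor;
        out[i].stride = stride;
        cursor += sizes[i];
    }
}

// [VVVNNNCCC]: each attribute has its own stride (its own size), and its
// starting offset depends on how many vertices the buffer holds.
static void layout_non_interleaved(const size_t *sizes, size_t n,
                                   size_t vertex_count, stream_t *out) {
    assert(n == 0 || (sizes && out));
    size_t cursor = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i].base_offset = cursor;
        out[i].stride = sizes[i];
        cursor += sizes[i] * vertex_count;
    }
}
```

In the non-interleaved case the offsets depend on the vertex count, which a static pipeline description does not know, so they have to be supplied explicitly together with a per-attribute stride.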
Conclusion
I think the best approach for now is to move to Vulkan 1.3 and the latest Metal, disallow option 3, and warn when the offsets of two attributes are zero.