01 - Chunks of Bytecode

Aug 18, 2025

- - Getting Started
  - Chunks of Instructions
    - The Allocator Interface
    - Errors, Try and Catch
  - Disassembling Chunks
    - Io.Writer
    - Strings and Slices
    - File.Writer, Io.Writer and Buffered IO
  - Constants and Values
    - The Constant Pool
  - Constant Instructions
  - Line Information
    - Disassembling Line Information
  - Organizing Files
    - Files are Structs

In the first chapter, We’ll start by defining the structure of our bytecode, which are the basic instructions we’ll compile source code into and then feed to the virtual machine to be executed. We’ll also write a debugging utility to display bytecode in a human-readable format.

Before continuing, read the start of Chunks of Bytecode, and keep the book handy so you understand what we’re writing.

Getting Started

Now it’s time to write some Zig! Be warned that progress will be slow in this first section as I take my time explaining key Zig features and syntax. Once that’s out of the way, you’ll be comfortable writing Zig and we can speed things up.

Let’s first inspect a “hello, world!” program:

const std = @import("std");

pub fn main() !void {
    std.debug.print("Hello, {s}!\n", .{"Interpreters"});
}

First we import the standard library. Zig’s standard library is much fuller than C’s and includes common utilities like a logging interface, dynamic arrays and much more. Zig’s imports are also namespaced, which means we can access the standard library through std. whereas C only has one global namespace.

The function signature also carries some important information. First, functions must be marked pub for public if they are used outside the same file. The return type !void can be read as “any error or void”. The exclamation mark (!) is used to denote that an error may be returned.

When an error is returned from main, it will be printed and the program will terminate with a non-zero exit code to indicate failure. This is something I’ll rely heavily on throughout this series, but I encourage you to think about how you can handle failure more elegantly wherever possible. We’ll learn about Zig’s interesting error system soon.

Inside main, we call std.debug.print, which prints formatted text a la printf to the standard error stream. The “{s}” is a placeholder for a string value which is provided in the second argument.

Use zig build run to make sure everything’s correct:

$ zig build run
Hello, Interpreters!

We won’t be needing an equivalent to common.h. Zig’s standard library has everything we’ll need.

Chunks of Instructions

The first thing to do is write the structure to store our bytecode, called a “chunk”. We need 3 basic functions:

Initialize an empty chunkS.
Add a bytecode instruction.
Free the chunk’s memory after use.

This is a perfect use case for a dynamic/growable array. I recommend reading through this part of the book to see how a dynamic array is implemented as it’s very much worth understanding how this structure works. Thankfully, we’re spared from writing it ourselves as Zig’s standard library has done the work for us in the form of std.ArrayList and its relatives.

Let’s wrap an arraylist inside a Zig struct to store our bytecode.

// Add after main():

const Chunk = struct {
    code: std.ArrayList(u8),

    pub const OpCode = enum(u8) {
        @"return",
    };
};

Zig type declarations take the same form as imports with const Name = type; Our struct has one field: code of type std.ArrayList(u8). ArrayList is a generic type, meaning you can specify the type of the elements it stores. We’re using u8 (an unsigned 8-bit integer, aka a byte) as the element type rather than OpCode because as we’ll see soon, the code stream can contain bytes other than opcodes.

OpCode has been declared inside Chunk, meaning it will be referred to as Chunk.OpCode. For now we have just one instruction: return. Because return is a reserved keyword in Zig, we need to use the @"" syntax so Zig reads it as an identifier.

For initializing a chunk, we’ll use a constant value rather than a function. We’ll call this value empty for an empty chunk.

    // add inside Chunk after OpCode:

    pub const empty = Chunk{
        .code = .empty,
    };

ArrayList uses the same pattern to initialize an empty arraylist, so we can use the .empty value (shorthand for ArrayListUmanaged.empty) as the starting value for an empty chunk.

Adding instructions and freeing the chunk both involve memory management, so it’s time to cover one of Zig’s more prominent features.

The Allocator Interface

In C, managing memory is quite straightforward. You are given malloc() to obtain a block of memory and free() to give it back once you’re done with it. How malloc() and free() actually obtain and manage these blocks is not our concern. This is very straightforward, but presents 2 problems:

The user has no choice over the memory management strategy which malloc and free use. Some use cases may benefit from a specialized allocation strategy, but code which uses malloc/free provide no control over this.
Memory allocations are hidden, meaning libraries can allocate memory with no indication to the caller and often fail to properly handling allocation failure or allow the caller to handle it themselves.

Zig addresses these problems with the Allocator interface. This is basically a struct with a set of function pointers for allocating and freeing memory. This object must be passed to any function which performs memory allocation so that the caller can choose which allocation functions the function will use.

Functions which allocate always may fail, and these errors can be propagated to the caller or handled within the function somehow.

You’ll be seeing a lot of Allocator from here on out, so it’s important to understand why it’s there and what it’s used for.

Let’s add an alias to Allocator for our convenience.

const std = @import("std");
const Allocator = std.mem.Allocator; // add this line

With that out of the way, let’s create a function to append a byte of code to the chunk.

    // add *inside* Chunk after `empty`:

    pub fn append(chunk: *Chunk, gpa: Allocator, byte: u8) Allocator.Error!void {
        try chunk.code.append(gpa, byte);
    }

This is a member function declared inside the struct Chunk. It takes a pointer to a Chunk as the first parameter and Zig allows member functions with this signature to be called like a method: e.g chunk.deinit(gpa);, where chunk is an existing Chunk instance which is passed as a pointer in the first argument.

For now it’s just a wrapper around ArrayList‘s own append function, but we’ll add a little more functionality nearer the end of the chapter.

It also takes the Allocator parameter by the name of gpa which stands for “General Purpose Allocator”. This is shorter than “allocator” and also distinguishes it from an arena allocator, which is used differently.

ArrayList also has a .deinit() function for freeing the contents, but since the chunk will be allocating other data apart from the bytecode, it makes sense to write our own member function to free an entire Chunk.

    // Add inside Chunk after `empty`:

    // `deinit` is the commonly used name for a freeing function.
    pub fn deinit(chunk: *Chunk, gpa: Allocator) void {
        chunk.code.deinit(gpa);
    }

You may have noticed the Allocator.Error in the function signature for the append, along with the try keyword when calling the append function. This makes now a good time to talk about how error handling is done in Zig.

Errors, Try and Catch

Errors in Zig are defined in types called error sets which are similar to enums. The Allocator.Error error set contains just one error and looks like this:

// (in std.mem.Allocator)
pub const Error = error {
    OutOfMemory,
};

Types which look like Error!SomethingElse are called error unions, and either contain an error value in the Error set, or a payload value of type SomethingElse.

Functions which return an error union can use the try keyword to unwrap the value of an error union or return it’s error in case of failure. A more flexible way to handle errors is using catch which allows you to provide an alternative expression which will be evaluated if an error union contains an error.

    // Returns in case of an error.
    const unwrapped_value = try mayFail(); 

    // Use a default value in case of an error.
    const unwrapped_value = mayFail() catch default_value; 

    // `catch` can capture the value of the error to handle different error cases.
    mayFail() catch |err| switch (err) { 
        error.Foo => return err,
        else => @panic("TODO: handle all the errors!"),
    }

While the keywords may look familiar, understand that this is not the same as try/catch in languages with exceptions. Zig does not have exceptions and errors are ordinary values.

There’s a bit more to Zig’s errors than that but this is enough to work with for now. You can always refer to the docs for more detail.

Disassembling Chunks

It’s finally time to put our new type to use in main!

pub fn main() !void {
    var debug_allocator = std.heap.DebugAllocator(.{}).init;
    defer _ = debug_allocator.deinit(); // check for memory leaks

    const gpa = debug_allocator.allocator();

    var chunk = Chunk.empty;

    // Try removing this line and running the program. Does Zig catch the leak?
    defer chunk.deinit(gpa);

    try chunk.append(gpa, @intFromEnum(Chunk.OpCode.@"return"));
}

Since we need an Allocator to allocate the array, we need to pick a concrete implementation to use. I’ve chosen the standard library’s DebugAllocator, which can detect and report memory leaks in case we forget to free things. The .allocator(); member function returns the Allocator interface object which we can then pass to our functions.

This code includes a new keyword: defer which is used both times we deinit something. Putting defer in front of a statement causes the statement to run when the scope exits. This makes freeing resources like memory or files easier to read and write as they only need to be written once just after the resource is created.

Deferred statements are executed in reverse order at the end of the scope, so the chunk will be freed before the allocator is deinitialized, which is what we want. Check out the docs on defer to learn more.

If we try zig build run now, the code will hopefully run without errors, but nothing much appears to happen. Now we want to be able to display the bytecode in a chunk somehow. This means writing the bytecode disassembler, which will print the code in a human-readable format. Let’s write a dissasembler function to pass our test chunk into.

// add after `Chunk`

pub fn disassembleChunk(
    chunk: *const Chunk,
    name: []const u8,
    writer: *Io.Writer,
) Io.Writer.Error!void {
    try writer.print("== {s} ==\n", .{name});

    var offset: usize = 0;

    while (offset < chunk.code.items.len) {
        offset = try disassembleInstruction(chunk, offset, writer);
    }
}

This simply loops over the instructions in bytecode, keeping track of the current offset with a usize - a pointer-sized integer.

This code introduces us to a few more Zig features: strings, the Io.Writer interface, and while loops. I’ll assume the while loop is obvious enough and focus on the other two.

Io.Writer

The Io.Writer interface is similar to the Allocator one, except this is for outputting bytes instead of allocating them. This means that the caller can choose how and where the output is written to, e.g stdout, stderr, a text file or even a network socket! We’ll make this choice in main but for now just add an alias for Io at the top of the file.

const std = @import("std");
const Allocator = std.mem.Allocator;
const Io = std.Io; // new line

Strings and Slices

Zig does not have a dedicated string type. Instead, strings are commonly stored in a type such as []const u8. The empty square brackets indicate a slice, which is a primitive type for an array with a runtime-known size. The const means our function can’t modify the contents of the slice and u8 is the element type, which is just a byte here.

The slice is a very useful tool which is included in many other languages. They are used in Zig in favour of C’s null-terminated char * pointers because of their better ergonomics and for being less error-prone. Read the docs to get a clearer understanding of what slices are or what they’re used for.

Our chunk.code arraylist exposes it’s elements through a slice called items. We check against this slice’s length for our while loop.

We’re using a while loop instead of a simple for because later instructions will have different sizes, so the offset may jump by more than 1 byte each iteration.

Let’s move on to writing the core disassembling logic.

// add after disassembleChunk:

pub fn disassembleInstruction(
    chunk: *const Chunk,
    offset: usize,
    writer: *Io.Writer,
) Io.Writer.Error!usize {
    try writer.print("{d:04} ", .{offset});

    const instruction: Chunk.OpCode = @enumFromInt(chunk.code.items[offset]);

    switch (instruction) {
        .@"return" => {
            try writer.print("{s}\n", .{@tagName(instruction)});
            return offset + 1;
        },
    }
}

We first print the instruction’s offset, padded to 4 characters, then use a switch statement to inspect the instruction. We only have one instruction for now so we just print it’s name and return the next offset.

Note that we don’t need a default case for the switch since Zig can tell that we’ve covered all possibilities. We also won’t be writing a simpleInstruction function like the book does.

Finally, we’ll apply our disassembler to our little test chunk in main by first creating a buffered writer to stdout and then calling disassembleChunk.

    // in `main`:

    try chunk.append(gpa, @intFromEnum(Chunk.OpCode.@"return"));

    // new code starts
    try disassembleChunk(&chunk, "test chunk", writer);

    try writer.flush(); // Don't forget to flush! ;)
    // new code ends
}

The call to disassembleChunk is straightforward but the meat of this code comes in how we create the Writer interface to stdout.

    // in `main`:

    const gpa = debug_allocator.allocator();

    // new code starts
    const stdout_file = std.fs.File.stdout(); // Grab the stdout file handle
    var buffer: [1024]u8 = undefined;  // Make an intermediate buffer
    var stdout_writer: std.fs.File.Writer = stdout_file.writer(&buffer); // Create a writer
    const writer: *Io.Writer = &stdout_writer.interface; // Obtain the writer's interface
    // new code ends

    var chunk = Chunk.empty;

Alright, there’s a lot to dig into here. Buckle up!

File.Writer, Io.Writer and Buffered IO

It’s important to draw the distinction between the File.Writer and Io.Writer. The latter is an interface similar to Allocator while the first is an implementation of that interface. File.Writer contains a field called interface which is an Io.Writer, a pointer to which can be passed to functions which write.

Creating a writer also means creating a buffer. The purpose of this buffer is to store output temporarily before it get’s passed to the writer’s underlying implementation. This is faster than writing output piecemeal as it reduces the number of calls to the implementation. Buffered IO is a common technique used in almost every language but is usually hidden behind the scenes. I’ve chosen to create a 1KiB buffer and have left it uninitialized with undefined .

The buffer is stored in the Io.Writer interface rather than the implementation which is why it is passed as a pointer, since the interface stores mutable state to track how full the buffer is.

A call to flush is required at the end to write out any left over data in the buffer.

Now zig build run should yield some satisfying output:

$ zig build run
== test chunk ==
0000 return

Constants and Values

Now that we’ve set up the machinery to create and display bytecode chunks, it’s time to add another instruction. The constant instruction will be used to load a constant value (e.g a literal number in the source code). This also means that constant values will need to be stored in a chunk somehow.

We’ll start for defining a type for values. For now this is just a 64-bit floating-point number, but this will be a more complex type when we add support for different types in the language.

// add after Chunk:

pub const Value = f64;

The Constant Pool

Constant values will be stored in a growable array (called the “constant pool”) just like our bytecode. In the book this involves rewriting a dynamic array, but we already know how to use Zig’s arraylist.

const Chunk = struct {
    code: std.ArrayList(OpCode),
    constants: std.ArrayList(Value), // new line!

    // ...

    pub const empty = Chunk{
        .code = .empty,
        .constants = .empty, // new line!
    };

We also need to free this arraylist when the chunk is freed:

    pub fn deinit(chunk: *Chunk, gpa: Allocator) void {
        chunk.code.deinit(gpa);
        chunk.constants.deinit(gpa); // new line
    }

Constants will be loaded when the instruction to load one is encountered. The next byte after this instruction will be the index of the value in the constant pool. Since we’re only using a single byte to store this index, we’re limited to 2^8 = 256 constants per chunk.

We could append directly to the constants array to add each constant value but we’ll also have to check if we have too many constants and also save the index of this new constant every time we add one. Let’s write a function inside Chunk which solves both of those problems.

    // add inside Chunk after append:
    
    const AddConstantError = Allocator.Error || error{TooManyConstants};

    // Returns error.TooManyConstants if the chunk already has 256 constants.
    pub fn addConstant(chunk: *Chunk, gpa: Allocator, value: Value) AddConstantError!u8 {
        if (chunk.constants.items.len > std.math.maxInt(u8)) { 
            return error.TooManyConstants;
        }

        // We've already made sure the length can fit in a u8.
        const new_index: u8 = @intCast(chunk.constants.items.len);

        try chunk.constants.append(gpa, value);

        return new_index;
    }

We first check to make sure the new index will fit in a single byte then we store the new index, add the value to the constant pool and return the index.

This is a good showcase of the aforementioned error system. We’ve created a new error type, which is the union between the Allocator.Error error set and a new error set containing TooManyConstants. We combine these sets together with || and use it in our function signature.

This way the caller will know why the function failed and can report the reason for the error.

Constant Instructions

We need another variant in our OpCode enum to represent a “load constant” instruction.

    pub const OpCode = enum(u8) {
        constant, // new line
        @"return",
    };

We put this new instruction to use in our test chunk.

    // inside main:

    defer chunk.deinit(gpa);

    // new code start
    const constant = try chunk.addConstant(gpa, 1.2);
    try chunk.append(gpa, @intFromEnum(Chunk.OpCode.constant));
    try chunk.append(gpa, constant);
    // new code ends

    try chunk.code.append(gpa, @intFromEnum(Chunk.OpCode.@"return"));

All that’s left is to disassemble these constant instructions. Let’s add another case inside disassembleInstruction.

    switch (instruction) {
        .@"return" => {
            try writer.print("{s}\n", .{@tagName(instruction)});
            return offset + 1;
        },
        // new case:
        .constant => {
            const constant = chunk.code.items[offset + 1];
            const value = chunk.constants.items[constant];
            try writer.print(
                "{s:<16} {d:04} '{d}'\n",
                .{ @tagName(instruction), constant, value },
            );
            return offset + 2;
        },
    }

I’ve forgone writing a printValue function. I’ll do that later when we upgrade our Value type to support more than 1 type, which will give us a chance to look at Zig’s formatting module! For now, check out the new disassembly, now twice as big!

$ zig build run
== test chunk ==
0000 return
0001 constant         0000 '1.2 '

Line Information

If we want to be able to give helpful error messages when a Lox program produces a runtime error, we need to know the line numbers associated with each instruction. The approach we use or this is very simple, just another arraylist. Each element in the lines arraylist will give the line number of the corresponding byte in code. We’ll need to add this field to Chunk.

const Chunk = struct {
    code: std.ArrayList(u8),
    lines: std.ArrayList(u32), // new line!
    constants: std.ArrayList(Value),

    // ...

    pub const empty = Chunk{
        .code = .empty,
        .lines = .empty, // new line!
        .constants = .empty,
    };

    // ...

    pub fn deinit(chunk: *Chunk, gpa: Allocator) void {
        chunk.code.deinit(gpa);
        chunk.lines.deinit(gpa); // new line!
        chunk.constants.deinit(gpa);
    }

    // ...

};

We’ll require a line number to be passed whenever a byte is written to the chunk.

   pub fn append(chunk: *Chunk, gpa: Allocator, byte: u8, line: u32) Allocator.Error!void {
       try chunk.code.append(gpa, byte);
       try chunk.lines.append(gpa, line); // new line (and parameter)!
   }

Disassembling Line Information

Ok, now we can add a line number to the instructions in our little bytecode program.

    // in main, modify 3 lines:

    try chunk.append(gpa, @intFromEnum(Chunk.OpCode.constant), 123);
    try chunk.append(gpa, constant, 123);

    try chunk.append(gpa, @intFromEnum(Chunk.OpCode.@"return"), 123);

Of course, this information isn’t useful apart from displaying it, so let’s do just that in disassembleInstruction.

    // in disassembleInstruction:

    try writer.print("{d:04} ", .{offset});

    // new code starts
    if (offset > 0 and chunk.lines.items[offset] == chunk.lines.items[offset - 1]) {
        try writer.print("   | ", .{});
    } else {
        try writer.print("{d:>4} ", .{chunk.lines.items[offset]});
    }
    // new code ends

    const instruction: Chunk.OpCode = @enumFromInt(chunk.code.items[offset]);

We’re doing a bit of fancy formatting by replacing the line number with a line if it’s the same as the line number of the previous instruction, but hopefully this code will make sense to you by now.

Try zig build run-ning and, if all is well, you’ll see a beautifully formatted disassembly.

$ zig build run
== test chunk ==
0000  123 constant         0000 '1.2'
0002    | return

That’s all the code we’re writing in this chapter! There is however one last change we should make before moving on. I’ve been adding all the code directly in main.zig, but we should separate things up before it gets too cluttered.

Organizing Files

Starting with the Value type which, while pretty unimpressive right now, will grow into a more complicated type with it’s own related functionality. Give it it’s own file at src/values.zig and cut and paste the single line.

// file: src/values.zig

pub const Value = f64;

Don’t forget to mark the declaration with pub so that we can use it from other files.

Now the Chunk struct, which will give me the opportunity to show you one last Zig feature. I promise it’s worth it!

Files are Structs

In Zig, a file is implicitly a struct. What does this mean? Well, you can imagine the top level of a .zig file as being implicitly wrapped in struct { ... }. This means that fields can be declared at the top level and the struct defined by the file can be instantiated and used as data.

This is useful for us because we can put our Chunk type into it’s own file, declaring the fields and declarations at the top level and import it to use within main. Create a new file in src/Chunk.zig and copy the contents of the Chunk struct inside the curly braces into the file.

// file: src/Chunk.zig

const Chunk = @This(); // new!

const std = @import("std");
const Allocator = std.mem.Allocator;
const values = @import("values.zig"); // new!
const Value = values.Value; // new!

code: std.ArrayList(u8),
lines: std.ArrayList(u32),
constants: std.ArrayList(Value),

pub const OpCode = enum(u8) {
    constant,
    @"return",
};

// <other declarations>

The necessary imports also have to be copied over, including a new import of Value. There’s also one additional line: const Chunk = @This(). Since we’re putting the type in it’s own file, it doesn’t have an identifier to be referred to by. We create this identifier by using the @This() builtin to capture the surrounding type, which is the file in this case.

The reason we can’t do this for Value is that it is not and won’t be a struct.

Finally, the disassembly code. We’ll move this into it’s own file called src/debug.zig. Create this file and copy the two functions along with their required imports.

// file: src/debug.zig

const std = @import("std");
const Io = std.Io;
const Chunk = @import("Chunk.zig");

// Remember to make these public with `pub`!
pub fn disassembleChunk(...) {
    // ...
}
pub fn disassembleInstruction(...) {
    // ...
}

Disassembling depends on both Io and Chunk which we import. It’s important to mark the functions pub so that they can be used from other files.

Finally, add imports to these new files in main.zig, which should now include only the main function.

const std = @import("std");
const Allocator = std.mem.Allocator;
const Io = std.Io;
const Chunk = @import("Chunk.zig"); // new line!
const debug = @import("debug.zig"); // new line!
const disassembleChunk = debug.disassembleChunk; // new line!

pub fn main() !void {
    // ...
}

Check that everything still works with zig build run and when you’re ready, move on to the next chapter where we’ll build our own tiny virtual machine to feed some bytecode to.

$ tree src/
src/
├── Chunk.zig
├── debug.zig
├── main.zig
└── values.zig

$ zig build run
== test chunk ==
0000  123 constant         0000 '1.2'
0002    | return