This post is the first in a series about DoomBF, a project, the goal of which is to run Doom (1993) in the esoteric programming language Brainfuck.

Brainfuck

... is an esoteric programming language created by Urban Müller in 1993. It's self-description is:

The brainfuck compiler knows the following instructions:
Cmd  Effect
---  ------
+    Increases element under pointer
-    Decrases element under pointer
>    Increases pointer
<    Decreases pointer
[    Starts loop, flag under pointer
]    Indicates end of loop
.    Outputs ASCII code under pointer
,    Reads char and stores ASCII under ptr

The "pointer" (hereafter called "Data Pointer" or "DP") points to an array of integer cells (the "Tape").

Intro

The goal for this post is to make a brainfuck interpreter for running and debugging programs. This task was publically posted on October 2nd 2025. It is as follows (translation):

The task: write a good interpreter for Brainfuck in С.
Must support a large tape size (a couple of gigabytes),
Preloading values onto the tape (loading data before running),
You must be able to set the size of cells before compilation 8/16/32/64 bits,
A good command interpreter, including running commands from memory (what?), and getting from by callback, running a different callback on commands . and ,.
Think about a JIT variant.

One part of the task was already completed: I thought about a JIT variant; I decided that it would be difficult, so I'll be making an interpreter instead.

I also added my own rules to the interpreter, they are:

The interpreter must follow the brainfuck standard in its default configuration, and all options that cause it to deviate must be clearly marked
On all configurations, all (succeding) brainfuck programs must behave in exactly the same way

Brainfuck Standard

The most important thing you should now about the brainfuck standard, is that it doesn't really exists. As far as I know, the only canonical definition of the brainfuck language is the original package by Urban Müller, which isn't a strict definition.

The closest thing to an actual specification is this one by Daniel B. Cristofani. The full thing is kinda long, but in short:

Cells are 8-bit or wider
The cells may or may not wrap on overflow
The tape is at least 30000 cells to the right

This is really broad, and an interpreter following the bare minimum is not nice to work with, and also not Turing complete (because of limited memory).

For this reason my interpreter follows a more common model:

Cells are 8-bits (configurable)
The cells wrap on both under- and overflow
The tape is bi-directional and infinite

Base interpreter

The brainfuck interpreter is a famously simple piece of software (that is it's point, after all). The only optimisations I've added are loop-precalculation: matching all brackets before execution, and a jump table: replacing a switch case with an array of code pointers (I think a really smart C compiler would be able to do the second one by itself).

Infinite tape

The usual way to do a large tape is dynamic memory allocation, this, however, only offers as much memory as you have RAM (which on a phone might be less than 4GB), so I decided to keep the memory allocation constant, and swap the unused memory pages out to disk.

I initially planned to make this system threaded, so I designed the system in a way where the interpreter and memory manager could run independently. I didn't actually implement threading yet, but it's planned.

The tape is split into "pages", with a configurable size. The interpreter runs on a memory region the size of 4 pages (the "Hot Tape").

Each one of those 4 memory spaces could either have the data for some page (be loaded), or have undefined data (be unloaded).

Loaded? + + +
Page n: 0 1 2 3
Pg.UID: 4 5 6
DataP:    ^

The "Page UID" is the global unique identifier for the page, while the "Page N" is the id of the slot it's loaded into (page_n = page_uid % 4). In this example the pointer is at page 0x05 (loaded into slot 1).

When the data pointer moves between pages, one page is saved, and another one is loaded

store: 0x4
load: 0x7

Loaded?   + + +
Page n: 0 1 2 3
Pg.UID:   5 6 7
DataP:    ->^

The Data Pointer also wraps around on the boundaries of the Hot Tape. The actual Data Pointer value doesn't, but when accessing memory, it does it modulo hot-tape-size

This allows the execution to continue immediately after a page switch, even while the pages are being swapped; unless the pointer moves more than 1 page, while a page is being loaded. This can happen in two ways:

DP moves more than 1 page in a single instruction. This is actually a problem, since I (later) added "instruction rolling"; it is avoided by making the page size bigger than the rolling limit.
DP moves more than 1 pages in multiple instructions, while another thread is blocked on the disk. This is an unsolved problem, but it is avoided by the fact that I don't have threads yet.

With that solved, the page loading becomes easy. The only problem is that it takes ages to copy values between pages ([>+<-]; a really common construction in brainfuck), since the interpreter loads a new page every loop iteration. This is solved later, when I add advanced rolling.

BrainFuck Macros

This was a weird phase, where the interpreter had two modes: brainfuck and BrainFuck Macros, chosen at compiletime. The second one is for running files in my custom format, created by an external utility, which were just a run-length-encoded version on brainfuck (each instruction had a count after it).

Later, I ended up putting the preprocessing step inside the interpreter, which is much better.

Debugger

This is the main feature of my interpreter. It consists of a simple CLI interface for stepping through a program. I don't any fancy hooking, like real debuggers have to, I just call the debugger on every instruction, and it then decides to either continue execution, or display the command prompt.

Over time I added a couple of features:

Loop exiting -- a simple state machine that runs until the next ], and then runs until the instruction after it
Addressmaps (pinning a portion of the tape to the hotbar) -- a new printing stage in the debugger, with some compilated logic for displaying strings and unloaded variables
Sourcemaps -- a new data type, with a cursed state machine for determining which comments belong to which instructions, that even I don't understand

New instruction set

A big thing that annoyed me about the interpreter was the fact that, while in + and - instructions the argument was stored right next to it, on loop instructions ([ and ]) the argument was stored in a separate loops list; so I decided to remake the instruction set.

Each instruction is now variable size: from 1 to 8 bytes (yes, my interpreter won't run on 32-bit machines). The first byte is always the instruction name, and all others are dependent on the instruction.

From the industrial-bf ReadMe:

+-------------------------------+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|===============================|
| + |VAL|   .   .   .   .   .   . Add VAL + 1 to the current cell
| - |VAL|   .   .   .   .   .   . Subtract VAL + 1 from the current cell
| < |VAL|   .   .   .   .   .   . Move VAL + 1 cells to the left
| > |VAL|   .   .   .   .   .   . Move VAL + 1 cells to the right
| / |VAL|   .   .   .   .   .   . Divide the current cell by VAL. Go into an infinite loop if it is not divisible.
| \ |VAL|   .   .   .   .   .   . The same as /, but divides -(current cell)
| ^ |OFFSET |VAL|   .   .   .   . Store (current cell)*I_VAL at offset I_OFFSET (relative to current data pointer)
| [ | END INDEX                 | Start a loop (END INDEX indicates the address of the matching ])
| ] | BEGINNING INDEX           | End a loop
| ! | EXPECTED VALUE            | Terminate if (current cell) is not EXPECTED VALUE
| @ | EXPECTED VALUE            | Terminate if the data pointer is not EXPECTED VALUE
| 0 |   .   .   .   .   .   .   . Set the current cell to zero
| . |   .   .   .   .   .   .   . Output the current cell
| , |   .   .   .   .   .   .   . Read a byte into the current cell. Do not change in EOF
| # |   .   .   .   .   .   .   . Breakpoint (see Debugger)
|\0 |   .   .   .   .   .   .   . Terminate the program (only appears at the end)
+-------------------------------+

(You might notice some non-brainfuck instructions here, I'll explain them later)

Unfortunately this brings the maximum size of a program from 16384 PiB (2^64^ bytes) to just 256 GiB (2^48^ bytes), which I think is still plenty.

Advanced rolling

This is when I added a system to roll common brainfuck constructs: [+>-<] and [-] into a single instruction.

Move instruction (^) -- at every [ in the source it checks if the opened loop is "balanced": only contains >, <, + and -, with the amounts of < and > being equal. If it is, calculates the required sequence of ^ commands (terminated by 0). A / or \ is added if the origin cell (the cell the loop starts on) is changed more than once (like in [-->+<]). If a loop like that is entered, while the origin cell is not divisible by that amount, it'll enter an infinite loop, so the instructions reflect that.

Zero instruction (0) -- at every [, checks if the loop is exactly [-] or [+]. If it is, inserts a 0 command.

Preloading data onto the tape

One of the requirements for the interpreter was the ability to preload data onto the tape. I added this by simply making the interpreter load pages 0, 1 and -1 from disk upon startup. I also made a small utility that splits an arbitrary file into the required page files, and names them appropriately.

I probably don't need to load the -1 page, since a file that touches it has to be 16384 PiB (2^64^ bytes), but still left that in, in case somebody needs to load data there, and uses another tool to create the pagefiles.

Windows issues

Byte order

All of the computers I have are little-endian, meaning that the least significant byte of a number (the 3 in 123) is at the beginning in memory. This causes issues for my program, since when storing an instruction using a number, like 0x5babcdef (begin a loop that ends at 0xabcdef), it gets stored as 0xef 0xcd 0xab 0x5b, and the command-byte (0x5b) always has to be the first byte.

I could've implemented a smarter way to store commands, that stores the command-byte separately, keeping all numbers in their host byte order; but I decided it would be simpler to just store everything in big-endian and convert it when executing.

This conversion is done by the hton family or functions (where htonl and ntohl, by definition, always do the exact same thing, but, strangely, are two separate functions). These functions are from <arpa/inet.h> (because byte-order conversions are usually done when doing networking). This header file does not exist on Windows.

Unfortunately these functions are the only way I know of to detect byte order (without just writing a multi-byte number, which would kill all compiler optimizations), so I just assume that all windows hosts are little-endian, and that's it.

Integer length

Linux thinks that a long should be 64-bit (on 64-bit machines), while Windows (I assume for backwards compatibility) thinks it's 32-bit. This forced me to replace all uses of long with uint64_t; and while I was at it, I also replaced all other integer types, except char, by using #pragma GCC poison to check that I haven't missed anything.

Benchmarks

I ran the benchmarks on the standard mandelbot.b. Here are the results:

qdb     6.3s
ibf     3.8s
tritium 1.6s
bf-li   0.3s

It's nice to see that my optimization work has paid off, and I'm about 40% faster than qdb (my go-to reference interpreter). I'm still overtaken by tritium (a really cool fast interpreter), but the winner is bf-li.

bf-li isn't really in the same category as the other three, because it's actually a JIT compiler, but still, I've seen the power that JIT brings, so I will add it to my own interpreter... In the next post.