Posted Nov 10th, 2023.
Hello world! This is the first so-called "devlog", i.e. a record of the work and progress I make each week. Ideally, I want to use these posts to go deep into interesting problems I encountered that week, or attempt to explain something I learned recently.
For a bit of context, I recently got a bit more time to work on open-source development work, and frankly, I really miss posting weekly company changelogs. They helped frame each week, and were a good way to close out the week on a high note. This is my attempt to write them on a personal level.
This week, I did two things:
- Ported a content-defined file chunker (for a Perkeep-style project) from its C and Go implementations to Zig.
- Started on a pure Zig implementation of LevelDB.
One thing you'll probably notice is that both of these projects involve porting something from Go/C/C++ to Zig (and yes, LevelDB is particularly tedious). There are two reasons for the rewrites: first, it gives me a chance to understand how these dependencies work under the hood; and second, rewriting means I can make some use-case-specific optimizations (and, more importantly, simplifications).
I'm a really big fan of Perkeep (Brad Fitzpatrick's "side project"), and in an attempt to build my own version of it, I need to be able to chunk files properly. In plain terms, this means splitting a larger file up into smaller "chunks" (and by re-assembling the chunks, you can re-assemble the file). There's a bit more to it than that (lots of cleverness in picking where and how to chunk files), but that's the basic gist!
Storing files as chunks has a number of benefits:
Building the Zig version of this was relatively easy, and I was able to use some of Zig's comptime features to do things pretty elegantly. A good example of this? Constructing the lookup table of polynomials for use within the core chunking loop.
Here's what it looked like in C:
static bool tables_initialized = false;
static uint64_t mod_table[256];
static uint64_t out_table[256];
// ...
if (!tables_initialized) {
    calc_tables();
    tables_initialized = true;
}
Go:
type tables struct {
    out [256]Pol
    mod [256]Pol
}

// cache precomputed tables, these are read-only anyway
var cache struct {
    entries map[Pol]*tables
    sync.Mutex
}

func init() {
    cache.entries = make(map[Pol]*tables)
}

// ...

func (c *Chunker) fillTables() {
    // ...
}
Both the C and Go versions rely on global variables and compute the lookup tables at runtime (usually during initialization, or the first time certain functions are called). In Zig, this is done entirely at compile time:
comptime var mod_table: [256]u64 = undefined;
comptime var out_table: [256]u64 = undefined;

for (0..256) |b| {
    // out_table[b]: the contribution of byte b after it has been pushed all the
    // way through the window; XOR-ing it cancels the byte sliding out.
    var hash: u64 = 0;
    hash = appendByte(hash, @as(u8, @intCast(b)), polynomial);
    for (0..window_size - 1) |_| {
        hash = appendByte(hash, 0, polynomial);
    }
    out_table[b] = hash;

    // mod_table[b]: XOR-ing this into a shifted digest clears an overflowed top
    // byte b and adds back its value reduced modulo the polynomial.
    const b_u64 = @as(u64, @intCast(b));
    mod_table[b] = mod(b_u64 << poly_deg, polynomial) | (b_u64 << poly_deg);
}
No global variables, no runtime cost, and no allocations. The way God intended.
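For a sense of how these tables actually get used, here's a rough sketch of the core rolling-hash loop. To be clear, this isn't my actual implementation: it glosses over minimum/maximum chunk sizes, and findBoundary and split_mask (which controls the average chunk size) are hypothetical names. It does, though, reuse the out_table, mod_table, window_size, and poly_deg values from above:
fn findBoundary(data: []const u8, split_mask: u64) ?usize {
    var window = [_]u8{0} ** window_size;
    var wpos: usize = 0;
    var digest: u64 = 0;

    for (data, 0..) |byte, i| {
        // Cancel the contribution of the byte sliding out of the window...
        digest ^= out_table[window[wpos]];
        window[wpos] = byte;
        wpos = (wpos + 1) % window_size;

        // ...then shift the new byte in, reducing modulo the polynomial.
        const top: usize = @intCast(digest >> (poly_deg - 8));
        digest = (digest << 8) | byte;
        digest ^= mod_table[top];

        // Declare a chunk boundary whenever the low bits of the hash hit zero.
        if ((digest & split_mask) == 0) return i + 1;
    }
    return null; // no boundary in this buffer, keep feeding data
}
The per-byte work ends up being just a couple of table lookups, shifts, and XORs, which is exactly why precomputing the tables is worth it.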
After the chunker was finished, I decided it was time to start working on a pure Zig version of LevelDB. For a bit of background, LevelDB is a (relatively well-known) on-disk key/value store, which keeps a map between arbitrary keys and values (strings) in sorted order.
Now wait, LevelDB has a C library, right? And didn't I read somewhere that Zig has great C support? Couldn't you just compile LevelDB itself and link your Zig code to it? Wouldn't that be so much better than re-writing it? It'd probably only be like ~500 lines right?
Yep, yep, yeah, and yes-ish; 1287 lines (I've done it before). You're forgetting though that I'm a) unemployed, b) a bit of a perfectionist, and c) egotistical enough to think that I can do a better job than Jeff Dean and Sanjay Ghemawat.
A little note on that last point: LevelDB is a legendary piece of software, built by arguably two of the best engineers in the world. My shitty version that I built in two weeks will not be anywhere close to the quality or elegance of the original version, nor have anywhere near the impact. It will however serve as a good learning opportunity, and a chance to understand how the (somewhat) idiomatic Zig implementation of LevelDB differs from the original C++ codebase.
A few differences are already emerging, such as:
(Quick note, these changes have made the implementation worse in a general sense, but considerably better for my specific use case. Writing it from scratch means I can make these sorts of design changes.)
LevelDB is one of my favorite pieces of software in existence, if not my favorite outright. It's simple, elegant, and works amazingly well for what it's built for. Better yet, its codebase is chock-full of little design decisions that, when understood, help the reader gain a unique appreciation for both Jeff and Sanjay as (world-class) engineers. This is my attempt at explaining one of those designs that made me go, "woah, these guys really knew what they were doing"!
A bit of background knowledge: by default, keys in LevelDB are ordered "lexicographically", which is simply a fancy way of saying "the bytes are in order". A few examples: "a" is before "b" ("a" = [0x61], "b" = [0x62], 0x61 < 0x62), and "ab" is before "abc" ("ab" = [0x61, 0x62], "abc" = [0x61, 0x62, 0x63], same prefix, "ab" is shorter). You can read more here.
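If you want to see that in code, Zig's std.mem.lessThan implements exactly this bytewise comparison:
const std = @import("std");

test "lexicographic byte ordering" {
    // "a" (0x61) sorts before "b" (0x62)
    try std.testing.expect(std.mem.lessThan(u8, "a", "b"));
    // with a shared prefix, the shorter key sorts first
    try std.testing.expect(std.mem.lessThan(u8, "ab", "abc"));
}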
By leveraging the fact that keys are stored in sorted order, LevelDB can do some fantastically brilliant stuff, specifically snapshots (seeing the database contents as they were at a previous point in time). To do this, it relies on a type called InternalKey, which is a small wrapper around the user_key passed in by the user.
pub const InternalKey = struct {
    user_key: []const u8,
    trailer: packed struct {
        sequence_number: u56,
        value_type: u8, // Either "Value" (1) or "Deletion" (0)
    },
};
Essentially, LevelDB stores keys as-is, with an additional 8 bytes tacked on the end. Why do it this way? Well, let's see what this looks like with real keys and values.
Inserting the following entries, in order (each insert is assigned the next sequence number, 1 through 9):
const testingMemEntries = [_]struct {
    k: []const u8,
    v: []const u8,
    vt: format.ValueType,
}{
    .{ .k = "deleted", .v = "scaryvalue", .vt = .Value },
    .{ .k = "deleted", .v = "", .vt = .Deletion },
    .{ .k = "deleted", .v = "boo", .vt = .Value },
    .{ .k = "deleted", .v = "", .vt = .Deletion },
    .{ .k = "foo", .v = "bar", .vt = .Value },
    .{ .k = "foo2", .v = "bar1", .vt = .Value },
    .{ .k = "faz", .v = "", .vt = .Deletion },
    .{ .k = "faz", .v = "baz", .vt = .Value },
    .{ .k = "foo2", .v = "bar2", .vt = .Value },
};
Result:
> 'deleted' -> '' (4; Deletion)
> 'deleted' -> 'boo' (3; Value)
> 'deleted' -> '' (2; Deletion)
> 'deleted' -> 'scaryvalue' (1; Value)
> 'faz' -> 'baz' (8; Value)
> 'faz' -> '' (7; Deletion)
> 'foo' -> 'bar' (5; Value)
> 'foo2' -> 'bar2' (9; Value)
> 'foo2' -> 'bar1' (6; Value)
The most important observation here: ordering keys with this scheme (and some custom comparator magic) groups all entries with the same user key together, while sorting the sequence numbers within each key in descending order (newest first).
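That comparator magic is smaller than it sounds. Here's roughly what it boils down to (a sketch, not necessarily the exact shape of my implementation), reusing the InternalKey type from earlier: user keys ascending, and for equal user keys, sequence numbers descending:
const std = @import("std");

fn internalKeyLessThan(a: InternalKey, b: InternalKey) bool {
    return switch (std.mem.order(u8, a.user_key, b.user_key)) {
        .lt => true,
        .gt => false,
        // Same user key: the larger sequence number (the newer write) sorts first.
        .eq => a.trailer.sequence_number > b.trailer.sequence_number,
    };
}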
When reading a value from the database, the caller passes in both a key and a sequence number (kind of). Let's consider the key "deleted", at varying sequence numbers.
If the user passes in sequence number 5, the database can use binary search to find the first key greater-than-or-equal (in this internal ordering) to the lookup key; since there is no "deleted" entry at sequence 5, that lands on <"deleted"><4>, which yields the following list of keys:
> 'deleted' -> '' (4; Deletion)
> 'deleted' -> 'boo' (3; Value)
> 'deleted' -> '' (2; Deletion)
> 'deleted' -> 'scaryvalue' (1; Value)
In this case, it sees the "Deletion" as the first entry, and thus reports to the user that the key wasn't found.
Now, let's consider what happens if we query the DB for the same key, but a different sequence number, say 3. The database does the same thing, but thanks to the ordering it skips past the first entry (sequence 4 is newer than our snapshot), so it sees:
> 'deleted' -> 'boo' (3; Value)
> 'deleted' -> '' (2; Deletion)
> 'deleted' -> 'scaryvalue' (1; Value)
With a sequence number of 3, the database now sees the "Value" entry as the first one, and returns that as a valid value for the user's key.
The key idea: By increasing the sequence number by one each time you mutate the database (i.e. delete a key, or write a new value for a key), the caller can pass in any sequence number to see the database at "that point in time", i.e. any writes with a larger sequence number will be ignored.
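To make that concrete, here's a toy read path over an in-memory slice of entries sorted with the comparator above. Entry and getAtSnapshot are hypothetical names, and it does a linear scan where the real thing would binary-search, but the shadowing logic is the same:
const std = @import("std");

// Hypothetical in-memory entry: an internal key (as defined earlier) plus its value.
const Entry = struct {
    key: InternalKey,
    value: []const u8,
};

fn getAtSnapshot(entries: []const Entry, user_key: []const u8, snapshot_seq: u56) ?[]const u8 {
    for (entries) |e| {
        // Skip other keys, and skip anything written after our snapshot.
        if (!std.mem.eql(u8, e.key.user_key, user_key)) continue;
        if (e.key.trailer.sequence_number > snapshot_seq) continue;

        // The first surviving entry is the newest one visible at this snapshot.
        if (e.key.trailer.value_type == 1) return e.value; // a "Value" entry
        return null; // a "Deletion" shadows any older values
    }
    return null;
}
Pass in the latest sequence number and you get normal reads; pass in an older one and you get a consistent view of the past.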
To be entirely honest, you can make a LevelDB-conformant database implementation in probably <1.5k lines of code by keeping a sorted array of these entries and keeping everything in memory. The complexity of LevelDB only comes in when you have to deal with the filesystem (i.e. storing files of these ordered key/value pairs) and accessing them efficiently.
I'm not really sure how to end this, but I hope this was interesting and that you perhaps even learned something cool! The last week has been really fun, and I'm looking forward to finishing up the LevelDB implementation next week and starting work on the next piece of the puzzle (likely some non-cringe AI stuff).
Until then, I'd love to hear your notes and thoughts on this week, and on the format/structure of these devlogs! You can reach out on Twitter or via email.
f8dcea21b9224f130809f849657bf534aae2f00f
19 files changed, 5790 insertions(+)