Building a Distributed Key-Value Store in Go
A technical deep-dive into building a persistent key-value store from scratch in Go covering the three-layer architecture, write-ahead logging (WAL) for crash recovery, and why I chose bbolt as the durable checkpoint engine.
Most developers reach for Redis or etcd without ever questioning what's happening underneath. I wanted to actually understand it, so I'm building one from scratch in Go. This post documents the architecture decisions, the tradeoffs, and what I learned along the way.
The store is built around three layers, each serving a distinct role. At the top sits an in-memory map[string]string protected by a sync.RWMutex. Every read and write touches this layer first, which means the hot path has zero disk I/O and sub-microsecond latency. The tradeoff is obvious: memory is volatile. A crash wipes it entirely.
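The top layer can be sketched in a few lines. This is a minimal illustration of the pattern described above, not the store's actual API (the type and method names here are mine):

```go
package main

import (
	"fmt"
	"sync"
)

// memStore is the hot path: a plain map guarded by an RWMutex,
// so concurrent readers never block each other.
type memStore struct {
	mu   sync.RWMutex
	data map[string]string
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string]string)}
}

func (s *memStore) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

func (s *memStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

func main() {
	s := newMemStore()
	s.Set("lang", "go")
	v, ok := s.Get("lang")
	fmt.Println(v, ok) // go true
}
```

Everything lives in memory, which is exactly why the next two layers exist.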
That's where the Write-Ahead Log (WAL) comes in. Before any write mutates the in-memory map, it is first appended to a WAL file on disk. The format is intentionally minimal: a 9-byte binary header encoding the operation type, key length, and value length, followed by the raw key and value bytes. Because the file is append-only, writes are sequential and fast. On restart, Replay() reads the WAL from byte zero and re-applies every record in order, fully reconstructing the in-memory state. This is the same pattern Postgres uses: the WAL is your safety net between crashes.
The third layer is bbolt, an embedded B-tree database that writes a single memory-mapped file to disk. Every 30 seconds, a background goroutine snapshots the in-memory map and flushes it into bbolt in a single atomic transaction. After a successful flush, the WAL is truncated; its job is done, because bbolt now holds a durable checkpoint. On the next Open(), the store loads from bbolt first, then replays any WAL records written after the last checkpoint. Together they guarantee no acknowledged write is ever lost.
I chose bbolt specifically because it fits the checkpoint role perfectly. It has no server process, no network overhead, and no configuration: just a single file and a battle-tested B-tree. It handles its own crash recovery internally via copy-on-write pages, so I don't need to think about partial writes at the bbolt layer at all. The WAL handles the gap between flushes; bbolt handles everything else.
The most instructive bug I hit was in the WAL's Append function. The 9-byte header correctly encoded klen and vlen, but the key bytes were never actually written to disk; only the value was. Replay() would read klen bytes expecting the key and instead get the first bytes of the value, producing silent garbage. The fix was a single missing file.Write([]byte(key)) call. What made it subtle was that delete operations, which have an empty key and empty value, happened to pass their tests anyway, since reading zero bytes for both is always correct. Only recovery tests with real keys exposed it.