ckvstore: A Crash-Safe Key-Value Store Built on Append-Only Logs

A Raspberry Pi-based industrial controller manages a bank of temperature sensors on a factory floor. Each sensor has a unique ID (a short string) and a calibration record (a binary blob: gain coefficient, offset, timestamp of last calibration). The controller stores these in a simple key-value store. Power can be cut at any instant—a tripped breaker, a forklift clipping a cable. When power comes back, the operator needs the last successfully written calibration data to be exactly intact, with no corruption and no partial writes silently accepted as good data. At Hamkee, we developed ckvstore precisely for these environments, where data integrity is not a feature, but a prerequisite.

We built this tool to address the fundamental need for a simple, reliable, and crash-safe key-value store with a minimal footprint, suitable for embedded Linux and IoT applications. The core problem we focused on was ensuring data survival and consistency during unexpected power loss. The solution lies in a classic but powerful computer science technique: the append-only log.

The Append-Only Log: A Foundation of Reliability

Modern databases often perform complex in-place updates to data files. While efficient for some workloads, this approach introduces significant risk in a power-loss scenario. If the system crashes mid-write, a data file can be left in a corrupted, half-written state, a condition that can be difficult or impossible to recover from without complex journaling or write-ahead logging (WAL) systems.

We chose a simpler, more robust primitive for ckvstore: the append-only log. The principle is straightforward: new data is never written over old data. Instead, every write operation, whether for a new key or an update to an existing one, is appended to the end of a single data file.

This design is inherently crash-safe. The existing, valid data is immutable and is never touched. When our Raspberry Pi controller writes a new calibration record, the operation happens at the current end-of-file. If power is cut during this write, one of two outcomes is guaranteed:

1. The write never made it to the disk. The file remains in its last known-good state.
2. A partial, incomplete record was written to the disk.

In neither case is the original, valid data harmed. As we will see, detecting and discarding a partial record on the next startup is a trivial operation. This design eliminates the entire category of data corruption errors caused by in-place updates.

The On-Disk Format: A Contract for Integrity

To make the append-only log recoverable, ckvstore defines a strict binary format for every record written to disk. Each key-value pair is wrapped in a header containing metadata essential for recovery.

Conceptually, each record looks like this on disk:

[CHECKSUM] [KEY_LEN] [VALUE_LEN] [KEY_DATA] [VALUE_DATA]

Our implementation in C defines a structure for the header that is serialized to the file:

#include <stdint.h>

// Represents the on-disk header for a single ckvstore entry.
typedef struct {
    uint32_t checksum; // CRC32 checksum of the key and value data.
    uint32_t key_len;  // Length of the key in bytes.
    uint32_t val_len;  // Length of the value in bytes.
} ckv_entry_header_t;

When a write is requested for a new sensor calibration:

1. We calculate a CRC32 checksum of the key and value data. This checksum is our guarantee of integrity.
2. We populate the ckv_entry_header_t struct with the checksum and the data lengths.
3. We write the header, the key, and the value to the data file in a single sequence.

This self-contained format is critical. Any record read from the disk can be independently verified by re-calculating its checksum and comparing it to the stored value. A mismatch means the record is corrupt.
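The write path described above can be sketched in a few lines of C. This is a minimal illustration, not ckvstore's actual API: the name ckv_put, the FILE*-based I/O, and the bitwise CRC32 helper are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Minimal bitwise CRC32 (IEEE polynomial). Chaining calls computes the
// checksum of the concatenated buffers, which lets us cover key + value
// without first copying them into a single buffer.
static uint32_t crc32_update(uint32_t crc, const void *buf, size_t len) {
    const uint8_t *p = buf;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

typedef struct {
    uint32_t checksum; // CRC32 of the key and value data
    uint32_t key_len;  // key length in bytes
    uint32_t val_len;  // value length in bytes
} ckv_entry_header_t;

// Append one record (header, key, value) to the end of the log.
// ckv_put is a hypothetical name used only for this sketch.
static int ckv_put(FILE *db, const char *key, const void *val, uint32_t val_len) {
    ckv_entry_header_t h;
    h.key_len = (uint32_t)strlen(key);
    h.val_len = val_len;
    h.checksum = crc32_update(0, key, h.key_len);        // checksum of key...
    h.checksum = crc32_update(h.checksum, val, val_len); // ...continued over value
    if (fwrite(&h, sizeof h, 1, db) != 1) return -1;
    if (fwrite(key, 1, h.key_len, db) != h.key_len) return -1;
    if (val_len && fwrite(val, 1, val_len, db) != val_len) return -1;
    return fflush(db) == 0 ? 0 : -1;
}
```

Because the checksum covers the concatenated key and value, a reader can later verify the record by computing the CRC32 over the full body it read back.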

Crash Recovery in Action

When a ckvstore database is opened, it assumes a crash may have occurred. The recovery process is synonymous with the startup process and is deterministic and reliable:

1. The database file is opened.
2. The file is read sequentially from the beginning, record by record.
3. For each record, we read the header and then use key_len and val_len to read the associated data.
4. We compute the CRC32 checksum of the key-value data we just read and compare it against the checksum field in the header.
5. If they match, the record is valid. We update our in-memory index (a hash table) to map the key to the byte offset of this record in the data file. If the key already exists in the index, its offset is simply updated to this newer location.
6. If they do not match, the record is corrupt. This indicates a torn write from a previous power failure. We immediately stop the scanning process. Because we only ever append, we know that any corruption can only exist in the very last record. By stopping the scan, we effectively truncate the log to its last known-good state.

When the Raspberry Pi controller reboots, ckvstore performs this scan. The partial write from the power cut fails its checksum check, and the recovery process stops right before it, ensuring the in-memory index is built using only complete, valid calibration data.
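The scan loop can be sketched as follows. This is a minimal illustration, assuming the header layout shown earlier and a standard bitwise CRC32; ckv_recover_scan is a hypothetical name, and insertion into the in-memory index is elided.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Minimal bitwise CRC32 (IEEE polynomial), matching what the write path uses.
static uint32_t crc32_buf(const void *buf, size_t len) {
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

typedef struct {
    uint32_t checksum;
    uint32_t key_len;
    uint32_t val_len;
} ckv_entry_header_t;

// Scan the log from the start and return the offset just past the last
// valid record; everything beyond it is a torn write and can be truncated.
static long ckv_recover_scan(FILE *db) {
    long good_end = 0;
    ckv_entry_header_t h;
    rewind(db);
    for (;;) {
        if (fread(&h, sizeof h, 1, db) != 1)
            break; // missing or partial header: end of valid data
        size_t body_len = (size_t)h.key_len + h.val_len;
        uint8_t *body = malloc(body_len ? body_len : 1);
        if (!body)
            break;
        if (fread(body, 1, body_len, db) != body_len) {
            free(body); // torn write: body shorter than the header claims
            break;
        }
        if (crc32_buf(body, body_len) != h.checksum) {
            free(body); // checksum mismatch: corrupt record, stop scanning
            break;
        }
        // A real implementation would now map the key (body[0..key_len))
        // to good_end, the offset where this record begins, in the index.
        free(body);
        good_end = ftell(db);
    }
    return good_end;
}
```

The caller can then truncate the file to the returned offset before accepting new writes, restoring the last known-good state.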

The Compaction Problem and Atomic Solution

The primary trade-off of an append-only log is that the data file grows indefinitely. When a key is updated, the old data is not deleted; it is simply orphaned, occupying space. To reclaim this space, we must perform compaction.

However, the compaction process itself must be crash-safe. A crash during compaction is just as dangerous as a crash during a regular write. Our design goal was to ensure that a power failure at any point during compaction would either leave the original data file perfectly intact or result in a successfully completed compaction. There is no intermediate, corrupt state.

ckvstore implements a two-file atomic compaction strategy:

1. Create a Temporary File: A new data file is created, e.g., ckv.db.compact. All new writes are temporarily paused or buffered.
2. Write Only Live Data: We iterate through the in-memory index, which already contains the location of the latest value for every key. For each key, we read its data from the old file and write it into the new .compact file. Stale data is never read and is thus left behind.
3. Force to Disk: Once all live data has been written to the new file, we issue a critical fsync() call. This system call instructs the operating system to flush all buffered data for the .compact file from memory to the physical storage device. After fsync() completes, we have a complete, consistent, and durable copy of the compacted data.
4. Atomic Rename: The final step is a single rename("ckv.db.compact", "ckv.db") system call. On POSIX-compliant filesystems, rename() is an atomic operation. It is guaranteed to either happen completely or not at all.

This sequence is the cornerstone of ckvstore's safety. If power is lost before the rename, the .compact file is simply an orphaned temporary file that is discarded on the next startup, and the original ckv.db remains untouched. If power is lost after the rename is complete, the compaction was a success. The atomicity of the rename call ensures there is no moment where the database is in an inconsistent state.

A simplified code sketch of the final compaction steps illustrates the principle:

#include <stdio.h>   // rename
#include <unistd.h>  // fsync, close, unlink

// ... after writing all data to temp_file_descriptor ...

// Step 3: Ensure all data is physically on disk.
if (fsync(temp_file_descriptor) != 0) {
    // Handle error: couldn't guarantee data is on disk.
    // Abort compaction, clean up temp file.
    close(temp_file_descriptor);
    unlink(temp_path);
    return -1;
}
close(temp_file_descriptor);

// Step 4: Atomically swap the new file for the old one.
if (rename(temp_path, db_path) != 0) {
    // Handle error: the final atomic step failed.
    // The old DB file is still intact.
    unlink(temp_path);
    return -1;
}

// Note: for the rename itself to be durable across power loss (not just
// atomic), the containing directory should also be fsync()'d here.

// Compaction successful.
return 0;

Conclusion

The design of ckvstore demonstrates a core Hamkee engineering principle: building reliable systems through deliberate architectural choices. By leveraging an append-only log, a strict on-disk wire format, and an atomic compaction process, we created a key-value store that provides strong guarantees of data integrity in the face of unpredictable hardware and power environments. For embedded systems like the industrial controller on the factory floor, this is not a luxury; it is the foundation of a dependable application.

ckvstore was developed by the engineering team at Hamkee, where we specialize in high-performance Unix/Linux solutions. We invite you to explore the repository, examine the implementation, and contribute to its development.