Dolt: What If Your Database Had a Git History?

There is a thought experiment I keep coming back to.

You are debugging a model regression. The accuracy was fine last week. Nothing changed in the code. But something changed — you can feel it. You grep through migration files, diff some CSVs you exported by hand three weeks ago, and eventually give up and re-run training from scratch. Two hours gone.

Now imagine instead you ran this:

SELECT * FROM dolt_diff_training_data
WHERE to_commit = 'last_week';

And got back every row that changed, with the old value and the new value side by side, plus a commit hash you can join against dolt_log to see who committed it.

That is Dolt.


Git Semantics. MySQL Compatibility. One Binary.

Dolt is a relational database engine where version control is not a bolt-on feature: it is the storage model. Table data lives in content-addressed Merkle trees, and commits form a Merkle DAG over them. INSERT, UPDATE, and DELETE statements change a working set that you can stage and commit. Every commit has a hash, an author, and a message.

It is also fully MySQL-compatible. You connect with any MySQL client (up to 8.4), write standard SQL, and get all of this for free:

Git Command                    | Dolt Equivalent
git init                       | dolt init
git add                        | call dolt_add('table_name')
git commit -m "msg"            | call dolt_commit('-m', 'msg')
git log                        | SELECT * FROM dolt_log
git diff                       | SELECT * FROM dolt_diff_employees
git checkout -b feature-branch | dolt checkout -b feature-branch
git merge                      | dolt merge feature-branch
git blame                      | dolt blame employees
git clone                      | dolt clone remote/db
git push/pull                  | dolt push/pull

The mapping is almost 1-to-1 because the underlying model is Git's content-addressable storage — adapted for tables instead of files. The team at DoltHub spent years building a new storage format (prolly trees) that gives you row-level diffs without scanning entire table dumps.
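
To make the content-addressing idea concrete, here is a minimal Python sketch. It is not Dolt's storage engine: it uses fixed-size chunks where prolly trees choose chunk boundaries from the content itself, and the names build_tree, changed_rows, and CHUNK_SIZE are mine, invented for illustration. What it shows is the principle: if unchanged chunks keep the same hash, a diff only needs to read the chunks whose hashes differ.

```python
import hashlib
import json

# Hypothetical fixed chunk size; real prolly trees derive boundaries from content.
CHUNK_SIZE = 4


def chunk_hash(rows):
    """Hash a list of [primary_key, row] pairs into a content address."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_tree(table):
    """Split a table (dict of primary key -> row dict) into sorted chunks.

    Returns (root_hash, chunks), where chunks maps chunk hash -> rows.
    """
    items = sorted(table.items())
    chunks = {}
    order = []
    for i in range(0, len(items), CHUNK_SIZE):
        rows = [[pk, row] for pk, row in items[i:i + CHUNK_SIZE]]
        h = chunk_hash(rows)
        chunks[h] = rows
        order.append(h)
    # The root is a hash of the child hashes, so any row change alters the root.
    root = hashlib.sha256("".join(order).encode("utf-8")).hexdigest()
    return root, chunks


def changed_rows(old, new):
    """Diff two table versions, looking only inside chunks whose hash differs."""
    _, old_chunks = build_tree(old)
    _, new_chunks = build_tree(new)
    old_rows = {pk: row for h, rows in old_chunks.items()
                if h not in new_chunks for pk, row in rows}
    new_rows = {pk: row for h, rows in new_chunks.items()
                if h not in old_chunks for pk, row in rows}
    diff = {}
    for pk in sorted(old_rows.keys() | new_rows.keys()):
        if old_rows.get(pk) != new_rows.get(pk):
            diff[pk] = (old_rows.get(pk), new_rows.get(pk))  # (old, new)
    return diff


if __name__ == "__main__":
    old = {i: {"name": "user%d" % i, "score": i} for i in range(1, 13)}
    new = dict(old)
    new[6] = {"name": "user6", "score": 99}
    print(changed_rows(old, new))
```

With twelve rows and one edited cell, two of the three chunks keep their hashes, so the diff reads one chunk from each version. Fixed-size chunks fall apart on inserts, because every later chunk shifts. Content-defined boundaries exist to avoid exactly that.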


The Feature Nobody Else Has: Cell-Level Lineage

Standard databases let you audit with triggers and audit tables. Dolt makes lineage a first-class query:

-- Full history of a single row across all commits
SELECT * FROM dolt_history_employees
WHERE id = 42
ORDER BY commit_date;

-- Who last changed each row in the employees table?
-- (CLI)
dolt blame employees

The dolt_history_* system tables exist automatically for every table you create. No triggers. No audit schema. No middleware. You do not think about it until you need it — and then it is just there.
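
If it helps to see the shape of that query without a database in front of you, the sketch below models it in plain Python. Everything here is an assumption for illustration (the Commit class, row_history, cell_changes), and Dolt computes history from its storage layer rather than from a full snapshot per commit. Only the column names commit_hash, committer, and commit_date are borrowed from the history tables.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class Commit:
    """Hypothetical commit record holding a full snapshot of one table."""
    commit_hash: str
    committer: str
    commit_date: str       # ISO 8601 string, so lexical order is time order
    table: Dict[int, Row]  # primary key -> row


def row_history(commits: List[Commit], pk: int) -> List[Row]:
    """Return the row's state at every commit where it exists, oldest first."""
    history = []
    for c in sorted(commits, key=lambda c: c.commit_date):
        row = c.table.get(pk)
        if row is not None:
            history.append({
                **row,
                "commit_hash": c.commit_hash,
                "committer": c.committer,
                "commit_date": c.commit_date,
            })
    return history


def cell_changes(commits: List[Commit], pk: int, column: str) -> List[Tuple]:
    """Return (commit_hash, committer, old, new) each time one cell changed."""
    changes = []
    previous = None
    seen = False
    for entry in row_history(commits, pk):
        value = entry.get(column)
        if seen and value != previous:
            changes.append((entry["commit_hash"], entry["committer"], previous, value))
        previous, seen = value, True
    return changes


if __name__ == "__main__":
    commits = [
        Commit("c1", "alice", "2024-01-01", {42: {"id": 42, "salary": 100}}),
        Commit("c2", "bob", "2024-02-01", {42: {"id": 42, "salary": 100}}),
        Commit("c3", "carol", "2024-03-01", {42: {"id": 42, "salary": 120}}),
    ]
    print(cell_changes(commits, 42, "salary"))
```

The second function is the "what was this value three deployments ago?" question in miniature: walk the row's history and report only the commits where one cell actually moved.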

This is the detail that converts skeptics. The first time you genuinely need to answer "what was this value three deployments ago?" and the answer is a SQL query instead of a support ticket, you start seeing version-control-shaped problems everywhere.


Real Teams, Real Problems

Dolt is not a toy. It has paying customers solving concrete problems:

Game configuration at scale. Scorewarrior (the studio behind Total Battle) stores all game config in Dolt — character stats, quest data, dialog trees for millions of active players. They use branches to stage content changes before release. Dolt's JSON diffing outperforms Postgres and MySQL on large documents.

ML reproducibility. Flock Safety (AI-assisted public safety) and Turbine (AI-driven cancer drug discovery) both use Dolt for the same reason: when a model produces a different result, they need to diff the exact training data between runs. Flock Safety's engineers put it plainly:

"We chose Dolt solely for reproducibility, and then over time discovered uses for version control we hadn't anticipated."

Network configuration. Nautobot integrates Dolt for branch-and-merge on network config. Every change goes through human review with full diff and rollback. When a single typo can take down infrastructure, immutable history is not a nice-to-have.

Compliance. Public companies subject to SOX can use Dolt out of the box as an audit trail. Every data change records who, what, and when — with no extra middleware.


The Angle I Did Not Expect: AI Agent Memory

The Dolt README has a line that stopped me cold:

"Dolt is the best database for agent memory, especially as you move up the ladder to multi-agent and multi-machine workflows."

Think about what multi-agent systems actually need from a persistence layer. Agents need shared state. They need to experiment without polluting each other. They need to merge learnings from parallel runs. They need a full audit trail of what each agent decided and when.

That is a version control problem dressed up as a database problem.

Beads is an agent memory system built directly on Dolt. Gas Town (Steve Yegge's agent framework) uses it as its persistence layer. The DoltHub blog in early 2026 has been heavy on agent content — "Agents Need Tests: DoltLite Edition," "Multi-Agent with Dolt Remotes," "Dueling Agents: Claude and Codex." The pattern is clear: as we move from single-agent prototypes to multi-agent production systems sharing state across machines, Dolt is becoming the persistence layer teams reach for.

An agent that can dolt branch its memory context before a risky operation, dolt merge learnings from parallel worker agents, and dolt reset to a known-good commit when something goes wrong is a qualitatively different kind of agent than one writing to a flat key-value store.
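
The merge step is the part a key-value store cannot give you, so it is worth sketching. Below is a toy three-way merge over agent memory held as a Python dict: base is the state at the branch point, ours and theirs are two workers' versions. The function name and the memory layout are hypothetical, and Dolt's real merge works on table rows and cells rather than dict keys. The rule is the same one Git uses: take whichever side changed, and flag a conflict when both sides changed the same thing differently.

```python
from typing import Any, Dict, List, Tuple

Memory = Dict[str, Any]

# Sentinel for "key absent", so a deletion can be told apart from a None value.
_MISSING = object()


def three_way_merge(base: Memory, ours: Memory, theirs: Memory) -> Tuple[Memory, List[str]]:
    """Merge two descendants of base. Returns (merged, conflicting_keys).

    On conflict this sketch keeps our value provisionally and reports the key.
    """
    merged: Memory = {}
    conflicts: List[str] = []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            winner = o          # both sides agree, including both deleting
        elif o == b:
            winner = t          # only theirs changed
        elif t == b:
            winner = o          # only ours changed
        else:
            conflicts.append(key)
            winner = o          # both changed differently: needs resolution
        if winner is not _MISSING:
            merged[key] = winner
    return merged, conflicts


if __name__ == "__main__":
    base = {"goal": "ship v1", "retries": 3, "notes": "none"}
    worker_a = {**base, "retries": 5, "tool": "grep"}
    worker_b = {**base, "notes": "cache is stale"}
    print(three_way_merge(base, worker_a, worker_b))
```

Two workers that touch different keys merge cleanly. Two workers that set the same key to different values produce a conflict, which is the point where a human or a supervising agent has to decide.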


The Ecosystem Around Dolt

Dolt has grown well beyond the core engine:

DoltHub — A GitHub for databases. Browse, fork, and clone public datasets. Push your own. Free for public hosting.

DoltLab — Self-hosted DoltHub. Run it inside your own infrastructure.

Hosted Dolt — Managed cloud deployment on AWS and Azure. Spin up a version-controlled database in minutes.

Doltgres — The same model, but Postgres-compatible. Currently in Beta — worth watching if you are Postgres-first.

DoltLite — A local-first sync engine using Git-style merges instead of CRDTs. The pitch: "redefines what is possible for local-first software." Early days, but the architecture is interesting.

164 contributors. Active daily commits. v2.0.1 shipped May 8, 2026. This is not an abandoned research project — it is a production system with real engineering investment behind it.


Try It in Five Minutes

The binary is 103MB. On macOS:

brew install dolt

On Linux:

sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | bash'

Configure it exactly like Git:

dolt config --global --add user.email "you@example.com"
dolt config --global --add user.name "Your Name"

Initialize a database and start a MySQL-compatible server:

mkdir mydb && cd mydb
dolt init
dolt sql-server

Connect with any MySQL client, create tables, insert data. Then:

# Stage and commit, just like Git
dolt add -A
dolt commit -m "initial data load"

Make some changes, commit again, and then:

-- See every row that changed between commits
SELECT * FROM dolt_diff_your_table;

-- See the full history of one row
SELECT * FROM dolt_history_your_table WHERE id = 1 ORDER BY commit_date;

The "oh" moment usually comes within the first ten minutes. The frustration that follows is realizing how long you have been working without this.


Why It Matters

The dominant approach to data versioning today is: export dumps, diff files, pray. Migration tools help with schema, but they say nothing about the data itself. Audit tables require forethought and discipline. S3 snapshots are blunt instruments. None of it gives you cell-level lineage.

Dolt's thesis is that the gap between how we version code and how we version data is not a minor inconvenience — it is a category error that costs teams real hours and real decisions. Version control changed software development not because it prevented disasters (though it did) but because it changed the economics of experimentation. You could try things, branch freely, and merge with confidence.

That same shift is available for data. Dolt has 22,600 GitHub stars and a paying customer list that spans game studios, ML infrastructure teams, and network engineers.

The only question is when your stack will start looking unversioned by comparison.


