The Question

What if the people who care about a dataset could collectively pay to keep it alive?

Not a foundation. Not a company. Not a single maintainer running a server on goodwill and donations. A smart contract that collects funds from anyone who cares, and pays storage providers to keep the data durable on a p2p network. No governance tokens. No weekly votes. Just: this data stays alive, and here’s the money to make it happen.

The Problem

Data collections die when their maintainers can’t afford hosting. Myrient maintains one of the largest retro game archives on the internet. The Internet Archive has faced existential legal and financial threats. Countless smaller archives, research datasets, cultural collections, and community-maintained repositories disappear when one person loses interest or runs out of money.

Centralized hosting is a single point of failure in two dimensions: infrastructure and funding. The server goes down, or the credit card expires. Either way, the data is gone.

Decentralized storage solves the infrastructure problem. Archivist distributes data across unreliable nodes and uses erasure coding and cryptographic proofs to keep it durable. But the funding problem remains. Someone still has to pay the storage providers. The question is who, and how.

Evidence

What Vitalik Said

Vitalik published a post arguing we need “different and better DAOs”: not the governance-maximalist structures that dominated 2021, but lightweight coordination tools for specific problems.

Two of his categories map directly to data preservation. First: DAOs to “get projects off the ground quickly,” a group willing to pool funds for a task too short in duration to justify forming a legal entity. Second: DAOs for “long-term project maintenance,” communities that keep infrastructure running after original teams move on.

He also frames the problem through his concave vs. convex distinction. Convex decisions benefit from decisive leadership. One strong call beats the average of many. Concave decisions benefit from robustness and averaging. The compromise is better than the gamble. Data preservation is deeply concave. You don’t want a brilliant leader making bold bets about which data to keep. You want steady, redundant, averaged-out funding that keeps everything alive.

There’s a third point worth pulling from the DAO discussion: decision fatigue. Most DAOs collapse under the weight of constant governance: proposals, votes, quorum requirements, delegate drama. A Data DAO sidesteps this entirely. The “decision” is pre-made: keep this dataset alive. No weekly votes needed. No governance tokens to farm. The contract accepts funds and pays storage providers. That’s it.


The DAO We Already Tried

I built a DAO before. Dad DAO started with Michael Trosen and Demetrick Ferguson as a paid learning community and incubator on Algorand. We built NFTs on ARC-69 and created ARC-333 for NFT-controlled governance. Never launched a token. Never launched the collection. The coordination overhead was real, and the automation we needed was tangled up with a deeper philosophical problem: we couldn’t actually run the thing.

Griff Green went through this at a much larger scale. His interview on The Bitcoin Podcast covers The DAO’s history: the original 2016 experiment that raised $150M and then collapsed. The lesson isn’t that DAOs don’t work. It’s that DAOs with broad mandates and complex governance don’t work. The more decisions a DAO has to make, the more surface area for failure.

A Data DAO makes almost no decisions. That’s the point.


Data DAO: The Simplest DAO

A Data DAO is a smart contract that does three things:

  1. Accepts donations from anyone
  2. Periodically pays Archivist storage providers to keep a specific dataset’s storage request funded
  3. Lets anyone see the dataset’s health and funding status
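The three operations above are simple enough to sketch in a few lines. This is a hypothetical model in plain Python, not real contract code; the names `donate`, `pay_providers`, and `status` are illustrative, and nothing here is Archivist’s actual API.

```python
class DataDAO:
    """Minimal sketch of a single-dataset Data DAO."""

    def __init__(self, dataset_id: str, monthly_cost: int):
        self.dataset_id = dataset_id
        self.monthly_cost = monthly_cost  # cost to keep the storage request funded
        self.treasury = 0                 # pooled donations

    def donate(self, amount: int) -> None:
        """1. Accept funds from anyone; no identity or token required."""
        if amount <= 0:
            raise ValueError("donation must be positive")
        self.treasury += amount

    def pay_providers(self) -> int:
        """2. Periodically forward one month of storage cost to providers."""
        payment = min(self.treasury, self.monthly_cost)
        self.treasury -= payment
        return payment  # on-chain, this would transfer to storage providers

    def status(self) -> dict:
        """3. Expose funding health so anyone can inspect it."""
        return {
            "dataset": self.dataset_id,
            "treasury": self.treasury,
            "months_of_runway": self.treasury / self.monthly_cost,
        }
```

That is the entire decision surface: money in, money out on a schedule, and a public read-only view.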

No governance tokens. No voting on direction. No delegation. No quorum. The “decision” was made at contract deployment: keep this data alive. Everything after that is maintenance: automated, transparent, and verifiable on-chain.

This is Vitalik’s concave case made concrete. You want robustness and averaging, not decisive leadership. A thousand people each contributing $1/month is more durable than one benefactor contributing $1,000/month. The benefactor can disappear. The crowd averages out.

Durability Labs built Archivist to make storage durable at the protocol level. A Data DAO makes it durable at the funding level. The two layers together close the loop: the data is redundantly stored across unreliable nodes, and the payment for that storage is redundantly funded across unreliable donors.

The Economics

Archivist targets approximately $10/TB/month for durable, erasure-coded, proof-verified storage. That’s the baseline cost to keep data alive across a decentralized network of storage providers.

A 100TB archive, roughly the scale of a comprehensive retro game collection, costs $1,000/month. Split across 1,000 donors, that’s $1/person/month. Split across 10,000, it’s $0.10. The per-person cost drops to noise long before the community runs out of people who care.
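The arithmetic is a one-liner, but it’s worth making explicit, assuming the $10/TB/month figure above:

```python
STORAGE_RATE = 10                          # dollars per TB per month (Archivist's target)
archive_tb = 100
monthly_cost = archive_tb * STORAGE_RATE   # $1,000/month for the whole archive

for donors in (1_000, 10_000):
    per_person = monthly_cost / donors
    print(f"{donors} donors -> ${per_person:.2f}/person/month")
```

Running it prints $1.00/person/month at 1,000 donors and $0.10 at 10,000.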

The smart contract holds a funding buffer, say 6 months of storage costs. When the buffer drops below a threshold, the contract’s status page shows the dataset is underfunded. Community members top it up. No emails, no fundraising campaigns, no single point of contact. Just a public number that anyone can see and anyone can increase.
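The buffer rule is a single public predicate. A sketch, with the 6-month threshold from the text and hypothetical function names:

```python
def months_of_runway(treasury: float, monthly_cost: float) -> float:
    """How long the current treasury lasts at the current storage cost."""
    return treasury / monthly_cost

def is_underfunded(treasury: float, monthly_cost: float,
                   threshold_months: float = 6.0) -> bool:
    # No vote, no fundraising campaign: just a number anyone can read
    # and anyone can increase by donating.
    return months_of_runway(treasury, monthly_cost) < threshold_months
```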


The Leaderboard

Funding is one layer. Participation is another.

Archivist Desktop lets anyone run a storage provider node on consumer hardware: a NUC, a laptop, whatever has spare disk. A Data DAO can add a community participation layer on top: install the app, contribute bandwidth and storage to a specific collection’s p2p network, bind your Discord handle, show up on a public leaderboard.

The leaderboard is not governance. It’s recognition. Top contributors get visibility, not voting power. The incentive is social proof and community standing, not token accumulation.
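Mechanically, a leaderboard is just an aggregation over contribution reports. This sketch invents a report shape (`discord_handle`, `bytes_served`) for illustration; it is not Archivist’s real telemetry schema.

```python
from collections import defaultdict

def build_leaderboard(reports: list[dict]) -> list[tuple[str, int]]:
    """Sum per-node contribution reports and rank by total bytes served."""
    totals: dict[str, int] = defaultdict(int)
    for r in reports:
        totals[r["discord_handle"]] += r["bytes_served"]
    # Recognition, not voting power: the output is just a sorted list.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```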

I’m building this for the Data DAO landing page, a place where contributors can see the collection’s health, install the app, and track their standing.


Threshold Funding

A Data DAO doesn’t need to be open-ended. Riff.cc introduced a two-stage funding model called threshold funding that maps cleanly onto data preservation.

The first stage is the production threshold. A campaign sets a funding goal: the amount needed to cover, say, one year of storage for a specific dataset. Supporters pledge funds. The smart contract holds contributions in escrow until the goal is met. If the deadline passes without reaching the threshold, everyone gets refunded automatically. No trust required. This is the Kickstarter model made trustless.

The second stage is the commons threshold. After a dataset has earned enough through access fees, ongoing donations, or sustained community funding, its access terms flip. The data becomes freely available. The smart contract records this commitment immutably. Riff.cc frames this as “all rights reserved, then some rights reserved.” A dataset might start with contributor-only access, then at $250k in cumulative funding, the access controls drop and it enters the public commons.

Data preservation is the perfect use case for both thresholds. The production threshold answers a concrete question: is there enough demand to justify storing this? If a retro game archive can’t find 500 people willing to pledge $2/month, maybe the community isn’t there yet. No money wasted, no half-funded storage deals. The commons threshold answers the follow-up: when has the community funded enough to make it public? Both checks are simple numeric comparisons. No governance votes, no delegate negotiations, no quorum drama. The contract counts money and flips a boolean.
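Both thresholds really are just numeric comparisons. A sketch of the two checks, using the $250k commons figure from the text as an example; all class and method names are illustrative, not Riff.cc’s actual design:

```python
from dataclasses import dataclass, field

@dataclass
class ThresholdCampaign:
    production_goal: float          # e.g. one year of storage cost
    commons_goal: float             # cumulative funding that flips access open
    deadline_passed: bool = False
    pledges: dict = field(default_factory=dict)

    def pledge(self, donor: str, amount: float) -> None:
        self.pledges[donor] = self.pledges.get(donor, 0.0) + amount

    @property
    def total(self) -> float:
        return sum(self.pledges.values())

    def production_met(self) -> bool:
        """Stage one: is there enough demand to justify storing this?"""
        return self.total >= self.production_goal

    def refunds(self) -> dict:
        """If the deadline passes under the goal, everyone is made whole."""
        if self.deadline_passed and not self.production_met():
            owed, self.pledges = self.pledges, {}
            return owed
        return {}

    def in_commons(self) -> bool:
        """Stage two: count money, flip a boolean. No vote required."""
        return self.total >= self.commons_goal
```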


The Architecture

A single Data DAO contract works for one dataset. A platform that lets anyone spin up a Data DAO for any dataset needs something more general.

The pattern is a factory. One master contract, the Factory, deploys cheap copies of a Data DAO template. Each copy is a mini DAO: isolated funds, isolated roles, but shared and audited logic. The EIP-1167 minimal proxy standard makes each clone cost a fraction of a full deployment. A frontend can spin up a new Data DAO in one transaction.

Each mini DAO instance is a thin kernel. It holds a role registry (who can withdraw, who can add modules), references to its active campaign modules, and a treasury. No complex governance by default. A multisig or even a single admin works fine for most data preservation campaigns. The kernel coordinates. It doesn’t decide.

Campaign modules plug into the kernel. Threshold funding is one module. Others could include revenue splits for datasets with multiple contributors, access control with encrypted data and key release on funding, or scheduled payouts to storage providers on a fixed cadence. Modules attach to the kernel without changing its core logic. Swap one out, add another, the kernel doesn’t care.

The factory pattern makes this recursive. A Music DAO could spawn Album DAOs, each running its own threshold campaign. A Research DAO could spawn Dataset DAOs for individual studies. Each level inherits the same audited template from the factory. Nesting costs almost nothing because each clone is a minimal proxy pointing at shared bytecode.
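The factory/kernel/module structure can be sketched in plain Python. On-chain, the factory would deploy EIP-1167 minimal proxies pointing at shared bytecode; here, clones are simply cheap instances of one shared class. All names are illustrative, not a real contract interface.

```python
class Kernel:
    """Thin mini-DAO kernel: roles, modules, treasury.
    It coordinates; it doesn't decide."""

    def __init__(self, dataset_id: str, admin: str):
        self.dataset_id = dataset_id
        self.roles = {"admin": admin}   # a multisig or single admin is enough
        self.modules: list = []          # threshold funding, revenue splits, ...
        self.treasury = 0

    def attach(self, module) -> None:
        # Modules plug in without changing the kernel's core logic.
        self.modules.append(module)

class Factory:
    """One master contract deploying cheap copies of the shared template."""

    def __init__(self):
        self.clones: list[Kernel] = []

    def spawn(self, dataset_id: str, admin: str) -> Kernel:
        clone = Kernel(dataset_id, admin)  # isolated funds, shared audited logic
        self.clones.append(clone)
        return clone

# Nesting is just another factory call: a Music DAO spawning Album DAOs
# calls spawn() once per album, each clone inheriting the same template.
```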



TODO: expand with…

  • Integration details with Archivist’s storage request API
  • Real leaderboard data once the backend is live
  • Comparison to existing data preservation DAOs (if any emerge)
  • Legal analysis: does a Data DAO need entity wrapping?