Digital Archive Transparency
Now: tamper-evident logs built for the long term.
Tomorrow: provable indexes of the recorded past.

Tamper-evident logs for long-term digital preservation.
Creating tomorrow's trusted historical records.

Last Chance to Act

Everything digital is a copy of a copy.

Tomorrow's history is built with today's online artifacts, often stored on ephemeral storage media, without guarantees against backdated metadata or unobserved modifications.

Today's capabilities to generate large amounts of convincing fakes are improving fast, making it impossible to distinguish historical documents from forged artifacts.

Bad actors will try to leverage this for their own profit, putting pressure on current digital preservation efforts and sowing irreparable doubt about the genuineness of the recorded past.

Now is our last chance to build robust foundations for tomorrow's trust.

Why This Matters

Current digital archives are very successful at recording our present at scale. These document repositories are part of the backbone of the productive economy as anchors of trust online.

Generative technologies will soon be employed to wage economic war. Perfect operational security is extremely difficult and digital archives cannot afford to become a target.

Modern public key infrastructures and software supply chains have already solved similar problems, ensuring that billions of artifacts are tamper-evident every day.

We aim to integrate these technologies into current archival practices. We cannot store every document in existence, but we can distribute tamper-evident indexes at scale.

We build software to support existing digital archives.

Our Solution

We are building a proof of past provenance.

We design for the century scale and we only use plain text for storage.

We use simple cryptographic primitives so that we can credibly claim that future users will be able to check that our index is genuine. We aim for future threat resilience and cryptographic agility.

We believe in a trust model with public entities signing all online digital artifacts. Tamper-evident transparency logs have proven at scale that such a trust ecosystem can be built.

We build for local-first offline use as this is the best way to get these metadata indexes in lots of different places. We optimize for file storage instead of databases.

We aim to empower users today using modern AI and vector embeddings, as we believe that software needs to be immediately useful to everyone to last for decades to come.

There is a long way to go. We are moving forward right now.

Project Roadmap

Research Prototype
Core architecture & software
(we are here)
First Public Records
Indexing billions of documents
AI-enabled tooling
Web Archiving Integration
Interoperability with existing tools
New open standards
Large Scale Deployments
Adoption by existing actors
(trust will be here)

Target Features

📜

Provable History

Resilient digital preservation with immutable indexes backed by trusted timestamping and simple cryptography.

🔐

Tamper-Evident

Leverage transparency logs to deter bad behavior and build trust through distributed governance.

💣

Threat Resilience

Anticipate tomorrow's threats with quantum-ready digital signatures and forward-compatible features.

🌍

Long-Term Trust

Design for decades of reliability. Use open standards and libre software to ensure future access.

💾

Local-First Tools

Built to keep data offline right there on your hard drive, using current web archiving technology.

🛠️

Everyday Useful

Comprehensive embeddings for efficient discovery with AI-enabled tooling and open knowledge distribution.

Help Build the Future

We are building software for digital archive transparency. Our mission is to create tamper-evident systems that ensure the integrity of humanity's recorded history. Join our efforts now.

This is critical infrastructure required to build lasting trust in online document repositories. We are looking for partners ready to act for verifiable digital preservation.

Digital Archive Transparency is essential for creating the backbone of tomorrow's online trust.

Your support will directly fund development, research, and deployment of this foundational technology.

Current Funding
35%
Backed by private individuals
Current Target
100%
Full research prototype
Strategic Partnerships
Open
Co-development opportunities

Learn more

Is this about trust online?

Yes, but not only.

We are used to relying on reputable sources online. However, for many legitimate reasons, these may delete, modify or move the various documents they are vouching for. Long term, we end up relying on third-party repositories of unknown trust, redistributing the same information.

We trust documents we find online simply because they look authentic or because we found them on a website that seems legitimate. We are unable to distinguish genuine documents stored by an honest third party from convincing fakes with backdated metadata.

This is a weakness that, sooner or later, will be exploited.

Why such a sense of urgency?

Because we are all late to the party.

The need for authentication in the web archival landscape has long been known. The folks at LOCKSS drafted their threat model two decades ago, Perma.cc has been doing authenticated web capture for a decade and Webrecorder integrated trusted timestamping several years ago.

However, most data online today is still not obviously authenticated. Digital archiving at scale is hard, budgets are small and issues like copyright are limits that stop content-oriented approaches. Today bad actors have all the capabilities to leverage these weaknesses.

We aim for a metadata-oriented approach that integrates on top of existing document repositories. We want anyone to be able to build a tamper-evident index of their digital artifacts today, cheaply, and in a way fit for long-term storage and large-scale distribution.

Why do we need digital archives?

This is about safeguarding our immediate future.

We are all used to going online to retrieve information important to our jobs. The availability of online sources is key to our productive economy, and digital archives are used every day as a backup when these sources temporarily fail, change or disappear for many different reasons.

Pervasive incentives to create chaos online and disrupt economic actors already exist today. Generative technologies as well as tense international relations are expected to put increased pressure on digital archives, which are a critical part of the trust infrastructure.

It may be that, tomorrow, the integrity of our recorded past and the trust put in the genuineness of historical documents will become the unintended casualties of future ways to wage conflict. And the only fix is to act now, before any permanent damage is done.

What kind of technologies?

We believe that the ideas underlying Certificate Transparency, Sigsum and Sigstore have a role to play in building tamper-evident historical records. Every year they track billions of records in transparency logs, a scale comparable to existing digital archives.

Web archiving today is built with WARC files together with CDX index files matching metadata with SHA-1 or MD5 checksums. There exist many more standards for digital archives, each pairing artifacts with their metadata. We do not aim to propose a new competing standard.

We propose to append supplemental, text-based index files capturing just enough metadata together with cryptographic hashes, signatures and trusted timestamps. We want to empower anyone to build and share these indexes, which integrate transparency logs into their structure.

We are deliberately trying to keep things simple first, then add value later.
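As an illustration only, a supplemental index entry could be a single line of text pairing a little metadata with a cryptographic hash. The sketch below assumes a JSON-lines layout and field names (`uri`, `size`, `sha256`) that are hypothetical and not the project's actual format; signatures and trusted timestamps are deliberately left out.

```python
import hashlib
import json


def index_entry(uri: str, content: bytes) -> str:
    """Return one plain-text index line for an artifact (hypothetical layout)."""
    record = {
        "uri": uri,                                    # where the artifact came from
        "size": len(content),                          # size in bytes
        "sha256": hashlib.sha256(content).hexdigest(), # hash of the artifact bytes
    }
    # Sorted keys and compact separators keep the line deterministic.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def matches(line: str, content: bytes) -> bool:
    """Check that an artifact still corresponds to its index line."""
    record = json.loads(line)
    return (
        record["size"] == len(content)
        and record["sha256"] == hashlib.sha256(content).hexdigest()
    )


line = index_entry("https://example.org/report.pdf", b"example artifact bytes")
print(line)
```

Because each entry is one self-contained line of text, such an index could be appended to, split into slices, and read with ordinary tools without a database.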

What kind of ecosystem?

We are not a root of trust.

We believe that distributed governance is key. We use transparency logs to distribute trust between national institutions, non-profits, and private individuals.

We design to integrate with the existing transparency ecosystem. We believe that logs only need to monitor that new entries added are never backdated. We aim to provide recovery paths from partial log corruption, as well as several backup plans against catastrophic log failure.

This will require transparency-enforcing tools and clients. We want to enable anyone to write their own, to create software diversity. This will require open standards and we aim to assist the wider archiving community in making fast progress on these issues.

We want to enable both the small scale and the large scale by enabling clients to interact offline with slices of the index. We want to enable anyone to operate their own little archive, while providing proof of inclusion of the artifacts they care about in tamper-evident logs.
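The proof of inclusion mentioned above can be illustrated with a small Merkle tree. This is a minimal sketch, assuming SHA-256 and the leaf and node prefix bytes used by Certificate Transparency (RFC 6962); the function names and the side-tagged proof format are simplifications made up for this example, not the project's design.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly smaller than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("this sketch does not handle an empty tree")
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = split_point(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(leaves: list[bytes], index: int) -> list[tuple[str, bytes]]:
    """Sibling hashes from the leaf up to the root, tagged with their side."""
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if index < k:
        return inclusion_proof(leaves[:k], index) + [("R", merkle_root(leaves[k:]))]
    return inclusion_proof(leaves[k:], index - k) + [("L", merkle_root(leaves[:k]))]


def verify_inclusion(entry: bytes, proof: list[tuple[str, bytes]], root: bytes) -> bool:
    h = leaf_hash(entry)
    for side, sibling in proof:
        h = node_hash(h, sibling) if side == "R" else node_hash(sibling, h)
    return h == root


entries = [b"doc-1", b"doc-2", b"doc-3", b"doc-4", b"doc-5"]
root = merkle_root(entries)
proof = inclusion_proof(entries, 2)  # proof for b"doc-3"
print(verify_inclusion(b"doc-3", proof, root))
```

A client holding only the root hash and this short proof can check one artifact offline, which is what makes working with slices of a much larger index plausible.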

What kind of AI-enabled tooling?

The output of digital archive transparency can be described as a large index of hashes and metadata of documents that people want to keep around, with a proof of past provenance.

This can be leveraged by third parties to build content-addressable storage as well as, at the cost of generating and storing document embeddings, efficient retrieval with full-text search.

We believe that this can open a path to searching through digital archives at scale without interacting with the archive itself. This can enable lower costs for digital archives, great value for users, and opportunities for third parties to build online services.
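To make the content-addressable idea concrete, here is a minimal sketch in which a document is stored under the hex form of its SHA-256 hash and checked again on retrieval. The directory layout and the `store` and `fetch` helpers are assumptions for illustration, not part of the project.

```python
import hashlib
import tempfile
from pathlib import Path


def store(root: Path, content: bytes) -> str:
    """Write content under a path derived from its hash and return that hash."""
    digest = hashlib.sha256(content).hexdigest()
    path = root / digest[:2] / digest  # two-character fan-out directory
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return digest


def fetch(root: Path, digest: str) -> bytes:
    """Read content back and refuse it if it no longer matches its address."""
    content = (root / digest[:2] / digest).read_bytes()
    if hashlib.sha256(content).hexdigest() != digest:
        raise ValueError("stored content does not match its address")
    return content


with tempfile.TemporaryDirectory() as tmp:
    key = store(Path(tmp), b"hello archive")
    restored = fetch(Path(tmp), key)
    print(key, restored)
```

With this layout the hash recorded in an index is also the lookup key, so any party holding the index can fetch a document from any store and check it without trusting that store.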

What about the century scale?

The problems surrounding digital archives can be described as century-scale.

We do not aim to solve everything revolving around digital archives, but we do aim to bring innovations like tamper-evident transparency logs into the long term.

Scaling file formats and protocols to several decades is not a trivial task as we cannot predict the future. This is in some fundamental way a guess at what will last, or not.

We still believe that some meaningful choices are possible to make, from using text files as our preferred way of storing metadata, to designing for cryptographic agility, carefully planning forward-compatibility, etc.
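One common way to approach cryptographic agility in a text format is to label every digest with the algorithm that produced it, so that new algorithms can be added without invalidating old entries. The sketch below assumes an `algorithm:hexdigest` notation and a small allow-list; both are hypothetical choices for illustration.

```python
import hashlib

# Algorithms this hypothetical reader accepts; the list can grow over time.
SUPPORTED = {"sha256", "sha3_256", "blake2b"}


def tagged_digest(algorithm: str, content: bytes) -> str:
    """Return a self-describing digest such as 'sha256:ab12...'."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return f"{algorithm}:{hashlib.new(algorithm, content).hexdigest()}"


def verify(tagged: str, content: bytes) -> bool:
    """Recompute the digest with the algorithm named in the tag."""
    algorithm, _, _ = tagged.partition(":")
    if algorithm not in SUPPORTED:
        return False  # unknown or retired algorithm: do not trust the entry
    return tagged_digest(algorithm, content) == tagged


old = tagged_digest("sha256", b"artifact")
new = tagged_digest("sha3_256", b"artifact")
print(verify(old, b"artifact"), verify(new, b"artifact"))
```

Entries written with an older algorithm remain checkable next to newer ones, and retiring an algorithm only means removing it from the allow-list.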

Is this blockchain or not?

Call it a ledger if you want!

Blockchain technologies spawned many innovations making full use of modern cryptography. However, we are aiming for a more minimalistic approach. Yes, we are using append-only immutable trees of hashes. No, we are not building a blockchain.

We believe that good cryptography is not our limiting factor.

We are making opinionated choices specific to long-term digital preservation and believe that transparency logs are a better fit for this. We also believe that blockchain-based OpenTimestamps services will have some role to play.

Is this a work in progress?

We are still in the early stages of the project.

We are currently working on a research prototype to experiment with the design space available to us. The short-term goal is to build a small-scale deployment of a million documents.

We invite you to contact us to help us scale to billions of digital records!

Get Involved

Interested in supporting resilient digital preservation?

Whether you are a future partner, public institution, or private individual, we would love to hear from you.

Looking for sponsors & funding