Research

Introducing .genome, the genome file designed for AI

April 2026

Today we’re releasing two things together.

.genome/1.0 — the first open specification for a consumer genome file designed to be read by an AI.
readmygenome.md — a Claude skill that lets any Claude instance read a .genome bundle correctly, with provenance, out of the box.

Both are open-source, Apache-licensed, and at github.com/genome-computer/genome-spec. One is the format. The other is how an AI actually uses it.

A genome file designed for AI is not a small improvement on what exists. It is a different category. Here is why.

Why a new file was necessary

The file your genome currently lives in — VCF, Variant Call Format — was standardised in 2011 for sequencing pipelines. It was designed for tools written by specialists, operated by specialists, interpreted by specialists. The file did not need to explain itself. The specialist explained it.

Fifteen years later, the specialist is increasingly an AI acting on your behalf. And the AI does not have the specialist’s toolkit. It has the file and whatever it can reconstruct from it. Every place where VCF defers meaning to external context — which is most places — is a place where the AI guesses.

A pipe-delimited string where meaning depends on a header the model no longer has.
A ClinVar tag where pathogenicity and conflict are encoded in punctuation.
An effect size without the allele it refers to — so risk flips depending on interpretation.
A pharmacogenomic phenotype that requires haplotype logic the model will invent rather than execute.
An “actionable” flag that depends on a guideline list the model is approximating from memory.

None of these are bugs. They are features of a format designed for a different reader. The fix is not a better model. It is a file the model can actually read.

What makes .genome different

A .genome bundle separates three things that VCF entangles: the variant data, the interpretation of what each variant means, and the rules that decide what counts as important. All three are explicit, typed, versioned, and queryable. This is how databases, compilers, and financial systems are built. Consumer genomics had not yet made the move. .genome makes it.

A formal correctness guarantee.For any question expressible as a query over the bundle’s fields — actionable pathogenic variants, CYP2D6 metabolizer status, variants where the effect allele is present and OR > 1.2, rare missense variants where multiple predictors agree — format-induced error is zero. Not approximately zero. Zero. The same way a type checker rules out type errors by construction.

Deterministic answers.“Is this variant actionable?” is decided by a rule written in the manifest, computed once at pipeline time, against a hash-pinned copy of the relevant guideline list. No model imputes the rule. No answer depends on what the model remembered.

That does put a set of opinionated decisions into the file. A rarity threshold. An actionability definition. A consensus-damaging rule. We take this trade deliberately: implicit disagreement at runtime is strictly worse than visible disagreement at design time. Every rule is declared in the manifest. Every rule is versioned. If we have made the wrong choice, the correct response is a public issue proposing a revision — not silent disagreement at every point of use.

A queryable state space.

The schema converts genomic reasoning from reconstructing meaning to executing queries.

This is the real unlock for agents — not just fewer hallucinations, but a change in what kind of computation genomics reasoning is.

Speed and size. Gene-scoped queries return in milliseconds instead of seconds. Bundles are a fraction the size of an annotated VCF. Any tool that reads Parquet reads .genome — Python, JavaScript, SQL, a browser.

A dated snapshot, not a frozen truth. Clinical knowledge changes. ClinVar updates weekly. ACMG reclassifies genes. A .genome bundle is explicit about this: it is a snapshot of what was known when it was generated, with the source versions and rule definitions recorded in its manifest. Re-interpretation is a first-class operation — the same raw variant data regenerated against an updated rule set produces a new bundle. You choose between snapshot-consistent answers (auditable, reproducible) and latest-consistent answers (freshest). Both are supported.

What this enables

If you want to ask an AI about your genome: your AI can finally answer without silently getting the file wrong. Ask harder questions. Trust the answers more.

If you are building an AI on top of genomic data: this removes an entire class of failure from your system. You stop writing defensive code around VCF quirks. You stop explaining to users why the answer changed between conversations. Correctness becomes a property of the data layer, not of every prompt.

If you are a provider shipping genomic data: adding a .genomebundle alongside your existing VCF takes one conversion step and no pipeline rewrite. Your customers’ AI tools work better on your data. That is a product advantage, not a science project.

What we’re releasing

Everything is at github.com/genome-computer/genome-spec, Apache-licensed.

The file format. The .genome/1.0 schema and specification, a reference reader and validator, a synthetic example bundle anyone can inspect, and a benchmark suite.

The converter. genome-convert — a single-command tool that turns any annotated VCF into a compliant .genome bundle in under five minutes. No pipeline rewrite, no custom infrastructure.

The Claude skill. readmygenome.md — a drop-in skill that teaches any Claude instance how to read a .genome bundle correctly, with provenance and without improvising. Install it once, upload a bundle, ask anything.

Your original VCF is not replaced. It ships alongside as the sovereignty layer. The bundle is additive.

How to actually use this

Three steps:

Convert your annotated VCF to a .genome bundle with genome-convert.
Install readmygenome.md as a Claude skill.
Upload your bundle to a Claude conversation. Ask questions.

Claude validates the bundle, reads the manifest, runs structured queries against the data, and answers with provenance. Every interpretive claim cites the rule used and the snapshot date. Out-of-scope questions are refused, not improvised.

The skill is ~300 lines of open-source code. Anyone can adapt it for a different agent, a different model, or a different user interface. We are shipping it because a file format is only as useful as the tools that read it correctly, and we want the first agent that reads .genome bundles well to exist on day one.

We’ve also written a longer research post that walks through the full invariant-violation framework, the formal correctness claim, and the trade-offs in detail. It’s linked from the repository.

The consumer genomics file format is about to matter more than it has in fifteen years, because for the first time the primary reader is not a specialist. It’s a machine.

github.com/genome-computer/genome-spec · research@genome.computer