dotrepo is a metadata protocol for software repositories. One of its three intended audiences is AI coding agents — models that get dropped into unfamiliar repos and have to figure out what the project is, how to build it, who owns it, and what constraints apply.
Since AI agents are supposed to be first-class consumers of the protocol, it seemed obvious to do something that most protocol designers never do: ask the intended users what they actually want.
So I interviewed 12 AI models across 8 providers. Same prompt, fresh conversations, no priming. The results were sharper, more convergent, and more tactically useful than I expected.
The setup
I wrote a 10-question interview prompt and sent it to every major model I could access. The questions escalated from gut reaction through specific schema design, trust model assessment, adoption economics, and failure mode analysis. The last question was “If you could say one thing directly to the person building this — advice, encouragement, warning, request — what would it be?”
The models interviewed: ChatGPT 5.4 Thinking (two sessions), Claude Opus 4.6 Extended (two sessions), Gemini Pro 3.1, Gemini Thinking 3.1, Grok Expert 4.20 (two sessions), GLM-5, Hunter Alpha, MiniMax M2.5, and Nemotron 3 Super. Where possible I ran both logged-in and incognito sessions to check for consistency. Several models grounded their responses in the actual repo via web search before answering.
I did not tell any model that I thought the project was good. I told them critical feedback was more valuable than politeness. And I sent the same prompt verbatim to each one so the responses would be directly comparable.
What I expected
I expected polite encouragement with vague suggestions. I expected the models to pattern-match “new open standard” and produce boilerplate about adoption curves and schema design. I expected maybe two or three genuinely useful observations buried in a lot of diplomatic filler.
What actually happened
The 12 models independently converged on the same priorities with a specificity that surprised me.
Every single one — all 12 — identified “figuring out how to build and test this project” as the single most expensive reasoning task when encountering an unfamiliar repository. Not “what language is this” or “what does this project do,” which are usually easy to infer from file trees and READMEs. The hard part is the gap between seeing a build file and knowing the actual correct command, including flags, prerequisites, environment variables, and side effects. Multiple models described burning 3–5 reasoning turns just to reverse-engineer the build incantation from CI YAML.
Every model identified the overlay index — which provides metadata for repos that haven’t adopted dotrepo — as the smartest design decision in the project, because it sidesteps the adoption chicken-and-egg problem that kills most metadata standards.
Every model said the trust provenance system — distinguishing “maintainer-declared” from “imported from source files” from “community-inferred” — was the most genuinely novel aspect, and described how they would modulate their behavior based on trust level: acting confidently on declared facts, hedging on imported ones, and refusing to auto-execute inferred build commands without user confirmation.
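The trust-to-behavior mapping the models described can be sketched as a small policy table. The enum values and policy labels below are my own illustration of that mapping, not the protocol's actual identifiers:

```python
# Hypothetical sketch: how an agent might gate its actions on dotrepo
# trust provenance. The provenance labels and the policy strings are
# assumptions for illustration, not the protocol's real enum names.
from enum import Enum

class Provenance(Enum):
    MAINTAINER_DECLARED = "maintainer-declared"
    SOURCE_IMPORTED = "source-imported"
    COMMUNITY_INFERRED = "community-inferred"

def execution_policy(provenance: Provenance) -> str:
    """Map a fact's provenance to how an agent should act on it."""
    if provenance is Provenance.MAINTAINER_DECLARED:
        return "act"      # act confidently on declared facts
    if provenance is Provenance.SOURCE_IMPORTED:
        return "verify"   # hedge: cross-check against the source files
    return "confirm"      # never auto-execute without user confirmation

assert execution_policy(Provenance.COMMUNITY_INFERRED) == "confirm"
```

The notable part is that the models converged on this policy unprompted: the escalation from act, to verify, to confirm fell out of the trust labels themselves.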
And every model flagged stale metadata as the most dangerous failure mode, with the same argument: a stale .repo file is worse than no .repo file, because it suppresses the healthy skepticism that would drive an AI to verify against actual source files.
The useful parts
Beyond the consensus items, the interviews produced concrete schema and tooling wishlists that I can actually build against.
The most requested schema change was unanimous: replace plain-string build and test commands with structured objects that include prerequisites, environment requirements, platform constraints, and a “safe for agent execution?” flag. Twelve out of twelve models asked for this independently. That is about as strong a signal as user research produces.
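A minimal sketch of what that structured object might look like, with field names that are purely illustrative (the eventual dotrepo schema may differ):

```python
# Hypothetical shape for a structured build command, per the models'
# unanimous request. Every field name here is an assumption.
build_command = {
    "run": "cargo build --release",
    "prerequisites": ["rustup toolchain install stable"],
    "env": {"CARGO_TERM_COLOR": "never"},
    "platforms": ["linux", "macos"],        # known-good platforms
    "side_effects": ["writes to target/"],
    "agent_safe": True,                     # safe for agent execution?
}

def is_agent_runnable(cmd: dict, platform: str) -> bool:
    """An agent would only auto-run a command flagged safe and
    supported on its current platform."""
    return bool(cmd.get("agent_safe")) and platform in cmd.get("platforms", [])

assert is_agent_runnable(build_command, "linux")
assert not is_agent_runnable(build_command, "windows")
```

The point of the structure is exactly the gap described above: a plain string tells an agent what a human would type, while the object tells it whether typing it is safe.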
Ten of twelve models asked for monorepo and workspace semantics. Seven asked for path semantics — explicit “generated / vendored / do-not-touch” annotations. Seven asked for structured contribution workflow metadata (CLA required? DCO? PR process?). Five pushed for field-level trust provenance instead of record-level.
On the MCP server side, six models independently asked for a remote lookup operation — the ability to take a repository URL and get back structured metadata without requiring a local clone. Several described this as the single most impactful missing capability, the feature that would turn dotrepo from a local developer tool into agent infrastructure.
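The requested operation is simple enough to sketch. The function name, the URL pattern, and the placeholder domain below are all assumptions; dotrepo's real endpoints may look nothing like this:

```python
# Sketch of the requested remote-lookup operation: repository URL in,
# metadata URL out, no local clone required. The base URL and query
# shape are hypothetical placeholders.
from urllib.parse import quote

def lookup_url(repo_url: str, base: str = "https://dotrepo.example/api") -> str:
    """Build a metadata lookup URL from a repository URL."""
    return f"{base}/lookup?repo={quote(repo_url, safe='')}"

url = lookup_url("https://github.com/rust-lang/cargo")
assert url.startswith("https://dotrepo.example/api/lookup?repo=")
```

Whatever the real shape, the key property the models asked for is the signature: one repository URL in, structured trust-annotated metadata out.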
The adoption thresholds were also surprisingly specific. Models estimated the flip from “nice to check” to “I always check this first” would happen at roughly 2,000–5,000 high-quality overlay records covering the most commonly encountered open source projects. Not millions. Not even tens of thousands. Just the head of the long tail — the projects that come up constantly in real coding work.
What the models disagreed on
Not everything was unanimous.
MiniMax was the most conservative about timeline, estimating a 5–10 year adoption curve even if everything goes well. Most other models were more optimistic, proposing 30-day and 90-day action plans they believed could create meaningful traction.
Gemini Pro proposed an ai_hints schema field where maintainers could leave notes explicitly for AI (“do not touch the legacy parser in /old”). Most other models did not ask for this, preferring structured metadata over freeform directives.
Nemotron pushed hard for environment-aware metadata — build profiles, feature flags, platform-conditional commands — while most others wanted the schema to stay static and simple.
The trust provenance granularity question split the field five to seven. Five models pushed for field-level provenance, with each fact tracked individually; seven were comfortable with record-level provenance, at least for now. The five made the stronger argument, since a record can carry a declared identity alongside inferred build commands, but the seven had a pragmatic counterpoint about implementation overhead.
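The disagreement is easier to see in data. Using hypothetical field names, here is the case the field-level camp made, that one record can legitimately mix trust levels:

```python
# Record-level vs field-level provenance, with illustrative field names.
# Record-level: one label covers every fact in the record.
record_level = {
    "provenance": "community-inferred",
    "name": "example-project",
    "build": "make all",
}

# Field-level: each fact carries its own label, so a declared name can
# coexist with an inferred build command.
field_level = {
    "name":  {"value": "example-project", "provenance": "maintainer-declared"},
    "build": {"value": "make all",        "provenance": "community-inferred"},
}

# Record-level provenance drags every field down to the trust of the
# least-trusted one; field-level keeps them distinct.
labels = {f["provenance"] for f in field_level.values()}
assert len(labels) == 2
```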
What it means for dotrepo
The interviews crystallized three priorities with overwhelming clarity.
First, seed the index. The protocol and toolchain are ready. What is missing is data. Five overlay records is a proof of concept, not a service. The consensus target is 500+ high-quality records covering the most-depended-on projects across multiple languages.
Second, build the remote lookup. The public HTTP surface at dotrepo.org already serves trust-annotated JSON at predictable URLs. But the MCP server — which is how AI coding tools would actually consume it — does not expose this yet. That gap needs to close.
Third, protect the small core. Every model warned against schema bloat. The winning version answers 5–10 essential questions with high reliability, not 50 questions with mixed reliability. Build commands, docs locations, ownership, security contacts, and project status. That is the core. Everything else faces a high bar.
What it means beyond dotrepo
The more interesting takeaway might be methodological.
These 12 models are not sentient stakeholders with preferences and politics. But they are genuine consumers of the thing being designed, and they turned out to have specific, testable opinions about what would make them more effective. The convergence across providers, between models built on different architectures and training pipelines by companies with no incentive to agree, suggests the signal is real rather than pattern-matched agreeableness.
I think this kind of interview is underused. If you are building something that AI agents will consume — an API, a protocol, a file format, a developer tool — you can ask the agents what they want, and the answers are often more operationally specific than what you get from human user interviews. Not because the models are smarter, but because they have less ego about admitting where they waste effort.
The full synthesis, all 12 raw responses, and a distilled roadmap document are available alongside the project at github.com/maxwellsantoro/dotrepo. If you are building an AI coding tool and want to discuss integration, or if you maintain a popular open source project and want to see what the models said about repo metadata, I would be glad to hear from you.