Fluent isn't true.
AI systems generate fluent output. Fluent isn't true. This page names what must stay visible, what counts as evidence, and why a system that sounds confident isn't the same as a system you can trust.
Legibility.
A system is legible when you can see what it's doing and why — in language a human can actually read, not in a 4,000-line log file that requires a developer to interpret.
That sounds obvious. It almost never holds in production. Most AI systems are observable but not legible: the data is there, the traces are there, the dashboards are there. But ask "what did the model actually use to answer this?" and the answer is a transcript nobody is going to read.
Legibility is the difference between "we have the data" and "we can answer the question."
Admissibility.
Admissibility is what counts as evidence for a claim about the system.
If a model says "I retrieved this from your knowledge base," that's a claim. The admissibility question is: what counts as proof? A confident sentence? A summary in the response? A citation that may or may not exist? A trace of the underlying retrieval call?
Most AI systems collapse these. The response includes a citation, and the citation is treated as evidence. But the citation was generated by the same model that's claiming to have used it. That's not admissible. That's just fluent.
The Verse view: claims about system state need evidence that isn't generated by the system making the claim.
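One way to picture this, as a minimal sketch rather than a description of any real implementation: the retrieval layer writes its own log, and a citation only counts if that log shows the document was actually fetched for the request. The names `RetrievalEvent`, `admissible_citations`, and the sample document ids are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalEvent:
    """One log entry written by the retrieval layer, not by the model (hypothetical shape)."""
    request_id: str
    document_id: str


def admissible_citations(request_id, cited_ids, retrieval_log):
    """Split cited document ids into those the log backs up and those it does not."""
    retrieved = {
        event.document_id
        for event in retrieval_log
        if event.request_id == request_id
    }
    backed = [doc_id for doc_id in cited_ids if doc_id in retrieved]
    unbacked = [doc_id for doc_id in cited_ids if doc_id not in retrieved]
    return backed, unbacked


# Illustrative data only.
log = [
    RetrievalEvent("req-1", "kb/pricing.md"),
    RetrievalEvent("req-1", "kb/refunds.md"),
    RetrievalEvent("req-2", "kb/sla.md"),
]
backed, unbacked = admissible_citations("req-1", ["kb/pricing.md", "kb/sla.md"], log)
print(backed)    # ['kb/pricing.md']
print(unbacked)  # ['kb/sla.md']
```

The second citation names a real document, but nothing outside the model shows it was retrieved for this request, so it stays a claim.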
Source vs derivative.
A primary source is the thing that actually happened. A derivative is a description of the thing.
When summaries, paraphrases, or fluent recompositions get treated as primary evidence — when the polished retelling outranks the original — the system can drift indefinitely without anyone noticing. The summary sounds right. The actual record might say something else. Nobody's reading the actual record.
Governance starts when the original record outranks the summary. The fluent version is reference, not evidence.
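A rough sketch of what "the original outranks the summary" could look like in code, under the assumption that every summary stores the id and a hash of the record it was derived from. `Summary`, `evidence_for`, and the ticket data are hypothetical names used only for illustration.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of a record's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Summary:
    text: str
    source_id: str
    source_hash: str  # fingerprint of the record at the time it was summarised


def evidence_for(summary, records):
    """Return the original record as evidence; the summary only points at it."""
    original = records[summary.source_id]
    stale = fingerprint(original) != summary.source_hash
    return {"evidence": original, "summary_is_stale": stale}


# Illustrative data only.
records = {"ticket-17": "Customer asked for a refund on 3 March."}
summary = Summary("Refund requested.", "ticket-17", fingerprint(records["ticket-17"]))

# The record is later corrected; the summary is not.
records["ticket-17"] = "Customer withdrew the refund request on 5 March."
result = evidence_for(summary, records)
print(result["summary_is_stale"])  # True
print(result["evidence"])          # Customer withdrew the refund request on 5 March.
```

The lookup never returns the summary text as evidence, and the hash mismatch makes the drift visible instead of silent.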
Current vs historical.
The current state of the system and the historical record of how it got there are two different objects.
This sounds pedantic until it's load-bearing. When "what the system thought yesterday" and "what the system thinks today" are stored in the same retrieval surface, the model will confidently surface yesterday's answer as today's truth — and it won't tell you, because to the model, they're the same kind of object.
Keep them separate. Always.
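As a small illustration, assuming an in-memory store (the `StateStore` class and its method names are hypothetical), separation can mean two distinct read paths: one that only ever returns the latest value, and one that returns past values explicitly labelled as history.

```python
from datetime import datetime, timezone


class StateStore:
    """Hypothetical store with separate read surfaces for current and historical state."""

    def __init__(self):
        self._current = {}
        self._history = []

    def set(self, key, value):
        # Every write updates current state and appends to the historical record.
        self._current[key] = value
        self._history.append((datetime.now(timezone.utc), key, value))

    def read_current(self, key):
        # Returns only the latest value for the key.
        return self._current[key]

    def read_history(self, key):
        # Returns every recorded value for the key, timestamped and labelled as historical.
        return [
            {"kind": "historical", "at": at, "value": value}
            for at, recorded_key, value in self._history
            if recorded_key == key
        ]


store = StateStore()
store.set("plan", "basic")
store.set("plan", "pro")
print(store.read_current("plan"))                         # pro
print([e["value"] for e in store.read_history("plan")])   # ['basic', 'pro']
```

A caller asking for today's answer cannot receive yesterday's by accident, because the two are no longer the same kind of object.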
The anti-bullshit spine.
Most AI failures aren't loud errors. They're confident sentences with no underlying truth. The system isn't lying; it's producing fluent output without the constraints that would make the output trustworthy.
We build against this on purpose. The rule we use internally, No-Fake-Success, is that no claim of completion, success, or state change can stand without verification appropriate to the claim. "I wrote the file" requires reading the file back. "I deployed" requires the deploy verifier passing. "The system is working" requires running the test the system is being asked to pass.
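The "I wrote the file" case is simple enough to sketch. This is an illustrative example, not the internal implementation; `write_and_verify` and `UnverifiedClaim` are hypothetical names.

```python
import tempfile
from pathlib import Path


class UnverifiedClaim(Exception):
    """Raised when a success claim cannot be backed by a check."""


def write_and_verify(path: Path, content: str) -> str:
    """Write content, read it back, and only then report success."""
    path.write_text(content, encoding="utf-8")
    read_back = path.read_text(encoding="utf-8")
    if read_back != content:
        raise UnverifiedClaim(f"{path} does not contain what was written")
    return f"wrote {len(content)} characters to {path.name}, verified by read-back"


with tempfile.TemporaryDirectory() as tmp:
    result = write_and_verify(Path(tmp) / "note.txt", "hello")
    print(result)  # wrote 5 characters to note.txt, verified by read-back
```

The success message is produced only after the read-back check passes, so the sentence and the evidence cannot come apart.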
That same discipline scales outward. If your AI system can produce a confident sentence that nobody downstream can verify, that sentence becomes the basis for a decision. Eventually one of those decisions matters. Eventually one of them costs money. Eventually one of them ends up in a deposition.
What this looks like in practice.
A governed system, on this dimension, does specific things:
- Citations point to retrievable sources, not model-generated text.
- Current state and historical narrative are different read surfaces.
- Confidence is calibrated, not performed.
- "I don't know" is a legitimate response, not a failure mode.
- Claims about the system are verifiable by something other than the system.
If you can't do these, you don't have governance. You have a system that sounds governed.