Holonomy
In 1827, Gauss proved something odd about curved surfaces.
Take a vector — just an arrow — and lay it flat against a sphere. Slide it along some closed path. Two rules: never twist the vector, and never lift it off the surface. Let the local geometry dictate its direction after every small step.
The oddity: when you return to where you started, the arrow might be pointing somewhere new. Nothing happened to it. No force acted on it. The rotation accumulated slowly, through the transport itself — a quantity called holonomy. And by the Gauss–Bonnet theorem, the angle of that rotation equals the total curvature enclosed by the loop. You can't see that curvature by staring at the vector at any single point. You can only detect it by going around a loop and noticing what changed.
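The effect is easy to check numerically. Below is a minimal sketch (plain NumPy, illustrative only) that parallel-transports a vector around a circle of constant latitude on the unit sphere by repeatedly projecting it onto the local tangent plane, then compares the resulting rotation to the Gauss–Bonnet prediction: the angle equals the enclosed curvature, 2π(1 − cos θ) for colatitude θ.

```python
import numpy as np

def transport_around_latitude(colatitude, steps=20000):
    """Parallel-transport a tangent vector around a circle of constant
    latitude on the unit sphere; return the angle it has rotated by."""
    t = np.linspace(0.0, 2.0 * np.pi, steps)
    # Points on the latitude circle (colatitude measured from the north pole).
    path = np.stack([np.sin(colatitude) * np.cos(t),
                     np.sin(colatitude) * np.sin(t),
                     np.cos(colatitude) * np.ones_like(t)], axis=1)
    # Start with a unit vector tangent to the sphere at the first point.
    v = np.array([0.0, 1.0, 0.0])
    v0 = v.copy()
    for p in path[1:]:
        # "Never twist, never lift": drop the component normal to the
        # surface and renormalize. In the small-step limit this is
        # exactly parallel transport.
        v = v - v.dot(p) * p
        v = v / np.linalg.norm(v)
    # Unsigned angle between the returned vector and the original one.
    return float(np.arccos(np.clip(v0.dot(v), -1.0, 1.0)))

angle = transport_around_latitude(np.pi / 4)          # 45 degrees from the pole
predicted = 2.0 * np.pi * (1.0 - np.cos(np.pi / 4))   # curvature enclosed by the loop
```

The vector comes back rotated by roughly 1.84 radians, having had nothing "done" to it at any step.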
Metaphorically, the vector isn't a passive observer. It goes around the loop pointing one way, and comes back pointing another. Traversing the surface on a closed path fundamentally modifies the orientation of the vector, whose new angle now encodes a summary of both the surface it passed over and the path it took.
We named the company after this idea: going in loops can update your orientation.
The Intuition
Work with data systems long enough and you develop an uncomfortable intuition: the most important properties of a system are never visible from a single vantage point. Your CEO asks a question. You produce a Dashboard Of Valuable Business Insight. You move on, the organization ETLs more data into more tables, life continues. Whatever you learned from building that dashboard is lost to time. This is organizational curvature. Your warehouse, your dbt models, your team's tribal knowledge about which columns mean what — these form a surface, and it's curved in ways no schema diagram will show you. The curvature lives in the gaps: joins that silently fan out. Metrics that mean one thing to Product and another to Finance. Churn models that work beautifully until someone changes an upstream ETL job on a Tuesday night, with commit message: "tiny fix, don't worry".
Existing data tools and agent harnesses treat this gap purely as a documentation and context problem. Write more markdown. Add more tests. Build a semantic layer. And to be fair, these techniques do usually help. They are also, by construction, brittle and temporally local instruments. They cannot take advantage of knowledge that emerges only from the accumulation of many closed investigation loops.
So: rather than trying to map the curvature from a single point, we measure the holonomy directly. We instrument the loops, collect the residues, and let the wrinkly geometry of the space reveal itself.
Holonomic
Our first product is `holonomic` — a collaborative, self-improving agentic system for data question-answering and analytics. Self-hosted and private by default.
On the surface, it presents as a natural-language analytics tool: ask questions of your business data → get interactive notebooks, dashboards, and applications as output. That part is increasingly table stakes, getting better and better as foundation models improve. What makes holonomic different is what happens after.
Every exploration session produces what we call an agent trace — not a log, but a structured record of a complete traversal through your data environment. What the agent tried. What it got wrong. What it got right. The implicit quality signals embedded in how your team responded to the results.
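As a sketch of what such a record might contain (the field names below are illustrative, not holonomic's actual schema):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TraceStep:
    action: str                      # e.g. "inspect_schema", "run_sql"
    input: str                       # what the agent tried
    output: str                      # what the environment returned
    error: Optional[str] = None      # populated when the attempt failed

@dataclass
class AgentTrace:
    question: str                              # the user's original ask
    steps: list = field(default_factory=list)  # the full traversal, in order
    final_answer: Optional[str] = None
    user_feedback: Optional[str] = None        # implicit/explicit quality signal

    def failed_steps(self):
        """The wrong turns: exactly the residue worth aggregating later."""
        return [s for s in self.steps if s.error is not None]
```

The point of the structure is that the wrong turns are first-class data, not noise to be filtered out of a log.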
Over time, these traces accumulate. We aggregate and analyze them to drive improvement through three distinct mechanisms:
Context Augmentation
The system reads your session transcripts and understands, structurally, what went wrong and why. By reading chain-of-thought reasoning output, it learns to identify the join that's consistently confusing, the column name that misleads, and the metric definition that has drifted between Finance and Product. These surface as actionable suggestions — for your data team, not just for the model. Think of it as what a very attentive, very patient analyst would notice after sitting in on a hundred user sessions. Explicit, interpretable feedback for your underlying context layer.
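In toy form, such aggregation could look like the following, sketched over a hypothetical trace shape (a dict with tagged errors; none of these names are the real interface):

```python
from collections import Counter

def recurring_confusions(traces, min_sessions=3):
    """Count error annotations across many session transcripts and surface
    the ones that recur as suggestions for the data team."""
    counts = Counter()
    for trace in traces:
        for err in trace.get("errors", []):
            # Key on the object involved plus the kind of failure.
            counts[(err["object"], err["kind"])] += 1
    return [f"'{obj}' caused {kind} errors in {n} sessions: review its definition"
            for (obj, kind), n in counts.most_common()
            if n >= min_sessions]
```

A one-off mistake stays below the threshold; the join that trips up session after session rises to the top.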
Weight Adaptation
In addition to context improvements, we also use your sessions — plus implicit usage signals and explicit feedback signals — to bootstrap a reinforcement learning pipeline for the underlying LLM to mold its behavior to your organizational context and desired outcomes. Not prompt engineering, or RAG with better retrieval, but true end-to-end weight adaptation: your schemas, your naming conventions, your dialect of SQL, your team's analytical habits. The off-the-shelf model becomes your own. By treating agent traces as post-training data, we can do something few teams are doing seriously: RL the model and the harness together, in your specific environment, using your own usage as the training signal. The accumulated residue of every loop becomes the curriculum.
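A toy illustration of the general shape, assuming a simplified trace dict with an optional user correction (the real pipeline folds in far more signals than this):

```python
def preference_pairs(traces):
    """Turn corrected sessions into (prompt, rejected, chosen) triples,
    the raw material for preference-based post-training such as DPO."""
    pairs = []
    for t in traces:
        if t.get("correction"):            # the user fixed the agent's answer
            pairs.append({"prompt": t["question"],
                          "rejected": t["answer"],
                          "chosen": t["correction"]})
    return pairs
```

Every corrected session becomes a training example in which the organization itself supplied the preferred behavior.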
Residual Stream Activations
Context augmentation and weight adaptation both rely on behavioral signal — what the model did, what the user corrected. But there's a third source of signal, and it's the one we're most excited about exploring: what the model "felt" while it was doing it.
Recent work in mechanistic interpretability has shown that transformer internals are far more legible than people assumed. The residual stream — the running sum that passes between layers — acts as a shared communication channel where attention heads read and write structured information. Features in this stream can correspond to recognizable things: concepts, uncertainty, retrieval states, even something abstract, like "hesitation".
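In toy form, a transformer layer reads from and writes back into that running sum (the real attention and MLP computations are elided here; the stand-in functions just add constants):

```python
import numpy as np

def toy_layer(x, attn, mlp):
    """One transformer block acting on the residual stream x.
    Each sub-block reads the stream and adds its contribution back,
    so the stream is the only channel downstream layers ever see."""
    x = x + attn(x)   # attention heads write structured information
    x = x + mlp(x)    # the MLP writes its own features on top
    return x

stream = np.zeros(4)
out = toy_layer(stream,
                attn=lambda h: np.ones_like(h),        # stand-in "attention"
                mlp=lambda h: 2.0 * np.ones_like(h))   # stand-in "MLP"
```

Because every sub-block only ever adds to the stream, anything a layer wants later layers to use has to pass through it, which is what makes it such a natural place to attach probes.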
Using labeled examples from your traces, we enable you to train lightweight attention probes — small classifiers that sit on top of the residual stream at inference time — to detect when the model is confused. Not confused in the sense of producing a wrong answer (evaluations can handle that), but confused in the mechanistic sense: attention heads distributing weight diffusely across candidate schemas, the residual stream carrying competing representations of a column's meaning, the model's internal state consistent with "I am guessing" rather than "I know."
This is a fundamentally different kind of signal. Behavioral traces and evals tell you the model was wrong after the fact. Residual stream probes may be able to tell you that the model was uncertain in the moment, even when it happened to get lucky. A prompt that returns correct results but triggers high probe activation is arguably more valuable for training than one that fails outright, because it reveals the frontier of the model's knowledge: the region where it has learned just enough to get by but not enough to be reliable. We feed this signal back into the RL pipeline. Probe activations become a component of the reward model — penalizing confident-but-shaky reasoning, rewarding genuine comprehension.
The effect is something like training on the model's epistemic state rather than just its outputs: optimizing not only for what it knows, but for the quality of how it knows it.
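The mechanics of a probe plus reward shaping could be sketched as follows: a tiny logistic probe fit on residual-stream activations from labeled traces, with its activation blended into the reward. All shapes, the synthetic data, and the penalty weight are illustrative, not our production setup.

```python
import numpy as np

def train_confusion_probe(acts, labels, lr=0.1, epochs=500):
    """Fit a logistic-regression probe on (n, d) residual-stream activations.
    labels: 1 = trace later judged confused, 0 = clean."""
    w, b = np.zeros(acts.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))   # probe activation
        grad = p - labels                            # logistic-loss gradient
        w -= lr * acts.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

def shaped_reward(base_reward, activation, w, b, penalty=0.5):
    """Penalize confident-but-shaky reasoning even when the answer was right."""
    p_confused = 1.0 / (1.0 + np.exp(-(activation @ w + b)))
    return base_reward - penalty * p_confused

# Synthetic stand-ins for "confused" and "clean" internal states.
rng = np.random.default_rng(0)
confused = rng.normal(1.0, 0.3, size=(50, 8))
clean = rng.normal(-1.0, 0.3, size=(50, 8))
acts = np.vstack([confused, clean])
labels = np.concatenate([np.ones(50), np.zeros(50)])
w, b = train_confusion_probe(acts, labels)
```

Two answers with identical base reward now receive different training signal depending on what the model's internals looked like while producing them.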
What We Believe
The exploration loop is the correct abstraction, not the query. One question answered correctly is useful. A thousand questions — answered, corrected, and re-analyzed — is a map. We build primitives for the map.
Privacy is paramount. Your agent traces can be sensitive: they reveal what you're worried about, what you're measuring, and what you don't yet understand. Holonomic can be self-hosted and run on your own infrastructure with zero outbound network connectivity if you want. Full stop.
The best data tool makes itself less necessary. As holonomic works, context improvements feed back into your real infrastructure — better naming, better modeling, better documentation — until the system has less to correct. The goal here is to give you a ratcheting ascender to hill-climb with, not a dependency.
Messy is interesting. Flat spaces have trivial holonomy, but curved spaces are where traversal teaches you something. The more complex your data ecosystem, the more there is to learn from instrumenting the loops — and the organizations that learn fastest from that complexity will outperform those trying to eliminate it.
The world does not need more ChatGPT for SQL and Jupyter. We are building a continuous learning system for analytics, with different affordances and hooks, that meets you where you already are.
Talk to Us
Everyone is already "training a mental model" on their interactions with their analytical environment. They're just throwing the dataset away.
Holonomic Labs is based in San Francisco. We are building towards this vision. If any of this resonates, we'd like to hear from you.