Enterprise AI agents keep failing because they forget what they learned
RAG architectures are good at one thing: surfacing semantically relevant documents. That’s also where they stop.
A framework called a decision context graph addresses that gap by giving agents structured memory, time-aware reasoning, and explicit decision logic. Rippletide, a startup in the Neo4j ecosystem, has built one. The key capability: agents that are non-regressive, able to freeze validated sequences of actions and compound on them over time.
“The key point you want is non-regressivity: How do you make sure that, when the agent will generate something new, you can compound on the previous discoveries?” said Yann Bilien, Rippletide’s co-founder and chief scientific officer.
Why RAG doesn’t go far enough
Enterprise context is sprawled across ERP tools, logs, databases, vector stores, and policy documents. Generative AI tools can retrieve from all of it — through keyword search, SQL queries, or full RAG pipelines — but retrieval has a ceiling.
Retrieved data may not be relevant to the decision at hand, which can lead to hallucinations. And even when agents do pull the right data, they often lack the guidance to make decisions backed by a strong rationale.
That is, RAG retrieves documents, not decision context. “Everyone starts with RAG: Pull relevant docs, stuff them in the prompt, let the model figure it out,” said Wyatt Mayham of Northwest AI Consulting.
While that works fine for chatbots, it “breaks immediately” for agents that need to make decisions and take actions, he pointed out. “The biggest thing builders struggle with is the gap between retrieval and applicability.”
A retrieved document doesn’t tell the agent whether it still applies, whether it’s been superseded, or whether there’s a conflicting rule that takes priority, Mayham said. “Agents need decision context, not just information.”
In the construction industry, for instance, that might mean knowing that a pricing exception expired, that a safety policy only applies in certain jurisdictions, or that a standard operating procedure was updated a month prior. “Miss any of that, and the agent confidently does the wrong thing,” Mayham said.
Without structured decision context, agents combine incompatible rules, invent constraints to fill gaps, and rely on what Bilien calls “probabilistic guesses over unbounded data.” Errors are difficult to reproduce because builders can’t trace why the agent made a given choice.
The compounding error problem is real, too, Mayham said: A small miss rate per step becomes “catastrophic” across a multi-step workflow. “That’s the main reason most enterprise agents never leave the pilot phase.”
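The arithmetic behind that compounding effect can be sketched in a few lines. This is an illustration only: it assumes every step succeeds independently with the same probability, and the 95% per-step figure and the function name `workflow_success_rate` are chosen for the example rather than taken from Mayham or Rippletide.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step of a workflow succeeds.

    Assumes steps are independent and equally reliable (a simplification).
    """
    return per_step_accuracy ** steps


# Illustrative 95% per-step accuracy across workflows of growing length.
for steps in (1, 5, 10, 20):
    rate = workflow_success_rate(0.95, steps)
    print(f"{steps:>2} steps: {rate:.1%} end-to-end success")
```

Under these assumptions, a 5% miss rate per step leaves a 10-step workflow succeeding only about 60% of the time, and a 20-step workflow only about 36% of the time.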
How decision context graphs get to the relevant answer
A decision context graph solves this by encoding a structured map of what is applicable, what the rules are, and when they apply.
The framework is optimized for one question: “Given this situation, which context applies right now?” Time is treated as a first-class dimension; every rule, decision, and exception is scoped to when it is valid.
“The goal is to explicitly address missing, incoherent, or contradictory data when building the graph to avoid probabilistic [errors] once the agent is running,” Bilien said.
The system is built around three principles:
- Applicability: Logic is explicitly encoded so the agent knows what rules to remember and apply in a given situation. Context is returned only when it is relevant to the situation.
- Time‑aware memory: Every rule, decision, and exception is time-scoped. This allows agents to reason about “What was true then versus what is true now,” then reproduce or explain their decisions.
- Decision paths: The system can explain how it got from A to B and the “why” behind its rationale (for instance, why one piece of context was included and another was not). Agents are given “decision path” examples of how similar cases were handled before.
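A toy version of the three principles might look like the sketch below. It is not Rippletide's implementation, which the company has not detailed here. The `Rule` class, the `decision_path` and `applicable` helpers, and the sample rules and dates are all hypothetical, and a flat list stands in for a real graph.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical time-scoped rule; field names are illustrative."""
    name: str
    jurisdiction: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the rule is still in force


def decision_path(rules: List[Rule], jurisdiction: str, on: date) -> List[Tuple[str, str]]:
    """Record, for every rule, why it was included or excluded."""
    path = []
    for rule in rules:
        if rule.jurisdiction != jurisdiction:
            reason = "out of jurisdiction"
        elif on < rule.valid_from:
            reason = "not yet in force"
        elif rule.valid_to is not None and on > rule.valid_to:
            reason = "expired"
        else:
            reason = "applies"
        path.append((rule.name, reason))
    return path


def applicable(rules: List[Rule], jurisdiction: str, on: date) -> List[str]:
    """Return only the rules that apply to this situation on this date."""
    return [name for name, reason in decision_path(rules, jurisdiction, on)
            if reason == "applies"]


# Made-up sample data: an expired exception and a superseded policy.
RULES = [
    Rule("pricing-exception", "US", date(2024, 1, 1), date(2024, 6, 30)),
    Rule("safety-policy-v1", "EU", date(2023, 1, 1), date(2024, 2, 29)),
    Rule("safety-policy-v2", "EU", date(2024, 3, 1)),
]

print(applicable(RULES, "EU", date(2024, 1, 15)))  # what was true then
print(applicable(RULES, "EU", date(2024, 4, 1)))   # what is true now
print(decision_path(RULES, "US", date(2024, 7, 1)))
```

The point of the sketch is that the validity check runs as explicit logic before anything reaches the model, and the recorded reasons give builders a trace of why each rule was or was not used.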
At setup, unstructured data is ingested and structured into an ontology: what entities exist, what rules apply, what counts as an exception. Neuro-symbolic AI handles the pattern recognition and encodes formal, machine-readable logic. Over time, the system refines its knowledge base as new decisions are made.
“Neuro-symbolic brings two parts: A neuronal part giving a large autonomy to agents and a symbolic part to reduce the number of data needed and bring control,” Bilien said.
The agent is tested at build time (pre-production) to validate its behaviors or pinpoint improvements. This reduces risk as well as compute requirements during inference, he noted.
Agents learning, rather than regressing
When it comes to non-regression, the key piece is compounding both on intelligence (models) and on knowledge (shared between agents), Bilien said. It’s important that agents can explore; when they don’t know how to accomplish a task, they can attempt different possibilities, typically in a controlled environment or simulation (like a support bot trying multiple response patterns).
Then, “once a solution is evaluated as satisfactory, the graph freezes that sequence of actions,” Bilien said. Future exploration then starts from this “stable base of validated behaviors” to prevent newly-acquired skills from overwriting previously learned good behavior.
Before an agent acts or affects a customer, it checks against the graph: Is it violating a rule? Hallucinating? Staying within constraints? Can it generalize the solution across similar cases?
At a macro level, the system assesses outcomes: Did the behavior improve long-term performance? Did it generalize across similar contexts? Did it preserve previous capabilities?
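One way to picture the freeze-and-check loop is the sketch below. It is a deliberately simplified stand-in, not Rippletide's system. The `BehaviorGraph` class, its methods, and the sample task and action names are hypothetical, and a dictionary stands in for the graph.

```python
from typing import Callable, Dict, List, Tuple


class BehaviorGraph:
    """Toy store of validated action sequences (hypothetical API)."""

    def __init__(self) -> None:
        self._frozen: Dict[str, Tuple[str, ...]] = {}

    def freeze(self, task: str, actions: List[str]) -> None:
        """Lock in a sequence once it has been evaluated as satisfactory."""
        if task in self._frozen:
            raise ValueError(f"{task!r} is already frozen")
        self._frozen[task] = tuple(actions)

    def plan(self, task: str, explore: Callable[[str], List[str]]) -> List[str]:
        """Reuse frozen behavior when it exists; explore only otherwise."""
        if task in self._frozen:
            return list(self._frozen[task])
        return explore(task)

    def regressions(self, policy: Callable[[str], List[str]]) -> List[str]:
        """List frozen tasks on which a candidate policy would deviate."""
        return [task for task, actions in self._frozen.items()
                if tuple(policy(task)) != actions]


graph = BehaviorGraph()
graph.freeze("refund", ["verify_identity", "issue_refund"])

# A known task replays the validated sequence; an unknown one explores.
print(graph.plan("refund", lambda task: ["improvise"]))
print(graph.plan("warranty_claim", lambda task: ["improvise"]))

# A candidate policy that skips verification is flagged before it ships.
print(graph.regressions(lambda task: ["issue_refund"]))
```

In this sketch, exploration never overwrites a frozen sequence, and any new policy can be checked against the frozen set before it reaches a customer, which mirrors the "preserve previous capabilities" test described above.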
“This determinism is key for agents to run reliably at scale,” Bilien said. It leads to behavior that is more consistent, predictable, and explainable, and it allows for stronger control and auditability.
“You want your agents to be able to learn by themselves when they face something they don’t know,” he said. “You want them to be able to explore and find new solutions.”
Getting beyond “episodic” memory
While the team initially assumed it would deploy RL everywhere, “that actually proved very difficult in an enterprise setting,” Bilien said. “Data are scarce for some specific use cases and messy for others.”
Typically, using raw data for reliable predictions has been a manual and time-consuming challenge, but “now with agents we entered a new era where building ontologies is possible automatically,” Bilien said.
Classic supervised fine-tuning methods can lead to oscillations, in which models forget the last skill they learned while learning the next one. Overall, learning is not compounded, compression is “dramatic,” and models improve “episodically” rather than continuously, leading them to continually fail on new or unseen tasks.
As Bilien noted: “You will never have a fully self-learning model if you are regressing every time.”
In enterprise use cases — like banking where millions of transactions are processed a day — a high level of reliability is critical, he noted. “One question I ask all customers: Is 95% enough? In a lot of use cases, it’s not. You need 99.999%. 1% off is way too much.”
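To put those percentages in concrete terms, the sketch below counts expected failures per day at each reliability level. The daily volume of 1 million transactions is an assumed round number standing in for "millions," and `expected_failures` is a name chosen for the example.

```python
def expected_failures(transactions: int, reliability: float) -> int:
    """Expected number of failed transactions, rounded to a whole count."""
    return round(transactions * (1 - reliability))


DAILY_TRANSACTIONS = 1_000_000  # assumed volume, for illustration only

for reliability in (0.95, 0.99, 0.99999):
    failures = expected_failures(DAILY_TRANSACTIONS, reliability)
    print(f"{reliability:.3%} reliable: about {failures:,} failures per day")
```

At that assumed volume, 95% reliability means roughly 50,000 failed transactions a day, 99% still means about 10,000, and 99.999% brings the figure down to about 10.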
Decision context graphs can close that gap, he contends: When the same customer support question is asked repeatedly, the agent will return a “satisfactory” answer predictably and without regression, all while retaining autonomy.
Encoding applicability and temporal validity into a structured graph — rather than relying on an LLM to infer it — is a “sound approach” to a real limitation in existing retrieval frameworks, Mayham said. The open question is whether the automatic ontology generation holds up against the messy, diverse data that enterprises actually have. “That’s always the hard part,” he said.