Tech

Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory

Published

on

Google senior AI product manager Shubham Saboo has turned one of the thorniest problems in agent design into an open-source engineering exercise: persistent memory.

This week, he published an open-source “Always On Memory Agent” on the official Google Cloud Platform Github page under a permissive MIT License, allowing for commercial usage.

It was built with Google’s Agent Development Kit, or ADK introduced last Spring in 2025, and Gemini 3.1 Flash-Lite, a low-cost model Google introduced on March 3, 2026 as its fastest and most cost-efficient Gemini 3 series model.

The project serves as a practical reference implementation for something many AI teams want but few have productionized cleanly: an agent system that can ingest information continuously, consolidate it in the background, and retrieve it later without relying on a conventional vector database.

Advertisement

For enterprise developers, the release matters less as a product launch than as a signal about where agent infrastructure is headed.

The repo packages a view of long-running autonomy that is increasingly attractive for support systems, research assistants, internal copilots and workflow automation. It also brings governance questions into sharper focus as soon as memory stops being session-bound.

What the repo appears to do — and what it does not clearly claim

The repo also appears to use a multi-agent internal architecture, with specialist components handling ingestion, consolidation and querying.

But the supplied materials do not clearly establish a broader claim that this is a shared memory framework for multiple independent agents.

Advertisement

That distinction matters. ADK as a framework supports multi-agent systems, but this specific repo is best described as an always-on memory agent, or memory layer, built with specialist subagents and persistent storage.

Even at this narrower level, it addresses a core infrastructure problem many teams are actively working through.

The architecture favors simplicity over a traditional retrieval stack

According to the repository, the agent runs continuously, ingests files or API input, stores structured memories in SQLite, and performs scheduled memory consolidation every 30 minutes by default.

A local HTTP API and Streamlit dashboard are included, and the system supports text, image, audio, video and PDF ingestion. The repo frames the design with an intentionally provocative claim: “No vector database. No embeddings. Just an LLM that reads, thinks, and writes structured memory.”

Advertisement

That design choice is likely to draw attention from developers managing cost and operational complexity. Traditional retrieval stacks often require separate embedding pipelines, vector storage, indexing logic and synchronization work.

Saboo’s example instead leans on the model to organize and update memory directly. In practice, that can simplify prototypes and reduce infrastructure sprawl, especially for smaller or medium-memory agents. It also shifts the performance question from vector search overhead to model latency, memory compaction logic and long-run behavioral stability.

Flash-Lite gives the always-on model some economic logic

That is where Gemini 3.1 Flash-Lite enters the story.

Google says the model is built for high-volume developer workloads at scale and priced at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens.

Advertisement

The company also says Flash-Lite is 2.5 times faster than Gemini 2.5 Flash in time to first token and delivers a 45% increase in output speed while maintaining similar or better quality.

On Google’s published benchmarks, the model posts an Elo score of 1432 on Arena.ai, 86.9% on GPQA Diamond and 76.8% on MMMU Pro. Google positions those characteristics as a fit for high-frequency tasks such as translation, moderation, UI generation and simulation.

Those numbers help explain why Flash-Lite is paired with a background-memory agent. A 24/7 service that periodically re-reads, consolidates and serves memory needs predictable latency and low enough inference cost to avoid making “always on” prohibitively expensive.

Google’s ADK documentation reinforces the broader story. The framework is presented as model-agnostic and deployment-agnostic, with support for workflow agents, multi-agent systems, tools, evaluation and deployment targets including Cloud Run and Vertex AI Agent Engine. That combination makes the memory agent feel less like a one-off demo and more like a reference point for a broader agent runtime strategy.

Advertisement

The enterprise debate is about governance, not just capability

Public reaction shows why enterprise adoption of persistent memory will not hinge on speed or token pricing alone.

Several responses on X highlighted exactly the concerns enterprise architects are likely to raise. Franck Abe called Google ADK and 24/7 memory consolidation “brilliant leaps for continuous agent autonomy,” but warned that an agent “dreaming” and cross-pollinating memories in the background without deterministic boundaries becomes “a compliance nightmare.”

ELED made a related point, arguing that the main cost of always-on agents is not tokens but “drift and loops.”

Those critiques go directly to the operational burden of persistent systems: who can write memory, what gets merged, how retention works, when memories are deleted, and how teams audit what the agent learned over time?

Advertisement

Another reaction, from Iffy, challenged the repo’s “no embeddings” framing, arguing that the system still has to chunk, index and retrieve structured memory, and that it may work well for small-context agents but break down once memory stores become much larger.

That criticism is technically important. Removing a vector database does not remove retrieval design; it changes where the complexity lives.

For developers, the tradeoff is less about ideology than fit. A lighter stack may be attractive for low-cost, bounded-memory agents, while larger-scale deployments may still demand stricter retrieval controls, more explicit indexing strategies and stronger lifecycle tooling.

ADK broadens the story beyond a single demo

Other commenters focused on developer workflow. One asked for the ADK repo and documentation and wanted to know whether the runtime is serverless or long-running, and whether tool-calling and evaluation hooks are available out of the box.

Advertisement

Based on the supplied materials, the answer is effectively both: the memory-agent example itself is structured like a long-running service, while ADK more broadly supports multiple deployment patterns and includes tools and evaluation capabilities.

The always-on memory agent is interesting on its own, but the larger message is that Saboo is trying to make agents feel like deployable software systems rather than isolated prompts. In that framing, memory becomes part of the runtime layer, not just an add-on feature.

What Saboo has shown — and what he has not

What Saboo has not shown yet is just as important as what he’s published.

The provided materials do not include a direct Flash-Lite versus Anthropic Claude Haiku benchmark for agent loops in production use.

Advertisement

They also do not lay out enterprise-grade compliance controls specific to this memory agent, such as: deterministic policy boundaries, retention guarantees, segregation rules or formal audit workflows.

And while the repo appears to use multiple specialist agents internally, the materials do not clearly prove a larger claim about persistent memory shared across multiple independent agents.

For now, the repo reads as a compelling engineering template rather than a complete enterprise memory platform.

Why this matters now

Still, the release lands at the right time. Enterprise AI teams are moving beyond single-turn assistants and into systems expected to remember preferences, preserve project context and operate across longer horizons.

Advertisement

Saboo’s open-source memory agent offers a concrete starting point for that next layer of infrastructure, and Flash-Lite gives the economics some credibility.

But the strongest takeaway from the reaction around the launch is that continuous memory will be judged on governance as much as capability.

That is the real enterprise question behind Saboo’s demo: not whether an agent can remember, but whether it can remember in ways that stay bounded, inspectable and safe enough to trust in production.

Source link

Advertisement

Leave a Reply

Your email address will not be published. Required fields are marked *

Trending

Exit mobile version