This Week in AI #018: Agents Get Memory, Open Models Close the Gap

The Big Story: Agents That Remember

The most significant development this week isn’t a new model — it’s infrastructure. Multiple teams shipped persistent memory systems for AI agents, moving beyond the “goldfish brain” problem where every conversation starts from scratch.

Why it matters: An agent that remembers your preferences, past decisions, and ongoing projects is qualitatively different from one that doesn’t. This is the difference between a helpful stranger and a capable colleague. The technical challenge is retrieval — knowing what to remember when — and the early implementations are surprisingly good.

What shipped:

LangChain released langgraph-memory, a library for adding episodic and semantic memory to agent graphs
OpenAI expanded custom instructions into a full memory API for GPT-based agents
Several open-source frameworks added memory backends (Redis, Postgres, vector stores)

The privacy implications are real. Persistent memory means persistent data. Expect this to become a major consideration for enterprise deployments.

Open Models Close the Gap

Llama 3.3’s 70B model posted scores within 2% of GPT-4o on major benchmarks this week, and Mistral’s new Medium model matched Claude 3.5 Sonnet on coding tasks. The gap between open and proprietary isn’t gone, but it’s narrower than ever.

What this means practically:

Fine-tuning open models for specific domains can now match or beat frontier APIs
Local deployment is viable for more use cases
The moat for proprietary models is increasingly about ecosystem, not raw capability

EU AI Act: First Enforcement

The European AI Office issued its first formal requests for compliance documentation to three companies deploying general-purpose AI systems in the EU. The companies weren’t named, but the signal is clear: the AI Act isn’t just a document anymore.

Key requirements under scrutiny:

Technical documentation about training data and methodology
Copyright compliance for training data
Energy consumption reporting
Systemic risk assessments

Companies have 30 days to respond. Fines for non-compliance can reach 3% of global revenue.

Research Worth Reading

“Scaling Test-Time Compute Optimally” — New research from DeepMind shows that letting models “think longer” at inference time follows predictable scaling laws. More compute at test time = better results, but with diminishing returns that vary by task type. Reasoning tasks benefit most; factual recall barely benefits at all.

“Constitutional Classifiers” — Anthropic published a method for training lightweight classifiers using constitutional AI principles. Instead of expensive fine-tuning, you can train a small classifier to filter outputs based on explicit rules. Useful for content moderation and safety layers.

“Mixture of Agents” — A paper showing that routing queries to specialized expert models (not expert layers within a model) consistently outperforms single-model approaches. The orchestration overhead is minimal with modern frameworks.

Tools & Launches

Cursor 1.0 officially launched, dropping the beta label with improved multi-file editing and background agents
Windsurf added voice-to-code, letting you describe changes verbally while reading code
Hugging Face Spaces now supports one-click deployment of agents with tool use
Vercel AI SDK 4.2 shipped with built-in agent memory and state management

Quick Takes

Google’s Gemini 2.0 Pro scored first place on the WebArena benchmark for browser agents — the first time a proprietary model leads this benchmark
NVIDIA reported that inference workload revenue now exceeds training revenue for the first time
Apple’s on-device models improved significantly in the iOS 19.4 beta, particularly for summarization
The average cost per million tokens across major API providers dropped 40% year-over-year

What to Watch Next Week

Anthropic’s rumored Claude 4.5 announcement (or is it Claude 5?)
Meta’s expected release of Llama 4 architecture details
The AI Safety Summit in Seoul continues with technical working groups