🟢 Essential 7 min read

This Week in AI #018: Agents Get Memory, Open Models Close the Gap

AI agents are getting persistent memory, open-weight models are matching proprietary benchmarks, and the EU AI Act's first enforcement actions arrive. Here's what mattered this week.

View all this week in ai depths →

The Big Story: Agents That Remember

The most significant development this week isn’t a new model — it’s infrastructure. Multiple teams shipped persistent memory systems for AI agents, moving beyond the “goldfish brain” problem where every conversation starts from scratch.

Why it matters: An agent that remembers your preferences, past decisions, and ongoing projects is qualitatively different from one that doesn’t. This is the difference between a helpful stranger and a capable colleague. The technical challenge is retrieval — knowing what to remember when — and the early implementations are surprisingly good.

What shipped:

  • LangChain released langgraph-memory, a library for adding episodic and semantic memory to agent graphs
  • OpenAI expanded custom instructions into a full memory API for GPT-based agents
  • Several open-source frameworks added memory backends (Redis, Postgres, vector stores)

The privacy implications are real. Persistent memory means persistent data. Expect this to become a major consideration for enterprise deployments.

Open Models Close the Gap

Llama 3.3’s 70B model posted scores within 2% of GPT-4o on major benchmarks this week, and Mistral’s new Medium model matched Claude 3.5 Sonnet on coding tasks. The gap between open and proprietary isn’t gone, but it’s narrower than ever.

What this means practically:

  • Fine-tuning open models for specific domains can now match or beat frontier APIs
  • Local deployment is viable for more use cases
  • The moat for proprietary models is increasingly about ecosystem, not raw capability

EU AI Act: First Enforcement

The European AI Office issued its first formal requests for compliance documentation to three companies deploying general-purpose AI systems in the EU. The companies weren’t named, but the signal is clear: the AI Act isn’t just a document anymore.

Key requirements under scrutiny:

  • Technical documentation about training data and methodology
  • Copyright compliance for training data
  • Energy consumption reporting
  • Systemic risk assessments

Companies have 30 days to respond. Fines for non-compliance can reach 3% of global revenue.

Research Worth Reading

“Scaling Test-Time Compute Optimally” — New research from DeepMind shows that letting models “think longer” at inference time follows predictable scaling laws. More compute at test time = better results, but with diminishing returns that vary by task type. Reasoning tasks benefit most; factual recall barely benefits at all.

“Constitutional Classifiers” — Anthropic published a method for training lightweight classifiers using constitutional AI principles. Instead of expensive fine-tuning, you can train a small classifier to filter outputs based on explicit rules. Useful for content moderation and safety layers.

“Mixture of Agents” — A paper showing that routing queries to specialized expert models (not expert layers within a model) consistently outperforms single-model approaches. The orchestration overhead is minimal with modern frameworks.

Tools & Launches

  • Cursor 1.0 officially launched, dropping the beta label with improved multi-file editing and background agents
  • Windsurf added voice-to-code, letting you describe changes verbally while reading code
  • Hugging Face Spaces now supports one-click deployment of agents with tool use
  • Vercel AI SDK 4.2 shipped with built-in agent memory and state management

Quick Takes

  • Google’s Gemini 2.0 Pro scored first place on the WebArena benchmark for browser agents — the first time a proprietary model leads this benchmark
  • NVIDIA reported that inference workload revenue now exceeds training revenue for the first time
  • Apple’s on-device models improved significantly in the iOS 19.4 beta, particularly for summarization
  • The average cost per million tokens across major API providers dropped 40% year-over-year

What to Watch Next Week

  • Anthropic’s rumored Claude 4.5 announcement (or is it Claude 5?)
  • Meta’s expected release of Llama 4 architecture details
  • The AI Safety Summit in Seoul continues with technical working groups

Simplify

← This Week in AI #017 — March 18, 2026

Go deeper

This Week in AI #019: Enterprise AI Gets Boring (That's Good), Video Models Find Their Niche →

Related reads

this-week-in-ainewsagentsopen-sourcememory

Stay ahead of the AI curve

Weekly insights on AI — explained at the level that's right for you. No hype, no jargon, just what matters.

No spam. Unsubscribe anytime. We respect your inbox.