This Week in AI #015 — March 16, 2026
Google drops Gemini 2.5 Ultra, open-source reasoning models close the gap, and the EU AI Act's first enforcement actions arrive.
View all this week in ai depths →Depth ladder for this topic:
This Week in AI #015 — March 16, 2026
A week of big releases and bigger questions. Google launched Gemini 2.5 Ultra with claims of human-expert-level reasoning, the open-source community responded with a 32B model that punches above its weight, and the EU started enforcing its AI Act for real. Let’s get into it.
The Big Story: Gemini 2.5 Ultra Lands
Google released Gemini 2.5 Ultra on Tuesday, and benchmarks aside, the real story is the multimodal reasoning capability. The model processes video, audio, images, and text simultaneously in a single context window of 2 million tokens — and actually does useful things with all of it.
The demo that got everyone talking: uploading a 45-minute lecture video and getting a structured study guide with timestamps, key concept explanations, and practice questions — all in about 20 seconds. Not revolutionary on paper, but the quality and speed together crossed a usability threshold.
Benchmark highlights:
- GPQA Diamond: 78.3% (up from 69.4% on 2.0 Ultra)
- MATH-500: 97.1% (approaching saturation)
- Multimodal reasoning suite: 84.2% (new benchmark, so comparisons are limited)
The API is available now at $15/M input tokens, $60/M output — expensive, but competitive with GPT-5 for comparable capability.
Our take: The numbers matter less than the UX. Google is clearly betting that multimodal-native experiences will differentiate Gemini from text-first competitors. So far, they might be right.
Open Source Closes the Gap: QwQ-32B-v2
Alibaba’s Qwen team released QwQ-32B-v2, a reasoning-focused model that matches GPT-4.5 on several benchmarks while running on a single consumer GPU (with quantization).
Key results:
- AIME 2026: 62.4% (vs GPT-4.5’s 65.1%)
- LiveCodeBench: 71.8% (vs GPT-4.5’s 73.2%)
- GPQA: 67.9% (vs GPT-4.5’s 71.4%)
The model uses a chain-of-thought approach similar to DeepSeek-R1 but with a more efficient architecture. Weights are fully open under Apache 2.0.
What makes this significant isn’t just the benchmarks — it’s that you can run this model locally with a 24GB GPU using 4-bit quantization via llama.cpp. The reasoning quality is good enough for most coding and analysis tasks.
Community reaction: Within 48 hours of release, GGUF quantizations were available on HuggingFace, Ollama had it in their library, and people were reporting surprisingly good results for agent workflows.
EU AI Act: First Enforcement Actions
The European AI Act’s first compliance deadline passed on March 1st, and this week we saw the first real enforcement actions. Three companies received formal notices from EU regulators:
- A major hiring platform was flagged for using emotion detection in video interviews without adequate disclosure — a banned practice under the Act.
- A content recommendation system was cited for insufficient transparency about how AI-generated content is labeled.
- A financial services firm received notice for deploying a high-risk credit scoring system without the required conformity assessment.
No fines yet — these are formal notices with 90-day remediation windows. But the message is clear: enforcement is real, not theoretical.
What this means for builders: If you’re serving EU users, the AI Act compliance timeline is no longer “eventually.” Review your systems against the prohibited practices list (Article 5) and high-risk categories (Annex III). The companies that got noticed this week reportedly thought they had more time.
Research Corner
Sparse Attention at Scale
A team from UC Berkeley published “MegaSparse,” demonstrating that models can maintain quality with up to 95% sparse attention patterns at inference time. The practical implication: 10-20x throughput improvement on long-context workloads without fine-tuning. The technique works by dynamically identifying which attention heads are actually contributing to the current generation step.
This could significantly reduce the cost of long-context inference — currently the most expensive part of running models like Gemini 2.5 Ultra or GPT-5 at their full context length.
Synthetic Data Gets More Rigorous
DeepMind published a framework for evaluating synthetic training data quality, introducing metrics for “data diversity coverage” and “concept boundary testing.” The key finding: most synthetic data pipelines produce data that looks diverse but clusters tightly in embedding space. Their proposed diversity-aware generation approach improved downstream model performance by 8-12% on reasoning benchmarks compared to naive synthetic data.
Tools and Releases
-
Cursor 0.50 shipped with background agents that can run multi-file refactors asynchronously. You describe the change, Cursor works on it in the background, and presents a diff when done. Early reports suggest it works well for straightforward refactors but struggles with cross-cutting architectural changes.
-
LangChain v0.4 dropped, with a significantly simplified API. The core chain abstraction is now optional — you can use individual components without buying into the full framework. This addresses the most common criticism from the past two years.
-
Ollama 0.6 added native support for speculative decoding with draft models, yielding 2-3x speedup for compatible model pairs. Setup is a single config flag.
-
HuggingFace launched Inference Endpoints v3 with automatic model sharding across multiple GPUs and built-in A/B testing for model versions.
Numbers of the Week
- $2.1B: Total AI infrastructure investment announced this week across three major cloud providers
- 47%: Percentage of Fortune 500 companies now using AI coding assistants, per a new Gartner survey (up from 31% in Q3 2025)
- 150M: Weekly active users on ChatGPT, per OpenAI’s latest disclosure
- 32: Days since the last major AI model leak — a new record for the industry
What to Watch Next Week
- NVIDIA GTC keynote on Tuesday — expect announcements around Blackwell Ultra and inference-optimized hardware
- Anthropic’s scheduled model update — Claude 4.5 Opus has been rumored for weeks
- Apple WWDC preview event — on-device AI capabilities for iOS 20 expected to be teased
That’s the week. If Gemini 2.5 Ultra, open-source reasoning models, or EU enforcement affect your work, the linked deep-dives below are worth your time.
Simplify
← This Week in AI #014 - Reasoning Models Go Small, Open Weights Hit a Milestone
Go deeper
This Week in AI #016 — March 17, 2026 →
Related reads
Stay ahead of the AI curve
Weekly insights on AI — explained at the level that's right for you. No hype, no jargon, just what matters.
No spam. Unsubscribe anytime. We respect your inbox.