AI Glossary: Reasoning and Planning Edition
Reasoning and planning are the hottest topics in AI right now. Here's every term you'll encounter — defined clearly, with context for why it matters.
AI reasoning and planning capabilities have exploded in the past year. The vocabulary has expanded with them. Here’s your reference guide.
Core Reasoning Terms
Chain of Thought (CoT): A prompting technique where the model is asked to show its reasoning step by step before giving a final answer. Dramatically improves performance on math, logic, and multi-step problems. First demonstrated by Wei et al. (2022) at Google.
Zero-Shot CoT: Adding “Let’s think step by step” to a prompt triggers chain-of-thought reasoning without providing examples. Surprisingly effective for a five-word addition.
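To make the idea concrete, here is a minimal sketch of zero-shot CoT prompting. The `call-an-LLM` step is omitted; the point is just how little the prompt changes.

```python
# Zero-shot chain-of-thought: append the trigger phrase to a bare
# question. A real system would send `prompt` to any LLM API.

COT_TRIGGER = "Let's think step by step."

def make_zero_shot_cot_prompt(question: str) -> str:
    """Build a prompt that elicits step-by-step reasoning."""
    return f"Q: {question}\nA: {COT_TRIGGER}"

prompt = make_zero_shot_cot_prompt(
    "A bat and a ball cost $1.10 total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)
print(prompt)
```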
Tree of Thought (ToT): Extension of CoT where the model explores multiple reasoning paths (branches), evaluates which are most promising, and can backtrack. More computationally expensive but better for problems where the first reasoning path isn’t always correct.
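A toy sketch of the breadth-first variant of tree-of-thought: expand each partial solution into several candidate next steps, score them, and keep only the most promising branches (weak paths are abandoned, which is where the “backtracking” benefit comes from). The `expand` and `score` functions here are toy stand-ins for what would be LLM calls in a real system.

```python
# Breadth-first tree-of-thought sketch. `expand` proposes candidate
# next thoughts for a state; `score` rates how promising a state is.

def tree_of_thought(root, expand, score, beam_width=2, depth=3):
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        # Prune: keep only the top `beam_width` branches.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)

# Toy problem: build the largest 3-digit number by appending digits.
best = tree_of_thought(
    root="",
    expand=lambda s: [s + d for d in "123"],
    score=lambda s: int(s) if s else 0,
)
print(best)  # "333"
```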
Reasoning Tokens: Tokens generated by the model during its “thinking” process that aren’t shown in the final output. Models like o1 and Claude with extended thinking generate these internally. You pay for them in compute but don’t see them directly.
Test-Time Compute: The idea that spending more computation at inference time (generating more reasoning tokens, exploring more paths) improves answer quality. Contrasts with the traditional approach of spending compute only during training.
Scaling Test-Time Compute: Using more reasoning steps, longer chains of thought, or multiple attempts at inference time. The key insight: for hard problems, it’s often more cost-effective to think longer than to train a bigger model.
Planning Terms
Task Decomposition: Breaking a complex task into smaller, manageable subtasks. Critical for agent systems. A planning agent might decompose “book a trip to Tokyo” into: research flights, compare prices, check hotel availability, verify visa requirements, book flights, book hotel.
ReAct (Reasoning + Acting): A framework where the model alternates between reasoning (“I need to find the current stock price”) and acting (“search: AAPL stock price”). The reasoning step makes the action choice more deliberate and interpretable.
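The ReAct loop can be sketched in a few lines. The `llm` callable, the `search` tool, and the `Action: tool[argument]` output format below are illustrative assumptions, not a specific provider's API; the scripted fake model at the bottom just makes the loop runnable.

```python
# Minimal ReAct-style loop: the model alternates "Thought" and "Action"
# lines; we parse each action, run the named tool, and append the
# observation to the transcript before calling the model again.

def parse_action(step: str):
    """Parse 'Action: tool[argument]' from the model's output."""
    action = step.split("Action:")[1].strip()
    name, arg = action.split("[", 1)
    return name.strip(), arg.rstrip("]")

def react_loop(llm, tools, question, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if "Action:" in step:
            name, arg = parse_action(step)
            transcript += f"Observation: {tools[name](arg)}\n"
    return None

# Scripted fake model for demonstration: one tool call, then an answer.
_steps = iter([
    "Thought: I need the current price.\nAction: search[AAPL stock price]",
    "Final Answer: $190",
])
answer = react_loop(lambda _: next(_steps), {"search": lambda q: "$190"},
                    "What is the AAPL stock price?")
print(answer)  # $190
```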
Plan-and-Execute: An agent architecture where one LLM call creates a plan (list of steps) and subsequent calls execute each step. Separates planning from execution, which often improves reliability.
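The separation of planning from execution can be sketched like this; `planner` and `executor` are hypothetical LLM wrappers, stubbed out here so the example runs.

```python
# Plan-and-execute sketch: one call produces a plan (a list of steps),
# then each step is executed in its own call, with prior results in scope.

def plan_and_execute(planner, executor, task):
    plan = planner(task)                 # one planning call
    results = []
    for step in plan:
        results.append(executor(step, results))  # one call per step
    return results

results = plan_and_execute(
    planner=lambda task: ["research flights", "book flights"],
    executor=lambda step, prior: f"done: {step}",
    task="book a trip to Tokyo",
)
print(results)
```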
Reflexion: A technique where the agent reflects on its previous attempts, identifies what went wrong, and tries again with that insight. Similar to how humans learn from mistakes. Requires maintaining a memory of past attempts.
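A sketch of the Reflexion retry loop, with the required memory of past attempts as a plain list. The `attempt`, `check`, and `reflect` callables are hypothetical LLM wrappers; the toy demo just shows the loop improving as reflections accumulate.

```python
# Reflexion sketch: retry a task, storing a self-critique of each failed
# attempt and feeding the accumulated reflections into the next try.

def reflexion_loop(attempt, check, reflect, task, max_tries=3):
    memory = []  # reflections on past failures
    for _ in range(max_tries):
        answer = attempt(task, memory)
        if check(answer):
            return answer
        memory.append(reflect(task, answer))
    return None

# Toy demo: the "model" does better the more reflections it has.
answer = reflexion_loop(
    attempt=lambda task, mem: len(mem),
    check=lambda a: a >= 2,
    reflect=lambda task, a: f"attempt gave {a}; adjust approach",
    task="toy task",
)
print(answer)  # 2
```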
World Model: An internal representation of how the environment works. A model with a good world model can predict what will happen if it takes an action, enabling better planning. Current LLMs have implicit world models but they’re incomplete and inconsistent.
Search and Exploration
Monte Carlo Tree Search (MCTS): A search algorithm that explores possible action sequences by randomly sampling paths and keeping statistics on which paths lead to good outcomes. Used in game-playing AI (AlphaGo) and increasingly applied to LLM reasoning.
Beam Search: Instead of generating one token at a time (greedy decoding), beam search maintains the top-k most promising partial sequences and extends all of them. Better for finding globally good sequences at the cost of k× more computation.
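A toy beam search over a hand-written next-token distribution (a real decoder would get these probabilities from a language model). Note that greedy decoding would commit to “the” at the first step; the beam keeps “a” alive and finds the globally more probable sequence.

```python
import math

# Toy next-token distribution: maps a prefix tuple to {token: prob}.
NEXT = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("a",): {"cat": 0.9, "dog": 0.1},
}

def beam_search(k=2, steps=2):
    beams = [((), 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(steps):
        extended = []
        for seq, lp in beams:
            for tok, p in NEXT.get(seq, {}).items():
                extended.append((seq + (tok,), lp + math.log(p)))
        # Keep only the k highest-scoring partial sequences.
        beams = sorted(extended, key=lambda b: b[1], reverse=True)[:k]
    return beams

for seq, lp in beam_search():
    print(" ".join(seq), round(math.exp(lp), 2))  # "a cat" beats "the ..."
```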
Best-of-N Sampling: Generate N independent responses and pick the best one (using a reward model or verifier). Simple but effective. If each response has a 70% chance of being correct, generating 5 responses gives you a 99.8% chance of at least one being correct.
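The arithmetic behind that 99.8% figure, plus the selection step itself. The probability that at least one of N independent samples is correct is 1 − (1 − p)^N; `generate` and `score` below are hypothetical stand-ins for a sampler and a verifier.

```python
# Best-of-N: the probability math, and the pick-the-best step.

def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent samples is correct."""
    return 1 - (1 - p) ** n

def best_of_n(generate, score, n=5):
    """Generate n candidates and return the one the verifier scores highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

print(round(p_at_least_one(0.7, 5), 3))  # 0.998
```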
Majority Voting (Self-Consistency): Generate multiple chain-of-thought responses and take the most common final answer. Works because different reasoning paths that converge on the same answer are more likely to be correct.
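The voting step itself is a one-liner once the final answers have been extracted from each sampled chain of thought:

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority vote over the final answers of independent CoT samples."""
    return Counter(final_answers).most_common(1)[0][0]

# Five sampled reasoning paths; three converge on "24".
print(self_consistency(["24", "18", "24", "24", "30"]))  # 24
```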
Verifier / Outcome Reward Model (ORM): A separate model trained to evaluate whether a solution is correct. Used to score candidates in best-of-N sampling or to guide search.
Process Reward Model (PRM): Like a verifier, but scores each intermediate reasoning step rather than just the final answer. Provides denser feedback for training and search. More expensive to train (requires step-level labels) but more effective.
Reasoning Architectures
System 1 / System 2 Thinking: Borrowed from Kahneman’s framework. System 1 is fast, intuitive, automatic (standard LLM generation). System 2 is slow, deliberate, effortful (extended reasoning, chain of thought). Modern reasoning models try to implement System 2 thinking on top of System 1 foundations.
Inference-Time Training: Fine-tuning or adapting the model during inference on the specific problem at hand. Blurs the line between training and inference. Still experimental but shows promise for few-shot adaptation.
Mixture of Reasoning: Using different reasoning strategies for different types of problems. Simple factual questions get fast System 1 responses. Complex math problems get extended chain of thought. Classification of problem difficulty happens first.
Reasoning Distillation: Training a smaller model to replicate the reasoning behavior of a larger model. The large model generates chain-of-thought traces, and the small model is fine-tuned on them. Transfers reasoning ability at lower inference cost.
Agent-Specific Terms
Observation-Action Loop: The core agent cycle: observe the environment, decide on an action, execute it, observe the result, repeat.
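The cycle in code, with a toy environment standing in for whatever the agent actually controls (an API, a browser, a robot):

```python
# The core agent cycle: observe, decide, act, repeat until done.

class CountdownEnv:
    """Toy environment: the state counts down to zero."""
    def __init__(self, n):
        self.state = n
    def observe(self):
        return self.state
    def step(self, action):
        if action == "decrement":
            self.state -= 1

def run_agent(env, policy, max_steps=10):
    for _ in range(max_steps):
        obs = env.observe()        # 1. observe the environment
        action = policy(obs)       # 2. decide on an action
        if action == "stop":
            return obs
        env.step(action)           # 3. execute it, then repeat
    return env.observe()

result = run_agent(CountdownEnv(3),
                   lambda obs: "stop" if obs == 0 else "decrement")
print(result)  # 0
```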
Tool Use / Function Calling: The model’s ability to invoke external tools (search, calculator, code execution, APIs) as part of its reasoning process. Extends the model’s capabilities beyond text generation.
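A minimal sketch of the dispatch side of function calling. The JSON shape (`name` plus `arguments`) is a common convention but an assumption here, and the model output is hard-coded; a real system would receive it from an LLM API.

```python
import json

# Function-calling sketch: the model emits a JSON tool call, which we
# parse and dispatch to a registered Python function.

TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def dispatch(model_output: str):
    """Parse {"name": ..., "arguments": {...}} and invoke the tool."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```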
Grounding: Connecting the model’s reasoning to external sources of truth. A grounded response cites specific sources; an ungrounded response relies only on the model’s parametric knowledge.
Hallucination Detection: Identifying when the model’s reasoning includes fabricated facts or unsupported conclusions. Critical for reliable agent systems. Methods include self-consistency checks, retrieval verification, and trained classifiers.
Cognitive Architecture: The overall design of an agent’s reasoning system — how memory, planning, tool use, and reflection are organized and connected. Examples: ReAct, Plan-and-Execute, Reflexion, and custom architectures.
Evaluation Terms
Pass@k: The probability that at least one of k generated solutions is correct. Standard metric for code generation benchmarks.
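The standard unbiased pass@k estimator: given n samples of which c passed, it computes the probability that a random subset of k samples contains at least one passing solution.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(round(pass_at_k(n=10, c=3, k=1), 2))  # 0.3
```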
Accuracy@Compute: Performance as a function of total inference compute (tokens generated). Used to compare reasoning strategies: is it better to generate one long chain of thought or ten short ones?
Reasoning Trace: The full sequence of reasoning steps the model took. Used for debugging, evaluation, and training. Some providers expose these; others keep them hidden.
Faithfulness: Whether the model’s stated reasoning actually reflects its decision-making process. A model might generate a plausible-sounding chain of thought that doesn’t correspond to how it actually arrived at the answer. This is an active research problem.
Reasoning terminology is evolving fast. This glossary reflects the state of the field as of March 2026. Expect new terms as architectures and techniques continue to develop.