Note: This article is the result of executing a Plan-and-Solve-type agent that was implemented during a tutorial. The tutorial itself is part of a post discussing autonomous agents and their design. I do not guarantee the content’s quality (I am not really a fan of it). I posted it because I found the idea of a plan-and-solve agent generating a post about itself funny.
Plan-and-Solve AI Agents: How Planning Meets Execution
Introduction:
AI agents are systems that perceive, decide, and act to achieve goals. Plan-and-solve agents make this process explicit: they first construct a plan (a sequence of subgoals or actions) and then execute and adapt that plan to solve the task. By separating “what to do” (planning) from “how to do it” (solving/execution), they offer a transparent, reliable way to handle complex, multi-step problems.
Why this approach matters:
- Reliability and transparency: Explicit plans are inspectable, debuggable, and auditable.
- Generalization: Plans compose reusable skills to handle new tasks and changing goals.
- Efficiency: Search and constraint-solving reduce trial-and-error common in purely reactive systems.
- Safety and control: Clear constraints and checkpoints enable oversight and safe interventions.
Problems it addresses:
- Long-horizon tasks with dependencies (e.g., robotics workflows, multi-hop tool use).
- Combinatorial optimization (e.g., logistics, scheduling, routing).
- Non-stationary environments requiring monitoring and replanning.
- Multi-step reasoning in domains like code generation, game AI, and operations.
In what follows, we’ll unpack the core ideas behind plan-and-solve agents, show minimal code to illustrate planning and execution loops, and highlight practical applications.
Theory: What are plan-and-solve AI agents?
Definition:
- Plan-and-solve agents are model-based agents that explicitly construct a plan using a world model and then execute it while monitoring the environment and replanning as needed. They differ from purely reactive agents (which map observations to actions directly) and from end-to-end learned policies (which typically do not expose an explicit plan).
Core loop:
- Sense: Perceive/estimate current state.
- Plan: Search/optimize over a model to produce a plan or policy to reach a goal under constraints.
- Execute: Carry out actions, often via a low-level controller.
- Monitor: Compare expected vs observed outcomes.
- Replan: Update state/belief and replan on deviations or new information.
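To make the loop concrete, here is a minimal, schematic Python sketch. It is illustrative only: sense, plan, execute_step, diverged, and model.predict are hypothetical callables standing in for a real perception stack, planner, executor, and world model.

# Schematic plan-and-solve loop; all callables here are hypothetical placeholders.
def plan_and_solve(goal, model, sense, plan, execute_step, diverged, max_steps=100):
    state = sense()                               # Sense: estimate the current state
    steps = plan(state, goal, model)              # Plan: search/optimize over the model
    for _ in range(max_steps):
        if not steps:                             # Plan consumed: done (or nothing left to do)
            break
        action = steps.pop(0)
        expected = model.predict(state, action)   # What the model says should happen
        execute_step(action)                      # Execute: act in the real environment
        state = sense()                           # Monitor: observe the actual outcome
        if diverged(expected, state):             # Replan when prediction and reality disagree
            steps = plan(state, goal, model)
    return state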
Problem formalization:
- State space S, actions A, transition model T (deterministic or stochastic), costs/rewards, and goal set G or terminal utility.
- Deterministic planning: search for a sequence a1:k minimizing cumulative cost to reach G (e.g., A*, D*, SAT/SMT, MILP); the objective is written out after this list.
- Stochastic planning: plan a policy π(s) in an MDP; under partial observability maintain a belief b(s) and plan in a POMDP.
- Constraints: hard (safety, resources) and soft (preferences), encoded in logic, linear constraints, or task schemas.
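Written out for the deterministic case (using the notation above, with c denoting the per-step cost and s_init the initial state), the planning problem is roughly:

\min_{a_{1:k}} \; \sum_{i=1}^{k} c(s_{i-1}, a_i) \quad \text{s.t.} \quad s_i = T(s_{i-1}, a_i), \quad s_0 = s_{\mathrm{init}}, \quad s_k \in G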
Key components:
- World model: Dynamics, action preconditions/effects, cost/reward, constraints; represented symbolically (e.g., STRIPS/PDDL; see the action-schema sketch after this list), probabilistically, or differentiably.
- Planner: Search or optimization (A*, heuristic search, graph planning, SATPlan, HTN, MPC, sampling-based motion planners).
- Hierarchy: High-level task planning (discrete) with low-level skills/controllers (continuous) for tractability.
- Executor: Turns plan steps into concrete actions; handles timing, concurrency, and feedback control.
- Monitor/replanner: Detects deviations, updates belief/state, triggers replanning (e.g., model predictive control/receding horizon).
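To make the symbolic world-model idea concrete, here is a tiny STRIPS-style sketch in plain Python (not actual PDDL; the pick-up schema and its predicate strings are made-up illustrations):

# A STRIPS-style action schema as plain data (illustrative; not real PDDL syntax).
pick_up = {
    "name": "pick-up(block)",
    "preconditions": {"clear(block)", "on-table(block)", "hand-empty"},
    "add_effects": {"holding(block)"},
    "del_effects": {"clear(block)", "on-table(block)", "hand-empty"},
}

def applicable(schema: dict, state: set) -> bool:
    """An action is applicable when all of its preconditions hold in the state."""
    return schema["preconditions"] <= state

def apply_effects(schema: dict, state: set) -> set:
    """Successor state: remove delete-effects, then add add-effects."""
    return (state - schema["del_effects"]) | schema["add_effects"]

# Example: a block on the table, hand free.
state = {"clear(block)", "on-table(block)", "hand-empty"}
if applicable(pick_up, state):
    print(apply_effects(pick_up, state))  # {'holding(block)'}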
How they work (flow):
- Define the planning problem: initial state (or belief), goals, model, and constraints.
- Generate a plan/policy using search/optimization guided by heuristics or value estimates.
- Execute a prefix of the plan while monitoring outcomes.
- If the world or goals change, or predicted/observed diverge, update and replan.
Variants and integrations:
- Hierarchical Task Networks (HTN) and options/skills for long-horizon decomposition.
- Task and Motion Planning (TAMP) to couple symbolic task plans with geometric/kinodynamic feasibility.
- Learning-enhanced planning: learn models, heuristics, or skills; use RL for low-level control under a high-level planner.
- LLM-assisted planning: language models propose task decompositions/plans; external tools verify/solve and execution monitors close the loop.
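A rough sketch of that LLM-assisted pattern follows; llm_propose_plan, is_valid_step, and apply_step are placeholder callables standing in for an LLM client and a symbolic verifier, not a specific library API.

# LLM proposes, a symbolic validator disposes: reject infeasible plans and retry.
from typing import Callable, List, Optional, Set

def llm_assisted_plan(
    goal: str,
    initial_state: Set[str],
    llm_propose_plan: Callable[[str, Set[str]], List[str]],
    is_valid_step: Callable[[str, Set[str]], bool],
    apply_step: Callable[[str, Set[str]], Set[str]],
    max_attempts: int = 3,
) -> Optional[List[str]]:
    for _ in range(max_attempts):
        plan = llm_propose_plan(goal, initial_state)   # decomposition proposed in language
        state = set(initial_state)
        for step in plan:
            if not is_valid_step(step, state):         # external verifier rejects the step
                break                                   # ...so ask the LLM again
            state = apply_step(step, state)
        else:
            return plan                                 # every step checked out
    return None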
Advantages over other approaches:
- Handles long-horizon, multi-step tasks with explicit constraints.
- Interpretability and verifiability of plans; easier safety checks.
- Sample efficiency via model-based reasoning; strong generalization across goals with the same model.
- Composability: reuse of skills and subplans; clean integration of domain knowledge and hard constraints.
Trade-offs:
- Requires a sufficiently accurate model; model errors can mislead plans.
- Computational overhead for large, continuous, or highly uncertain spaces.
- Needs robust monitoring and replanning to handle non-stationarity and partial observability.
Minimal Working Examples (Code):
Snippet 1 — Generic A* planner
import heapq
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Action = Any
State = Any

def a_star(
    start: State,
    is_goal: Callable[[State], bool],
    neighbors: Callable[[State], Iterable[Tuple[Action, State, float]]],
    heuristic: Callable[[State], float],
) -> Optional[List[Action]]:
    """Return a list of actions from start to a goal, or None if no path."""
    open_heap = []
    g: Dict[State, float] = {start: 0.0}
    came_from: Dict[State, Tuple[State, Action]] = {}
    counter = 0
    heapq.heappush(open_heap, (heuristic(start), counter, start))
    closed = set()
    while open_heap:
        _, _, s = heapq.heappop(open_heap)
        if is_goal(s):
            return _reconstruct_actions(came_from, s)
        if s in closed:
            continue
        closed.add(s)
        for a, s2, cost in neighbors(s):
            new_g = g[s] + cost
            if s2 not in g or new_g < g[s2]:
                g[s2] = new_g
                came_from[s2] = (s, a)
                counter += 1
                heapq.heappush(open_heap, (new_g + heuristic(s2), counter, s2))
    return None

def _reconstruct_actions(came_from: Dict[State, Tuple[State, Action]], goal: State) -> List[Action]:
    actions: List[Action] = []
    s = goal
    while s in came_from:
        prev, a = came_from[s]
        actions.append(a)
        s = prev
    actions.reverse()
    return actions
Snippet 2 — Grid domain (planning model)
from typing import Callable, Iterable, Set, Tuple

Pos = Tuple[int, int]
Action = str  # 'U', 'D', 'L', 'R'

MOVES = {
    'U': (0, -1),
    'D': (0, 1),
    'L': (-1, 0),
    'R': (1, 0),
}

def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def make_neighbors(
    width: int, height: int, known_blocked: Set[Pos]
) -> Callable[[Pos], Iterable[Tuple[Action, Pos, float]]]:
    def neighbors(s: Pos) -> Iterable[Tuple[Action, Pos, float]]:
        for a, (dx, dy) in MOVES.items():
            nx, ny = s[0] + dx, s[1] + dy
            s2 = (nx, ny)
            if 0 <= nx < width and 0 <= ny < height and s2 not in known_blocked:
                yield (a, s2, 1.0)
    return neighbors

# Example planning call:
# width, height = 6, 4
# start, goal = (0, 0), (5, 3)
# known_obstacles = {(2, 1)}
# plan = a_star(
#     start=start,
#     is_goal=lambda s: s == goal,
#     neighbors=make_neighbors(width, height, known_obstacles),
#     heuristic=lambda s: manhattan(s, goal),
# )
# print("Plan:", plan)
Snippet 3 — Plan-and-solve loop with replanning on surprises
from typing import List, Optional, Set

class GridEnv:
    """The real world; may contain obstacles unknown to the agent initially."""

    def __init__(self, width: int, height: int, true_obstacles: Set[Pos]):
        self.w, self.h = width, height
        self.obstacles = set(true_obstacles)

    def blocked(self, p: Pos) -> bool:
        x, y = p
        if not (0 <= x < self.w and 0 <= y < self.h):
            return True
        return p in self.obstacles

def apply_action(s: Pos, a: Action) -> Pos:
    dx, dy = MOVES[a]
    return (s[0] + dx, s[1] + dy)

def plan_once(start: Pos, goal: Pos, width: int, height: int, known_blocked: Set[Pos]) -> Optional[List[Action]]:
    return a_star(
        start=start,
        is_goal=lambda s: s == goal,
        neighbors=make_neighbors(width, height, known_blocked),
        heuristic=lambda s: manhattan(s, goal),
    )

def execute_with_replanning(
    env: GridEnv,
    start: Pos,
    goal: Pos,
    known_blocked: Set[Pos],
) -> Optional[List[Action]]:
    """Execute plan; if an action fails due to an unknown obstacle, update the model and replan."""
    s = start
    full_trace: List[Action] = []
    while s != goal:
        plan = plan_once(s, goal, env.w, env.h, known_blocked)
        if plan is None:
            print("No path found. Giving up.")
            return None
        for a in plan:
            s2 = apply_action(s, a)
            if env.blocked(s2):
                # Surprise! Update the model and replan from the current state.
                known_blocked.add(s2)
                print(f"Discovered obstacle at {s2}. Replanning...")
                break
            # Action succeeded; move to the next state.
            s = s2
            full_trace.append(a)
            if s == goal:
                return full_trace
        # If the plan is consumed without surprises but the goal is not reached
        # (which should not happen with a consistent model), the while loop replans.
    return full_trace

# Demo
if __name__ == "__main__":
    width, height = 6, 4
    start, goal = (0, 0), (5, 3)
    # Agent's initial belief (known obstacles)
    known_obstacles: Set[Pos] = {(2, 1)}
    # The real environment contains extra, unknown obstacles
    true_obstacles: Set[Pos] = {(2, 1), (3, 0), (3, 1)}
    env = GridEnv(width, height, true_obstacles)
    result = execute_with_replanning(env, start, goal, set(known_obstacles))
    print("Final action sequence:", result)
What this shows:
- Planning: A* searches using a heuristic and a neighbor function derived from the agent’s current model.
- Solving: An execution loop runs the plan. When execution reveals a previously unknown obstacle, the agent updates its model and replans from the current state, embodying the plan-and-solve pattern.
Practical applications of plan-and-solve AI agents
- Robotics (task and motion planning)
  - Planner: Chooses symbolic actions (pick, place, open, move) to achieve a goal under preconditions/effects.
  - Solver: Computes feasible motions/grasps/IK for each action and checks collisions.
  - Example: A kitchen robot plans “open-fridge → grasp-bottle → place-on-tray,” then solves each step with RRT*/IK; if a door is blocked, it replans with an alternative path or order.
- Warehouse fulfillment and multi-robot coordination
  - Planner: Assigns orders to robots and sequences picks/placements (HTN/PDDL).
  - Solver: Multi-agent path finding (e.g., CBS) generates collision-free routes and timing.
  - Example: Dozens of AMRs get aisle-level tasks; the solver schedules paths to avoid deadlocks and meets SLAs; when congestion spikes, the planner reorders tasks and triggers re-routing.
- Autonomous driving
  - Planner: High-level route and behavioral plan (lane changes, merges, yields).
  - Solver: Model predictive control or trajectory optimization respecting vehicle dynamics and constraints.
  - Example: The car plans “merge after truck passes,” then solves a safe trajectory; a pedestrian detected unexpectedly triggers plan revision and a new solved trajectory.
- Logistics and fleet routing
  - Planner: Allocates shipments, selects depots, sequences stops (VRP/VRPTW).
  - Solver: MILP/CP-SAT computes routes with capacity, time windows, and driver rules.
  - Example: A carrier plans daily routes, then solves optimal assignments; a road closure forces incremental re-optimization for affected vehicles only.
- Game AI (GOAP/HTN)
  - Planner: Generates goal-oriented action plans for NPCs (e.g., “flank → suppress → retreat”).
  - Solver: Pathfinding (A*), cover selection, combat micro-tactics.
  - Example: An enemy NPC plans to flank the player, solves a path avoiding line-of-sight, and if the player blocks the route, the plan is repaired mid-execution.
- Manufacturing and scheduling
  - Planner: Orders jobs, machine assignments, and changeovers to meet due dates.
  - Solver: Constraint programming or MILP for detailed schedules with setup times and maintenance windows (see the CP-SAT sketch after this list).
  - Example: A fab plans a batch sequence; the solver fits it into finite-capacity schedules; a tool failure triggers localized rescheduling without scrapping the whole plan.
- Cloud operations and incident response
  - Planner: Runbook-level steps (triage, rollback, canary, scale-out).
  - Solver: Executes diagnostics, queries telemetry, and applies changes via APIs with verification gates.
  - Example: During an outage, the agent plans “roll back → verify error rate → re-enable,” solves each step via scripts, and halts if SLOs fail to recover.
- Healthcare rostering and OR scheduling
  - Planner: Staff assignments, operating room blocks, patient sequencing under policies.
  - Solver: CP/MILP enforces skills, rest, equipment, and emergency buffers.
  - Example: A hospital schedules the week; when an emergency case arrives, the agent replans swaps and solves a minimally disruptive new schedule.
- Scientific automation (labs)
  - Planner: Designs experiment sequences and controls variables (DoE/HTN).
  - Solver: Executes protocols on robots; optimizes parameters via BO/MPC.
  - Example: An agent plans a synthesis route, runs liquid handler steps, and iteratively solves for yield-improving conditions based on assay feedback.
- Enterprise workflows and RPA
  - Planner: Decomposes multi-step business processes (KYC, claims, onboarding).
  - Solver: Fills forms, queries services, and verifies constraints.
  - Example: For an insurance claim, the agent plans documents to gather, solves data extraction and validation steps, and re-plans when a document is missing.
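For a taste of the solver side in the scheduling examples above, here is a minimal single-machine sketch using OR-Tools CP-SAT (assumes pip install ortools; the job names and durations are invented):

# Minimal single-machine schedule with OR-Tools CP-SAT: jobs must not overlap,
# and we minimize the makespan. Job names and durations are illustrative only.
from ortools.sat.python import cp_model

durations = {"jobA": 3, "jobB": 2, "jobC": 4}
horizon = sum(durations.values())

model = cp_model.CpModel()
starts, ends, intervals = {}, [], []
for name, dur in durations.items():
    start = model.NewIntVar(0, horizon, f"start_{name}")
    end = model.NewIntVar(0, horizon, f"end_{name}")
    intervals.append(model.NewIntervalVar(start, dur, end, f"iv_{name}"))
    starts[name] = start
    ends.append(end)

model.AddNoOverlap(intervals)              # one machine: no two jobs at the same time
makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, ends)
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name, start in starts.items():
        print(name, "starts at", solver.Value(start))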
Why it works:
The planner provides long-horizon structure and goal alignment; the solver ensures each step is feasible under real-world constraints. Together, they enable robust, adaptive behavior in dynamic, high-stakes environments.
Conclusion
Plan-and-solve agents pair an explicit planner (to search over action sequences using a model of the world) with an executor that monitors progress and replans as needed. This separation makes them strong on long-horizon, constraint-heavy tasks where transparency, correctness, and recoverability matter.
Key takeaways:
- Structure: Represent state/actions/goals → plan (e.g., A*, PDDL planner) → execute/monitor → replan on mismatch.
- Strengths: Sample efficiency, interpretability, global reasoning, safety via constraint checking.
- Trade-offs: Need a decent model, computational overhead, sensitivity to model errors.
- When to use: Robotics, logistics/scheduling, game AI, automated operations—especially when constraints and guarantees are important.
- Practical tip: Start simple (A* over a clean state space), add heuristics, then introduce uncertainty handling (replanning, MCTS, MPC) and learning for models/heuristics.