Note: This article is the result of executing a Plan-and-Solve-type agent that was implemented during a tutorial. The tutorial itself is part of a post discussing autonomous agents and their design. I do not guarantee the content’s quality (I am not really a fan of it). I posted it because I found the idea of a plan-and-solve agent generating a post about itself funny.
Plan-and-Solve AI Agents: How Planning Meets Execution
Introduction:
AI agents are systems that perceive, decide, and act to achieve goals. Plan-and-solve agents make this process explicit: they first construct a plan (a sequence of subgoals or actions) and then execute and adapt that plan to solve the task. By separating “what to do” (planning) from “how to do it” (solving/execution), they offer a transparent, reliable way to handle complex, multi-step problems.
Why this approach matters:
- Reliability and transparency: Explicit plans are inspectable, debuggable, and auditable.
- Generalization: Plans compose reusable skills to handle new tasks and changing goals.
- Efficiency: Search and constraint-solving reduce trial-and-error common in purely reactive systems.
- Safety and control: Clear constraints and checkpoints enable oversight and safe interventions.
Problems it addresses:
- Long-horizon tasks with dependencies (e.g., robotics workflows, multi-hop tool use).
- Combinatorial optimization (e.g., logistics, scheduling, routing).
- Non-stationary environments requiring monitoring and replanning.
- Multi-step reasoning in domains like code generation, game AI, and operations.
In what follows, we’ll unpack the core ideas behind plan-and-solve agents, show minimal code to illustrate planning and execution loops, and highlight practical applications.
Theory: What are plan-and-solve AI agents?
Definition:
- Plan-and-solve agents are model-based agents that explicitly construct a plan using a world model and then execute it while monitoring the environment and replanning as needed. They differ from purely reactive agents (which map observations to actions directly) and from end-to-end learned policies (which typically do not expose an explicit plan).
Core loop:
- Sense: Perceive/estimate current state.
- Plan: Search/optimize over a model to produce a plan or policy to reach a goal under constraints.
- Execute: Carry out actions, often via a low-level controller.
- Monitor: Compare expected vs observed outcomes.
- Replan: Update state/belief and replan on deviations or new information.
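To make the loop concrete, here is a minimal, schematic Python sketch. It is illustrative only: sense, plan, execute_step, diverged, and model.predict are hypothetical callables standing in for a real perception stack, planner, executor, and world model.

# Schematic plan-and-solve loop; all callables here are hypothetical placeholders.
def plan_and_solve(goal, model, sense, plan, execute_step, diverged, max_steps=100):
    state = sense()                               # Sense: estimate the current state
    steps = plan(state, goal, model)              # Plan: search/optimize over the model
    for _ in range(max_steps):
        if not steps:                             # Plan consumed: done (or nothing left to do)
            break
        action = steps.pop(0)
        expected = model.predict(state, action)   # What the model says should happen
        execute_step(action)                      # Execute: act in the real environment
        state = sense()                           # Monitor: observe the actual outcome
        if diverged(expected, state):             # Replan when prediction and reality disagree
            steps = plan(state, goal, model)
    return state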
Problem formalization:
- State space S, actions A, transition model T (deterministic or stochastic), costs/rewards, and goal set G or terminal utility.
- Deterministic planning: search for a sequence a1:k minimizing cumulative cost to reach G (e.g., A*, D*, SAT/SMT, MILP); the objective is written out after this list.
- Stochastic planning: plan a policy π(s) in an MDP; under partial observability maintain a belief b(s) and plan in a POMDP.
- Constraints: hard (safety, resources) and soft (preferences), encoded in logic, linear constraints, or task schemas.
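Written out for the deterministic case (using the notation above, with c denoting the per-step cost and s_init the initial state), the planning problem is roughly:

\min_{a_{1:k}} \; \sum_{i=1}^{k} c(s_{i-1}, a_i) \quad \text{s.t.} \quad s_i = T(s_{i-1}, a_i), \quad s_0 = s_{\mathrm{init}}, \quad s_k \in G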
Key components:
- World model: Dynamics, action preconditions/effects, cost/reward, constraints; represented symbolically (e.g., STRIPS/PDDL; see the action-schema sketch after this list), probabilistically, or differentiably.
- Planner: Search or optimization (A*, heuristic search, graph planning, SATPlan, HTN, MPC, sampling-based motion planners).
- Hierarchy: High-level task planning (discrete) with low-level skills/controllers (continuous) for tractability.
- Executor: Turns plan steps into concrete actions; handles timing, concurrency, and feedback control.
- Monitor/replanner: Detects deviations, updates belief/state, triggers replanning (e.g., model predictive control/receding horizon).
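To make the symbolic world-model idea concrete, here is a tiny STRIPS-style sketch in plain Python (not actual PDDL; the pick-up schema and its predicate strings are made-up illustrations):

# A STRIPS-style action schema as plain data (illustrative; not real PDDL syntax).
pick_up = {
    "name": "pick-up(block)",
    "preconditions": {"clear(block)", "on-table(block)", "hand-empty"},
    "add_effects": {"holding(block)"},
    "del_effects": {"clear(block)", "on-table(block)", "hand-empty"},
}

def applicable(schema: dict, state: set) -> bool:
    """An action is applicable when all of its preconditions hold in the state."""
    return schema["preconditions"] <= state

def apply_effects(schema: dict, state: set) -> set:
    """Successor state: remove delete-effects, then add add-effects."""
    return (state - schema["del_effects"]) | schema["add_effects"]

# Example: a block on the table, hand free.
state = {"clear(block)", "on-table(block)", "hand-empty"}
if applicable(pick_up, state):
    print(apply_effects(pick_up, state))  # {'holding(block)'}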
How they work (flow):
- Define the planning problem: initial state (or belief), goals, model, and constraints.
- Generate a plan/policy using search/optimization guided by heuristics or value estimates.
- Execute a prefix of the plan while monitoring outcomes.
- If the world or goals change, or predicted/observed diverge, update and replan.
Variants and integrations:
- Hierarchical Task Networks (HTN) and options/skills for long-horizon decomposition.
- Task and Motion Planning (TAMP) to couple symbolic task plans with geometric/kinodynamic feasibility.
- Learning-enhanced planning: learn models, heuristics, or skills; use RL for low-level control under a high-level planner.
- LLM-assisted planning: language models propose task decompositions/plans; external tools verify/solve and execution monitors close the loop.
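A rough sketch of that LLM-assisted pattern follows; llm_propose_plan, is_valid_step, and apply_step are placeholder callables standing in for an LLM client and a symbolic verifier, not a specific library API.

# LLM proposes, a symbolic validator disposes: reject infeasible plans and retry.
from typing import Callable, List, Optional, Set

def llm_assisted_plan(
    goal: str,
    initial_state: Set[str],
    llm_propose_plan: Callable[[str, Set[str]], List[str]],
    is_valid_step: Callable[[str, Set[str]], bool],
    apply_step: Callable[[str, Set[str]], Set[str]],
    max_attempts: int = 3,
) -> Optional[List[str]]:
    for _ in range(max_attempts):
        plan = llm_propose_plan(goal, initial_state)   # decomposition proposed in language
        state = set(initial_state)
        for step in plan:
            if not is_valid_step(step, state):         # external verifier rejects the step
                break                                   # ...so ask the LLM again
            state = apply_step(step, state)
        else:
            return plan                                 # every step checked out
    return None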
Advantages over other approaches:
- Handles long-horizon, multi-step tasks with explicit constraints.
- Interpretability and verifiability of plans; easier safety checks.
- Sample efficiency via model-based reasoning; strong generalization across goals with the same model.
- Composability: reuse of skills and subplans; clean integration of domain knowledge and hard constraints.
Trade-offs:
- Requires a sufficiently accurate model; model errors can mislead plans.
- Computational overhead for large, continuous, or highly uncertain spaces.
- Needs robust monitoring and replanning to handle non-stationarity and partial observability.
Minimal Working Examples (Code):
Snippet 1 — Generic A* planner
import heapq
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Action = Any
State = Any

def a_star(
    start: State,
    is_goal: Callable[[State], bool],
    neighbors: Callable[[State], Iterable[Tuple[Action, State, float]]],
    heuristic: Callable[[State], float],
) -> Optional[List[Action]]:
    """Return a list of actions from start to a goal, or None if no path."""
    open_heap = []
    g: Dict[State, float] = {start: 0.0}
    came_from: Dict[State, Tuple[State, Action]] = {}
    counter = 0
    heapq.heappush(open_heap, (heuristic(start), counter, start))
    closed = set()
    while open_heap:
        _, _, s = heapq.heappop(open_heap)
        if is_goal(s):
            return _reconstruct_actions(came_from, s)
        if s in closed:
            continue
        closed.add(s)
        for a, s2, cost in neighbors(s):
            new_g = g[s] + cost
            if s2 not in g or new_g < g[s2]:
                g[s2] = new_g
                came_from[s2] = (s, a)
                counter += 1
                heapq.heappush(open_heap, (new_g + heuristic(s2), counter, s2))
    return None

def _reconstruct_actions(came_from: Dict[State, Tuple[State, Action]], goal: State) -> List[Action]:
    actions: List[Action] = []
    s = goal
    while s in came_from:
        prev, a = came_from[s]
        actions.append(a)
        s = prev
    actions.reverse()
    return actions
Snippet 2 — Grid domain (planning model)
from typing import Callable, Iterable, Set, Tuple

Pos = Tuple[int, int]
Action = str  # 'U', 'D', 'L', 'R'

MOVES = {
    'U': (0, -1),
    'D': (0, 1),
    'L': (-1, 0),
    'R': (1, 0),
}

def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def make_neighbors(
    width: int, height: int, known_blocked: Set[Pos]
) -> Callable[[Pos], Iterable[Tuple[Action, Pos, float]]]:
    def neighbors(s: Pos) -> Iterable[Tuple[Action, Pos, float]]:
        for a, (dx, dy) in MOVES.items():
            nx, ny = s[0] + dx, s[1] + dy
            s2 = (nx, ny)
            if 0 <= nx < width and 0 <= ny < height and s2 not in known_blocked:
                yield (a, s2, 1.0)
    return neighbors

# Example planning call:
# width, height = 6, 4
# start, goal = (0, 0), (5, 3)
# known_obstacles = {(2, 1)}
# plan = a_star(
#     start=start,
#     is_goal=lambda s: s == goal,
#     neighbors=make_neighbors(width, height, known_obstacles),
#     heuristic=lambda s: manhattan(s, goal),
# )
# print("Plan:", plan)
Snippet 3 — Plan-and-solve loop with replanning on surprises
from typing import List, Optional, Set

class GridEnv:
    """The real world; may contain obstacles unknown to the agent initially."""

    def __init__(self, width: int, height: int, true_obstacles: Set[Pos]):
        self.w, self.h = width, height
        self.obstacles = set(true_obstacles)

    def blocked(self, p: Pos) -> bool:
        x, y = p
        if not (0 <= x < self.w and 0 <= y < self.h):
            return True
        return p in self.obstacles

def apply_action(s: Pos, a: Action) -> Pos:
    dx, dy = MOVES[a]
    return (s[0] + dx, s[1] + dy)

def plan_once(start: Pos, goal: Pos, width: int, height: int, known_blocked: Set[Pos]) -> Optional[List[Action]]:
    return a_star(
        start=start,
        is_goal=lambda s: s == goal,
        neighbors=make_neighbors(width, height, known_blocked),
        heuristic=lambda s: manhattan(s, goal),
    )

def execute_with_replanning(
    env: GridEnv,
    start: Pos,
    goal: Pos,
    known_blocked: Set[Pos],
) -> Optional[List[Action]]:
    """Execute plan; if an action fails due to an unknown obstacle, update the model and replan."""
    s = start
    full_trace: List[Action] = []
    while s != goal:
        plan = plan_once(s, goal, env.w, env.h, known_blocked)
        if plan is None:
            print("No path found. Giving up.")
            return None
        for a in plan:
            s2 = apply_action(s, a)
            if env.blocked(s2):
                # Surprise! Update the model and replan from the current state.
                known_blocked.add(s2)
                print(f"Discovered obstacle at {s2}. Replanning...")
                break
            # Action succeeded; move to the next state.
            s = s2
            full_trace.append(a)
            if s == goal:
                return full_trace
        # If the plan is consumed without surprises but the goal is not reached
        # (which should not happen with a consistent model), the while loop replans.
    return full_trace

# Demo
if __name__ == "__main__":
    width, height = 6, 4
    start, goal = (0, 0), (5, 3)
    # Agent's initial belief (known obstacles)
    known_obstacles: Set[Pos] = {(2, 1)}
    # The real environment contains extra, unknown obstacles
    true_obstacles: Set[Pos] = {(2, 1), (3, 0), (3, 1)}
    env = GridEnv(width, height, true_obstacles)
    result = execute_with_replanning(env, start, goal, set(known_obstacles))
    print("Final action sequence:", result)
What this shows:
- Planning: A* searches using a heuristic and a neighbor function derived from the agent’s current model.
- Solving: An execution loop runs the plan. When execution reveals a previously unknown obstacle, the agent updates its model and replans from the current state, embodying the plan-and-solve pattern.
Practical applications of plan-and-solve AI agents
- Robotics (task and motion planning)
  - Planner: Chooses symbolic actions (pick, place, open, move) to achieve a goal under preconditions/effects.
  - Solver: Computes feasible motions/grasps/IK for each action and checks collisions.
  - Example: A kitchen robot plans “open-fridge → grasp-bottle → place-on-tray,” then solves each step with RRT*/IK; if a door is blocked, it replans with an alternative path or order.
- Warehouse fulfillment and multi-robot coordination
  - Planner: Assigns orders to robots and sequences picks/placements (HTN/PDDL).
  - Solver: Multi-agent path finding (e.g., CBS) generates collision-free routes and timing.
  - Example: Dozens of AMRs get aisle-level tasks; the solver schedules paths to avoid deadlocks and meets SLAs; when congestion spikes, the planner reorders tasks and triggers re-routing.
- Autonomous driving
  - Planner: High-level route and behavioral plan (lane changes, merges, yields).
  - Solver: Model predictive control or trajectory optimization respecting vehicle dynamics and constraints.
  - Example: The car plans “merge after truck passes,” then solves a safe trajectory; a pedestrian detected unexpectedly triggers plan revision and a new solved trajectory.
- Logistics and fleet routing
  - Planner: Allocates shipments, selects depots, sequences stops (VRP/VRPTW).
  - Solver: MILP/CP-SAT computes routes with capacity, time windows, and driver rules.
  - Example: A carrier plans daily routes, then solves optimal assignments; a road closure forces incremental re-optimization for affected vehicles only.
- Game AI (GOAP/HTN)
  - Planner: Generates goal-oriented action plans for NPCs (e.g., “flank → suppress → retreat”).
  - Solver: Pathfinding (A*), cover selection, combat micro-tactics.
  - Example: An enemy NPC plans to flank the player, solves a path avoiding line-of-sight, and if the player blocks the route, the plan is repaired mid-execution.
- Manufacturing and scheduling
  - Planner: Orders jobs, machine assignments, and changeovers to meet due dates.
  - Solver: Constraint programming or MILP for detailed schedules with setup times and maintenance windows (see the CP-SAT sketch after this list).
  - Example: A fab plans a batch sequence; the solver fits it into finite-capacity schedules; a tool failure triggers localized rescheduling without scrapping the whole plan.
- Cloud operations and incident response
  - Planner: Runbook-level steps (triage, rollback, canary, scale-out).
  - Solver: Executes diagnostics, queries telemetry, and applies changes via APIs with verification gates.
  - Example: During an outage, the agent plans “roll back → verify error rate → re-enable,” solves each step via scripts, and halts if SLOs fail to recover.
- Healthcare rostering and OR scheduling
  - Planner: Staff assignments, operating room blocks, patient sequencing under policies.
  - Solver: CP/MILP enforces skills, rest, equipment, and emergency buffers.
  - Example: A hospital schedules the week; when an emergency case arrives, the agent replans swaps and solves a minimally disruptive new schedule.
- Scientific automation (labs)
  - Planner: Designs experiment sequences and controls variables (DoE/HTN).
  - Solver: Executes protocols on robots; optimizes parameters via BO/MPC.
  - Example: An agent plans a synthesis route, runs liquid handler steps, and iteratively solves for yield-improving conditions based on assay feedback.
- Enterprise workflows and RPA
  - Planner: Decomposes multi-step business processes (KYC, claims, onboarding).
  - Solver: Fills forms, queries services, and verifies constraints.
  - Example: For an insurance claim, the agent plans documents to gather, solves data extraction and validation steps, and re-plans when a document is missing.
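For a taste of the solver side in the scheduling examples above, here is a minimal single-machine sketch using OR-Tools CP-SAT (assumes pip install ortools; the job names and durations are invented):

# Minimal single-machine schedule with OR-Tools CP-SAT: jobs must not overlap,
# and we minimize the makespan. Job names and durations are illustrative only.
from ortools.sat.python import cp_model

durations = {"jobA": 3, "jobB": 2, "jobC": 4}
horizon = sum(durations.values())

model = cp_model.CpModel()
starts, ends, intervals = {}, [], []
for name, dur in durations.items():
    start = model.NewIntVar(0, horizon, f"start_{name}")
    end = model.NewIntVar(0, horizon, f"end_{name}")
    intervals.append(model.NewIntervalVar(start, dur, end, f"iv_{name}"))
    starts[name] = start
    ends.append(end)

model.AddNoOverlap(intervals)              # one machine: no two jobs at the same time
makespan = model.NewIntVar(0, horizon, "makespan")
model.AddMaxEquality(makespan, ends)
model.Minimize(makespan)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for name, start in starts.items():
        print(name, "starts at", solver.Value(start))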
Why it works:
The planner provides long-horizon structure and goal alignment; the solver ensures each step is feasible under real-world constraints. Together, they enable robust, adaptive behavior in dynamic, high-stakes environments.
Conclusion
Plan-and-solve agents pair an explicit planner (to search over action sequences using a model of the world) with an executor that monitors progress and replans as needed. This separation makes them strong on long-horizon, constraint-heavy tasks where transparency, correctness, and recoverability matter.
Key takeaways:
- Structure: Represent state/actions/goals → plan (e.g., A*, PDDL planner) → execute/monitor → replan on mismatch.
- Strengths: Sample efficiency, interpretability, global reasoning, safety via constraint checking.
- Trade-offs: Need a decent model, computational overhead, sensitivity to model errors.
- When to use: Robotics, logistics/scheduling, game AI, automated operations—especially when constraints and guarantees are important.
- Practical tip: Start simple (A* over a clean state space), add heuristics, then introduce uncertainty handling (replanning, MCTS, MPC) and learning for models/heuristics.