How to Build an AI-Powered Constraint Optimizer with GPT-4o, FastAPI, and WebSockets

You have a scheduling problem. Twenty nurses across three shifts, ten constraints about who can't work back-to-back, four nurses who need weekends off, and a legal requirement that no one works more than 48 hours. You could model it in Excel, fight OR-Tools for a weekend, or pay for a commercial solver license you'll use once.
Or you could type the problem in plain English and let an AI system figure it out.
That is exactly what the AI Constraint Optimizer does. It is a full-stack application that accepts natural language descriptions of optimization problems, uses GPT-4o to parse them into a structured representation, selects the best algorithm from a library of eight solvers, runs the solver while streaming a live narrated explanation to the browser in real time, and then provides an interactive Q&A interface so users can interrogate the solution.
Real-world use cases it handles out of the box:
- Shift scheduling — assign employees to morning/afternoon/night with fairness and regulatory constraints
- Resource allocation — knapsack packing, budget distribution, task-to-worker assignment
- Constraint satisfaction — graph coloring, seating arrangements, exam timetabling
- Linear programming — minimizing cost or maximizing throughput on continuous variables
- Educational demonstrations — watch a genetic algorithm evolve in real time with narrated explanations
- Rapid prototyping — explore a constraint model before investing in a full commercial solver
This post covers the architecture, the key design decisions, the implementation phases, and the non-obvious engineering challenges that took weeks to get right. It does not include the full source code — that is in the course at labs.codersarts.com.
📄 Before you dive in — grab the free PRD template that maps out this entire system: architecture, API spec, sprint plan, and system prompt. [Download the free PRD]
How It Works: Core Concept
The fundamental problem with constraint optimization
Constraint satisfaction and optimization problems have been studied for decades. The tooling (OR-Tools, CPLEX, Gurobi, scipy) is mature. But all of it assumes you can write a formal model: decision variables, constraint functions, an objective. That requires mathematical fluency. A hotel manager who knows perfectly well that "the breakfast chef can't start before 6 am and needs four hours between shifts" is not going to write an integer linear program.
The naive approach — just ask an LLM to solve the optimization directly — fails badly. LLMs hallucinate feasibility, ignore constraints they haven't been explicitly prompted about, and produce solutions that look plausible but violate the very constraints the user stated. They are language models, not constraint solvers.
The solution: LLM as translator, deterministic solver as engine
The Constraint Optimizer treats the LLM as a translator, not a solver. The LLM's job is to convert unstructured English into a structured Intermediate Representation (IR): a formal list of variables, their domains, the constraints between them, and the objective to optimize. Once the IR exists, a deterministic, seeded solver runs the actual optimization. The LLM then narrates what the solver is doing in real time, and explains the solution afterwards.
Think of it like a skilled interpreter at a UN meeting: the interpreter doesn't make the decisions — they translate with precision so the decision-making process can run reliably.
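To make the IR concrete, here is a sketch of what the parser might emit for a small nurse-scheduling problem. The field names and constraint types here are illustrative, not the course's exact schema:

```python
# Hypothetical IR for "Schedule 8 nurses across 3 shifts, minimum 2 per shift".
# Field names and constraint types are illustrative only.
ir = {
    "variables": [f"nurse_{i}" for i in range(1, 9)],
    "domains": {f"nurse_{i}": ["morning", "afternoon", "night"] for i in range(1, 9)},
    "constraints": [
        # at least 2 nurses must be assigned to each shift value
        {"type": "count_range", "value": "morning", "min": 2},
        {"type": "count_range", "value": "afternoon", "min": 2},
        {"type": "count_range", "value": "night", "min": 2},
    ],
    "objective": {"type": "minimize", "metric": "shift_imbalance"},
}
```

Everything downstream — the selector, the solvers, the explainer — consumes this structure instead of the raw English.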
ASCII data-flow diagram
USER INPUT (natural language problem description)
│
▼
┌─────────────────────┐
│ PARSER AGENT │ GPT-4o (JSON mode)
│ NL → IR │ Extracts: variables, domains,
│ │ constraints, objective
└────────┬────────────┘
│ IR (Intermediate Representation)
▼
┌─────────────────────┐
│ SELECTOR AGENT │ GPT-4o analyzes IR structure
│ IR → Algorithm │ Recommends 1 of 8 solvers
└────────┬────────────┘
│ Solver name + reasoning
▼
┌────────────────────────────────────────────────────┐
│ EXECUTION ENGINE │
│ │
│ Thread Pool (sync solver) ←──────────────────┐ │
│ │ yields SolveEvents │ │
│ ▼ │ │
│ asyncio.Queue ──── Narration throttle ────────┘ │
│ │ (every Nth iteration) │
│ ▼ │
│ WebSocket stream → browser │
└────────────────────────────────────────────────────┘
│ SolutionResult
▼
┌─────────────────────┐
│ EXPLAINER AGENT │ GPT-4o streaming
│ Solution → prose │ + interactive Q&A
└─────────────────────┘
│
▼
BROWSER UI
(Narration feed, solution tabs, chat)
System Architecture Deep Dive
Architecture overview
The system is divided into five layers:
Frontend (Next.js / React) — A two-column layout with a problem input panel on the left and a narration feed + solution display on the right. A custom React hook (useOptimizer) owns all WebSocket state and exposes clean callbacks to components. Three solution tabs show grouped assignments, raw table, and constraint satisfaction matrix.
Backend API (FastAPI) — Handles both REST endpoints (health, solver listing, parse, select) and a WebSocket endpoint that drives the full pipeline end to end.
AI Agent Layer (OpenAI GPT-4o) — Four agents: Parser (NL → IR), Selector (IR → algorithm), Narrator (events → prose), Explainer (solution → explanation + Q&A). Each agent has a focused system prompt and operates independently.
Solver Engine (Pure Python) — Eight deterministic algorithms, all seeded, all yielding structured SolveEvent objects. The ConstraintChecker and ObjectiveEvaluator are shared utilities used by every solver.
Async/Sync Bridge (asyncio + ThreadPoolExecutor) — The most architecturally interesting layer: solvers are synchronous generators; the WebSocket handler is async. The bridge runs the solver in a thread pool and relays events through an asyncio.Queue using thread-safe calls.
Component reference table
| Component | Role | Technology Options |
| --- | --- | --- |
| Problem input UI | Accept NL text, solver picker, example loader | React / Svelte / Vue |
| State management | WebSocket state, streaming text accumulation | React hooks / Zustand / Jotai |
| WebSocket client | Real-time bidirectional messaging | Native WS / socket.io-client |
| REST API | Health, solver listing, parse, select endpoints | FastAPI / Flask / Django |
| WebSocket server | Full pipeline orchestration, event dispatch | FastAPI / Starlette / Tornado |
| Parser agent | NL → structured IR via JSON-mode LLM | GPT-4o / Claude / Gemini |
| Selector agent | IR analysis → algorithm recommendation | GPT-4o / rule-based / ensemble |
| Solver registry | 8 algorithm implementations, factory pattern | Pure Python / OR-Tools / PuLP |
| Narrator agent | Solver events → streaming prose | GPT-4o streaming / local LLM |
| Explainer agent | Solution → summary + Q&A | GPT-4o / Claude / RAG |
Data flow walkthrough
1. User submits a problem description (e.g., "Schedule 8 nurses across 3 shifts, no nurse works two consecutive days, minimum 2 nurses per shift").
2. WebSocket opens and the frontend sends {"type": "solve", "problem": "...", "solver": "auto"}.
3. Parser agent receives the text and calls GPT-4o in JSON mode. The model returns a structured IR with variables (nurse names), domains (["morning", "afternoon", "night"]), constraints (a no-consecutive-days constraint between day assignments and a count-range constraint for the per-shift minimum), and an objective (minimize imbalance).
4. Backend emits {"type": "ir", "data": {...}} so the frontend can display the parsed structure.
5. Selector agent analyzes the IR — variable count, constraint types, domain sizes — and recommends a solver (e.g., forward_checking for small CSPs).
6. Backend emits {"type": "solver_selected", "solver": "forward_checking", "reason": "..."}.
7. Executor spins up the solver in a thread pool. The solver yields SolveEvent objects (STARTED, ITERATION, IMPROVEMENT, COMPLETED) into an asyncio.Queue.
8. Narration throttle checks each event against a configurable interval (default: narrate every 15th ITERATION plus all IMPROVEMENT events). Significant events trigger a streaming LLM narration call.
9. Browser receives narration_chunk messages that render as a live text feed, plus progress events that update iteration counters.
10. Solver completes and emits a SolutionResult — the full assignment, constraint satisfaction report, objective score, and feasibility flag.
11. Explainer agent streams a post-solve explanation, then enters Q&A mode where the user can ask follow-up questions about specific assignments.
Two non-obvious design decisions
Decision 1: Composite scoring (obj_value + violations × 1000). Every solver uses the same composite_score = objective_value + violations × 1000 formula. The large penalty constant ensures any feasible solution outscores any infeasible one, regardless of objective scale. This single framework lets you swap algorithms without changing scoring logic, and it prevents metaheuristics from converging on infeasible "good-looking" solutions.
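As a sketch, the scoring rule is a one-liner; the only tuning knob is the penalty constant, which simply has to dominate the objective's realistic range:

```python
VIOLATION_PENALTY = 1000  # must exceed any realistic objective value

def composite_score(objective_value: float, violations: int) -> float:
    # minimizing: any feasible solution (0 violations) beats any infeasible one
    return objective_value + violations * VIOLATION_PENALTY
```

A feasible schedule with cost 900 still scores below an infeasible one with cost 3, so metaheuristics are always pulled toward feasibility first.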
Decision 2: Narration throttle at the executor level. Rather than building throttling into each solver or each narration call, the executor decides what is worth narrating based on event type and iteration count. This keeps solvers pure (no knowledge of narration), keeps the narrator pure (no knowledge of throttling), and makes the throttle interval configurable without touching either.
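A minimal version of that executor-level decision might look like this (event names follow the SolveEvent types above; the default interval is an assumption):

```python
def should_narrate(event_type: str, iteration: int, interval: int = 15) -> bool:
    """Executor-side throttle: solvers and the narrator never see this logic."""
    if event_type in ("STARTED", "IMPROVEMENT", "COMPLETED"):
        return True  # milestones are always worth narrating
    if event_type == "ITERATION":
        return iteration % interval == 0  # sample the steady-state chatter
    return False
```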
Tech Stack Recommendation
Stack A — Beginner / Prototype (build in a weekend)
| Layer | Technology | Why |
| --- | --- | --- |
| Frontend | Next.js 15 + Tailwind CSS | Full-stack React, zero config, great TypeScript support |
| Backend | FastAPI + Uvicorn | Async-native, automatic docs, tiny boilerplate |
| LLM | OpenAI GPT-4o | Best JSON-mode reliability for parsing; pay-per-use |
| Solvers | Pure Python (no dependencies) | No extra packages; backtracking + greedy for most CSPs |
| Communication | Native WebSockets (FastAPI) | Built in, no extra infrastructure |
| Storage | None (stateless) | Skip the database for V1; solutions are ephemeral |
| Deployment | localhost / Render free tier | Free, instant; no Docker needed for prototype |
Estimated monthly cost (prototype): $5–$20 (OpenAI API calls only, assuming low usage)
Stack B — Production-ready (designed to scale)
| Layer | Technology | Why |
| --- | --- | --- |
| Frontend | Next.js 15 + TypeScript + Tailwind | Type safety, ISR, optimized builds |
| Backend | FastAPI + Uvicorn (multi-worker) | Concurrent WebSocket connections, async solvers |
| LLM | OpenAI GPT-4o + prompt caching | Cache system prompts to cut token costs by 60–80% |
| Solvers | Pure Python + NumPy | Simplex requires NumPy; keeps scipy out of the bundle |
| Communication | WebSocket + REST (FastAPI) | WS for streaming, REST for stateless queries |
| Auth | Clerk / Auth0 | JWT-protected endpoints, per-user rate limits |
| Queue | Redis / asyncio.Queue | Per-connection queues; Redis for multi-server deployments |
| Deployment | Docker + Railway / AWS ECS | Container isolation, horizontal scaling |
| Monitoring | Sentry + Datadog | Error tracking + WebSocket connection metrics |
| Storage | PostgreSQL (optional) | Session history, user problem library |
Estimated monthly cost (production, moderate traffic): $80–$200/month (compute + OpenAI API)
Implementation Phases
Phase 1: Problem Parsing Pipeline
Build the foundation: a FastAPI backend with a single POST endpoint that accepts a natural language problem description and returns a structured IR. The key work here is prompt engineering the parser. You need to design a JSON schema for the IR that covers all the constraint types you want to support (all_different, sum constraints, count-range constraints, forbidden pairs) and instruct GPT-4o to populate it reliably.
Key decisions:
- Schema design: how granular should constraint types be?
- How do you handle ambiguous or contradictory constraints?
- Do you validate the IR or trust the model output?
- How do you map natural language patterns ("no more than three") to constraint parameters?
Prompt engineering for reliable structured IR extraction — including the exact JSON schema, system prompt, and validation fallbacks — is covered in detail in the full course with working, tested code.
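As a hedged sketch of the parser call: the system prompt below is a stand-in for the course's full prompt, and the helper only builds the request so the JSON-mode plumbing is visible:

```python
# Stand-in prompt; the course ships the full system prompt and JSON schema.
PARSER_SYSTEM_PROMPT = (
    "Convert the user's optimization problem into JSON with keys: "
    "variables, domains, constraints, objective. Output JSON only."
)

def build_parser_request(problem: str) -> dict:
    """Keyword arguments for client.chat.completions.create with JSON mode on."""
    return {
        "model": "gpt-4o",
        "response_format": {"type": "json_object"},  # valid JSON, not valid schema
        "temperature": 0,  # parsing should be as deterministic as the API allows
        "messages": [
            {"role": "system", "content": PARSER_SYSTEM_PROMPT},
            {"role": "user", "content": problem},
        ],
    }

# With the official client:
#   import json
#   from openai import OpenAI
#   resp = OpenAI().chat.completions.create(**build_parser_request("Schedule 8 nurses..."))
#   ir = json.loads(resp.choices[0].message.content)
```

Note that JSON mode guarantees syntactically valid JSON only, which is why validation (Phase 1's third decision) still matters.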
Phase 2: Solver Library and Constraint Framework
Implement the eight solvers and the shared ConstraintChecker / ObjectiveEvaluator utilities. Each solver must be a Python generator that yields SolveEvent objects — this contract is what lets the execution engine treat all solvers identically. Start with backtracking (the canonical CSP algorithm), then add greedy (useful baseline), then the metaheuristics.
Key decisions:
- What event types should solvers yield, and with what frequency?
- How do you implement partial constraint checking without full assignment?
- What is the right penalty weight for the composite scoring formula?
- How do you seed metaheuristics for reproducibility?
Implementing the composite scoring formula, the ConstraintChecker partial-check logic, and all eight solver generators — including the pure-NumPy simplex implementation — is covered in detail in the full course with working, tested code.
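The generator contract can be sketched with a toy exhaustive solver. SolveEvent's real fields and the event payloads are assumptions here; the yield pattern is the point:

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Iterator

@dataclass
class SolveEvent:
    type: str                 # STARTED | ITERATION | IMPROVEMENT | COMPLETED
    data: dict[str, Any] = field(default_factory=dict)

def exhaustive_solver(variables, domains, score) -> Iterator[SolveEvent]:
    """Toy solver honoring the contract: every solver yields the same event types."""
    yield SolveEvent("STARTED", {"variables": len(variables)})
    best, best_assignment = float("inf"), None
    for i, values in enumerate(itertools.product(*(domains[v] for v in variables))):
        assignment = dict(zip(variables, values))
        yield SolveEvent("ITERATION", {"iteration": i})
        if (s := score(assignment)) < best:
            best, best_assignment = s, assignment
            yield SolveEvent("IMPROVEMENT", {"score": best})
    yield SolveEvent("COMPLETED", {"assignment": best_assignment, "score": best})
```

Because every solver is just an `Iterator[SolveEvent]`, the execution engine can run backtracking and a genetic algorithm through identical plumbing.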
Phase 3: Async Execution Engine and WebSocket Streaming
This is the hardest phase. The solver runs synchronously (it is a generator in a loop); the WebSocket handler is async. You must bridge them without blocking the event loop. The solution uses Python's loop.run_in_executor() to run the solver in a thread pool, and loop.call_soon_threadsafe() to push events from the solver thread into an asyncio.Queue that the async handler drains.
Key decisions:
- How do you signal end-of-events safely across the thread boundary?
- What happens if the solver raises an exception in the thread pool?
- How do you cancel a long-running solver when the WebSocket closes?
- What is the right queue depth to prevent memory pressure?
The exact thread-pool/asyncio bridge pattern, including safe cancellation and exception propagation, is covered in detail in the full course with working, tested code.
Phase 4: Narration and AI Agent Integration
Wire in the narrator, explainer, and selector agents. The narrator must not block the solver — wrap every narration call in try/except, emit a narration_end sentinel even on failure, and always let the solver continue. The selector should run before the solver starts and its output should be visible to the user (they may want to override it).
Key decisions:
- How do you throttle narration to avoid spending $5 narrating a 10,000-iteration solve?
- How do you stream LLM output through WebSocket as it arrives?
- How do you maintain context across Q&A turns without re-sending the full solution?
- What system prompt makes the narrator explain algorithm behavior accessibly?
The full narration throttle logic, streaming explainer implementation, and multi-turn Q&A context management are covered in detail in the full course with working, tested code.
Phase 5: Frontend and Solution Visualization
Build the Next.js frontend with three solution views (grouped, table, constraint matrix), a live narration feed, and a chat interface. The most complex piece is state management: a single WebSocket connection sends many event types, and the React state must update incrementally as chunks arrive. A custom hook (useOptimizer) owns the WebSocket and exposes clean typed callbacks to components.
Key decisions:
- How do you accumulate streaming text chunks without re-rendering the entire feed?
- How do you show constraint satisfaction status with visual pass/fail indicators?
- How do you allow solver selection (dropdown) without blocking the automatic recommendation?
- How do you handle WebSocket reconnection on network drops?
The full frontend implementation — including the useOptimizer hook, streaming text accumulation, and the three solution view components — is covered in detail in the full course with working, tested code.
Common Challenges
1. The async/sync solver bridge freezes the event loop
Problem: Calling a synchronous generator directly inside an async WebSocket handler blocks the event loop. The entire server hangs while the solver runs — no other connections are served, no messages can be sent.
Root cause: Python's asyncio event loop is single-threaded. Blocking calls (including tight loops in synchronous generators) prevent await from yielding control.
Fix: Use loop.run_in_executor(None, run_solver_thread) to run the solver in a thread pool, where run_solver_thread is the function that drives the solver generator. Communicate events back via an asyncio.Queue using loop.call_soon_threadsafe(queue.put_nowait, event). The async handler drains the queue with await queue.get().
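A self-contained sketch of that bridge (the sentinel object and helper names are my choices, not necessarily the course's):

```python
import asyncio

SENTINEL = object()  # marks end-of-events across the thread boundary

def toy_solver():
    # stand-in for a synchronous solver generator
    for i in range(3):
        yield {"type": "ITERATION", "iteration": i}

async def run_bridged(solver_gen):
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def drive_solver():  # runs in the default thread pool
        try:
            for event in solver_gen:
                # queue methods are not thread-safe; schedule the put on the loop
                loop.call_soon_threadsafe(queue.put_nowait, event)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, SENTINEL)

    future = loop.run_in_executor(None, drive_solver)
    received = []
    while (event := await queue.get()) is not SENTINEL:
        received.append(event)  # real app: await websocket.send_json(event)
    await future  # re-raises any exception from the solver thread
    return received

events = asyncio.run(run_bridged(toy_solver()))
```

Because the sentinel is pushed in a `finally` block, the async side always unblocks, and awaiting the executor future afterwards surfaces any exception the solver raised in the thread.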
2. LLM returns malformed or partial IR
Problem: GPT-4o occasionally returns JSON that is syntactically valid but semantically incomplete — missing constraint parameters, empty variable lists, or constraint types the solver does not recognize.
Root cause: JSON mode guarantees valid JSON, not valid schema. Ambiguous problem descriptions give the model room to guess.
Fix: Validate the parsed IR against a strict Pydantic schema before passing it to the selector. Return a user-facing error that echoes the missing fields so the user can clarify their problem description.
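A sketch of that validation gate with Pydantic v2 — the constraint types and field names mirror the ones mentioned above but are not the course's exact schema:

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field, ValidationError

class Constraint(BaseModel):
    type: Literal["all_different", "sum", "count_range", "forbidden_pair"]
    params: dict = {}

class IR(BaseModel):
    variables: list[str] = Field(min_length=1)  # reject empty variable lists
    domains: dict[str, list]
    constraints: list[Constraint]
    objective: str

def validate_ir(raw: dict) -> tuple[Optional[IR], list[str]]:
    """Return (ir, []) on success, or (None, field-level messages) for the user."""
    try:
        return IR.model_validate(raw), []
    except ValidationError as exc:
        # echo the offending fields so the user can clarify their description
        return None, [f"{'.'.join(map(str, e['loc']))}: {e['msg']}" for e in exc.errors()]
```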
3. Narration calls time out and stall the solve
Problem: An OpenAI API call inside the narration logic times out or raises a rate-limit error. If not handled, this propagates up and crashes the entire solve pipeline.
Root cause: The narrator is called synchronously (in async context), and unhandled exceptions bubble up through async for.
Fix: Wrap the entire narration block in try/except Exception: pass. Always emit a narration_end sentinel in a finally block. The solver and solution delivery are completely independent of narration success.
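The fix fits in a few lines; `FlakyNarrator` below is a stand-in for the streaming GPT-4o call so the failure path is visible:

```python
import asyncio

class FlakyNarrator:
    # stand-in for the GPT-4o streaming call; dies mid-stream like a real timeout
    async def stream(self, event):
        yield "The solver just found an improvement "
        raise TimeoutError("simulated OpenAI timeout")

async def narrate_safely(narrator, event, send):
    """Narration can fail; the solve pipeline must never notice."""
    try:
        async for chunk in narrator.stream(event):
            await send({"type": "narration_chunk", "text": chunk})
    except Exception:
        pass  # swallow timeouts / rate limits; the solver keeps running
    finally:
        await send({"type": "narration_end"})  # sentinel emitted even on failure

sent = []
async def collect(msg):
    sent.append(msg)

asyncio.run(narrate_safely(FlakyNarrator(), {"type": "IMPROVEMENT"}, collect))
```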
4. Metaheuristics give different results each run
Problem: Genetic and simulated annealing solvers use random number generators. The same problem input produces different solutions on different runs, making debugging and testing unreliable.
Root cause: Unseeded RNGs sample from the system entropy pool.
Fix: Every metaheuristic solver accepts a seed parameter and initializes via np.random.default_rng(seed). Same seed + same input = identical output. Default seed can be a hash of the IR for deterministic-by-default behavior.
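Deterministic-by-default can be sketched by hashing the IR into a seed (the hashing scheme here is illustrative):

```python
import hashlib
import json
from typing import Optional
import numpy as np

def rng_for_ir(ir: dict, seed: Optional[int] = None) -> np.random.Generator:
    """Same IR + same seed (or no seed at all) always yields the same stream."""
    if seed is None:
        # derive a stable seed from the problem itself
        digest = hashlib.sha256(json.dumps(ir, sort_keys=True).encode()).digest()
        seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed)
```

Two runs on the same problem now produce identical solver trajectories, which makes test failures reproducible.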
5. Constraint partial checking misses edge cases
Problem: Backtracking explores branches that are obviously infeasible, wasting iterations. Early pruning (partial constraint checking) helps, but implementing it incorrectly prunes valid branches and returns no solution when one exists.
Root cause: Partial checks must be sound (never prune valid branches) but need not be complete (they can miss some invalid ones). Implementing the wrong logic inverts this: it prunes valid branches while allowing invalid ones through.
Fix: Each partial constraint check must return False (provably violated) only when the current partial assignment already guarantees a violation regardless of future assignments. When in doubt, return True (cannot yet determine). Test extensively against problems with known solutions.
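For an all_different constraint, the sound-but-incomplete rule looks like this sketch:

```python
def partial_check_all_different(assignment: dict, constrained_vars: list) -> bool:
    """Sound partial check: return False only on a violation no future
    assignment can repair; return True whenever the outcome is still open."""
    assigned_values = [assignment[v] for v in constrained_vars if v in assignment]
    # a duplicate among already-assigned values can never be undone later
    return len(assigned_values) == len(set(assigned_values))
```

Unassigned variables are simply ignored: they might still take values that satisfy the constraint, so pruning on them would be unsound.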
6. The simplex solver diverges on unbounded problems
Problem: A user describes a problem without upper bounds on variables. The simplex tableau pivots indefinitely (or until the iteration cap).
Root cause: Unbounded linear programs have no finite optimal — the objective can be improved forever.
Fix: Detect the unbounded case during the pivot step (no valid pivot column exists that improves the objective without unbounded growth). Return a FAILED event with message "Problem is unbounded — add upper bound constraints".
7. Frontend text chunks render out of order
Problem: Streaming narration_chunk messages arrive via WebSocket. React state updates are batched. A fast solver generates many events quickly, and the rendered text appears scrambled — chunks arrive but are appended in the wrong order.
Root cause: React 18+ batches setState calls, and updates written as setText(text + chunk) read text from a stale closure. Several rapid updates therefore compute from the same old value, and the later ones overwrite the earlier appends.
Fix: Use the functional form of setState (prev => prev + chunk) so each update is based on the latest state, not a stale closure capture. Accumulate chunks in a ref and sync to state at render boundaries for high-frequency updates.
Solving these issues took us over 60 hours of testing across all eight solvers and a dozen problem types — the course walks you through each fix with working code, tests, and the exact prompts that made the LLM agents reliable.
Ready to Build This Yourself?
Understanding the architecture is the easy part. The hard part is the 60+ hours of debugging: getting GPT-4o to return a schema-valid IR every time, wiring the async/sync bridge without freezing the event loop, calibrating the narration throttle, and making the frontend accumulate streaming chunks without losing state.
The full course at labs.codersarts.com/constraint-optimizer closes that gap. Here is exactly what you get:
✅ Complete source code — every file, every solver, every agent, every component
✅ Step-by-step video tutorials — build each phase from scratch with explanations
✅ All eight solver implementations — backtracking, forward-checking, branch-and-bound, simplex, greedy, local-search, genetic, simulated annealing
✅ Prompt engineering guide — the exact system prompts for Parser, Selector, Narrator, and Explainer agents
✅ Docker + deployment walkthrough — Dockerfiles, environment configuration, Railway/Render deploy steps
✅ Tested IR schema — Pydantic models covering all constraint types with validation
✅ Lifetime access + updates — all future solver additions and LLM upgrades included
✅ Community support — Discord channel with Codersarts mentors
$30. Everything above.
Need the full system built for your team, or want a guided walkthrough on your specific optimization problem? Book a 1:1 session with the Codersarts team for $20/hour — we will pair-program through your use case end to end.
Conclusion
The AI Constraint Optimizer demonstrates a pattern that will become common: LLM as language-to-logic translator, deterministic algorithms as the execution engine. The LLM handles what it is genuinely good at (understanding natural language, explaining reasoning, selecting approaches); the solver handles what the LLM cannot do reliably (consistent, verifiable, optimal constraint satisfaction).
The simplest viable stack to start with is Python + FastAPI + GPT-4o for the backend, Next.js for the frontend, and the backtracking solver for your first CSP. Once that pipeline works end to end, adding the remaining solvers and narration is a matter of extending a clean interface.
The full source code, prompts, and video tutorials are available at labs.codersarts.com/constraint-optimizer. Start there and have a working optimizer running by the end of the weekend.


