
How to Build a Unit Test Generation Agent with LangGraph, AST Parsing, and a Validation Loop


Every development team knows their test coverage is not good enough. They know it the same way they know the docs are out of date and the TODO comments in the legacy module should have been fixed six months ago — with the specific combination of clarity and resignation that comes from problems that are genuinely important but economically easy to defer.

Unit tests lose to feature development because the payoff is deferred. The benefit of a well-tested codebase is felt in the refactor you can make without fear, in the regression you catch before it reaches production, in the engineer who joins six months later and can actually understand what a function is supposed to do. None of these benefits appear on the sprint board.

The economic argument for testing has not changed. What has changed is the cost of writing tests.

The Unit Test Generation Agent is a LangGraph-powered autonomous pipeline that scans a codebase, extracts every function signature using AST parsing, generates a complete unit test suite tailored to the detected language and framework, runs the tests in a sandboxed subprocess, iteratively refines the failures, and produces a coverage gap report — all without human involvement beyond pointing it at a directory.

The generated tests are not stubs. They are complete: happy path cases with type-correct arguments, edge cases for boundary values (empty lists, zero, None, max integers), and expected exception cases where the function signature or docstring indicates error conditions should be tested. External dependencies — database clients, HTTP libraries, cloud SDK calls — are mocked automatically so tests pass without any network access or infrastructure.
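To make that concrete, here is the shape of suite the generator aims to produce: a hand-written, hypothetical example for a simple parse_price(raw: str) -> float function (the module and function names are invented for illustration, not taken from the system's output):

import pytest

from pricing import parse_price  # hypothetical module under test

def test_parse_price_happy_path():
    assert parse_price("19.99") == 19.99

def test_parse_price_zero_boundary():
    assert parse_price("0") == 0.0

def test_parse_price_empty_string_raises():
    with pytest.raises(ValueError):
        parse_price("")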

Real-world use cases this application handles:

  • Python developers auto-generating pytest suites for new or legacy modules before a refactor

  • JavaScript/TypeScript developers generating Jest or Vitest tests for utility functions and React components

  • Engineering team leads raising test coverage across a legacy codebase to meet a pre-release coverage threshold

  • AI engineers studying LangGraph's validation loop pattern and conditional retry routing

  • Technical founders building a proprietary code-quality tool for internal use or commercial distribution

  • DevOps engineers integrating automated test generation into CI/CD pipelines with coverage threshold gating

  • CS students learning how AST parsing, LLM code generation, and test runner orchestration interact in a production-quality system


This article covers the core concept, the AST parsing and mock generation patterns, the LangGraph validation loop, the implementation phases, and the most common challenges. Full source code is available in the complete course at labs.codersarts.com.

📄 Before you dive in — grab the free PRD template that maps out this entire system: architecture, API spec, sprint plan, and system prompt. [Download the free PRD]



How It Works: Core Concept

The concept powering this system is a stateful generation-validation-refinement loop where each iteration produces better tests than the last.

Most AI code generation tools are single-shot: prompt in, code out. The problem with single-shot generation for unit tests is that the generated code must not just look plausible — it must actually run. A test that imports the wrong module, passes an argument of the wrong type, or misunderstands an async function signature will fail. The only way to know if a generated test is correct is to run it.

LangGraph provides the stateful graph structure for the iterative loop. The test_runner node executes the generated tests and writes the results — per-test pass/fail/error status — to graph state. The router node evaluates the results: if all tests pass, it routes to coverage_analyzer; if failures remain and the retry count has not been exhausted, it routes to refinement_node. The refinement_node receives the failing test, the error message, and the original FunctionSignature, and generates a corrected version. The corrected test replaces the original in the test file, and the loop continues.
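A minimal sketch of that graph wiring follows. The node names match the article; the TestGenState fields are assumptions inferred from what the nodes read and write, and the stub bodies stand in for the real node functions:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class TestGenState(TypedDict, total=False):
    signatures: list        # FunctionSignature records from the scanner
    mock_definitions: dict  # keyed by function_id
    test_file_paths: list
    test_results: list      # per-test pass/fail/error status
    unresolved: list
    retry_count: int
    max_retries: int

def _stub(state: TestGenState) -> dict:
    return {}  # placeholder; each real node returns a partial state update

graph = StateGraph(TestGenState)
for name in ("codebase_scanner", "mock_generator", "test_generator",
             "test_runner", "router", "refinement_node", "coverage_analyzer"):
    graph.add_node(name, _stub)

graph.set_entry_point("codebase_scanner")
graph.add_edge("codebase_scanner", "mock_generator")
graph.add_edge("mock_generator", "test_generator")
graph.add_edge("test_generator", "test_runner")
graph.add_edge("test_runner", "router")
graph.add_edge("refinement_node", "test_runner")  # the retry loop
graph.add_edge("coverage_analyzer", END)
# the router's conditional edge is shown in "The Conditional Edge" below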

Why AST parsing instead of asking the LLM to read the file. Sending a full source file to the LLM and asking it to generate tests is expensive (tokens × file count), unreliable (the LLM may hallucinate function names or miss methods), and untraceable (you cannot audit which functions were tested and which were not). Structured AST extraction produces a FunctionSignature record — name, parameters with type annotations, return type, docstring, decorators, async flag, line range — for every function in the codebase, programmatically and verifiably. The LLM receives a clean, structured JSON representation of what to test, not a raw source file.
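For example, the payload for one function might look like the following (an invented, abridged record; the field names follow the FunctionSignature dataclass shown later in this article):

{
  "id": "src/pricing.py:apply_discount:42",
  "function_name": "apply_discount",
  "class_name": null,
  "parameters": [
    {"name": "price", "type_annotation": "float", "default_value": null, "is_optional": false},
    {"name": "pct", "type_annotation": "float", "default_value": "0.0", "is_optional": true}
  ],
  "return_type": "float",
  "docstring": "Apply a percentage discount. Raises ValueError if pct > 100.",
  "decorators": [],
  "is_async": false,
  "line_start": 42,
  "line_end": 55
}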

Why mock generation is a separate node. A test that calls a function that calls a database will fail in the validation loop unless the database is mocked. If you rely on the LLM to infer the mock requirements from the function body, it will sometimes miss them and sometimes mock things that do not need to be mocked. The mock_generator node analyses the function's import graph programmatically — it knows exactly which external libraries are imported — and generates the correct mock definitions before the test generator runs. The test generator then uses the provided mock definitions rather than guessing.
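A minimal sketch of that rule-based step, with the registry inlined as a dict for brevity (the real system loads it from YAML, and the patch targets here are assumptions):

KNOWN_MOCKS = {
    "httpx": 'unittest.mock.patch("httpx.AsyncClient.get")',
    "requests": 'unittest.mock.patch("requests.get")',
    "boto3": 'unittest.mock.patch("boto3.client")',
}

def generate_mocks(import_graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each function_id to the mock definitions its module's imports require."""
    mock_definitions: dict[str, list[str]] = {}
    for function_id, imported_modules in import_graph.items():
        hits = [KNOWN_MOCKS[m] for m in imported_modules if m in KNOWN_MOCKS]
        if hits:
            mock_definitions[function_id] = hits
    return mock_definitions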




FULL GENERATION PIPELINE:

  User provides: target_path + framework (pytest / jest / vitest)
          │
          ▼
  [CODEBASE SCANNER NODE]
  Recursive directory traversal
  Python: ast.parse → FunctionSignature extraction
  JS/TS: ts-morph compiler API → FunctionSignature extraction
  Excluded: test files, node_modules, __pycache__, .venv, dist
  Output: list[FunctionSignature] + import_graph
          │
          ▼
  [MOCK GENERATOR NODE]
  Analyse import_graph for external deps:
    SQLAlchemy, psycopg2, httpx, requests, boto3,
    OpenAI client, Stripe, Prisma, Mongoose, axios, fetch
  Generate: unittest.mock.patch (Python) / jest.mock() (JS)
  Output: mock_definitions keyed by function_id
          │
          ▼
  [TEST GENERATOR NODE]
  Batch: 10 FunctionSignatures per LLM call
  Framework template: loaded from YAML registry
  Generates per function: happy path + edge case + exception case
  Pre-validation: ast.parse before writing to disk
  Output: test files written to {module}_test.py / {module}.test.ts
          │
          ▼
  [TEST RUNNER NODE]
  Execute: pytest / npx jest / npx vitest
  Subprocess timeout: 60 seconds
  Parse output: pass / fail / error / timeout per test case
  Output: test_results → graph state
          │
          ▼
  [ROUTER NODE]
  ├── all passing → [COVERAGE ANALYZER]
  │                  coverage.py / c8 → CoverageReport + gap list
  │                  → TestGenReport → END
  │
  └── failures + retry_count < max_retries (default: 3)
          │
          ▼
  [REFINEMENT NODE]
  Input: failing_test + failure_message + FunctionSignature
         + full failure history (all prior attempts)
  gpt-4o (temperature 0.1)
  Generate: corrected test replacing the failing version
          │
          └──────────────────────→ [TEST RUNNER NODE]  (retry loop)

  After max_retries: mark as UNRESOLVED → [COVERAGE ANALYZER]


System Architecture Deep Dive

The Unit Test Generation Agent has eight layers. The key design principle is that the LLM sits in the generation and refinement path only — all parsing, execution, and coverage measurement are handled by deterministic tools.

Layer 1 — Next.js 15 Web UI. Repository path input, real-time streaming generation log (function discovered → tests generated → test result → coverage complete), coverage treemap visualisation built with D3 showing file-level coverage percentages, test file download links, and session history.

Layer 2 — FastAPI + WebSocket Gateway. Session management, WebSocket streaming of StreamEvent objects to the UI, subprocess orchestration (launching pytest/Jest in isolated subprocesses), file I/O for writing generated test files, and LangGraph invocation.

Layer 3 — LangGraph Orchestration Engine. The StateGraph with 7 nodes, the conditional retry edge between test_runner and refinement_node, the SqliteSaver checkpointer (or PostgresSaver in production), and stream_mode="values" for real-time UI streaming.
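A sketch of the compile-and-stream step, assuming langgraph's SqliteSaver (constructor and streaming APIs vary slightly between langgraph versions, so treat this as illustrative):

import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
app = graph.compile(checkpointer=SqliteSaver(conn))

config = {"configurable": {"thread_id": "session-123"}}
for snapshot in app.stream(
    {"retry_count": 0, "max_retries": 3}, config, stream_mode="values"
):
    # each state snapshot becomes a StreamEvent pushed over the WebSocket
    send_stream_event(snapshot)  # hypothetical FastAPI-side helper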

Layer 4 — Scanner Nodes. Two scanners: Python's ast module for .py files (zero external dependency, ships with Python), and ts-morph for .ts / .tsx / .js / .jsx files (TypeScript Compiler API wrapper). Both produce the same FunctionSignature schema. The import graph builder extracts all import statements per file to feed the mock generator.

Layer 5 — Generation Nodes. The test generator (LLM batch, 10 functions per call), the mock generator (rule-based + import graph analysis), and the refinement node (LLM correction with full failure history in context).

Layer 6 — Validation Nodes. The test runner subprocess manager, output parser (structured regex + LLM fallback for complex tracebacks), the coverage runner (subprocess calling coverage.py or c8), and the CoverageReport builder.

Layer 7 — AI Layer (OpenAI gpt-4o). Test generation (temperature 0.2), refinement (temperature 0.1). Framework-specific system prompts loaded from a YAML template registry — one prompt per framework (pytest, unittest, jest, vitest), cached across all calls in a session.

Layer 8 — Data Layer. SQLite (LangGraph checkpointer + session history), file system (generated test files written alongside source), JSON exports (TestGenReport, CoverageReport).


Architecture Table

Layer | Component            | Role
------+----------------------+---------------------------------------------------------------
1     | Next.js 15 Web UI    | Path input, streaming log, coverage treemap, file download
2     | FastAPI + WebSocket  | Session management, subprocess orchestration, event streaming
3     | LangGraph StateGraph | 7-node graph, conditional retry loop, checkpointer
4     | Scanner Nodes        | Python ast, ts-morph, import graph builder
5     | Generation Nodes     | Test generator (LLM batch), mock generator, refinement node
6     | Validation Nodes     | Test runner subprocess, output parser, coverage runner
7     | OpenAI gpt-4o        | Test generation and failure correction
8     | SQLite + File System | Session state, generated test files, JSON reports


The AST Parsing Pattern

The codebase_scanner node is the foundation of the entire system. Every subsequent node — the mock generator, the test generator, the coverage analyser — works from the FunctionSignature records it produces.



import ast
import logging
from pathlib import Path
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)

@dataclass
class Parameter:
    name: str
    type_annotation: str | None
    default_value: str | None
    is_optional: bool

@dataclass
class FunctionSignature:
    id: str
    file_path: str
    module_name: str
    function_name: str
    class_name: str | None
    parameters: list[Parameter]
    return_type: str | None
    docstring: str | None
    decorators: list[str]
    is_async: bool
    line_start: int
    line_end: int
    language: str = "python"
    detected_dependencies: list[str] = field(default_factory=list)

def extract_signatures(file_path: Path) -> list[FunctionSignature]:
    """
    Parse a Python source file and extract all function/method signatures.
    Returns a FunctionSignature for every top-level function and class method.
    """
    source = file_path.read_text(encoding="utf-8")
    try:
        tree = ast.parse(source, filename=str(file_path))
    except SyntaxError as e:
        # Log and skip — do not crash the scanner on one bad file
        logger.warning(f"SyntaxError in {file_path}: {e}")
        return []

    signatures = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            sig = FunctionSignature(
                id=f"{file_path}:{node.name}:{node.lineno}",
                file_path=str(file_path),
                module_name=file_path.stem,
                function_name=node.name,
                class_name=_get_class_name(node, tree),        # helper: nearest enclosing ClassDef, if any
                parameters=_extract_parameters(node),          # helper: node.args -> list[Parameter]
                return_type=_annotation_to_str(node.returns),  # helper: ast.unparse(node.returns) or None
                docstring=ast.get_docstring(node),
                decorators=[ast.unparse(d) for d in node.decorator_list],
                is_async=isinstance(node, ast.AsyncFunctionDef),
                line_start=node.lineno,
                line_end=node.end_lineno,
            )
            signatures.append(sig)

    return signatures

ast.unparse() converts decorator AST nodes back to their string representation — essential for detecting @pytest.mark.skip, @staticmethod, @property, and custom decorators that affect how tests should be structured.

The is_async field drives a mandatory instruction in the test generation prompt: async functions must receive async test cases with pytest.mark.anyio or await in the test body. This is one of the most common single-shot generation failures, and catching it at the FunctionSignature level prevents it entirely.



The Validation Loop Pattern

The validation loop is the LangGraph pattern that separates this system from a single-shot code generator. The key implementation details are the conditional edge and the failure history accumulation.

The Conditional Edge



def route_after_test_runner(state: TestGenState) -> str:
    failing = [r for r in state["test_results"] if r.status in ("FAIL", "ERROR", "TIMEOUT")]
    if not failing:
        return "coverage_analyzer"
    if state["retry_count"] >= state["max_retries"]:
        # Mark all remaining failures as UNRESOLVED before coverage analysis
        state["unresolved"] = [r.function_id for r in failing]
        return "coverage_analyzer"
    return "refinement_node"

graph.add_conditional_edges("router", route_after_test_runner, {
    "coverage_analyzer": "coverage_analyzer",
    "refinement_node": "refinement_node",
})


Failure History in Refinement Context

The most important implementation detail in the refinement node is including the full history of prior attempts. Without it, the loop oscillates:



async def refinement_node(state: TestGenState) -> dict:
    failing_tests = [r for r in state["test_results"]
                     if r.status in ("FAIL", "ERROR", "TIMEOUT")]

    corrected = []
    for failure in failing_tests:
        sig = get_signature(state, failure.function_id)

        # Build full failure history — prevents oscillation
        prior_attempts = [
            f"Attempt {i+1}: {h['error']}"
            for i, h in enumerate(failure.history)
        ]

        prompt = f"""
The following test for `{sig.function_name}` has failed {len(prior_attempts)} time(s).

FUNCTION SIGNATURE:
{sig.to_json()}

CURRENT FAILING TEST:
{failure.current_test_code}

CURRENT ERROR:
{failure.error_message}

PRIOR FAILED ATTEMPTS (do NOT repeat these approaches):
{chr(10).join(prior_attempts)}

Generate a corrected test that avoids all prior failure modes.
If the function cannot be tested due to an unfixable dependency issue,
output exactly: MARK_UNRESOLVED
"""
        response = await llm.generate(prompt, temperature=0.1)
        if response.strip() == "MARK_UNRESOLVED":
            failure.status = "UNRESOLVED"
        else:
            failure.current_test_code = response
            corrected.append(failure)

    # Write corrected tests back to disk
    rewrite_test_files(corrected, state["test_file_paths"])
    return {"retry_count": state["retry_count"] + 1, "test_results": state["test_results"]}

The MARK_UNRESOLVED escape hatch is important. Some functions genuinely cannot be tested in isolation: functions that require a running database and have no mockable boundary, functions with side effects that are structurally untestable, or functions that wrap C extensions without Python introspection. The LLM should be empowered to recognise this and stop wasting iterations rather than generating increasingly creative but still-failing tests.



Implementation Phases

Phase 1: AST Scanner and FunctionSignature Extraction

Implement the Python AST scanner as a standalone module before touching LangGraph. The scanner is the most testable component in the system — you can validate its output against known Python files without any LLM or subprocess involvement. Write scanner unit tests first, using a fixture directory of Python files covering edge cases: nested functions, class methods, async def, lambda expressions (excluded from generation — they have no name), decorated functions, and functions with *args / **kwargs.

Key decisions to make:

  • Exclusion patterns: test_*.py, *_test.py, conftest.py, __init__.py (unless they contain substantive functions), setup.py, migration files — define the exclusion list explicitly and make it configurable via .testgen.yaml (a sketch of that file follows this list)

  • Nested function handling: scan nested functions only if they are named and longer than the configured minimum LOC threshold; trivial inner helpers are poor test targets

  • ast.parse version handling: files using syntax newer than the interpreter running the scanner (match statements and X | Y union annotations from Python 3.10+, for example) will fail ast.parse on older runtimes; use libcst as a fallback parser so those files are not silently dropped from the scan
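A plausible shape for that .testgen.yaml file (the keys here are assumptions; the article does not pin down the schema):

# .testgen.yaml (hypothetical schema)
exclude:
  - "test_*.py"
  - "*_test.py"
  - "conftest.py"
  - "setup.py"
  - "**/migrations/*"
scanner:
  min_nested_function_loc: 5   # skip trivial inner helpers
coverage:
  threshold: 80                # function-coverage gate, percent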

Building the scanner benchmark — 50 Python files, 500 functions, verified FunctionSignature output — is the gate for Sprint 1 in the full course.


Phase 2: Mock Generator and Test Generator

Build the mock generator before the test generator. The mock generator is deterministic — it does not call an LLM — so you can validate its output against known import patterns before the test generator depends on it.
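Because it is deterministic, the generator's core is a lookup table. A hedged sketch of the YAML registry format, using the mappings listed under the key decisions below (the schema itself is an assumption):

# mock_registry.yaml (assumed format)
sqlalchemy:
  patch: "sqlalchemy.orm.Session"
httpx:
  patch: "httpx.AsyncClient.get"
boto3:
  patch: "boto3.client"
# organisation-specific internal clients can be appended here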

Key decisions to make:

  • Import graph analysis depth: analyse only direct imports of the module under test (shallow), not transitive imports across the entire codebase; transitive analysis is O(codebase) and produces too many false positives for mock generation

  • Known dependency matching: maintain a YAML registry of known external libraries and their mock patterns; SQLAlchemy → unittest.mock.patch("sqlalchemy.orm.Session"), httpx → unittest.mock.patch("httpx.AsyncClient.get"), boto3 → unittest.mock.patch("boto3.client") — this registry is extensible for organisation-specific internal clients

  • Batch size for the test generator: 10 functions per LLM call is the calibrated default; validate the token count for each batch before sending — if a batch of 10 large functions exceeds 4,000 tokens, split it into two batches of 5
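A sketch of that batch validator, assuming a tiktoken-based count and a to_json() helper on FunctionSignature:

import tiktoken

def split_batches(signatures: list, max_tokens: int = 4000, batch_size: int = 10):
    """Yield batches of at most `batch_size` signatures, halving any oversized batch."""
    enc = tiktoken.encoding_for_model("gpt-4o")
    for i in range(0, len(signatures), batch_size):
        batch = signatures[i:i + batch_size]
        payload = "\n".join(sig.to_json() for sig in batch)
        if len(enc.encode(payload)) > max_tokens and len(batch) > 1:
            mid = len(batch) // 2
            yield batch[:mid]   # e.g. a batch of 10 becomes two batches of 5
            yield batch[mid:]
        else:
            yield batch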

Building the YAML mock registry and the batch size validator that splits oversized batches is covered in detail in the full course.


Phase 3: Validation Loop and Refinement Node

The validation loop is the most architecturally interesting part of the system. Wire the LangGraph conditional edge, implement the subprocess runner with timeout enforcement, and build the output parser before writing the refinement node.
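A sketch of the subprocess runner for the pytest case, assuming the pytest-json-report plugin is installed (the --json-report flags come from that plugin, not pytest core):

import json
import subprocess
from pathlib import Path

def run_pytest(test_file: Path, timeout: int = 60) -> list[dict]:
    """Run pytest on one generated test file; return per-test outcomes."""
    report = test_file.parent / ".report.json"
    try:
        subprocess.run(
            ["pytest", str(test_file), "--json-report",
             f"--json-report-file={report}", "-q"],
            timeout=timeout, capture_output=True, check=False,
        )
    except subprocess.TimeoutExpired:
        return [{"nodeid": str(test_file), "outcome": "timeout"}]
    data = json.loads(report.read_text())
    return [{"nodeid": t["nodeid"], "outcome": t["outcome"]}
            for t in data.get("tests", [])]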

Key decisions to make:

  • Output parsing strategy: pytest and Jest produce structured output in JSON mode (pytest --json-report, jest --json); prefer JSON output over text parsing for reliability; fall back to regex-based parsing for frameworks that do not support JSON output

  • Syntactic pre-validation: run ast.parse() (Python) or ts.transpileModule() (TypeScript) on the generated file before launching the subprocess; a SyntaxError in the test file produces a confusing subprocess error that is harder to parse in the refinement prompt than a clean "SyntaxError at line 23"

  • Refinement failure history format: include the error message from each prior attempt in a numbered list in the refinement prompt; the LLM must see all prior failures to avoid repeating them

  • Oscillation detection: if retry_count >= 2 and the test is still failing on the same error type (same exception class), switch to a "strip and simplify" strategy — remove the complex assertion and replace with a basic assert callable(function_name) — rather than attempting another full correction

Testing the oscillation detection against a set of benchmark functions that reliably trigger multi-attempt loops is covered in detail in the full course.


Phase 4: Coverage Analysis and CLI Interface

The coverage integration is straightforward once the validation loop is working — both coverage.py and c8 produce JSON output that maps directly to the CoverageReport schema. The CLI is the highest-impact delivery surface for most developers.

Key decisions to make:

  • Exit code design: exit code 0 (all tests pass, coverage threshold met) is the only "green" state for CI; exit code 1 (tests pass but coverage below threshold) must be distinct from exit code 2 (test failures after max retries) because they require different responses in a CI pipeline (a sketch follows this list)

  • Coverage threshold default: 80% function coverage is the default; branches and lines are also reported but the threshold gate is applied to function coverage because it is the most actionable metric for this tool (uncovered functions become the gap report)

  • Gap report prioritisation: sort uncovered functions by lines of code descending; a 200-line function with no tests is a higher priority than a 3-line helper; surface the top 10 uncovered functions in the CLI summary with their file path and LOC
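A sketch of how those exit codes might map onto the CLI entry point (the run_pipeline name and report fields are hypothetical):

import sys

def main() -> int:
    report = run_pipeline()  # hypothetical: returns a TestGenReport-like object
    if report.unresolved_failures:
        return 2  # test failures remain after max retries
    if report.function_coverage < report.coverage_threshold:
        return 1  # all tests pass, but the coverage gate is not met
    return 0      # green: tests pass and threshold met

if __name__ == "__main__":
    sys.exit(main())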

Building the GitHub Action template that posts a coverage delta comment to pull requests is covered in detail in the full course.


Phase 5: JavaScript/TypeScript Support and Dashboard

Adding JS/TS support is primarily a scanner problem — ts-morph handles the AST extraction, and the jest/vitest framework templates are loaded from the YAML registry. The Next.js dashboard adds the visual layer that makes the tool accessible to non-CLI users.

Key decisions to make:

  • JSX in .js files: detect JSX syntax with a pre-scan regex before passing the file to ts-morph; set {allowJs: true, jsx: ts.JsxEmit.React} in the ts-morph Project's compilerOptions for JSX files to prevent parse errors

  • ESM vs CommonJS mock syntax: jest.mock() works differently in ESM and CommonJS environments; detect the module system from package.json ("type": "module" indicates ESM) and generate the appropriate mock syntax (a detection sketch follows this list)

  • Coverage treemap: a D3 treemap where each rectangle is a source file, area encodes LOC, and colour encodes coverage percentage (red = 0–50%, amber = 50–80%, green = 80–100%); clicking a rectangle filters the gap report to that file
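Since the orchestrator is Python, the module-system check can be a plain package.json read; a minimal sketch:

import json
from pathlib import Path

def detect_module_system(project_root: Path) -> str:
    """Return "esm" or "commonjs" based on package.json's "type" field."""
    pkg = project_root / "package.json"
    if pkg.exists() and json.loads(pkg.read_text()).get("type") == "module":
        return "esm"
    return "commonjs"  # Node's default when "type" is absent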

Building the D3 coverage treemap component and wiring it to the LangGraph COVERAGE_COMPLETE stream event is covered in detail in the full course.



Common Challenges

1. Generated test imports a module not installed in the test environment.


Root cause: The LLM generates a syntactically correct import for a library that appears in the source file but is not installed in the Docker test execution container (e.g. an optional dependency installed only in production). The subprocess fails with ModuleNotFoundError before running a single test.


Fix: Before writing the generated test file to disk, validate every import statement using importlib.util.find_spec() (Python) or require.resolve() (Node.js). Replace any unresolvable import with the corresponding mock definition from the mock registry. Include this as an explicit rule in the system prompt: "Only import modules that appear in the AVAILABLE_IMPORTS list."
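On the Python side, that pre-flight check can be an AST walk over the generated test plus importlib.util.find_spec, sketched here:

import ast
import importlib.util

def unresolvable_imports(test_source: str) -> set[str]:
    """Return top-level module names imported by the test but not installed."""
    # Note: the module under test must itself be importable (on sys.path),
    # or it will be flagged as missing too.
    missing: set[str] = set()
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if importlib.util.find_spec(root) is None:
                missing.add(root)
    return missing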


2. The refinement loop oscillates between two failure modes.


Root cause: Iteration 1 fixes an ImportError by adding a mock but introduces an AssertionError. Iteration 2 fixes the AssertionError but removes the mock, reintroducing the ImportError. The loop cycles without net progress.


Fix: Include the complete failure history in every refinement prompt (all prior error messages, numbered). After two iterations with the same error class, switch strategy: remove the complex assertion and substitute with assert {function_name} is not None (Python) or expect({function_name}).toBeDefined() (Jest). This "simplify and pass" fallback produces a test that is less useful but does not block coverage analysis.
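Oscillation detection can key on the exception class at the front of each stored error message; a rough sketch, assuming each history entry carries the error text:

def should_simplify(failure) -> bool:
    """True when the last two attempts failed with the same exception class."""
    def error_class(message: str) -> str:
        # "AssertionError: 5 != 4" -> "AssertionError"
        return message.split(":", 1)[0].strip()
    errors = [attempt["error"] for attempt in failure.history]
    return len(errors) >= 2 and error_class(errors[-1]) == error_class(errors[-2])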


3. Async function tests pass trivially without await.


Root cause: The LLM generates a synchronous test for an async def function. When run, the test assigns the coroutine object (not its return value) to the result variable and asserts against the coroutine, which is truthy. The test passes but tests nothing.


Fix: The is_async field in FunctionSignature is the gate. Include a mandatory instruction in the system prompt: "If is_async is true, generate an async test function using @pytest.mark.anyio (Python) or async () => { await ... } (Jest). Never test an async function with a synchronous test case." Add a post-generation validator that scans for async def in the source signature and def test_ (non-async) in the generated test — flag this as a generation error before the subprocess runs.
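The post-generation validator is a few lines; a sketch, assuming generated test names embed the target function's name:

import re

def sync_test_for_async_function(sig, test_source: str) -> bool:
    """Flag a generation error: an async target covered by a synchronous test."""
    if not sig.is_async:
        return False
    # matches "def test_<...name...>(" but not "async def test_..."
    pattern = rf"^def test_\w*{re.escape(sig.function_name)}\w*\("
    return re.search(pattern, test_source, flags=re.MULTILINE) is not None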


4. Large files with 100+ functions exceed the token budget.


Root cause: A monolithic Python module with 120 functions in a single class generates a batch prompt that exceeds the configured 4,000-token batch budget, causing the LLM call to fail or return truncated output.


Fix: Enforce a hard batch size of 10 functions per LLM call. Sort functions by LOC ascending within each batch so simpler functions are processed first. For very large classes, generate tests for public methods only (exclude methods prefixed with _) in Phase 1; add private method coverage in Phase 2 via the incremental mode.


5. Test runner subprocess hangs on a generated test that enters an infinite loop.


Root cause: A generated test for a recursive function accidentally constructs an input that triggers infinite recursion, or a test for a polling function calls the real function without mocking the sleep/wait dependency. The subprocess hangs indefinitely.


Fix: The 60-second subprocess timeout is the primary safeguard. On timeout, kill the subprocess, mark the test as TIMEOUT_FAILURE, and feed the timeout reason to the refinement node with an explicit instruction: "This test timed out. The function likely has a sleep, polling loop, or recursion. Ensure all sleep/wait/recursion dependencies are mocked, and add recursion_limit or timeout parameters if applicable."

6. coverage.py does not attribute coverage to the source file when the test uses import *.


Root cause: When the generated test uses from my_module import *, coverage.py may not correctly trace line execution back to my_module.py, producing inaccurate coverage numbers.


Fix: Enforce a strict import convention in the system prompt and validate it in the syntactic pre-validation step: always use from my_module import specific_function — never wildcard imports. The validation check is simple: if import * is found in the generated test file, replace it with specific function imports before writing to disk.


7. ts-morph fails on JSX in .js files.


Root cause: React component files written as .js (not .jsx) with JSX syntax cause ts-morph to throw a parse error unless JSX mode is explicitly enabled, because the TypeScript compiler does not assume .js files contain JSX.


Fix: Pre-scan each .js file with a lightweight regex: /<[A-Z][a-zA-Z]*|return\s*\(\s*</.test(source). If the pattern matches, instantiate the ts-morph Project with compilerOptions {allowJs: true, jsx: ts.JsxEmit.React} before adding the file. This adds < 5ms per file and prevents spurious parse errors from dropping React component files from the scan.

8. Generated pytest tests use assertEquals instead of assert ==.


Root cause: The LLM occasionally generates unittest.TestCase-style assertions (assertEquals, assertRaises) even when the framework is configured as pytest. Outside a TestCase class these fail with a NameError or AttributeError rather than a clean assertion failure, which is confusing to parse in the refinement loop.


Fix: Add a post-generation linter that scans for assertEquals, assertTrue, assertRaises and replaces them with pytest equivalents (assert a == b, assert condition, with pytest.raises(Error)). This linter runs after every LLM generation call and before the syntactic pre-validation step — it is a 10-line regex replacement that prevents an entire class of refinement loop iterations.
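A sketch of that linter; the regexes are deliberately naive (nested call arguments would need a real AST rewrite) and the rewrite list is illustrative, not exhaustive:

import re

REWRITES = [
    (r"(?:self\.)?assertEqual[s]?\((.+?),\s*(.+?)\)", r"assert \1 == \2"),
    (r"(?:self\.)?assertTrue\((.+?)\)", r"assert \1"),
    (r"(?:self\.)?assertRaises\((\w+)\)", r"pytest.raises(\1)"),
]

def lint_to_pytest_style(source: str) -> str:
    """Rewrite unittest-style assertions into pytest-native asserts."""
    for pattern, replacement in REWRITES:
        source = re.sub(pattern, replacement, source)
    return source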



Ready to Build This Yourself?

Understanding the architecture is not the same as having a working testgen binary that your CI pipeline can call. The gap between this article and a deployed Unit Test Generation Agent includes: a calibrated YAML template registry that produces syntactically valid tests for your specific framework, a mock registry tuned to your dependency stack, a validation loop that converges on your codebase's patterns, and a CLI with the correct exit codes for your pipeline.

The Unit Test Generation Agent course on labs.codersarts.com gives you everything you need:

✅ Full source code for all 5 sprints — LangGraph backend + FastAPI + Next.js dashboard, fully commented

✅ Python AST scanner with FunctionSignature extraction, all edge cases handled (async, decorators, nested classes)

✅ pytest + Jest + Vitest framework templates with YAML template registry

✅ Validation loop with 3-iteration retry logic and oscillation detection

✅ Mock generator for SQLAlchemy, httpx, axios, boto3, OpenAI client, and Stripe — plus a custom dependency registry for your internal clients

✅ coverage.py + c8 integration with gap report and coverage treemap visualisation

✅ CLI with all documented flags, CI/CD exit codes, and a GitHub Action template

✅ LangSmith tracing setup — see per-node token cost and retry convergence rate per session

✅ Docker Compose + Railway deployment guide

✅ Lifetime access — including JavaScript framework additions as the ecosystem evolves

$30 one-time. Everything above.

Need to adapt this to your specific tech stack — a custom internal framework, a language not yet supported, or an unusual dependency pattern? Book a 1:1 session at $20/hour — work through the template registry, mock generator configuration, and validation loop calibration for your codebase with the Codersarts team. Book any number of hours, no package required.



Conclusion

The Unit Test Generation Agent is an eight-layer system: a Next.js dashboard, a FastAPI WebSocket gateway, a LangGraph stateful graph with seven nodes, two AST scanners (Python ast and ts-morph), a rule-based mock generator, a batched LLM test generator, a subprocess validation loop with iterative refinement, and a coverage analysis pipeline. The key architectural insight is the separation of deterministic parsing from probabilistic generation: the AST scanner always produces accurate FunctionSignature records, the mock generator always produces correct import patterns, and the LLM is responsible only for the creative work — generating the test logic — within a structured context it cannot hallucinate around.

The simplest starting point is Stack A: LangGraph + FastAPI + OpenAI + Python ast + pytest + coverage.py + SQLite. Target a single Python module with three to five functions and no external dependencies. You can have a working validation loop that generates passing pytest tests from a function signature, runs them, and produces a coverage report — in a weekend.

When you are ready to move from architecture to working code, the full course is at labs.codersarts.com — complete source, YAML template registry, mock generator, and CLI included. Self-paced course at $30. Guided sessions at $20/hour.


