
Build a Customer Feedback Analyzer with OpenClaw and OpenAI

  • 2 hours ago
  • 20 min read

Introduction


Most “build an AI agent” tutorials show the happy path: write a skill, register it, call it, done. What they skip is the part where the agent confidently does the wrong thing anyway, in a different way every single time you try again, and you have to figure out why. This tutorial is the version that doesn’t skip that part.


We build a customer feedback analyzer using OpenClaw, an orchestration layer that dispatches commands to registered skills, paired with OpenAI for the actual analysis. You upload a batch of star-rated reviews through a small local web page, and a registered OpenClaw skill parses the ratings, asks OpenAI for a themes-and-complaints report, renders a rating trend chart, tracks token usage and cost, and writes a full execution trace.







What We Are Building


A local web upload form backed by an OpenClaw skill. The workflow:


  1. Upload a feedback file through a simple web page running on 127.0.0.1

  2. Parse every line into a structured date, rating, and review text

  3. Summarize the rating distribution and isolate every low-rated review

  4. Ask OpenAI to write a themes-and-complaints report

  5. Compute an exact rating summary and a per-review positive, neutral, or negative label in Python, rather than asking the model for them

  6. Render a chart of average daily rating over time

  7. Track token usage and real dollar cost for every model call, accumulated across every run

  8. Write a full audit trail of every step, for transparency




Tech Stack


Component            Tool
Orchestration        OpenClaw (skill registry and execution dispatcher)
Model                OpenAI gpt-4o-mini
Charting             Matplotlib
Web upload server    Python’s built-in http.server
Environment          A project-local venv, with a shared .env loader for both Windows (PowerShell) and macOS/Linux (bash) launchers




Project Structure


openclaw_log_analyzer/
├── src/
│   ├── main.py              # parses feedback, queries OpenAI, renders the chart, tracks cost, writes the trace
│   └── web_assistant.py     # local upload server that invokes the OpenClaw skill per request
├── examples/
│   └── sample_feedback.txt  # sample feedback with a realistic rating dip, for testing
├── .openclaw-local/
│   └── openclaw.json        # points OpenClaw at OpenAI's built-in provider for agent reasoning
├── SKILL.md                 # defines the local-feedback-analyzer skill OpenClaw dispatches to main.py
├── run_gateway.ps1          # Windows launcher for the OpenClaw gateway, loads .env automatically
├── run_web.ps1              # Windows launcher for the upload server, loads .env automatically
├── load_env.ps1             # shared .env loader used by both Windows launcher scripts
├── run_gateway.sh           # macOS/Linux launcher for the OpenClaw gateway, loads .env automatically
├── run_web.sh               # macOS/Linux launcher for the upload server, loads .env automatically
├── load_env.sh              # shared .env loader used by both macOS/Linux launcher scripts
├── runs/                    # one folder per analysis: report, chart, and trace
├── stats.json               # token usage and cost, accumulated across every analysis ever run
├── .env                     # OPENAI_API_KEY
└── requirements.txt




Setting Up OpenClaw


OpenClaw’s installer targets macOS and Linux by default, but it also ships a dedicated Windows path through PowerShell, installed via npm, requiring no administrator privileges:


iwr -useb https://openclaw.ai/install.ps1 | iex
openclaw onboard


During onboarding, skip the following for now: the model and auth provider step, every bundled marketplace skill (GitHub, Gemini, Whisper, and the rest), every chat channel integration, and web search. None of those are needed when the only interface is a local web page and the only model is configured directly in a project-local config file.




Configuring OpenClaw to Use OpenAI


Create a file named openclaw.json inside a .openclaw-local folder in the project root.



{
  "gateway": {
    "mode": "local"
  },
  "models": {
    "mode": "merge",
    "providers": {
      "openai": {
        "models": [
          {
            "id": "gpt-4o-mini",
            "name": "GPT-4o mini",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0.15, "output": 0.6, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 128000,
            "contextTokens": 100000,
            "maxTokens": 16384
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": "openai/gpt-4o-mini"
    }
  },
  "tools": {
    "web": {
      "search": {
        "enabled": false
      },
      "fetch": {
        "enabled": true
      }
    }
  }
}



gateway.mode must be present and set to "local", or the gateway refuses to start with a message about a missing or clobbered config. This requirement, and the rest of the file’s shape, were confirmed by running openclaw config schema and validating the file against a real JSON Schema validator, not guessed from a tutorial description.


The models block looks like more work than it should be: OpenAI is one of OpenClaw’s natively bundled providers and should not need a manual model catalog at all. In practice, starting the gateway with only agents.defaults.model set to "openai/gpt-4o-mini" failed with FailoverError: Unknown model: openai/gpt-4o-mini, because OpenClaw’s own bundled catalog for that provider did not recognize the exact model string being requested. Setting mode to "merge" keeps every one of OpenClaw’s built-in defaults for the openai provider, including authentication, and layers this explicit model declaration on top. It is the same fix used later in this tutorial for a fully custom provider, applied here to patch a gap in a bundled one.
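The two failure modes described above (a missing gateway.mode and a default model that no provider declares) can be caught before the gateway is started. The following is a minimal sketch in Python, assuming only the config shape shown in this section. The helper name check_default_model is hypothetical and is not part of OpenClaw, and the check knows nothing about OpenClaw’s bundled catalog, so it may be stricter than the gateway itself.

```python
import json
from typing import Any, Dict, List


def check_default_model(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an openclaw.json-shaped dict (hypothetical helper)."""
    problems: List[str] = []

    # The gateway is described as refusing to start unless gateway.mode is "local".
    if config.get("gateway", {}).get("mode") != "local":
        problems.append('gateway.mode is not "local"')

    # agents.defaults.model has the form "<provider>/<model id>".
    default_model = config.get("agents", {}).get("defaults", {}).get("model", "")
    provider, _, model_id = default_model.partition("/")

    # Collect the model ids explicitly declared for that provider in this file.
    declared = config.get("models", {}).get("providers", {}).get(provider, {}).get("models", [])
    if model_id not in {m.get("id") for m in declared}:
        problems.append(f"{default_model!r} is not declared under models.providers.{provider}")
    return problems


# A trimmed-down copy of the config above, inlined so the sketch runs on its own.
sample = json.loads("""
{
  "gateway": {"mode": "local"},
  "models": {"mode": "merge", "providers": {"openai": {"models": [{"id": "gpt-4o-mini"}]}}},
  "agents": {"defaults": {"model": "openai/gpt-4o-mini"}}
}
""")
print(check_default_model(sample))
```

Running the same function against a config that sets only agents.defaults.model reports the undeclared model, which mirrors the situation that produced the FailoverError.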



Create a file named .env in the project root:



OPENAI_API_KEY=your_openai_api_key_here


Both OPENCLAW_CONFIG_PATH and OPENAI_API_KEY are environment variables scoped to a single terminal session. Rather than re-typing $env:... assignments in every new PowerShell window, create a file named load_env.ps1 that both launcher scripts share:



function Import-DotEnv {
    param([string]$Path = (Join-Path $PSScriptRoot ".env"))   # defaults to the .env next to this script
    if (-not (Test-Path $Path)) { return }                     # silently skip if no .env exists
    Get-Content $Path | ForEach-Object {
        if ($_ -match '^\s*([^#=\s][^=]*)\s*=\s*(.*)\s*$') {   # skip blank lines and lines starting with #
            $name  = $matches[1].Trim()
            $value = $matches[2].Trim().Trim('"').Trim("'")    # strip optional surrounding quotes
            Set-Item -Path "env:$name" -Value $value           # exports it as a real environment variable
        }
    }
}


This is plain PowerShell, not anything OpenClaw provides. Because the launcher scripts inject the variables themselves before ever invoking openclaw, it does not matter whether OpenClaw’s own Node process reads .env natively.


On macOS or Linux, create a file named load_env.sh instead, with the same behavior as a bash function:



import_dotenv() {                                            # bash equivalent of load_env.ps1's Import-DotEnv function
    local env_file="${1:-$(dirname "${BASH_SOURCE[0]}")/.env}"  # defaults to .env next to this script
    [ -f "$env_file" ] || return 0                               # silently skip if no .env exists

    local key value
    while IFS='=' read -r key value || [ -n "$key" ]; do         # the || clause keeps a last line that has no trailing newline
        key="${key#"${key%%[![:space:]]*}"}"                      # trim leading whitespace
        key="${key%"${key##*[![:space:]]}"}"                      # trim trailing whitespace
        case "$key" in
            ''|'#'*) continue ;;                                  # skip blank lines and lines starting with #
        esac
        value="${value#"${value%%[![:space:]]*}"}"                # trim leading whitespace
        value="${value%"${value##*[![:space:]]}"}"                # trim trailing whitespace, including a CR from CRLF files
        value="${value%\"}"; value="${value#\"}"                  # strip optional surrounding double quotes
        value="${value%\'}"; value="${value#\'}"                  # strip optional surrounding single quotes
        export "$key=$value"                                      # exports it as a real environment variable
    done < "$env_file"
}


Same idea as the PowerShell version: skip comments and blank lines, trim quotes, export each variable into the current shell so whatever runs openclaw next inherits it.
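For a test harness that sets these variables without going through either launcher, the same parsing rules can be sketched in Python. This is an illustration only: parse_dotenv is a hypothetical helper that is not part of the project, the sample values are placeholders, and unlike the shell versions it strips quotes only when they form a matching pair.

```python
import re
from typing import Dict

# Same idea as the PowerShell pattern: a name that does not start with '#',
# then '=', then the rest of the line as the value.
DOTENV_LINE = re.compile(r"^\s*([^#=\s][^=]*)=(.*)$")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse .env-style text into a dict, skipping comments and blank lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        match = DOTENV_LINE.match(line)
        if not match:
            continue                                  # blank line, comment, or no '=' at all
        name = match.group(1).strip()
        value = match.group(2).strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[name] = value
    return values


sample_env = (
    "# placeholder values for illustration\n"
    "\n"
    'OPENAI_API_KEY="sk-example"\n'
    "OPENCLAW_CONFIG_PATH = .openclaw-local/openclaw.json\n"
)
print(parse_dotenv(sample_env))
```

To apply the result to the current process, the returned dict can be passed to os.environ.update before main.py’s functions are called.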




Defining the Skill


Create a file named SKILL.md in the project root.



---
name: local-feedback-analyzer
description: "Parses a batch of customer feedback, asks OpenAI for a themes-and-complaints report, and renders a rating trend chart, with a full execution trace."
---

# Local Feedback Analyzer

Analyze a batch of star-rated customer feedback: parse ratings, ask OpenAI for a themes-and-complaints report, render a rating trend chart, and write a full execution trace.

## Quick start

Whenever asked to analyze a feedback file, run this exact command:

```bash
python {baseDir}/src/main.py --feedback /path/to/feedback.txt --analysis-id some-id
```

Replace `/path/to/feedback.txt` with the feedback file path you were given, and `some-id` with the analysis id you were given (or a short random id if none was provided).

## Output

After the command finishes, three files exist in `{baseDir}/runs/<analysis-id>/`:

- `feedback_report.md`, key themes, top complaints, and recommended actions
- `sentiment_trend.png`, a chart of average daily rating over time
- `tool_trace.json`, the full execution trace

Read `feedback_report.md` and report its contents back to whoever asked.

## Notes

- The feedback file format is one review per line: `YYYY-MM-DD|RATING|review text`
- Always run the actual command above. Never write a placeholder, summary, or simulated result file yourself, only the script's real output counts.
- The default model is `gpt-4o-mini`. Only pass `--model` if a different model was explicitly requested.
- Requires `OPENAI_API_KEY` to be set in the environment before this command is run.
- This is a request to perform a task, not a request to modify this skill. Never call `skill_workshop`, never propose edits to this `SKILL.md`, and never read the feedback file yourself to write your own summary. The only valid action is running the command above and reporting its real output.


That last bullet under Notes was not part of the original design. It was added after the agent repeatedly tried to call skill_workshop, a tool meant for creating and editing skill definitions, instead of running anything. The full story of why, and why this one bullet alone did not fully fix it, is in its own section below, since it turned out to be the central problem of the entire project.


This file also went through an earlier, completely different design before reaching the plain-markdown shape shown above. The first version used frontmatter fields called command-dispatch: tool, command-tool: exec, and command-arg-mode: raw, based on a tutorial’s description of how to make a slash command bypass the model and dispatch directly to a deterministic tool. After fixing several unrelated config and CLI issues, that version still produced an agent that replied with a confident, plausible-sounding completion message and never ran anything at all.


Listing OpenClaw’s own bundled skills settled the question: not one of them (weather, video frame extraction, and meme generation among them) uses command-dispatch at all. Every working bundled skill relies on plain markdown instructions in the skill body with a {baseDir}/... command example, and lets the model read those instructions and decide to run a generic shell tool itself. That is the version shown above.




Parsing and Summarizing Feedback


Create a file named main.py inside a src folder. The first part of this file turns a raw feedback file into the numbers the report and chart actually need.



import argparse                          # parse CLI args passed by the OpenClaw skill invocation
import json                              # write tool_trace.json
import os                                # read OPENAI_API_KEY from the environment
import re                                # parse structured fields out of raw feedback lines
import time                              # measure model call latency for stats.json
from collections import Counter, defaultdict  # tally ratings and bucket averages by day
from datetime import datetime            # timestamp trace events
from pathlib import Path                 # filesystem paths for inputs and outputs
from typing import Any, Dict, List, Tuple
import urllib.request                    # call OpenAI's HTTP API without extra dependencies

import matplotlib                        # render the sentiment trend chart
matplotlib.use("Agg")                    # headless backend, no display server needed on a local machine
import matplotlib.pyplot as plt

PROJECT_ROOT = Path(__file__).resolve().parent.parent         # repo root, one level above src/
FEEDBACK_LINE_PATTERN = re.compile(                             # matches "YYYY-MM-DD|RATING|review text"
    r"^(?P<date>\d{4}-\d{2}-\d{2})\|(?P<rating>[1-5])\|(?P<text>.*)$"
)
OPENAI_ENDPOINT = "https://api.openai.com/v1/chat/completions"  # OpenAI's chat completions endpoint
COST_RATES = {                                                   # USD per token, keyed by model name, overridable via .env
    "gpt-4o-mini": {
        "input":  float(os.environ.get("GPT_4O_MINI_INPUT_COST",  0.00000015)),
        "output": float(os.environ.get("GPT_4O_MINI_OUTPUT_COST", 0.00000060)),
    },
}


def record_trace_event(trace_events: List[Dict[str, Any]], category: str, action: str, message: str) -> None:
    trace_events.append({                # one audit-trail entry per step of the pipeline
        "timestamp": datetime.now().isoformat(),   # when this step happened
        "category": category,            # e.g. "fs", "model", "chart"
        "action": action,                # e.g. "read", "generate", "render"
        "message": message,              # human-readable detail for the trace file
    })


def parse_feedback_entries(feedback_path: Path, trace_events: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    record_trace_event(trace_events, "fs", "read", f"Reading feedback file: {feedback_path}")  # log the read before it happens
    entries: List[Dict[str, str]] = []   # structured {date, rating, text} dicts
    with open(feedback_path, "r", encoding="utf-8", errors="replace") as f:  # tolerate odd byte sequences in real exports
        for raw_line in f:
            match = FEEDBACK_LINE_PATTERN.match(raw_line.strip())  # try to parse the date|rating|text format
            if match:
                entries.append(match.groupdict())              # keep only lines that match the expected format
    record_trace_event(trace_events, "fs", "parse", f"Parsed {len(entries)} structured feedback entries")  # how much survived parsing
    return entries


def summarize_feedback(entries: List[Dict[str, str]]) -> Dict[str, Any]:
    rating_counts = Counter(e["rating"] for e in entries)            # total count per star rating, 1 through 5
    ratings_by_day: Dict[str, List[int]] = defaultdict(list)         # all ratings seen on each calendar day
    for entry in entries:
        ratings_by_day[entry["date"]].append(int(entry["rating"]))   # group ratings by the day they were left
    daily_avg_rating = {                                              # mean rating per day, used for the trend chart
        day: sum(ratings) / len(ratings) for day, ratings in ratings_by_day.items()
    }
    negative_reviews = [                                              # full text of low-rated reviews, for the prompt
        e["text"] for e in entries if int(e["rating"]) <= 2
    ]
    return {
        "rating_counts": dict(rating_counts),     # e.g. {"5": 9, "4": 2, "2": 4, "1": 2}
        "daily_avg_rating": daily_avg_rating,     # e.g. {"2024-06-04": 1.6, "2024-06-06": 4.5}
        "negative_reviews": negative_reviews,     # 1-2 star review text, used in the model prompt
    }


FEEDBACK_LINE_PATTERN accepts exactly one format: a date, a single-digit rating from 1 to 5, and free text, separated by pipes. Lines that do not match are silently skipped rather than raising an error, so a stray blank line or header row in a real export will not crash the whole analysis. The trade-off is that malformed reviews disappear just as quietly; the "parse" event in the trace records how many entries survived.


summarize_feedback does all of the actual statistics in plain Python: a Counter for the rating distribution, a defaultdict to group ratings by day before averaging, and a simple list comprehension to isolate every 1 or 2 star review for the model prompt later. COST_RATES is defined here, near the top of the file, because it is consulted every time a model call is logged for stats.json, covered further down.
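To make the matching rules concrete, here is a small self-contained sketch that runs the same pattern over a handful of made-up lines and then averages the ratings per day. The sample lines are invented for illustration and are not taken from sample_feedback.txt.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Same pattern as in main.py: "YYYY-MM-DD|RATING|review text"
FEEDBACK_LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\|(?P<rating>[1-5])\|(?P<text>.*)$"
)

sample_lines = [
    "date|rating|review",                       # header row: skipped, no valid date
    "2024-06-04|1|App crashes on login",
    "2024-06-04|2|Checkout is slow",
    "",                                         # blank line: skipped
    "2024-06-06|5|Love it | would buy again",   # extra pipes stay inside the review text
    "2024-06-06|4|Solid update",
    "2024-06-07|6|Rating out of range",         # skipped, 6 is outside 1-5
]

entries: List[Dict[str, str]] = []
for line in sample_lines:
    match = FEEDBACK_LINE_PATTERN.match(line.strip())
    if match:
        entries.append(match.groupdict())

# Group ratings by day, then average, as summarize_feedback does.
ratings_by_day: Dict[str, List[int]] = defaultdict(list)
for entry in entries:
    ratings_by_day[entry["date"]].append(int(entry["rating"]))
daily_avg_rating = {day: sum(r) / len(r) for day, r in ratings_by_day.items()}

print(len(entries), daily_avg_rating)
```

Four of the seven lines survive parsing. The header row, the blank line, and the out-of-range rating are dropped without any error, which is exactly the silent-skip behavior described above.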




Calling OpenAI and Rendering the Chart


The next part of main.py talks to OpenAI directly and turns the daily averages into a chart.



def query_openai_model(model_name: str, prompt: str) -> Tuple[str, Dict[str, Any]]:
    api_key = os.environ.get("OPENAI_API_KEY")    # read at call time so .env loaded by the caller is picked up
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set. Add it to .env before running this script.")

    payload = json.dumps({                        # OpenAI's chat completions request body
        "model": model_name,                      # which model to use, e.g. "gpt-4o-mini"
        "messages": [{"role": "user", "content": prompt}],  # single-turn request, no system message needed
        "temperature": 0.3,                       # low but not zero, for consistent yet natural-sounding prose
        "metadata": {                             # tags visible in the OpenAI dashboard usage logs
            "dev_name":    "Ganesh",
            "project":     "codex-test",
            "environment": "local",
            "purpose":     "testing",
        },
    }).encode("utf-8")
    request = urllib.request.Request(             # build the HTTP POST request
        OPENAI_ENDPOINT, data=payload,
        headers={
            "Content-Type":  "application/json",
            "Authorization": f"Bearer {api_key}",  # OpenAI auth via bearer token
        },
    )
    start = time.monotonic()                                          # monotonic start time, for stats.json latency
    with urllib.request.urlopen(request, timeout=120) as response:  # generous headroom for a slow network or long report
        body = json.loads(response.read().decode("utf-8"))           # OpenAI returns {"choices": [...], "usage": {...}}
    latency_seconds = round(time.monotonic() - start, 3)

    content = body["choices"][0]["message"]["content"].strip()  # the model's generated text, whitespace trimmed
    usage = dict(body.get("usage", {}))                          # real prompt/completion/total token counts from OpenAI
    usage["latency_seconds"] = latency_seconds
    return content, usage


def render_sentiment_trend(daily_avg_rating: Dict[str, float], output_path: Path) -> None:
    days = sorted(daily_avg_rating.keys())               # chronological order along the x-axis
    averages = [daily_avg_rating[d] for d in days]       # matching average rating for each day

    plt.figure(figsize=(10, 4))                          # wide, short chart suited to a time series
    plt.plot(days, averages, marker="o", color="#9333ea")  # one point per day, connected by a line
    plt.axhline(y=3, color="gray", linestyle="--", linewidth=1)  # neutral-rating reference line for context
    plt.title("Average Customer Rating Over Time")        # chart title
    plt.xlabel("Date")                                    # x-axis label
    plt.ylabel("Average Rating (1-5)")                    # y-axis label
    plt.ylim(1, 5)                                         # fixed scale matches the 1-5 star rating range
    plt.xticks(rotation=45, ha="right")                   # angle the date labels so they don't overlap
    plt.tight_layout()                                    # avoid clipping the rotated labels
    plt.savefig(output_path)                              # write the PNG to disk
    plt.close()                                           # release the figure from memory


query_openai_model reads OPENAI_API_KEY at call time rather than at import time, specifically so that whatever loaded .env before this script ran, whether that is load_env.ps1 or a test harness setting the variable directly, is respected.


It returns a tuple rather than a plain string: the report text the model wrote, and a dictionary holding the real token counts from OpenAI’s own usage field plus the measured latency. Nothing here estimates tokens; the counts come directly from the API response.
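The shape of that tuple is easier to see without a network call. The sketch below applies the same extraction steps to a hand-written response body. The helper name extract_report_and_usage and every value in fake_body are made up for illustration; only the choices and usage keys follow what query_openai_model reads.

```python
from typing import Any, Dict, Tuple


def extract_report_and_usage(body: Dict[str, Any], latency_seconds: float) -> Tuple[str, Dict[str, Any]]:
    """Mirror the last few lines of query_openai_model on an already-decoded body."""
    content = body["choices"][0]["message"]["content"].strip()   # the generated text
    usage = dict(body.get("usage", {}))                          # copy, so the body is not mutated
    usage["latency_seconds"] = latency_seconds                   # measured by the caller
    return content, usage


# Invented response body, for illustration only.
fake_body = {
    "choices": [
        {"message": {"role": "assistant", "content": "## Key Themes\n- Login crashes\n"}}
    ],
    "usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165},
}

report, usage = extract_report_and_usage(fake_body, 1.234)
print(report)
print(usage)
```

Because usage falls back to an empty dict when the key is absent, downstream code that reads token counts needs its own defaults, which is what the usage.get(..., 0) calls in the cost-tracking code provide.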


render_sentiment_trend fixes the y-axis to the 1 through 5 star range so the chart’s shape is always comparable across different uploads, and the gray dashed line at 3 stars gives a constant visual reference for what “neutral” looks like.




Computing a Rating Summary and a Per-Review Breakdown


The model’s report covers themes and complaints, but it is never asked to repeat the rating numbers it was given. Trusting an LLM to recount data exactly, rather than just discuss it, is not a good trade when the exact counts are already available. These two functions compute that part directly in Python instead.



def format_rating_summary(rating_counts: Dict[str, int]) -> str:
    # Computed directly from parsed data rather than asked of the model — these are exact
    # counts we already have correctly, not something worth trusting an LLM to recount.
    total = sum(rating_counts.values())
    positive = sum(rating_counts.get(str(r), 0) for r in (4, 5))   # 4-5 stars
    neutral = rating_counts.get("3", 0)                            # 3 stars
    negative = sum(rating_counts.get(str(r), 0) for r in (1, 2))   # 1-2 stars
    breakdown = ", ".join(f"{r} star: {rating_counts.get(str(r), 0)}" for r in (5, 4, 3, 2, 1))
    return (
        "## Rating Summary\n"
        f"- Total reviews: {total}\n"
        f"- Positive (4-5 stars): {positive}\n"
        f"- Neutral (3 stars): {neutral}\n"
        f"- Negative (1-2 stars): {negative}\n"
        f"- Breakdown: {breakdown}\n"
    )


def format_individual_reviews(entries: List[Dict[str, str]]) -> str:
    # Shows the actual input alongside its classification, one line per review, so the
    # report isn't just aggregate numbers — the label is derived the same way the
    # aggregate counts are (4-5 stars positive, 3 neutral, 1-2 negative), not by the model.
    lines = ["## Individual Reviews"]
    for entry in entries:
        rating = int(entry["rating"])
        label = "Positive" if rating >= 4 else "Negative" if rating <= 2 else "Neutral"
        lines.append(f"- {entry['date']} | {rating} star | {label} | \"{entry['text']}\"")
    return "\n".join(lines) + "\n"


format_rating_summary produces the aggregate counts: how many reviews, how many fell into each bucket, and the exact per-star breakdown. format_individual_reviews goes one level lower, listing every single review next to its own label, computed the same way as the aggregate so the two sections can never disagree with each other. Both get prepended to the model’s narrative before the report is written to disk, so the final file shows exact numbers first and the model’s interpretation of them second.
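As a worked example of the bucketing rule, the sketch below applies it to the rating counts used as the example value in summarize_feedback’s comments. The helper name bucket_label is hypothetical; it only restates the threshold logic already shown in format_individual_reviews.

```python
from typing import Dict


def bucket_label(rating: int) -> str:
    """4-5 stars are Positive, 3 is Neutral, 1-2 are Negative."""
    return "Positive" if rating >= 4 else "Negative" if rating <= 2 else "Neutral"


rating_counts: Dict[str, int] = {"5": 9, "4": 2, "2": 4, "1": 2}   # example counts, no 3-star reviews

buckets = {"Positive": 0, "Neutral": 0, "Negative": 0}
for star, count in rating_counts.items():
    buckets[bucket_label(int(star))] += count

total = sum(rating_counts.values())
print(total, buckets)
```

With these counts the summary reports 17 reviews in total: 11 positive, 0 neutral, and 6 negative. Since the per-review labels use the same thresholds, counting the labels in the Individual Reviews section gives the same three numbers.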




Tracking Token Usage and Cost


Every model call now gets logged to stats.json at the project root, with totals that accumulate across every analysis ever run, not just the current one.



def build_call_record(model_name: str, usage: Dict[str, Any], analysis_id: str) -> Dict[str, Any]:
    prompt_tok = usage.get("prompt_tokens", 0)
    completion_tok = usage.get("completion_tokens", 0)
    total_tok = usage.get("total_tokens", prompt_tok + completion_tok)
    rates = COST_RATES.get(model_name, {"input": 0, "output": 0})  # unknown models cost $0 rather than raising
    input_cost = prompt_tok * rates["input"]
    output_cost = completion_tok * rates["output"]
    return {
        "timestamp":         datetime.now().isoformat(),
        "analysis_id":       analysis_id,
        "model":             model_name,
        "prompt_tokens":     prompt_tok,
        "completion_tokens": completion_tok,
        "total_tokens":      total_tok,
        "input_cost":        round(input_cost, 7),
        "output_cost":       round(output_cost, 7),
        "total_cost":        round(input_cost + output_cost, 7),
        "latency_seconds":   usage.get("latency_seconds", 0),
    }


def summarize_calls(calls: List[Dict[str, Any]]) -> Dict[str, Any]:
    return {
        "total_calls":             len(calls),
        "total_prompt_tokens":     sum(c["prompt_tokens"] for c in calls),
        "total_completion_tokens": sum(c["completion_tokens"] for c in calls),
        "total_tokens":            sum(c["total_tokens"] for c in calls),
        "total_cost":              round(sum(c["total_cost"] for c in calls), 6),
    }


def record_stats(stats_path: Path, call_record: Dict[str, Any]) -> None:
    try:                                                          # load history written by previous runs so stats accumulate
        existing = json.loads(stats_path.read_text(encoding="utf-8"))
        all_calls = existing.get("calls", [])
    except (FileNotFoundError, json.JSONDecodeError):
        all_calls = []                                              # first run — start with empty history

    all_calls.append(call_record)
    output = {
        "run_info": {
            "timestamp": datetime.now().isoformat(),               # when stats.json was last written
            **summarize_calls(all_calls),                          # lifetime totals across every call ever recorded
        },
        "calls": all_calls,                                         # every individual call ever recorded
    }
    stats_path.parent.mkdir(parents=True, exist_ok=True)
    stats_path.write_text(json.dumps(output, indent=2), encoding="utf-8")


build_call_record turns one model call’s real token usage into a record with USD costs already computed, using the per-token rates defined in COST_RATES. record_stats reads whatever history already exists in stats.json, appends this call, recomputes lifetime totals across every call ever recorded, and writes the file back out. The first time this runs, there is no existing file, so the except branch just starts from an empty list. A stats.json that fails to parse is handled the same way, which means a corrupted file silently resets the history. This is the same accumulate-across-runs pattern used for token and cost tracking in the other projects in this series.
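The cost arithmetic is simple enough to check by hand. The sketch below uses the same rates as COST_RATES, written as dollars per million tokens to make them easier to read, and a made-up call of 1,200 prompt tokens and 400 completion tokens.

```python
# Default gpt-4o-mini rates from COST_RATES, expressed per million tokens for readability.
INPUT_RATE = 0.15 / 1_000_000     # USD per prompt token
OUTPUT_RATE = 0.60 / 1_000_000    # USD per completion token

# Hypothetical token counts for one call, for illustration only.
prompt_tokens, completion_tokens = 1200, 400

input_cost = prompt_tokens * INPUT_RATE           # about $0.00018
output_cost = completion_tokens * OUTPUT_RATE     # about $0.00024
total_cost = round(input_cost + output_cost, 7)   # same rounding as build_call_record

print(f"input ${input_cost:.5f} + output ${output_cost:.5f} = ${total_cost:.5f}")
```

At these rates a single report costs a small fraction of a cent, which is why the records keep seven decimal places rather than rounding to cents.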




Writing the Report and Trace


The rest of main.py ties parsing, the model call, the rating summary, the stats tracking, and the chart together, then writes everything to disk.



FEEDBACK_REPORT_PROMPT = """
You are a customer experience analyst reviewing a batch of customer feedback. Given the star
rating distribution and the negative reviews below, write a short feedback analysis.

Rating distribution (1-5 stars): {rating_counts}
Negative reviews (rating 2 stars or below):
{negative_reviews}

Respond in markdown with three sections: "## Key Themes", "## Top Complaints", "## Recommended Actions".
""".strip()


def run_analysis(feedback_path: Path, results_dir: Path, model_name: str, stats_path: Path) -> None:
    trace_events: List[Dict[str, Any]] = []                        # accumulates every step for tool_trace.json

    entries = parse_feedback_entries(feedback_path, trace_events)   # structured {date, rating, text} rows
    summary = summarize_feedback(entries)                            # rating counts, daily averages, negative reviews

    prompt = FEEDBACK_REPORT_PROMPT.format(                          # fill the report-writing prompt with real data
        rating_counts=summary["rating_counts"],
        negative_reviews="\n".join(f"- {text}" for text in summary["negative_reviews"]) or "- (none found)",
    )
    record_trace_event(trace_events, "model", "generate", f"Requesting feedback analysis from {model_name}")  # before the call
    report_body, usage = query_openai_model(model_name, prompt)      # the model's narrative analysis, plus real token usage

    call_record = build_call_record(model_name, usage, results_dir.name)  # results_dir.name is the analysis_id
    record_stats(stats_path, call_record)                            # accumulate token/cost history across all runs

    rating_summary = format_rating_summary(summary["rating_counts"])  # exact counts, prepended ahead of the model's prose
    individual_reviews = format_individual_reviews(entries)           # raw input plus per-review positive/negative label
    full_report = f"{rating_summary}\n{individual_reviews}\n{report_body}"

    results_dir.mkdir(parents=True, exist_ok=True)                   # ensure the per-run output directory exists

    report_path = results_dir / "feedback_report.md"                 # markdown report path
    report_path.write_text(full_report, encoding="utf-8")            # write the rating summary plus the model's narrative
    record_trace_event(trace_events, "fs", "write", f"Wrote {report_path}")  # after the write succeeds

    chart_path = results_dir / "sentiment_trend.png"                  # chart output path
    render_sentiment_trend(summary["daily_avg_rating"], chart_path)    # draw and save the rating trend chart
    record_trace_event(trace_events, "chart", "render", f"Wrote {chart_path}")  # after the chart is saved

    trace_path = results_dir / "tool_trace.json"                      # audit trail output path
    trace_path.write_text(json.dumps(trace_events, indent=2), encoding="utf-8")  # full step-by-step record, written last

    print(f"Analysis complete. Wrote {report_path}, {chart_path}, and {trace_path}")  # explicit stdout signal that this succeeded


def main() -> None:
    parser = argparse.ArgumentParser(description="Customer feedback analyzer")  # CLI entry point invoked by the OpenClaw skill
    parser.add_argument("--feedback", required=True, type=Path)       # path to the feedback file to analyze
    parser.add_argument("--model", default="gpt-4o-mini")               # OpenAI model name
    parser.add_argument("--analysis-id", required=True)                 # unique id for this run, used in the output path
    parser.add_argument("--output-dir", type=Path, default=None)        # base runs/ folder; overrides PROJECT_ROOT when this
                                                                           # script is invoked from an OpenClaw-installed copy
    parser.add_argument("--stats-path", type=Path, default=None)        # stats.json location; same override reason as --output-dir
    args = parser.parse_args()

    runs_base = args.output_dir if args.output_dir is not None else PROJECT_ROOT / "runs"  # explicit dir wins when given
    results_dir = runs_base / args.analysis_id                         # per-run output folder, keeps runs from colliding
    stats_path = args.stats_path if args.stats_path is not None else PROJECT_ROOT / "stats.json"  # explicit path wins when given
    run_analysis(args.feedback, results_dir, args.model, stats_path)


if __name__ == "__main__":
    main()


Two things here exist specifically because of failures discovered later, not because they were part of the original design. The print statement at the end of run_analysis exists because this script used to write three files silently and exit with no output at all, and OpenClaw’s own tool wrapper treated a perfectly successful run with empty stdout as an ambiguous, sometimes failed, result. The --output-dir and --stats-path arguments exist because openclaw skills install runs this script from a copied location, not the original project folder, and without an explicit override, every result would land somewhere far harder to find than where it was uploaded from. Both are explained in detail in the sections that follow.
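The fallback logic behind --output-dir and --stats-path can be isolated into a few lines. This is a sketch of the same decision main() makes, with a hypothetical helper name (resolve_output_paths) and made-up directory paths standing in for the installed copy and the original project.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_output_paths(
    project_root: Path,
    analysis_id: str,
    output_dir: Optional[Path] = None,
    stats_path: Optional[Path] = None,
) -> Tuple[Path, Path]:
    """Return (results_dir, stats_path); an explicit argument wins over the project-root default."""
    runs_base = output_dir if output_dir is not None else project_root / "runs"
    stats = stats_path if stats_path is not None else project_root / "stats.json"
    return runs_base / analysis_id, stats


# Illustrative paths only: where the script is running from vs. where results should land.
installed_copy = Path("/home/me/.openclaw/workspace/skills/local-feedback-analyzer")
original_project = Path("/home/me/projects/openclaw_log_analyzer")

# Without overrides, everything lands next to the installed copy.
print(resolve_output_paths(installed_copy, "abc123"))

# With overrides, results go back to the original project folder.
print(resolve_output_paths(
    installed_copy,
    "abc123",
    output_dir=original_project / "runs",
    stats_path=original_project / "stats.json",
))
```

Keeping the defaults tied to the script’s own location means a direct python src/main.py run from the project folder still works with no extra arguments.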




The Web Upload Server


Create a file named web_assistant.py, also inside src. This is the only thing a person actually interacts with directly; everything else runs invisibly behind it.



import http.server                       # minimal stdlib HTTP server, no extra dependencies needed
import shlex                             # quote arguments safely inside the single --message string
import shutil                            # resolve the openclaw executable's real path, including its extension
import socketserver                      # TCP server base used to host AnalyzerRequestHandler
import subprocess                        # invoke the OpenClaw CLI as a subprocess for each upload
import sys                               # detect Windows to handle .cmd/.bat shims correctly
import uuid                              # generate a unique analysis_id per upload
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parent.parent   # repo root, one level above src/
UPLOADS_DIR = PROJECT_ROOT / "uploads"                    # where incoming feedback files are written
HOST, PORT = "127.0.0.1", 8765                            # local-only — never bound to a public interface

# Where "openclaw skills install ... --as local-feedback-analyzer" actually copies the project.
# Computed directly rather than relying on the model to recall {baseDir} from SKILL.md, since
# testing showed the model does not reliably look up or recall the skill's own instructions —
# it has guessed at fictional script filenames instead of reading the real documented command.
INSTALLED_SKILL_DIR = Path.home() / ".openclaw" / "workspace" / "skills" / "local-feedback-analyzer"

UPLOAD_FORM = """
<html><body>
<h1>Local Customer Feedback Analyzer</h1>
<form method="POST" action="/analyze" enctype="multipart/form-data">
  <input type="file" name="feedbackfile" accept=".txt,.csv">
  <button type="submit">Analyze</button>
</form>
</body></html>
""".strip()


def resolve_openclaw_command() -> list:
    openclaw_path = shutil.which("openclaw")          # resolves to openclaw.cmd on a typical Windows npm install
    if openclaw_path is None:
        raise FileNotFoundError("openclaw was not found on PATH. Make sure OpenClaw is installed and the gateway is running.")
    if sys.platform == "win32" and openclaw_path.lower().endswith((".cmd", ".bat")):
        return ["cmd", "/c", openclaw_path]           # .cmd/.bat shims can't be launched directly by CreateProcess on Windows
    return [openclaw_path]                            # a real executable (.exe, or any non-Windows platform) runs directly


def build_skill_message(feedback_path: Path, analysis_id: str) -> str:
    # Earlier versions asked the model to "use the skill" or "run its documented command",
    # trusting it to recall {baseDir}/src/main.py from SKILL.md. In testing, across two
    # different models, that trust was misplaced: the model called skill_workshop trying
    # to edit the skill, spawned a subagent that hallucinated an unrelated file path and
    # URL, and on one run invented a fictional script filename rather than the real one.
    # The fix is to stop asking it to recall or look anything up at all. The exact,
    # already-resolved command is handed over directly, so running it requires zero
    # interpretation, only calling a generic shell tool with this literal string.
    script_path = INSTALLED_SKILL_DIR / "src" / "main.py"      # the real, resolved path to the installed copy
    venv_python = "Scripts/python.exe" if sys.platform == "win32" else "bin/python"  # venv layout differs by platform
    python_path = INSTALLED_SKILL_DIR / "venv" / venv_python    # the installed copy's own venv, with matplotlib
    output_dir = PROJECT_ROOT / "runs"                          # this project's own runs/ folder, not the installed copy's
    stats_path = PROJECT_ROOT / "stats.json"                    # this project's own stats.json, same reason as output_dir
    report_path = output_dir / analysis_id / "feedback_report.md"  # where the report actually lands, since --output-dir overrides it
    command = (
        f"{shlex.quote(str(python_path))} {shlex.quote(str(script_path))} "
        f"--feedback {shlex.quote(str(feedback_path))} "
        f"--analysis-id {shlex.quote(analysis_id)} "
        f"--output-dir {shlex.quote(str(output_dir))} "
        f"--stats-path {shlex.quote(str(stats_path))}"
    )
    return (
        f"Run this exact shell command using your shell or exec tool, verbatim, with no "
        f"changes: {command}\n\n"
        f"Do not call skill_workshop. Do not create, edit, or propose any skill. Do not "
        f"spawn a subagent. Do not read the feedback file yourself. After the command "
        f"finishes, read {shlex.quote(str(report_path))} "
        f"and report its contents back."
    )


def build_agent_command(feedback_path: Path, analysis_id: str) -> list:
    return resolve_openclaw_command() + [  # argv list passed to subprocess.run — no shell, no manual quoting needed
        "agent", "--local",
        "--session-id", f"local-feedback-{analysis_id}",   # ties this OpenClaw session to the web upload that triggered it
        "--message", build_skill_message(feedback_path, analysis_id),  # the entire skill invocation as one message string
    ]


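def extract_uploaded_file(body: bytes) -> bytes:
    # minimal multipart/form-data parser for a single file field, adequate for this local-only tool,
    # not a general-purpose multipart parser.
    marker = b"\r\n\r\n"                  # blank line that separates multipart headers from the file body
    header_end = body.find(marker)
    if header_end == -1:                  # find() returns -1 when the separator is missing
        raise ValueError("malformed multipart body: no header/body separator found")
    start = header_end + len(marker)
    end = body.rfind(b"\r\n--")           # multipart closing boundary
    if end < start:                       # closing boundary missing, or located before the file body
        raise ValueError("malformed multipart body: no closing boundary found")
    return body[start:end]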


class AnalyzerRequestHandler(http.server.BaseHTTPRequestHandler):  # one instance per incoming HTTP request
    def do_GET(self) -> None:                                       # serves the upload form on any GET request
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(UPLOAD_FORM.encode("utf-8"))

    def do_POST(self) -> None:                                       # accepts the uploaded feedback file and runs the analysis
        analysis_id = uuid.uuid4().hex[:8]                            # short unique id for this run
        UPLOADS_DIR.mkdir(parents=True, exist_ok=True)                 # ensure the uploads folder exists
        feedback_path = UPLOADS_DIR / f"{analysis_id}.txt"             # where this upload's feedback file is saved

        content_length = int(self.headers.get("Content-Length", 0))    # size of the incoming multipart body, 0 if the header is missing
        body = self.rfile.read(content_length)                          # read the full request body
        feedback_path.write_bytes(extract_uploaded_file(body))          # save the extracted file content to disk

        try:
            agent_command = build_agent_command(feedback_path, analysis_id)  # the OpenClaw CLI invocation for this run
            subprocess.run(agent_command, check=True)                   # blocks until the skill finishes
            status, message = 200, f"Done. See runs/{analysis_id}/"     # points to the output folder
        except (FileNotFoundError, subprocess.CalledProcessError) as error:
            self.log_error("analysis %s failed: %s", analysis_id, error)  # keep the details in the server console
            status, message = 500, "Analysis failed. Check the server console for details."

        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(f"<p>{message}</p>".encode("utf-8"))


if __name__ == "__main__":
    with socketserver.TCPServer((HOST, PORT), AnalyzerRequestHandler) as httpd:  # bind to localhost only
        print(f"Serving on http://{HOST}:{PORT}")
        httpd.serve_forever()


resolve_openclaw_command exists because an npm-installed CLI tool on Windows is usually a .cmd wrapper script, not a true executable, and subprocess.run cannot launch a .cmd file the same way it launches a real one unless the call is routed through cmd.exe first.
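To make that branching easy to check without a Windows machine, the decision can be separated from the PATH lookup. The sketch below is a hypothetical variant, launcher_argv, that takes the resolved path and the platform string as plain arguments. The example paths are assumptions, not locations OpenClaw is guaranteed to use.

```python
from typing import List


def launcher_argv(resolved_path: str, platform: str) -> List[str]:
    """Return the argv prefix needed to launch the CLI at resolved_path."""
    is_batch_shim = resolved_path.lower().endswith((".cmd", ".bat"))
    if platform == "win32" and is_batch_shim:
        # Route batch wrappers through cmd.exe, as resolve_openclaw_command does.
        return ["cmd", "/c", resolved_path]
    # Real executables, and every non-Windows platform, launch directly.
    return [resolved_path]


# Hypothetical install locations, for illustration only.
windows_shim = launcher_argv(r"C:\Users\me\AppData\Roaming\npm\openclaw.cmd", "win32")
linux_binary = launcher_argv("/usr/local/bin/openclaw", "linux")

print(windows_shim)
print(linux_binary)
```

Passing the platform in as an argument, rather than reading sys.platform inside the function, is a design choice made here only so both branches can be exercised from one machine.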


The venv_python line in build_skill_message exists for a related reason. A virtual environment’s internal layout differs by platform: venv\Scripts\python.exe on Windows versus venv/bin/python on macOS and Linux. Because the installed copy’s own venv has to be located programmatically rather than guessed at by the model, this script has to get that one detail right for whichever platform it is running on.
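The layout difference can be shown as a small standalone function. This is a sketch with a hypothetical name, venv_interpreter, and invented directory names. It uses pathlib's pure path classes so that both layouts can be built on any operating system, which the real script does not need to do.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def venv_interpreter(venv_dir: PurePath, platform: str) -> PurePath:
    """Return the interpreter path inside venv_dir for the given sys.platform value."""
    parts = ("Scripts", "python.exe") if platform == "win32" else ("bin", "python")
    return venv_dir.joinpath(*parts)


# Directory names below are assumptions for illustration only.
windows_python = venv_interpreter(PureWindowsPath(r"C:\skills\demo\venv"), "win32")
posix_python = venv_interpreter(PurePosixPath("/opt/skills/demo/venv"), "linux")

print(windows_python)
print(posix_python)
```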


build_skill_message is the most rewritten function in this entire project, and the reason why is the subject of the next section.




Running the Application


With .env filled in and the virtual environment active, register the skill, then start both processes. On Windows:



openclaw skills install "E:\workspace\python\tutorials\AI Agents\openclaw_log_analyzer" --as local-feedback-analyzer --force


Terminal 1:



.\run_gateway.ps1


Terminal 2:


.\run_web.ps1


On macOS or Linux, the same three steps use the .sh scripts instead, after making them executable once with chmod +x run_gateway.sh run_web.sh:



openclaw skills install "/path/to/openclaw_log_analyzer" --as local-feedback-analyzer --force


Terminal 1:


./run_gateway.sh


Terminal 2:


./run_web.sh


Open http://127.0.0.1:8765, choose examples/sample_feedback.txt in the file picker, and click Analyze. The sample file tells one clear story: ratings start at 4 and 5 stars, drop sharply to 1 and 2 stars across two days because of a checkout crash, then recover. Once the skill finishes, look in this project’s own runs/<analysis-id>/ folder.


feedback_report.md will open with an exact rating summary and a per-review breakdown, followed by the model naming the checkout crash as the central theme; sentiment_trend.png will show the dip and recovery as a visible line; tool_trace.json will list every step that ran to produce both; and stats.json at the project root will show this analysis’s token counts and real dollar cost, added to the running lifetime total.
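Checking for those files by hand works, but the same check can be scripted. The sketch below is a hypothetical helper, missing_outputs, that is not part of the project. It assumes the file names listed above and the layout of runs/<analysis-id>/ plus a stats.json at the project root, and it only confirms the files exist, not that their contents are sensible.

```python
from pathlib import Path
from typing import List

# File names taken from the description above.
EXPECTED_RUN_FILES = ("feedback_report.md", "sentiment_trend.png", "tool_trace.json")


def missing_outputs(project_root: Path, analysis_id: str) -> List[str]:
    """List every expected output that is absent on disk for this analysis."""
    run_dir = project_root / "runs" / analysis_id
    missing = [name for name in EXPECTED_RUN_FILES if not (run_dir / name).is_file()]
    if not (project_root / "stats.json").is_file():
        missing.append("stats.json")
    return missing


# Usage sketch: pass the project folder and the id shown on the "Done" page.
# An empty list means all four files were written.
# print(missing_outputs(Path("."), "ab12cd34"))
```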


Anyone re-running this after a code change needs to repeat the skills install ... --force step first. openclaw skills install copies the entire project into ~/.openclaw/workspace/skills/local-feedback-analyzer/, and the gateway always runs from that copy, not the live files being edited.
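Forgetting that step is easy, so a quick staleness check can help. The following is a minimal sketch, assuming the installed copy keeps the same src/main.py layout as the project. The function name installed_copy_is_stale is hypothetical, and the check only compares one file byte for byte, so it will not notice changes elsewhere in the project.

```python
import filecmp
from pathlib import Path


def installed_copy_is_stale(
    project_root: Path, installed_root: Path, relative: str = "src/main.py"
) -> bool:
    """Return True when the installed file is missing or differs from the live one."""
    live_file = project_root / relative
    installed_file = installed_root / relative
    if not installed_file.is_file():
        return True  # nothing installed yet counts as stale
    # shallow=False forces a content comparison instead of trusting size and mtime.
    return not filecmp.cmp(live_file, installed_file, shallow=False)


# Usage sketch, with the install location described above:
# installed = Path.home() / ".openclaw" / "workspace" / "skills" / "local-feedback-analyzer"
# if installed_copy_is_stale(Path("."), installed):
#     print("Re-run: openclaw skills install ... --force")
```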




Output































Who Can Benefit


  • Students and AI engineers learning agent orchestration can use this project to see the actual difference between an orchestration layer deciding what to do and a script doing the work, including what it looks like when that decision-making layer gets it wrong in several different ways before it gets it right.

  • Anyone building on OpenClaw or a similar skill-based agent framework can use the documented failures as a head start, since registration, model catalog gaps, dispatch wording, and CLI argument shape are exactly the kind of friction that shows up the first time a custom skill is wired up.

  • Teams evaluating how much to trust a model’s own judgment versus an explicit, pre-resolved instruction can use the build_skill_message rewrite history here as a concrete case study: three rewordings that each fixed a symptom, and one architectural change that fixed the cause.

  • Anyone debugging an agent that “says” it did something can use the verification habit demonstrated throughout this project: check the filesystem, not the agent’s own summary of events.




How Codersarts Can Help


If you want to take this further, Codersarts offers hands-on support at every stage.


  • For learners: Live 1-to-1 sessions with an AI engineer who can walk through OpenClaw’s skill architecture, model selection for agent reasoning, and the debugging process for agent orchestration tools in detail.

  • For teams: End-to-end development of agent-based automation tooling, including custom skills, model selection and reliability testing, cost tracking, and audit trail design.

  • For enterprises: Architecture consulting for agent orchestration deployments, including model capability evaluation and production debugging strategies for agentic systems.


Reach out at contact@codersarts.com or visit www.codersarts.com to get started.



