Build a Customer Feedback Analyzer with OpenClaw and OpenAI
Introduction
Most “build an AI agent” tutorials show the happy path: write a skill, register it, call it, done. What they skip is the part where the agent confidently does the wrong thing anyway, in a different way every single time you try again, and you have to figure out why. This tutorial is the version that doesn’t skip that part.
We build a customer feedback analyzer using OpenClaw, an orchestration layer that dispatches commands to registered skills, paired with OpenAI for the actual analysis. You upload a batch of star-rated reviews through a small local web page, and a registered OpenClaw skill parses the ratings, asks OpenAI for a themes-and-complaints report, renders a rating trend chart, tracks token usage and cost, and writes a full execution trace.

What We Are Building
A local web upload form backed by an OpenClaw skill. The workflow:
Upload a feedback file through a simple web page running on 127.0.0.1
Parse every line into a structured date, rating, and review text
Summarize the rating distribution and isolate every low-rated review
Ask OpenAI to write a themes-and-complaints report
Compute an exact rating summary and a per-review Positive, Neutral, or Negative label in Python, rather than asking the model for either
Render a chart of average daily rating over time
Track token usage and real dollar cost for every model call, accumulated across every run
Write a full audit trail of every step, for transparency
Tech Stack
Component | Tool |
Orchestration | OpenClaw (skill registry and execution dispatcher) |
Model | OpenAI gpt-4o-mini |
Charting | Matplotlib |
Web upload server | Python’s built-in http.server |
Environment | A project-local venv, with a shared .env loader for both Windows (PowerShell) and macOS/Linux (bash) launchers |
Project Structure
openclaw_log_analyzer/
├── src/
│ ├── main.py # parses feedback, queries OpenAI, renders the chart, tracks cost, writes the trace
│ └── web_assistant.py # local upload server that invokes the OpenClaw skill per request
├── examples/
│ └── sample_feedback.txt # sample feedback with a realistic rating dip, for testing
├── .openclaw-local/
│ └── openclaw.json # points OpenClaw at OpenAI's built-in provider for agent reasoning
├── SKILL.md # defines the local-feedback-analyzer skill OpenClaw dispatches to main.py
├── run_gateway.ps1 # Windows launcher for the OpenClaw gateway, loads .env automatically
├── run_web.ps1 # Windows launcher for the upload server, loads .env automatically
├── load_env.ps1 # shared .env loader used by both Windows launcher scripts
├── run_gateway.sh # macOS/Linux launcher for the OpenClaw gateway, loads .env automatically
├── run_web.sh # macOS/Linux launcher for the upload server, loads .env automatically
├── load_env.sh # shared .env loader used by both macOS/Linux launcher scripts
├── runs/ # one folder per analysis: report, chart, and trace
├── stats.json # token usage and cost, accumulated across every analysis ever run
├── .env # OPENAI_API_KEY
└── requirements.txt
Setting Up OpenClaw
OpenClaw’s installer targets macOS and Linux by default, but it also ships a dedicated Windows path through PowerShell, installed via npm, requiring no administrator privileges:
iwr -useb https://openclaw.ai/install.ps1 | iex
openclaw onboard
During onboarding, skip all of the following for now: the model and auth provider step, every bundled marketplace skill (GitHub, Gemini, Whisper, and the rest), every chat channel integration, and web search. None of those are needed when the only interface is a local web page and the only model is configured directly in a project-local config file.
Configuring OpenClaw to Use OpenAI
Create a file named openclaw.json inside a .openclaw-local folder in the project root.
{
  "gateway": {
    "mode": "local"
  },
  "models": {
    "mode": "merge",
    "providers": {
      "openai": {
        "models": [
          {
            "id": "gpt-4o-mini",
            "name": "GPT-4o mini",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0.15, "output": 0.6, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 128000,
            "contextTokens": 100000,
            "maxTokens": 16384
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": "openai/gpt-4o-mini"
    }
  },
  "tools": {
    "web": {
      "search": {
        "enabled": false
      },
      "fetch": {
        "enabled": true
      }
    }
  }
}
gateway.mode must be present and set to "local", or the gateway refuses to start with a message about a missing or clobbered config. This requirement, and the rest of this file’s shape, were confirmed by running openclaw config schema and validating against a real JSON Schema validator, not guessed from a tutorial description. The models block looks like more work than it should be: OpenAI is one of OpenClaw’s natively bundled providers and should not need a manual model catalog at all. In practice, starting the gateway with only agents.defaults.model set to "openai/gpt-4o-mini" failed with FailoverError: Unknown model: openai/gpt-4o-mini, because OpenClaw’s own bundled catalog for that provider did not recognize the exact model string being requested. Setting mode: "merge" keeps every one of OpenClaw’s built-in defaults for the openai provider, including authentication, and layers this explicit model declaration on top. It is the same fix used later in this tutorial for a fully custom provider, applied here to patch a gap in a bundled one.
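Both failure modes described here (a missing gateway.mode, and a default model that the provider catalog does not declare) can be caught before the gateway is ever started. The following is a minimal sketch, not something OpenClaw provides: check_openclaw_config is a hypothetical helper, and it inspects only the keys shown in the config above rather than the full schema that openclaw config schema reports.

```python
import json
from typing import Any, Dict, List


def check_openclaw_config(config: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty means none found)."""
    problems: List[str] = []

    # The gateway refused to start unless gateway.mode was present and "local".
    if config.get("gateway", {}).get("mode") != "local":
        problems.append('gateway.mode must be set to "local"')

    # agents.defaults.model is written as "<provider>/<model id>".
    default_model = config.get("agents", {}).get("defaults", {}).get("model", "")
    provider, _, model_id = default_model.partition("/")
    if not provider or not model_id:
        problems.append("agents.defaults.model should look like provider/model-id")
        return problems

    # The default model should be declared in that provider's models list.
    providers = config.get("models", {}).get("providers", {})
    declared = providers.get(provider, {}).get("models", [])
    if model_id not in {m.get("id") for m in declared}:
        problems.append(f"{default_model} is not declared under models.providers.{provider}")
    return problems


# A trimmed copy of the config above, kept inline so the sketch is self-contained.
sample_config = json.loads("""
{
  "gateway": {"mode": "local"},
  "models": {"mode": "merge",
             "providers": {"openai": {"models": [{"id": "gpt-4o-mini"}]}}},
  "agents": {"defaults": {"model": "openai/gpt-4o-mini"}}
}
""")
print(check_openclaw_config(sample_config))
```

Run against the real file with json.loads(Path(".openclaw-local/openclaw.json").read_text()), an empty list means the two known startup failures are ruled out; it says nothing about the rest of the schema.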
Create a file named .env in the project root:
OPENAI_API_KEY=your_openai_api_key_here
Both OPENCLAW_CONFIG_PATH and OPENAI_API_KEY are environment variables scoped to a single terminal session. Rather than re-typing $env:... assignments in every new PowerShell window, create a file named load_env.ps1 that both launcher scripts share:
function Import-DotEnv {
    param([string]$Path = (Join-Path $PSScriptRoot ".env")) # defaults to the .env next to this script
    if (-not (Test-Path $Path)) { return } # silently skip if no .env exists
    Get-Content $Path | ForEach-Object {
        if ($_ -match '^\s*([^#=\s][^=]*)\s*=\s*(.*)\s*$') { # skip blank lines and lines starting with #
            $name = $matches[1].Trim()
            $value = $matches[2].Trim().Trim('"').Trim("'") # strip optional surrounding quotes
            Set-Item -Path "env:$name" -Value $value # exports it as a real environment variable
        }
    }
}
This is plain PowerShell, not anything OpenClaw provides. Because we inject the variables ourselves before ever invoking openclaw, it does not matter whether OpenClaw’s own Node process reads .env natively.
On macOS or Linux, create a file named load_env.sh instead, with the same behavior as a bash function:
import_dotenv() { # bash equivalent of load_env.ps1's Import-DotEnv function
    local env_file="${1:-$(dirname "${BASH_SOURCE[0]}")/.env}" # defaults to .env next to this script
    [ -f "$env_file" ] || return 0 # silently skip if no .env exists
    while IFS='=' read -r key value; do
        case "$key" in
            ''|'#'*) continue ;; # skip blank lines and lines starting with #
        esac
        key="$(echo "$key" | xargs)" # trim surrounding whitespace
        value="$(echo "$value" | xargs)" # trim surrounding whitespace
        value="${value%\"}"; value="${value#\"}" # strip optional surrounding double quotes
        value="${value%\'}"; value="${value#\'}" # strip optional surrounding single quotes
        export "$key=$value" # exports it as a real environment variable
    done < "$env_file"
}
Same idea as the PowerShell version: skip comments and blank lines, trim quotes, export each variable into the current shell so whatever runs openclaw next inherits it.
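For a test harness that wants the same behavior without going through either shell, the parsing rules the two loaders share can be sketched in Python. parse_dotenv is a hypothetical helper name, not part of the project or of OpenClaw, and the OPENCLAW_CONFIG_PATH value in the sample is only illustrative. It handles plain KEY=VALUE lines and nothing fancier (no export prefixes, no multi-line values).

```python
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and lines that start with #."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key:
            values[key] = value
    return values


sample_env = """
# keys for the local analyzer
OPENAI_API_KEY="your_openai_api_key_here"
OPENCLAW_CONFIG_PATH = .openclaw-local/openclaw.json
"""
parsed = parse_dotenv(sample_env)
print(parsed)
```

Passing the result to os.environ.update(parsed) would mirror what the launchers do. One deliberate difference from the PowerShell version: this sketch removes a single matching pair of quotes rather than trimming every quote character from both ends.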
Defining the Skill
Create a file named SKILL.md in the project root.
---
name: local-feedback-analyzer
description: "Parses a batch of customer feedback, asks OpenAI for a themes-and-complaints report, and renders a rating trend chart, with a full execution trace."
---
# Local Feedback Analyzer
Analyze a batch of star-rated customer feedback: parse ratings, ask OpenAI for a themes-and-complaints report, render a rating trend chart, and write a full execution trace.
## Quick start
Whenever asked to analyze a feedback file, run this exact command:
```bash
python {baseDir}/src/main.py --feedback /path/to/feedback.txt --analysis-id some-id
```
Replace `/path/to/feedback.txt` with the feedback file path you were given, and `some-id` with the analysis id you were given (or a short random id if none was provided).
## Output
After the command finishes, three files exist in `{baseDir}/runs/<analysis-id>/`:
- `feedback_report.md`, key themes, top complaints, and recommended actions
- `sentiment_trend.png`, a chart of average daily rating over time
- `tool_trace.json`, the full execution trace
Read `feedback_report.md` and report its contents back to whoever asked.
## Notes
- The feedback file format is one review per line: `YYYY-MM-DD|RATING|review text`
- Always run the actual command above. Never write a placeholder, summary, or simulated result file yourself, only the script's real output counts.
- The default model is `gpt-4o-mini`. Only pass `--model` if a different model was explicitly requested.
- Requires `OPENAI_API_KEY` to be set in the environment before this command is run.
- This is a request to perform a task, not a request to modify this skill. Never call `skill_workshop`, never propose edits to this `SKILL.md`, and never read the feedback file yourself to write your own summary. The only valid action is running the command above and reporting its real output.
That last bullet under Notes was not part of the original design. It was added after the agent repeatedly tried to call skill_workshop, a tool meant for creating and editing skill definitions, instead of running anything. The full story of why, and why this one bullet alone did not fully fix it, is in its own section below, since it turned out to be the central problem of the entire project.
This file also went through an earlier, completely different design before reaching the plain-markdown shape shown above. The first version used frontmatter fields called command-dispatch: tool, command-tool: exec, and command-arg-mode: raw, based on a tutorial’s description of how to make a slash command bypass the model and dispatch directly to a deterministic tool. After fixing several unrelated config and CLI issues, that version still produced an agent that replied with a confident, plausible-sounding completion message and never ran anything at all.
Listing OpenClaw’s own bundled skills settled the question: not one of them (weather, video frame extraction, meme generation) uses command-dispatch at all. Every working bundled skill relies on plain markdown instructions in the skill body with a {baseDir}/... command example, and lets the model read those instructions and decide to run a generic shell tool itself. That is the version shown above.
Parsing and Summarizing Feedback
Create a file named main.py inside a src folder. The first part of this file turns a raw feedback file into the numbers the report and chart actually need.
import argparse # parse CLI args passed by the OpenClaw skill invocation
import json # write tool_trace.json
import os # read OPENAI_API_KEY from the environment
import re # parse structured fields out of raw feedback lines
import time # measure model call latency for stats.json
from collections import Counter, defaultdict # tally ratings and bucket averages by day
from datetime import datetime # timestamp trace events
from pathlib import Path # filesystem paths for inputs and outputs
from typing import Any, Dict, List, Tuple
import urllib.request # call OpenAI's HTTP API without extra dependencies
import matplotlib # render the sentiment trend chart
matplotlib.use("Agg") # headless backend, no display server needed on a local machine
import matplotlib.pyplot as plt
PROJECT_ROOT = Path(__file__).resolve().parent.parent # repo root, one level above src/
FEEDBACK_LINE_PATTERN = re.compile( # matches "YYYY-MM-DD|RATING|review text"
    r"^(?P<date>\d{4}-\d{2}-\d{2})\|(?P<rating>[1-5])\|(?P<text>.*)$"
)
OPENAI_ENDPOINT = "https://api.openai.com/v1/chat/completions" # OpenAI's chat completions endpoint
COST_RATES = { # USD per token, keyed by model name, overridable via .env
    "gpt-4o-mini": {
        "input": float(os.environ.get("GPT_4O_MINI_INPUT_COST", 0.00000015)),
        "output": float(os.environ.get("GPT_4O_MINI_OUTPUT_COST", 0.00000060)),
    },
}
def record_trace_event(trace_events: List[Dict[str, Any]], category: str, action: str, message: str) -> None:
    trace_events.append({ # one audit-trail entry per step of the pipeline
        "timestamp": datetime.now().isoformat(), # when this step happened
        "category": category, # e.g. "fs", "model", "chart"
        "action": action, # e.g. "read", "generate", "render"
        "message": message, # human-readable detail for the trace file
    })
def parse_feedback_entries(feedback_path: Path, trace_events: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    record_trace_event(trace_events, "fs", "read", f"Reading feedback file: {feedback_path}") # log the read before it happens
    entries: List[Dict[str, str]] = [] # structured {date, rating, text} dicts
    with open(feedback_path, "r", encoding="utf-8", errors="replace") as f: # tolerate odd byte sequences in real exports
        for raw_line in f:
            match = FEEDBACK_LINE_PATTERN.match(raw_line.strip()) # try to parse the date|rating|text format
            if match:
                entries.append(match.groupdict()) # keep only lines that match the expected format
    record_trace_event(trace_events, "fs", "parse", f"Parsed {len(entries)} structured feedback entries") # how much survived parsing
    return entries
def summarize_feedback(entries: List[Dict[str, str]]) -> Dict[str, Any]:
    rating_counts = Counter(e["rating"] for e in entries) # total count per star rating, 1 through 5
    ratings_by_day: Dict[str, List[int]] = defaultdict(list) # all ratings seen on each calendar day
    for entry in entries:
        ratings_by_day[entry["date"]].append(int(entry["rating"])) # group ratings by the day they were left
    daily_avg_rating = { # mean rating per day, used for the trend chart
        day: sum(ratings) / len(ratings) for day, ratings in ratings_by_day.items()
    }
    negative_reviews = [ # full text of low-rated reviews, for the prompt
        e["text"] for e in entries if int(e["rating"]) <= 2
    ]
    return {
        "rating_counts": dict(rating_counts), # e.g. {"5": 9, "4": 2, "2": 4, "1": 2}
        "daily_avg_rating": daily_avg_rating, # e.g. {"2024-06-04": 1.6, "2024-06-06": 4.5}
        "negative_reviews": negative_reviews, # 1-2 star review text, used in the model prompt
    }
FEEDBACK_LINE_PATTERN accepts exactly one format: a date, a single-digit rating from 1 to 5, and free text, separated by pipes. Lines that do not match are silently skipped rather than raising an error, which means a stray blank line or header row in a real export will not crash the whole analysis. It also means a line with an out-of-range rating disappears without any warning. summarize_feedback does all of the actual statistics in plain Python: a Counter for the rating distribution, a defaultdict to group ratings by day before averaging, and a simple list comprehension to isolate every 1 or 2 star review for the model prompt later. COST_RATES is defined here, near the top of the file, because it is consulted every time a model call is logged for stats.json, covered further down.
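To see which lines survive parsing, here is the same pattern run against a handful of made-up lines. The sample reviews are invented for illustration, and the skipped list is an optional addition for visibility; main.py itself drops non-matching lines without counting them.

```python
import re

FEEDBACK_LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\|(?P<rating>[1-5])\|(?P<text>.*)$"
)

sample_lines = [
    "date|rating|review",                         # header row: skipped
    "",                                           # blank line: skipped
    "2024-06-04|1|App crashes on login",          # parsed
    "2024-06-04|2|Checkout is slow | and buggy",  # parsed; the extra pipe stays in the text
    "2024-06-06|6|Rating out of range",           # skipped: the rating must be 1-5
    "2024-06-06|5|Love the new dashboard",        # parsed
]

entries, skipped = [], []
for line in sample_lines:
    match = FEEDBACK_LINE_PATTERN.match(line.strip())
    if match:
        entries.append(match.groupdict())  # {"date": ..., "rating": ..., "text": ...}
    else:
        skipped.append(line)

print(f"{len(entries)} parsed, {len(skipped)} skipped")  # prints "3 parsed, 3 skipped"
```

Because the text group is a greedy `.*`, only the first two pipes act as separators, so review text that itself contains a pipe is kept whole.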
Calling OpenAI and Rendering the Chart
The next part of main.py talks to OpenAI directly and turns the daily averages into a chart.
def query_openai_model(model_name: str, prompt: str) -> Tuple[str, Dict[str, Any]]:
    api_key = os.environ.get("OPENAI_API_KEY") # read at call time so .env loaded by the caller is picked up
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set. Add it to .env before running this script.")
    payload = json.dumps({ # OpenAI's chat completions request body
        "model": model_name, # which model to use, e.g. "gpt-4o-mini"
        "messages": [{"role": "user", "content": prompt}], # single-turn request, no system message needed
        "temperature": 0.3, # low but not zero, for consistent yet natural-sounding prose
        "metadata": { # tags visible in the OpenAI dashboard usage logs
            "dev_name": "Ganesh",
            "project": "codex-test",
            "environment": "local",
            "purpose": "testing",
        },
    }).encode("utf-8")
    request = urllib.request.Request( # build the HTTP POST request
        OPENAI_ENDPOINT, data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}", # OpenAI auth via bearer token
        },
    )
    start = time.monotonic() # wall-clock start, for stats.json latency
    with urllib.request.urlopen(request, timeout=120) as response: # generous headroom for a slow network or long report
        body = json.loads(response.read().decode("utf-8")) # OpenAI returns {"choices": [...], "usage": {...}}
    latency_seconds = round(time.monotonic() - start, 3)
    content = body["choices"][0]["message"]["content"].strip() # the model's generated text, whitespace trimmed
    usage = dict(body.get("usage", {})) # real prompt/completion/total token counts from OpenAI
    usage["latency_seconds"] = latency_seconds
    return content, usage
def render_sentiment_trend(daily_avg_rating: Dict[str, float], output_path: Path) -> None:
    days = sorted(daily_avg_rating.keys()) # chronological order along the x-axis
    averages = [daily_avg_rating[d] for d in days] # matching average rating for each day
    plt.figure(figsize=(10, 4)) # wide, short chart suited to a time series
    plt.plot(days, averages, marker="o", color="#9333ea") # one point per day, connected by a line
    plt.axhline(y=3, color="gray", linestyle="--", linewidth=1) # neutral-rating reference line for context
    plt.title("Average Customer Rating Over Time") # chart title
    plt.xlabel("Date") # x-axis label
    plt.ylabel("Average Rating (1-5)") # y-axis label
    plt.ylim(1, 5) # fixed scale matches the 1-5 star rating range
    plt.xticks(rotation=45, ha="right") # angle the date labels so they don't overlap
    plt.tight_layout() # avoid clipping the rotated labels
    plt.savefig(output_path) # write the PNG to disk
    plt.close() # release the figure from memory
query_openai_model reads OPENAI_API_KEY at call time rather than at import time, specifically so that whatever loaded .env before this script ran, whether that is load_env.ps1 or a test harness setting the variable directly, is respected.
It returns a tuple now instead of a plain string: the report text the model wrote, and a dictionary of real token counts straight from OpenAI’s own usage field, plus the measured latency. Nothing here estimates tokens; the numbers come directly from the API response.
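The tuple-building step can be exercised without a network call or an API key. This sketch pulls that step into a hypothetical split_response helper (a name invented here, not a function in main.py) and feeds it a hand-written stand-in for a response body; the token counts are made up.

```python
from typing import Any, Dict, Tuple


def split_response(body: Dict[str, Any], latency_seconds: float) -> Tuple[str, Dict[str, Any]]:
    """Return (report text, usage dict) from a chat-completions-shaped response body."""
    content = body["choices"][0]["message"]["content"].strip()
    usage = dict(body.get("usage", {}))  # copy, so the response body itself is not mutated
    usage["latency_seconds"] = latency_seconds
    return content, usage


# Hand-written stand-in for a real response; the numbers are invented.
fake_body = {
    "choices": [{"message": {"role": "assistant", "content": "  ## Key Themes\n- ...  "}}],
    "usage": {"prompt_tokens": 120, "completion_tokens": 45, "total_tokens": 165},
}

report_text, usage = split_response(fake_body, 1.234)
print(report_text)
print(usage)
```

Copying the usage dictionary before adding latency_seconds keeps the measured latency separate from what the API actually returned, which matters if the raw body is ever logged for debugging.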
render_sentiment_trend fixes the y-axis to the 1 through 5 star range so the chart’s shape is always comparable across different uploads, and the gray dashed line at 3 stars gives a constant visual reference for what “neutral” looks like.
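One detail the chart quietly depends on: the x-axis order comes from sorting the date strings, not from parsing them. That is safe only because the format is zero-padded YYYY-MM-DD, as this small check with invented sample values illustrates.

```python
from datetime import date

# Invented sample averages, deliberately out of order.
daily_avg_rating = {"2024-06-10": 4.5, "2024-06-04": 1.6, "2024-06-06": 4.0}

days = sorted(daily_avg_rating)                                      # plain string sort
as_dates = sorted(date.fromisoformat(d) for d in daily_avg_rating)   # true calendar sort

# Zero-padded ISO dates give the same order either way.
assert [d.isoformat() for d in as_dates] == days

# Days that would plot below the dashed 3-star reference line.
below_neutral = [d for d in days if daily_avg_rating[d] < 3]
print(days)
print(below_neutral)
```

If the feedback format ever changed to something like 6/4/2024, the string sort would no longer be chronological and the dates would need to be parsed first.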
Computing a Rating Summary and a Per-Review Breakdown
The model’s report covers themes and complaints, but it was never asked to report back the rating numbers it was given. Trusting an LLM to recount data exactly, rather than just discuss it, is not a good trade, so these two functions compute that part directly in Python instead.
def format_rating_summary(rating_counts: Dict[str, int]) -> str:
    # Computed directly from parsed data rather than asked of the model: these are exact
    # counts we already have correctly, not something worth trusting an LLM to recount.
    total = sum(rating_counts.values())
    positive = sum(rating_counts.get(str(r), 0) for r in (4, 5)) # 4-5 stars
    neutral = rating_counts.get("3", 0) # 3 stars
    negative = sum(rating_counts.get(str(r), 0) for r in (1, 2)) # 1-2 stars
    breakdown = ", ".join(f"{r} star: {rating_counts.get(str(r), 0)}" for r in (5, 4, 3, 2, 1))
    return (
        "## Rating Summary\n"
        f"- Total reviews: {total}\n"
        f"- Positive (4-5 stars): {positive}\n"
        f"- Neutral (3 stars): {neutral}\n"
        f"- Negative (1-2 stars): {negative}\n"
        f"- Breakdown: {breakdown}\n"
    )
def format_individual_reviews(entries: List[Dict[str, str]]) -> str:
    # Shows the actual input alongside its classification, one line per review, so the
    # report isn't just aggregate numbers. The label is derived the same way the
    # aggregate counts are (4-5 stars positive, 3 neutral, 1-2 negative), not by the model.
    lines = ["## Individual Reviews"]
    for entry in entries:
        rating = int(entry["rating"])
        label = "Positive" if rating >= 4 else "Negative" if rating <= 2 else "Neutral"
        lines.append(f"- {entry['date']} | {rating} star | {label} | \"{entry['text']}\"")
    return "\n".join(lines) + "\n"
format_rating_summary produces the aggregate counts: how many reviews, how many fell into each bucket, and the exact per-star breakdown. format_individual_reviews goes one level lower, listing every single review next to its own label, computed the same way as the aggregate so the two sections can never disagree with each other. Both get prepended to the model’s narrative before the report is written to disk, so the final file shows exact numbers first and the model’s interpretation of them second.
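The claim that the two sections can never disagree is easy to check in isolation. This sketch applies the same thresholds both ways, per review and per bucket, to a few invented entries; label_for is a hypothetical helper that just names the inline expression used in format_individual_reviews.

```python
from collections import Counter
from typing import Dict, List


def label_for(rating: int) -> str:
    """Same thresholds as the report: 4-5 positive, 3 neutral, 1-2 negative."""
    return "Positive" if rating >= 4 else "Negative" if rating <= 2 else "Neutral"


# Invented sample rows in the shape parse_feedback_entries produces.
entries: List[Dict[str, str]] = [
    {"date": "2024-06-04", "rating": "1", "text": "App crashes on login"},
    {"date": "2024-06-04", "rating": "2", "text": "Checkout is slow"},
    {"date": "2024-06-05", "rating": "3", "text": "It is fine"},
    {"date": "2024-06-06", "rating": "5", "text": "Love the new dashboard"},
    {"date": "2024-06-06", "rating": "5", "text": "Fast support"},
]

# Aggregate view: bucket the per-star counts.
rating_counts = Counter(e["rating"] for e in entries)
bucket_totals = {
    "Positive": rating_counts["4"] + rating_counts["5"],
    "Neutral": rating_counts["3"],
    "Negative": rating_counts["1"] + rating_counts["2"],
}

# Per-review view: label each entry, then count the labels.
per_review = Counter(label_for(int(e["rating"])) for e in entries)

print(bucket_totals)
print(dict(per_review))
```

Both views are derived from the same parsed ratings with the same cutoffs, so the totals match by construction rather than by the model getting its arithmetic right.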
Tracking Token Usage and Cost
Every model call now gets logged to stats.json at the project root, with totals that accumulate across every analysis ever run, not just the current one.
def build_call_record(model_name: str, usage: Dict[str, Any], analysis_id: str) -> Dict[str, Any]:
    prompt_tok = usage.get("prompt_tokens", 0)
    completion_tok = usage.get("completion_tokens", 0)
    total_tok = usage.get("total_tokens", prompt_tok + completion_tok)
    rates = COST_RATES.get(model_name, {"input": 0, "output": 0}) # unknown models cost $0 rather than raising
    input_cost = prompt_tok * rates["input"]
    output_cost = completion_tok * rates["output"]
    return {
        "timestamp": datetime.now().isoformat(),
        "analysis_id": analysis_id,
        "model": model_name,
        "prompt_tokens": prompt_tok,
        "completion_tokens": completion_tok,
        "total_tokens": total_tok,
        "input_cost": round(input_cost, 7),
        "output_cost": round(output_cost, 7),
        "total_cost": round(input_cost + output_cost, 7),
        "latency_seconds": usage.get("latency_seconds", 0),
    }
def summarize_calls(calls: List[Dict[str, Any]]) -> Dict[str, Any]:
    return {
        "total_calls": len(calls),
        "total_prompt_tokens": sum(c["prompt_tokens"] for c in calls),
        "total_completion_tokens": sum(c["completion_tokens"] for c in calls),
        "total_tokens": sum(c["total_tokens"] for c in calls),
        "total_cost": round(sum(c["total_cost"] for c in calls), 6),
    }
def record_stats(stats_path: Path, call_record: Dict[str, Any]) -> None:
    try: # load history written by previous runs so stats accumulate
        existing = json.loads(stats_path.read_text(encoding="utf-8"))
        all_calls = existing.get("calls", [])
    except (FileNotFoundError, json.JSONDecodeError):
        all_calls = [] # first run: start with empty history
    all_calls.append(call_record)
    output = {
        "run_info": {
            "timestamp": datetime.now().isoformat(), # when stats.json was last written
            **summarize_calls(all_calls), # lifetime totals across every call ever recorded
        },
        "calls": all_calls, # every individual call ever recorded
    }
    stats_path.parent.mkdir(parents=True, exist_ok=True)
    stats_path.write_text(json.dumps(output, indent=2), encoding="utf-8")
build_call_record turns one model call’s real token usage into a record with USD costs already computed, using the per-token rates defined in COST_RATES. record_stats reads whatever history already exists in stats.json, appends this call, recomputes lifetime totals across every call ever recorded, and writes the file back out. The first time this runs, there is no existing file, so the except branch just starts from an empty list. This is the same accumulate-across-runs pattern used for token and cost tracking in the other projects in this series.
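A worked example makes the arithmetic and the accumulation concrete. This is a trimmed-down sketch, not the project’s code: call_cost and append_call are hypothetical stand-ins for build_call_record and record_stats, the token counts are invented, and it assumes the cost values in openclaw.json (0.15 and 0.6) are USD per million tokens, which is consistent with the per-token defaults in COST_RATES.

```python
import json
import tempfile
from pathlib import Path

# Assumed: per-million-token prices converted to USD per token.
RATES = {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000}


def call_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one call in USD, rounded the same way the call record rounds it."""
    return round(prompt_tokens * RATES["input"] + completion_tokens * RATES["output"], 7)


def append_call(stats_path: Path, record: dict) -> dict:
    """Load prior calls if the file exists, append one, rewrite lifetime totals."""
    try:
        calls = json.loads(stats_path.read_text(encoding="utf-8")).get("calls", [])
    except (FileNotFoundError, json.JSONDecodeError):
        calls = []  # first run, or an unreadable file: start from empty history
    calls.append(record)
    output = {
        "total_calls": len(calls),
        "total_cost": round(sum(c["total_cost"] for c in calls), 6),
        "calls": calls,
    }
    stats_path.write_text(json.dumps(output, indent=2), encoding="utf-8")
    return output


with tempfile.TemporaryDirectory() as tmp:
    stats_path = Path(tmp) / "stats.json"
    # 1200 * 0.00000015 + 400 * 0.00000060 = 0.00018 + 0.00024 = 0.00042
    first = append_call(stats_path, {"total_cost": call_cost(1200, 400)})
    # 800 * 0.00000015 + 300 * 0.00000060 = 0.00012 + 0.00018 = 0.00030
    second = append_call(stats_path, {"total_cost": call_cost(800, 300)})

print(first["total_calls"], first["total_cost"])
print(second["total_calls"], second["total_cost"])
```

The second call sees the first one’s record on disk, so its totals cover both. At these rates a single report costs a small fraction of a cent, which is why the records keep seven decimal places.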
Writing the Report and Trace
The rest of main.py ties parsing, the model call, the rating summary, the stats tracking, and the chart together, then writes everything to disk.
FEEDBACK_REPORT_PROMPT = """
You are a customer experience analyst reviewing a batch of customer feedback. Given the star
rating distribution and the negative reviews below, write a short feedback analysis.
Rating distribution (1-5 stars): {rating_counts}
Negative reviews (rating 2 stars or below):
{negative_reviews}
Respond in markdown with three sections: "## Key Themes", "## Top Complaints", "## Recommended Actions".
""".strip()
def run_analysis(feedback_path: Path, results_dir: Path, model_name: str, stats_path: Path) -> None:
    trace_events: List[Dict[str, Any]] = [] # accumulates every step for tool_trace.json
    entries = parse_feedback_entries(feedback_path, trace_events) # structured {date, rating, text} rows
    summary = summarize_feedback(entries) # rating counts, daily averages, negative reviews
    prompt = FEEDBACK_REPORT_PROMPT.format( # fill the report-writing prompt with real data
        rating_counts=summary["rating_counts"],
        negative_reviews="\n".join(f"- {text}" for text in summary["negative_reviews"]) or "- (none found)",
    )
    record_trace_event(trace_events, "model", "generate", f"Requesting feedback analysis from {model_name}") # before the call
    report_body, usage = query_openai_model(model_name, prompt) # the model's narrative analysis, plus real token usage
    call_record = build_call_record(model_name, usage, results_dir.name) # results_dir.name is the analysis_id
    record_stats(stats_path, call_record) # accumulate token/cost history across all runs
    rating_summary = format_rating_summary(summary["rating_counts"]) # exact counts, prepended ahead of the model's prose
    individual_reviews = format_individual_reviews(entries) # raw input plus per-review positive/negative label
    full_report = f"{rating_summary}\n{individual_reviews}\n{report_body}"
    results_dir.mkdir(parents=True, exist_ok=True) # ensure the per-run output directory exists
    report_path = results_dir / "feedback_report.md" # markdown report path
    report_path.write_text(full_report, encoding="utf-8") # write the rating summary plus the model's narrative
    record_trace_event(trace_events, "fs", "write", f"Wrote {report_path}") # after the write succeeds
    chart_path = results_dir / "sentiment_trend.png" # chart output path
    render_sentiment_trend(summary["daily_avg_rating"], chart_path) # draw and save the rating trend chart
    record_trace_event(trace_events, "chart", "render", f"Wrote {chart_path}") # after the chart is saved
    trace_path = results_dir / "tool_trace.json" # audit trail output path
    trace_path.write_text(json.dumps(trace_events, indent=2), encoding="utf-8") # full step-by-step record, written last
    print(f"Analysis complete. Wrote {report_path}, {chart_path}, and {trace_path}") # explicit stdout signal that this succeeded
def main() -> None:
    parser = argparse.ArgumentParser(description="Customer feedback analyzer") # CLI entry point invoked by the OpenClaw skill
    parser.add_argument("--feedback", required=True, type=Path) # path to the feedback file to analyze
    parser.add_argument("--model", default="gpt-4o-mini") # OpenAI model name
    parser.add_argument("--analysis-id", required=True) # unique id for this run, used in the output path
    parser.add_argument("--output-dir", type=Path, default=None) # base runs/ folder; overrides PROJECT_ROOT when this
    # script is invoked from an OpenClaw-installed copy
    parser.add_argument("--stats-path", type=Path, default=None) # stats.json location; same override reason as --output-dir
    args = parser.parse_args()
    runs_base = args.output_dir if args.output_dir is not None else PROJECT_ROOT / "runs" # explicit dir wins when given
    results_dir = runs_base / args.analysis_id # per-run output folder, keeps runs from colliding
    stats_path = args.stats_path if args.stats_path is not None else PROJECT_ROOT / "stats.json" # explicit path wins when given
    run_analysis(args.feedback, results_dir, args.model, stats_path)
if __name__ == "__main__":
    main()
Two things here exist specifically because of failures discovered later, not because they were part of the original design. The print statement at the end of run_analysis exists because this script used to write three files silently and exit with no output at all, and OpenClaw’s own tool wrapper treated a perfectly successful run with empty stdout as an ambiguous, sometimes failed, result. The --output-dir and --stats-path arguments exist because openclaw skills install runs this script from a copied location, not the original project folder, and without an explicit override, every result would land somewhere far harder to find than where it was uploaded from. Both are explained in detail in the sections that follow.
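The override behavior is easiest to see with the argument parsing pulled out on its own. In this sketch, resolve_paths is a hypothetical helper that mirrors the fallback logic in main(), and the PROJECT_ROOT value and /tmp paths are placeholders chosen for illustration only.

```python
import argparse
from pathlib import Path
from typing import List, Tuple

PROJECT_ROOT = Path("/projects/feedback-analyzer")  # placeholder, not the real project location


def resolve_paths(argv: List[str]) -> Tuple[Path, Path]:
    """Return (results_dir, stats_path), preferring explicit overrides when given."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--analysis-id", required=True)
    parser.add_argument("--output-dir", type=Path, default=None)
    parser.add_argument("--stats-path", type=Path, default=None)
    args = parser.parse_args(argv)

    # Fall back to locations next to the script only when no override was passed.
    runs_base = args.output_dir if args.output_dir is not None else PROJECT_ROOT / "runs"
    stats_path = args.stats_path if args.stats_path is not None else PROJECT_ROOT / "stats.json"
    return runs_base / args.analysis_id, stats_path


# No overrides: everything lands relative to wherever the script lives.
default_paths = resolve_paths(["--analysis-id", "abc123"])

# With overrides: the caller decides where results and stats go.
overridden_paths = resolve_paths([
    "--analysis-id", "abc123",
    "--output-dir", "/tmp/runs",
    "--stats-path", "/tmp/stats.json",
])

print(default_paths)
print(overridden_paths)
```

When the script runs from an installed copy, PROJECT_ROOT resolves to that copy, so the defaults point there too. Passing both flags explicitly is what keeps the output in the folder the upload came from.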
The Web Upload Server
Create a file named web_assistant.py, also inside src. This is the only thing a person actually interacts with directly; everything else runs invisibly behind it.
import http.server # minimal stdlib HTTP server, no extra dependencies needed
import shlex # quote arguments safely inside the single --message string
import shutil # resolve the openclaw executable's real path, including its extension
import socketserver # TCP server base used to host AnalyzerRequestHandler
import subprocess # invoke the OpenClaw CLI as a subprocess for each upload
import sys # detect Windows to handle .cmd/.bat shims correctly
import uuid # generate a unique analysis_id per upload
from pathlib import Path
PROJECT_ROOT = Path(__file__).resolve().parent.parent # repo root, one level above src/
UPLOADS_DIR = PROJECT_ROOT / "uploads" # where incoming feedback files are written
HOST, PORT = "127.0.0.1", 8765 # local-only — never bound to a public interface
# Where "openclaw skills install ... --as local-feedback-analyzer" actually copies the project.
# Computed directly rather than relying on the model to recall {baseDir} from SKILL.md, since
# testing showed the model does not reliably look up or recall the skill's own instructions —
# it has guessed at fictional script filenames instead of reading the real documented command.
INSTALLED_SKILL_DIR = Path.home() / ".openclaw" / "workspace" / "skills" / "local-feedback-analyzer"
UPLOAD_FORM = """
<html><body>
<h1>Local Customer Feedback Analyzer</h1>
<form method="POST" action="/analyze" enctype="multipart/form-data">
<input type="file" name="feedbackfile" accept=".txt,.csv">
<button type="submit">Analyze</button>
</form>
</body></html>
""".strip()
def resolve_openclaw_command() -> list:
openclaw_path = shutil.which("openclaw") # resolves to openclaw.cmd on a typical Windows npm install
if openclaw_path is None:
raise FileNotFoundError("openclaw was not found on PATH. Make sure OpenClaw is installed and the gateway is running.")
if sys.platform == "win32" and openclaw_path.lower().endswith((".cmd", ".bat")):
return ["cmd", "/c", openclaw_path] # .cmd/.bat shims can't be launched directly by CreateProcess on Windows
return [openclaw_path] # a real executable (.exe, or any non-Windows platform) runs directly

def build_skill_message(feedback_path: Path, analysis_id: str) -> str:
    # Earlier versions asked the model to "use the skill" or "run its documented command",
    # trusting it to recall {baseDir}/src/main.py from SKILL.md. In testing, across two
    # different models, that trust was misplaced: the model called skill_workshop trying
    # to edit the skill, spawned a subagent that hallucinated an unrelated file path and
    # URL, and on one run invented a fictional script filename rather than the real one.
    # The fix is to stop asking it to recall or look anything up at all. The exact,
    # already-resolved command is handed over directly, so running it requires zero
    # interpretation, only calling a generic shell tool with this literal string.
    script_path = INSTALLED_SKILL_DIR / "src" / "main.py"  # the real, resolved path to the installed copy
    venv_python = "Scripts/python.exe" if sys.platform == "win32" else "bin/python"  # venv layout differs by platform
    python_path = INSTALLED_SKILL_DIR / "venv" / venv_python  # the installed copy's own venv, with matplotlib
    output_dir = PROJECT_ROOT / "runs"  # this project's own runs/ folder, not the installed copy's
    stats_path = PROJECT_ROOT / "stats.json"  # this project's own stats.json, same reason as output_dir
    report_path = output_dir / analysis_id / "feedback_report.md"  # where the report actually lands, since --output-dir overrides it
    command = (
        f"{shlex.quote(str(python_path))} {shlex.quote(str(script_path))} "
        f"--feedback {shlex.quote(str(feedback_path))} "
        f"--analysis-id {shlex.quote(analysis_id)} "
        f"--output-dir {shlex.quote(str(output_dir))} "
        f"--stats-path {shlex.quote(str(stats_path))}"
    )
    return (
        f"Run this exact shell command using your shell or exec tool, verbatim, with no "
        f"changes: {command}\n\n"
        f"Do not call skill_workshop. Do not create, edit, or propose any skill. Do not "
        f"spawn a subagent. Do not read the feedback file yourself. After the command "
        f"finishes, read {shlex.quote(str(report_path))} "
        f"and report its contents back."
    )

def build_agent_command(feedback_path: Path, analysis_id: str) -> list:
    return resolve_openclaw_command() + [  # argv list passed to subprocess.run: no shell, no manual quoting needed
        "agent", "--local",
        "--session-id", f"local-feedback-{analysis_id}",  # ties this OpenClaw session to the web upload that triggered it
        "--message", build_skill_message(feedback_path, analysis_id),  # the entire skill invocation as one message string
    ]
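
def extract_uploaded_file(body: bytes) -> bytes:
    # Minimal multipart/form-data parser for a single file field: adequate for this local-only tool,
    # not a general-purpose multipart parser.
    marker = b"\r\n\r\n"  # blank line that separates multipart headers from the file body
    header_end = body.find(marker)  # -1 when the body has no header/body separator at all
    end = body.rfind(b"\r\n--")  # start of the multipart closing boundary
    if header_end == -1 or end < header_end + len(marker):
        raise ValueError("request body is not a single-file multipart upload")
    return body[header_end + len(marker):end]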

class AnalyzerRequestHandler(http.server.BaseHTTPRequestHandler):  # one instance per incoming HTTP request
    def do_GET(self) -> None:  # serves the upload form on any GET request
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(UPLOAD_FORM.encode("utf-8"))

    def do_POST(self) -> None:  # accepts the uploaded feedback file and runs the analysis
        analysis_id = uuid.uuid4().hex[:8]  # short unique id for this run
        UPLOADS_DIR.mkdir(parents=True, exist_ok=True)  # ensure the uploads folder exists
        feedback_path = UPLOADS_DIR / f"{analysis_id}.txt"  # where this upload's feedback file is saved
        content_length = int(self.headers["Content-Length"])  # size of the incoming multipart body
        body = self.rfile.read(content_length)  # read the full request body
        feedback_path.write_bytes(extract_uploaded_file(body))  # save the extracted file content to disk
        agent_command = build_agent_command(feedback_path, analysis_id)  # the OpenClaw CLI invocation for this run
        subprocess.run(agent_command, check=True)  # blocks until the skill finishes
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(f"<p>Done. See runs/{analysis_id}/</p>".encode("utf-8"))  # points to the output folder

if __name__ == "__main__":
    with socketserver.TCPServer((HOST, PORT), AnalyzerRequestHandler) as httpd:  # bind to localhost only
        print(f"Serving on http://{HOST}:{PORT}")
        httpd.serve_forever()
resolve_openclaw_command exists because an npm-installed CLI tool on Windows is usually a .cmd wrapper script, not a true executable, and subprocess.run cannot launch a .cmd file the same way it launches a real one unless the call is routed through cmd.exe first.
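The wrapping rule is easier to check when the decision is separated from the PATH lookup. The sketch below is an illustration, not the project's code: wrap_for_platform is a hypothetical helper that takes an already-resolved path and a platform string as parameters and applies the same rule as resolve_openclaw_command, so the Windows branch can be exercised on any machine. The example paths are made up.

```python
from typing import List


def wrap_for_platform(executable_path: str, platform: str) -> List[str]:
    """Return the argv prefix needed to launch executable_path on the given platform.

    Hypothetical helper: mirrors the rule in resolve_openclaw_command, but takes
    the platform as an argument instead of reading sys.platform.
    """
    is_script_shim = executable_path.lower().endswith((".cmd", ".bat"))
    if platform == "win32" and is_script_shim:
        return ["cmd", "/c", executable_path]  # batch shims are routed through cmd.exe
    return [executable_path]  # real executables, and every non-Windows platform


# Illustrative paths only; neither is taken from the project.
print(wrap_for_platform(r"C:\Users\me\AppData\Roaming\npm\openclaw.CMD", "win32"))
print(wrap_for_platform("/usr/local/bin/openclaw", "linux"))
```

Lower-casing the path before the suffix check matters because shutil.which can return the extension in whatever case PATHEXT lists it.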
The venv_python line in build_skill_message exists for a related reason: a virtual environment’s internal layout differs by platform, venv\Scripts\python.exe on Windows versus venv/bin/python on macOS and Linux. The installed copy’s own venv has to be located programmatically rather than guessed at by the model, so this script has to get that one detail right for whichever platform it is running on.
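As a rough illustration of that platform split, the sketch below builds the interpreter path for both layouts. venv_python_path is a hypothetical name, and the directories are invented. It uses pathlib's pure path classes so that the Windows result can be inspected on a non-Windows machine, which the project's own code does not need to do.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def venv_python_path(venv_dir: PurePath, platform: str) -> PurePath:
    """Return the interpreter path inside venv_dir for the given sys.platform value."""
    # Windows venvs keep the interpreter in Scripts/, POSIX venvs in bin/.
    relative = "Scripts/python.exe" if platform == "win32" else "bin/python"
    return venv_dir / relative


# Invented example directories, one per layout.
print(venv_python_path(PureWindowsPath("C:/skills/demo/venv"), "win32"))
print(venv_python_path(PurePosixPath("/home/me/skills/demo/venv"), "linux"))
```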
build_skill_message is the most rewritten function in this entire project, and the reason why is the subject of the next section.
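One other function in the listing, extract_uploaded_file, is described in its own comment as a minimal parser. If the upload form ever grows a second field, a sturdier option is to let the standard library's email package parse the multipart body. The sketch below is an alternative technique, not what the project does: extract_file_part is a hypothetical name, and it assumes the handler passes in the request's Content-Type header value, which carries the boundary.

```python
from email.parser import BytesParser
from email.policy import default


def extract_file_part(content_type: str, body: bytes) -> bytes:
    """Return the bytes of the first file field in a multipart/form-data body."""
    # The email parser expects headers first, so the HTTP Content-Type is prepended.
    prefix = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = BytesParser(policy=default).parsebytes(prefix + body)
    if not message.is_multipart():
        raise ValueError("request body is not multipart")
    for part in message.iter_parts():
        if part.get_filename() is not None:  # only file fields carry a filename
            return part.get_payload(decode=True)
    raise ValueError("no file field found in the upload")


# A hand-built request body standing in for what a browser would send.
boundary = "----demo"
body = (
    f"--{boundary}\r\n"
    'Content-Disposition: form-data; name="feedback"; filename="sample.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "first line\r\nsecond line\r\n"
    f"--{boundary}--\r\n"
).encode("utf-8")
print(extract_file_part(f"multipart/form-data; boundary={boundary}", body))
```

Inside do_POST the call would look something like extract_file_part(self.headers["Content-Type"], body).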
Running the Application
With .env filled in and the virtual environment active, register the skill, then start both processes. On Windows:
openclaw skills install "E:\workspace\python\tutorials\AI Agents\openclaw_log_analyzer" --as local-feedback-analyzer --force
Terminal 1:
.\run_gateway.ps1
Terminal 2:
.\run_web.ps1
On macOS or Linux, the same three steps apply, with the .sh scripts in place of the .ps1 ones. Make them executable once with chmod +x run_gateway.sh run_web.sh, then run:
openclaw skills install "/path/to/openclaw_log_analyzer" --as local-feedback-analyzer --force
Terminal 1:
./run_gateway.sh
Terminal 2:
./run_web.sh
Open http://127.0.0.1:8765, choose examples/sample_feedback.txt in the file picker, and click Analyze. The sample file tells one clear story: ratings start at 4 and 5 stars, drop sharply to 1 and 2 stars across two days because of a checkout crash, then recover. Once the skill finishes, look in this project’s own runs/<analysis-id>/ folder.
feedback_report.md will open with an exact rating summary and a per-review breakdown, followed by the model naming the checkout crash as the central theme. sentiment_trend.png will show the dip and recovery as a visible line. tool_trace.json will list every step that ran to produce both. stats.json at the project root will show this analysis’s token counts and real dollar cost, added to the running lifetime total.
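Because the agent's own summary is not proof that anything ran, a quick filesystem check after each upload is a useful habit. The sketch below is a hypothetical helper, not part of the project: missing_outputs assumes the three per-run files land directly in runs/<analysis-id>/ and that stats.json sits at the project root, as described above. The demo builds a throwaway folder rather than touching a real run.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_outputs(project_root: Path, analysis_id: str) -> List[Path]:
    """Return the expected output files that are absent or empty for one run."""
    run_dir = project_root / "runs" / analysis_id
    expected = [
        run_dir / "feedback_report.md",
        run_dir / "sentiment_trend.png",
        run_dir / "tool_trace.json",
        project_root / "stats.json",  # lives at the project root, not inside the run folder
    ]
    return [path for path in expected if not path.is_file() or path.stat().st_size == 0]


# Demo on a temporary folder: only the report exists, so three files are reported missing.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    run_dir = root / "runs" / "demo0001"  # made-up analysis id
    run_dir.mkdir(parents=True)
    (run_dir / "feedback_report.md").write_text("# Report\n")
    for path in missing_outputs(root, "demo0001"):
        print("missing:", path.relative_to(root).as_posix())
```

An empty list means every file the skill is supposed to produce is present and non-empty, which says more than the agent reporting that it finished.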
Anyone re-running this after a code change needs to repeat the skills install ... --force step first. openclaw skills install copies the entire project into ~/.openclaw/workspace/skills/local-feedback-analyzer/, and the gateway always runs from that copy, not the live files being edited.
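Forgetting that step produces a confusing symptom: the edit is on disk, but the behavior does not change. One way to catch it is to compare the live source files against the installed copy before starting the servers. The sketch below is a hypothetical helper, not something the project ships. stale_files and its subdir parameter are invented names, and the demo uses temporary folders in place of the real project and install directories.

```python
import filecmp
import tempfile
from pathlib import Path
from typing import List


def stale_files(project_dir: Path, installed_dir: Path, subdir: str = "src") -> List[str]:
    """List .py files under subdir whose installed copy is missing or differs."""
    stale = []
    for source in sorted((project_dir / subdir).rglob("*.py")):
        relative = source.relative_to(project_dir)
        installed = installed_dir / relative
        # shallow=False compares file contents rather than only size and mtime
        if not installed.is_file() or not filecmp.cmp(source, installed, shallow=False):
            stale.append(relative.as_posix())
    return stale


# Demo: the "installed" copy still holds the old version of main.py.
with tempfile.TemporaryDirectory() as tmp:
    project, installed = Path(tmp, "project"), Path(tmp, "installed")
    (project / "src").mkdir(parents=True)
    (installed / "src").mkdir(parents=True)
    (project / "src" / "main.py").write_text("print('edited')\n")
    (installed / "src" / "main.py").write_text("print('original')\n")
    print(stale_files(project, installed))
```

For the real project, installed_dir would be Path.home() / ".openclaw" / "workspace" / "skills" / "local-feedback-analyzer", and any non-empty result means the install step needs repeating.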
Output
Who Can Benefit
Students and AI engineers learning agent orchestration can use this project to see the actual difference between an orchestration layer deciding what to do and a script doing the work, including what it looks like when that decision-making layer gets it wrong in several different ways before it gets it right.
Anyone building on OpenClaw or a similar skill-based agent framework can use the documented failures as a head start, since registration, model catalog gaps, dispatch wording, and CLI argument shape are exactly the kind of friction that shows up the first time a custom skill is wired up.
Teams evaluating how much to trust a model’s own judgment versus an explicit, pre-resolved instruction can use the build_skill_message rewrite history here as a concrete case study: three rewordings that each fixed a symptom, and one architectural change that fixed the cause.
Anyone debugging an agent that “says” it did something can use the verification habit demonstrated throughout this project: check the filesystem, not the agent’s own summary of events.
How Codersarts Can Help
If you want to take this further, Codersarts offers hands-on support at every stage.
For learners: Live 1-to-1 sessions with an AI engineer who can walk through OpenClaw’s skill architecture, model selection for agent reasoning, and the debugging process for agent orchestration tools in detail.
For teams: End-to-end development of agent-based automation tooling, including custom skills, model selection and reliability testing, cost tracking, and audit trail design.
For enterprises: Architecture consulting for agent orchestration deployments, including model capability evaluation and production debugging strategies for agentic systems.
Reach out at contact@codersarts.com or visit www.codersarts.com to get started.