Build a Reading Companion with Supermemory and the OpenAI Agents SDK

Introduction

Most “AI memory” tutorials show a single isolated call: add one fact, search for it, print the result. They rarely show a real conversational agent deciding for itself, turn by turn, whether something the user just said should be written to memory, recalled from memory, or neither.

In this tutorial we build a reading companion using Supermemory, a hosted memory API, paired with the OpenAI Agents SDK. You chat with it the way you’d chat with a tutor. Tell it what you studied, and it logs the topic, source, time spent, and how well you understood it. Ask it what to read next, and it grounds its answer in your actual history (stable interests kept separate from recent activity) rather than guessing from its own training data.

What We Are Building

A command-line chat agent. The workflow:

Tell the agent what you read or studied, in plain conversation
Log that session (topic, source, time spent, comprehension) in Supermemory, plus any standalone fact about you that isn’t tied to one session
Ask what to read next, and have the agent pull your history before answering
Track every turn’s real token usage, cost, and response time, alongside the actual prompt and response text

Tech Stack

Component	Tool
Memory	Supermemory (hosted memory API)
Agent framework	OpenAI Agents SDK
Model	OpenAI gpt-4o-mini
Session history	SQLite, via the Agents SDK’s built-in SQLiteSession

Pricing: What Actually Costs Money

Supermemory has a genuine free tier: no credit card required, with about $5 of usage credit included each month, which easily covers a tutorial project. OpenAI does not work that way. Every chat turn calls gpt-4o-mini through the Agents SDK and is billed per token on your own OpenAI account. That means two separate API keys and two separate billing relationships, only one of which is free to start.

Project Structure


supermemory_reading_tracker/
├── src/
│   ├── memory_tools.py        # record_reading_session, record_reader_fact, recommend_next_reading
│   └── reading_companion.py   # Curator agent definition, cost tracking, and the CLI chat loop
├── requirements.txt           # supermemory, openai-agents, python-dotenv
├── stats.json                 # prompt, response, token usage, cost, and timing for every chat turn
└── .env                       # SUPERMEMORY_API_KEY, OPENAI_API_KEY

Setting Up Supermemory and OpenAI

Create a Supermemory account and grab an API key from console.supermemory.ai, not app.supermemory.ai, which is a different page entirely. You’ll also need an OpenAI API key from your own OpenAI account.

Create a file named .env in the project root:


SUPERMEMORY_API_KEY=sm_your_key_here
OPENAI_API_KEY=sk-your_key_here

Both keys load through python-dotenv the moment memory_tools.py is imported, before either client is constructed, so there’s no separate setup step beyond having this file in place.
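If a key is missing, the first failure usually surfaces deep inside a client call. A small startup check can fail fast with a clearer message instead. The helpers below (missing_env_keys and require_env_keys) are hypothetical and not part of the original project. They take any mapping so they can be tested without touching the real environment, and they default to os.environ.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical helper: the key names match the .env file above.
REQUIRED_KEYS = ("SUPERMEMORY_API_KEY", "OPENAI_API_KEY")


def missing_env_keys(
    env: Optional[Mapping[str, str]] = None,
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required keys that are absent or blank in env (defaults to os.environ)."""
    source = os.environ if env is None else env
    return [key for key in required if not source.get(key, "").strip()]


def require_env_keys(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise one readable error listing every missing key, instead of failing on the first client call."""
    missing = missing_env_keys(env)
    if missing:
        raise RuntimeError(f"Missing keys in .env or environment: {', '.join(missing)}")


if __name__ == "__main__":
    print(missing_env_keys({"SUPERMEMORY_API_KEY": "sm_test", "OPENAI_API_KEY": ""}))
```

In memory_tools.py, calling require_env_keys() right after load_dotenv() would turn a missing key into a single explicit error at import time.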

Building the Memory Tools

Here’s what we’re about to build: three functions, each wrapped with @function_tool so the OpenAI Agents SDK can expose them to the agent as callable tools. Two of them write to Supermemory, and one reads from it. Create a file named memory_tools.py inside a src folder.


from agents import function_tool         # decorator that exposes a plain function as an agent tool
from dotenv import load_dotenv           # loads SUPERMEMORY_API_KEY and OPENAI_API_KEY from .env
from supermemory import Supermemory      # hosted memory client: stores and retrieves per-reader facts

load_dotenv()                            # must run before Supermemory() reads the key from the environment

READER_ID = "demo_reader"                # container_tag scoping every memory to one reader's history
memory_client = Supermemory()            # reads SUPERMEMORY_API_KEY from the environment automatically

READER_ID is passed as Supermemory’s container_tag on every call, which is what scopes memories to one specific reader rather than mixing everyone’s facts together. memory_client is constructed once at import time and shared by every tool function below, rather than each function building its own client.
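To make the scoping concrete without an API key, here is a toy in-memory stand-in. InMemoryStore and its add/list_facts methods are hypothetical and only mimic the idea behind container_tag. They are not Supermemory’s client or storage model, which also performs fact extraction and semantic search.

```python
from collections import defaultdict
from typing import Dict, List


class InMemoryStore:
    """Toy stand-in for a hosted memory API: facts are bucketed by container tag."""

    def __init__(self) -> None:
        self._facts: Dict[str, List[str]] = defaultdict(list)

    def add(self, content: str, container_tag: str) -> None:
        # Every write names its container, just as memory_client.add(...) passes READER_ID.
        self._facts[container_tag].append(content)

    def list_facts(self, container_tag: str) -> List[str]:
        # Reads only ever see one container's facts; .get avoids creating empty buckets.
        return list(self._facts.get(container_tag, []))


store = InMemoryStore()
store.add("Studied Python decorators via a blog post for 30 minutes.", container_tag="demo_reader")
store.add("Studied SQL joins via chapter 2 of a textbook for 45 minutes.", container_tag="other_reader")

print(store.list_facts("demo_reader"))
```

Swapping the constant READER_ID for a per-user value, such as an authenticated user ID, is what would turn the single demo reader into many isolated readers.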

Now let’s write the first tool, the one that logs a specific reading session.


@function_tool
def record_reading_session(
    topic: str,
    source: str,
    duration_minutes: int,
    comprehension: int,
    notes: str = "",
) -> str:
    """Log a finished reading session to the reader's memory.

    Args:
        topic: What the reader studied (e.g. "transformer attention", "Python decorators").
        source: Where it came from (e.g. "a blog post", "chapter 4 of a textbook").
        duration_minutes: How many minutes the reader spent on this session.
        comprehension: Reader's own 1-5 self-rating of how well they understood it.
        notes: Optional notes about what was confusing or notable.
    """
    print(f"[record_reading_session] {topic=} {source=} {duration_minutes=} {comprehension=} {notes=}")

    content = (
        f"Studied {topic} via {source} for {duration_minutes} minutes, "
        f"self-rated comprehension {comprehension}/5."
    )
    if notes:
        content += f" Notes: {notes}"

    response = memory_client.add(content=content, container_tag=READER_ID)  # writes one fact to Supermemory
    print(f"[record_reading_session] -> id={response.id} status={response.status}")
    return f"Logged {topic} ({duration_minutes} min, comprehension {comprehension}/5)."

The function’s docstring is not decoration: the Agents SDK reads it directly to build the tool’s schema and the per-argument descriptions the model sees, so a vague docstring here gives the model vague guidance about what each argument means. The content string is deliberately built as one natural-language sentence rather than a structured object, since Supermemory’s job is to extract facts from free text the same way it would from a real conversation.
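Because the content string is the only thing Supermemory ever sees, it can help to pull its construction into a pure function you can test without network calls. format_session_content is a hypothetical refactor, not part of the original tool. It builds the same sentence and additionally rejects comprehension ratings outside 1-5 and non-positive durations, which the original tool trusts the model to respect.

```python
def format_session_content(
    topic: str,
    source: str,
    duration_minutes: int,
    comprehension: int,
    notes: str = "",
) -> str:
    """Build the single natural-language sentence that record_reading_session stores."""
    # Validation is an addition for this sketch; the original tool performs none.
    if not 1 <= comprehension <= 5:
        raise ValueError(f"comprehension must be between 1 and 5, got {comprehension}")
    if duration_minutes <= 0:
        raise ValueError(f"duration_minutes must be positive, got {duration_minutes}")

    content = (
        f"Studied {topic} via {source} for {duration_minutes} minutes, "
        f"self-rated comprehension {comprehension}/5."
    )
    if notes:
        content += f" Notes: {notes}"
    return content


print(format_session_content("Python decorators", "a blog post", 30, 4, "functools.wraps was new"))
```

If you adopt it, decide how a rejected value should be handled, for example by having the agent ask the reader again rather than logging a guess.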

Next, the tool that closes a real gap: a place to log facts about the reader that aren’t tied to any single session at all.


@function_tool
def record_reader_fact(fact: str) -> str:
    """Log a standalone fact about the reader that is not tied to one specific reading session.

    Use this for things like stated skill level, long-running interests, goals, or constraints
    (e.g. "intermediate in Python, learning for about a year", "wants to focus on backend
    development", "prefers short articles over long textbooks"). These are exactly the kind of
    fact record_reading_session has no field for, since it only describes a single session.

    Args:
        fact: A short, self-contained sentence describing the stable fact, in the reader's own words
            or a faithful paraphrase. One fact per call; split multi-fact statements into separate calls.
    """
    print(f"[record_reader_fact] {fact=}")
    response = memory_client.add(content=fact, container_tag=READER_ID)  # writes one fact to Supermemory
    print(f"[record_reader_fact] -> id={response.id} status={response.status}")
    return f"Noted: {fact}"

This tool was not part of the original design; it was added after testing the agent exposed a real gap, covered in full further down. Without it, a message like “I’m at an intermediate level overall in Python, been learning it for about a year” had nowhere to go. record_reading_session only has fields for a single session’s topic, source, duration, and comprehension, so a standalone skill-level statement was silently dropped and never written to Supermemory at all.

Finally, the read side: the tool that turns Supermemory’s stored facts into something the agent can actually base a recommendation on.


@function_tool
def recommend_next_reading(focus: str) -> str:
    """Fetch the reader's history and stable interests for a given focus area.

    Returns a context string the agent can use to recommend what to read next.
    This tool only surfaces what Supermemory knows about the reader; the agent
    is responsible for the actual recommendation.

    Args:
        focus: What the reader wants to study next (e.g. "machine learning",
            "web development", "anything"). Drives semantic search against past sessions.
    """
    print(f"[recommend_next_reading] focus={focus!r}")
    profile = memory_client.profile(container_tag=READER_ID, q=focus)  # static facts + recent activity + matches

    static_facts = profile.profile.static          # stable traits: skill level, long-running interests
    dynamic_facts = profile.profile.dynamic         # recent activity: what was studied lately
    matches = profile.search_results.results        # past sessions closest to this focus area

    print(
        f"[recommend_next_reading] static={len(static_facts)} "
        f"dynamic={len(dynamic_facts)} matches={len(matches)}"
    )

    sections = []
    if static_facts:
        sections.append("Stable interests and skill level:")        # things unlikely to change session to session
        sections.extend(f"- {fact}" for fact in static_facts)
    if dynamic_facts:
        sections.append("Recent reading activity:")                  # what the reader has studied lately
        sections.extend(f"- {fact}" for fact in dynamic_facts)
    if matches:
        sections.append("Closest matching past sessions:")           # semantically similar prior entries
        for r in matches[:5]:
            sections.append(f"- {r['memory']}")

    if not sections:
        return (
            "No matching reading history found for this reader yet. This may mean nothing has been "
            "logged on this topic, or a just-logged session hasn't finished indexing in Supermemory "
            "(it processes new memories asynchronously, on the order of seconds). "
            "Ask the reader about their goals, current skill level, and what they've studied recently."
        )
    return "\n".join(sections)

profile.profile.static and profile.profile.dynamic are Supermemory’s own categorization: static facts are things unlikely to change from turn to turn, and dynamic facts are recent activity. That distinction is what separates Supermemory from a plain RAG search over a document corpus, which returns matching chunks with no notion of what is stable about the reader versus what is recent. When nothing is found, the fallback message explains the asynchronous indexing delay directly in the text the agent reads, so the agent can account for it rather than implying nothing was ever saved.
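The fallback message handles indexing delay inside a conversation. If a script or test instead needs a just-logged fact to be searchable before it continues, one option is to poll with a deadline. wait_for_results is a hypothetical helper, and the zero-argument search callable it takes is an assumption: in practice it could wrap memory_client.profile(...) and return the matches, but this sketch is exercised against a fake that only returns results on its third call.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def wait_for_results(
    search: Callable[[], List[T]],
    timeout_seconds: float = 10.0,
    interval_seconds: float = 1.0,
) -> List[T]:
    """Call search() until it returns a non-empty list or the timeout passes.

    Returns the last (possibly empty) result so callers can still fall back gracefully.
    """
    deadline = time.monotonic() + timeout_seconds
    results = search()
    while not results and time.monotonic() < deadline:
        time.sleep(interval_seconds)  # give the asynchronous indexer time to catch up
        results = search()
    return results


# Fake search that "finishes indexing" only on the third attempt.
calls = {"count": 0}


def fake_search() -> List[str]:
    calls["count"] += 1
    return ["Studied Python decorators via a blog post"] if calls["count"] >= 3 else []


print(wait_for_results(fake_search, timeout_seconds=1.0, interval_seconds=0.01))
```

This belongs in scripts and tests rather than the chat loop: blocking a reply for several seconds is a worse experience than the agent’s honest “not indexed yet” fallback.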

Defining the Agent and Its Instructions

With the three tools written, the next step is the agent itself: the system instructions that tell it when to call which tool, and the model it runs on. Create a file named reading_companion.py, also inside src.


import asyncio                            # Runner.run is async; the CLI loop awaits it per message
import json                               # read/write stats.json
import os                                 # read per-token cost overrides from the environment
import time                               # measure only the Runner.run() call's own duration
from datetime import datetime             # timestamp each stats record
from pathlib import Path                  # filesystem path for stats.json
from typing import Any, Dict, List        # type hints for stats records and aggregate summaries

from agents import Agent, ModelSettings, Runner, RunResult, SQLiteSession  # Agents SDK pieces used below
from memory_tools import record_reading_session, record_reader_fact, recommend_next_reading  # all three Supermemory-backed tools

PROJECT_ROOT = Path(__file__).resolve().parent.parent          # repo root, one level above src/

OPENAI_CALL_METADATA = {                  # forwarded to OpenAI's actual API, visible in the OpenAI usage dashboard
    "dev_name":    "Ganesh",
    "project":     "codex-test",
    "environment": "local",
    "purpose":     "testing",
}

OPENAI_COST_RATES = {                     # USD per token, keyed by model name, overridable via .env
    "gpt-4o-mini": {
        "input":  float(os.environ.get("GPT_4O_MINI_INPUT_COST",  0.00000015)),
        "output": float(os.environ.get("GPT_4O_MINI_OUTPUT_COST", 0.00000060)),
    },
}

OPENAI_CALL_METADATA gets attached to every real OpenAI call this agent makes, useful for filtering usage in OpenAI’s own dashboard when you have several projects sharing one account. OPENAI_COST_RATES mirrors the same per-token rate table used elsewhere in this series, kept overridable through environment variables rather than hardcoded, in case pricing changes later.
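A quick worked example helps sanity-check the rate table. At the default gpt-4o-mini rates ($0.15 per million input tokens and $0.60 per million output tokens, expressed per token above), a turn with 2,000 input tokens and 150 output tokens costs 2,000 × 0.00000015 + 150 × 0.0000006 = $0.0003 + $0.00009 = $0.00039. The sketch below reproduces that and shows an override changing the result. rates_from_env and turn_cost are hypothetical helpers that read the same variable names as OPENAI_COST_RATES, but take an explicit mapping so they can be tested without editing .env.

```python
import os
from typing import Dict, Mapping, Optional


def rates_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Dict[str, float]]:
    """Rebuild the gpt-4o-mini rate table, reading the same override variables as OPENAI_COST_RATES."""
    source = os.environ if env is None else env
    return {
        "gpt-4o-mini": {
            "input": float(source.get("GPT_4O_MINI_INPUT_COST", 0.00000015)),
            "output": float(source.get("GPT_4O_MINI_OUTPUT_COST", 0.00000060)),
        },
    }


def turn_cost(rates: Dict[str, Dict[str, float]], model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one turn; unknown models cost 0, matching build_turn_cost_record."""
    model_rates = rates.get(model, {"input": 0, "output": 0})
    return input_tokens * model_rates["input"] + output_tokens * model_rates["output"]


default_cost = turn_cost(rates_from_env({}), "gpt-4o-mini", 2000, 150)
override_cost = turn_cost(rates_from_env({"GPT_4O_MINI_INPUT_COST": "0.0000003"}), "gpt-4o-mini", 2000, 150)
print(round(default_cost, 7), round(override_cost, 7))
```

At the default rates, roughly 2,500 turns of that size would add up to about one dollar.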

Now the instructions themselves, the part that actually decides the agent’s behavior turn to turn.


AGENT_INSTRUCTIONS = """You are a reading companion who tracks what the reader studies and
recommends what to read next.

You have no memory of the reader's history on your own. Every fact about the reader lives in
Supermemory and reaches you only through tool calls.

Three rules, no exceptions:

1. Whenever the reader reports finishing or studying something, call record_reading_session
immediately, before responding. Extract the topic, source, duration in minutes, and a 1-5
comprehension self-rating from what they said. If a value is missing, ask one short follow-up
question instead of guessing. After logging, confirm in one short sentence and stop. Do NOT
recommend what to read next unless the reader asks for it.

2. Whenever the reader states a standalone fact about themselves that is not part of reporting
a specific reading session (a stated skill level, a long-running interest, a goal, or a constraint
like preferring short articles), call record_reader_fact for that fact. Do this in addition to,
not instead of, calling record_reading_session if the same message also reports a session. A
single message can require both tool calls; make both before responding if both apply.

3. When the reader explicitly asks what to read next (or asks for a recommendation or
suggestion), call recommend_next_reading first. Never recommend from your own training data.
The tool returns the reader's recent activity, stable interests, and matching past sessions.
Reference those facts directly in your reply.

Keep replies concise (2-4 sentences). Be specific: name the topic or source. Honor any stated
skill level or time constraints the tool surfaces.
""".strip()


def create_reading_agent() -> Agent:
    return Agent(
        name="Curator",                                       # the agent's display name, distinct from any tool
        instructions=AGENT_INSTRUCTIONS,
        tools=[record_reading_session, record_reader_fact, recommend_next_reading],  # the only three ways this agent touches Supermemory
        model="gpt-4o-mini",                                    # inexpensive model; OPENAI_COST_RATES is keyed by this name
        model_settings=ModelSettings(metadata=OPENAI_CALL_METADATA),  # attached to every real OpenAI call this agent makes
    )

Rule 2 was not part of the first version of these instructions. The original two-rule version is exactly what produced the dropped skill-level fact: a message that both reported a session and stated a standalone fact only triggered record_reading_session, and the standalone part was simply never written anywhere. Rule 2 explicitly tells the agent both tool calls can apply to a single message, which is what actually closed the gap, covered in detail in the failures section below.

Tracking Real Cost, Tokens, and Timing

The next step is making every chat turn accountable: real tokens, real cost, real response time, and the actual prompt and response text, not just a vague sense that the agent worked. The OpenAI Agents SDK does the heavy lifting here too, since Runner.run()’s result already carries summed token usage across every OpenAI call a turn makes, including any tool-calling round trips.


def build_turn_cost_record(result: RunResult, model_name: str, prompt_text: str, generation_seconds: float) -> Dict[str, Any]:
    # One record per chat turn. A single turn can involve multiple OpenAI calls if the agent
    # invokes a tool and then responds, so context_wrapper.usage is already summed across those.
    usage = result.context_wrapper.usage                          # real token counts for this whole turn
    rates = OPENAI_COST_RATES.get(model_name, {"input": 0, "output": 0})  # unknown models cost $0 rather than raising
    input_cost = usage.input_tokens * rates["input"]
    output_cost = usage.output_tokens * rates["output"]
    return {
        "timestamp":          datetime.now().isoformat(),         # when this turn was logged
        "model":               model_name,                        # which model actually served this turn
        "prompt":              prompt_text,                        # the reader's raw input message for this turn
        "response":            result.final_output,                # the agent's final reply for this turn
        "generation_seconds":  round(generation_seconds, 3),       # wall-clock time of just the Runner.run() call
        "requests":           usage.requests,                      # how many OpenAI calls this single turn made
        "input_tokens":       usage.input_tokens,
        "output_tokens":      usage.output_tokens,
        "total_tokens":       usage.total_tokens,
        "input_cost":         round(input_cost, 7),                # rounded for a readable stats.json, not billing
        "output_cost":        round(output_cost, 7),
        "total_cost":         round(input_cost + output_cost, 7),  # input plus output, the number most people want
    }

result.context_wrapper.usage is the field that makes this possible without parsing a raw HTTP response ourselves: the Agents SDK accumulates it internally across however many OpenAI calls a single Runner.run() invocation makes. Saving prompt_text and result.final_output alongside the numbers means stats.json doubles as a readable transcript, not just a cost ledger.
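To see why the per-turn record needs no extra bookkeeping, picture a turn where the agent calls a tool and then answers: two model requests, one turn. TurnUsage is a hypothetical stand-in exposing the same four fields build_turn_cost_record reads (requests, input_tokens, output_tokens, total_tokens). It is not the Agents SDK’s own usage class, only an illustration of the summing that context_wrapper.usage does for you, and the token counts are made-up example values.

```python
from dataclasses import dataclass


@dataclass
class TurnUsage:
    """Hypothetical stand-in for the per-turn usage totals read by build_turn_cost_record."""
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def add_request(self, input_tokens: int, output_tokens: int) -> None:
        # Each model call inside one Runner.run() adds to the same running totals.
        self.requests += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


usage = TurnUsage()
usage.add_request(input_tokens=900, output_tokens=40)    # model decides to call recommend_next_reading
usage.add_request(input_tokens=1300, output_tokens=110)  # model reads the tool output and replies
print(usage.requests, usage.input_tokens, usage.output_tokens, usage.total_tokens)
```

The second request typically carries more input tokens, because the model re-reads the conversation plus the tool output before replying.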

With one turn’s record defined, the next two functions handle accumulating those records across the whole lifetime of the agent, the same pattern used for cost tracking throughout this series.


def summarize_session_costs(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Re-derived from the full record list every time rather than kept as a running counter,
    # so this function alone is the single source of truth for what the totals mean.
    return {
        "total_turns":              len(records),                                      # how many chat turns this list covers
        "total_requests":           sum(r["requests"] for r in records),               # OpenAI calls across every turn
        "total_input_tokens":       sum(r["input_tokens"] for r in records),
        "total_output_tokens":      sum(r["output_tokens"] for r in records),
        "total_tokens":             sum(r["total_tokens"] for r in records),
        "total_generation_seconds": round(sum(r["generation_seconds"] for r in records), 3),  # summed response-gen time
        "total_cost":               round(sum(r["total_cost"] for r in records), 6),    # summed USD cost, all turns
    }


def record_session_cost(stats_path: Path, record: Dict[str, Any]) -> None:
    try:                                                            # load history written by previous sessions
        existing = json.loads(stats_path.read_text(encoding="utf-8"))  # parse whatever was written last time
        records = existing.get("turns", [])                          # every chat turn recorded so far
    except (FileNotFoundError, json.JSONDecodeError):
        records = []                                                  # first run, or unreadable file: start with empty history

    records.append(record)                                            # this turn now joins the lifetime history
    output = {
        "summary": {
            "timestamp": datetime.now().isoformat(),                # when stats.json was last written
            **summarize_session_costs(records),                      # lifetime totals across every turn ever made
        },
        "turns": records,                                              # every individual turn ever recorded
    }
    stats_path.parent.mkdir(parents=True, exist_ok=True)              # in case stats.json lives in a new folder
    stats_path.write_text(json.dumps(output, indent=2), encoding="utf-8")  # overwrite with the updated lifetime history

record_session_cost reads back whatever was already in stats.json, appends the new turn, and rewrites the whole file with fresh lifetime totals, so the file is always a complete, self-consistent history rather than something that needs separate runs stitched together.
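One caveat: because record_session_cost treats an unreadable stats.json the same as a missing one, a file left half-written by a crash would be silently replaced with a one-turn history. A possible hardening, not part of the original code, is to write to a temporary file in the same folder and swap it in with os.replace, and to move an unparseable file aside rather than overwrite it. load_turns and write_json_atomically are hypothetical names for this sketch.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def load_turns(stats_path: Path) -> List[Dict[str, Any]]:
    """Return recorded turns; keep a corrupted file as stats.json.corrupt instead of losing it."""
    try:
        return json.loads(stats_path.read_text(encoding="utf-8")).get("turns", [])
    except FileNotFoundError:
        return []  # genuine first run
    except json.JSONDecodeError:
        stats_path.replace(stats_path.with_name(stats_path.name + ".corrupt"))  # preserve for inspection
        return []


def write_json_atomically(path: Path, data: Dict[str, Any]) -> None:
    """Write to a temp file in the same folder, then rename it over the target in one step."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2)
        os.replace(tmp_name, path)  # atomic rename on POSIX when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # never leave a stray .tmp file behind
        raise
```

Inside record_session_cost, records = load_turns(stats_path) would replace the try/except, and write_json_atomically(stats_path, output) would replace the final write_text call.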

The Chat Loop

Everything so far is plumbing. This is the part that actually runs: a loop that reads a message, hands it to the agent, times the response, logs it, and prints the reply.


async def run_chat_loop() -> None:
    agent = create_reading_agent()
    session = SQLiteSession(session_id="reading-companion-cli")   # short-term turn history, kept locally in SQLite
    stats_path = PROJECT_ROOT / "stats.json"                       # accumulates real OpenAI cost across every turn

    print("Reading companion ready. Type a message, or 'exit' to quit.\n")
    while True:
        try:
            message = input("You: ").strip()
        except (EOFError, KeyboardInterrupt):                     # Ctrl+C / Ctrl+D exits cleanly, no traceback
            print()
            break
        if not message:
            continue
        if message.lower() in {"exit", "quit"}:
            break

        generation_start = time.monotonic()                          # marks only the response-generation window
        result = await Runner.run(agent, message, session=session)  # session carries this conversation's turns
        generation_seconds = time.monotonic() - generation_start    # excludes input() wait; includes model and tool calls
        record_session_cost(
            stats_path, build_turn_cost_record(result, agent.model, message, generation_seconds)
        )  # log prompt, response, timing, and cost
        print(f"\nCurator: {result.final_output}\n")


if __name__ == "__main__":
    asyncio.run(run_chat_loop())

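SQLiteSession gives the agent short-term memory of the current conversation (what was said earlier in this chat), entirely separate from Supermemory, which holds the reader’s long-term memory across every conversation. Because no db_path is passed here, SQLiteSession uses an in-memory SQLite database by default, so this short-term history disappears when the process exits; pass a file path as db_path if you want it to persist between runs.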
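For intuition about what keying history by session_id means, here is a rough analogue using Python’s built-in sqlite3 module. The table layout and the open_history, add_turn, and get_history helpers are hypothetical; this is not the Agents SDK’s actual schema or implementation, only the same idea in miniature.

```python
import sqlite3
from typing import List, Tuple


def open_history(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open a tiny turn-history table; a ':memory:' database vanishes when the connection closes."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turns ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT NOT NULL,"
        " role TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    return conn


def add_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def get_history(conn: sqlite3.Connection, session_id: str) -> List[Tuple[str, str]]:
    # Only this session's turns come back, in the order they were written.
    rows = conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ? ORDER BY id", (session_id,)
    )
    return rows.fetchall()


conn = open_history()
add_turn(conn, "reading-companion-cli", "user", "I read chapter 4 on decorators for 40 minutes.")
add_turn(conn, "reading-companion-cli", "assistant", "Logged Python decorators (40 min).")
add_turn(conn, "another-chat", "user", "Hello")
print(get_history(conn, "reading-companion-cli"))
```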

generation_start and generation_seconds deliberately wrap only the Runner.run() call, not the input() prompt before it, so the timing reflects the agent’s full response time (every model call plus any Supermemory tool calls in that turn) rather than however long a human took to type.
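If you end up timing more than one block, the start/stop pair can be wrapped in a small context manager. timed is a hypothetical helper, not something the Agents SDK provides. Like the original loop, it uses time.monotonic(), which is unaffected by system clock changes, and it measures only the code inside the with block.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed() -> Iterator[Dict[str, float]]:
    """Measure wall-clock seconds for the with-block; read result['seconds'] afterwards."""
    result = {"seconds": 0.0}
    start = time.monotonic()
    try:
        yield result
    finally:
        result["seconds"] = time.monotonic() - start  # recorded even if the block raises


with timed() as timing:
    time.sleep(0.05)  # stands in for: result = await Runner.run(agent, message, session=session)
print(f"{timing['seconds']:.3f}s")
```

A plain synchronous context manager works inside an async function, so in run_chat_loop the await Runner.run(...) line could sit directly inside a with timed() as timing: block.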

Running It

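Create the virtual environment and install the real SDKs. The activation command below is for Windows; on macOS or Linux, run source venv/bin/activate instead: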


python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Then run the chat loop (on macOS or Linux, use src/reading_companion.py):


python src\reading_companion.py

Output

Who Can Benefit

Developers building their first agent on top of a hosted memory API instead of rolling their own RAG pipeline
Students who want to see the real difference between an agent’s short-term session history and a reader’s long-term memory across every conversation
Anyone integrating the OpenAI Agents SDK who wants a concrete example of pulling real token usage and cost out of RunResult instead of guessing
Teams evaluating Supermemory who want an honest account of its async indexing behavior before relying on it in a latency-sensitive flow
Writers who want a private reading or learning tracker that recommends based on real history, not a generic suggestion

How Codersarts Can Help

If you want to take this further, Codersarts offers hands-on support at every stage.

For learners: Live 1-to-1 sessions with an AI engineer who can walk through agent memory architecture, the OpenAI Agents SDK, and debugging strategies for hosted, asynchronous APIs in detail.
For teams: End-to-end development of memory-backed agent tooling, including tool design, cost tracking, and reliability testing against real third-party services.
For enterprises: Architecture consulting for agent memory systems, including evaluating hosted memory APIs against self-hosted RAG for latency, cost, and data ownership requirements.

Reach out at contact@codersarts.com or visit www.codersarts.com to get started.

Continue Your AI Learning Journey with Codersarts

If you enjoyed this article and would like to discover more about modern AI applications, production-ready LLM systems, and real-world RAG and MCP implementations, be sure to explore these other blogs from Codersarts:

Build a Cost-Efficient Writing Quality Checker with Tiered Model Routing and OpenAI
https://www.codersarts.com/post/build-a-cost-efficient-writing-quality-checker-with-tiered-model-routing-and-openai

Build Your First A2A Agent: An Email Drafting Pipeline Using Python and OpenAI
https://www.codersarts.com/post/build-your-first-a2a-agent-an-email-drafting-pipeline-using-python-and-openai

Building an AI Interview Prep Agent with Qwen 3.7 Max and Streamlit
https://www.codersarts.com/post/building-an-ai-interview-prep-agent-with-qwen-3-7-max-and-streamlit

Academic Research Assistance and Literature Review Automation Using RAG
https://www.codersarts.com/post/academic-research-assistance-and-literature-review-automation-using-rag

Clinical Decision Support Systems Using RAG: Intelligent Diagnostic Assistance for Healthcare
https://www.codersarts.com/post/clinical-decision-support-systems-using-rag-healthcare-with-intelligent-diagnostic-assistance

Financial Decision Making with RAG Powered Market Intelligence
https://www.codersarts.com/post/financial-decision-making-with-rag-powered-market-intelligence