Build a Personal Book Tracker with Mem0 and OpenAI

Introduction

Most chatbots forget everything the moment a session ends. Ask one for a book recommendation today and tomorrow it has no memory of what you already read, rated, or disliked, so it falls back to generic suggestions that ignore your actual taste.

In this tutorial we build a personal book tracker using Mem0’s open-source local memory layer and the OpenAI Agents SDK. You tell it what you have read and how you felt about it, and it remembers that across every future session. Memories are stored on your own machine, with no Mem0 cloud account and no per-memory fee; the only external service is the OpenAI API.

What We Are Building

We are building a single conversational agent backed by four tools that read and write to a local Mem0 memory store. The workflow:

Tell the agent about a book you finished and your rating
Store that fact permanently, tagged to your reader ID
Ask what you have read or what you should read next
Search your stored facts before the agent answers, so recommendations are grounded in real history
Revise or delete a fact when your opinion changes

Tech Stack

Component	Tool
Memory layer	Mem0 local Memory class (open-source, on-disk)
Vector store	Qdrant (local, file-based, no server)
Agent framework	OpenAI Agents SDK
Model	OpenAI gpt-4o-mini
UI	Streamlit

Pricing

Mem0’s local Memory class is completely free. There is no Mem0 account, no MEM0_API_KEY, and no Mem0 cloud service involved; memories live on disk in a local Qdrant database. The only cost is the OpenAI API: gpt-4o-mini at $0.15 per million input tokens and $0.60 per million output tokens, used both for the agent’s own reasoning and for Mem0’s internal fact extraction, plus the embedding calls Mem0 makes to text-embedding-3-small. Every agent run is logged to stats.json with prompt tokens, completion tokens, generation time, and computed cost.
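To make the arithmetic concrete, here is a minimal sketch of how a per-call cost follows from the rates quoted above. The helper name `estimate_cost` and the token counts are illustrative assumptions, not part of the project code.

```python
# Per-token rates derived from the per-million prices quoted above.
INPUT_COST_PER_TOKEN = 0.15 / 1_000_000    # USD per input token
OUTPUT_COST_PER_TOKEN = 0.60 / 1_000_000   # USD per output token


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of one call (hypothetical helper)."""
    input_cost = prompt_tokens * INPUT_COST_PER_TOKEN
    output_cost = completion_tokens * OUTPUT_COST_PER_TOKEN
    return round(input_cost + output_cost, 7)


# Example: a run that used 1,200 input tokens and 300 output tokens.
example_cost = estimate_cost(1200, 300)
print(f"${example_cost:.6f}")
```

At these rates a typical short exchange costs a small fraction of a cent, which is why the project stores costs with several decimal places.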

Project Structure

Create a file named requirements.txt in the project root:


mem0ai            # open-source Mem0 SDK, local Memory class and Qdrant integration
openai-agents      # OpenAI's Agents SDK, provides Agent, Runner, and function_tool
python-dotenv      # loads OPENAI_API_KEY and other config from .env
streamlit          # renders the chat UI

Also install these Mem0 extras (quoted so the square brackets survive shells such as zsh):

pip install "mem0ai[nlp]"
pip install "mem0ai[extras]"

mem0ai is the open-source Mem0 SDK, openai-agents is OpenAI’s own Agents SDK for building tool-using agents, python-dotenv loads configuration from .env, and streamlit renders the chat interface.

Create a file named .env in the project root:


OPENAI_API_KEY=your_openai_api_key_here     # your own OpenAI secret key, never commit this file
OPENAI_MODEL=gpt-4o-mini                    # model used for both the agent and Mem0's internal fact extraction
GPT_4O_MINI_INPUT_COST=0.00000015           # USD per input token, used to compute stats.json costs
GPT_4O_MINI_OUTPUT_COST=0.00000060          # USD per output token, used to compute stats.json costs

Every configurable value, the API key, model name, and per-token cost rates, lives in .env so nothing needs to change in code to adjust pricing or switch models.

The finished project looks like this:


mem0_book_tracker/
├── .streamlit/
│   └── config.toml      # Streamlit theme: blue accent, cream background
├── book_tracker.py      # Memory config, four tools, the agent, and ask_tracker()
├── tracker_ui.py        # Streamlit chat interface
├── requirements.txt     # mem0ai, openai-agents, python-dotenv, streamlit
├── .env                 # OPENAI_API_KEY, OPENAI_MODEL, cost rates
├── stats.json           # created on first run, accumulates token and cost data
└── qdrant_db/           # created on first run, stores all memories persistently

Create a virtual environment and install the dependencies:


python -m venv venv               # create an isolated Python environment in ./venv
venv\Scripts\activate              # Windows; on macOS or Linux run: source venv/bin/activate
pip install -r requirements.txt   # install mem0ai, openai-agents, python-dotenv, and streamlit

Building the Memory Layer and Tools

All of the backend logic for this project lives in a single file. We start with the imports, configuration, and the local Mem0 setup that every tool shares.

Create a file named book_tracker.py:


import json                               # read and write stats.json
import os                                # read OPENAI_MODEL and cost rates from the environment
import time                              # measure wall-clock time for each ask_tracker() call
from dataclasses import dataclass        # lightweight value object to carry the reader id through tool calls
from datetime import datetime            # timestamp every stats record
from pathlib import Path                 # resolve the stats.json path relative to this file
from dotenv import load_dotenv           # load .env before any os.environ.get() call
from mem0 import Memory                  # local Mem0 — no cloud API key, no account, completely free
from agents import Agent, ModelSettings, Runner, function_tool, RunContextWrapper  # OpenAI Agents SDK

load_dotenv()                            # must run before os.environ.get() reads OPENAI_MODEL

PROJECT_ROOT = Path(__file__).resolve().parent   # directory containing this file
STATS_FILE = PROJECT_ROOT / "stats.json"          # accumulates across every session, never overwritten

load_dotenv() reads .env into the process environment so every os.environ.get() call below can see it. PROJECT_ROOT and STATS_FILE are computed once so the stats log always lands next to this script regardless of which directory the app is launched from.

Next we define the metadata that gets attached to every OpenAI API call, and the per-token cost rates used to compute spend.


# ── OpenAI metadata (forwarded to every API call, visible in the OpenAI dashboard) ───
OPENAI_CALL_METADATA = {
    "dev_name":    "Ganesh",       # identifies who triggered the call, shown in OpenAI's usage dashboard
    "project":     "codex-test",   # groups usage under this project label
    "environment": "local",        # distinguishes local dev calls from staging or production
    "purpose":     "testing",      # flags these calls as non-production traffic
}

# ── Per-token cost rates (read from .env so they can be updated without code changes) ─
_INPUT_COST  = float(os.environ.get("GPT_4O_MINI_INPUT_COST",  0.00000015))  # $0.15 per million input tokens
_OUTPUT_COST = float(os.environ.get("GPT_4O_MINI_OUTPUT_COST", 0.00000060)) # $0.60 per million output tokens

MODEL = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")  # model for the agent's own reasoning turns

OPENAI_CALL_METADATA is a plain dictionary that OpenAI attaches to usage records in its dashboard, useful for tracking which project or environment generated a given call. The cost rates default to gpt-4o-mini’s published pricing but are read from .env first, so updating a rate never requires touching code.

Now we configure Mem0 itself. This project uses the free, local Memory class rather than Mem0’s paid cloud service.


# ── Local Mem0 configuration ─────────────────────────────────────────────────────────
# Using the open-source local Memory class (no MEM0_API_KEY required).
# Memories are stored in a local Qdrant DB in ./qdrant_db; only the OpenAI extraction and embedding calls leave the machine.
MEMORY_CONFIG = {
    "llm": {                                # the model Mem0 uses internally to turn raw text into structured facts
        "provider": "openai",               # use OpenAI's chat completion API for fact extraction
        "config": {
            "model": MODEL,              # same model the agent uses, read from .env
            "temperature": 0.1,          # low temp for consistent memory extraction
        },
    },
    "embedder": {                            # the model Mem0 uses to convert facts into vectors for similarity search
        "provider": "openai",               # use OpenAI's embeddings API
        "config": {
            "model": "text-embedding-3-small",   # cheapest OpenAI embedding model, adequate for book facts
        },
    },
    "vector_store": {                        # where the vectors and their metadata are actually stored
        "provider": "qdrant",               # use Qdrant, a local vector database, instead of a remote service
        "config": {
            "collection_name": "book_memories",       # all facts for all readers live in this one collection
            "path": str(PROJECT_ROOT / "qdrant_db"),  # on-disk Qdrant next to this file, no separate server needed
            "on_disk": True,                           # keep the store across restarts; Mem0's Qdrant config defaults on_disk to False
        },
    },
}

book_memory = Memory.from_config(MEMORY_CONFIG)  # the shared in-process memory store used by every tool

MEMORY_CONFIG has three parts: an llm for turning raw text into structured facts, an embedder for turning facts into vectors, and a vector_store that tells Mem0 to use an on-disk Qdrant database instead of a remote server. Memory.from_config builds a single shared instance that every tool below calls into, and book_memory reads and writes are automatically namespaced by whichever user_id you pass in.

The agent needs a way to know which reader it is talking to on every tool call. That is what this small dataclass carries.


# ── Reader context dataclass ──────────────────────────────────────────────────────────
@dataclass                                                # auto-generates __init__ and __repr__ for this value object
class ReaderProfile:                                       # carried through every tool call via RunContextWrapper
    reader_id: str    # unique identifier for this reader, passed through every tool call via RunContextWrapper

ReaderProfile is a single-field dataclass. The OpenAI Agents SDK passes this object into every tool function through RunContextWrapper, so each tool always knows whose memories to search, add, update, or delete without the model needing to pass a reader ID as a parameter.
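The context-passing pattern can be sketched without the SDK installed. In the snippet below, `FakeContextWrapper` is a hypothetical stand-in that models only the `.context` attribute of `RunContextWrapper`, and `describe_reader` is an illustrative tool-like function; neither is part of the project.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class ReaderProfile:
    reader_id: str  # same single field as the project's dataclass


@dataclass
class FakeContextWrapper(Generic[T]):
    """Stand-in for the SDK's RunContextWrapper (assumption: only .context is modeled)."""
    context: T


def describe_reader(ctx: FakeContextWrapper[ReaderProfile]) -> str:
    """Tool-like function: reads the reader id from ctx, not from a model-supplied argument."""
    return f"memories scoped to {ctx.context.reader_id}"


wrapper = FakeContextWrapper(context=ReaderProfile(reader_id="reader_01"))
print(describe_reader(wrapper))
```

Because the reader ID travels in the context object rather than in the tool's arguments, the model has no opportunity to supply a different reader's ID.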

With the memory store and context object in place, we can now define the four tools the agent will call.


# ── Tools ─────────────────────────────────────────────────────────────────────────────
@function_tool                                            # registers this plain function as a callable tool for the model
def fetch_reading_history(ctx: RunContextWrapper[ReaderProfile], query: str) -> str:  # tool the model calls to search this reader's stored facts
    """Search the reader's stored book facts, ratings, and preferences."""
    try:                                                   # wrap the whole search so a Mem0-side failure never crashes the agent run
        found = book_memory.search(                          # vector search over the reader's stored facts
            query,
            filters={"user_id": ctx.context.reader_id},     # mem0 v2 API: scope results via a filters dict
            limit=6,                                         # return at most 6 matching facts
        )
        entries = found.get("results") if isinstance(found, dict) else found  # handle both response shapes
        if entries:                                         # only build a formatted reply when the search actually matched something
            lines = []                                      # accumulate one formatted string per matching memory
            for e in entries:                               # walk every matching memory record returned by Mem0
                mem_text = e.get("memory") or e.get("text") or str(e)  # handle key name differences across mem0 versions
                lines.append(f"- [ID: {e['id']}] {mem_text}")   # include the ID so revise/erase tools can reference this memory later
            return "\n".join(lines)                         # join every formatted line into one newline-separated block
        return "No matching reading history found."         # no facts matched this reader and query
    except Exception as exc:                                # catch any Mem0 or network failure rather than letting it propagate
        return f"ERROR in fetch_reading_history: {exc}"   # surface the real error to the agent so we can debug

@function_tool turns this plain Python function into something the model can call by name. book_memory.search runs a semantic vector search rather than an exact keyword match, so asking “what have I read” finds facts about specific books even though the words differ. The filters={"user_id": ...} argument is Mem0’s v2 API for scoping a search to one reader; passing user_id directly is rejected by newer Mem0 versions. Wrapping the whole body in try/except means a Mem0-side error becomes a visible string the agent can react to instead of crashing the entire conversation.
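To illustrate what "semantic search scoped to one reader" means, here is a toy sketch using hand-made three-dimensional vectors in place of real embeddings. The facts, vectors, and the `toy_search` helper are all invented for illustration; Mem0 and Qdrant handle this internally with real embedding vectors.

```python
import math

# Hypothetical stored facts: (user_id, text, hand-made embedding vector).
FACTS = [
    ("reader_01", "Finished Dune, rated 5/5", [0.9, 0.1, 0.0]),
    ("reader_01", "Dislikes slow romance novels", [0.1, 0.9, 0.0]),
    ("reader_02", "Finished Neuromancer, rated 4/5", [0.8, 0.2, 0.1]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_search(query_vec, user_id, limit=6):
    """Filter to one reader first, then rank that reader's facts by similarity."""
    scoped = [f for f in FACTS if f[0] == user_id]
    ranked = sorted(scoped, key=lambda f: cosine(query_vec, f[2]), reverse=True)
    return [text for _, text, _ in ranked[:limit]]


# A query vector pointing in the "science fiction I finished" direction.
results = toy_search([1.0, 0.0, 0.0], "reader_01")
print(results)
```

The other reader's fact never appears in the results, however similar its vector is, because the filter is applied before ranking.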

Next, the tool that stores a brand-new fact about the reader.


@function_tool                                            # registers this plain function as a callable tool for the model
def record_book_fact(ctx: RunContextWrapper[ReaderProfile], fact: str) -> str:  # tool the model calls to store a new fact
    """Store a new fact about the reader: a book they read, a rating, a genre preference, or an author opinion."""
    book_memory.add(                                        # write a new memory into the local Qdrant-backed store
        [{"role": "user", "content": fact}],   # Mem0 expects a messages list, same shape as OpenAI
        user_id=ctx.context.reader_id,          # tag the memory to this reader
    )
    return "Stored."                                        # short confirmation string handed back to the agent

book_memory.add expects a list of message dictionaries in the same shape OpenAI’s chat API uses, even though we are only ever passing a single fact. Internally, Mem0 runs its own LLM call to extract a clean, structured memory from that raw sentence before saving it, which is why MEMORY_CONFIG needed an llm block earlier.

Facts change over time, so the agent needs a way to update one without deleting and re-adding it.


@function_tool                                            # registers this plain function as a callable tool for the model
def revise_book_fact(ctx: RunContextWrapper[ReaderProfile], fact_id: str, revised_text: str) -> str:  # tool the model calls to update a memory
    """Update an existing memory when the reader's opinion or status changes. Requires the memory ID."""
    book_memory.update(memory_id=fact_id, data=revised_text)   # update by ID, not by re-adding
    return f"Updated memory {fact_id}."                     # confirmation string, includes the ID that was changed

book_memory.update takes the exact memory ID returned by an earlier fetch_reading_history call and overwrites its content in place, preserving the same ID rather than creating a duplicate entry.

Finally, a tool to remove a fact entirely.


@function_tool                                            # registers this plain function as a callable tool for the model
def erase_book_fact(ctx: RunContextWrapper[ReaderProfile], fact_id: str) -> str:  # tool the model calls to delete a memory
    """Delete a specific memory by its ID."""
    book_memory.delete(memory_id=fact_id)                   # permanently remove this memory from the local Qdrant store
    return f"Deleted memory {fact_id}."                     # confirmation string, includes the ID that was removed

book_memory.delete removes the memory permanently from the local Qdrant store. Like revise_book_fact, it requires an ID, which the agent is instructed to obtain from fetch_reading_history first.
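The difference between updating in place and deleting can be sketched with an in-memory, ID-keyed store. `ToyFactStore` is a hypothetical stand-in written for this example and does not reflect how Mem0 is implemented; it only mirrors the `update(memory_id=..., data=...)` and `delete(memory_id=...)` call shapes used above.

```python
import uuid


class ToyFactStore:
    """In-memory stand-in for an ID-keyed memory store (not Mem0's implementation)."""

    def __init__(self):
        self._facts = {}

    def add(self, text: str) -> str:
        fact_id = str(uuid.uuid4())      # every new fact gets a fresh ID
        self._facts[fact_id] = text
        return fact_id

    def update(self, memory_id: str, data: str) -> None:
        if memory_id not in self._facts:
            raise KeyError(f"unknown memory id: {memory_id}")
        self._facts[memory_id] = data    # same ID, new content

    def delete(self, memory_id: str) -> None:
        self._facts.pop(memory_id)       # raises KeyError for an unknown ID

    def all(self) -> dict:
        return dict(self._facts)


store = ToyFactStore()
fid = store.add("Rated Dune 3/5")
store.update(memory_id=fid, data="Rated Dune 5/5 after a re-read")
after_update = store.all()               # still one entry, under the original ID
store.delete(memory_id=fid)
after_delete = store.all()               # empty
```

Updating keeps a single entry under the original ID, whereas deleting and re-adding would mint a new ID and invalidate any ID the agent looked up earlier.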

Building the Agent

With all four tools defined, we assemble the agent itself: its model, the metadata attached to every call it makes, and the instructions that tell it how to use its tools correctly.


# ── Agent ─────────────────────────────────────────────────────────────────────────────
reading_companion = Agent(
    name="BookTracker",                                            # internal agent identifier, shown in traces and logs
    model=MODEL,                                                   # which OpenAI model drives this agent's reasoning
    model_settings=ModelSettings(metadata=OPENAI_CALL_METADATA),  # attached to every OpenAI call this agent makes
    instructions="""You are a personal reading companion with persistent memory.

Your job:
- Help the reader track books they have read, their ratings, and preferences.
- Recommend what to read next based on their stored history and tastes.
- Remember genres they love or avoid, authors they enjoy, and their mood for reading.

Rules:
- Always call fetch_reading_history before answering any question about the reader's history or preferences.
- When the reader tells you about a book they finished, call record_book_fact to store it.
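- When information changes (e.g. "I changed my mind about that author"), use fetch_reading_history to find the existing memory ID, then call revise_book_fact with that ID.
- When the reader asks to remove a fact, use fetch_reading_history to find the memory ID, then call erase_book_fact with that ID.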
- Memory IDs appear in the format [ID: xxx] in search results.
- Keep responses concise: 2-4 sentences for factual recall, up to 5 book recommendations with a one-line reason each.
""",
    tools=[fetch_reading_history, record_book_fact, revise_book_fact, erase_book_fact],  # every function the model may call
)

ModelSettings(metadata=OPENAI_CALL_METADATA) is what forwards the dev name, project, environment, and purpose tags to every OpenAI API call this specific agent makes. The instructions string is the agent’s system prompt: it explicitly tells the model to call fetch_reading_history before answering history questions, and to look up a memory’s ID before attempting to revise it, since the model cannot guess an ID on its own. The tools list registers all four functions so the model can choose which one to call based on the conversation.
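The model reads the `[ID: xxx]` markers itself, but the same format is easy to parse in code, which can be handy when testing the search tool's output. The sketch below is an assumption-laden illustration: `parse_history` is a hypothetical helper and the IDs are made up.

```python
import re

# Matches "[ID: <anything but ]>]" followed by the memory text.
ID_PATTERN = re.compile(r"\[ID: ([^\]]+)\]\s*(.*)")


def parse_history(block: str) -> list[tuple[str, str]]:
    """Return (memory_id, text) pairs from a fetch_reading_history-style block."""
    pairs = []
    for line in block.splitlines():
        match = ID_PATTERN.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


sample = "- [ID: abc-123] Finished Dune, rated 5/5\n- [ID: def-456] Prefers hard sci-fi"
parsed = parse_history(sample)
print(parsed)
```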

Stats Tracking

Every call to the agent should leave a permanent record of what it cost, how long it took, and how many tokens it used. This function handles writing that record to stats.json.


# ── Stats logging ─────────────────────────────────────────────────────────────────────
def _write_stats(record: dict) -> None:                  # appends one interaction record and recomputes lifetime totals
    try:                                                   # guard against a missing or corrupted stats.json
        # load whatever is already on disk; start with an empty structure if stats.json does not exist yet
        existing = json.loads(STATS_FILE.read_text(encoding="utf-8")) if STATS_FILE.exists() else {"summary": {}, "interactions": []}
    except (json.JSONDecodeError, OSError):
        existing = {"summary": {}, "interactions": []}   # corrupt or unreadable file: start fresh rather than crash

    existing["interactions"].append(record)               # accumulate: never overwrite, always append the new record
    interactions = existing["interactions"]               # local alias, avoids repeated dict lookups below
    existing["summary"] = {
        "timestamp":                datetime.now().isoformat(),                                            # when this summary was last recomputed
        "total_interactions":       len(interactions),                                                      # lifetime count of every ask_tracker() call
        "total_prompt_tokens":      sum(i.get("prompt_tokens", 0) for i in interactions),                   # sum of input tokens across all calls
        "total_completion_tokens":  sum(i.get("completion_tokens", 0) for i in interactions),               # sum of output tokens across all calls
        "total_tokens":             sum(i.get("total_tokens", 0) for i in interactions),                    # combined input and output tokens
        "total_generation_seconds": round(sum(i.get("generation_seconds", 0) for i in interactions), 3),    # cumulative wall-clock time spent generating
        "total_input_cost":         round(sum(i.get("input_cost", 0) for i in interactions), 6),            # cumulative USD cost of input tokens
        "total_output_cost":        round(sum(i.get("output_cost", 0) for i in interactions), 6),           # cumulative USD cost of output tokens
        "total_cost":               round(sum(i.get("total_cost", 0) for i in interactions), 6),            # lifetime cost across every interaction
    }
    STATS_FILE.write_text(json.dumps(existing, indent=2, ensure_ascii=False), encoding="utf-8")   # rewrite the whole file (a plain overwrite, not an atomic one)

The function first reads whatever already exists in stats.json, falling back to an empty structure if the file is missing or corrupt. It appends the new record to the interactions list, then recomputes the entire summary block from scratch by summing across every interaction ever recorded, so the summary always reflects true lifetime totals rather than an incrementally-updated (and potentially drifting) counter.
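The append-then-recompute approach can be tried in isolation. This is a trimmed-down sketch that writes to a temporary directory rather than the real stats.json; `append_and_summarize` and the sample records are hypothetical, and only three summary fields are kept for brevity.

```python
import json
import tempfile
from pathlib import Path


def append_and_summarize(stats_path: Path, record: dict) -> dict:
    """Append one record, recompute totals from every record, and rewrite the file."""
    try:
        data = json.loads(stats_path.read_text(encoding="utf-8")) if stats_path.exists() else {}
    except (json.JSONDecodeError, OSError):
        data = {}                                    # unreadable file: start fresh
    interactions = data.get("interactions", [])
    interactions.append(record)
    summary = {
        "total_interactions": len(interactions),
        "total_tokens": sum(i.get("total_tokens", 0) for i in interactions),
        "total_cost": round(sum(i.get("total_cost", 0) for i in interactions), 6),
    }
    payload = {"summary": summary, "interactions": interactions}
    stats_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return summary


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stats.json"
    append_and_summarize(path, {"total_tokens": 100, "total_cost": 0.00002})
    final_summary = append_and_summarize(path, {"total_tokens": 250, "total_cost": 0.00005})

print(final_summary)
```

Recomputing from the full list costs a little more work per call, but the totals cannot drift from the underlying records.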

Finally, the public function the UI calls: it runs the agent, times the call, extracts real usage numbers, computes cost, and logs everything before returning the reply.


# ── Public entry point ────────────────────────────────────────────────────────────────
def ask_tracker(user_message: str, reader_id: str) -> str:  # public function the Streamlit UI calls for every message
    """Send one message to the reading companion, log stats, and return the response."""
    start = time.monotonic()                              # start the clock before the model call begins
    outcome = Runner.run_sync(                              # drive the whole agent loop synchronously until a final answer
        reading_companion,                                  # the Agent instance defined above
        user_message,                                       # the raw text the reader typed in the UI
        context=ReaderProfile(reader_id=reader_id),   # inject the reader id into every tool call
    )
    elapsed = time.monotonic() - start                    # total wall-clock seconds the whole run took

    usage = outcome.context_wrapper.usage             # real token counts from the Agents SDK
    prompt_tokens     = getattr(usage, "input_tokens", 0) or 0       # input token count, 0 if the SDK did not report one
    completion_tokens = getattr(usage, "output_tokens", 0) or 0      # output token count, 0 if the SDK did not report one
    total_tokens      = getattr(usage, "total_tokens", prompt_tokens + completion_tokens) or 0  # combined token count
    input_cost  = round(prompt_tokens * _INPUT_COST, 7)   # USD cost of the input tokens for this call
    output_cost = round(completion_tokens * _OUTPUT_COST, 7)  # USD cost of the output tokens for this call

    _write_stats({                                          # append one record describing this exact interaction
        "timestamp":          datetime.now().isoformat(),    # when this interaction happened
        "reader_id":          reader_id,                     # which reader this interaction belongs to
        "model":              MODEL,                         # which model produced the response
        "prompt":             user_message,                  # the reader's raw message
        "response":           outcome.final_output,           # the agent's final text reply
        "generation_seconds": round(elapsed, 3),              # how long this specific call took
        "prompt_tokens":      prompt_tokens,                  # input tokens used by this call
        "completion_tokens":  completion_tokens,              # output tokens used by this call
        "total_tokens":       total_tokens,                   # combined tokens used by this call
        "input_cost":         input_cost,                     # USD cost of input tokens for this call
        "output_cost":        output_cost,                    # USD cost of output tokens for this call
        "total_cost":         round(input_cost + output_cost, 7),  # combined USD cost for this call
    })

    return outcome.final_output                            # hand the reply back to the Streamlit UI to display

Runner.run_sync drives the entire agent loop synchronously: sending the message, letting the model decide which tools to call, executing those tools, and looping until the model produces a final answer. context=ReaderProfile(reader_id=reader_id) is what makes ctx.context.reader_id available inside every tool function shown earlier.

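outcome.context_wrapper.usage exposes the real token counts the SDK tracked across the whole run, including any tool-calling round trips, so the agent-side figures in stats.json are measured rather than estimated. The OpenAI calls Mem0 makes internally for fact extraction and embeddings go through Mem0’s own client and are not included in these totals, so actual spend will be somewhat higher than stats.json reports.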
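The defensive `getattr(...) or 0` pattern is easiest to understand with stand-in objects. In this sketch, `SimpleNamespace` instances play the role of the SDK's usage object and `read_usage` is a hypothetical helper; as a small variation on the project code, it falls back to the sum of the two counts whenever the reported total is missing or zero.

```python
from types import SimpleNamespace


def read_usage(usage) -> tuple[int, int, int]:
    """Extract (prompt, completion, total) token counts, tolerating missing or None fields."""
    prompt = getattr(usage, "input_tokens", 0) or 0
    completion = getattr(usage, "output_tokens", 0) or 0
    total = getattr(usage, "total_tokens", None) or (prompt + completion)
    return prompt, completion, total


full = read_usage(SimpleNamespace(input_tokens=900, output_tokens=120, total_tokens=1020))
partial = read_usage(SimpleNamespace(input_tokens=900, output_tokens=None))  # no total reported
empty = read_usage(None)                                                     # no usage object at all
```

The trailing `or 0` matters because an attribute can exist and still be `None`, in which case the `getattr` default alone would not apply.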

The Streamlit UI

The UI is a standard Streamlit chat interface: a text input for the reader ID, a running message history, and a chat input box that calls ask_tracker on every submission.

Create a file named tracker_ui.py:


import streamlit as st                       # the chat UI
from dotenv import load_dotenv              # load .env so OPENAI_API_KEY is available

load_dotenv()                               # run before importing book_tracker, which reads the environment at import time

from book_tracker import ask_tracker        # the single function the UI needs

st.set_page_config(page_title="Book Tracker", page_icon="📚", layout="centered")  # browser tab title and icon

# override the hardcoded red user avatar color that Streamlit applies by default
st.markdown("""
<style>
[data-testid="stChatMessageAvatarUser"] {
    background-color: #2563EB !important;
}
</style>
""", unsafe_allow_html=True)                # inject raw CSS, unsafe_allow_html is required for <style> to take effect

st.title("📚 Personal Book Tracker")                                              # large heading at the top of the page
st.caption("Tell me what you've read, rate it, and I'll help you decide what to read next.")  # small grey subtitle text

# ── Reader ID ────────────────────────────────────────────────────────────────
reader_id = st.text_input(                              # text box the reader types their ID into
    "Your reader ID",                                    # label shown above the input box
    value="reader_01",                                   # default value pre-filled on first load
    help="Use the same ID every session to keep your reading history persistent.",  # tooltip shown on hover
)

# ── Chat history ─────────────────────────────────────────────────────────────
if "messages" not in st.session_state:                  # runs only once per browser session, on first load
    st.session_state.messages = []          # conversation history for display only; actual memory lives in Mem0

for msg in st.session_state.messages:                   # redraw every past message on each Streamlit rerun
    with st.chat_message(msg["role"]):                  # "user" or "assistant" bubble styling, chosen automatically
        st.markdown(msg["content"])                     # render the stored message text as markdown

# ── Input ─────────────────────────────────────────────────────────────────────
if prompt := st.chat_input("Tell me about a book or ask for a recommendation..."):  # returns None until the reader submits text, then holds that text for this rerun
    st.session_state.messages.append({"role": "user", "content": prompt})   # record the reader's message for display
    with st.chat_message("user"):                       # render the reader's own bubble immediately
        st.markdown(prompt)

    with st.chat_message("assistant"):                  # render the agent's reply in an assistant-styled bubble
        with st.spinner("Thinking..."):                 # shows a spinning indicator while the agent call runs
            reply = ask_tracker(prompt, reader_id)   # call the book tracker with the reader's id
        st.markdown(reply)                              # render the agent's final text reply as markdown
    st.session_state.messages.append({"role": "assistant", "content": reply})  # record the reply for display on rerun

st.set_page_config sets the browser tab title and icon before anything else renders. The st.markdown block right after it injects custom CSS that targets Streamlit’s built-in user avatar element and overrides its default red background with blue, since Streamlit hardcodes that color independently of the app theme. reader_id defaults to "reader_01" and must stay the same across sessions for a given person, since Mem0 scopes every fact to whatever ID is passed in.

st.session_state.messages only controls what is displayed on screen in the current browser tab; the actual long-term memory lives entirely in the local Qdrant database, which is why closing and reopening the app still remembers everything as long as the same reader ID is used.

The blue accent color and cream background come from a small Streamlit theme file placed alongside the app.

Create a file named .streamlit/config.toml:


[theme]
primaryColor = "#2563EB"              # buttons and interactive accents, replaces Streamlit's default red
backgroundColor = "#FEFCE8"           # main page background, warm cream instead of plain white
secondaryBackgroundColor = "#FFFFFF"  # input boxes, checkboxes, and the sidebar
textColor = "#1C1917"                 # default text color across the whole app

Streamlit reads this file automatically on startup. primaryColor controls buttons and interactive accents, backgroundColor sets the main page background, secondaryBackgroundColor controls input boxes, checkboxes, and the sidebar, and textColor sets the default text color. Together they replace Streamlit’s default red theme with blue and cream.

Running the Application


streamlit run tracker_ui.py

Open http://localhost:8501, type a reader ID, and start talking.

stats.json in the project root accumulates a record for every message sent, with prompt text, response text, token counts, cost, and generation time. The summary block at the top reflects lifetime totals across every reader and every session since the file was first created.
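Because each interaction record carries a reader_id, the lifetime totals can be broken down per reader after the fact. The following is a minimal sketch over inline sample data; `cost_by_reader` is a hypothetical helper and the figures are invented. Against the real file you would pass in the result of `json.loads` on the contents of stats.json.

```python
from collections import defaultdict


def cost_by_reader(stats: dict) -> dict:
    """Sum total_cost per reader_id across all recorded interactions."""
    totals = defaultdict(float)
    for item in stats.get("interactions", []):
        totals[item.get("reader_id", "unknown")] += item.get("total_cost", 0)
    return {reader: round(cost, 6) for reader, cost in totals.items()}


# Invented sample in the same shape as the "interactions" list in stats.json.
sample_stats = {
    "interactions": [
        {"reader_id": "reader_01", "total_cost": 0.00012},
        {"reader_id": "reader_02", "total_cost": 0.00030},
        {"reader_id": "reader_01", "total_cost": 0.00008},
    ]
}
breakdown = cost_by_reader(sample_stats)
print(breakdown)
```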

Who Can Benefit

Readers who want book recommendations grounded in what they have actually read rather than generic bestseller lists.
Developers exploring Mem0’s local memory layer before committing to any paid cloud memory service.
Students learning how persistent, per-user memory can be added to an OpenAI Agents SDK application with a handful of tool functions.
Book club organizers or librarians tracking reading history and preferences for multiple people through separate reader IDs.
Engineers evaluating Mem0 against other memory solutions who want real cost and latency numbers from a working local deployment.

How Codersarts Can Help

If you want to take this further, Codersarts offers hands-on support at every stage.

For learners: Live 1-to-1 sessions with an AI engineer who can walk through Mem0’s local architecture, vector search internals, and how the OpenAI Agents SDK wires context into tool calls.
For teams: End-to-end development of memory-backed conversational agents, including schema design for what to remember, prompt engineering for reliable tool use, and cost optimisation.
For enterprises: Architecture consulting for scaling local memory to a shared team deployment, adding authentication per reader, and migrating from local Qdrant to a managed vector database when needed.