Building an AI Book Recommender with Kimi K2 and Streamlit

Introduction

Finding the next great book is harder than it sounds. Generic bestseller lists ignore your taste, and search engines return the same ten titles for every query. What most readers need is a recommendation that actually understands them — their preferred themes, emotional tone, narrative pace, and the books they already love.

In this tutorial, we build an AI-powered Book Recommender using Kimi K2, Moonshot AI’s flagship agentic model. The user describes their reading taste in plain language and the agent returns a curated list of 6 books, each with a personalised reason, plus a suggested reading order.

What makes this more than a single API call is the 3-phase multi-turn workflow: the model first builds a reader profile, then selects books against that profile, and finally arranges them into a reading sequence. Each phase runs in the same conversation as the one before it, so every recommendation can be traced back to a trait identified in the profile.

What We’re Building

The agent follows a structured 3-phase workflow:

Phase	Stage	What Happens
1	Profile	Analyses the reader’s taste and identifies key reading traits
2	Recommendations	Selects 6 books matched precisely to the profile
3	Reading Order	Arranges the books into an optimal reading sequence

Each phase feeds into the next. The book selection in Phase 2 is directly grounded in the traits identified in Phase 1, and the reading order in Phase 3 considers the emotional arc and difficulty curve of the specific 6 books chosen — not a generic ordering.
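
The chaining can be illustrated without any API access. The snippet below is a minimal sketch, not the tutorial's agent: fake_model and run_phases are hypothetical stand-ins, and the stub simply reports how many messages it received so the growing context is visible.

```python
# Hypothetical stand-in for the real model: it reports how much context it received.
def fake_model(messages: list) -> str:
    return f"reply after seeing {len(messages)} messages"


def run_phases(prompts: list) -> list:
    """Send each prompt in turn, keeping every earlier turn in the history."""
    conversation = [{"role": "system", "content": "You are a literary curator."}]
    replies = []
    for prompt in prompts:
        conversation.append({"role": "user", "content": prompt})
        reply = fake_model(conversation)          # the stub sees the full history
        conversation.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies


replies = run_phases(["Build a profile", "Pick 6 books", "Order them"])
for reply in replies:
    print(reply)
```

Each call sees two more messages than the last (the previous reply plus the new prompt), which is the sense in which Phase 3 has the profile and the book list in view.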

The result is a Streamlit web app where users describe their taste, click a button, and receive a downloadable reading list.

Tech Stack

Component	Tool
AI Model	Kimi K2 (kimi-k2)
API Provider	Moonshot (official Kimi API)
API Client	openai Python SDK (OpenAI-compatible)
UI Framework	Streamlit
Env Management	python-dotenv

Why Kimi K2? Recent K2 releases support a context window of up to 256K tokens, and the model is designed for agentic workflows. Long reader descriptions, detailed book analysis, and all three phases fit comfortably within a single context.

Project Structure



kimi/
├── book_agent.py    # Agent logic — API calls, 3-phase workflow
├── app.py           # Streamlit UI
├── requirements.txt # Dependencies
└── .env             # API key and config (not committed to git)

Setting Up

1. Install Dependencies

pip install openai streamlit python-dotenv

2. Configure Environment

Create a .env file in the project folder:



KIMI_PROVIDER=moonshot
MOONSHOT_API_KEY=your_moonshot_api_key_here

Get your Moonshot API key at platform.moonshot.cn — sign up, navigate to API Keys, and generate a key.
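
A missing key otherwise only surfaces when the first API call fails, so it can help to check the environment at startup. The helper below is a hedged sketch: require_env is a hypothetical name, not part of the tutorial's code, and a plain dict stands in for the real environment so the example runs anywhere.

```python
import os


def require_env(name: str, env=os.environ) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set. Add it to your .env file.")
    return value


# Usage with a plain dict standing in for the real environment:
sample_env = {"KIMI_PROVIDER": "moonshot", "MOONSHOT_API_KEY": "sk-example"}
print(require_env("MOONSHOT_API_KEY", sample_env))
```

In the real app this would be called after load_dotenv(), with the default env argument.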

Building the Agent — book_agent.py

Client Setup

We load credentials from .env and create a single API client pointed at Moonshot’s OpenAI-compatible endpoint. The openai SDK works here unchanged because Moonshot’s API mirrors the OpenAI interface.



import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()                                        # must run before os.getenv — injects .env values into the process environment

API_ENDPOINT = os.getenv("KIMI_BASE_URL", "https://api.moonshot.cn/v1")  # Moonshot's OpenAI-compatible base URL
SECRET_KEY   = os.getenv("MOONSHOT_API_KEY", "")                          # API key from platform.moonshot.cn
LLM_MODEL    = os.getenv("KIMI_MODEL", "kimi-k2")                         # default to kimi-k2; override via KIMI_MODEL in .env

llm = OpenAI(api_key=SECRET_KEY, base_url=API_ENDPOINT)  # single client instance reused across all three phases

We also define the model's persona as a system prompt that persists for the entire conversation:



CURATOR_PERSONA = (
    "You are an expert literary curator with deep knowledge across all genres, "
    "cultures, and eras of literature. You give thoughtful, personalised book "   # 'personalised' signals British-English style — intentional for literary tone
    "recommendations based on the reader's specific tastes. Be specific — always "
    "name real books with real authors."                                           # prevents the model from inventing fictional titles
)

The Core Chat Function

ask_model handles every API call in the agent. It appends the user message, sends the full conversation history, and returns both the reply and any internal reasoning the model produced.



def ask_model(conversation: list, user_text: str) -> tuple[str, str]:
    conversation.append({"role": "user", "content": user_text})   # add new message before sending — the full history goes with every call

    response = llm.chat.completions.create(
        model=LLM_MODEL,
        messages=conversation,   # full conversation so far — this gives the model memory of all prior phases
    )

    reply     = response.choices[0].message.content or ""                              # the model's visible response — always present
    reasoning = getattr(response.choices[0].message, "reasoning_content", "") or ""   # Kimi-specific internal chain-of-thought; getattr avoids AttributeError on providers that don't return it

    assistant_entry = {"role": "assistant", "content": reply}
    if reasoning:
        assistant_entry["reasoning_content"] = reasoning   # attach reasoning to history so later phases can reference prior thinking
    conversation.append(assistant_entry)

    return reply, reasoning   # reply goes to the UI; reasoning is stored and shown in the "Model Reasoning" expander
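
The getattr fallback can be checked offline. The sketch below assumes nothing about the SDK beyond the attribute names used above: extract_reply and fake_response are hypothetical helpers, and the SimpleNamespace objects only imitate the shape of a chat completion.

```python
from types import SimpleNamespace


def extract_reply(response) -> tuple:
    """Pull the visible reply and optional reasoning out of a chat completion."""
    message = response.choices[0].message
    reply = message.content or ""
    reasoning = getattr(message, "reasoning_content", "") or ""
    return reply, reasoning


def fake_response(content, **extra):
    """Build an object shaped like a chat completion (a stand-in, not the SDK type)."""
    message = SimpleNamespace(content=content, **extra)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


with_reasoning = fake_response("Try Steppenwolf.", reasoning_content="Reader likes Hesse.")
without_reasoning = fake_response("Try Nausea.")

print(extract_reply(with_reasoning))
print(extract_reply(without_reasoning))
```

A response without the reasoning_content attribute yields an empty reasoning string instead of raising, which is the behaviour ask_model relies on.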

The 3-Phase Workflow



def curate_books(reader_taste: str, on_stage=None) -> dict:
    conversation = [{"role": "system", "content": CURATOR_PERSONA}]  # system message is always first — sets the literary curator persona for the entire session
    output = {"reader_taste": reader_taste, "stages": []}             # output dict accumulates all three phases — returned to the UI when complete

    # ## Phase 1: Reader Profile ###############################################
    # The model identifies 3-4 reading traits from the user's description.
    # These traits are stored in conversation so Phase 2 can ground its selections in them.
    if on_stage:
        on_stage("Building your reader profile...")   # updates the UI stage card — optional so the function works without a UI

    profile, reasoning1 = ask_model(
        conversation,
        f"A reader has described their taste as follows:\n\n\"{reader_taste}\"\n\n"  # embeds the raw user input verbatim so the model works from exact words, not a summary
        "Analyse their preferences. Identify 3-4 key reading traits "               # fixed number forces the model to be selective rather than listing everything
        "(e.g. preferred themes, pacing, narrative style, emotional tone) "          # examples guide the model toward the right abstraction level
        "and explain what each tells you about what they will enjoy."                # asking it to explain why ensures each trait is reasoned, not just labelled
    )
    output["stages"].append({"phase": "Profile", "content": profile, "reasoning": reasoning1})  # reasoning1 stored separately — shown in the UI expander, not mixed into main content

    # ## Phase 2: Recommendations ##############################################
    # The model selects 6 books. Because conversation contains Phase 1's trait analysis,
    # each selection is justified against a specific identified trait — not chosen generically.
    if on_stage:
        on_stage("Curating your reading list...")   # updates the UI stage card to show Phase 2 is active

    book_list, reasoning2 = ask_model(
        conversation,
        "Based on the reader profile you just built, recommend exactly 6 books. "  # "you just built" refers the model back to Phase 1 output already in conversation
        "For each book include: title, author, genre, and 2-3 sentences explaining "
        "precisely why it matches this reader's taste. "                            # "precisely why" prevents vague matches like "this is a good book"
        "Format each as:\n**[Title]** by [Author] *(Genre)*\n[Reason]"             # explicit format string so the UI can render markdown without post-processing
    )
    output["stages"].append({"phase": "Recommendations", "content": book_list, "reasoning": reasoning2})  # reasoning2 captured but not shown in the main UI — available for debugging

    # ## Phase 3: Reading Order ################################################
    # The model orders the specific 6 books it just chose — not a generic rule.
    # It considers emotional journey, difficulty curve, and thematic flow across those titles.
    if on_stage:
        on_stage("Suggesting a reading order...")   # updates the UI stage card to show Phase 3 is active

    reading_order, reasoning3 = ask_model(
        conversation,
        "Now suggest a reading order for the 6 books you recommended. "   # "you recommended" anchors the order to the exact 6 books in conversation — not a generic list
        "Number them 1-6 and give a one-sentence reason for each position — "
        "consider emotional journey, difficulty curve, and thematic flow."  # three explicit criteria prevent the model from defaulting to alphabetical or publication order
    )
    output["stages"].append({"phase": "Reading Order", "content": reading_order, "reasoning": reasoning3})  # final stage — after this the output dict is complete and returned to the UI

    return output   # all three phases are in output["stages"] — the UI iterates over this list to render each section
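
The explicit format requested in Phase 2 also makes the book list straightforward to parse if structured data is ever needed. This is a sketch under a strong assumption, namely that the model follows the requested format exactly; BOOK_PATTERN and parse_books are hypothetical names, and the sample text is invented for illustration.

```python
import re

# Matches the format requested in Phase 2:
#   **[Title]** by [Author] *(Genre)*
#   [Reason]
BOOK_PATTERN = re.compile(
    r"\*\*(?P<title>.+?)\*\*\s+by\s+(?P<author>.+?)\s+\*\((?P<genre>.+?)\)\*[ \t]*\n"
    r"(?P<reason>.+(?:\n(?!\*\*).+)*)"
)


def parse_books(text: str) -> list:
    """Extract one dict per book; entries that break the format are skipped."""
    return [
        {key: value.strip() for key, value in match.groupdict().items()}
        for match in BOOK_PATTERN.finditer(text)
    ]


sample = (
    "**Steppenwolf** by Hermann Hesse *(Philosophical fiction)*\n"
    "A divided narrator suits a taste for introspection.\n"
    "\n"
    "**Nausea** by Jean-Paul Sartre *(Existentialist novel)*\n"
    "Its diary form dwells on meaning and absurdity.\n"
)

for book in parse_books(sample):
    print(book["title"], "|", book["author"], "|", book["genre"])
```

Real model output can drift from the requested format, so anything built on this should treat a result with fewer than 6 entries as a signal to fall back to the raw markdown.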

Building the UI — app.py

The UI uses a 3-card pipeline display that transitions from neutral to active to complete as each phase finishes, giving the user clear visual feedback on progress.



import os                              # reads KIMI_PROVIDER from environment after load_dotenv() runs
import streamlit as st
from dotenv import load_dotenv        # loads .env so os.getenv can read the provider setting in the UI
from book_agent import curate_books   # imports the 3-phase agent function from the backend module

load_dotenv()                         # must run before any os.getenv call — injects .env values into the process environment

st.set_page_config(
    page_title="Book Recommender",
    page_icon="📚",
    layout="wide",                    # full browser width — better for multi-column results layout
    initial_sidebar_state="collapsed" # no sidebar content — collapsed to avoid the slide animation
)

PHASES = [
    ("🧑‍🎨", "Profile",         "Analyse reading taste"),   # Phase 1 — reader trait analysis
    ("📖", "Recommendations", "Curate 6 books"),             # Phase 2 — book selection
    ("🗂️", "Reading Order",   "Suggest reading sequence"),  # Phase 3 — ordering
]   # PHASES is defined at module level so render_pipeline can reference it without parameters

The pipeline renderer draws the three cards. Calling it inside the same placeholder on every progress callback lets the cards update in place instead of stacking a new row for each phase:



def render_pipeline(active_index: int, done_indices: list):
    cols = st.columns(3)                        # one column per phase — equal width
    for i, (icon, label, desc) in enumerate(PHASES):
        with cols[i]:
            if i in done_indices:
                st.success(f"{icon} **{label}**\n\n{desc}")    # green — phase complete
            elif i == active_index:
                st.info(f"{icon} **{label}**\n\n_{desc}_")     # blue — phase currently running
            else:
                st.container(border=True).markdown(            # neutral border — phase not yet started
                    f"{icon} **{label}**\n\n{desc}")
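
The on_stage callback that connects the agent to these cards is not shown in the article. One way to write it, as a sketch, is to map each stage message to the two arguments render_pipeline expects. STAGE_MESSAGES and pipeline_state are hypothetical names; the message strings are the ones curate_books passes to on_stage, and the Streamlit wiring is left in comments so the logic runs on its own.

```python
# Stage messages sent by curate_books, in the order the phases run.
STAGE_MESSAGES = [
    "Building your reader profile...",
    "Curating your reading list...",
    "Suggesting a reading order...",
]


def pipeline_state(message: str) -> tuple:
    """Translate a stage message into (active_index, done_indices)."""
    active = STAGE_MESSAGES.index(message)   # raises ValueError for an unknown message
    return active, list(range(active))       # every earlier phase counts as done


# In app.py the callback might look like this (assumes placeholder = st.empty()):
#
#     def on_stage(message):
#         active, done = pipeline_state(message)
#         with placeholder.container():
#             render_pipeline(active, done)

print(pipeline_state("Curating your reading list..."))
```

Matching on the message text keeps book_agent.py unchanged, at the cost of breaking if the wording there is edited; passing a phase index to the callback would be a sturdier alternative.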

The input is a multi-line text area rather than a single-line field, giving readers space to describe their taste in natural, detailed language:



reader_taste = st.text_area(
    "Describe your reading taste",
    placeholder=(
        "e.g. I love slow-burn literary fiction with morally complex characters. "
        "I enjoyed The Secret History, Normal People, and anything by Kazuo Ishiguro."
    ),
    height=120,                          # tall enough for a detailed description — more input = better recommendations
    label_visibility="collapsed",        # label hidden — the placeholder is descriptive enough
)

run = st.button(
    "Get Recommendations",
    type="primary",                      # renders as a filled primary button — visually prominent
    disabled=not reader_taste.strip(),   # greyed out until the user types something; .strip() blocks whitespace-only input
)

Results are stored in st.session_state so they survive Streamlit re-renders without re-calling the agent:



if run:                                   # call the agent only on a button click, not on every Streamlit rerun
    with st.spinner(""):                  # shows a loading indicator while the three API calls complete
        result = curate_books(
            reader_taste.strip(),         # .strip() removes leading/trailing whitespace before sending to the model
            on_stage=on_stage             # passes the progress callback so each phase updates the pipeline card
        )
    st.session_state["result"] = result   # store in session_state so results survive Streamlit re-renders (e.g. when download button is clicked)
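
The same idea can be written as a small helper that calls the agent only when the input has changed. This is a variant on the pattern above, not the article's code: get_result and fake_agent are hypothetical, and a plain dict stands in for st.session_state, which offers the same dict-style access.

```python
def get_result(state, reader_taste: str, compute):
    """Return the stored result, calling compute only when the taste text is new."""
    if state.get("taste") != reader_taste or "result" not in state:
        state["result"] = compute(reader_taste)
        state["taste"] = reader_taste
    return state["result"]


# A plain dict stands in for st.session_state; the list records how often the agent runs.
calls = []


def fake_agent(taste: str) -> dict:
    calls.append(taste)
    return {"reader_taste": taste, "stages": []}


state = {}
get_result(state, "slow-burn literary fiction", fake_agent)
get_result(state, "slow-burn literary fiction", fake_agent)   # served from state
print(len(calls))
```

The second call returns the stored result without touching the agent, which is what keeps a download click from triggering three new API calls.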

Finally, the full output is available for download:



st.download_button(
    label="Download as Markdown",
    data=md,                              # md is a string built by concatenating all three phase outputs with markdown headers
    file_name="book_recommendations.md",  # fixed filename — the content already includes the reader's taste description at the top
    mime="text/markdown",                 # tells the browser this is a .md file — triggers correct file association on download
)
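
The md string itself is not shown above. A minimal sketch of how it could be built from the dict returned by curate_books follows; build_markdown is a hypothetical name, the header wording is an assumption, and the sample data is invented for illustration.

```python
def build_markdown(result: dict) -> str:
    """Join the reader's taste and each phase's content under markdown headers."""
    parts = [
        "# Book Recommendations",
        f"**Your taste:** {result['reader_taste']}",
    ]
    for stage in result["stages"]:
        parts.append(f"## {stage['phase']}")
        parts.append(stage["content"].strip())
    return "\n\n".join(parts) + "\n"


sample = {
    "reader_taste": "philosophical, slow-paced fiction",
    "stages": [
        {"phase": "Profile", "content": "Drawn to introspection.", "reasoning": ""},
        {"phase": "Recommendations", "content": "**Nausea** by Jean-Paul Sartre", "reasoning": ""},
        {"phase": "Reading Order", "content": "1. Nausea", "reasoning": ""},
    ],
}
md = build_markdown(sample)
print(md)
```

The reasoning field is left out of the download on purpose here, so the file contains only what the reader saw in the main cards.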

Running the App


streamlit run app.py

Open your browser at http://localhost:8501. Describe your reading taste in the text area (the more specific the better) and click Get Recommendations.

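The agent works through its three phases, updating the pipeline cards as it goes, and displays the full reading list once the third call returns. The three API calls run one after another, so total time depends on the model's response latency.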

Output and What to Expect

For a reader who described enjoying slow-paced literary fiction with philosophical themes (similar to Dostoevsky and Hesse), the agent produces:

Profile: Traits such as “drawn to existential introspection”, “prefers dense, layered prose”, and “values moral ambiguity over resolution”
Recommendations: 6 books like Steppenwolf by Hermann Hesse, The Master and Margarita by Bulgakov, and Nausea by Sartre — each with a 2-3 sentence explanation tied directly to the identified traits
Reading Order: An ordered sequence that starts with the most accessible title and builds toward the most philosophically demanding, with a one-sentence rationale for each position

The model’s internal reasoning (visible in the “Model Reasoning” expander inside the Profile card) shows how it connects the reader’s stated preferences to the trait labels it generates, making the recommendation logic transparent.

Who Can Benefit

Avid readers who have exhausted recommendations from friends and want something genuinely personalised
Book clubs looking for a curated selection matched to the group’s collective taste
Librarians and educators who need to match readers to titles quickly and accurately
Developers learning to build multi-phase AI agents with persistent conversation context

How Codersarts Can Help

Building AI agents like this one requires solid understanding of multi-turn reasoning, API integration, and production-ready UI design. If you need help implementing a custom AI agent for your project (book recommendations, business automation, or something entirely different), Codersarts offers end-to-end development and mentorship support.