Building an AI Interview Prep Agent with Qwen 3.7 Max and Streamlit

Jun 10

Introduction

Job interviews are stressful, not because candidates lack skills, but because they lack structured preparation. Most people either over-prepare generic answers or walk in completely unprepared for role-specific questions.

In this tutorial, we build an AI-powered Interview Prep Agent using Qwen 3.7 Max, Alibaba’s flagship reasoning model. The agent takes a single job title as input and returns a full preparation package: categorized question types, 8 tailored practice questions, model answers for each, and focused preparation tips.

What makes this more than a simple API call is the use of multi-turn reasoning with thinking preservation, a key feature of Qwen 3.7 Max that allows the model’s reasoning from one step to directly influence the next, producing consistent and deeply tailored output.

What We’re Building

The agent follows a structured 4-turn workflow:

Turn	Stage	What Happens
1	Plan	Model identifies relevant question categories for the role
2	Questions	Generates 8 practice questions based on the plan
3	Answers	Writes model answers for each question
4	Tips	Distills 3 key preparation tips from the full context

Each turn builds on the previous one. Because Qwen 3.7 Max preserves its internal reasoning across turns, the questions in Turn 2 are directly informed by the categories it reasoned about in Turn 1, not generated in isolation.

The final product is a Streamlit web app where users type a job title, click a button, and receive a complete interview prep kit they can also download as a markdown file.

Tech Stack

Component	Tool
AI Model	Qwen 3.7 Max (qwen-max on DashScope, qwen/qwen3.7-max on OpenRouter)
API Gateway	DashScope (default) · OpenRouter (alternative)
API Client	openai Python SDK (OpenAI-compatible)
UI Framework	Streamlit
Env Management	python-dotenv

DashScope is Alibaba’s official Qwen API — the most direct way to access Qwen 3.7 Max.

OpenRouter is an alternative that works with the same code and is easier to sign up for. Switch between them by changing the QWEN_PROVIDER env variable and supplying the matching API key.

Project Structure



qwen/
├── interview_agent.py   # Agent logic — API calls, multi-turn workflow
├── app.py               # Streamlit UI
├── requirements.txt     # Dependencies
└── .env                 # API keys and config (not committed to git)

Setting Up

1. Install Dependencies

pip install openai streamlit httpx python-dotenv

2. Configure Environment

Create a .env file in the project folder:

Option A — DashScope (default, official Qwen API):

QWEN_PROVIDER=dashscope
DASHSCOPE_API_KEY=your_dashscope_api_key_here

Get your DashScope API key at dashscope.console.aliyun.com — sign up with an Alibaba Cloud account and generate a key under API Keys.

Option B — OpenRouter (alternative):

QWEN_PROVIDER=openrouter
OPENROUTER_API_KEY=your_openrouter_api_key_here
QWEN_MODEL=qwen/qwen3.7-max

Get your OpenRouter API key at openrouter.ai under Settings > Keys.

Building the Agent — interview_agent.py

Client Setup

We start by loading environment variables and creating a single API client. The provider flag decides whether requests go to DashScope or OpenRouter. Both use the same OpenAI-compatible interface, so no other code changes are needed.



import os                                        # used to read environment variables after load_dotenv() injects them from .env
from dotenv import load_dotenv                   # reads the .env file and sets each key as an environment variable — without this, os.getenv returns None
from openai import OpenAI                        # the OpenAI Python SDK — used here as a client for OpenRouter and DashScope, both of which expose an OpenAI-compatible API

load_dotenv()                                    # must be called before any os.getenv() — loads .env into the process environment

PROVIDER = os.getenv("QWEN_PROVIDER", "dashscope").lower()  # single flag that controls which API is used — defaults to dashscope; .lower() prevents case mismatch

if PROVIDER == "dashscope":
    # DashScope is Alibaba's official Qwen API — requires a DashScope account and API key
    BASE_URL = os.getenv("QWEN_BASE_URL", "https://dashscope.aliyuncs.com/compatible-mode/v1")  # /compatible-mode/v1 is the OpenAI-compatible endpoint on DashScope
    API_KEY  = os.getenv("DASHSCOPE_API_KEY", "")   # DashScope key — starts with "sk-" like OpenAI keys
    MODEL    = os.getenv("QWEN_MODEL", "qwen-max")  # qwen-max is DashScope's production Qwen3 flagship model
else:
    # OpenRouter routes requests to the correct model — one API key works for hundreds of models
    BASE_URL = os.getenv("QWEN_BASE_URL", "https://openrouter.ai/api/v1")      # OpenRouter's OpenAI-compatible base URL
    API_KEY  = os.getenv("OPENROUTER_API_KEY", "")                              # OpenRouter key — get one at openrouter.ai/settings/keys
    MODEL    = os.getenv("QWEN_MODEL", "qwen/qwen3.7-max")                      # OpenRouter model slug — provider/model-name format

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)  # one client instance shared across all four agent turns — base_url points it away from OpenAI's servers

The agent supports two providers, DashScope (default, Alibaba’s official Qwen API) and OpenRouter (alternative), controlled by a single QWEN_PROVIDER flag. Since both expose an OpenAI-compatible API, the same openai SDK client works for both with no code changes.
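
Because the listing falls back to an empty string when a key is missing, a misconfigured .env only surfaces as an authentication error on the first API call. A minimal sketch of a fail-fast check follows. The names check_config and KEY_VARS are hypothetical and not part of the original module, and unlike the listing above (which treats any non-dashscope value as OpenRouter) this sketch rejects unknown provider names.

```python
import os

# Hypothetical lookup: which environment variable holds the key for each provider.
# The variable names mirror the .env examples earlier in the article.
KEY_VARS = {"dashscope": "DASHSCOPE_API_KEY", "openrouter": "OPENROUTER_API_KEY"}


def check_config(env=os.environ) -> str:
    """Return the active provider name, or raise if its API key is missing."""
    provider = env.get("QWEN_PROVIDER", "dashscope").lower()
    if provider not in KEY_VARS:
        raise ValueError(f"Unknown QWEN_PROVIDER: {provider!r}")
    key_var = KEY_VARS[provider]
    if not env.get(key_var, "").strip():
        raise RuntimeError(f"{key_var} is not set; add it to .env")
    return provider
```

One option is to call check_config() right after load_dotenv(), so the app stops with a readable message before any request is made.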

The Core Chat Function

_chat is the single function that handles every API call in the agent. It appends the user message, sends the full conversation history, and returns both the visible response and the model’s internal reasoning.



def _chat(history: list, user_message: str) -> tuple[str, str]:
    # history is mutated in place — the caller's list grows with each turn, building the full conversation context
    history.append({"role": "user", "content": user_message})   # add the new user message before sending — the full history is sent every call

    response = client.chat.completions.create(
        model=MODEL,
        messages=history,        # the entire conversation so far — this is what gives the model memory across turns
        extra_body={
            "enable_thinking": True,    # activates Qwen's internal chain-of-thought — the model reasons before responding, returned in reasoning_content
            "preserve_thinking": True,  # tells the model to use prior reasoning_content in history as context — this is what makes Turn 2 aware of Turn 1's plan
        },
    )

    msg     = response.choices[0].message                          # choices[0] is the first (and only) completion — we do not request multiple completions
    thinking = getattr(msg, "reasoning_content", "") or ""         # reasoning_content is Qwen-specific — not present on standard OpenAI responses; getattr avoids AttributeError
    content  = msg.content or ""                                   # the visible reply text — always present; fallback to "" if somehow None

    assistant_entry = {"role": "assistant", "content": content}
    if thinking:
        assistant_entry["reasoning_content"] = thinking            # only attach reasoning_content if there is one — keeps history clean on providers that don't return it
    history.append(assistant_entry)                                # append the full assistant message (with reasoning) so the next turn can reference it

    return content, thinking   # content goes to the UI; thinking is stored separately and shown in the "Model Reasoning" expander

Two things to note here:

enable_thinking: True — activates Qwen 3.7 Max’s internal chain-of-thought reasoning, which is returned separately in reasoning_content.
preserve_thinking: True — when the assistant’s message (including reasoning_content) is added back to history, the model can reference its own prior reasoning in subsequent turns. This is what gives the agent its coherence across steps.
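
To make the history mechanics concrete, here is a small offline sketch that reproduces only the bookkeeping part of _chat, with no API call. The helper record_turn and the sample strings are hypothetical; the point is to show what the list looks like after two turns, one with reasoning returned and one without.

```python
def record_turn(history: list, user_message: str, content: str, thinking: str) -> None:
    """Append one user/assistant exchange the same way _chat does."""
    history.append({"role": "user", "content": user_message})
    entry = {"role": "assistant", "content": content}
    if thinking:
        # Attach reasoning only when the provider returned some.
        entry["reasoning_content"] = thinking
    history.append(entry)


history = [{"role": "system", "content": "You are an interview coach."}]
record_turn(history, "Plan categories.", "1. System design ...", "The role is senior, so ...")
record_turn(history, "Generate questions.", "Q1 ...", "")  # provider returned no reasoning

roles = [m["role"] for m in history]
print(roles)  # ['system', 'user', 'assistant', 'user', 'assistant']
```

The first assistant entry carries a reasoning_content key and the second does not, which is the shape the next request receives in messages.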

The 4-Turn Workflow

run_interview_prep orchestrates the four sequential calls. Each turn’s output (including the model’s reasoning) is added to history before the next call, so every stage has full awareness of what came before.

def run_interview_prep(job_title: str, progress_callback=None) -> dict:
    history = [{"role": "system", "content": SYSTEM_PROMPT}]  # system message is always first — sets the model's persona for the entire conversation
    result  = {"job_title": job_title, "turns": []}            # result dict accumulates all four stages — returned to the UI when complete

    # ── Turn 1: Plan ──────────────────────────────────────────────────────────
    # The model decides which question categories are relevant for this specific role.
    # Its reasoning here (thinking1) is preserved in history so Turn 2 can reference it.
    if progress_callback:
        progress_callback("Planning question categories...")   # updates the UI progress indicator — optional so the function works without a UI too
    plan, thinking1 = _chat(
        history,
        f"I need to prepare for a {job_title} interview. "
        "Plan 4-5 categories of interview questions with a brief reason for each."
    )
    result["turns"].append({"stage": "Plan", "content": plan, "thinking": thinking1})

    # ── Turn 2: Generate Questions ────────────────────────────────────────────
    # The model generates 8 questions. Because history contains Turn 1's reasoning,
    # the questions are drawn from the exact categories the model identified — not generic ones.
    if progress_callback:
        progress_callback("Generating interview questions...")
    questions, thinking2 = _chat(
        history,
        f"Generate 8 interview questions for the {job_title} role, "
        "drawing a mix from each category. Number them 1–8."
    )
    result["turns"].append({"stage": "Questions", "content": questions, "thinking": thinking2})

    # ── Turn 3: Model Answers ─────────────────────────────────────────────────
    # The model writes answers for the exact 8 questions it just generated.
    # No question list is re-sent — the model reads them from its own prior message in history.
    if progress_callback:
        progress_callback("Writing model answers...")
    answers, thinking3 = _chat(
        history,
        "Provide a concise model answer for each question. "
        "Format as **Q[n]: [question]** then the answer."
    )
    result["turns"].append({"stage": "Answers", "content": answers, "thinking": thinking3})

    # ── Turn 4: Tips ──────────────────────────────────────────────────────────
    # The model distills 3 tips from the full conversation context — plan, questions, and answers.
    # These tips are specific to this role, not generic interview advice.
    if progress_callback:
        progress_callback("Adding preparation tips...")
    tips, thinking4 = _chat(
        history,
        f"Give 3 specific preparation tips for a {job_title} interview "
        "based on everything above."
    )
    result["turns"].append({"stage": "Tips", "content": tips, "thinking": thinking4})

    return result   # all four stages are in result["turns"] — the UI iterates over this list to render each section

The same history list is passed through all four turns. Each assistant response, including its reasoning, is appended before the next call, so the model always has the full context of what it has already planned and decided.
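
The listing references SYSTEM_PROMPT without showing its definition. The sketch below is an assumption about what it might contain, not the article's actual prompt; new_history is likewise a hypothetical helper that illustrates why each run should start from a fresh list.

```python
# Hypothetical system prompt: the original text of SYSTEM_PROMPT is not shown in the article.
SYSTEM_PROMPT = (
    "You are an experienced interview coach. "
    "Give role-specific, practical guidance and format responses in Markdown."
)


def new_history() -> list:
    """Start a fresh conversation so separate runs never share context."""
    return [{"role": "system", "content": SYSTEM_PROMPT}]
```

Creating the list inside the function, as run_interview_prep does, means a second job title never inherits the plan or reasoning from the first.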

Building the UI — app.py

The UI is kept intentionally minimal: a single text input, a button, and a results display. Streamlit re-renders the page on every interaction, so results are stored in session_state to avoid re-running the agent unnecessarily.



import os
import streamlit as st
from dotenv import load_dotenv
from interview_agent import run_interview_prep   # imports the 4-turn agent function from our backend module

load_dotenv()                                    # loads .env so os.getenv can read QWEN_PROVIDER in the UI

st.set_page_config(page_title="Interview Prep Agent", page_icon="🎯", layout="wide",
                   initial_sidebar_state="collapsed")   # sidebar collapsed — nothing in it, no slide animation
st.markdown("## Interview Prep Agent")
provider = os.getenv("QWEN_PROVIDER", "dashscope").capitalize()   # reads the active provider from .env — caption updates automatically when provider changes
st.caption(f"Powered by **Qwen 3.7 Max** via {provider}")

col_input, col_btn = st.columns([4, 1], vertical_alignment="bottom")  # input takes 4x the width of the button

with col_input:
    job_title = st.text_input("Job Title",
        placeholder="e.g. Senior Data Scientist, Frontend Engineer, Product Manager",
        label_visibility="collapsed")   # label hidden — the placeholder text is descriptive enough

with col_btn:
    run = st.button("Generate", type="primary", use_container_width=True,
                    disabled=not job_title.strip())   # greyed out until the user types something; .strip() blocks whitespace-only input
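
The excerpt stops before the session_state handling mentioned above. A minimal sketch of that caching pattern, written against any dict-like object so it can be tried without Streamlit, might look like this. The function get_or_run and the "result" key are hypothetical names.

```python
from typing import Callable, MutableMapping, Optional


def get_or_run(state: MutableMapping, job_title: str, clicked: bool,
               runner: Callable[[str], dict]) -> Optional[dict]:
    """Run the agent only on a button click; otherwise reuse the stored result."""
    if clicked:
        # A click is the only event that triggers the four API calls.
        state["result"] = runner(job_title.strip())
    # On every other rerun (for example, opening an expander) return the cached value.
    return state.get("result")
```

In app.py this could be called as get_or_run(st.session_state, job_title, run, run_interview_prep), assuming st.session_state is used as the dict-like store.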

Results are stored in st.session_state so they persist across UI interactions without re-running the agent. Each stage is rendered separately. Plan and Questions are collapsed by default, while Answers and Tips are shown immediately. The model’s reasoning is hidden inside a nested expander so it does not clutter the main view.



if stage == "Plan":
    with st.expander("📋 Step 1 — Question Plan", expanded=False):  # collapsed by default — keeps the page clean; user opens it if they want to inspect the plan
        st.markdown(content)
        if thinking:
            with st.expander("Model Reasoning", expanded=False):  # nested expander — only shown when the model returned reasoning_content; hidden otherwise
                st.caption(thinking)   # st.caption renders in smaller gray text — visually distinguishes reasoning from the main response
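
The snippet uses stage, content, and thinking without showing where they come from. One way to derive them from result["turns"], along with the collapsed-or-open default described above, is sketched here. EXPANDED_BY_DEFAULT and split_turn are hypothetical names, and the sketch deliberately leaves the Streamlit calls out.

```python
# Default expander state per stage, following the description above:
# Plan and Questions start collapsed, Answers and Tips start open.
EXPANDED_BY_DEFAULT = {"Plan": False, "Questions": False, "Answers": True, "Tips": True}


def split_turn(turn: dict) -> tuple:
    """Unpack one entry of result["turns"] into (stage, content, thinking, expanded)."""
    stage = turn["stage"]
    expanded = EXPANDED_BY_DEFAULT.get(stage, True)  # unknown stages default to open
    return stage, turn["content"], turn.get("thinking", ""), expanded
```

A loop such as for turn in result["turns"] can then unpack each entry and pass expanded to st.expander, instead of repeating one if branch per stage.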

Once all stages are displayed, a download button lets users save the full prep kit as a markdown file named after the job title.



st.download_button(
    label="Download as Markdown",
    data=md,                                        # md is a string built by concatenating all four stage outputs with headers
    file_name=f"interview_prep_{job_title}.md",     # filename includes the job title so downloaded files are self-identifying
    mime="text/markdown",                           # tells the browser this is a .md file — triggers correct file association on download
)
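
The md string is described but not shown being built. A minimal sketch under stated assumptions follows: build_markdown and safe_filename are hypothetical helpers, and the heading layout is illustrative. The second helper addresses a practical wrinkle, since a job title such as "QA/Test Engineer" would otherwise put a slash in the filename.

```python
import re


def build_markdown(result: dict) -> str:
    """Join the stage outputs into one Markdown document with a header per stage."""
    parts = [f"# Interview Prep: {result['job_title']}"]
    for turn in result["turns"]:
        parts.append(f"## {turn['stage']}\n\n{turn['content']}")
    return "\n\n".join(parts) + "\n"


def safe_filename(job_title: str) -> str:
    """Collapse spaces, slashes, and other non-alphanumerics into underscores."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", job_title).strip("_").lower()
    return f"interview_prep_{slug or 'role'}.md"
```

With these in place, the download button could receive data=build_markdown(result) and file_name=safe_filename(job_title).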

Running the App



streamlit run app.py

Open your browser at http://localhost:8501. Type a job title (for example, “Senior Data Scientist”) and click Generate.

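The agent will work through its four stages, showing progress as it goes, and display the full prep kit once the fourth call returns. Because the four calls run sequentially with thinking enabled, expect the run to take longer than a single request.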

Output and What to Expect

For a job title like “Product Manager”, the agent produces:

Plan: 5 categories (Product Strategy, Stakeholder Management, Execution & Delivery, Behavioral, and Metrics & Analytics), each with a reason
Questions: 8 targeted questions like “Describe a time you had to kill a feature you personally championed” or “How do you prioritize a backlog when engineering capacity is constrained?”
Answers: Structured STAR-format responses for each question, specific to the PM role
Tips: Focused advice such as leading with data when discussing decisions, and preparing a product critique for a live product

The model’s internal reasoning — visible by expanding the “Model Reasoning” panel — shows how it connects the categories from Turn 1 to the questions in Turn 2, demonstrating why thinking preservation produces more coherent output than independent API calls.

Who Can Benefit

Job seekers preparing for technical or managerial interviews across any industry
Career coaches who want to generate role-specific question sets quickly
Students transitioning from academia to industry and unsure what to expect
Developers looking to learn how to build multi-turn AI agents with thinking preservation

How Codersarts Can Help

Building AI agents like this one requires solid understanding of multi-turn reasoning, API integration, and production-ready UI design. If you need help implementing a custom AI agent for your project (interview prep, business automation, or something entirely different), Codersarts offers end-to-end development and mentorship support.