Building an AI Interview Prep Agent with Qwen 3.7 Max and Streamlit
Introduction
Job interviews are stressful, not because candidates lack skills, but because they lack structured preparation. Most people either over-prepare generic answers or walk in completely unprepared for role-specific questions.
In this tutorial, we build an AI-powered Interview Prep Agent using Qwen 3.7 Max, Alibaba’s flagship reasoning model. The agent takes a single job title as input and returns a full preparation package: categorized question types, 8 tailored practice questions, model answers for each, and focused preparation tips.
What makes this more than a simple API call is the use of multi-turn reasoning with thinking preservation, a key feature of Qwen 3.7 Max that allows the model’s reasoning from one step to directly influence the next, producing consistent and deeply tailored output.

What We’re Building
The agent follows a structured 4-turn workflow:
Turn | Stage | What Happens |
1 | Plan | Model identifies relevant question categories for the role |
2 | Questions | Generates 8 practice questions based on the plan |
3 | Answers | Writes model answers for each question |
4 | Tips | Distills 3 key preparation tips from the full context |
Each turn builds on the previous one. Because Qwen 3.7 Max preserves its internal reasoning across turns, the questions in Turn 2 are directly informed by the categories it reasoned about in Turn 1, not generated in isolation.
The final product is a Streamlit web app where users type a job title, click a button, and receive a complete interview prep kit they can also download as a markdown file.
Tech Stack
Component | Tool |
AI Model | Qwen 3.7 Max (qwen/qwen3.7-max) |
API Gateway | DashScope (default) · OpenRouter (alternative) |
API Client | openai Python SDK (OpenAI-compatible) |
UI Framework | Streamlit |
Env Management | python-dotenv |
DashScope is Alibaba’s official Qwen API — the most direct way to access Qwen 3.7 Max.
OpenRouter is an alternative that works with the same code and is easier to sign up for. Switch between them with a single env variable.
Project Structure
qwen/
├── interview_agent.py # Agent logic — API calls, multi-turn workflow
├── app.py # Streamlit UI
├── requirements.txt # Dependencies
└── .env # API keys and config (not committed to git)
Setting Up
1. Install Dependencies
pip install openai streamlit httpx python-dotenv
2. Configure Environment
Create a .env file in the project folder:
Option A — DashScope (default, official Qwen API):
QWEN_PROVIDER=dashscope
DASHSCOPE_API_KEY=your_dashscope_api_key_here
Get your DashScope API key at dashscope.console.aliyun.com — sign up with an Alibaba Cloud account and generate a key under API Keys.
Option B — OpenRouter (alternative):
QWEN_PROVIDER=openrouter
OPENROUTER_API_KEY=your_openrouter_api_key_here
QWEN_MODEL=qwen/qwen3.7-max
Get your OpenRouter API key at openrouter.ai under Settings > Keys.
Building the Agent — interview_agent.py
Client Setup
We start by loading environment variables and creating a single API client. The provider flag decides whether requests go to DashScope or OpenRouter. Both use the same OpenAI-compatible interface, so no other code changes are needed.
import os # used to read environment variables after load_dotenv() injects them from .env
from dotenv import load_dotenv # reads the .env file and sets each key as an environment variable — without this, os.getenv returns None
from openai import OpenAI # the OpenAI Python SDK — used here as a client for OpenRouter and DashScope, both of which expose an OpenAI-compatible API
load_dotenv() # must be called before any os.getenv() — loads .env into the process environment
PROVIDER = os.getenv("QWEN_PROVIDER", "dashscope").lower() # single flag that controls which API is used — defaults to dashscope; .lower() prevents case mismatch
if PROVIDER == "dashscope":
    # DashScope is Alibaba's official Qwen API; requires a DashScope account and API key
    BASE_URL = os.getenv("QWEN_BASE_URL", "https://dashscope.aliyuncs.com/compatible-mode/v1")  # /compatible-mode/v1 is the OpenAI-compatible endpoint on DashScope
    API_KEY = os.getenv("DASHSCOPE_API_KEY", "")  # DashScope key; starts with "sk-" like OpenAI keys
    MODEL = os.getenv("QWEN_MODEL", "qwen-max")  # default DashScope model ID; set QWEN_MODEL if your account lists a different ID for Qwen 3.7 Max
else:
    # OpenRouter routes requests to the correct model; one API key works for hundreds of models
    BASE_URL = os.getenv("QWEN_BASE_URL", "https://openrouter.ai/api/v1")  # OpenRouter's OpenAI-compatible base URL
    API_KEY = os.getenv("OPENROUTER_API_KEY", "")  # OpenRouter key; get one at openrouter.ai/settings/keys
    MODEL = os.getenv("QWEN_MODEL", "qwen/qwen3.7-max")  # OpenRouter model slug in provider/model-name format
client = OpenAI(api_key=API_KEY, base_url=BASE_URL) # one client instance shared across all four agent turns — base_url points it away from OpenAI's servers
The agent supports two providers, DashScope (default, Alibaba’s official Qwen API) and OpenRouter (alternative), controlled by a single QWEN_PROVIDER flag. Since both expose an OpenAI-compatible API, the same openai SDK client works for both with no code changes.
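The selection logic above can also be written as a small pure function, which makes it easy to check without touching the real environment. This is a minimal sketch, not part of the original project: `resolve_config` and `PROVIDER_DEFAULTS` are hypothetical names, and the early `ValueError` on a missing key is an added assumption. The URLs, variable names, and default model IDs are copied from the client setup.

```python
from typing import Mapping

# Defaults copied from the client setup above; the dict layout itself is illustrative.
PROVIDER_DEFAULTS = {
    "dashscope": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "key_var": "DASHSCOPE_API_KEY",
        "model": "qwen-max",
    },
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "key_var": "OPENROUTER_API_KEY",
        "model": "qwen/qwen3.7-max",
    },
}


def resolve_config(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: pick base URL, API key, and model from an env mapping."""
    provider = env.get("QWEN_PROVIDER", "dashscope").lower()
    # Mirror the original if/else: anything other than "dashscope" falls through to OpenRouter.
    defaults = PROVIDER_DEFAULTS["dashscope" if provider == "dashscope" else "openrouter"]
    api_key = env.get(defaults["key_var"], "")
    if not api_key:
        # Fail early with a readable message instead of an auth error on the first request.
        raise ValueError(f"{defaults['key_var']} is not set for provider '{provider}'")
    return {
        "provider": provider,
        "base_url": env.get("QWEN_BASE_URL", defaults["base_url"]),
        "api_key": api_key,
        "model": env.get("QWEN_MODEL", defaults["model"]),
    }


config = resolve_config({"QWEN_PROVIDER": "OpenRouter", "OPENROUTER_API_KEY": "sk-test"})
print(config["base_url"], config["model"])
```

In interview_agent.py, the same function could be called with `os.environ` after `load_dotenv()`, and its result passed to `OpenAI(api_key=..., base_url=...)`.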
The Core Chat Function
_chat is the single function that handles every API call in the agent. It appends the user message, sends the full conversation history, and returns both the visible response and the model’s internal reasoning.
def _chat(history: list, user_message: str) -> tuple[str, str]:
    # history is mutated in place; the caller's list grows with each turn, building the full conversation context
    history.append({"role": "user", "content": user_message})  # add the new user message before sending; the full history is sent every call
    response = client.chat.completions.create(
        model=MODEL,
        messages=history,  # the entire conversation so far; this is what gives the model memory across turns
        extra_body={
            "enable_thinking": True,  # activates Qwen's internal chain-of-thought; the model reasons before responding, returned in reasoning_content
            "preserve_thinking": True,  # tells the model to use prior reasoning_content in history as context; this is what makes Turn 2 aware of Turn 1's plan
        },
    )
    msg = response.choices[0].message  # choices[0] is the first (and only) completion; we do not request multiple completions
    thinking = getattr(msg, "reasoning_content", "") or ""  # reasoning_content is Qwen-specific and not present on standard OpenAI responses; getattr avoids AttributeError
    content = msg.content or ""  # the visible reply text; normally present, with a fallback to "" if it is None
    assistant_entry = {"role": "assistant", "content": content}
    if thinking:
        assistant_entry["reasoning_content"] = thinking  # only attach reasoning_content if there is one; keeps history clean on providers that don't return it
    history.append(assistant_entry)  # append the full assistant message (with reasoning) so the next turn can reference it
    return content, thinking  # content goes to the UI; thinking is stored separately and shown in the "Model Reasoning" expander
Two things to note here:
enable_thinking: True — activates Qwen 3.7 Max’s internal chain-of-thought reasoning, which is returned separately in reasoning_content.
preserve_thinking: True — when the assistant’s message (including reasoning_content) is added back to history, the model can reference its own prior reasoning in subsequent turns. This is what gives the agent its coherence across steps.
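To see what the history list looks like after a couple of turns, the same steps can be run against a stub instead of the real API. The sketch below is illustrative only: `FakeClient` and the standalone `chat` function are hypothetical stand-ins that mimic the response shape `_chat` reads (`choices[0].message.content` and `reasoning_content`), and the canned replies are made up. No network call is made.

```python
from types import SimpleNamespace


class FakeClient:
    """Stand-in for the OpenAI client; returns canned replies so nothing is sent over the network."""

    def __init__(self):
        self.calls = []  # snapshot of the messages list sent on each call
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages, extra_body=None):
        self.calls.append([dict(m) for m in messages])
        n = len(self.calls)
        msg = SimpleNamespace(content=f"reply {n}", reasoning_content=f"reasoning {n}")
        return SimpleNamespace(choices=[SimpleNamespace(message=msg)])


def chat(client, history, user_message, model="stub-model"):
    """Same steps as _chat above, with the client passed in so it can be stubbed."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model=model,
        messages=history,
        extra_body={"enable_thinking": True, "preserve_thinking": True},
    )
    msg = response.choices[0].message
    thinking = getattr(msg, "reasoning_content", "") or ""
    content = msg.content or ""
    entry = {"role": "assistant", "content": content}
    if thinking:
        entry["reasoning_content"] = thinking
    history.append(entry)
    return content, thinking


fake = FakeClient()
history = [{"role": "system", "content": "You are an interview coach."}]
chat(fake, history, "Plan categories")
chat(fake, history, "Generate questions")

# The second request carried the first turn's reply together with its reasoning.
second_request = fake.calls[1]
print([m["role"] for m in second_request])
print(second_request[2].get("reasoning_content"))
```

The point of the stub is only to show the data flow: the assistant entry from the first turn, including its reasoning_content field, is part of the messages sent on the second turn.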
The 4-Turn Workflow
run_interview_prep orchestrates the four sequential calls. Each turn’s output (including the model’s reasoning) is added to history before the next call, so every stage has full awareness of what came before.
def run_interview_prep(job_title: str, progress_callback=None) -> dict:
    history = [{"role": "system", "content": SYSTEM_PROMPT}]  # system message is always first; sets the model's persona for the entire conversation
    result = {"job_title": job_title, "turns": []}  # result dict accumulates all four stages; returned to the UI when complete

    # ── Turn 1: Plan ──────────────────────────────────────────────────────────
    # The model decides which question categories are relevant for this specific role.
    # Its reasoning here (thinking1) is preserved in history so Turn 2 can reference it.
    if progress_callback:
        progress_callback("Planning question categories...")  # updates the UI progress indicator; optional so the function works without a UI too
    plan, thinking1 = _chat(
        history,
        f"I need to prepare for a {job_title} interview. "
        "Plan 4-5 categories of interview questions with a brief reason for each."
    )
    result["turns"].append({"stage": "Plan", "content": plan, "thinking": thinking1})

    # ── Turn 2: Generate Questions ────────────────────────────────────────────
    # The model generates 8 questions. Because history contains Turn 1's reasoning,
    # the questions are drawn from the exact categories the model identified, not generic ones.
    if progress_callback:
        progress_callback("Generating interview questions...")
    questions, thinking2 = _chat(
        history,
        f"Generate 8 interview questions for the {job_title} role, "
        "one mix from each category. Number them 1–8."
    )
    result["turns"].append({"stage": "Questions", "content": questions, "thinking": thinking2})

    # ── Turn 3: Model Answers ─────────────────────────────────────────────────
    # The model writes answers for the exact 8 questions it just generated.
    # No question list is re-sent; the model reads them from its own prior message in history.
    if progress_callback:
        progress_callback("Writing model answers...")
    answers, thinking3 = _chat(
        history,
        "Provide a concise model answer for each question. "
        "Format as **Q[n]: [question]** then the answer."
    )
    result["turns"].append({"stage": "Answers", "content": answers, "thinking": thinking3})

    # ── Turn 4: Tips ──────────────────────────────────────────────────────────
    # The model distills 3 tips from the full conversation context: plan, questions, and answers.
    # These tips are specific to this role, not generic interview advice.
    if progress_callback:
        progress_callback("Adding preparation tips...")
    tips, thinking4 = _chat(
        history,
        f"Give 3 specific preparation tips for a {job_title} interview "
        "based on everything above."
    )
    result["turns"].append({"stage": "Tips", "content": tips, "thinking": thinking4})

    return result  # all four stages are in result["turns"]; the UI iterates over this list to render each section
The same history list is passed through all four turns. Each assistant response, including its reasoning, is appended before the next call, so the model always has the full context of what it has already planned and decided.
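The function references SYSTEM_PROMPT, which this walkthrough does not show. The wording below is a hypothetical placeholder, not the original prompt, so treat it as something to adapt. The sketch also shows that progress_callback can be any callable that accepts a single string; here a list's append method stands in for a UI update.

```python
# Hypothetical wording; the project's actual SYSTEM_PROMPT is not shown in this article.
SYSTEM_PROMPT = (
    "You are an experienced interview coach. "
    "Give role-specific, practical guidance and keep answers concise."
)

# The conversation starts with only the system message, as in run_interview_prep.
history = [{"role": "system", "content": SYSTEM_PROMPT}]

# Any one-argument callable works as progress_callback; a list's append is the simplest case.
progress_log = []
progress_callback = progress_log.append
progress_callback("Planning question categories...")
print(progress_log)
```

Defining SYSTEM_PROMPT at module level in interview_agent.py, above run_interview_prep, would be enough for the function to run as written.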
Building the UI — app.py
The UI is kept intentionally minimal: a single text input, a button, and a results display. Streamlit re-renders the page on every interaction, so results are stored in session_state to avoid re-running the agent unnecessarily.
import os
import streamlit as st
from dotenv import load_dotenv
from interview_agent import run_interview_prep # imports the 4-turn agent function from our backend module
load_dotenv() # loads .env so os.getenv can read QWEN_PROVIDER in the UI
st.set_page_config(page_title="Interview Prep Agent", page_icon="🎯", layout="wide",
                   initial_sidebar_state="collapsed")  # sidebar collapsed; nothing in it, no slide animation
st.markdown("## Interview Prep Agent")
provider = os.getenv("QWEN_PROVIDER", "dashscope").capitalize()  # reads the active provider from .env; caption updates automatically when provider changes
st.caption(f"Powered by **Qwen 3.7 Max** via {provider}")
col_input, col_btn = st.columns([4, 1], vertical_alignment="bottom")  # input takes 4x the width of the button
with col_input:
    job_title = st.text_input("Job Title",
                              placeholder="e.g. Senior Data Scientist, Frontend Engineer, Product Manager",
                              label_visibility="collapsed")  # label hidden; the placeholder text is descriptive enough
with col_btn:
    run = st.button("Generate", type="primary", use_container_width=True,
                    disabled=not job_title.strip())  # greyed out until the user types something; .strip() blocks whitespace-only input
Results are stored in st.session_state so they persist across UI interactions without re-running the agent. Each stage is rendered separately. Plan and Questions are collapsed by default, while Answers and Tips are shown immediately. The model’s reasoning is hidden inside a nested expander so it does not clutter the main view.
# result is assumed to hold the dict returned by run_interview_prep
for turn in result["turns"]:
    stage, content, thinking = turn["stage"], turn["content"], turn["thinking"]
    if stage == "Plan":
        with st.expander("📋 Step 1 — Question Plan", expanded=False):  # collapsed by default to keep the page clean; the user opens it to inspect the plan
            st.markdown(content)
            if thinking:
                with st.expander("Model Reasoning", expanded=False):  # nested expander; only shown when the model returned reasoning_content
                    st.caption(thinking)  # st.caption renders in smaller gray text, visually separating reasoning from the main response
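The caching idea mentioned above (store the result once, reuse it on later reruns) can be sketched independently of Streamlit. This is an illustration under assumptions: `get_or_run` and the keys `prep_result` and `prep_key` are hypothetical names, and a plain dict stands in for st.session_state, which accepts the same dictionary-style access.

```python
from typing import Callable, MutableMapping


def get_or_run(state: MutableMapping, job_title: str, runner: Callable[[str], dict]) -> dict:
    """Hypothetical helper: run the agent once per job title and reuse the stored result."""
    key = job_title.strip().lower()
    if "prep_key" not in state or state["prep_key"] != key:
        # Only call the (slow, paid) agent when the title actually changed.
        state["prep_result"] = runner(job_title.strip())
        state["prep_key"] = key
    return state["prep_result"]


calls = []


def fake_runner(title: str) -> dict:
    """Stand-in for run_interview_prep that records how often it is invoked."""
    calls.append(title)
    return {"job_title": title, "turns": []}


state = {}
first = get_or_run(state, "Product Manager", fake_runner)
second = get_or_run(state, "  product manager ", fake_runner)  # same title, served from state
third = get_or_run(state, "Data Engineer", fake_runner)  # new title, runs again
print(len(calls))
```

In app.py, the equivalent call would pass st.session_state and run_interview_prep, so widget interactions such as opening an expander reuse the stored result.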
Once all stages are displayed, a download button lets users save the full prep kit as a markdown file named after the job title.
st.download_button(
    label="Download as Markdown",
    data=md,  # md is a string built by concatenating all four stage outputs with headers
    file_name=f"interview_prep_{job_title}.md",  # filename includes the job title so downloaded files are self-identifying
    mime="text/markdown",  # tells the browser this is a .md file; triggers correct file association on download
)
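The snippet above uses md without showing how it is built, and it places the raw job title in the filename, where spaces or slashes can be awkward. A minimal sketch of both pieces follows. `build_markdown` and `safe_filename` are hypothetical helpers, and the header layout is an assumption rather than the article's exact format.

```python
import re


def build_markdown(result: dict) -> str:
    """Join the stage outputs under headers; the layout here is an assumption."""
    lines = [f"# Interview Prep: {result['job_title']}", ""]
    for turn in result["turns"]:
        lines += [f"## {turn['stage']}", "", turn["content"].strip(), ""]
    return "\n".join(lines)


def safe_filename(job_title: str) -> str:
    """Hypothetical helper: reduce a job title to lowercase letters, digits, and underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", job_title.lower()).strip("_")
    return f"interview_prep_{slug or 'untitled'}.md"


# Sample data shaped like the dict returned by run_interview_prep.
result = {
    "job_title": "Senior Data Scientist",
    "turns": [
        {"stage": "Plan", "content": "1. Statistics", "thinking": ""},
        {"stage": "Tips", "content": "Practice aloud.", "thinking": ""},
    ],
}
md = build_markdown(result)
print(safe_filename("Senior Data Scientist / ML"))
print(md)
```

The model's reasoning is left out of the export here; including it would only require adding each turn's thinking field under its own header.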
Running the App
streamlit run app.py
Open your browser at http://localhost:8501. Type a job title (for example, “Senior Data Scientist”) and click Generate.
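The agent works through its four stages, showing progress as it goes, and displays the full prep kit when the fourth turn finishes. Because the four calls run one after another with thinking enabled, expect a longer wait than for a single request.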
Output and What to Expect
For a job title like “Product Manager”, the agent produces:
Plan: 5 categories (Product Strategy, Stakeholder Management, Execution & Delivery, Behavioral, and Metrics & Analytics), each with a reason
Questions: 8 targeted questions like “Describe a time you had to kill a feature you personally championed” or “How do you prioritize a backlog when engineering capacity is constrained?”
Answers: Structured STAR-format responses for each question, specific to the PM role
Tips: Focused advice such as leading with data when discussing decisions, and preparing a product critique for a live product
The model’s internal reasoning — visible by expanding the “Model Reasoning” panel — shows how it connects the categories from Turn 1 to the questions in Turn 2, demonstrating why thinking preservation produces more coherent output than independent API calls.
Who Can Benefit
Job seekers preparing for technical or managerial interviews across any industry
Career coaches who want to generate role-specific question sets quickly
Students transitioning from academia to industry and unsure what to expect
Developers looking to learn how to build multi-turn AI agents with thinking preservation
How Codersarts Can Help
Building AI agents like this one requires solid understanding of multi-turn reasoning, API integration, and production-ready UI design. If you need help implementing a custom AI agent for your project (interview prep, business automation, or something entirely different), Codersarts offers end-to-end development and mentorship support.
Custom AI agent development tailored to your use case
One-on-one mentorship and code reviews
Project-based learning with real-world applications
Get in touch: codersarts.com | contact@codersarts.com



