Building a Conversational AI Agent with Memory
Course: LLM Foundational Course
Level: Intermediate → Advanced
Type: Individual Assignment
Duration: 5–7 days
Total Marks: 100

Objective
The objective of this assignment is to help you:
Implement conversation memory that manages context windows
Control LLM output using temperature, max_tokens, and stop sequences
Build a complete agent that combines conversation history with semantic search
Handle multi-turn conversations with proper context management
Track usage and costs across the entire session
Think practically about building real conversational systems
Problem Statement
You are building a personal AI assistant that can:
Have multi-turn conversations with memory
Answer questions using a knowledge base (from Assignment 1 or new)
Remember what was discussed earlier in the conversation
Adjust its behavior based on query type
Track token usage and costs
Your task is to:
Build a conversation manager that handles context window limits
Implement output control (temperature, max_tokens)
Integrate conversation + RAG together
Create a complete agent that puts everything together
Prerequisites
This assignment builds on concepts from Assignment 1. You will reuse:
Token counting functions
Vector database with embeddings
Basic RAG functionality
Note: If you haven't completed Assignment 1, you must implement those components first.
Tasks & Requirements
Task 1: Conversation Manager with Memory (15 Marks)
Objective: Build a conversation manager that handles context limits intelligently.
Requirements:
Implement a `ConversationManager` class:
`__init__(max_context_tokens=8000)` - Initialize with a token limit
`add_message(role, content)` - Add a user or assistant message
`get_messages()` - Return messages formatted for the API call
`get_stats()` - Return: total_messages, total_tokens, remaining_tokens
`trim_context()` - Remove old messages when the limit is reached
`reset()` - Clear the conversation history
Implement trimming strategy (choose ONE):
Option A: Sliding Window (Simpler)
Keep only the most recent N messages
Always preserve system message if present
Token-aware (not just message count)
Option B: Priority-Based (Advanced)
Keep: first user message + last 3 exchanges
Remove middle messages when limit reached
Keeps conversation bookends
Token counting:
Count tokens for each message
Track total tokens in conversation
Trigger trimming at 80% of limit
Testing:
Create a 10-turn conversation that exceeds 1000 tokens
Show that trimming happens automatically
Verify messages are removed correctly
Deliverable:
Complete ConversationManager class
Test showing trimming in action
Output showing token counts at each turn
Explanation (150 words) of your trimming strategy
Example output:
Turn 1: 245 tokens (remaining: 7755)
Turn 2: 512 tokens (remaining: 7243)
...
Turn 8: 6845 tokens (remaining: 1155)
[TRIMMING] Removed 2 old messages
Turn 9: 4521 tokens (remaining: 3479)
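As a starting point, here is a minimal sketch of the sliding-window option (Option A). The token count is approximated by a word count purely for illustration; your implementation should reuse the real token-counting function from Assignment 1 (e.g. a tokenizer such as tiktoken).

```python
# Minimal ConversationManager sketch (Option A: sliding window).
# Assumption: tokens are approximated as whitespace-separated words;
# swap in your Assignment 1 token counter for real use.

class ConversationManager:
    def __init__(self, max_context_tokens=8000):
        self.max_context_tokens = max_context_tokens
        self.messages = []          # list of {"role", "content", "tokens"}

    def _count_tokens(self, text):
        return len(text.split())    # crude stand-in for a real tokenizer

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content,
                              "tokens": self._count_tokens(content)})
        # Trigger trimming at 80% of the limit, as Task 1 requires.
        if self.total_tokens() > 0.8 * self.max_context_tokens:
            self.trim_context()

    def total_tokens(self):
        return sum(m["tokens"] for m in self.messages)

    def trim_context(self):
        # Drop the oldest non-system messages until back under budget,
        # always preserving the system message if present.
        removed = 0
        while self.total_tokens() > 0.8 * self.max_context_tokens:
            for i, m in enumerate(self.messages):
                if m["role"] != "system":
                    del self.messages[i]
                    removed += 1
                    break
            else:
                break               # only system messages remain
        if removed:
            print(f"[TRIMMING] Removed {removed} old messages")

    def get_messages(self):
        return [{"role": m["role"], "content": m["content"]}
                for m in self.messages]

    def get_stats(self):
        total = self.total_tokens()
        return {"total_messages": len(self.messages),
                "total_tokens": total,
                "remaining_tokens": self.max_context_tokens - total}

    def reset(self):
        self.messages.clear()
```

Note that trimming is done token-aware (by budget) rather than by message count, which is what the task asks for.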
Task 2: Output Control System (15 Marks)
Objective: Control LLM behavior based on query type.
Requirements:
Implement query type detection:
Factual: Questions with clear answers ("What is...?", "How many...?")
Creative: Open-ended requests ("Write...", "Imagine...", "Create...")
Conversational: Greetings, chat, general discussion
Set parameters based on query type:
Factual queries:
Temperature: 0.0
Max tokens: 150-200
Reasoning: Want deterministic, concise answers
Creative queries:
Temperature: 0.7-1.0
Max tokens: 500-1000
Reasoning: Want varied, detailed responses
Conversational:
Temperature: 0.5
Max tokens: 200-300
Reasoning: Balanced, natural responses
Implement functions:
detect_query_type(query) → str # Return: "factual", "creative", or "conversational"
get_parameters(query_type) → dict # Return: {"temperature": X, "max_tokens": Y}
Testing:
Test with at least 3 examples of each type
Show how temperature affects output (run same query with different temps)
Analyze the differences
Deliverable:
Query type detection function
Parameter configuration function
9+ test examples (3 per type)
Temperature comparison showing different outputs
Analysis (200 words) on how parameters affect responses
Example output:
Query: "What is the capital of France?"
Type: factual
Parameters: temperature=0.0, max_tokens=150
Query: "Write a creative story about a robot"
Type: creative
Parameters: temperature=0.9, max_tokens=800
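A simple keyword-based heuristic is enough to get started. The trigger words and exact parameter values below are illustrative choices within the ranges given above, not prescribed ones; tune them against your own test set.

```python
# Sketch of keyword-based query-type detection and parameter selection.
# Trigger words are illustrative; extend them for your own data.

def detect_query_type(query: str) -> str:
    q = query.lower().strip()
    creative_starts = ("write", "imagine", "create", "compose", "invent")
    factual_cues = ("what is", "what are", "how many", "who", "when", "where")
    if q.startswith(creative_starts):
        return "creative"
    if q.startswith(factual_cues) or (q.endswith("?")
                                      and any(c in q for c in factual_cues)):
        return "factual"
    return "conversational"

# One concrete value chosen from each range in the task description.
PARAMS = {
    "factual":        {"temperature": 0.0, "max_tokens": 150},
    "creative":       {"temperature": 0.9, "max_tokens": 800},
    "conversational": {"temperature": 0.5, "max_tokens": 250},
}

def get_parameters(query_type: str) -> dict:
    return PARAMS[query_type]
```

Keyword matching will misclassify some queries (e.g. "Write down what a variable is"); your 200-word analysis is a good place to discuss such failure cases.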
Task 3: Memory Testing and Context Preservation (15 Marks)
Objective: Verify that conversation memory actually works.
Requirements:
Design a conversation that tests memory:
Turn 1: User introduces information ("My name is X, I live in Y")
Turn 2-5: General conversation
Turn 6: Ask about the earlier information ("What's my name?")
Agent should remember!
Test multi-step reasoning:
Turn 1: "I have 5 apples"
Turn 2: "I bought 3 more"
Turn 3: "Then I ate 2"
Turn 4: "How many do I have now?"
Test what happens when context is trimmed:
Create conversation that exceeds limit
Ask about information from early messages (that got trimmed)
Agent should gracefully say it doesn't remember
Edge cases:
What if first message is very long (>1000 tokens)?
What if every message is 100 tokens (slow buildup to limit)?
Deliverable:
3 test conversations demonstrating memory
Edge case examples
Analysis (200 words) on memory limitations and solutions
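The multi-step reasoning test can be written as a small reusable harness. The `fake_agent` below is a deterministic stand-in that "remembers" only by re-reading the history it is given; replace it with your real agent when grading memory behavior.

```python
# Harness for the Task 3 multi-step reasoning test. `fake_agent` is a
# hypothetical stand-in; swap in your real ConversationalAgent.

TURNS = [
    "I have 5 apples",
    "I bought 3 more",
    "Then I ate 2",
    "How many do I have now?",
]
EXPECTED = 5 + 3 - 2   # = 6

def run_memory_test(agent_fn):
    """Feed the turns one by one; return True if the final reply contains 6."""
    history, reply = [], ""
    for turn in TURNS:
        history.append({"role": "user", "content": turn})
        reply = agent_fn(history)
        history.append({"role": "assistant", "content": reply})
    return str(EXPECTED) in reply

def fake_agent(history):
    # Deterministic stand-in: recompute the count from the full history,
    # which is exactly what conversation memory should enable.
    total = 0
    for msg in history:
        text = msg["content"].lower()
        nums = [int(t) for t in text.split() if t.isdigit()]
        if "have" in text and "now" not in text and nums:
            total = nums[0]
        elif "bought" in text and nums:
            total += nums[0]
        elif "ate" in text and nums:
            total -= nums[0]
    return f"You have {total} apples."
```

The same harness shape works for the name/location test and the trimmed-context test: define the turns as data, run them through the agent, and assert on the final reply.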
Task 4: Integrated RAG + Conversation Agent (30 Marks)
Objective: Build a complete agent combining everything.
Requirements:
Implement `ConversationalAgent` class:
`__init__(knowledge_base)` # Initialize with knowledge base and conversation manager
`chat(user_message, use_rag=True)` # Main interface - handles everything
`determine_if_rag_needed(query)` # Decide: does this query need the knowledge base?
`format_prompt(query, rag_context=None)` # Create the final prompt with context
`get_session_stats()` # Return conversation and cost statistics
Smart RAG triggering:
Detect when user asks about knowledge base topics
Skip RAG for general chat ("Hello", "How are you?")
Use RAG for factual questions about your documents
Conversation flow:
User Query
↓
Add to Conversation History
↓
Detect Query Type → Set Parameters
↓
Check if RAG is Needed
↓
Retrieve Documents (if required)
↓
Format Prompt (Conversation History + RAG Context)
↓
Call Claude API
↓
Add Response to Conversation History
↓
Trim History (if needed)
↓
Return Response
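The flow above maps almost one-to-one onto a `chat()` method. In this sketch, `call_llm` and `retrieve` are placeholders: wire in your real Claude API client and the vector store from Assignment 1. The RAG-needed heuristic shown here is deliberately simplistic.

```python
# Skeleton of the chat() flow above, with the LLM call and retrieval stubbed.
# `call_llm` and `retrieve` are placeholders for your real client and store.

def call_llm(messages, temperature, max_tokens):
    return "stubbed response"            # replace with a real API call

def retrieve(query):
    return ["stubbed document chunk"]    # replace with vector-store search

class ConversationalAgent:
    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base
        self.history = []

    def determine_if_rag_needed(self, query):
        # Illustrative heuristic only: question-shaped input triggers RAG.
        return query.rstrip().endswith("?")

    def format_prompt(self, query, rag_context=None):
        if rag_context:
            return f"Context:\n{rag_context}\n\nQuestion: {query}"
        return query

    def chat(self, user_message, use_rag=True):
        # 1. Add to conversation history
        self.history.append({"role": "user", "content": user_message})
        # 2. Check whether RAG is needed, and retrieve if so
        rag_context = None
        if use_rag and self.determine_if_rag_needed(user_message):
            rag_context = "\n".join(retrieve(user_message))
        # 3. Format the prompt (history + optional RAG context) and call the LLM
        prompt = self.format_prompt(user_message, rag_context)
        reply = call_llm(self.history[:-1] +
                         [{"role": "user", "content": prompt}],
                         temperature=0.5, max_tokens=300)
        # 4. Add the response to history and return it
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

Your full version should also plug in the query-type detection from Task 2 (to set temperature and max_tokens per turn) and the trimming logic from Task 1.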
Testing:
Create a 10-turn conversation that:
Starts with general chat
Asks knowledge base questions (triggers RAG)
References earlier conversation
Includes follow-up questions
Tests memory and RAG together
Deliverable:
Complete ConversationalAgent class
10-turn demonstration conversation
Examples showing RAG being triggered/skipped
Architecture explanation (300 words)
Example conversation:
Turn 1
User: Hi! I'm learning about Python.
Agent: Hello! That's great that you're learning Python. [No RAG needed — general greeting]
================================================
Turn 2
User: What are variables in Python?
Agent: [RAG triggered] Based on the documentation, variables in Python are used to store data values.
Sources: [doc2: Python variables]
================================================
Turn 3
User: Can you give me an example?
Agent: [Uses previous context] Sure! Building on what I just explained, here’s an example of variables in Python… [No RAG — answering from previous context]
================================================
Turn 4
User: What was I learning about?
Agent: You mentioned you're learning about Python. [Memory test — should remember Turn 1]
Task 5: Cost Tracking and Session Management (10 Marks)
Objective: Track costs across the entire conversation session.
Requirements:
Track for each turn:
Input tokens (conversation history + RAG context if used)
Output tokens (agent's response)
Cost for that turn
Cumulative cost
Implement session statistics:
{
  'total_turns': N,
  'total_input_tokens': X,
  'total_output_tokens': Y,
  'total_cost_usd': Z,
  'rag_queries': R,
  'average_cost_per_turn': A
}
Analyze cost factors:
How much does conversation history add to each turn?
How much does RAG add to costs?
What's the most expensive turn and why?
Projections:
If the average conversation is 15 turns and you serve 50 conversations per day, what is the monthly cost?
Deliverable:
Cost tracking implementation
Turn-by-turn cost breakdown
Monthly projection
Example output:
Turn 1: Input=245, Output=123 → $0.00256
Turn 2: Input=378, Output=145 → $0.00331 (cumulative: $0.00587)
Turn 3: Input=523, Output=156 → $0.00391 (cumulative: $0.00978)
...
Total session cost: $0.0245
Average per turn: $0.00245
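A cost tracker can be a thin wrapper around per-turn token counts. The per-token rates below are placeholder values (roughly $3 / $15 per million input / output tokens); look up the current pricing for your chosen model before reporting real numbers.

```python
# Cost-tracking sketch. Rates are placeholders; use your model's real pricing.

INPUT_COST_PER_TOKEN = 3.00 / 1_000_000    # example: $3 per million input tokens
OUTPUT_COST_PER_TOKEN = 15.00 / 1_000_000  # example: $15 per million output tokens

class CostTracker:
    def __init__(self):
        self.turns = []

    def record_turn(self, input_tokens, output_tokens, used_rag=False):
        cost = (input_tokens * INPUT_COST_PER_TOKEN +
                output_tokens * OUTPUT_COST_PER_TOKEN)
        self.turns.append({"input": input_tokens, "output": output_tokens,
                           "cost": cost, "rag": used_rag})
        return cost

    def session_stats(self):
        n = len(self.turns)
        total = sum(t["cost"] for t in self.turns)
        return {
            "total_turns": n,
            "total_input_tokens": sum(t["input"] for t in self.turns),
            "total_output_tokens": sum(t["output"] for t in self.turns),
            "total_cost_usd": round(total, 6),
            "rag_queries": sum(t["rag"] for t in self.turns),
            "average_cost_per_turn": round(total / n, 6) if n else 0.0,
        }

# Monthly projection: avg cost/turn x 15 turns x 50 conversations/day x 30 days.
def monthly_projection(avg_cost_per_turn, turns=15,
                       conversations_per_day=50, days=30):
    return avg_cost_per_turn * turns * conversations_per_day * days
```

Note how the projection makes the cost of conversation history concrete: because each turn resends the full history, input tokens (and cost) grow with every turn until trimming kicks in.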
Task 6: Final Demo and Edge Cases (15 Marks)
Objective: Create polished demo with error handling.
Requirements:
Create complete demo conversation:
15+ turns
Mix of general chat + knowledge queries
Tests memory, RAG, and parameter control
Shows costs throughout
Handle edge cases:
Empty user input → Ask user to provide input
Very long input (>2000 tokens) → Warn or truncate
No RAG results found → Agent admits it doesn't know
API error → Graceful error message
Polish presentation:
Clear output formatting
Helpful labels ("Using RAG", "Trimmed context", "Cost: $X")
Summary at end
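The edge cases above can be centralized in a small guard layer in front of `chat()`. The 2000-token threshold is approximated here with a word count; swap in your real tokenizer, and adjust the user-facing messages to taste.

```python
# Input-guard sketch for the Task 6 edge cases. The token threshold is
# approximated by word count; use your real tokenizer in practice.

MAX_INPUT_TOKENS = 2000

def validate_input(user_message: str):
    """Return (ok, message); on failure, `message` is shown to the user."""
    if not user_message or not user_message.strip():
        return False, "Please enter a message."
    if len(user_message.split()) > MAX_INPUT_TOKENS:
        return False, (f"Input is too long (over {MAX_INPUT_TOKENS} tokens); "
                       "please shorten it.")
    return True, ""

def safe_chat(agent_chat, user_message):
    """Wrap an agent's chat function with validation and error handling."""
    ok, warning = validate_input(user_message)
    if not ok:
        return warning
    try:
        return agent_chat(user_message)
    except Exception as exc:           # e.g. an API error
        return f"Sorry, something went wrong: {exc}"
```

The "no RAG results found" case is best handled inside the agent itself: when retrieval returns nothing relevant, instruct the model (via the prompt) to say it doesn't know rather than guess.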
Deliverable:
15-turn demo conversation
Edge case handling examples
Final summary of session
Bonus Tasks
Bonus A: Conversation Summarization
When context limit is reached, use Claude to summarize old messages
Replace old messages with summary
Continue with summary + recent messages
Bonus B: Adaptive Context Window
Start with small context window
Expand when conversation gets complex
Contract when simple chat
Bonus C: Export Conversation
Save full conversation to file (JSON or Markdown)
Include metadata (timestamps, costs, RAG usage)
Reload conversation in new session
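For Bonus C, JSON export and reload can be done in a few lines. The metadata field names here (timestamps, costs, RAG usage) are illustrative; use whatever your tracker already records.

```python
# Bonus C sketch: export and reload a conversation as JSON.
# Metadata field names are illustrative.

import json
import time

def export_conversation(messages, path, metadata=None):
    payload = {
        "exported_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "metadata": metadata or {},   # e.g. costs, RAG usage counts
        "messages": messages,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)

def load_conversation(path):
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return payload["messages"], payload["metadata"]
```

Reloading in a new session is then just loading the messages list back into your ConversationManager before the first new turn.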
________________________________________________________________________________
Deliverables
You must submit:
1. Code (Required)
Jupyter Notebook (.ipynb)
All code with implementation
Clear task sections
All outputs visible
Runs without errors
2. Knowledge Base (Optional)
If your knowledge base differs from Assignment 1's, include your documents.
3. Report (Required)
Short report (3–5 pages) with:
System architecture overview
Implementation approach for each task
Key findings and experiments
Challenges and solutions
Learnings and insights
Format: PDF
Submission Guidelines
Submit via: LMS (Moodle / Google Classroom)
File Naming: <YourName>_LLM_Assignment2.zip
Inside ZIP:
notebook.ipynb
report.pdf
knowledge_base/ (optional)
Deadline: 7 days from release
Late Policy:
<24hrs: -10%
24-48hrs: -20%
>48hrs: Not accepted
Important Instructions
Build on Assignment 1 – Reuse your vector database and RAG system
Test thoroughly – Don't just assume things work
Comment your code – Explain your logic
Track costs – Monitor API usage
Handle errors – Don't let your code crash
Call to Action
Ready to transform your business with AI-powered intelligence that accelerates insights, enhances decision-making, and unlocks the full value of your data?
Codersarts is here to help you turn complex data workflows into efficient, scalable, and evidence-driven AI systems that empower teams to make smarter, faster, and more confident decisions.
Whether you’re a startup looking to build AI-driven products, an enterprise aiming to optimize operations through data science, or a research organization advancing innovation with intelligent data solutions, we bring the expertise and experience needed to design, develop, and deploy impactful AI systems that drive measurable business outcomes.
Get Started Today
Schedule an AI & Data Science Consultation:
Book a 30-minute discovery call with our AI strategists and data science experts to discuss your challenges, identify high-impact opportunities, and explore how intelligent AI solutions can transform your workflows and performance.
Request a Custom AI Demo:
Experience AI in action with a personalized demonstration built around your business use cases, datasets, operational environment, and decision workflows — showcasing practical value and real-world impact.
Email: contact@codersarts.com
Transform your organization from data accumulation to intelligent decision enablement — accelerating insight generation, improving operational efficiency, and strengthening competitive advantage.
Partner with Codersarts to build scalable AI solutions including RAG systems, predictive analytics platforms, intelligent automation tools, recommendation engines, and custom machine learning models that empower your teams to deliver exceptional results.
Contact us today and take the first step toward next-generation AI and data science capabilities that grow with your business ambitions.
