
Building a Conversational AI Agent with Memory


Course: LLM Foundational Course

Level: Medium → Advanced

Type: Individual Assignment

Duration: 5–7 days

Total Marks: 100






Objective


The objective of this assignment is to help you:


  • Implement conversation memory that manages context windows

  • Control LLM output using temperature, max_tokens, and stop sequences

  • Build a complete agent that combines conversation history with semantic search

  • Handle multi-turn conversations with proper context management

  • Track usage and costs across the entire session

  • Think practically about building real conversational systems





Problem Statement


You are building a personal AI assistant that can:


  • Have multi-turn conversations with memory

  • Answer questions using a knowledge base (from Assignment 1 or new)

  • Remember what was discussed earlier in the conversation

  • Adjust its behavior based on query type

  • Track token usage and costs


Your task is to:


  • Build a conversation manager that handles context window limits

  • Implement output control (temperature, max_tokens)

  • Integrate conversation + RAG together

  • Create a complete agent that puts everything together





Prerequisites


This assignment builds on concepts from Assignment 1. You will reuse:


  • Token counting functions

  • Vector database with embeddings

  • Basic RAG functionality


Note: If you haven't completed Assignment 1, you must implement those components first.





Tasks & Requirements


Task 1: Conversation Manager with Memory (15 Marks)


Objective: Build a conversation manager that handles context limits intelligently.


Requirements:


Implement a `ConversationManager` class:


  • __init__(max_context_tokens=8000)   - Initialize with token limit   

  • add_message(role, content)   - Add user or assistant message   

  • get_messages()   - Return messages for API call   

  • get_stats()   - Return: total_messages, total_tokens, remaining_tokens   

  • trim_context()   - Remove old messages when the limit is reached   

  • reset()   - Clear conversation history



Implement trimming strategy (choose ONE):


Option A: Sliding Window (Simpler)


  • Keep only the most recent N messages

  • Always preserve system message if present

  • Token-aware (not just message count)



Option B: Priority-Based (Advanced)


  • Keep: first user message + last 3 exchanges

  • Remove middle messages when limit reached

  • Keeps conversation bookends



Token counting:


  • Count tokens for each message

  • Track total tokens in conversation

  • Trigger trimming at 80% of limit



Testing:


  • Create a 10-turn conversation that exceeds 1000 tokens

  • Show that trimming happens automatically

  • Verify messages are removed correctly



Deliverable:


  • Complete ConversationManager class

  • Test showing trimming in action

  • Output showing token counts at each turn

  • Explanation (150 words) of your trimming strategy



Example output:



Turn 1: 245 tokens (remaining: 7755)
Turn 2: 512 tokens (remaining: 7243)
...
Turn 8: 6845 tokens (remaining: 1155)
[TRIMMING] Removed 2 old messages
Turn 9: 4521 tokens (remaining: 3479)
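A minimal sliding-window sketch (Option A) of the class interface above. The 4-characters-per-token estimate is a placeholder assumption; swap in a real tokenizer (e.g. tiktoken) for accurate counts.

```python
class ConversationManager:
    """Sliding-window conversation memory (Option A sketch)."""

    def __init__(self, max_context_tokens=8000):
        self.max_context_tokens = max_context_tokens
        self.messages = []  # list of {"role": ..., "content": ...}

    def _count_tokens(self, text):
        # Placeholder estimate: roughly 4 characters per token.
        return max(1, len(text) // 4)

    def total_tokens(self):
        return sum(self._count_tokens(m["content"]) for m in self.messages)

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Trigger trimming at 80% of the limit, as required above.
        if self.total_tokens() > 0.8 * self.max_context_tokens:
            self.trim_context()

    def trim_context(self):
        # Drop the oldest non-system messages until back under 80%.
        while (self.total_tokens() > 0.8 * self.max_context_tokens
               and len(self.messages) > 1):
            for i, m in enumerate(self.messages):
                if m["role"] != "system":
                    del self.messages[i]
                    break
            else:
                break  # only system messages remain

    def get_messages(self):
        return list(self.messages)

    def get_stats(self):
        used = self.total_tokens()
        return {
            "total_messages": len(self.messages),
            "total_tokens": used,
            "remaining_tokens": self.max_context_tokens - used,
        }

    def reset(self):
        self.messages = []
```

Note that trimming skips the system message, satisfying the "always preserve system message" requirement of Option A.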





Task 2: Output Control System (15 Marks)


Objective: Control LLM behavior based on query type.


Requirements:


  • Implement query type detection:

  • Factual: Questions with clear answers ("What is...?", "How many...?")

  • Creative: Open-ended requests ("Write...", "Imagine...", "Create...")

  • Conversational: Greetings, chat, general discussion



Set parameters based on query type:


Factual queries:


  • Temperature: 0.0

  • Max tokens: 150-200

  • Reasoning: Want deterministic, concise answers



Creative queries:


  • Temperature: 0.7-1.0

  • Max tokens: 500-1000

  • Reasoning: Want varied, detailed responses



Conversational:


  • Temperature: 0.5

  • Max tokens: 200-300

  • Reasoning: Balanced, natural responses



Implement functions:


  • detect_query_type(query) → str   # Return: "factual", "creative", or "conversational"   

  • get_parameters(query_type) → dict   # Return: {"temperature": X, "max_tokens": Y}
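A sketch of both functions, using the parameter values from the ranges above. The keyword patterns are illustrative assumptions; tune them to your own query set.

```python
import re

# Illustrative keyword heuristics, not an exhaustive classifier.
CREATIVE_PATTERN = r"^\s*(write|imagine|create|compose|invent)\b"
FACTUAL_PATTERN = r"^\s*(what|who|when|where|which|how many|how much)\b"

def detect_query_type(query: str) -> str:
    q = query.lower()
    if re.search(CREATIVE_PATTERN, q):
        return "creative"
    if re.search(FACTUAL_PATTERN, q):
        return "factual"
    return "conversational"

# Values picked from the ranges listed above.
PARAMETERS = {
    "factual": {"temperature": 0.0, "max_tokens": 150},
    "creative": {"temperature": 0.9, "max_tokens": 800},
    "conversational": {"temperature": 0.5, "max_tokens": 250},
}

def get_parameters(query_type: str) -> dict:
    return dict(PARAMETERS[query_type])
```

Creative patterns are checked first so that "Write a story about what happened" is classified as creative rather than factual.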



Testing:


  • Test with at least 3 examples of each type

  • Show how temperature affects output (run same query with different temps)

  • Analyze the differences



Deliverable:


  • Query type detection function

  • Parameter configuration function

  • 9+ test examples (3 per type)

  • Temperature comparison showing different outputs

  • Analysis (200 words) on how parameters affect responses



Example output:



Query: "What is the capital of France?"
Type: factual
Parameters: temperature=0.0, max_tokens=150
Query: "Write a creative story about a robot"
Type: creative
Parameters: temperature=0.9, max_tokens=800





Task 3: Memory Testing and Context Preservation (15 Marks)


Objective: Verify that conversation memory actually works.


Requirements:


Design a conversation that tests memory:


  • Turn 1: User introduces information ("My name is X, I live in Y")

  • Turn 2-5: General conversation

  • Turn 6: Ask about the earlier information ("What's my name?")

  • Agent should remember!



Test multi-step reasoning:


  • Turn 1: "I have 5 apples"

  • Turn 2: "I bought 3 more"

  • Turn 3: "Then I ate 2"

  • Turn 4: "How many do I have now?"

  • Agent should track the math across turns
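One way to script the test above with the model call stubbed out, so you can verify that the full history actually reaches the model on the final turn. `run_scripted_conversation` and `recording_stub` are hypothetical helper names; real code would call the Claude API in place of the stub.

```python
def run_scripted_conversation(turns, call_model):
    """Replay scripted user turns; call_model(history) returns each reply."""
    history = []
    for user_msg in turns:
        history.append({"role": "user", "content": user_msg})
        reply = call_model(history)  # real code: Claude API call with history
        history.append({"role": "assistant", "content": reply})
    return history

# Stub that records what the model would see on each turn (no API call).
seen = []
def recording_stub(history):
    seen.append([m["content"] for m in history if m["role"] == "user"])
    return "ok"

apple_turns = [
    "I have 5 apples",
    "I bought 3 more",
    "Then I ate 2",
    "How many do I have now?",
]
run_scripted_conversation(apple_turns, recording_stub)

# On the final turn the model must see every earlier fact,
# which is what lets it compute 5 + 3 - 2 = 6.
assert "I have 5 apples" in seen[-1]
```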



Test what happens when context is trimmed:


  • Create conversation that exceeds limit

  • Ask about information from early messages (that got trimmed)

  • Agent should gracefully say it doesn't remember



Edge cases:


  • What if first message is very long (>1000 tokens)?

  • What if every message is 100 tokens (slow buildup to limit)?



Deliverable:


  • 3 test conversations demonstrating memory

  • Edge case examples

  • Analysis (200 words) on memory limitations and solutions





Task 4: Integrated RAG + Conversation Agent (30 Marks)


Objective: Build a complete agent combining everything.


Requirements:


Implement `ConversationalAgent` class:


  • __init__(knowledge_base)   # Initialize with knowledge base and conversation manager   

  • chat(user_message, use_rag=True)   # Main interface - handles everything   

  • determine_if_rag_needed(query)   # Decide: Does this need knowledge base?   

  • format_prompt(query, rag_context=None)   # Create final prompt with context   

  • get_session_stats()   # Return conversation and cost statistics



Smart RAG triggering:


  • Detect when user asks about knowledge base topics

  • Skip RAG for general chat ("Hello", "How are you?")

  • Use RAG for factual questions about your documents



Conversation flow:



User Query
   ↓
Add to Conversation History
   ↓
Detect Query Type → Set Parameters
   ↓
Check if RAG is Needed
   ↓
Retrieve Documents (if required)
   ↓
Format Prompt (Conversation History + RAG Context)
   ↓
Call Claude API
   ↓
Add Response to Conversation History
   ↓
Trim History (if needed)
   ↓
Return Response
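The flow can be condensed into a single chat() method. Everything below is a hedged skeleton under stated assumptions: retrieve() and call_llm() are stand-ins for your Assignment 1 vector search and the real Claude API call, and the greeting heuristic is illustrative.

```python
class ConversationalAgentSketch:
    """Skeleton of the chat() flow; swap the stubs for real components."""

    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base  # e.g. your Assignment 1 vector DB
        self.history = []

    def determine_if_rag_needed(self, query):
        # Illustrative heuristic: skip RAG for short greetings/chat.
        greetings = ("hi", "hello", "how are you", "thanks")
        return not query.lower().strip("!?. ").startswith(greetings)

    def format_prompt(self, query, rag_context=None):
        parts = []
        if rag_context:
            parts.append(f"Context from knowledge base:\n{rag_context}")
        parts.append(query)
        return "\n\n".join(parts)

    def retrieve(self, query):
        # Stub: real code would embed the query and search the vector DB.
        return self.knowledge_base.get(query.lower(), "")

    def call_llm(self, messages):
        # Stub: real code would call the Claude API with the full history.
        return f"(reply to: {messages[-1]['content'][:40]})"

    def chat(self, user_message, use_rag=True):
        rag_context = None
        if use_rag and self.determine_if_rag_needed(user_message):
            rag_context = self.retrieve(user_message) or None
        prompt = self.format_prompt(user_message, rag_context)
        self.history.append({"role": "user", "content": prompt})
        reply = self.call_llm(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

The full implementation would also plug in detect_query_type/get_parameters from Task 2 before the API call, and trim the history via the Task 1 manager afterward.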



Testing:


  • Create a 10-turn conversation that:

  • Starts with general chat

  • Asks knowledge base questions (triggers RAG)

  • References earlier conversation

  • Includes follow-up questions

  • Tests memory and RAG together



Deliverable:


  • Complete ConversationalAgent class

  • 10-turn demonstration conversation

  • Examples showing RAG being triggered/skipped

  • Architecture explanation (300 words)



Example conversation:



Turn 1

User: Hi! I'm learning about Python.

Agent: Hello! That's great that you're learning Python. [No RAG needed — general greeting]

================================================

Turn 2

User: What are variables in Python?

Agent: [RAG triggered] Based on the documentation, variables in Python are used to store data values.

Sources: [doc2: Python variables]

================================================

Turn 3

User: Can you give me an example?

Agent: [Uses previous context] Sure! Building on what I just explained, here’s an example of variables in Python… [No RAG — answering from previous context]

================================================

Turn 4

User: What was I learning about?

Agent: You mentioned you're learning about Python. [Memory test — should remember Turn 1]




Task 5: Cost Tracking and Session Management (10 Marks)


Objective: Track costs across the entire conversation session.


Requirements:


Track for each turn:


  • Input tokens (conversation history + RAG context if used)

  • Output tokens (agent's response)

  • Cost for that turn

  • Cumulative cost


Implement session statistics:



{
    'total_turns': N,
    'total_input_tokens': X,
    'total_output_tokens': Y,
    'total_cost_usd': Z,
    'rag_queries': R,
    'average_cost_per_turn': A
}
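A minimal tracker that produces that statistics dict. The per-million-token prices are placeholder assumptions, not real Claude rates; substitute the current pricing for your chosen model.

```python
class CostTracker:
    # Placeholder per-million-token prices; check current Claude pricing.
    INPUT_PRICE_PER_MTOK = 3.00
    OUTPUT_PRICE_PER_MTOK = 15.00

    def __init__(self):
        self.turns = []

    def record_turn(self, input_tokens, output_tokens, used_rag=False):
        cost = (input_tokens * self.INPUT_PRICE_PER_MTOK
                + output_tokens * self.OUTPUT_PRICE_PER_MTOK) / 1_000_000
        self.turns.append({
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "used_rag": used_rag,
            "cost_usd": cost,
        })
        return cost

    def session_stats(self):
        n = len(self.turns)
        total = sum(t["cost_usd"] for t in self.turns)
        return {
            "total_turns": n,
            "total_input_tokens": sum(t["input_tokens"] for t in self.turns),
            "total_output_tokens": sum(t["output_tokens"] for t in self.turns),
            "total_cost_usd": round(total, 6),
            "rag_queries": sum(t["used_rag"] for t in self.turns),
            "average_cost_per_turn": round(total / n, 6) if n else 0.0,
        }
```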



Analyze cost factors:


  • How much does conversation history add to each turn?

  • How much does RAG add to costs?

  • What's the most expensive turn and why?



Projections:

If the average conversation is 15 turns and you serve 50 conversations per day, what is the projected monthly cost?
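The arithmetic for this projection, using the illustrative average per-turn cost from the example output in this task; all inputs are placeholders to be replaced with your measured numbers.

```python
# All inputs are illustrative; plug in your measured average cost per turn.
avg_cost_per_turn = 0.00245       # USD, from the example session output
turns_per_conversation = 15
conversations_per_day = 50
days_per_month = 30

cost_per_conversation = avg_cost_per_turn * turns_per_conversation
monthly_cost = cost_per_conversation * conversations_per_day * days_per_month
print(f"Projected monthly cost: ~${monthly_cost:.2f}")
```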



Deliverable:


  • Cost tracking implementation

  • Turn-by-turn cost breakdown

  • Monthly projection



Example output:



Turn 1: Input=245, Output=123 → $0.00256
Turn 2: Input=378, Output=145 → $0.00331 (cumulative: $0.00587)
Turn 3: Input=523, Output=156 → $0.00391 (cumulative: $0.00978)
...
Total session cost: $0.0245
Average per turn: $0.00245





Task 6: Final Demo and Edge Cases (15 Marks)


Objective: Create polished demo with error handling.


Requirements:


Create complete demo conversation:


  • 15+ turns

  • Mix of general chat + knowledge queries

  • Tests memory, RAG, and parameter control

  • Shows costs throughout



Handle edge cases:


  • Empty user input → Ask user to provide input

  • Very long input (>2000 tokens) → Warn or truncate

  • No RAG results found → Agent admits it doesn't know

  • API error → Graceful error message
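A sketch of a wrapper implementing the edge-case handling above. `safe_chat`, `estimate_tokens`, and the truncation policy are assumptions for illustration; real code would catch the Anthropic SDK's specific error types rather than bare Exception.

```python
MAX_INPUT_TOKENS = 2000

def estimate_tokens(text):
    # Crude placeholder estimate; swap in a real tokenizer.
    return max(1, len(text) // 4)

def safe_chat(agent_chat, user_input):
    """Wrap an agent's chat function with basic edge-case handling."""
    # Empty input: ask the user to provide something.
    if not user_input or not user_input.strip():
        return "Please provide some input."
    # Very long input: truncate (alternatively, warn and refuse).
    if estimate_tokens(user_input) > MAX_INPUT_TOKENS:
        user_input = user_input[: MAX_INPUT_TOKENS * 4]
    # API error: return a graceful message instead of crashing.
    try:
        return agent_chat(user_input)
    except Exception as exc:  # real code: catch the SDK's error classes
        return f"Sorry, something went wrong ({type(exc).__name__}). Please try again."
```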


Polish presentation:


  • Clear output formatting

  • Helpful labels ("Using RAG", "Trimmed context", "Cost: $X")

  • Summary at end



Deliverable:


  • 15-turn demo conversation

  • Edge case handling examples

  • Final summary of session





Bonus Tasks 


Bonus A: Conversation Summarization


  • When context limit is reached, use Claude to summarize old messages

  • Replace old messages with summary

  • Continue with summary + recent messages
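A summarize-and-replace sketch for this bonus. The default `summarize` is a stub; real code would send the old messages to Claude and use its summary. `compress_history` and `keep_recent` are hypothetical names.

```python
def compress_history(messages, keep_recent=4, summarize=None):
    """Replace all but the last keep_recent messages with one summary."""
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    if summarize is None:
        # Stub: real code would ask Claude to summarize `old`.
        summarize = lambda msgs: f"[Summary of {len(msgs)} earlier messages]"
    summary_msg = {"role": "system", "content": summarize(old)}
    return [summary_msg] + recent
```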




Bonus B: Adaptive Context Window


  • Start with small context window

  • Expand when conversation gets complex

  • Contract when simple chat




Bonus C: Export Conversation


  • Save full conversation to file (JSON or Markdown)

  • Include metadata (timestamps, costs, RAG usage)

  • Reload conversation in new session
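A JSON export/reload sketch for this bonus. The payload field names are assumptions; per-turn costs and RAG flags would come from the Task 5 tracker.

```python
import json
import time

def export_conversation(messages, stats, path):
    """Save the conversation plus metadata to a JSON file."""
    payload = {
        "exported_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "stats": stats,        # e.g. output of get_session_stats()
        "messages": messages,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)

def load_conversation(path):
    """Reload a saved conversation for use in a new session."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return payload["messages"], payload["stats"]
```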


________________________________________________________________________________


Deliverables


You must submit:


1. Code (Required)


Jupyter Notebook (.ipynb)


  • All code with implementation

  • Clear task sections

  • All outputs visible

  • Runs without errors




2. Knowledge Base (Optional)

If different from Assignment 1, then include your documents




3. Report (Required)

Short report (3–5 pages) with:


  • System architecture overview

  • Implementation approach for each task

  • Key findings and experiments

  • Challenges and solutions

  • Learnings and insights


Format: PDF





Submission Guidelines


Submit via: LMS (Moodle / Google Classroom)


File Naming: <YourName>_LLM_Assignment2.zip


Inside ZIP:

  • notebook.ipynb

  • report.pdf

  • knowledge_base/ (optional)


Deadline: 7 days from release


Late Policy:


  • <24hrs: -10%

  • 24-48hrs: -20%

  • >48hrs: Not accepted





Important Instructions


  • Build on Assignment 1 – Reuse your vector database and RAG system

  • Test thoroughly – Don't just assume things work

  • Comment your code – Explain your logic

  • Track costs – Monitor API usage

  • Handle errors – Don't let your code crash





Call to Action

Ready to transform your business with AI-powered intelligence that accelerates insights, enhances decision-making, and unlocks the full value of your data?


Codersarts is here to help you turn complex data workflows into efficient, scalable, and evidence-driven AI systems that empower teams to make smarter, faster, and more confident decisions.


Whether you’re a startup looking to build AI-driven products, an enterprise aiming to optimize operations through data science, or a research organization advancing innovation with intelligent data solutions, we bring the expertise and experience needed to design, develop, and deploy impactful AI systems that drive measurable business outcomes.




Get Started Today



Schedule an AI & Data Science Consultation:

Book a 30-minute discovery call with our AI strategists and data science experts to discuss your challenges, identify high-impact opportunities, and explore how intelligent AI solutions can transform your workflows and performance.




Request a Custom AI Demo:

Experience AI in action with a personalized demonstration built around your business use cases, datasets, operational environment, and decision workflows — showcasing practical value and real-world impact.









Transform your organization from data accumulation to intelligent decision enablement — accelerating insight generation, improving operational efficiency, and strengthening competitive advantage.


Partner with Codersarts to build scalable AI solutions including RAG systems, predictive analytics platforms, intelligent automation tools, recommendation engines, and custom machine learning models that empower your teams to deliver exceptional results.


Contact us today and take the first step toward next-generation AI and data science capabilities that grow with your business ambitions.



