
Building a Conversational AI Agent with Memory


Course: LLM Foundational Course

Level: Medium → Advanced

Type: Individual Assignment

Duration: 5–7 days

Total Marks: 100






Objective


The objective of this assignment is to help you:


  • Implement conversation memory that manages context windows

  • Control LLM output using temperature, max_tokens, and stop sequences

  • Build a complete agent that combines conversation history with semantic search

  • Handle multi-turn conversations with proper context management

  • Track usage and costs across the entire session

  • Think practically about building real conversational systems





Problem Statement


You are building a personal AI assistant that can:


  • Have multi-turn conversations with memory

  • Answer questions using a knowledge base (from Assignment 1 or new)

  • Remember what was discussed earlier in the conversation

  • Adjust its behavior based on query type

  • Track token usage and costs


Your task is to:


  • Build a conversation manager that handles context window limits

  • Implement output control (temperature, max_tokens)

  • Integrate conversation + RAG together

  • Create a complete agent that puts everything together





Prerequisites


This assignment builds on concepts from Assignment 1. You will reuse:


  • Token counting functions

  • Vector database with embeddings

  • Basic RAG functionality


Note: If you haven't completed Assignment 1, you must implement those components first.





Tasks & Requirements


Task 1: Conversation Manager with Memory (15 Marks)


Objective: Build a conversation manager that handles context limits intelligently.


Requirements:


Implement a `ConversationManager` class:


  • __init__(max_context_tokens=8000)   - Initialize with token limit   

  • add_message(role, content)   - Add user or assistant message   

  • get_messages()   - Return messages for API call   

  • get_stats()   - Return: total_messages, total_tokens, remaining_tokens   

  • trim_context()   - Remove old messages when the limit is reached   

  • reset()   - Clear conversation history



Implement trimming strategy (choose ONE):


Option A: Sliding Window (Simpler)


  • Keep only the most recent N messages

  • Always preserve system message if present

  • Token-aware (not just message count)



Option B: Priority-Based (Advanced)


  • Keep: first user message + last 3 exchanges

  • Remove middle messages when limit reached

  • Keeps conversation bookends



Token counting:


  • Count tokens for each message

  • Track total tokens in conversation

  • Trigger trimming at 80% of limit



Testing:


  • Create a 10-turn conversation that exceeds 1000 tokens

  • Show that trimming happens automatically

  • Verify messages are removed correctly



Deliverable:


  • Complete ConversationManager class

  • Test showing trimming in action

  • Output showing token counts at each turn

  • Explanation (150 words) of your trimming strategy



Example output:



Turn 1: 245 tokens (remaining: 7755)
Turn 2: 512 tokens (remaining: 7243)
...
Turn 8: 6845 tokens (remaining: 1155)
[TRIMMING] Removed 2 old messages
Turn 9: 4521 tokens (remaining: 3479)
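A minimal sliding-window sketch (Option A) of the class interface above. The 4-characters-per-token estimate is a placeholder assumption; swap in a real tokenizer (e.g. tiktoken) for accurate counts.

```python
class ConversationManager:
    """Sliding-window conversation memory (Option A sketch)."""

    def __init__(self, max_context_tokens=8000):
        self.max_context_tokens = max_context_tokens
        self.messages = []  # list of {"role": ..., "content": ...}

    def _count_tokens(self, text):
        # Placeholder estimate: roughly 4 characters per token.
        return max(1, len(text) // 4)

    def total_tokens(self):
        return sum(self._count_tokens(m["content"]) for m in self.messages)

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        # Trigger trimming at 80% of the limit, as required above.
        if self.total_tokens() > 0.8 * self.max_context_tokens:
            self.trim_context()

    def trim_context(self):
        # Drop the oldest non-system messages until back under 80%.
        while (self.total_tokens() > 0.8 * self.max_context_tokens
               and len(self.messages) > 1):
            for i, m in enumerate(self.messages):
                if m["role"] != "system":
                    del self.messages[i]
                    break
            else:
                break  # only system messages remain

    def get_messages(self):
        return list(self.messages)

    def get_stats(self):
        used = self.total_tokens()
        return {
            "total_messages": len(self.messages),
            "total_tokens": used,
            "remaining_tokens": self.max_context_tokens - used,
        }

    def reset(self):
        self.messages = []
```

Note that trimming skips the system message, satisfying the "always preserve system message" requirement of Option A.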





Task 2: Output Control System (15 Marks)


Objective: Control LLM behavior based on query type.


Requirements:


  • Implement query type detection:

  • Factual: Questions with clear answers ("What is...?", "How many...?")

  • Creative: Open-ended requests ("Write...", "Imagine...", "Create...")

  • Conversational: Greetings, chat, general discussion



Set parameters based on query type:


Factual queries:


  • Temperature: 0.0

  • Max tokens: 150-200

  • Reasoning: Want deterministic, concise answers



Creative queries:


  • Temperature: 0.7-1.0

  • Max tokens: 500-1000

  • Reasoning: Want varied, detailed responses



Conversational:


  • Temperature: 0.5

  • Max tokens: 200-300

  • Reasoning: Balanced, natural responses



Implement functions:


  • detect_query_type(query) → str   # Return: "factual", "creative", or "conversational"   

  • get_parameters(query_type) → dict   # Return: {"temperature": X, "max_tokens": Y}
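A sketch of both functions, using the parameter values from the ranges above. The keyword patterns are illustrative assumptions; tune them to your own query set.

```python
import re

# Illustrative keyword heuristics, not an exhaustive classifier.
CREATIVE_PATTERN = r"^\s*(write|imagine|create|compose|invent)\b"
FACTUAL_PATTERN = r"^\s*(what|who|when|where|which|how many|how much)\b"

def detect_query_type(query: str) -> str:
    q = query.lower()
    if re.search(CREATIVE_PATTERN, q):
        return "creative"
    if re.search(FACTUAL_PATTERN, q):
        return "factual"
    return "conversational"

# Values picked from the ranges listed above.
PARAMETERS = {
    "factual": {"temperature": 0.0, "max_tokens": 150},
    "creative": {"temperature": 0.9, "max_tokens": 800},
    "conversational": {"temperature": 0.5, "max_tokens": 250},
}

def get_parameters(query_type: str) -> dict:
    return dict(PARAMETERS[query_type])
```

Creative patterns are checked first so that "Write a story about what happened" is classified as creative rather than factual.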



Testing:


  • Test with at least 3 examples of each type

  • Show how temperature affects output (run same query with different temps)

  • Analyze the differences



Deliverable:


  • Query type detection function

  • Parameter configuration function

  • 9+ test examples (3 per type)

  • Temperature comparison showing different outputs

  • Analysis (200 words) on how parameters affect responses



Example output:



Query: "What is the capital of France?"
Type: factual
Parameters: temperature=0.0, max_tokens=150
Query: "Write a creative story about a robot"
Type: creative
Parameters: temperature=0.9, max_tokens=800





Task 3: Memory Testing and Context Preservation (15 Marks)


Objective: Verify that conversation memory actually works.


Requirements:


Design a conversation that tests memory:


  • Turn 1: User introduces information ("My name is X, I live in Y")

  • Turn 2-5: General conversation

  • Turn 6: Ask about the earlier information ("What's my name?")

  • Agent should remember!



Test multi-step reasoning:


  • Turn 1: "I have 5 apples"

  • Turn 2: "I bought 3 more"

  • Turn 3: "Then I ate 2"

  • Turn 4: "How many do I have now?"

  • Agent should track the math across turns
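One way to script the test above with the model call stubbed out, so you can verify that the full history actually reaches the model on the final turn. `run_scripted_conversation` and `recording_stub` are hypothetical helper names; real code would call the Claude API in place of the stub.

```python
def run_scripted_conversation(turns, call_model):
    """Replay scripted user turns; call_model(history) returns each reply."""
    history = []
    for user_msg in turns:
        history.append({"role": "user", "content": user_msg})
        reply = call_model(history)  # real code: Claude API call with history
        history.append({"role": "assistant", "content": reply})
    return history

# Stub that records what the model would see on each turn (no API call).
seen = []
def recording_stub(history):
    seen.append([m["content"] for m in history if m["role"] == "user"])
    return "ok"

apple_turns = [
    "I have 5 apples",
    "I bought 3 more",
    "Then I ate 2",
    "How many do I have now?",
]
run_scripted_conversation(apple_turns, recording_stub)

# On the final turn the model must see every earlier fact,
# which is what lets it compute 5 + 3 - 2 = 6.
assert "I have 5 apples" in seen[-1]
```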



Test what happens when context is trimmed:


  • Create conversation that exceeds limit

  • Ask about information from early messages (that got trimmed)

  • Agent should gracefully say it doesn't remember



Edge cases:


  • What if first message is very long (>1000 tokens)?

  • What if every message is 100 tokens (slow buildup to limit)?



Deliverable:


  • 3 test conversations demonstrating memory

  • Edge case examples

  • Analysis (200 words) on memory limitations and solutions





Task 4: Integrated RAG + Conversation Agent (30 Marks)


Objective: Build a complete agent combining everything.


Requirements:


Implement `ConversationalAgent` class:


  • __init__(knowledge_base)   # Initialize with knowledge base and conversation manager   

  • chat(user_message, use_rag=True)   # Main interface - handles everything   

  • determine_if_rag_needed(query)   # Decide: Does this need knowledge base?   

  • format_prompt(query, rag_context=None)   # Create final prompt with context   

  • get_session_stats()   # Return conversation and cost statistics



Smart RAG triggering:


  • Detect when user asks about knowledge base topics

  • Skip RAG for general chat ("Hello", "How are you?")

  • Use RAG for factual questions about your documents



Conversation flow:



User Query
   ↓
Add to Conversation History
   ↓
Detect Query Type → Set Parameters
   ↓
Check if RAG is Needed
   ↓
Retrieve Documents (if required)
   ↓
Format Prompt (Conversation History + RAG Context)
   ↓
Call Claude API
   ↓
Add Response to Conversation History
   ↓
Trim History (if needed)
   ↓
Return Response
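The flow can be condensed into a single chat() method. Everything below is a hedged skeleton under stated assumptions: retrieve() and call_llm() are stand-ins for your Assignment 1 vector search and the real Claude API call, and the greeting heuristic is illustrative.

```python
class ConversationalAgentSketch:
    """Skeleton of the chat() flow; swap the stubs for real components."""

    def __init__(self, knowledge_base):
        self.knowledge_base = knowledge_base  # e.g. your Assignment 1 vector DB
        self.history = []

    def determine_if_rag_needed(self, query):
        # Illustrative heuristic: skip RAG for short greetings/chat.
        greetings = ("hi", "hello", "how are you", "thanks")
        return not query.lower().strip("!?. ").startswith(greetings)

    def format_prompt(self, query, rag_context=None):
        parts = []
        if rag_context:
            parts.append(f"Context from knowledge base:\n{rag_context}")
        parts.append(query)
        return "\n\n".join(parts)

    def retrieve(self, query):
        # Stub: real code would embed the query and search the vector DB.
        return self.knowledge_base.get(query.lower(), "")

    def call_llm(self, messages):
        # Stub: real code would call the Claude API with the full history.
        return f"(reply to: {messages[-1]['content'][:40]})"

    def chat(self, user_message, use_rag=True):
        rag_context = None
        if use_rag and self.determine_if_rag_needed(user_message):
            rag_context = self.retrieve(user_message) or None
        prompt = self.format_prompt(user_message, rag_context)
        self.history.append({"role": "user", "content": prompt})
        reply = self.call_llm(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

The full implementation would also plug in detect_query_type/get_parameters from Task 2 before the API call, and trim the history via the Task 1 manager afterward.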



Testing:


  • Create a 10-turn conversation that:

  • Starts with general chat

  • Asks knowledge base questions (triggers RAG)

  • References earlier conversation

  • Includes follow-up questions

  • Tests memory and RAG together



Deliverable:


  • Complete ConversationalAgent class

  • 10-turn demonstration conversation

  • Examples showing RAG being triggered/skipped

  • Architecture explanation (300 words)



Example conversation:



Turn 1

User: Hi! I'm learning about Python.

Agent: Hello! That's great that you're learning Python. [No RAG needed — general greeting]

================================================

Turn 2

User: What are variables in Python?

Agent: [RAG triggered] Based on the documentation, variables in Python are used to store data values.

Sources: [doc2: Python variables]

================================================

Turn 3

User: Can you give me an example?

Agent: [Uses previous context] Sure! Building on what I just explained, here’s an example of variables in Python… [No RAG — answering from previous context]

================================================

Turn 4

User: What was I learning about?

Agent: You mentioned you're learning about Python. [Memory test — should remember Turn 1]




Task 5: Cost Tracking and Session Management (10 Marks)


Objective: Track costs across the entire conversation session.


Requirements:


Track for each turn:


  • Input tokens (conversation history + RAG context if used)

  • Output tokens (agent's response)

  • Cost for that turn

  • Cumulative cost


Implement session statistics:



{
    'total_turns': N,
    'total_input_tokens': X,
    'total_output_tokens': Y,
    'total_cost_usd': Z,
    'rag_queries': R,
    'average_cost_per_turn': A
}
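A minimal tracker that produces that statistics dict. The per-million-token prices are placeholder assumptions, not real Claude rates; substitute the current pricing for your chosen model.

```python
class CostTracker:
    # Placeholder per-million-token prices; check current Claude pricing.
    INPUT_PRICE_PER_MTOK = 3.00
    OUTPUT_PRICE_PER_MTOK = 15.00

    def __init__(self):
        self.turns = []

    def record_turn(self, input_tokens, output_tokens, used_rag=False):
        cost = (input_tokens * self.INPUT_PRICE_PER_MTOK
                + output_tokens * self.OUTPUT_PRICE_PER_MTOK) / 1_000_000
        self.turns.append({
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "used_rag": used_rag,
            "cost_usd": cost,
        })
        return cost

    def session_stats(self):
        n = len(self.turns)
        total = sum(t["cost_usd"] for t in self.turns)
        return {
            "total_turns": n,
            "total_input_tokens": sum(t["input_tokens"] for t in self.turns),
            "total_output_tokens": sum(t["output_tokens"] for t in self.turns),
            "total_cost_usd": round(total, 6),
            "rag_queries": sum(t["used_rag"] for t in self.turns),
            "average_cost_per_turn": round(total / n, 6) if n else 0.0,
        }
```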



Analyze cost factors:


  • How much does conversation history add to each turn?

  • How much does RAG add to costs?

  • What's the most expensive turn and why?



Projections:

If the average conversation is 15 turns and you serve 50 conversations per day, what is the projected monthly cost?
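The arithmetic for this projection, using the illustrative average per-turn cost from the example output in this task; all inputs are placeholders to be replaced with your measured numbers.

```python
# All inputs are illustrative; plug in your measured average cost per turn.
avg_cost_per_turn = 0.00245       # USD, from the example session output
turns_per_conversation = 15
conversations_per_day = 50
days_per_month = 30

cost_per_conversation = avg_cost_per_turn * turns_per_conversation
monthly_cost = cost_per_conversation * conversations_per_day * days_per_month
print(f"Projected monthly cost: ~${monthly_cost:.2f}")
```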



Deliverable:


  • Cost tracking implementation

  • Turn-by-turn cost breakdown

  • Monthly projection



Example output:



Turn 1: Input=245, Output=123 → $0.00256
Turn 2: Input=378, Output=145 → $0.00331 (cumulative: $0.00587)
Turn 3: Input=523, Output=156 → $0.00391 (cumulative: $0.00978)
...
Total session cost: $0.0245
Average per turn: $0.00245





Task 6: Final Demo and Edge Cases (15 Marks)


Objective: Create polished demo with error handling.


Requirements:


Create complete demo conversation:


  • 15+ turns

  • Mix of general chat + knowledge queries

  • Tests memory, RAG, and parameter control

  • Shows costs throughout



Handle edge cases:


  • Empty user input → Ask user to provide input

  • Very long input (>2000 tokens) → Warn or truncate

  • No RAG results found → Agent admits it doesn't know

  • API error → Graceful error message
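A sketch of a wrapper implementing the edge-case handling above. `safe_chat`, `estimate_tokens`, and the truncation policy are assumptions for illustration; real code would catch the Anthropic SDK's specific error types rather than bare Exception.

```python
MAX_INPUT_TOKENS = 2000

def estimate_tokens(text):
    # Crude placeholder estimate; swap in a real tokenizer.
    return max(1, len(text) // 4)

def safe_chat(agent_chat, user_input):
    """Wrap an agent's chat function with basic edge-case handling."""
    # Empty input: ask the user to provide something.
    if not user_input or not user_input.strip():
        return "Please provide some input."
    # Very long input: truncate (alternatively, warn and refuse).
    if estimate_tokens(user_input) > MAX_INPUT_TOKENS:
        user_input = user_input[: MAX_INPUT_TOKENS * 4]
    # API error: return a graceful message instead of crashing.
    try:
        return agent_chat(user_input)
    except Exception as exc:  # real code: catch the SDK's error classes
        return f"Sorry, something went wrong ({type(exc).__name__}). Please try again."
```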


Polish presentation:


  • Clear output formatting

  • Helpful labels ("Using RAG", "Trimmed context", "Cost: $X")

  • Summary at end



Deliverable:


  • 15-turn demo conversation

  • Edge case handling examples

  • Final summary of session





Bonus Tasks 


Bonus A: Conversation Summarization


  • When context limit is reached, use Claude to summarize old messages

  • Replace old messages with summary

  • Continue with summary + recent messages
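A summarize-and-replace sketch for this bonus. The default `summarize` is a stub; real code would send the old messages to Claude and use its summary. `compress_history` and `keep_recent` are hypothetical names.

```python
def compress_history(messages, keep_recent=4, summarize=None):
    """Replace all but the last keep_recent messages with one summary."""
    if len(messages) <= keep_recent:
        return list(messages)
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    if summarize is None:
        # Stub: real code would ask Claude to summarize `old`.
        summarize = lambda msgs: f"[Summary of {len(msgs)} earlier messages]"
    summary_msg = {"role": "system", "content": summarize(old)}
    return [summary_msg] + recent
```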




Bonus B: Adaptive Context Window


  • Start with small context window

  • Expand when conversation gets complex

  • Contract when simple chat




Bonus C: Export Conversation


  • Save full conversation to file (JSON or Markdown)

  • Include metadata (timestamps, costs, RAG usage)

  • Reload conversation in new session
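A JSON export/reload sketch for this bonus. The payload field names are assumptions; per-turn costs and RAG flags would come from the Task 5 tracker.

```python
import json
import time

def export_conversation(messages, stats, path):
    """Save the conversation plus metadata to a JSON file."""
    payload = {
        "exported_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "stats": stats,        # e.g. output of get_session_stats()
        "messages": messages,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)

def load_conversation(path):
    """Reload a saved conversation for use in a new session."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return payload["messages"], payload["stats"]
```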


________________________________________________________________________________


Deliverables


You must submit:


1. Code (Required)


Jupyter Notebook (.ipynb)


  • All code with implementation

  • Clear task sections

  • All outputs visible

  • Runs without errors




2. Knowledge Base (Optional)

If different from Assignment 1, then include your documents




3. Report (Required)

Short report (3–5 pages) with:


  • System architecture overview

  • Implementation approach for each task

  • Key findings and experiments

  • Challenges and solutions

  • Learnings and insights


Format: PDF





Submission Guidelines


Submit via: LMS (Moodle / Google Classroom)


File Naming: <YourName>_LLM_Assignment2.zip


Inside ZIP:

  • notebook.ipynb

  • report.pdf

  • knowledge_base/ (optional)


Deadline: 7 days from release


Late Policy:


  • <24hrs: -10%

  • 24-48hrs: -20%

  • >48hrs: Not accepted





Important Instructions


  • Build on Assignment 1 – Reuse your vector database and RAG system

  • Test thoroughly – Don't just assume things work

  • Comment your code – Explain your logic

  • Track costs – Monitor API usage

  • Handle errors – Don't let your code crash





Call to Action

Ready to transform your business with AI-powered intelligence that accelerates insights, enhances decision-making, and unlocks the full value of your data?


Codersarts is here to help you turn complex data workflows into efficient, scalable, and evidence-driven AI systems that empower teams to make smarter, faster, and more confident decisions.


Whether you’re a startup looking to build AI-driven products, an enterprise aiming to optimize operations through data science, or a research organization advancing innovation with intelligent data solutions, we bring the expertise and experience needed to design, develop, and deploy impactful AI systems that drive measurable business outcomes.




Get Started Today



Schedule an AI & Data Science Consultation:

Book a 30-minute discovery call with our AI strategists and data science experts to discuss your challenges, identify high-impact opportunities, and explore how intelligent AI solutions can transform your workflows and performance.




Request a Custom AI Demo:

Experience AI in action with a personalized demonstration built around your business use cases, datasets, operational environment, and decision workflows — showcasing practical value and real-world impact.









Transform your organization from data accumulation to intelligent decision enablement — accelerating insight generation, improving operational efficiency, and strengthening competitive advantage.


Partner with Codersarts to build scalable AI solutions including RAG systems, predictive analytics platforms, intelligent automation tools, recommendation engines, and custom machine learning models that empower your teams to deliver exceptional results.


Contact us today and take the first step toward next-generation AI and data science capabilities that grow with your business ambitions.



