A Complete Guide to Creating a Multi-Agent Book Writing System - Part 2
- ganesh90
- Jun 9
- 18 min read
Updated: Jun 10
Prerequisite: This is a continuation of the blog Part 1: A Complete Guide to Creating a Multi-Agent Book Writing System

🧠 LLMAgent: The Brain Behind All Agents
Imagine your AI project is a team of superheroes. Each one — the writer, the researcher, the planner — has a special power. But they all share the same brain: a powerful language model.
That shared brain? It’s set up by the LLMAgent.
LLM_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
class LLMAgent:
"""Base class for all agents using a language model."""
def init(self, model_name: str = LLM_MODEL):
"""Initialize the language model agent."""
logger.info(f"Initializing LLM agent with model {model_name}")
# Memory optimization configuration
load_options = {
"torch_dtype": torch.float16 if DEVICE == "cuda" else torch.float32,
"device_map": "auto",
"low_cpu_mem_usage": True
}
# The compression miracle
load_options["load_in_4bit"] = True
# Initialize our text-to-numbers translator
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
# Ensure we have a padding token
if self.tokenizer.pad_token is None:
self.tokenizer.pad_token = self.tokenizer.eos_token
# Load the brain with memory optimization
load_options["bnb_4bit_compute_dtype"] = torch.float16
self.model = AutoModelForCausalLM.from_pretrained(model_name, **load_options)
# Move to the best available device
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model.to(self.device)
logger.info(f"Model loaded on {self.device}")
Let’s see how it works, step by step 👇
🛠️ Choose Your Brain (aka the Model)
LLM_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
We are using TinyLlama, a lightweight but smart language model — perfect when you want decent performance without needing a GPU cluster.
🧩 Think of it as using a fuel-efficient car instead of a racecar — still gets the job done, just more economically.
🔌 Set Up the Agent
def __init__(self, model_name: str = LLM_MODEL):
When any agent that inherits from LLMAgent (like OutlineAgent or the upcoming WriterAgent) is initialized, this code runs.
📦 Configure the Model Loading Options
load_options = {
    "torch_dtype": torch.float16 if DEVICE == "cuda" else torch.float32,
    "device_map": "auto",
    "low_cpu_mem_usage": True
}
We are making smart decisions:
Use half precision on GPU to save memory (float16)
Let Hugging Face auto-map layers to devices
Reduce CPU RAM usage
🧠 Pro Tip: These options let you run big models on limited hardware.
🧪 4-bit Magic
load_options["load_in_4bit"] = True
load_options["bnb_4bit_compute_dtype"] = torch.float16
We are using 4-bit quantization, a compression trick that makes huge models run on smaller machines — like fitting an elephant into a backpack 🐘🎒.
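Note: the load_in_4bit path relies on the bitsandbytes package, and newer transformers releases prefer these settings bundled into a BitsAndBytesConfig object. Here is a minimal sketch of that variant, assuming bitsandbytes is installed:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hedged alternative: bundle the 4-bit settings into a config object
# instead of passing raw flags (newer transformers versions expect this).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.float16,  # run the math in float16
)
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=quant_config,
    device_map="auto",
)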
🔠 Tokenizer Setup
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
if self.tokenizer.pad_token is None:
    self.tokenizer.pad_token = self.tokenizer.eos_token
Tokenizer turns text into numbers (the only thing the model understands).
We make sure it has a padding token to handle batch processing without errors.
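Here is a tiny, hedged sanity check you can run on its own to see the pad-token fallback in action (the exact token string depends on the model):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # reuse the end-of-sequence token for padding
print(tokenizer.pad_token)  # e.g. the model's end-of-sequence marker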
🧠 Load the Model
self.model = AutoModelForCausalLM.from_pretrained(model_name, **load_options)
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model.to(self.device)
We load the model with all the settings we configured.
Then, move it to the best device — GPU if available, otherwise CPU.
🧩 This means every subclassed agent now has a smart LLM ready to go on the best hardware.
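To make that concrete, here is a simplified, illustrative subclass (not the real WriterAgent from this series) showing how any agent inherits the loaded model and the generate() helper we cover next:
class PoemAgent(LLMAgent):
    """Illustrative subclass: inherits the tokenizer, model, and generate() from LLMAgent."""

    def write_poem(self, topic: str) -> str:
        # self.generate() comes for free from the base class
        return self.generate(f"Write a four-line poem about {topic}.")

# agent = PoemAgent()                          # loads TinyLlama via LLMAgent.__init__
# print(agent.write_poem("machine learning"))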
The Text Generation Engine - Where Words Come to Life
def generate(self, prompt: str, max_length: int = 512) -> str:
    """Generate text based on the prompt."""
    logger.info(f"Generating text for prompt: {prompt[:50]}...")
    # Prepare the input for our AI brain
    tokenized_inputs = self.tokenizer(prompt, return_tensors="pt", padding=True)
    inputs = {
        'input_ids': tokenized_inputs.input_ids.to(self.device),
        'attention_mask': tokenized_inputs.attention_mask.to(self.device)
    }
    # Generate the magic
    with torch.no_grad():
        outputs = self.model.generate(
            inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_new_tokens=max_length,
            num_return_sequences=1,
            temperature=0.1,  # Low creativity for consistency
            top_p=0.80,  # Consider top 80% of possibilities
            do_sample=True,
            pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id
        )
    # Decode the AI's thoughts back to human language
    generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract only the new text (clever trick!)
    response = generated_text[len(self.tokenizer.decode(inputs['input_ids'][0], skip_special_tokens=True)):]
    logger.info(f"Generated {len(response)} characters")
    return response.strip()
Imagine if you could have a conversation with a computer that's read millions of books, understands countless topics, and can express thoughts as clearly as your smartest friend. What if I told you that the function we're about to explore is essentially teaching a machine to think out loud – taking a spark of an idea (your prompt) and turning it into coherent, intelligent text?
We're diving into one of the most magical functions in all of AI: text generation! This isn't just about making computers spit out words – this is about understanding how machines learn to communicate, reason, and express complex ideas in human language.
We're going to break down every single line of code that transforms your simple question into an AI's thoughtful response. By the end of this journey, you'll understand the secret sauce behind ChatGPT, writing assistants, and every AI that can hold a conversation! 🚀
Your AI's Voice Box
def generate(self, prompt: str, max_length: int = 512) -> str:
"""Generate text based on the prompt."""
Think of this function as the AI's voice box – the magical translator that takes your ideas and helps the AI express its thoughts in perfect human language!
This function is like having a brilliant conversation partner who:
Understands exactly what you're asking
Thinks carefully about the best response
Expresses their thoughts clearly and coherently
Never gets tired or loses focus
The Promise: Give me some text to start with (prompt) and tell me how much you want me to write (max_length), and I'll return beautifully crafted text that continues your thought!
The Thoughtful Announcer - "Let Me Think About This..."
logger.info(f"Generating text for prompt: {prompt[:50]}...")
Every great conversation starts with acknowledgment! 🤝
🍽️ Like a thoughtful dinner guest who says "That's an interesting question about machine learning..." before diving into their response. The AI is politely acknowledging what you've asked!
Why [:50]? The preview truncation is genius:
Full prompt: "Write a comprehensive guide to machine learning that covers all the basics and advanced topics for beginners and experts alike..."
Logged preview: "Write a comprehensive guide to machine learning..."
Benefits of this approach:
Keeps logs readable (no 1000-character prompt spam)
Preserves privacy (doesn't log sensitive full prompts)
Provides context (you can see what the AI is working on)
Helps debugging (track which prompts cause issues)
Pro Tip: 💡 This is a pattern you'll see in professional AI systems – always log enough context to debug problems, but not so much that you overwhelm your log files or compromise user privacy!
📺 Like a talk show host who says "So you're asking about artificial intelligence..." before giving their response – setting the stage for what's coming!
The Language Translator - Converting Human Thoughts to AI Understanding
# Prepare the input for our AI brain
tokenized_inputs = self.tokenizer(prompt, return_tensors="pt", padding=True)
inputs = {
    'input_ids': tokenized_inputs.input_ids.to(self.device),
    'attention_mask': tokenized_inputs.attention_mask.to(self.device)
}
This is where the magic begins! ✨ We're essentially translating human language into "AI language" – kind of like how you might convert English to French, except we're converting to math!
The Tokenization Process:
🔤 Step 1: Breaking Down the Language
tokenized_inputs = self.tokenizer(prompt, return_tensors="pt", padding=True)
🍕 Imagine you have a delicious sentence-pizza: "Machine learning is amazing!" The tokenizer carefully cuts it into perfect bite-sized pieces that the AI can digest:
Original: "Machine learning is amazing!"
Tokens: ["Machine", "learning", "is", "amazing", "!"]
Numbers: [15496, 6044, 374, 8056, 0]
What each parameter does:
return_tensors="pt": "Give me PyTorch tensors" (the format our AI model loves)
padding=True: "Make all sequences the same length" (like making sure all pizza slices fit in the same box)
🧠 Like a simultaneous interpreter at the UN who breaks down complex speeches into concepts that can be translated perfectly!
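Padding only matters when prompts of different lengths are batched together. A small hedged sketch (token counts vary by model, so the shapes in the comments are illustrative):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # make sure padding is possible

batch = tokenizer(
    ["Machine learning is amazing!", "Hi"],
    return_tensors="pt",
    padding=True,              # the short prompt is padded up to the long one's length
)
print(batch.input_ids.shape)   # e.g. torch.Size([2, 8]) -- both rows end up the same length
print(batch.attention_mask)    # the "Hi" row ends in 0s marking the padded positions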
The Smart Input Preparation:
🎯 Creating the Perfect Input Package:
inputs = {
    'input_ids': tokenized_inputs.input_ids.to(self.device),
    'attention_mask': tokenized_inputs.attention_mask.to(self.device)
}
📦 Like preparing a carefully labeled package for express delivery:
input_ids: The actual message content
What it contains: The numbers representing your text tokens
Like: The letter inside an envelope
attention_mask: The "pay attention here" instructions
What it contains: 1s and 0s telling the AI which parts are real content vs. padding
Like: A highlighter marking the important parts of a document
.to(self.device): The delivery address
What it does: Sends the data to the right processing unit (GPU or CPU)
Like: Making sure your package goes to the right warehouse for processing
🏭 Like a logistics company that automatically routes packages to the fastest processing center (GPU if available, CPU if not)!
Why this matters: AI models are picky eaters – they need their data served in exactly the right format, on the right device, with clear instructions about what to pay attention to!
The AI Brain Activation - "Let Me Think About This Carefully..."
# Generate the magic
with torch.no_grad():
    outputs = self.model.generate(
        inputs['input_ids'],
        attention_mask=inputs['attention_mask'],
        max_new_tokens=max_length,
        num_return_sequences=1,
        temperature=0.1,  # Low creativity for consistency
        top_p=0.80,  # Consider top 80% of possibilities
        do_sample=True,
        pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id
    )
This is the moment where artificial intelligence comes alive! 🧠⚡ Your prompt goes in, and the AI's neural networks light up with billions of calculations to craft the perfect response.
The Memory-Saving Wrapper:
🧠 The torch.no_grad() Context:
with torch.no_grad():
📚 Like telling your brain "We're just reading and responding, not studying for a test." This tells PyTorch "We're generating text, not learning new information, so don't waste memory tracking how to improve."
What this saves:
Memory usage: Doesn't store gradients (learning information)
Processing time: Skips unnecessary calculations
System stability: Prevents memory overflow on large generations
Pro Tip: 💡 Always use torch.no_grad() during inference (using the model) vs. training (teaching the model). It's like the difference between using a calculator vs. learning math!
The Generation Parameters - AI Personality Control:
🎛️ The AI's Personality Control Panel:
Each parameter is like adjusting different aspects of how your AI "thinks":
🌡️ temperature=0.1 - The Creativity Knob:
temperature=0.1 # Low creativity for consistency
⛅ Like the temperature of a conversation:
Low (0.1): Cool, calm, consistent responses (like a careful professor)
High (1.0+): Hot, creative, unpredictable responses (like an excited artist)
Real examples:
# Temperature 0.1 (our setting): "Machine learning is a subset of artificial intelligence that..."
# Temperature 1.0: "Machine learning? Oh wow, it's like teaching computers to dream!"
🎯 top_p=0.80 - The Consideration Filter:
top_p=0.80 # Consider top 80% of possibilities
🍽️ Like a smart waiter who only suggests the best 80% of menu items, ignoring the weird stuff that nobody orders.
How it works:
AI considers: "machine", "artificial", "computer" (top probability words)
AI ignores: "banana", "purple", "dinosaur" (low probability words)
Result: Coherent, sensible text that stays on topic
🎲 do_sample=True - The Dice Roll:
do_sample=True
📝 Instead of always picking the #1 most likely word (boring!), the AI rolls weighted dice among the top options, creating natural variation.
Without sampling: "The cat sat on the mat. The cat sat on the mat. The cat sat..."
With sampling: "The cat rested on the soft mat. The feline lounged comfortably..."
🔢 max_new_tokens=max_length - The Word Budget:
max_new_tokens=max_length
📱 Like telling the AI "You have 512 tokens to make your point – use them wisely!"
Why "new" tokens? It only counts the AI's response, not your original prompt. Smart!
🏁 num_return_sequences=1 - How Many Drafts:
num_return_sequences=1
Writer's Multiple Drafts: ✍️ Like asking a writer for one polished response instead of three different versions to choose from.
⏹️ pad_token_id - The Stop Sign:
pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id
🚦 Like having a smart traffic light that knows when to say "the conversation is complete" vs. "keep going."
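If you want to play with these knobs, here is a hedged sketch of a variant method you could drop into LLMAgent that exposes them as arguments (the method name and defaults are illustrative, not part of the original class):
def generate_tuned(self, prompt: str, max_length: int = 512,
                   temperature: float = 0.1, top_p: float = 0.80) -> str:
    """Illustrative variant of generate() with adjustable sampling knobs."""
    enc = self.tokenizer(prompt, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = self.model.generate(
            enc.input_ids.to(self.device),
            attention_mask=enc.attention_mask.to(self.device),
            max_new_tokens=max_length,
            temperature=temperature,  # nudge toward 1.0 for more adventurous wording
            top_p=top_p,              # widen toward 1.0 to consider more candidate words
            do_sample=True,
            pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id,
        )
    return self.tokenizer.decode(outputs[0], skip_special_tokens=True)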
The Mind Reader - Decoding AI Thoughts Back to Human Language
# Decode the AI's thoughts back to human language
generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
The reverse magic! 🪄 Now we translate the AI's mathematical thoughts back into beautiful human language!
The Decoding Process:
🌍 Like having a Star Trek universal translator that converts alien mathematical language back into perfect English!
What's happening:
AI Output: [15496, 6044, 374, 8056, 25, 362, 8147, 5507, ...]
Decoded Text: "Machine learning is amazing! It uses algorithms to..."
🔍 Breaking Down the Parameters:
outputs[0]: The first (and only) sequence we generated
Like: Picking the first draft from a stack of papers
skip_special_tokens=True: Hide the behind-the-scenes tokens
What it removes: [PAD], [CLS], [SEP] and other "backstage" tokens
Like: Editing out the "um" and "uh" from a speech recording
Result: Clean, readable text without AI housekeeping tokens
Magic Moment: 🎭 This is where pure mathematics becomes human communication! Billions of calculations resolve into words you can read and understand!
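A quick hedged comparison of the two decode modes, reusing the outputs tensor from the generate() call above (the exact markers you see, such as <s> or </s>, depend on the model's special tokens):
raw   = self.tokenizer.decode(outputs[0], skip_special_tokens=False)  # may include markers like <s> and </s>
clean = self.tokenizer.decode(outputs[0], skip_special_tokens=True)   # just the readable text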
The Clever Editor - Extracting Just the New Thoughts
# Extract only the new text (clever trick!)
response = generated_text[len(self.tokenizer.decode(inputs['input_ids'][0], skip_special_tokens=True)):]
This is one of the most clever bits of code you'll ever see! 🧠✨
The Problem This Solves:
Echo Chamber Issue: 📢 The AI model returns EVERYTHING – your original prompt + its response:
Your prompt: "Write about machine learning:"
AI returns: "Write about machine learning: Machine learning is a fascinating field that..."
What you want: "Machine learning is a fascinating field that..."
🏥 Like a surgeon who carefully removes only the appendix, leaving everything else perfectly intact!
Step-by-step breakdown:
inputs['input_ids'][0]: Get the original prompt tokens
self.tokenizer.decode(...): Convert back to text to measure length
len(...): Count characters in original prompt
generated_text[len(...):]: Slice off everything from that point forward
String Slicing Magic:
generated_text = "Write about ML: Machine learning is amazing!"
original_length = len("Write about ML: ") # 16 characters
response = generated_text[16:] # "Machine learning is amazing!"
📰 Like an editor who removes the interview question and keeps only the expert's answer for publication!
Why this is brilliant:
Clean responses without prompt repetition
Exact extraction using mathematical precision
Language agnostic (works in any language)
Token-perfect accuracy (no guessing where to cut)
Pro Tip: 💡 This technique works because tokenization is deterministic – the same text always produces the same tokens, so we can measure exactly!
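An equally valid (and arguably more robust) alternative, offered here as a hedged sketch rather than the author's method, is to slice at the token level before decoding, since generate() returns the prompt tokens followed by the new ones. This reuses the same inputs and outputs variables from generate():
# Drop the prompt tokens first, then decode only what the model added.
prompt_token_count = inputs['input_ids'].shape[1]         # tokens in the original prompt
new_tokens = outputs[0][prompt_token_count:]               # the model's continuation only
response = self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()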
The Accomplishment Announcer - "Mission Accomplished!"
logger.info(f"Generated {len(response)} characters")
return response.strip()
The victory lap! 🏆 Time to celebrate the successful transformation of your idea into AI-generated text!
The Performance Reporter:
📊 The Character Counter:
logger.info(f"Generated {len(response)} characters")
🏎️ Like a race car driver announcing their lap time – it helps you understand performance and spot potential issues!
What the numbers tell you:
0-50 characters: "Hmm, very short response – maybe the prompt was unclear?"
200-800 characters: "Perfect! Rich, detailed response"
1000+ characters: "Wow, the AI was really inspired by this prompt!"
Debugging gold: 🔍 When responses seem weird, check the character count first – it often reveals the problem!
The Final Polish:
✨ The .strip() Cleanup:
return response.strip()
✂️ Like a professional editor who removes trailing spaces and makes everything look perfectly formatted!
What .strip() removes:
Leading spaces: " Hello world" → "Hello world"
Trailing spaces: "Hello world " → "Hello world"
Newlines: "\nHello world\n" → "Hello world"
Why this matters: Professional text generation should never have messy whitespace at the edges!
📚 OutlineAgent: Your Book's Blueprint Architect
Think of OutlineAgent as the planner or architect for your machine learning book. Before anyone writes a single word, this agent figures out what the chapters will be and what each one should talk about.
import re
import json
from typing import Dict, List

class OutlineAgent(LLMAgent):
    """Agent responsible for creating the book outline."""

    def create_outline(self) -> List[Dict[str, str]]:
        """Create a detailed outline for the ML book."""
        logger.info("Creating book outline")
        prompt = """
        Create a detailed outline for a 3-chapter book titled "Introduction to Machine Learning".
        For each chapter, provide a title and a brief description of the content.
        The chapters should cover:
        1. What is Machine Learning
        2. Supervised Learning
        3. Unsupervised Learning
        Format your response as a JSON array of dictionaries, where each dictionary has 'title' and 'description' keys.
        """
        response = self.generate(prompt, max_length=1000)
        # Parse the AI's response with multiple fallbacks
        try:
            # Try to extract JSON from the messy AI output
            json_pattern = r'\[\s*(\{.*?\}\s*,?\s*)+\]'
            json_match = re.search(json_pattern, response, re.DOTALL)
            if json_match:
                json_str = json_match.group(0)
                outline = json.loads(json_str)
                print(outline)
            else:
                # First fallback - predefined outline
                logger.warning("Could not parse JSON from response, using default outline")
                outline = [
                    {
                        "title": "What is Machine Learning",
                        "description": "Introduction to machine learning concepts."
                    },
                    {
                        "title": "Supervised Learning",
                        "description": "Understanding supervised learning algorithms."
                    },
                    {
                        "title": "Unsupervised Learning",
                        "description": "Exploring unsupervised learning methods."
                    },
                ]
        except Exception as e:
            logger.error(f"Error parsing outline: {e}")
            # Second fallback - detailed predefined outline
            outline = [
                {
                    "title": "What is Machine Learning",
                    "description": "Introduction to machine learning concepts, history, and applications."
                },
                {
                    "title": "Supervised Learning",
                    "description": "Understanding supervised learning algorithms and techniques."
                },
                {
                    "title": "Unsupervised Learning",
                    "description": "Exploring unsupervised learning methods and clustering."
                },
            ]
        logger.info(f"Created outline with {len(outline)} chapters")
        return outline
🧠 Ask the AI Nicely
prompt = """
Create a detailed outline for a 3-chapter book titled "Introduction to Machine Learning".
...
"""
We will craft a very specific prompt — like giving a task to an assistant:
"Please outline this book in 3 chapters, give me a title and a short description for each."
And just like that, your LLM (Large Language Model) knows exactly what you're expecting — formatted in JSON, no less!
🛠 Let the AI Generate
response = self.generate(prompt, max_length=1000)
This sends the prompt to the model, which generates a response — ideally a nice, clean JSON list like this:
[
  {"title": "What is Machine Learning", "description": "Intro..."},
  ...
]
🧹 Clean Up AI Mess (If Needed)
AIs aren't always neat. So we've included a smart rescue plan:
json_pattern = r'\[\s*(\{.*?\}\s*,?\s*)+\]'
json_match = re.search(json_pattern, response, re.DOTALL)
This clever use of regular expressions tries to extract just the JSON part from the response — even if it’s wrapped in extra text like:
"Sure! Here's your outline: \n [ ... ]"
🚨 Fallback Plans — Just in Case
Let's say the AI response is a mess, or it fails JSON parsing (it happens!). We've got not one, but two fallback plans:
Fallback 1: Default Short Outline
outline = [
    {"title": "What is Machine Learning", ...}
]
If the regex can't find a JSON block in the response, you still return something reasonable instead of failing.
Fallback 2: More Detailed Outline - used if an exception is raised while parsing
This one provides even more helpful descriptions. It's like saying, "Hey, if the assistant messes up, we will step in and hand-write the outline ourselves."
📊 Logging Every Step
We log everything using logger.info, logger.warning, and logger.error, which is:
Great for debugging
Helpful for monitoring behavior in real-time
logger.info("Creating book outline")
logger.warning("Could not parse JSON from response, using default outline")
logger.error(f"Error parsing outline: {e}")
🔍 Agent 2 - Meet the Detective of Your Writing Team: ResearcherAgent
We're diving into the ResearcherAgent – an AI investigator that combines the analytical skills of Sherlock Holmes with the speed of a supercomputer and the thoroughness of the world's best librarian.
This is about intelligent information discovery that understands context, relevance, and meaning. We will build AI systems that can research any topic with real precision! 🚀
# RAGSystem, Document, and TOP_K_RESULTS come from the retrieval setup built in Part 1.
class ResearcherAgent:
    """Agent responsible for retrieving relevant information for each chapter."""

    def __init__(self, rag_system: RAGSystem):
        """Initialize the researcher agent with a RAG system."""
        logging.info("Initializing researcher agent")
        self.rag = rag_system

    def research(self, chapter: Dict[str, str]) -> List[Document]:
        """Research content for a specific chapter."""
        title = chapter["title"]
        description = chapter["description"]
        logging.info(f"Researching content for chapter: {title}")
        # Construct a smart search query
        query = f"{title}. {description}"
        # Retrieve the most relevant documents
        documents = self.rag.retrieve(query, k=TOP_K_RESULTS)
        # Show the user what we found (transparency is key!)
        print(f"\n\n{'='*80}")
        print(f"RESEARCH RESULTS FOR CHAPTER: {title}")
        print(f"{'='*80}")
        for i, doc in enumerate(documents):
            print(f"\nDocument {i+1}:")
            print(f"Source: {doc.metadata.get('source', 'Unknown')}")
            content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
            print(f"Content: {content_preview}")
            print('-' * 50)
        logging.info(f"Retrieved {len(documents)} documents for chapter {title}")
        return documents
The Detective Class: Your AI Research Partner
class ResearcherAgent:
"""Agent responsible for retrieving relevant information for each chapter."""
Think of this class as hiring the world's most competent research assistant – someone who has access to every document in your library, understands exactly what makes information relevant, and can work 24/7 without ever getting bored or distracted!
This agent embodies the perfect research workflow:
Understands your request with contextual intelligence
Searches systematically through vast information repositories
Evaluates relevance using sophisticated AI algorithms
Presents findings with complete transparency
Maintains detailed records of the investigation process
🎓 Like having a brilliant graduate student who specializes in your exact field, has read every paper in the library, and can instantly recall the most relevant information for any question you ask!
The Detective's Badge - Establishing Credentials
def __init__(self, rag_system: RAGSystem):
    """Initialize the researcher agent with a RAG system."""
    logging.info("Initializing researcher agent")
    self.rag = rag_system
🎯 The Initialization Process:
👮♀️ Like a detective graduating from the academy and being assigned their first patrol car (the RAG system) and badge number (logging confirmation).
📝 The Official Announcement:
logging.info("Initializing researcher agent")
What this does: Creates a permanent record that a new research detective has joined the force!
🧰 The Equipment Assignment:
self.rag = rag_system
🔍 Like assigning a detective their standard equipment:
Badge (authorization to access information)
Radio (communication with the central database)
Forensics kit (tools to analyze and understand documents)
Case files (access to all available evidence)
What self.rag represents:
The detective's memory palace - instant access to all documents
The search warrant - permission to investigate any topic
The evidence database - organized, searchable information
The analysis lab - tools to understand document relevance
Dependency Injection Pattern: 🏗️ This is a sophisticated design pattern where we "inject" the RAG system as a dependency. Like giving the detective access to the police database – they don't need to build it themselves, they just need to know how to use it!
Pro Tip: 💡 Notice how the ResearcherAgent doesn't create its own RAG system – it receives one that's already been set up. This is brilliant because:
Separation of concerns (research vs. document management)
Reusability (same RAG system can serve multiple agents)
Testability (can inject mock systems for testing)
Flexibility (can swap in different RAG implementations)
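As a quick, hedged illustration of the testability point, you can hand the agent a stand-in object that exposes the same retrieve() interface; the FakeRAG class below is purely hypothetical test scaffolding, not part of the real system:
from types import SimpleNamespace

class FakeRAG:
    """Hypothetical stand-in mimicking the RAGSystem retrieve() interface for tests."""

    def retrieve(self, query, k=5):
        # Canned "documents" shaped like the real ones (page_content + metadata),
        # so ResearcherAgent.research() can run without a real vector store.
        return [
            SimpleNamespace(page_content=f"stub content about: {query}",
                            metadata={"source": "test_fixture.txt"})
            for _ in range(k)
        ]

researcher = ResearcherAgent(rag_system=FakeRAG())  # same agent code, fake backend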
The Case Assignment - Understanding the Mission
def research(self, chapter: Dict[str, str]) -> List[Document]:
    """Research content for a specific chapter."""
    title = chapter["title"]
    description = chapter["description"]
    logging.info(f"Researching content for chapter: {title}")
The moment our detective receives their case assignment! 📋 Time to understand exactly what we're investigating.
The Case File Analysis:
🗂️ Extracting Key Information:
logger.info("Creating book outline")
logger.warning("Could not parse JSON from response, using default outline")
logger.error(f"Error parsing outline: {e}")
title = chapter["title"]
description = chapter["description"]
📑 Like a detective opening a new case file and immediately extracting the crucial details:
Example case file:
chapter = {
    "title": "Supervised Learning",
    "description": "Understanding supervised learning algorithms and techniques"
}
# Detective extracts:
title = "Supervised Learning"  # The main subject of investigation
description = "Understanding supervised learning algorithms and techniques"  # The specific angle
Why separate these? 🤔
Title: The broad topic (like "robbery case")
Description: The specific focus (like "jewelry store robbery with security footage")
Combined: They'll form the perfect search strategy
📢 The Case Announcement:
logging.info(f"Researching content for chapter: {title}")
📻 "Detective Smith here, beginning investigation into the Supervised Learning case. All units be advised."
What this accomplishes:
Transparency for users watching the process
Debugging breadcrumbs when tracing system behavior
Performance monitoring (how long does each research task take?)
Professional documentation of the investigation process
Real-world log example:
2024-01-15 14:32:15 - INFO - Researching content for chapter: Supervised Learning
2024-01-15 14:32:16 - INFO - Researching content for chapter: Unsupervised Learning
2024-01-15 14:32:17 - INFO - Researching content for chapter: Neural Networks
Pro Tip: 💡 The f-string formatting (f"Researching content for chapter: {title}") is both readable and efficient. It's like having a template for police reports that automatically fills in the suspect's name!
The Smart Query Construction - Crafting the Perfect Investigation Strategy
# Construct a smart search query
query = f"{title}. {description}"
This single line represents sophisticated information science! 🧠✨ Our detective is crafting the perfect question to get the best possible evidence.
✅ Smart approach (title + description):
query = "Supervised Learning. Understanding supervised learning algorithms and techniques"
# Finds: Focused articles about algorithms, technique explanations, implementation guides
The Magic of Context: 🪄
Vague order: "I want pasta" (gets you random pasta dish)
Specific order: "I want pasta. Something with mushrooms and cream sauce" (gets you exactly what you're craving)
How this improves search results:
🎪 The Semantic Magic:
Title provides topic scope ("We're talking about Supervised Learning")
Description adds context ("Specifically about algorithms and techniques")
Combined query leverages semantic search (finds conceptually related content)
RAG system understands intent (returns highly relevant documents)
🎓 Like formulating a research question that combines the broad field ("Machine Learning") with the specific investigation ("algorithm effectiveness in supervised tasks").
Trivia Time: 🤓 Modern AI search systems are trained on natural language, so well-formed sentences often perform better than keyword salad!
The Investigation - Deploying the Detective's Tools
# Retrieve the most relevant documents
documents = self.rag.retrieve(query, k=TOP_K_RESULTS)
This is where our detective puts on their investigative hat and gets to work! 🕵️♂️ Time to deploy the sophisticated search technology and find the best evidence.
The High-Tech Investigation:
🔬 CSI: Document Investigation:
🚨 Like a CSI team deploying their most advanced forensic tools to analyze evidence – except instead of DNA analysis, we're doing semantic similarity analysis!
What happens behind the scenes:
Query analysis - Understanding what we're really looking for
Vector conversion - Turning the question into mathematical form
Similarity search - Finding documents with related "mathematical fingerprints"
Relevance ranking - Ordering results by how well they match
Top-K selection - Returning the best evidence (usually 15 documents)
🎯 The k=TOP_K_RESULTS Parameter:
👥 Like assembling a task force of the top 15 most relevant experts for your case, rather than interviewing everyone in the city!
Why TOP_K_RESULTS (usually 15)?
Quality over quantity (15 highly relevant docs beat 100 random ones)
Processing efficiency (manageable amount for AI to synthesize)
Attention limits (AI models have context window constraints)
User experience (enough depth without information overload)
Database Query Comparison: 💾
Traditional database (keyword matching)
SELECT * FROM documents WHERE content LIKE '%supervised learning%'
Our RAG system (semantic understanding)
rag.retrieve("Supervised Learning. Understanding algorithms...", k=15)
Find documents about: ML algorithms, training methods, classification techniques even if they don't contain the exact words "supervised learning"!
The Retrieval Magic: ✨
What makes this retrieval special:
Semantic understanding (finds meaning, not just keywords)
Context awareness (understands the relationship between concepts)
Relevance scoring (ranks results by how well they match)
Diversity balancing (avoids returning 15 nearly-identical documents)
Library Research Comparison: 📚
Traditional library: "Find all books with 'supervised learning' in the title"
Our AI detective: "Find the most relevant information about supervised learning algorithms and techniques, regardless of exact wording"
Pro Tip: 💡 The RAG system doesn't just find documents that contain your exact words – it finds documents that are conceptually related to your query. This is why modern AI search is so much more powerful than traditional keyword search!
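To demystify the "mathematical fingerprints" idea, here is a toy, hedged sketch of ranking by cosine similarity; the three-dimensional vectors are invented for illustration, while the real RAG system from Part 1 gets its embeddings from a proper embedding model:
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means "pointing the same way" (very similar), 0.0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.0])  # stand-in embedding for "supervised learning algorithms"
doc_vecs = {
    "guide to labeled training data": np.array([0.8, 0.2, 0.1]),
    "banana bread recipe":            np.array([0.0, 0.1, 0.9]),
}
ranked = sorted(doc_vecs.items(),
                key=lambda item: cosine_similarity(query_vec, item[1]),
                reverse=True)
print(ranked[0][0])  # -> "guide to labeled training data" (conceptually closest)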
The Evidence Presentation - Transparency in Action
# Show the user what we found (transparency is key!)
print(f"\n\n{'='*80}")
print(f"RESEARCH RESULTS FOR CHAPTER: {title}")
print(f"{'='*80}")
for i, doc in enumerate(documents):
    print(f"\nDocument {i+1}:")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
    print(f"Content: {content_preview}")
    print('-' * 50)
This is where our detective becomes a master communicator! 📢 Time to present the evidence with complete transparency and professional formatting.
The Professional Case Report:
🎪 The Dramatic Header:
print(f"\n\n{'='*80}")
print(f"RESEARCH RESULTS FOR CHAPTER: {title}")
print(f"{'='*80}")
⚖️ Like a lawyer dramatically presenting evidence: "Ladies and gentlemen of the jury, I present to you the evidence for the case of Supervised Learning!"
Visual formatting breakdown:
\n\n: Creates breathing room (like a dramatic pause)
{'='*80}: Creates a solid line of 80 equal signs (professional border)
Title in caps: Commands attention (this is important!)
Symmetrical borders: Professional document formatting
Example output:
================================================================================
RESEARCH RESULTS FOR CHAPTER: Supervised Learning
================================================================================
🗂️ The Evidence Catalog:
for i, doc in enumerate(documents):
    print(f"\nDocument {i+1}:")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
    print(f"Content: {content_preview}")
    print('-' * 50)
🏛️ Like a detective walking the jury through the evidence room, showing each piece of evidence with detailed explanations.
🔍 Breaking Down the Evidence Presentation:
The Document Counter:
for i, doc in enumerate(documents):
    print(f"\nDocument {i+1}:")
Exhibit Labeling: Like numbering evidence in court: "Exhibit A", "Exhibit B", etc. The enumerate() function automatically counts from 0, but i+1 makes it human-friendly (1, 2, 3...).
The Source Citation:
print(f"Source: {doc.metadata.get('source', 'Unknown')}")
Academic Integrity: 📚 Like citing your sources in a research paper. The .get('source', 'Unknown') is defensive programming – if metadata is missing, we gracefully show "Unknown" instead of crashing.
The Smart Content Preview:
content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
print(f"Content: {content_preview}")
Movie Trailer Strategy: 🎬 Like showing a compelling trailer instead of the full movie – give enough content to understand relevance without overwhelming the viewer.
The logic breakdown:
If document > 200 characters: Show first 200 + "..." (teaser)
If document ≤ 200 characters: Show the whole thing (complete picture)
Result: Consistent, readable output regardless of document length
The Visual Separator:
print('-' * 50)
Clean Organization: Like putting each piece of evidence in its own labeled box. Creates visual separation between documents for easy reading.
The Complete Evidence Presentation Example:
================================================================================
RESEARCH RESULTS FOR CHAPTER: Supervised Learning
================================================================================
Document 1:
Source: dataset/ml_fundamentals.pdf
Content: Supervised learning is a machine learning paradigm where algorithms learn from labeled training data to make predictions on new, unseen data. This approach requires...
--------------------------------------------------
Document 2:
Source: dataset/algorithms_guide.txt
Content: Classification and regression are the two main types of supervised learning tasks. In classification, the goal is to predict discrete categories or classes...
--------------------------------------------------
Document 3:
Source: dataset/neural_networks.md
Content: Neural networks can be applied to supervised learning problems by adjusting weights and biases through backpropagation during the training process...
--------------------------------------------------
Why This Presentation is Brilliant: ✨
🎯 User Benefits:
Complete transparency (see exactly what the AI found)
Source verification (check the credibility of information)
Content preview (understand relevance before diving deeper)
Professional formatting (easy to read and understand)
🔧 Developer Benefits:
Debugging gold (see exactly what the search returned)
Quality validation (spot irrelevant or low-quality results)
Performance insight (understand search effectiveness)
Trust building (users can verify the AI's work)
🎓 Like a perfectly formatted bibliography where you can see not just the source, but also a preview of the relevant content from each source!
The Case Closure - Professional Documentation
logging.info(f"Retrieved {len(documents)} documents for chapter {title}")
return documents
Every great detective closes their case with professional documentation! 📋 Time to record the results and deliver the evidence to the client.
The Official Case Report:
📊 The Statistical Summary:
logging.info(f"Retrieved {len(documents)} documents for chapter {title}")
Police Report Completion: 📝 Like a detective filing their final report: "Investigation complete. Retrieved 15 pieces of evidence for the Supervised Learning case."
Why this logging matters:
📈 Performance Monitoring:
0 documents: "Houston, we have a problem!" (search failed)
1-5 documents: "Limited evidence found" (might need broader search)
10-15 documents: "Perfect investigation!" (good depth of evidence)
15+ documents: "Jackpot!" (rich information available)
🐛 Debugging Intelligence:
INFO: Retrieved 15 documents for chapter Supervised Learning
INFO: Retrieved 12 documents for chapter Neural Networks
INFO: Retrieved 14 documents for chapter Deep Learning
# Problem indicator:
INFO: Retrieved 0 documents for chapter Quantum Computing
📋 Audit Trail Creation: This creates a permanent record of:
When the research was conducted
What was being researched
How many documents were found
Which chapter the research supports
🎁 The Evidence Delivery:
return documents
📦 Like a detective carefully handing over all the collected evidence to the prosecutor (in our case, the Writer Agent who will use this research).
What gets returned:
Organized document collection (ready for immediate use)
Metadata preserved (source information intact)
Relevance-ranked order (best evidence first)
Processing-ready format (no additional cleanup needed)
The Return Type Promise: 🤝
-> List[Document]
This type hint is like a contract: "I promise to return a list of Document objects, never None, never a string, always a proper list you can iterate over safely."
Part 3 is available at: https://www.codersarts.com/post/a-complete-guide-to-creating-a-multi-agent-book-writing-system-part-3
Transform Your Projects with Codersarts
Whether you're looking to implement RAG systems for your organization, need help with complex AI projects, or want to build custom multi-agent systems, the experts at Codersarts are here to help. From academic assignments to enterprise-level AI solutions, we provide:
Custom RAG Implementation: Tailored document processing and retrieval systems
Multi-Agent System Development: Complex AI workflows for your specific needs
AI Training & Consulting: Learn to build and deploy production-ready AI systems
Research Support: Get help with cutting-edge AI research and development
Don't let complex AI implementations slow down your innovation. Connect with Codersarts today and turn your AI ideas into reality!
Ready to get started? Visit Codersarts.com or reach out to our team to discuss your next AI project. The future of intelligent automation is here – let's build it together!
