Building a Complete RAG Search and Answer System

Mar 24
5 min read

Course: RAG from Scratch

Level: Medium to Advanced

Type: Individual

Duration: 7 to 10 days

Objective

This assignment tests your ability to build the retrieval and generation stages of a RAG pipeline from scratch. You will implement cosine similarity without external vector search libraries, build a similarity search function, design a grounding-focused prompt template, and assemble a complete end-to-end RAG system that retrieves context and generates accurate, grounded answers. You will also implement and test refusal logic for out-of-scope queries.

Tasks

Task 1: Embedding Generation (15 marks)

Load the chunks.json file produced in Assignment 1 (or use any chunked knowledge base with at least 30 chunks).
Write a get_embeddings() function that calls the OpenAI API using text-embedding-3-small and returns a list of 1536-dimensional vectors.
Embed all chunks and store the result as a list of dictionaries, where each dictionary contains the original chunk fields plus an embedding key.
Print the total number of chunks embedded, the vector dimension, and an estimate of the API tokens consumed.

Task 2: Cosine Similarity from Scratch (15 marks)

Implement a cosine_similarity(vec_a, vec_b) function using only Python's built-in math module. Do not use NumPy or any vector library.
Verify your implementation with three test cases: identical vectors (expected: 1.0), perpendicular vectors (expected: 0.0), and a known pair of semantically similar sentences.
Print the test results showing the expected and actual values for each case.

Task 3: Similarity Search Function (20 marks)

Write a similarity_search(query, embedded_chunks, top_k=3, threshold=0.3) function that embeds the query, computes cosine similarity against every chunk, and returns the top_k results above the threshold.
Each result must include the chunk text, similarity score, source document filename, and chunk index.
Test your function with at least five diverse queries spanning different topics in your knowledge base. Display the top 3 results for each query.
For one query, display the full similarity score distribution across all chunks (not just the top 3) and comment on what the distribution tells you about retrieval quality.

Task 4: RAG Prompt Template (15 marks)

Design a system prompt that contains at least five explicit grounding rules. The rules must cover: context-only answers, no use of prior knowledge, refusal when the answer is not found, source citation, and handling partial information.
Write a build_rag_prompt(query, retrieved_chunks) function that formats the retrieved chunks and query into the correct message structure for the OpenAI chat API.
Print the fully formatted prompt for one example query so the structure is clearly visible.

Task 5: Complete RAG Pipeline (20 marks)

Write a rag_query(query, embedded_chunks) function that calls similarity_search(), calls build_rag_prompt(), calls the OpenAI chat API using gpt-4o-mini-2024-07-18, and returns the answer along with the sources used.
Run the pipeline on at least six queries: three where the answer is clearly in the knowledge base, and three that are out of scope.
For two queries, compare the RAG answer against a direct LLM answer (same question without retrieval). Display both answers side by side and explain the difference.

Task 6: Refusal Logic and Threshold Analysis (15 marks)

Implement a similarity threshold in your pipeline. If the highest similarity score is below the threshold, the system must return a clear refusal message rather than generating an answer.
Test the refusal logic with at least three out-of-scope queries and confirm the system refuses each one correctly.
Experiment with threshold values of 0.20, 0.30, and 0.40 using the same set of queries. Record which queries pass and fail at each threshold and write a short analysis (100 to 150 words) recommending the best threshold for your knowledge base.

Evaluation Rubric

Criteria	Marks
Embedding Generation	15
Cosine Similarity Implementation	15
Similarity Search Function	20
RAG Prompt Template	15
Complete RAG Pipeline	20
Refusal Logic and Threshold Analysis	15
Total	100

Deliverables

A Jupyter Notebook (.ipynb) containing all code, outputs, test results, and markdown explanations.
An embedded_chunks.json file containing all chunks with their embedding vectors.
A side-by-side comparison table (in the notebook) showing RAG vs direct LLM answers for two queries.
A threshold analysis section (100 to 150 words) with a recommended threshold value and justification.

Submission Guidelines

Submit your work via the course LMS (for example, Moodle or Google Classroom).

File Naming Convention: <YourName>_RAG_Assignment2.zip

Inside the ZIP:

notebook.ipynb
embedded_chunks.json
comparison_table.pdf (or included in the notebook)

Deadline: 7 days from the date of release.

Late Submission Policy

Up to 24 hours late: 10% penalty applied to the final mark.
24 to 48 hours late: 20% penalty applied to the final mark.
Beyond 48 hours: submission will not be accepted.

Important Instructions

You must implement cosine similarity from scratch in Task 2. Using numpy.dot or scipy.spatial.distance will not receive full marks for that task.
For Task 5, the with-retrieval and without-retrieval answers must use the same model (gpt-4o-mini-2024-07-18) to make the comparison fair.
Your notebook must be fully runnable from top to bottom without errors. Use a .env file for the API key and load it with python-dotenv.
Structure-aware markdown cells are expected: use headers to separate each task and include brief explanations before and after each code block.
Plagiarism of any kind will result in disqualification from the assignment.

Guidance and Tips

Verify your cosine similarity implementation against known values before using it in the search function.
Test your search function with queries that are phrased differently from the document text to confirm it works semantically, not just by keyword match.
When designing the system prompt, think about edge cases: what should the model do if the context is partially relevant? What if two chunks contradict each other?
Do not just implement — analyse deeply. A well-reasoned threshold analysis with clear evidence will score higher than one with a correct threshold but no explanation.
Think from a user perspective. A RAG system that refuses correctly is more trustworthy than one that always produces an answer.

Bonus (Optional — up to +10 Marks)

Extend the rag_query() function to cite the exact source document and chunk index in every answer.
Add a re-ranking step that scores retrieved chunks by relevance before passing them to the prompt.
Visualise similarity score distributions for multiple queries as overlapping histograms and comment on what patterns you observe.

Instructor Note

This assignment is intentionally open-ended in places. In real-world RAG systems, there is no single correct threshold, prompt, or pipeline design. What matters is that you can explain your choices, back them up with evidence from your test results, and demonstrate a clear understanding of why each component of the pipeline exists. A thoughtful analysis of a moderate implementation will always score better than a high-quality implementation with no explanation.

Call to Action

Ready to transform your business with AI-powered intelligence that accelerates insights, enhances decision-making, and unlocks the full value of your data?

Codersarts is here to help you turn complex data workflows into efficient, scalable, and evidence-driven AI systems that empower teams to make smarter, faster, and more confident decisions.

Whether you’re a startup looking to build AI-driven products, an enterprise aiming to optimize operations through data science, or a research organization advancing innovation with intelligent data solutions, we bring the expertise and experience needed to design, develop, and deploy impactful AI systems that drive measurable business outcomes.

Get Started Today

Schedule an AI & Data Science Consultation:

Book a 30-minute discovery call with our AI strategists and data science experts to discuss your challenges, identify high-impact opportunities, and explore how intelligent AI solutions can transform your workflows and performance.

Request a Custom AI Demo:

Experience AI in action with a personalized demonstration built around your business use cases, datasets, operational environment, and decision workflows — showcasing practical value and real-world impact.

Email: contact@codersarts.com

Transform your organization from data accumulation to intelligent decision enablement — accelerating insight generation, improving operational efficiency, and strengthening competitive advantage.

Partner with Codersarts to build scalable AI solutions including RAG systems, predictive analytics platforms, intelligent automation tools, recommendation engines, and custom machine learning models that empower your teams to deliver exceptional results.

Contact us today and take the first step toward next-generation AI and data science capabilities that grow with your business ambitions.