Build an AI Quiz Generator with GPT-4o-mini, FastAPI, and React: Architecture Deep Dive

The Problem Every Learner Knows Too Well
You finish reading a dense chapter, sit through a two-hour YouTube lecture, or download a 60-page PDF from your professor. You understand the material — or at least you think you do. But actually testing that understanding? That means manually writing questions, hunting down a quiz platform, and spending more time on question creation than on learning itself. Most people skip the self-testing step entirely, which is exactly why retention suffers.
The AI Quiz Generator solves this. It is a full-stack web application that accepts text, PDFs, YouTube URLs, or images as input and automatically generates structured quizzes — multiple-choice, true/false, or fill-in-the-blank — using GPT-4o-mini. Results appear instantly in a clean React UI, and every quiz you generate can be saved to a local history for later review.
Real-world use cases include:
Students turning handwritten class notes or downloaded slides into practice quizzes
Teachers generating review questions directly from PDF reading materials
Bootcamp learners building comprehension checks out of YouTube coding tutorials
Course creators converting long-form content into embedded assessments
Tutors running lightweight quiz workflows for their students without an LMS
Solo founders rapidly prototyping study tools for niche learning communities
This post covers the full architecture, recommended tech stack, implementation phases, and the real challenges you will hit when building this system. It does not include the full source code — that is available in the complete course on labs.codersarts.com.
📄 Before you dive in — grab the free PRD template that maps out this entire system: architecture, API spec, sprint plan, and system prompt. [Download the free PRD]
How It Works: The Core Concept
At its heart, the AI Quiz Generator is a structured extraction pipeline. The user provides a learning artifact — raw text, a PDF, a YouTube video URL, or an image — and the system's job is to extract the semantic content from that artifact and then prompt an LLM to generate pedagogically sound quiz questions from it.
Why the Naive Approach Fails
The obvious approach is to dump the entire source material directly into a prompt and ask GPT-4o-mini to produce questions. This breaks immediately in the real world for several reasons:
Token limits. A 50-page PDF contains roughly 25,000–35,000 tokens. GPT-4o-mini has a 128k context window, but pushing that much content in on every request is expensive and slow.
Quality degrades with length. The model loses focus on specific details when the prompt is too long. Quiz questions become vague, generic, or repetitive.
Inconsistent output format. Without strict schema enforcement, GPT-4o-mini sometimes returns Markdown, sometimes JSON, sometimes a numbered list. Your frontend cannot parse all three.
Multi-modal inputs require different handling. A YouTube URL is not text — you need to fetch a transcript. An image is not text — you need to encode it for Vision. A generic "paste it in" approach cannot handle these cases.
How This Architecture Solves It
Instead of a single naive prompt call, the system uses dedicated preprocessors for each input type, followed by a shared quiz generator service that always enforces a Pydantic schema on the output.
┌────────────────────────────────────────────────────────────┐
│                        INPUT LAYER                         │
│    Text    │  PDF Upload  │  YouTube URL  │  Image Upload  │
└─────┬──────┴──────┬───────┴───────┬───────┴───────┬────────┘
      │             │               │               │
      ▼             ▼               ▼               ▼
 ┌──────────┐  ┌──────────┐  ┌──────────────┐ ┌────────────┐
 │ Raw text │  │ PyMuPDF  │  │ youtube-     │ │ Pillow /   │
 │          │  │ extract  │  │ transcript-  │ │ base64     │
 │          │  │ text     │  │ api          │ │ encode     │
 └────┬─────┘  └────┬─────┘  └──────┬───────┘ └─────┬──────┘
      │             │               │               │
      └─────────────┴───────┬───────┴───────────────┘
                            │
                            ▼
             ┌───────────────────────────┐
             │  QUIZ GENERATOR SERVICE   │
             │ (system prompt + schema)  │
             └─────────────┬─────────────┘
                           │
                           ▼
             ┌───────────────────────────┐
             │   GPT-4o-mini API call    │
             └─────────────┬─────────────┘
                           │
                           ▼
             ┌───────────────────────────┐
             │    Pydantic validation    │
             └─────────────┬─────────────┘
                           │
                           ▼
             ┌───────────────────────────┐
             │    SQLite persistence     │
             │     (history module)      │
             └─────────────┬─────────────┘
                           │
                           ▼
             ┌───────────────────────────┐
             │  React UI / My Quizzes    │
             └───────────────────────────┘
Think of it like a kitchen: no matter whether the chef receives fresh vegetables, frozen produce, or canned goods, a prep station processes each ingredient into the same standard format before it reaches the stove. The stove (GPT-4o-mini) always sees clean, normalised input and always produces structured JSON output.
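In code, the prep-station idea is just a dispatch table mapping each input mode to a processor function. Here is a minimal sketch with stand-in processors — the real modules would call PyMuPDF, youtube-transcript-api, and Pillow; all function and key names below are illustrative:

```python
from typing import Callable

def process_text(payload: str) -> str:
    # Raw text passes straight through, trimmed.
    return payload.strip()

def process_pdf(payload: str) -> str:
    # Stand-in: the real processor extracts text with PyMuPDF.
    return f"[extracted PDF text from {payload}]"

def process_youtube(payload: str) -> str:
    # Stand-in: the real processor fetches captions via youtube-transcript-api.
    return f"[transcript for {payload}]"

def process_image(payload: str) -> str:
    # Stand-in: the real processor base64-encodes the image for Vision.
    return f"[base64-encoded image {payload}]"

PROCESSORS: dict[str, Callable[[str], str]] = {
    "text": process_text,
    "pdf": process_pdf,
    "youtube": process_youtube,
    "image": process_image,
}

def normalise(input_mode: str, payload: str) -> str:
    """Route any input type through its prep station; the generator
    downstream only ever sees the normalised result."""
    if input_mode not in PROCESSORS:
        raise ValueError(f"Unsupported input mode: {input_mode}")
    return PROCESSORS[input_mode](payload)
```

The point of the table is that adding a fifth input type later means writing one new processor and registering it — the generator service never changes.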
System Architecture Deep Dive
Architecture Overview
The application is structured across five distinct layers, each with a clear responsibility boundary.
Frontend (React 18 + TypeScript + Vite + Tailwind CSS): Collects the user's input mode (text / PDF / YouTube / image), source content, and quiz settings (question type, question count). Sends FormData requests to FastAPI endpoints and renders the returned quiz as an interactive component. Also manages the "My Quizzes" history view.
Backend (Python + FastAPI): Exposes four generation endpoints (one per input type) plus three history endpoints (save, list, delete). Delegates content extraction to dedicated processor modules and quiz generation to the shared generator service. Handles CORS, validation, and error responses.
AI Layer (OpenAI GPT-4o-mini): Receives a constructed prompt containing the extracted content and quiz configuration. Returns structured JSON quiz data. The system prompt enforces strict output format and is the single point where all quiz generation logic lives.
Data Layer (SQLAlchemy + SQLite): Persists quiz records as JSON text in a local SQLite database. Exposes save, list, and delete operations through the history endpoints.
External APIs / Libraries: PyMuPDF for PDF text extraction, youtube-transcript-api for YouTube caption retrieval, Pillow for image encoding, and the OpenAI Python SDK for model calls.
Component Reference Table
| Component | Role | Technology Options |
| --- | --- | --- |
| Frontend framework | SPA UI, routing, state | React 18, Next.js, Vue 3, Svelte |
| Styling | Layout, responsive UI | Tailwind CSS, MUI, Chakra UI |
| Backend framework | REST API, request routing | FastAPI, Flask, Django REST, Express |
| Schema validation | Enforce JSON structure on AI output | Pydantic v2, Zod, Joi, JSON Schema |
| LLM provider | Quiz generation from content | OpenAI GPT-4o-mini, Claude 3.5 Haiku, Gemini Flash |
| PDF processor | Extract readable text from uploads | PyMuPDF, pdfplumber, pypdf |
| YouTube processor | Fetch video transcript | youtube-transcript-api, AssemblyAI, Whisper |
| Image processor | Encode images for Vision API | Pillow, base64 (stdlib), Sharp (Node) |
| ORM | Map Python classes to DB tables | SQLAlchemy, Tortoise ORM, Prisma |
| Database | Persist quiz history | SQLite, PostgreSQL, MongoDB |
Data Flow Walkthrough
1. User selects input mode and fills in the form in the React UI.
2. React builds a FormData object (or JSON body for text input) and sends a POST request to the appropriate FastAPI endpoint.
3. FastAPI receives the request and routes it to the matching processor (PDF → PyMuPDF, YouTube → youtube-transcript-api, Image → Pillow + base64, Text → pass-through).
4. The processor returns a plain-text string representing the source content.
5. The quiz generator service constructs a prompt: system prompt (defines JSON schema and quiz rules) + user prompt (extracted content + configuration options).
6. The service calls the OpenAI API with the constructed messages array.
7. GPT-4o-mini returns a completion. The service parses the response text as JSON.
8. Pydantic validates the parsed JSON against the quiz schema. If validation fails, the service retries once.
9. The validated quiz object is returned as the FastAPI JSON response.
10. React receives the quiz, renders it as an interactive card deck, and offers a "Save" button.
11. Clicking "Save" sends the quiz to the history save endpoint, which writes it to SQLite via SQLAlchemy.
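The prompt construction, model call, and validation steps in the middle of this flow can be sketched as follows. This is a simplified illustration, not the course's exact code: the schema fields are placeholders, Pydantic v2 is assumed, and `call_model` is injected so the retry logic is visible without the OpenAI SDK (in the real service it wraps `client.chat.completions.create`):

```python
import json
from typing import Callable
from pydantic import BaseModel, ValidationError

class Question(BaseModel):
    question: str
    options: list[str]
    correct_index: int

class Quiz(BaseModel):
    title: str
    questions: list[Question]

SYSTEM_PROMPT = (
    "You are a quiz generator. Return ONLY a JSON object with keys "
    "'title' and 'questions'; each question has 'question', 'options', "
    "and 'correct_index'. No Markdown, no commentary."
)

def generate_quiz(content: str, call_model: Callable, max_attempts: int = 2) -> Quiz:
    """Build the messages array, call the model, parse and validate the
    response, and retry once if parsing or validation fails."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Generate a quiz from:\n\n{content}"},
    ]
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(messages)
        try:
            return Quiz.model_validate(json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as exc:
            last_error = exc  # one retry with the same messages
    raise RuntimeError(f"Quiz generation failed: {last_error}")
```

Injecting the model call also makes this function trivially unit-testable with canned responses.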
Non-Obvious Design Decisions
Decision 1: One endpoint per input type, not a single unified endpoint. It is tempting to build one /generate endpoint that inspects the request and dispatches internally. The problem is that FastAPI's request parsing for file uploads (multipart FormData) conflicts with JSON body requests in a single function signature. Separate endpoints keep each input type's validation, processor, and error handling fully isolated and much easier to test.
Decision 2: Store quiz questions as a JSON text column in SQLite, not as normalised rows. A quiz has a variable number of questions, each with a variable number of answer options. Normalising this into a proper relational schema (quiz → questions → options) adds three joins and significant complexity. Since the entire quiz is always read and written as a unit, serialising the questions array as a JSON string in a single TEXT column is simpler, equally performant at this scale, and trivially reversible if you migrate to PostgreSQL later.
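A minimal sketch of that decision in SQLAlchemy — one row per quiz, with the questions array serialised into a single TEXT column. Model and column names are illustrative, not the course's exact schema:

```python
import json
from sqlalchemy import create_engine, Column, Integer, String, Text
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class QuizRecord(Base):
    """One row per quiz; the whole questions array lives in a TEXT column."""
    __tablename__ = "quizzes"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)
    questions_json = Column(Text, nullable=False)  # serialised questions array

def save_quiz(session: Session, title: str, questions: list[dict]) -> int:
    record = QuizRecord(title=title, questions_json=json.dumps(questions))
    session.add(record)
    session.commit()
    return record.id

def load_quiz(session: Session, quiz_id: int) -> dict:
    record = session.get(QuizRecord, quiz_id)
    return {"title": record.title,
            "questions": json.loads(record.questions_json)}
```

Because the quiz is always read and written as a unit, serialisation happens exactly once on each side, and migrating to PostgreSQL later only changes the connection URL.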
Tech Stack Recommendation
Stack A — Beginner / Weekend Prototype
This stack minimises setup friction. Every component runs locally with no cloud services required beyond the OpenAI API key.
| Layer | Technology | Why |
| --- | --- | --- |
| Frontend | React 18 + Vite | Fast setup, no SSR complexity |
| Styling | Tailwind CSS (CDN) | No build step needed initially |
| Backend | FastAPI (Python) | Async, auto-docs, simple to learn |
| Validation | Pydantic v2 | Bundled with FastAPI |
| LLM | OpenAI GPT-4o-mini | Low cost, fast, reliable JSON output |
| PDF | PyMuPDF | Best text fidelity, pip installable |
| YouTube | youtube-transcript-api | No API key needed |
| Database | SQLite (file-based) | Zero config, ships with Python |
Estimated monthly cost (Stack A): ~$5–$15 in OpenAI API usage for moderate personal use. No hosting cost if run locally.
Stack B — Production-Ready
This stack is designed to serve multiple concurrent users, support authentication, and deploy reliably.
| Layer | Technology | Why |
| --- | --- | --- |
| Frontend | React 18 + Vite + TypeScript | Type safety, better DX at scale |
| Styling | Tailwind CSS + shadcn/ui | Consistent, accessible component library |
| Backend | FastAPI + Uvicorn + Gunicorn | Production ASGI serving with worker management |
| Validation | Pydantic v2 with strict mode | Catches edge cases in AI output |
| LLM | OpenAI GPT-4o-mini + retry logic | Rate limit handling, fallback prompts |
| PDF | PyMuPDF + chunking | Token-safe extraction for long docs |
| YouTube | youtube-transcript-api + caching | Avoid redundant API calls |
| Database | PostgreSQL + SQLAlchemy | ACID compliance, concurrent writes |
| Containerisation | Docker + Docker Compose | Reproducible deploys |
| Hosting | Railway / Render / AWS Lightsail | $10–$25/month, easy CI/CD |
Estimated monthly cost (Stack B): $15–$40 (hosting) + $20–$80 (OpenAI, depending on volume) = $35–$120/month for a real user-facing app.
Implementation Phases
Phase 1: Backend Foundation — Text-to-Quiz
What you are building: The FastAPI project skeleton, the OpenAI integration, the quiz generator service, and the first working endpoint that accepts plain text and returns a structured quiz JSON.
Key technical decisions:
How to structure your system prompt so GPT-4o-mini reliably returns valid JSON every time (not Markdown, not prose)
Whether to use OpenAI's response_format: json_object parameter or enforce the schema through prompt engineering alone
What your Pydantic quiz schema looks like: question types, option arrays, correct answer indexing
How to configure CORS so your React frontend can call the API during local development
Getting GPT-4o-mini to produce consistent, parse-safe JSON across all three quiz types — without ever returning Markdown fences or extra commentary — is covered in detail in the full course with working, tested code.
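To make the decisions above concrete, here is one possible shape for a quiz schema that covers all three question types in a single model. This is an illustrative sketch assuming Pydantic v2 — field names and the validation rules are assumptions, not the course's exact schema. In the real service you would pair a model like this with OpenAI's `response_format={"type": "json_object"}` parameter:

```python
from typing import Literal
from pydantic import BaseModel, Field, model_validator

class QuizQuestion(BaseModel):
    """One schema for all three question types, with per-type rules."""
    type: Literal["multiple_choice", "true_false", "fill_in_blank"]
    question: str
    options: list[str] = Field(default_factory=list)  # empty for fill-in-blank
    correct_answer: str

    @model_validator(mode="after")
    def apply_type_rules(self):
        if self.type == "multiple_choice" and len(self.options) < 2:
            raise ValueError("multiple choice needs at least two options")
        if self.type == "true_false":
            # Normalise: true/false questions always get the same two options.
            self.options = ["True", "False"]
        return self

class Quiz(BaseModel):
    title: str
    questions: list[QuizQuestion]
```

A single discriminated model like this lets the frontend render all quiz types from one component, because every question arrives with the same field names.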
Phase 2: Multi-Format Input — PDF, YouTube, and Image Processors
What you are building: Three dedicated processor modules (PDF, YouTube, Image) and their corresponding FastAPI endpoints. Each processor extracts text (or encodes the image) and passes the normalised content to the shared quiz generator service.
Key technical decisions:
How to handle PDFs that exceed the token limit (chunking strategy: first N pages, sliding window, or semantic chunking)
How to deal with YouTube videos that have no captions available or have auto-generated captions with poor punctuation
How to send images to GPT-4o-mini Vision: base64 encoding in the message content array, not as a file upload to OpenAI's Files API
Whether to validate uploaded file types on the FastAPI side before processing
Encoding images correctly for the Vision API — including aspect ratio normalisation and byte size limits — is covered in detail in the full course with working, tested code.
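The base64-in-message-content point deserves a concrete sketch, since it trips people up. The Chat Completions API accepts inline images as data URLs inside the content array; the helper below is illustrative (its name and parameters are assumptions), but the message structure matches the documented Vision format:

```python
import base64

def build_vision_message(image_bytes: bytes, mime: str, instruction: str) -> list[dict]:
    """Wrap an image as a data URL inside the chat message content array —
    the inline format GPT-4o-mini Vision expects, not a Files API upload."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime};base64,{b64}"},
                },
            ],
        }
    ]
```

The resulting list drops straight into the `messages` argument of a chat completion call alongside your system prompt.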
Phase 3: React Frontend — Quiz UI
What you are building: The complete React frontend: an input panel with tabs for each input mode, a FormData upload handler, a quiz result component that renders questions and answers, and basic client-side state management.
Key technical decisions:
How to structure React state across input mode, loading, and result views without a state management library
How to handle FormData file uploads in TypeScript with proper typing for File | null
How to render the three quiz types (multiple-choice, true/false, fill-in-the-blank) from a single question component
Whether to reveal correct answers immediately or only after the user submits all responses
Building the multi-mode input panel with proper TypeScript types and FormData handling is covered in detail in the full course with working, tested code.
Phase 4: Quiz History — Save, List, and Delete
What you are building: The SQLAlchemy model, SQLite database setup, and three history endpoints (POST /history, GET /history, DELETE /history/{id}). On the frontend, a "My Quizzes" view that loads saved quizzes and lets users revisit or delete them.
Key technical decisions:
How to serialise nested Pydantic quiz objects to a TEXT column in SQLite and deserialise them on read
How to keep the SQLAlchemy session lifecycle clean in an async FastAPI context
Whether to paginate the history list endpoint from the start or add it later
How to handle the delete confirmation flow in the React UI without a modal library
Persisting nested quiz data cleanly in SQLite and deserialising it back to TypeScript types on the frontend is covered in detail in the full course with working, tested code.
Phase 5: Polish, Error Handling, and Deployment
What you are building: Robust error handling (network errors, OpenAI timeouts, empty transcript responses), loading states, user feedback toasts, environment variable management, a Docker Compose setup, and a deployment walkthrough to a platform like Railway or Render.
Key technical decisions:
How to implement retry logic on the OpenAI call when the JSON response fails Pydantic validation
How to pass secrets (OPENAI_API_KEY, DATABASE_URL) through environment variables in Docker
Whether to serve the React build from FastAPI's static file mounting or deploy frontend and backend separately
How to set up health check endpoints so the hosting platform can monitor the service
Containerising the full stack with Docker Compose and deploying to a live URL is covered in detail in the full course with working, tested code.
Common Challenges
Building this system surfaces several non-obvious problems that are easy to overlook from the architecture diagram but painful in practice.
Challenge 1: GPT-4o-mini Returns Non-JSON Occasionally
Root cause: Even with explicit JSON instructions, the model sometimes wraps its output in Markdown code fences (```json) or adds a preamble sentence before the JSON object.
Fix: Strip Markdown fences with a regex before parsing. Implement a retry loop (max 2 attempts) that sends the raw output back to the model and asks it to return only the JSON with no additional text.
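A pragmatic version of that fence-stripping pass might look like this — a cleanup heuristic to run before `json.loads`, not a full parser, and the function name is illustrative:

```python
import re

def strip_markdown_fences(raw: str) -> str:
    """Remove ```json ... ``` wrappers and any prose around the JSON object.

    Run this before json.loads; if parsing still fails afterwards,
    fall through to the retry loop.
    """
    # Prefer the contents of a fenced block if one is present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Trim any preamble before the first brace and trailing text after the last.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end != -1:
        raw = raw[start : end + 1]
    return raw.strip()
```

Cheap string surgery like this resolves the majority of malformed responses without spending a second API call.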
Challenge 2: Long PDFs Exceed Usable Token Budget
Root cause: PyMuPDF extracts all text verbatim, including headers, footers, and page numbers repeated on every page. A 40-page document often contains far more tokens than actually represent learning content.
Fix: Extract only the first N pages (N configurable by the user). Strip repeated header/footer patterns using a line-frequency heuristic. For advanced use, implement a sliding window that generates separate quiz batches per chunk and merges results.
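The line-frequency heuristic is simple to implement: a line that appears on most pages is almost certainly a header or footer, not content. A minimal sketch (thresholds and names are illustrative; exact-match counting won't catch varying page numbers like "Page 1" vs "Page 2"):

```python
from collections import Counter

def strip_repeated_lines(pages: list[str], threshold: float = 0.6) -> list[str]:
    """Drop lines that repeat on more than `threshold` of pages —
    a cheap header/footer filter, not a full layout parser."""
    page_lines = [p.splitlines() for p in pages]
    counts: Counter = Counter()
    for lines in page_lines:
        # Count each distinct line once per page.
        counts.update({line.strip() for line in lines if line.strip()})
    cutoff = threshold * len(pages)
    return [
        "\n".join(l for l in lines if l.strip() and counts[l.strip()] <= cutoff)
        for lines in page_lines
    ]
```

Even this crude pass often removes 10–20% of extracted tokens from a headed-and-footed PDF before the content ever reaches the prompt.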
Challenge 3: YouTube Videos with No Transcript Available
Root cause: Not all YouTube videos have captions. Auto-generated captions exist for most English content, but educational videos in other languages or with disabled captions will throw a TranscriptsDisabled exception.
Fix: Catch the exception and return a clear, user-friendly error message. Optionally, fall back to the video's description text as a partial content source. Display a hint in the UI explaining why no transcript was found.
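The catch-and-translate pattern is worth showing in isolation. This sketch abstracts the library behind an injected `fetch` callable (in the real processor that would be youtube-transcript-api's transcript call, and `TranscriptUnavailable` stands in for its `TranscriptsDisabled`/`NoTranscriptFound` exceptions):

```python
class TranscriptUnavailable(Exception):
    """Stand-in for the library's captions-missing exceptions."""

def get_transcript_or_error(video_id: str, fetch) -> dict:
    """Turn a transcript failure into a friendly, renderable error payload
    instead of letting the exception bubble up as a 500."""
    try:
        segments = fetch(video_id)
    except TranscriptUnavailable:
        return {
            "ok": False,
            "error": ("This video has no captions available. Try a video "
                      "with subtitles, or paste the description text instead."),
        }
    # Transcript segments arrive as dicts with a 'text' field; join them.
    return {"ok": True, "transcript": " ".join(seg["text"] for seg in segments)}
```

The important part is that the frontend receives a structured `ok`/`error` payload it can render as a hint, rather than a raw traceback.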
Challenge 4: Image Encoding Fails for Large Files
Root cause: The OpenAI Vision API enforces a per-image size limit. JPEG and PNG files from modern smartphones can easily exceed 5 MB. Pillow will encode them without error, but the API will reject the request.
Fix: Resize images to a maximum dimension (e.g., 1024 × 1024 pixels) before base64 encoding. Validate file size on the FastAPI side and return a 400 error with guidance before the API call is made.
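The resize step is a few lines with Pillow. A sketch, assuming a 1024-pixel maximum dimension (the function name and quality setting are illustrative choices, not API requirements):

```python
import io
from PIL import Image

MAX_DIM = 1024  # assumed maximum dimension before base64 encoding

def shrink_for_vision(image_bytes: bytes) -> bytes:
    """Downscale so the longest side is at most MAX_DIM pixels, re-encoding
    as JPEG to keep the Vision payload well under the size limit."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    img.thumbnail((MAX_DIM, MAX_DIM))  # in-place, preserves aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    return buf.getvalue()
```

`Image.thumbnail` never upscales, so small images pass through at their original dimensions.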
Challenge 5: Pydantic Schema Drift Between Backend and Frontend
Root cause: As you iterate, you may add a new field to the Pydantic quiz model (e.g., difficulty) without updating the TypeScript interface on the React side. The app silently renders incomplete data.
Fix: Maintain a shared schema definition. In a monorepo, consider generating TypeScript types from the Pydantic models using datamodel-code-generator or writing a small script that exports the schema as a JSON Schema file that TypeScript can consume.
Challenge 6: SQLAlchemy Session Conflicts in Async FastAPI
Root cause: SQLAlchemy's standard synchronous session is not safe to use directly in async FastAPI route handlers. Using Session directly in an async def function blocks the event loop.
Fix: Use databases or SQLAlchemy's async extension (AsyncSession with create_async_engine). Alternatively, use a thread pool executor to run synchronous DB operations in a separate thread without blocking the event loop.
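The thread-pool variant is the smallest change to an existing sync codebase. This sketch uses the stdlib's `asyncio.to_thread` (Starlette's `run_in_threadpool` is the equivalent inside FastAPI); the stand-in query function is illustrative:

```python
import asyncio

def get_history_sync(quiz_id: int) -> dict:
    """Stand-in for a blocking SQLAlchemy query using the sync Session,
    e.g. session.get(QuizRecord, quiz_id)."""
    return {"id": quiz_id, "title": "Saved quiz"}

async def get_history(quiz_id: int) -> dict:
    """Run the blocking DB call in a worker thread so the async route
    handler never stalls the event loop."""
    return await asyncio.to_thread(get_history_sync, quiz_id)
```

Note that FastAPI already does this automatically for plain `def` route handlers; the trap is only calling a sync Session inside an `async def` handler.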
Challenge 7: React State Desync on History Delete
Root cause: After deleting a quiz from the history list, some implementations only remove the item from the local state array by index. If the list was fetched once on mount and never re-synced, the index can desync from the server state after partial deletes.
Fix: Re-fetch the history list from the API after every delete operation rather than performing optimistic client-side removals. For a production app, implement optimistic updates with a rollback on API error.
Solving these issues took us over 40 hours of testing, debugging, and iterating across multiple builds — the course walks you through each fix with working, production-tested code.
Ready to Build This Yourself?
Understanding an architecture is a long way from shipping working software. The gap between "I understand how this works" and "I have a running app I can show people" is filled with environment configuration issues, model output quirks, TypeScript type errors, and deployment surprises — none of which appear in architecture diagrams.
The AI Quiz Generator course on Codersarts Labs bridges that gap for you.
What is included:
✅ Full, production-ready source code (Python + React + TypeScript)
✅ Step-by-step tutorials for every phase of the build
✅ Pydantic schemas, SQLAlchemy models, and TypeScript types — all kept in sync
✅ Tested OpenAI prompt templates for all three quiz types
✅ Docker + Docker Compose setup for local and cloud deployment
✅ Deployment walkthrough (Railway / Render, zero DevOps experience required)
✅ Lifetime access with all future updates included
✅ Community support and Q&A access with the Codersarts team
Pricing
Self-Paced Course — $30.00. Everything above, work through it at your own pace.
Want someone to build through it with you? Book a 1:1 Guided Session for $20/hour — a live session with the Codersarts team where we build the app together and answer your questions in real time.
Conclusion
The AI Quiz Generator is a full-stack application that chains together four input processors (text, PDF, YouTube, image), a shared GPT-4o-mini quiz generation service, Pydantic schema validation, SQLite persistence, and a React frontend — all wired together with FastAPI. The key architectural insight is that separating content extraction from quiz generation keeps each component testable and swappable without touching the rest of the system.
If you are starting fresh, use Stack A (FastAPI + React + SQLite + GPT-4o-mini, running locally) to get your first quiz generating in a weekend. Once you have proven the concept, graduating to Stack B with Docker, PostgreSQL, and a cloud deployment is a straightforward extension.
The complete course with all source code, video walkthroughs, and deployment guides is available at labs.codersarts.com. Start building today.


