How to Build a Multilingual Insurance Claims Triage System with Sarvam Vision and Sarvam-105B
- May 13
- 12 min read

Introduction
Every day, a claims processor at a mid-sized Indian health insurer opens a WhatsApp message and finds a photograph of a doctor's prescription: written in looping Devanagari on a torn notepad, half-smudged, with Latin drug abbreviations mixed in. She types what she can read into a legacy system, guesses at the rest, and moves to the next claim. Multiply that across millions of motor and health claims annually, in Hindi, Marathi, Tamil, Bengali, Gujarati, and other regional languages, and you begin to understand why Indian insurers spend ₹150–₹400 processing each claim manually, and why their loss ratios and policyholder Net Promoter Scores both suffer.
The Multilingual Insurance Claims Triage System — Powered by Sarvam AI is an end-to-end claims intake and triage platform that ingests handwritten doctor notes, hospital bills, FIRs, and policyholder declarations in 11 Indian languages, structures them with Sarvam Vision and SarvamParse, and routes them through Sarvam-105B for ICD-10 coding, completeness scoring, and fraud-signal detection.
Real-world use cases this architecture covers:
Health insurers triaging cashless and reimbursement claims with hospital bills and discharge summaries
Motor insurers processing accident claims with FIRs in regional languages and handwritten panchnamas
TPAs (Medi Assist, Vidal Health, Health India) automating bill scrutiny and pre-auth approvals
Personal-accident and group-health claims with bulk hospital records uploaded by employers
Travel-insurance claims involving mixed-language medical documents from cross-border treatment
Fraud-control units running offline batch reviews of suspected claims
This blog covers the architecture, technology stack, and implementation phases of this system. It does not include full production source code — that is available in the complete course on labs.codersarts.com.
📄 Before you dive in — grab the free PRD template that maps out this entire system: architecture, API spec, sprint plan, and system prompt. [Download the free PRD]
How It Works: Core Concept
The Problem with the Naive Approach
The obvious solution is: OCR the document, feed the text to an LLM, ask for ICD-10 codes. This fails in practice for three compounding reasons.
Generic OCR cannot handle Indian handwriting. Tesseract and most cloud OCR engines were trained predominantly on printed Latin text. Handwritten Devanagari — where cursive joining, half-consonants, and diacritics vary dramatically between writers — produces character error rates above 30% even on clean scans. Tamil script presents similar challenges. A claims document with 30% character errors produces ICD-10 codes that are clinically meaningless.
Global LLMs cannot map regional medical terms reliably. A doctor in Maharashtra may write "मधुमेह" (diabetes), "BP" (hypertension), or a colloquial abbreviation that has no standard English equivalent. GPT-4 and similar English-centric models are not fine-tuned on Indian clinical vocabulary, and their ICD-10 suggestions carry no confidence scores, making them unsuitable for regulated insurance decisions.
LLMs cannot be trusted for arithmetic. Hospital bills are legal documents. The sum of every line item must equal the billed total. Any model that hallucinates a line-item amount creates a compliance liability. The architecture must layer deterministic arithmetic verification on top of any LLM output.
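The 30% figure in the first failure mode above is a character error rate (CER). CER is the character-level edit distance between the OCR output and a human transcription, divided by the length of the transcription. Below is a minimal sketch for measuring CER on your own sample documents. It is a plain Levenshtein implementation, not part of any Sarvam or Tesseract API, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """CER = edit distance / reference length; 0.0 means a perfect transcript.

    Python compares Unicode code points, so a Devanagari conjunct made of
    several code points counts as several characters here.
    """
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(ocr_text, reference) / len(reference)
```

For example, `character_error_rate("diabetis", "diabetes")` is one substitution over eight characters, or 0.125.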
The Architecture's Solution
The system decomposes the problem into specialist components, each doing what it is best at:
INGESTION
─────────────────────────────────────────────────────────────────────
  WhatsApp ───┐
  Web Upload ─┼──► Document Store ──► Sarvam Vision (OCR)
  Email ──────┘                                │
                                               ▼
                                          SarvamParse
                                      (Bill / FIR → JSON)
                                               │
                                  ┌────────────┴────────────┐
                                  ▼                         ▼
                       Arithmetic Rule Engine     Cross-doc Consistency
                          (line-item sums)        (date / party checks)
                                  └────────────┬────────────┘
                                               ▼
TRIAGE
─────────────────────────────────────────────────────────────────────
                                         Claim Bundle
                                               │
                                               ▼
                                          Sarvam-105B
                                  ┌────────────┼────────────┐
                                  ▼            ▼            ▼
                               ICD-10    Completeness Fraud Signals
                               Coding        Score  (flags + reason)
                                  └────────────┼────────────┘
                                               ▼
                                       Recommended Path
                                (Auto-Approve / Refer / Reject)
                                               │
OUTPUT
─────────────────────────────────────────────────────────────────────
                                  ┌────────────┴────────────┐
                                  ▼                         ▼
                         Mayura Translation            Bulbul TTS
                       (Vernacular → English)    (IVR / WhatsApp voice)
                                  │
                                  ▼
                        React Claims Cockpit
                        + Immutable Audit Log

Think of it as an assembly line in a factory. Each station adds one specific kind of value. The OCR station reads the document. The parsing station structures it. The arithmetic station verifies the numbers. The reasoning station interprets the medical content. No single station is asked to do another's job.
System Architecture Deep Dive
Architecture Overview
The system is organised into five layers, each with a clear responsibility boundary.
Frontend layer — A React 18 + Tailwind claims officer cockpit that presents triage decisions, anomaly flags, translated document summaries, and the ability to override or approve a recommendation. Built with Vite for fast iteration.
Orchestration layer — A FastAPI service implements a state machine that tracks each claim through its lifecycle (received → documents_received → structured → triaged → decided → closed). It manages asynchronous document processing via Redis + Celery workers and exposes a REST + webhook API for downstream core-claims system integration.
AI / NLP layer — Sarvam Vision for multilingual OCR, SarvamParse for bill and FIR structuring, Sarvam-105B (Indus) for medical reasoning and fraud detection, Mayura/Sarvam-Translate for vernacular-to-English translation, Saaras v3 for voice-claim intake, and Bulbul v3 for vernacular IVR responses.
Data layer — PostgreSQL stores the claims state machine, structured claim bundles, ICD-10 assignments, fraud flags, and the immutable IRDAI audit log. PII fields are stored encrypted and redacted before being sent to any LLM.
Integration layer — WhatsApp Business API for document collection, outbound SMS/voice for policyholder acknowledgements, and REST webhooks to push normalised English summaries to legacy core-claims systems.
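The claim lifecycle in the orchestration layer can be sketched as an explicit transition table. This is a minimal illustration under three assumptions:

- It uses the state names from the orchestration-layer description. The post-intake state is called documents_received, as in the data-flow walkthrough.
- The lifecycle is strictly linear, with no retry or reopen paths.
- State is held in memory here. In the described system it would be persisted in PostgreSQL.

The class and function names are hypothetical.

```python
from enum import Enum


class ClaimState(str, Enum):
    RECEIVED = "received"
    DOCUMENTS_RECEIVED = "documents_received"
    STRUCTURED = "structured"
    TRIAGED = "triaged"
    DECIDED = "decided"
    CLOSED = "closed"


# Assumed linear lifecycle: each state may only advance to the next one.
ALLOWED_TRANSITIONS: dict[ClaimState, set[ClaimState]] = {
    ClaimState.RECEIVED: {ClaimState.DOCUMENTS_RECEIVED},
    ClaimState.DOCUMENTS_RECEIVED: {ClaimState.STRUCTURED},
    ClaimState.STRUCTURED: {ClaimState.TRIAGED},
    ClaimState.TRIAGED: {ClaimState.DECIDED},
    ClaimState.DECIDED: {ClaimState.CLOSED},
    ClaimState.CLOSED: set(),
}


class InvalidTransition(Exception):
    """Raised when a worker tries to move a claim along an edge not in the table."""


def transition(current: ClaimState, target: ClaimState) -> ClaimState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

Calling `transition(ClaimState.RECEIVED, ClaimState.TRIAGED)` raises `InvalidTransition`. This stops a worker from skipping OCR and structuring on its way to triage.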
Component Table
Component | Role | Technology Options |
Document Intake | Receive docs from WhatsApp, web, email | WhatsApp Business API, Twilio, AWS S3 multipart |
OCR Engine | Handwritten / scanned Indian-language document recognition | Sarvam Vision (primary), Google Vision (fallback) |
Document Parser | Convert bills and FIRs to structured JSON | SarvamParse, custom prompt chains on Sarvam-105B |
Arithmetic Guard | Verify bill line-item sums, catch discrepancies | Python decimal arithmetic (deterministic, no LLM) |
Claims Reasoner | ICD-10 coding, completeness score, fraud signals | Sarvam-105B (Indus), rule-based pre/post-filters |
Translation Layer | Vernacular → English for claims officer | Mayura, Sarvam-Translate |
Voice Interface | Voice-claim intake, IVR acknowledgements | Saaras v3 (ASR), Bulbul v3 (TTS) |
Orchestration API | State machine, async queue, webhook routing | FastAPI, Celery, Redis |
Claims Database | State, audit log, structured claim bundles | PostgreSQL (primary), Redis (session cache) |
Frontend Cockpit | Claims officer UI, triage review, override | React 18, Tailwind, Vite |
Data Flow Walkthrough
Claim initiation — Policyholder sends documents via WhatsApp or uploads through the web portal. An email listener also polls a dedicated claims mailbox.
Document ingestion — The FastAPI orchestrator receives the document, assigns a claim ID, persists it to object storage, and pushes a job to the Celery queue. Claim status transitions to documents_received.
OCR — A Celery worker sends each document image to Sarvam Vision. The API returns full text with script detection, confidence per word, and bounding boxes. Low-confidence regions are flagged for manual review.
Structuring — Hospital bills and FIRs are sent to SarvamParse, which returns structured JSON: line items (description, quantity, unit price, total) for bills; incident metadata (date, location, parties, IPC sections) for FIRs. Discharge summaries go through Sarvam-105B with a structured extraction prompt.
Arithmetic verification — A deterministic Python rule engine sums all bill line items and compares against the billed total. Discrepancies beyond tolerance are flagged as BILL_SUM_MISMATCH.
Cross-document consistency — Dates of admission, discharge, and accident are cross-checked across all documents. Party name mismatches between the FIR and claim form are flagged.
Triage prompt assembly — The orchestrator assembles a claim bundle: structured bill, discharge summary, ICD terms extracted, fraud flags from rules, and claim form metadata. PII fields are replaced with tokens before this bundle reaches the LLM.
Sarvam-105B triage — The model receives the bundle and returns: ICD-10 codes with confidence, a completeness score (0–100), fraud signals with reasoning, and a recommended path.
Translation — Mayura translates vernacular content in the claim into English for the claims officer cockpit.
Decision and notification — The claims officer reviews and approves/overrides. Bulbul TTS generates a vernacular voice acknowledgement for the policyholder. The full chain is written immutably to the audit log.
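The arithmetic-verification step in the walkthrough can be sketched with Python's decimal module. This minimal example makes the following assumptions:

- Line-item totals arrive from the parser as strings, so amounts never pass through float.
- The tolerance is the ±₹1 suggested under Common Challenges.
- The flag name BILL_SUM_MISMATCH follows the text.

The function name and input shape are hypothetical.

```python
from decimal import Decimal

# Tolerance for rounding in the original bill (±₹1).
TOLERANCE = Decimal("1.00")


def check_bill_sum(line_item_totals: list[str], billed_total: str) -> list[str]:
    """Return deterministic fraud flags for one bill; an empty list means it adds up.

    Amounts are passed as strings so Decimal parses them exactly and they
    never round-trip through binary floating point.
    """
    computed = sum((Decimal(amount) for amount in line_item_totals), Decimal("0"))
    difference = abs(computed - Decimal(billed_total))
    return ["BILL_SUM_MISMATCH"] if difference > TOLERANCE else []
```

For example, `check_bill_sum(["1200.50", "349.75", "2450.00"], "4500.00")` returns `["BILL_SUM_MISMATCH"]` because the items sum to ₹4000.25. That flag is then handed to Sarvam-105B as a fact rather than something for the model to compute.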
Non-Obvious Design Decisions
Decision 1 — Deterministic arithmetic guards before the LLM. It is tempting to ask Sarvam-105B to "verify the bill totals." Don't. LLMs can hallucinate arithmetic even when the numbers are right there in the context. Every fraud signal that depends on a numerical comparison must be computed by Python decimal arithmetic and passed as a fact to the model, not inferred by it. This is what makes the system defensible in an IRDAI dispute.
Decision 2 — PII tokenisation before LLM context. Aadhaar numbers, mobile numbers, and policy numbers must be stripped from the prompt context. A dedicated PII redaction step replaces sensitive fields with stable tokens (e.g., AADHAAR_001) before the claim bundle is sent to Sarvam-105B. The original values are retained in the encrypted PostgreSQL audit ledger, where they can be restored post-decision. This satisfies both DPDP (Digital Personal Data Protection Act) requirements and the practical concern of not sending sensitive citizen data to an external inference endpoint.
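The tokenisation step in Decision 2 can be sketched with regular expressions. This is illustrative only, and the patterns are simplified assumptions:

- Aadhaar numbers are matched as 12 digits, optionally grouped 4-4-4.
- Mobile numbers are matched as 10 digits starting with 6 to 9.
- Policy-number formats vary by insurer and are omitted.

A production service would use a dedicated detector with checksum validation. The same value always maps to the same token, so the model can still see that two documents mention the same person. The function name is hypothetical.

```python
import re

# Simplified, assumed patterns; real Aadhaar validation also checks a Verhoeff checksum.
PII_PATTERNS = {
    "AADHAAR": re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b"),
    "MOBILE": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}\b"),
}


def tokenise_pii(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII with stable tokens such as AADHAAR_001.

    Returns the redacted text and a token -> original-value map. The caller is
    expected to store the map encrypted; it is never sent to the LLM.
    """
    token_map: dict[str, str] = {}
    value_to_token: dict[str, str] = {}

    for label, pattern in PII_PATTERNS.items():
        def replace(match: re.Match) -> str:
            value = match.group(0)
            if value not in value_to_token:
                # Number tokens per label: AADHAAR_001, AADHAAR_002, ...
                count = sum(1 for t in token_map if t.startswith(label + "_"))
                token = f"{label}_{count + 1:03d}"
                value_to_token[value] = token
                token_map[token] = value
            return value_to_token[value]

        text = pattern.sub(replace, text)

    return text, token_map
```

Re-association after the decision is a reverse lookup over the returned map.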
Tech Stack Recommendation
Stack A — Beginner / Prototype (Build in a Weekend)
Layer | Technology | Why |
Frontend | React 18 + Vite + Tailwind | Fastest path to a working cockpit |
Backend API | FastAPI (Python 3.11) | Async, auto-docs, easy integration |
Task Queue | Celery + Redis (Docker) | Simple local async without Kubernetes |
OCR | Sarvam Vision API | No model hosting required |
Parsing | SarvamParse API | REST call, returns structured JSON |
LLM Reasoning | Sarvam-105B API | Managed inference, pay-per-call |
Database | PostgreSQL (Docker) | Single container, simple schema |
Deployment | Docker Compose | One docker-compose up to start everything |
Estimated monthly cost (dev/test, <500 claims): API calls $20–40 plus a $10–15 cloud VM = $30–55/month (or only the API cost if you run Docker on a local machine).
Stack B — Production-Ready (Designed to Scale)
Layer | Technology | Why |
Frontend | React 18 + Vite + Tailwind + React Query | Optimistic UI, cache invalidation |
Backend API | FastAPI + Gunicorn (Uvicorn workers) + NGINX | Multi-worker, production ASGI serving |
Task Queue | Celery + Redis Cluster | Horizontal worker scaling |
OCR | Sarvam Vision API + async batch endpoint | Throughput optimised |
Parsing | SarvamParse API | Same, but batched per claim |
LLM Reasoning | Sarvam-105B (Indus) via private API endpoint | Data residency, throughput SLA |
PII Redaction | Custom Microsoft Presidio-based service | Configurable, auditable |
Database | PostgreSQL (RDS / Cloud SQL) | Managed, PITR, read replicas |
Object Storage | S3 / GCS for document images | Durable, lifecycle policies |
Audit Log | Append-only PostgreSQL table + WAL archiving | IRDAI-grade immutability |
Deployment | Docker + Kubernetes (EKS / GKE) | Auto-scaling, rolling deploys |
Estimated monthly cost (production, 50,000 claims/month): Sarvam API calls ~$150–300, RDS ~$80, EKS/GKE ~$120, object storage $10 = $360–510/month, well below the ₹150–400/claim manual cost.
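To compare the monthly figure with the per-claim manual cost, divide it by the claim volume. This uses only the numbers stated above, so it covers API and infrastructure spend. It does not include officer time spent on REFER cases.

```latex
\frac{\$360}{50{,}000 \text{ claims}} = \$0.0072, \qquad \frac{\$510}{50{,}000 \text{ claims}} = \$0.0102 \quad \text{per claim}
```

That is roughly one US cent per claim, against ₹150 to ₹400 per claim for manual processing.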
Implementation Phases
Phase 1: Document Intake Pipeline
Build the multi-channel ingestion layer: WhatsApp Business API integration, web file upload endpoint, and email polling. At this stage, all incoming documents are stored to object storage and a claim record is created in PostgreSQL with status documents_received. The key technical decisions here are: how to handle WhatsApp webhook verification, how to manage file deduplication (same document sent twice), and what metadata to capture at intake (timestamp, channel, policyholder identifier, document type hint). You will also set up the Celery + Redis async queue so that all downstream processing is non-blocking — critical for WhatsApp, which has a strict acknowledgement timeout.
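For file deduplication, one common approach is to key each upload on a SHA-256 hash of its bytes, scoped to the claim. A document sent twice over WhatsApp then maps to the same record. This approach is an assumption, not one the text prescribes. The in-memory dict below stands in for a unique constraint on (claim_id, sha256) in PostgreSQL. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IntakeRecord:
    claim_id: str
    sha256: str
    channel: str             # e.g. "whatsapp", "web", "email"
    document_type_hint: str  # e.g. "bill", "fir", "prescription"


# Stand-in for a PostgreSQL table with UNIQUE (claim_id, sha256).
_seen: dict[tuple[str, str], IntakeRecord] = {}


def register_document(claim_id: str, content: bytes, channel: str,
                      document_type_hint: str) -> tuple[IntakeRecord, bool]:
    """Idempotently register an uploaded file.

    Returns (record, created). created is False when identical bytes were
    already registered for this claim, e.g. the same photo sent twice.
    """
    digest = hashlib.sha256(content).hexdigest()
    key = (claim_id, digest)
    if key in _seen:
        return _seen[key], False
    record = IntakeRecord(claim_id, digest, channel, document_type_hint)
    _seen[key] = record
    return record, True
```

Hashing raw bytes only catches exact resends. A fresh photo of the same page produces different bytes, so near-duplicate detection would need a separate comparison.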
How to correctly configure the WhatsApp Business API webhook, handle media downloads, and set up idempotent claim creation is covered in detail in the full course with working, tested code.
Phase 2: Multilingual OCR and Document Structuring
Integrate Sarvam Vision for OCR across all document types and SarvamParse for structured extraction of hospital bills and FIRs. The central challenge in this phase is building the document-type classifier that routes each file — prescription, bill, discharge summary, FIR, or policyholder declaration — to the correct Sarvam API call with the right prompt configuration. You will also implement the confidence-threshold logic: pages where Sarvam Vision returns confidence below a configurable threshold (default 0.70) are flagged for manual review and do not block triage. A second key decision is how to chunk long discharge summaries that exceed the SarvamParse context window — the course covers a section-level summarisation approach.
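The confidence-threshold logic can be sketched as follows. The OCR result shape (pages carrying per-word confidences) is an assumption for illustration, not the documented Sarvam Vision response format. Page confidence is assumed to be the mean of its word confidences. Pages below the 0.70 default are queued for manual review, and the rest continue to triage. The names are hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean

DEFAULT_THRESHOLD = 0.70  # configurable, per the text


@dataclass
class OcrPage:
    page_number: int
    word_confidences: list[float]  # assumed per-word scores in [0, 1]


def split_by_confidence(pages: list[OcrPage],
                        threshold: float = DEFAULT_THRESHOLD) -> tuple[list[int], list[int]]:
    """Return (pages_for_triage, pages_for_manual_review) as page numbers.

    Low-confidence pages are flagged rather than dropped, so they never block triage.
    """
    for_triage, for_review = [], []
    for page in pages:
        # A page with no recognised words is treated as zero confidence.
        score = fmean(page.word_confidences) if page.word_confidences else 0.0
        (for_triage if score >= threshold else for_review).append(page.page_number)
    return for_triage, for_review
```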
The exact prompt templates for SarvamParse hospital-bill extraction, including handling of GST lines, package procedures, and consumables, are provided in the full course with working, tested code.
Phase 3: Arithmetic Guards and Cross-Document Consistency
Before any claim bundle reaches Sarvam-105B, it must pass through the deterministic rule engine. Build the arithmetic verifier (Python decimal module, not floats) that sums all bill line items and compares against the billed total. Then build the cross-document consistency checker: extract all date fields from all documents, normalise them to ISO-8601, and flag any date that is outside the tolerance window (default: ±1 day for admission dates). Similarly, extract party names and compare across the FIR and claim form using fuzzy matching. The output of this phase is a list of structured fraud signals (e.g., BILL_SUM_MISMATCH, DATE_INCONSISTENCY, NAME_MISMATCH) that are passed as facts to Sarvam-105B in the next phase.
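The date half of the consistency checker can be sketched as follows. It assumes:

- Dates have already been extracted as strings.
- The accepted formats are the three listed in the code: day-first Indian styles plus ISO-8601.
- Each document's admission date is compared against the claim form's.

The ±1 day tolerance and the DATE_INCONSISTENCY flag come from the text. The function names and output dict shape are hypothetical.

```python
from datetime import date, datetime, timedelta

# Assumed input formats: day-first Indian styles and ISO-8601.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d")


def normalise_date(raw: str) -> date:
    """Parse a date string in any accepted format; raise ValueError otherwise."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")


def check_admission_dates(claim_form_date: str,
                          document_dates: dict[str, str],
                          tolerance: timedelta = timedelta(days=1)) -> list[dict]:
    """Flag each document whose admission date differs from the claim form by more than the tolerance."""
    reference = normalise_date(claim_form_date)
    signals = []
    for document, raw in document_dates.items():
        found = normalise_date(raw)
        if abs(found - reference) > tolerance:
            signals.append({
                "signal": "DATE_INCONSISTENCY",
                "document": document,
                "expected": reference.isoformat(),
                "found": found.isoformat(),
            })
    return signals
```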
The fraud-signal data model, rule configuration file format, and fuzzy name-matching approach for Indian names in multiple scripts are covered in detail in the full course with working, tested code.
Phase 4: Sarvam-105B Triage and Claims Cockpit
Assemble the claim bundle and build the triage prompt for Sarvam-105B. The model receives: structured bill JSON, extracted ICD candidate terms from the discharge summary, the list of deterministic fraud signals, a completeness checklist (required documents vs received), and the policy type. It returns ICD-10 codes with confidence, a completeness score, LLM-generated fraud-signal reasoning, and a recommended path. Build the React claims officer cockpit: a triage queue view, a claim detail panel showing all translated documents, fraud flags with reasoning, and ICD codes, and an override interface. Every override must be logged with reason.
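The model's output drives routing, so it is worth validating deterministically before it reaches the cockpit. The sketch below assumes a JSON response carrying the four outputs named above. The field names are hypothetical, since the course's output schema is not shown here. The sketch rejects malformed responses. It also applies the 0.85 minimum ICD-10 confidence rule from Common Challenges by routing to REFER.

```python
import json
from dataclasses import dataclass

VALID_PATHS = {"AUTO_APPROVE", "REFER", "REJECT"}
MIN_ICD_CONFIDENCE = 0.85


@dataclass
class TriageResult:
    icd10: list[tuple[str, float]]  # (code, confidence)
    completeness: int               # 0-100
    fraud_signals: list[str]
    path: str


def parse_triage_response(raw: str) -> TriageResult:
    """Validate a (hypothetical) JSON triage response and apply the confidence floor."""
    data = json.loads(raw)
    icd = [(item["code"], float(item["confidence"])) for item in data["icd10"]]
    completeness = int(data["completeness_score"])
    path = data["recommended_path"]

    if not 0 <= completeness <= 100:
        raise ValueError(f"completeness score out of range: {completeness}")
    if path not in VALID_PATHS:
        raise ValueError(f"unknown recommended path: {path!r}")
    if any(not 0.0 <= conf <= 1.0 for _, conf in icd):
        raise ValueError("ICD-10 confidence must be in [0, 1]")

    # Low ICD-10 confidence routes to REFER regardless of other signals;
    # an empty code list is treated the same way (an assumption).
    if not icd or min(conf for _, conf in icd) < MIN_ICD_CONFIDENCE:
        path = "REFER"

    return TriageResult(icd, completeness, list(data.get("fraud_signals", [])), path)
```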
The full Sarvam-105B triage prompt template — including the ICD-10 medical glossary injection, the fraud-reasoning chain-of-thought instruction, and the structured JSON output schema — is provided in the full course with working, tested code.
Phase 5: Vernacular Acknowledgements, Audit Log, and Deployment
Use Mayura/Sarvam-Translate to generate translated claim summaries for the cockpit, and Bulbul v3 to synthesise vernacular voice acknowledgements delivered via IVR or WhatsApp audio. Implement the IRDAI-grade immutable audit log: every state transition, every LLM call (prompt + response), every override, and every PII redaction event is written to an append-only table with a cryptographic hash chain. Finally, package everything with Docker Compose for local deployment and prepare the Kubernetes manifests for production. The course covers PII re-association (restoring tokenised values post-decision for the audit record) and the DPDP-compliant data retention and deletion workflow.
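The cryptographic hash chain can be sketched as follows. Each entry stores the SHA-256 of the previous entry's hash plus its own content. Altering, reordering, or removing an earlier row therefore makes verification fail. Detecting removal of the newest rows also requires storing the latest hash somewhere outside the table. This in-memory version only illustrates the chaining idea. The entry fields and function names are hypothetical, and in the described system rows would go to an append-only PostgreSQL table.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys) so the same payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], payload: dict) -> dict:
    """Append an event (state transition, LLM call, override, ...) to the chain."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"payload": payload, "prev_hash": prev_hash,
             "hash": _entry_hash(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; False means an entry was altered, reordered, or removed mid-chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```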
The complete Docker Compose setup, environment variable schema, Kubernetes resource templates, and IRDAI audit log schema are provided in the full course with working, tested code.
Common Challenges
Building this system will surface problems that no tutorial prepares you for. Here are the seven most costly ones.
1. Handwritten Devanagari OCR on low-quality photos
Root cause: Mobile camera images from WhatsApp are compressed, skewed, and often taken in poor lighting. Sarvam Vision performs best on upright, high-contrast images.
Fix: Add a pre-processing pipeline using OpenCV: deskew, denoise, enhance contrast, and upscale to a minimum 300 DPI equivalent before sending to the Sarvam Vision API. Run a confidence check post-OCR and return a "please resend a clearer photo" response to the policyholder if confidence is too low.
2. ICD-10 mapping confidence calibration
Root cause: Sarvam-105B returns high-confidence ICD-10 codes even when the input term is ambiguous (e.g., "chest pain" could be R07, I20, or I21). Without confidence thresholds, the system over-approves claims.
Fix: Configure a minimum confidence threshold (default 0.85) below which the claim is routed to REFER regardless of other signals. Maintain a curated override mapping for the 50 most common regional medical terms in your policyholder base.
3. Hospital bill arithmetic: floating-point errors
Root cause: Python's native float type produces rounding errors on rupee amounts with paise. 0.1 + 0.2 != 0.3 in floating-point.
Fix: Use from decimal import Decimal for all bill arithmetic. Define a tolerance of ±₹1 for total comparisons to account for rounding in the original bill.
4. Long discharge summaries exceeding context window
Root cause: ICU discharge summaries from tertiary hospitals routinely exceed 4,000 tokens. Sending the full text to Sarvam-105B in one call risks truncation.
Fix: Implement a chunk-and-rollup strategy: split the summary by section (presenting complaint, diagnosis, treatment, discharge advice), summarise each section independently, and send only the structured summaries in the triage bundle.
5. PII re-association after LLM call
Root cause: Tokens like AADHAAR_001 are sent to the LLM, but the audit log must record the original values for compliance. If the token map is lost, PII is irrecoverable.
Fix: Store the token map in PostgreSQL, encrypted with a per-claim key, with a retention policy matching the DPDP data lifecycle. Re-associate before writing the final audit record.
6. Fraud signal calibration: the precision/recall tradeoff
Root cause: Overly sensitive fraud rules block legitimate claims (NPS impact). Overly lax rules miss fraud (loss-ratio impact). There is no universal threshold.
Fix: Start with a conservative recall-optimised configuration (catch everything, manual review for REFER cases). Tune precision upward monthly using the outcomes of manually reviewed claims as labelled ground truth.
7. Cross-script name fuzzy matching
Root cause: A policyholder's name may appear in Devanagari on the discharge summary and in Latin script on the FIR. Standard fuzzy matching (Levenshtein) fails across scripts.
Fix: Transliterate all names to a canonical Latin form using the Sarvam-Translate API before fuzzy comparison. Use rapidfuzz with a token-sort ratio of ≥ 85 as the match threshold.
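The matching step in the last item can be sketched with a token-sort ratio. The text recommends rapidfuzz. To keep this example dependency-free, it uses the standard library's difflib as a stand-in, which gives similar but not identical scores. It assumes names have already been transliterated to Latin script; the Sarvam-Translate call is not shown. The ≥ 85 threshold comes from the text, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 85


def token_sort_ratio(a: str, b: str) -> float:
    """Similarity (0-100) after lowercasing and sorting whitespace-separated tokens.

    Sorting makes "Sharma Ramesh Kumar" and "Ramesh Kumar Sharma" compare equal.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return 100 * SequenceMatcher(None, norm_a, norm_b).ratio()


def names_match(name_a: str, name_b: str, threshold: float = MATCH_THRESHOLD) -> bool:
    """True if two already-transliterated names are similar enough to be one person."""
    return token_sort_ratio(name_a, name_b) >= threshold
```

A typical transliteration variant such as "Sharma" versus "Sarma" still clears the threshold comfortably. An unrelated name does not.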
Solving these issues took us over 60 hours of testing across real motor and health claim documents. The course walks you through each fix with working code.
Ready to Build This Yourself?
Understanding the architecture is the first step. Shipping working, tested code that handles real Indian insurance documents — with IRDAI compliance, PII redaction, and a production-ready triage pipeline — is a different undertaking entirely.
The Multilingual Insurance Claims Triage System course on labs.codersarts.com gives you everything you need to go from architecture to deployment:
✅ Full source code — FastAPI orchestrator, Celery workers, React cockpit, rule engine
✅ Video tutorials walking through every phase step by step
✅ Sample claim documents — motor and health (anonymised), in 5 regional languages
✅ ICD-10 mapping data and curated Indian medical term glossary
✅ Fraud-rule playbook with configurable thresholds
✅ Docker Compose setup — one command to run the full stack locally
✅ Tested API prompt templates for Sarvam Vision, SarvamParse, and Sarvam-105B
✅ IRDAI audit log schema and PII redaction implementation
✅ Deployment walkthrough — Docker to Kubernetes
✅ Lifetime access and free updates as the Sarvam API evolves
✅ Community support via the Codersarts Discord
$79. Everything above.
Need custom fraud rules, or want to design the integration with your TPA's core-claims system? Book a 1:1 Guided Session at $299 — a live walkthrough tailored to your specific insurer or TPA environment.
Conclusion
The multilingual insurance claims triage system described here solves a genuinely hard problem — Indian-language handwritten document understanding, ICD-10 coding, and fraud detection at scale — by assembling specialist components (Sarvam Vision, SarvamParse, Sarvam-105B, deterministic rule engines) into a clean, auditable pipeline rather than asking a single LLM to do everything. The architecture is IRDAI and DPDP compliant by design, and the deterministic arithmetic layer ensures that the system is defensible in any dispute.
If you are starting fresh, begin with Stack A: Docker Compose, the Sarvam managed APIs, and a single FastAPI service. Get to a working triage on five real claim documents before adding the production infrastructure. That working proof of concept is exactly what the full course will help you build — with tested code, sample documents, and an ICD-10 glossary ready to run.