How to Build a Vernacular Loan Origination System with Sarvam-105B and Sarvam Vision
- May 13

Introduction
Picture this: a small dairy farmer in Nagpur uploads his Aadhaar card — handwritten back-side details in Devanagari — along with six months of bank statements packed with narrations like UPI/PayTM/SAL CR JAN, EMI ICICI HOM, and ATM WDL KOTAK. Your backend passes it to a global OCR vendor and a frontier LLM. The OCR misreads half the Devanagari glyphs. The LLM hallucinates a salary figure. The application gets flagged for manual review. The farmer waits three weeks. He walks away.
This is the everyday reality for NBFCs and Small Finance Banks serving Tier-2 and Tier-3 India. English-only digital lending journeys, generic OCR pipelines, and global LLMs that were never trained on code-mixed Hindi-English bank narrations are quietly creating a two-tier credit system — one for urban, English-literate borrowers, and one for everyone else.
The Vernacular Loan Origination Stack, powered by Sarvam AI, addresses this problem at every layer. It ingests Aadhaar, PAN, utility bills, and bank statements in Devanagari, Tamil, Bengali, and 8+ other Indian scripts, runs vernacular KYC and credit reasoning with Sarvam-105B, and delivers loan offers and Key Fact Statements in the borrower's own language.
Real-world use cases this architecture supports:
- NBFCs underwriting unsecured personal loans for Bharat customers
- Small Finance Banks running Kisan Credit Card and MSME working-capital loans
- Two-wheeler and used-car financing at Tier-3 dealerships
- Co-lending platforms partnering with rural Business Correspondents
- Microfinance institutions running Joint Liability Group loans
- LAP / secured lenders processing handwritten land records and chitta-adangal documents
This post walks you through the architecture, tech stack, and implementation phases of a production-grade vernacular lending system. It does not include full source code — that, along with Docker setup, video tutorials, tested configurations, RBI audit-trail templates, and a mock bureau API, is available in the full course on labs.codersarts.com.
📄 Before you dive in — grab the free PRD template that maps out this entire system: architecture, API spec, sprint plan, and system prompt. [Download the free PRD]
How It Works: Core Concept
At its heart, the vernacular loan origination system is a multi-modal, multilingual enrichment pipeline that transforms unstructured Indian-language documents into a structured underwriting bundle, then uses a large language model grounded in Indian linguistics to reason over that bundle.
Why the naive approach fails. The obvious path — point any cloud OCR at an Aadhaar card, feed the text to GPT-4, and ask for a credit decision — breaks in at least four places. First, global OCR vendors achieve poor accuracy on Devanagari handwriting and Tamil printed script at low DPI; field-level errors on name, address, and DOB directly compromise KYC. Second, bank-statement narrations from Indian PSU banks, UPI apps, and rural cooperative banks are code-mixed in ways that global LLMs have not been trained to parse reliably — they produce wrong salary figures, miss recurring EMIs, and miscount average-balance days. Third, routing raw Aadhaar data (including UID digits) to a non-sovereign LLM API raises DPDP compliance risk. Fourth, the RBI's digital-lending guidelines mandate disclosures and Key Fact Statements in the borrower's language — a requirement no English-first LLM pipeline satisfies natively.
How this architecture solves that. The solution chains specialist Sarvam models in a strict sequence: Sarvam Vision for script-aware OCR, SarvamParse for structured bank-statement extraction, Sarvam-105B (Indus) for bilingual credit reasoning, and Mayura for regulatory-grade translation. Each model operates on the specific problem it was designed for, with deterministic rule-based validation between every LLM step to catch hallucinations before they propagate.
Analogy. Think of it like a hospital triage system. A generalist doctor (GPT-4) can handle most complaints, but a specialist cardiologist (Sarvam-105B trained on Indian financial data) reads an ECG from a Tier-3 patient far more accurately — especially when the patient's notes are written in Tamil.
ASCII data-flow diagram:
INGESTION PHASE
───────────────
Borrower PWA (React)
    │
    ▼
Document Upload (Aadhaar F+B, PAN, Utility Bill, Bank Stmt PDF)
    │
    ├──► Sarvam Vision OCR ──► Field extraction JSON + confidence scores
    │                                   │
    │                          Aadhaar masker (DPDP)
    │                                   │
    └──► SarvamParse ──────► Ledger JSON (credits, debits, EMIs, avg bal)

RUNTIME / UNDERWRITING PHASE
─────────────────────────────
FastAPI Orchestrator
    │
    ├── Enriched borrower profile
    ├── Bureau pull (CRIF / Experian mock)
    └── Structured underwriting prompt
            │
            ▼
    Sarvam-105B (Indus)
            │
            ▼
    Decision JSON {approve|refer|reject, amount, ROI, rationale}
            │
            ├──► Mayura ──► Loan offer + KFS in borrower language
            │
            └──► Bulbul v3 ──► Audio KFS explainer
                    │
            Borrower e-sign
                    │
            Immutable audit log (PostgreSQL)
System Architecture Deep Dive
The vernacular loan origination system is built in five distinct layers, each with clear responsibilities and technology boundaries.
Architecture Layers
Frontend layer — A React 18 Progressive Web App serves both the borrower journey (vernacular UX, on-the-fly Devanagari/Tamil transliteration, audio explainers) and an operations portal for loan officers. Tailwind CSS ensures the PWA renders correctly on low-end Android devices common in Tier-3 markets. Vite handles the build pipeline for fast cold starts.
Backend orchestration layer — A FastAPI service written in Python 3.11+ is the central nervous system. It sequences API calls to each Sarvam service, enforces retry logic with exponential backoff, captures consent artefacts at every step, and writes immutable records to the audit log. All Aadhaar masking happens here before any data leaves the service boundary.
AI layer — Four Sarvam models operate in series: Vision (OCR), SarvamParse (bank statement structuring), Sarvam-105B (credit reasoning), and Mayura (translation). Bulbul v3 and Saaras v3 handle audio output and voice intake respectively. Structured-output enforcement (JSON schema validation) sits between each model call.
Data layer — PostgreSQL stores the loan-application record, consent artefacts, bureau pull results, the decision payload, and the immutable audit log. The audit table uses an append-only trigger so no row can be updated or deleted — a hard requirement under RBI's digital-lending framework.
External integrations — A mock credit-bureau API (CRIF/Experian format) is included in the course. In production, this is replaced with a live bureau connector. The Sarvam Transliteration API normalises names and addresses across scripts for bureau matching.
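The structured-output enforcement that sits between model calls could look roughly like the sketch below. It assumes the decision payload uses the field names given in the data-flow walkthrough (decision, recommended_amount, tenure, roi, rationale_citations); the field types and the validate_decision helper are illustrative assumptions, not part of any Sarvam API.

```python
import json

# Hypothetical schema: field names follow the decision payload described in
# this post; the types and allowed values are assumptions for illustration.
ALLOWED_DECISIONS = {"approve", "refer", "reject"}
REQUIRED_FIELDS = {
    "decision": str,
    "recommended_amount": (int, float),
    "tenure": int,
    "roi": (int, float),
    "rationale_citations": list,
}


def validate_decision(raw: str) -> dict:
    """Parse model output and raise ValueError if it breaks the schema."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("model output must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            raise ValueError(f"missing field: {field}")
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
    if payload["decision"] not in ALLOWED_DECISIONS:
        raise ValueError(f"unknown decision: {payload['decision']}")
    return payload
```

A check like this only catches malformed output. It says nothing about whether the numbers inside a well-formed payload are right, which is why a separate policy check is still needed.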
Component Table
Component | Role | Technology Options |
Borrower PWA | Vernacular loan journey + document upload | React 18 + Tailwind / Next.js / Vue 3 |
Ops portal | Loan officer review + manual referral | React 18 admin dashboard |
Backend API | Orchestration, consent, audit | FastAPI (Python) / Express (Node) |
Document OCR | Aadhaar / PAN / utility bill extraction | Sarvam Vision / Azure Form Recognizer (fallback) |
Bank-stmt parser | Structured ledger from PDF statements | SarvamParse / custom rule engine |
Credit reasoner | Underwriting decision + narrative | Sarvam-105B (Indus) / GPT-4o (fallback, non-KYC) |
Translator | Loan offer + KFS in borrower language | Mayura / Sarvam-Translate |
Audio explainer | Voice KFS in 11 Indian languages | Bulbul v3 |
Voice intake | Code-mixed borrower voice intake | Saaras v3 |
Database | Loan records, consent, audit log | PostgreSQL 15 / CockroachDB |
Data Flow Walkthrough
1. Borrower opens PWA: selects preferred language (Hindi, Tamil, Bengali, etc.); the UI renders in that language via Mayura pre-translated strings.
2. Document upload: borrower photographs or uploads Aadhaar front + back, PAN card, utility bill, and 6-month bank statement PDF.
3. OCR pass: FastAPI calls Sarvam Vision for each identity document; the response is field-level JSON with confidence scores. Low-confidence fields trigger a re-upload prompt.
4. Aadhaar masking: the orchestrator redacts the UID digits (first 8 of 12) from the OCR output before any downstream call; the unmasked version is stored only in the append-only audit log.
5. Bank-statement parsing: SarvamParse converts the PDF to a structured ledger: salary credits, UPI receipts, EMI debits, ATM withdrawals, and monthly average balance.
6. Enriched profile assembly: the orchestrator merges OCR fields, ledger summary, and a bureau pull into a single JSON payload.
7. Sarvam-105B underwriting call: the structured prompt (system + user) is sent to Sarvam-105B with JSON-schema output enforcement; the response is {decision, recommended_amount, tenure, roi, rationale_citations}.
8. Dual-rail validation: a deterministic policy rule-engine cross-checks the LLM decision against hard credit policy (FOIR cap, minimum bureau score). Conflicts trigger a refer override.
9. Offer generation: Mayura translates the offer letter and Key Fact Statement into the borrower's language, preserving all numerical fields verbatim.
10. Audio explainer: Bulbul v3 converts the KFS to speech in the borrower's language; the audio URL is surfaced in the PWA.
11. E-sign and audit commit: borrower e-signs; every step (consent, OCR, decision, KFS) is written to the immutable PostgreSQL audit table.
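The Aadhaar masking step in the walkthrough can be sketched with a regular expression. This is a minimal illustration, assuming the UID appears in OCR text as 12 digits with optional space or hyphen grouping; mask_uid and assert_no_uid are hypothetical helper names, and the pattern will also mask any other 12-digit number (some account numbers, for instance), which errs on the side of over-masking.

```python
import re

# A 12-digit Aadhaar number, optionally grouped as 4-4-4 with spaces or hyphens.
# The lookarounds stop the pattern from matching inside longer digit runs.
UID_PATTERN = re.compile(r"(?<!\d)(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})(?!\d)")


def mask_uid(text: str) -> str:
    """Replace the first 8 digits of any Aadhaar-like number with X."""
    return UID_PATTERN.sub(lambda m: "XXXX XXXX " + m.group(3), text)


def assert_no_uid(payload: str) -> None:
    """Guard for outgoing LLM payloads: fail loudly if a UID slipped through."""
    if UID_PATTERN.search(payload):
        raise ValueError("unmasked 12-digit sequence in outgoing payload")
```

Calling assert_no_uid on every outgoing prompt gives the orchestrator a cheap last line of defence even if an upstream caller forgets to mask.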
Non-Obvious Design Decisions
Decision 1: Dual-rail (rules + LLM) underwriting pipeline. Running Sarvam-105B alone for the credit decision creates hallucination risk on edge cases — a rare but catastrophic failure mode. A lightweight deterministic rule-engine runs in parallel and can override the LLM decision. This matters because an incorrect approve on a fraudulent application is a regulatory and financial liability, not just a UX bug.
Decision 2: Chunk-by-month bank statement summarisation. A 6-month bank statement for an active account easily exceeds 32K tokens. Rather than truncating, the pipeline chunks the statement by calendar month, generates a per-month summary (credits, debits, EMIs, balance), and feeds the roll-up table to Sarvam-105B. This keeps the context window manageable while preserving seasonal income patterns — critical for Kisan Credit Card and agricultural loan underwriting.
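As a rough sketch of the month-by-month roll-up, the function below groups ledger rows by calendar month and produces one summary row per month. The input row shape (date, amount, type, category, balance) is an assumption for illustration, not the actual SarvamParse output format.

```python
from collections import defaultdict


def monthly_rollup(transactions: list[dict]) -> list[dict]:
    """Group ledger rows by calendar month and summarise each month.

    Each row is assumed to look like:
    {"date": datetime.date, "amount": float, "type": "credit" | "debit",
     "category": str, "balance": float}
    """
    buckets: dict[str, list[dict]] = defaultdict(list)
    for txn in transactions:
        buckets[txn["date"].strftime("%Y-%m")].append(txn)

    rollup = []
    for month in sorted(buckets):
        # Stable sort keeps same-day rows in statement order.
        rows = sorted(buckets[month], key=lambda t: t["date"])
        rollup.append({
            "month": month,
            "total_credits": sum(t["amount"] for t in rows if t["type"] == "credit"),
            "total_debits": sum(t["amount"] for t in rows if t["type"] == "debit"),
            "emi_debits": sum(
                t["amount"] for t in rows
                if t["type"] == "debit" and t.get("category") == "emi"
            ),
            "closing_balance": rows[-1]["balance"],
        })
    return rollup
```

Six summary rows are far smaller than several hundred raw transactions, and the month labels keep seasonal dips and peaks visible to the model.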
Tech Stack Recommendation
There are two sensible approaches depending on where you are in your build.
Stack A — Beginner / Prototype (Weekend Build)
This stack trades some production robustness for speed of assembly. You can have a working loan journey demo in 2–3 days.
Layer | Technology | Why |
Frontend | React 18 + Tailwind (Vite) | Fast setup, Tailwind's utility classes work well on mobile |
Backend | FastAPI (Python 3.11) | Auto-generates OpenAPI docs; easy Sarvam SDK integration |
OCR | Sarvam Vision (API) | No infra setup; pay-per-call |
Bank parser | SarvamParse (API) | Same — no custom model training |
LLM | Sarvam-105B (API) | Managed endpoint; no GPU provisioning |
Database | SQLite (local) | Zero setup for local dev; swap to Postgres before demo day |
Auth | Supabase Auth | Free tier, social login in minutes |
Estimated monthly cost (dev/demo): ~$30–60 (Sarvam API calls on light volume + SQLite = zero infra cost).
Stack B — Production-Ready
Designed for an NBFC or SFB deploying to real borrowers with RBI audit requirements.
Layer | Technology | Why |
Frontend | React 18 + Tailwind PWA | Installable on Android; offline-first for low-connectivity areas |
Backend | FastAPI + Celery + Redis | Async document processing; retry queues for API failures |
OCR | Sarvam Vision + perceptual hashing | Tamper detection + field-level confidence scoring |
Bank parser | SarvamParse + deterministic fallback | Rule-based arithmetic fallback for edge-case narrations |
LLM | Sarvam-105B + JSON schema enforcement | Structured output catches malformed responses before they propagate |
Translation | Mayura / Sarvam-Translate | DPDP-compliant; data stays in India |
Audio | Bulbul v3 | 11-language TTS; critical for low-literacy borrowers |
Database | PostgreSQL 15 + append-only audit trigger | RBI digital-lending compliance |
Auth | JWT + OTP (TOTP) | Aadhaar-linked OTP flow |
Infra | Docker + Kubernetes (GKE / on-prem) | Horizontal scaling for loan origination peaks |
Estimated monthly cost (production, 500 applications/day): ~$800–1,500 (Sarvam API calls + managed Postgres + GKE cluster on spot instances).
Want the production-grade source code, Docker Compose setup, and Kubernetes manifests? Get the full course on labs.codersarts.com →
Implementation Phases
Building this system end-to-end is a structured process. Here is how to approach it in five phases.
Phase 1: Regulatory Scoping and Data Modelling
Before writing a single line of code, map the regulatory requirements to your data model. RBI's digital-lending guidelines (2022, updated 2024) specify: which entities can disburse loans directly to borrowers, what must appear in the Key Fact Statement, how consent must be captured, and what audit records must be retained for how long. DPDP (Digital Personal Data Protection Act) adds consent requirements for Aadhaar data processing.
Key technical decisions in this phase: define your consent-artefact schema (who consented, to what, at what timestamp, with what IP and device fingerprint); design the append-only audit table (immutable insert trigger, cryptographic hash chaining); and decide how UID masking will be enforced at the application layer so it cannot be accidentally bypassed.
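To make the consent-artefact schema and hash chaining concrete, here is a minimal sketch in application code. The field names mirror the items listed above (who, what, when, IP, device fingerprint); ConsentArtefact, chain_hash, verify_chain, and the all-zero genesis value are hypothetical names chosen for illustration, and a real deployment would likely enforce the chain in the database as well.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS_HASH = "0" * 64  # hypothetical sentinel used as the first "previous hash"


@dataclass(frozen=True)
class ConsentArtefact:
    application_id: str
    borrower_id: str
    purpose: str             # what the borrower consented to
    consented_at: str        # ISO-8601 timestamp, UTC
    ip_address: str
    device_fingerprint: str


def chain_hash(prev_hash: str, artefact: ConsentArtefact) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    body = json.dumps(asdict(artefact), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def verify_chain(records: list[tuple[ConsentArtefact, str]]) -> bool:
    """Recompute every hash in order; an edited or missing row breaks the chain."""
    prev = GENESIS_HASH
    for artefact, stored_hash in records:
        if chain_hash(prev, artefact) != stored_hash:
            return False
        prev = stored_hash
    return True
```

Because each hash depends on the one before it, changing any historical record invalidates every later hash, which makes tampering detectable during an audit.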
The consent-artefact framework and RBI audit-log schema — with working PostgreSQL trigger code and migration scripts — are covered in detail in the full course with working, tested code.
Phase 2: Document Ingestion Pipeline
Build the document-ingestion service that processes Aadhaar (front + back), PAN, utility bill, and bank statement. The Sarvam Vision integration requires prompt-tuned post-processing: raw OCR output needs field classification (name, DOB, address, UID), confidence-score filtering, and a re-upload prompt when scores fall below threshold.
Key technical decisions: how to handle multi-page PDFs with mixed orientations; how to implement perceptual hashing for tamper detection; how to chunk long bank statements by month for SarvamParse; and how to unify name and address fields across documents where the same person's name may appear in three different scripts.
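The confidence-score filtering described for this phase can be illustrated with a small gate function. This is a sketch under assumptions: the OCR response shape (a value plus a confidence per field) is hypothetical, and the 0.80 threshold is the value this post suggests later for the Aadhaar back-side, which you would tune per document type.

```python
CONFIDENCE_THRESHOLD = 0.80  # gate value suggested in this post; tune per document type


def gate_fields(ocr_fields: dict[str, dict]) -> tuple[dict[str, str], list[str]]:
    """Split OCR output into accepted values and fields needing a re-upload.

    ocr_fields is assumed to look like:
    {"name": {"value": "...", "confidence": 0.93}, ...}
    """
    accepted: dict[str, str] = {}
    needs_reupload: list[str] = []
    for field, result in ocr_fields.items():
        confident = result.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD
        if confident and result.get("value"):
            accepted[field] = result["value"]
        else:
            # Low confidence or an empty value both send the borrower back to upload.
            needs_reupload.append(field)
    return accepted, needs_reupload
```

Returning the list of failed fields, rather than a single pass/fail flag, lets the PWA ask the borrower to re-photograph only the document that caused the problem.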
The Devanagari handwriting post-processing pipeline and field-level confidence scoring implementation are covered in detail in the full course with working, tested code.
Phase 3: Underwriting Engine
This is the most complex phase. The orchestrator assembles the enriched borrower profile from OCR outputs + SarvamParse ledger + bureau pull, constructs the structured underwriting prompt, calls Sarvam-105B with JSON-schema output enforcement, and runs the dual-rail validation.
Key technical decisions: how to design the system prompt so Sarvam-105B cites policy clauses in its rationale JSON (critical for audit trails); how to structure the per-month bank-statement roll-up table for the model context; how to implement the deterministic rule-engine for FOIR cap and minimum bureau score enforcement; and how to handle the refer case (queue for manual underwriter review in the ops portal).
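A deterministic rule engine for the dual-rail check might look something like the sketch below. Every threshold here (50% FOIR cap, 650 minimum bureau score, 15% income deviation) is a placeholder assumption, as is the stated_monthly_income field on the LLM payload; real values come from the lender's credit policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical thresholds; real values come from the lender's credit policy.
    max_foir: float = 0.50
    min_bureau_score: int = 650
    max_income_deviation: float = 0.15


def apply_policy(llm_decision: dict, ledger_income: float,
                 monthly_obligations: float, proposed_emi: float,
                 bureau_score: int, policy: Policy = Policy()) -> dict:
    """Return the final decision, downgrading an LLM approve to refer on conflict."""
    flags = []
    if ledger_income <= 0:
        flags.append("no verifiable income in ledger")
    else:
        # FOIR: fixed obligations (existing plus proposed EMI) over monthly income.
        foir = (monthly_obligations + proposed_emi) / ledger_income
        if foir > policy.max_foir:
            flags.append(f"FOIR {foir:.2f} exceeds cap {policy.max_foir:.2f}")
        stated = llm_decision.get("stated_monthly_income")  # hypothetical field
        if stated is not None:
            deviation = abs(stated - ledger_income) / ledger_income
            if deviation > policy.max_income_deviation:
                flags.append("LLM-stated income deviates from ledger")
    if bureau_score < policy.min_bureau_score:
        flags.append(f"bureau score {bureau_score} below minimum")

    final = llm_decision["decision"]
    if final == "approve" and flags:
        final = "refer"  # rules can only make the outcome more conservative
    return {
        "decision": final,
        "overridden": final != llm_decision["decision"],
        "policy_flags": flags,
    }
```

The override is deliberately one-directional: the rules can downgrade an approve to a refer, but they never upgrade a refer or reject, so the manual underwriter remains the only path to reversing a cautious decision.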
The dual-rail credit-decision pipeline — including the structured prompt template, JSON schema enforcement, and rule-engine integration — is covered in detail in the full course with working, tested code.
Phase 4: Vernacular Offer Generation and Borrower UX
With a decision payload in hand, the backend calls Mayura to translate the loan offer letter and Key Fact Statement. This step requires special handling: Mayura must receive the numerical fields (ROI percentage, EMI amount, processing fee) as isolated tokens that should not be translated — only the surrounding prose gets translated. A pre-processing step tags these fields before the Mayura call; a post-processing step validates that all numbers survived unchanged.
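The pre- and post-processing around the translation call can be sketched as follows. This is a minimal illustration, assuming the translation service leaves anything inside <nt>...</nt> tags untouched; that behaviour, the number regex, and the helper names are assumptions to verify against the actual Mayura API.

```python
import re
from collections import Counter

# Matches amounts and rates such as "18.5% p.a.", "₹2,450", "12" or "1,00,000".
NUMBER_PATTERN = re.compile(r"₹?\d+(?:,\d+)*(?:\.\d+)?(?:\s?%(?:\s?p\.a\.)?)?")
TAG_PATTERN = re.compile(r"</?nt>")


def tag_numbers(text: str) -> tuple[str, list[str]]:
    """Wrap every numeric token in <nt>...</nt> and return the tokens found."""
    tokens = NUMBER_PATTERN.findall(text)
    tagged = NUMBER_PATTERN.sub(lambda m: f"<nt>{m.group(0)}</nt>", text)
    return tagged, tokens


def untag_and_validate(translated: str, expected_tokens: list[str]) -> str:
    """Strip the tags and confirm each original number survived translation."""
    clean = TAG_PATTERN.sub("", translated)
    found = Counter(NUMBER_PATTERN.findall(clean))
    missing = Counter(expected_tokens) - found
    if missing:
        raise ValueError(f"numeric fields altered or dropped: {sorted(missing)}")
    return clean
```

Comparing token counts rather than doing a plain substring search means a number that appears twice in the KFS must also appear twice in the translation, and a figure rewritten in regional-script digits is reported as missing.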
The React PWA at this point needs to surface the KFS, the audio explainer (Bulbul), and the e-sign flow. Transliteration of the borrower's name across scripts (for the offer letter header) uses the Sarvam Transliteration API.
The Mayura number-preservation pre/post-processing pattern and the Bulbul audio integration are covered in detail in the full course with working, tested code.
Phase 5: Deployment and Sandbox Scaling
Package the FastAPI backend, Celery worker, and PostgreSQL database in Docker Compose for local runs. For sandbox-grade deployment, the course walks through a Kubernetes manifest with horizontal pod autoscaling, a managed Cloud SQL Postgres instance, and environment variable management for Sarvam API keys.
Key technical decisions: how to handle Sarvam API rate limits gracefully (exponential backoff + dead-letter queue); how to run smoke tests against the mock bureau API before switching to a live bureau connector; and how to implement health-check endpoints for the ops team.
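The exponential backoff plus dead-letter pattern can be shown without any queueing infrastructure. The sketch below is a simplified, in-process illustration: RateLimitError is a hypothetical exception standing in for whatever your HTTP client raises on a 429, and the dead-letter "queue" is a plain list where production code would use Celery or Redis.

```python
import random
import time
from typing import Any, Callable


class RateLimitError(Exception):
    """Hypothetical error raised by the API client on an HTTP 429 response."""


def call_with_backoff(task: Callable[[], Any], dead_letters: list,
                      max_attempts: int = 5, base_delay: float = 0.5,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry task on rate limits with jittered exponential backoff.

    After max_attempts failures the task is parked in dead_letters for later
    replay instead of being dropped, and None is returned.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except RateLimitError:
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    dead_letters.append(task)
    return None
```

Passing sleep as a parameter keeps the function testable: a unit test can inject a recorder instead of actually waiting.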
The Docker Compose setup, Kubernetes manifests, and mock bureau API integration are covered in detail in the full course with working, tested code.
Common Challenges
Building this stack exposes several non-obvious failure modes that will cost you days if you hit them cold.
1. Devanagari handwriting OCR errors on Aadhaar back-side. Root cause: The back-side of physical Aadhaar cards is often handwritten in regional scripts by municipal clerks, and is photographed at variable DPI on a mobile camera. Global OCR models were not trained on this distribution. Fix: Use Sarvam Vision with prompt-tuned field extraction; implement a confidence-score gate (reject fields below 0.80 confidence); add a client-side image quality check (blur detection via variance of Laplacian) before upload to save a round-trip.
2. Code-mixed bank narration misclassification. Root cause: Narrations like UPI/PayTM/SAL CR JAN mix English acronyms, Hindi abbreviations, and month codes in a single string. Standard NLP classifiers and global LLMs produce inconsistent category assignments. Fix: Use SarvamParse as the primary parser; layer a deterministic rule-based fallback (regex pattern library for common NEFT/UPI/ATM/EMI patterns) for narrations SarvamParse returns with low confidence.
3. Aadhaar UID leaking into LLM prompts. Root cause: If the OCR output is passed directly to Sarvam-105B without masking, the UID ends up in LLM logs. Under DPDP, this is a data-handling violation. Fix: Enforce masking in the orchestrator as a mandatory middleware step that is neither optional nor caller-controlled. Write a unit test that asserts no 12-digit numeric sequence appears in any outgoing Sarvam-105B payload.
4. Hallucinated salary in credit decision. Root cause: When monthly credits are sparse or irregular (common for agricultural and seasonal workers), Sarvam-105B may extrapolate an annualised income that overstates the borrower's capacity. Fix: Apply dual-rail validation. The rule engine re-computes average monthly credit independently from the structured ledger and flags any LLM-stated income that differs by more than 15%.
5. Mayura translating numerical fields. Root cause: Mayura is a translation model; without explicit instructions, it may transliterate "18.5% p.a." into a regional-script string that renders ambiguously on some devices. Fix: Pre-process the KFS template to wrap all numerical tokens in a no-translate tag (<nt>18.5% p.a.</nt>); strip tags after Mayura returns the translation; validate all original numbers are present in the output via regex.
6. Bank statement exceeding context window. Root cause: A 6-month statement for an active salaried account can have 400–600 transactions, easily exceeding 32K tokens when formatted as text. Fix: Chunk by calendar month; generate a per-month summary table (total credits, total debits, identified EMIs, end-of-month balance); pass only the summary table (typically under 2K tokens) to Sarvam-105B.
7. Append-only audit table performance under load. Root cause: Enforcing append-only semantics in Postgres (the trigger that blocks UPDATE and DELETE, plus any per-insert integrity work such as hash chaining) adds overhead to the audit path, and a naive implementation can create lock contention at high origination volumes. Fix: Use a separate audit schema with an UNLOGGED staging table for write speed; flush to WAL-backed permanent storage asynchronously via a Celery task; add a composite index on (application_id, event_type, created_at). Because Postgres truncates UNLOGGED tables after a crash, keep the flush interval short and write compliance-critical events straight to the logged table.
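As an illustration of the rule-based narration fallback from challenge 2, a small ordered pattern list is often enough to start with. The categories and regexes below are illustrative assumptions built from the example narrations in this post; a real pattern library would be derived from actual statement samples for each bank.

```python
import re

# Illustrative patterns only. First match wins, so the more specific
# salary rule is checked before the generic UPI/NEFT channel rules.
NARRATION_RULES = [
    ("salary_credit", re.compile(r"\bSAL(?:ARY)?\b.*\bCR\b|\bCR\b.*\bSAL(?:ARY)?\b", re.I)),
    ("emi_debit", re.compile(r"\bEMI\b", re.I)),
    ("atm_withdrawal", re.compile(r"\bATM\b.*\b(?:WDL|WD|CASH)\b", re.I)),
    ("upi_transfer", re.compile(r"^UPI\b", re.I)),
    ("neft_transfer", re.compile(r"^NEFT\b", re.I)),
]


def classify_narration(narration: str) -> str:
    """Return the first matching category, or "unclassified" if none match."""
    for category, pattern in NARRATION_RULES:
        if pattern.search(narration):
            return category
    return "unclassified"
```

With these rules, UPI/PayTM/SAL CR JAN is treated as a salary credit rather than a generic UPI transfer, because the salary pattern is checked first.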
Solving these issues took us over 80 hours of testing across different document types, Indian bank formats, and regional script variations — the full course on labs.codersarts.com walks you through each fix with working code.
Ready to Build This Yourself?
Understanding an architecture is one thing. Shipping production-grade, RBI-compliant lending software that actually handles Devanagari handwriting, code-mixed bank narrations, and 11-language KFS generation is another challenge entirely.
The gap between "I understand the design" and "I have a working, deployable system" is where most fintech builds stall. That is exactly what the Vernacular Loan Origination Stack course on labs.codersarts.com is designed to close.
Here is what you get:
✅ Full, production-quality source code (FastAPI backend + React PWA + PostgreSQL schema)
✅ 10+ video tutorial modules walking through every implementation phase
✅ Docker Compose setup — one command to run the entire stack locally
✅ Tested Sarvam Vision, SarvamParse, and Sarvam-105B prompt configurations
✅ Sample KYC document set (Aadhaar, PAN, utility bill) for testing
✅ RBI digital-lending audit-trail templates and PostgreSQL trigger code
✅ Mock bureau API (CRIF/Experian format) for sandbox development
✅ Deployment walkthrough for Docker, Kubernetes, and sandbox-grade scaling
✅ Lifetime access and free updates as Sarvam APIs evolve
✅ Community support from Codersarts Labs engineering team
$29. Everything above.
Already building at scale and need integration help with your bureau provider or core-banking system? The 1:1 Guided Session at $99 includes a live architecture walkthrough, custom credit-policy plug-ins, deployment review, and direct integration assistance.
Conclusion
The Vernacular Loan Origination Stack chains Sarvam Vision for script-aware document OCR, SarvamParse for structured bank-statement extraction, Sarvam-105B for bilingual credit reasoning, and Mayura + Bulbul for regulatory-compliant vernacular offer delivery — all orchestrated through a FastAPI backend with PostgreSQL-backed immutable audit logs. The result is a digital lending workflow that serves Tier-2 and Tier-3 borrowers in their own language, with KYC accuracy and underwriting quality that generic stacks cannot match.
If you are starting from scratch, begin with Stack A: FastAPI + Sarvam APIs + SQLite, and get a working borrower journey running locally in a weekend. Then layer in the production hardening — Celery queues, append-only audit logs, Kubernetes — from the Stack B blueprint.
The fastest path from blueprint to deployed system is the full course on labs.codersarts.com — source code, video walkthroughs, and tested configurations included.


