How to Build a Vernacular E-commerce Catalog Localization Engine with Mayura and Sarvam-30B
- May 13
- 13 min read

Introduction
You've just been handed a catalog of two million SKUs and a deadline: launch vernacular UX in Hindi, Tamil, Telugu, Kannada, and seven more Indian languages before the next festive sale season. Your first instinct is to send it all to Google Translate. Three days later, QA flags that "BoAt" is now appearing as "नाव" (the Hindi word for boat), "500 g" has become "500 ग्राम" on some items and "500gms" on others, half the ₹ prices have dropped their currency symbol, and several SKU codes have been silently corrupted. Sound familiar?
Generic machine translation was never designed for e-commerce catalogs. It doesn't know that "BoAt" is a brand, that "32 GB" must never be transliterated, or that a disappointed-review tone in English should carry the same emotional weight in Malayalam. The result is either a frozen English catalog (losing the Bharat buyer) or an expensive human translation team at ₹2–₹5 per SKU—which simply doesn't scale to millions of products.
The Vernacular E-commerce Catalog & Review Localization Engine — Powered by Sarvam AI solves this by combining Sarvam's Mayura translation model, Sarvam-30B, the Transliteration API, Doc Translate, and Sarvam Vision into a single, high-throughput pipeline with brand-aware glossary management built in.
Real-world use cases this engine powers:
Horizontal marketplaces (Meesho, Flipkart, Amazon India) launching vernacular catalogues at scale
D2C brands rolling out 11-language storefronts on Shopify or Magento
Travel and hospitality platforms (MakeMyTrip, OYO) translating property descriptions and user reviews
Marketplaces surfacing translated reviews in the buyer's language to lift conversion
Manufacturers translating product manuals, warranty cards, and spec sheets via Doc Translate
Quick-commerce apps (Blinkit, Zepto) serving vernacular FMCG listings in Tier 2 and Tier 3 cities
This post walks through the full system architecture, recommended tech stacks, and a phase-by-phase implementation plan. It does not include full source code—that lives in the course on labs.codersarts.com.
📄 Before you dive in — grab the free PRD template that maps out this entire system: architecture, API spec, sprint plan, and system prompt. [Download the free PRD]
How It Works: Core Concept
The Naive Approach and Why It Fails
The obvious path is to call a generic translation API for every field in every product record. This fails for catalog-scale e-commerce for three interconnected reasons:
Brand names are not vocabulary words. "BoAt", "Lakmé", "Tata Salt", and "Amul" are trademarks. A language model trained on general text will try to translate them—because they look like common nouns in context—producing nonsense or, worse, a competitor's brand name.
Units, prices, and specs are structured data, not prose. "500 g", "₹999", "4G LTE", and "32 GB" must survive translation byte-for-byte. Any mutation breaks filtering, comparison engines, and regulatory compliance.
Throughput and cost math doesn't work. At 1 million SKUs × 5 translatable fields × 11 target languages = 55 million translation calls. GPT-4 at $0.01/1K tokens burns $550K+ on a single catalog sync. You need an Indian-language-tuned model with a fraction of that token cost and a caching layer that eliminates re-translating unchanged fields.
How the Glossary-Freeze Architecture Solves It
The key insight is to treat translation as a two-pass operation: freeze first, translate second, restore third.
Think of it like filing passport photos: you hand the photographer your original document, they photocopy the photo section and give it back to you separately, take the rest for processing, and reattach the original photo when they're done. The translator never touches the photo.
In the pipeline:
A glossary store (PostgreSQL) holds all brand names, unit patterns, price formats, and SKU masks for the catalog.
Before translation, the freeze service scans each field and replaces every protected token with a reversible placeholder: BoAt → {{BRAND_0}}, 500 g → {{UNIT_1}}.
The frozen text goes to Mayura (Sarvam Translate) for translation into the target language.
The restore service swaps placeholders back to their original values.
Long product descriptions and review text optionally pass through Sarvam-30B for tone-smoothing and review summarisation.
ASCII Data-Flow Diagram
[Product Feed / PIM Webhook]
|
v
[Ingestion Service] ──► CSV / JSON / Webhook normaliser
|
v
[Glossary Freeze] ──► Brand names, units, SKU codes, prices → {{PLACEHOLDERs}}
|
v
[Mayura / Sarvam-Translate] ──► Batch translate (title, desc, attributes)
|
v
[Sarvam-30B] ──► (optional) Tone-smooth long descriptions, review translate
|
v
[Token Restore] ──► {{BRAND_0}} → BoAt, {{UNIT_1}} → 500 g
|
v
[Quality Engine] ──► BLEU + LLM-judge + glossary-violation checker
|
Pass ┤ Fail
| |
v v
[Publish] [Editor Cockpit] ──► Human review queue (React UI)System Architecture Deep Dive
Architecture Overview
The system is composed of five layers, each with a clear responsibility boundary.
Frontend layer: A React 18 + Tailwind editor cockpit that lets catalog managers review flagged translations, approve them, or edit and re-submit. This is the only human-in-the-loop surface.
Backend / orchestration layer: A FastAPI application that exposes REST endpoints for the catalog manager UI and drives the pipeline. It pushes jobs onto Kafka (or SQS for AWS-native deployments) and reads results from the quality engine.
AI layer: The Sarvam API surface—Mayura for title and description translation, Sarvam-30B for review summarisation and tone-aware translation, the Transliteration API for script-level brand-name handling, Doc Translate for PDFs, and Sarvam Vision for image label OCR plus retranslation.
Data layer: PostgreSQL holds the glossary, audit log, quality scores, and translated catalog state. Redis is the translation cache, keyed on a hash of the source field content so cache hits are field-level, not product-level (a price change does not invalidate the description translation).
External integrations: PIM systems (Akeneo, Salsify, inriver) push catalog updates via webhook. The pipeline publishes translated records back to the marketplace's catalog API.
Component Table
Component | Role | Technology Options |
Catalog ingestion | Normalize feed formats (JSON, CSV, webhook) | FastAPI endpoint, AWS Lambda, Cloud Run |
Glossary store | Brand/unit/SKU freeze & restore | PostgreSQL + SQLAlchemy, or Firestore |
Translation queue | High-throughput job dispatch | Apache Kafka, AWS SQS, RabbitMQ |
Translation engine | Title + description localization | Mayura (Sarvam Translate), DeepL (fallback) |
Review engine | Tone-aware summarization & translation | Sarvam-30B, GPT-4o (cost fallback for edge cases) |
Transliteration | Script-level brand/unit handling | Sarvam Transliteration API |
PDF translation | Manuals, warranty, brochures | Sarvam Doc Translate |
Image OCR & retranslate | Label text extraction + overlay | Sarvam Vision + Pillow/PIL render |
Quality engine | BLEU + LLM-judge + rule checker | Custom scoring service, Ragas |
Translation cache | Avoid retranslating unchanged fields | Redis (key = SHA-256 of source field) |
Editor cockpit | Human review UI for flagged outputs | React 18 + Tailwind + Vite |
Audit log | Compliance, rollback, change tracking | PostgreSQL, append-only table |
Data Flow Walkthrough (Runtime — Single Product Update)
PIM webhook fires with an updated product record (e.g., a description change on SKU BT-BOAT-52X).
Ingestion service extracts translatable fields: title, description, attributes, and review array.
Glossary service looks up known brand names, units, and patterns; replaces them with numbered placeholders. The SHA-256 hash of each frozen field is computed.
Cache lookup: Redis checks whether a translation for this exact hash + language pair already exists. On hit, skip translation; go to step 8.
On miss, translation jobs are pushed to Kafka with target languages and field type (title, desc, review).
Kafka consumers call Mayura with the frozen text and language target. Reviews are routed to Sarvam-30B with a tone tag.
Translated text is returned; token restore service swaps placeholders back.
Quality engine scores every field: BLEU against a back-translation baseline, glossary-violation scan, and LLM-judge confidence.
Above-threshold results are written to the translated catalog store and marked APPROVED_AUTO. Below-threshold results are enqueued in the editor cockpit with scores attached.
The catalog API publisher pushes approved translations to the marketplace product feed.
Two Non-Obvious Design Decisions
Hashing at the field level, not the product level. If a product has ten fields and only the price changes, nine fields should be served from cache. Keying the cache on a per-field content hash rather than a product-level version number gives you 80–90% cache hit rates on typical catalog re-sync jobs and dramatically reduces API spend.
LLM-judge calibration against a golden set, not just BLEU. BLEU scores are fast and cheap but correlate poorly with human judgment for Indian languages—especially for longer descriptions and reviews. Building a small (500–1,000 item) human-rated golden set and fine-tuning the LLM-judge against it costs a few days of work but pays back in dramatically lower false-positive escalation rates to the editor cockpit.
Tech Stack Recommendation
Stack A — Beginner / Prototype (Build in a Weekend)
Layer | Technology | Why |
Backend | FastAPI (Python 3.11) | Async-native, fast to scaffold |
Translation | Sarvam Translate (Mayura) REST API | Simplest integration path |
Review AI | Sarvam-30B via API | No infra, pay-per-token |
Glossary store | SQLite (single file) | Zero setup for prototyping |
Cache | In-memory Python dict | Good enough for < 10K SKUs |
Queue | Python asyncio task queue | No broker needed at small scale |
Frontend | Streamlit | Fastest UI for demo purposes |
Container | Docker Compose (2 services) | Repeatable local environment |
Estimated monthly cost: $20–$50 (Sarvam API calls on a test catalog of 10K SKUs, minimal infra).
Stack B — Production-Ready (Designed to Scale)
Layer | Technology | Why |
Backend | FastAPI + Uvicorn + Gunicorn | Production ASGI, multi-worker |
Orchestration | Apache Kafka (or AWS SQS) | 10K+ product/hour throughput |
Translation | Mayura batch API + retry logic | Lower cost per call at volume |
Review AI | Sarvam-30B with tone-tag prompting | Best Indian-language tone fidelity |
Transliteration | Sarvam Transliteration API | Script-level brand protection |
PDF pipeline | Sarvam Doc Translate | Handles warranty + manual PDFs |
Image pipeline | Sarvam Vision + Pillow render | OCR + retranslate + composite |
Glossary store | PostgreSQL + pgvector | Fuzzy brand matching on new SKUs |
Cache | Redis Cluster (field-level hash key) | 80%+ cache hit on re-sync jobs |
Frontend | React 18 + Tailwind + Vite | Production cockpit UI |
Deployment | Docker + Kubernetes + Helm | Horizontal autoscaling |
Monitoring | Prometheus + Grafana | API cost + throughput dashboards |
Estimated monthly cost: $300–$800 at 500K active SKUs, 11 languages, with Redis caching at ~85% hit rate. Roughly 10–20x cheaper than an equivalent GPT-4 pipeline.
Implementation Phases
Phase 1: Catalog Ingestion and Glossary Setup
The foundation of the entire pipeline is the ingestion layer and the glossary store. This phase involves building the connectors to consume product feeds (JSON flat files, CSV exports, or live PIM webhooks from Akeneo/Salsify), normalising them into a canonical product record schema, and populating the initial glossary with brand names, unit patterns, SKU regex masks, and price formats.
Key technical decisions here include: what constitutes a "frozen token" (brand names are obvious, but what about model numbers like "Galaxy S25 Ultra" or ingredient names like "Vitamin C 500mg"?), how to handle multi-word brand names that contain common words ("Tata Salt" vs. the word "Tata" alone), and whether to use exact-match or fuzzy-match glossary lookup.
You'll also decide your field taxonomy at this stage: which fields are always translated (title, description), which are conditionally translated (attributes depending on category), and which are never touched (SKU codes, EAN barcodes, internal IDs).
Building a robust, category-aware glossary initialiser—with CSV seed import, duplicate detection, and brand-alias resolution—is covered in detail in the full course with working, tested code.
Phase 2: Streaming Translation Pipeline with Mayura
With the glossary in place, Phase 2 builds the core translation loop. This means wiring up the freeze → Mayura → restore pattern as a streaming pipeline, adding Redis caching at the field-hash level, and implementing the Kafka (or SQS) consumer architecture that lets you run multiple translation workers in parallel.
The key decisions in this phase are batch size optimisation for the Mayura API (larger batches reduce per-call overhead but increase retry blast radius), error handling and dead-letter queuing for fields that fail translation after N retries, and how to handle language-pair fallback (e.g., if Odia translation quality is below threshold, surface to human review automatically).
You'll also add the Transliteration API call path here for category-specific name handling—electronics brands transliterated differently than FMCG brands in some Indian scripts.
Configuring optimal Mayura batch sizes, retry logic, and language-pair fallback routing for a 10K-product/hour throughput target is covered in detail in the full course with working, tested code.
Phase 3: Review Summarisation and Tone-Aware Translation with Sarvam-30B
Customer reviews are qualitatively different from product descriptions. A product description benefits from accuracy and keyword density; a review needs its emotional tone intact. A disappointed customer's review translated into Hindi should read as disappointed in Hindi—not clinically neutral.
This phase routes the review text (after freeze) through Sarvam-30B with explicit tone tags derived from a pre-classification step (casual, excited, disappointed, neutral). You'll also build the review summarisation feature: Sarvam-30B generates a 2–3 sentence summary of the top-rated reviews per language cluster so that the marketplace can show "review highlights in your language" even if only 5% of reviewers write in the target language.
Key decisions: tone classification model or prompt (zero-shot vs. fine-tuned), summary length constraint per target language (some Indian scripts are more compact than English), and whether to translate all reviews or only top-rated and most-voted ones for cost management.
Prompt templates for tone-preserving review translation in 11 Indian languages—with worked examples of "excited", "disappointed", and "neutral" tones—are included in the full course with working, tested code.
Phase 4: Doc Translate, Sarvam Vision, and Quality Scoring
Phase 4 adds the two remaining AI surfaces and the quality engine that ties the whole pipeline together.
Doc Translate handles the PDFs—product manuals, warranty cards, regulatory compliance sheets, and catalog brochures. The key decision here is whether to pass PDFs directly or to extract structured text first; direct PDF translation via Sarvam Doc Translate is simpler but offers less control over glossary freeze.
Sarvam Vision handles image labels: the pipeline extracts text from product label images via OCR, runs the frozen text through Mayura, and uses Pillow to composite the translated text back onto the original image (matching font size and bounding box). This is critical for FMCG and pharma products where the label image itself needs to be localised for vernacular storefronts.
Quality scoring runs after every translation: a BLEU back-translation score, a glossary-violation check (did any placeholder survive into the output?), and an LLM-judge call that scores fluency and brand faithfulness. Outputs below the threshold are enqueued for human review.
The LLM-judge calibration workflow—building a golden evaluation set and tuning the threshold per language pair—is covered in detail in the full course with working, tested code.
Phase 5: Editor Cockpit, Caching, and Deployment
The final phase builds the surfaces that make the pipeline production-operable: the editor cockpit React UI, cache invalidation logic, cost benchmarking, and containerised deployment.
The editor cockpit displays all below-threshold translations with their quality scores, side-by-side source and translated text, one-click approve or edit controls, and an audit trail. Catalog managers can fix a translation, submit it, and have it instantly published without re-triggering the full pipeline.
Cache invalidation is subtler than it sounds: when a product is updated, only fields whose content hash has changed should be invalidated. A bulk price update should not flush description translations for two million SKUs.
Deployment uses Docker Compose for local development and Kubernetes + Helm for production, with Prometheus metrics on API cost per language pair, queue depth, cache hit rate, and quality score distribution.
The full Docker Compose and Helm chart, plus a cost benchmarking notebook comparing Sarvam vs. GPT-4 at 100K, 500K, and 1M SKUs, are included in the full course with working, tested code.
Common Challenges
Here are seven genuine engineering challenges you will hit when building this system—and how to solve each one.
1. Brand names get translated in context Root cause: Mayura (and every translation model) uses surrounding context to resolve ambiguous tokens. "BoAt wireless earphones" contains enough context for the model to resolve "BoAt" as a noun meaning a watercraft. Fix: Freeze every brand token before sending to the model. Run a pre-translation regex + glossary lookup, replace with {{BRAND_N}}, restore post-translation. Test edge cases: multi-word brands, brands that are substrings of common words.
2. Unit inconsistency across the catalog Root cause: "500 g" and "500 grams" and "500 gm" all mean the same thing but will translate differently. Without normalisation, you end up with three variants of the same unit in the translated catalog. Fix: Add a unit normalisation step before the freeze that canonicalises all unit strings (regex + lookup table) before they go into the glossary freeze. Define canonical forms per category.
3. Review tone flattening in translation Root cause: Generic MT (including Mayura without explicit guidance) tends toward neutral register. Excited or disappointed tone markers are dropped. Fix: Classify reviews by tone before translation. Pass a tone-instruction tag in the Sarvam-30B system prompt. Validate with an LLM judge that checks tone fidelity, not just semantic accuracy.
4. Cache invalidation on partial updates Root cause: PIM systems often push a full product record even when only one field changes. If you key your cache on the product ID, you'll bust the cache for unchanged fields unnecessarily. Fix: Key the cache on the SHA-256 hash of each individual field value, not the product-level version. Unchanged fields always get a cache hit regardless of which other fields changed.
5. BLEU scores are noisy for Indian languages Root cause: BLEU was designed for European language MT and correlates poorly with human judgment for morphologically rich languages like Tamil, Telugu, or Kannada. Fix: Use BLEU as a cheap first filter only. Build a 500–1,000 item human-rated golden set per language and calibrate an LLM-judge against it. Use the LLM-judge score as the primary quality gate.
6. Throughput bottleneck at the Mayura API Root cause: Naive sequential API calls top out at a few hundred products/hour. Kafka consumers running on a single machine hit rate limits. Fix: Use Mayura's batch translation endpoint, run multiple Kafka consumer groups, and implement an exponential backoff retry with jitter. Add a circuit breaker that routes overflow to a secondary translation model.
7. SEO keyword intent lost in literal translation Root cause: A product titled "Best running shoes for flat feet" translated literally into Hindi may use words that Indian shoppers don't actually search for. The semantics are correct; the SEO value is zero. Fix: For high-traffic category titles, add a post-translation keyword alignment step: compare the translated title against the top-5 vernacular search queries for that category (sourced from Google Search Console) and flag titles where the overlap is below a threshold.
Solving these issues took us over 120 hours of testing across 11 language pairs—the course walks you through each fix with working code, production-validated configurations, and test cases.
Ready to Build This Yourself?
Understanding an architecture in a blog post and actually shipping production code against it are two very different things. The gap is real: setting up Kafka consumers that reliably handle retries, calibrating an LLM judge against your golden set, wiring up Sarvam Vision to render composite images, and benchmarking cost at scale all take time and experience to get right the first time.
The Vernacular E-commerce Catalog Localization Engine course on labs.codersarts.com gives you everything you need to go from zero to a running pipeline:
✅ Full source code for all five phases (Python backend + React cockpit)
✅ Sample catalog dataset: 10,000 SKUs across 5 product categories
✅ Pre-populated glossary scaffolding (300+ Indian brand names, unit patterns, price formats)
✅ Eval rubric and LLM-judge prompts calibrated for Indian languages
✅ Docker Compose setup for local development (Kafka + Redis + Postgres + FastAPI + React)
✅ Kubernetes + Helm deployment chart for production
✅ Cost benchmarking notebook: Sarvam vs. GPT-4 vs. DeepL at 100K / 500K / 1M SKUs
✅ Lifetime access and free updates as the Sarvam API evolves
✅ Community support in the Codersarts Discord for questions and code reviews
✅ Video walkthroughs for each phase covering architecture decisions and code choices
$29. Everything above. One payment, lifetime access.
Need a custom integration with your PIM (Akeneo, Salsify, inriver), a cost optimisation workshop for your specific catalog size, or glossary onboarding for your brand portfolio? Book a 1:1 Guided Session for $99 and we'll work through it together.
Conclusion
Building a production-grade vernacular catalog localization engine for Indian marketplaces requires more than a translation API call. The architecture described here—glossary-freeze, Mayura batch translation, Sarvam-30B tone-aware review processing, field-level Redis caching, and LLM-judge quality scoring—addresses every failure mode that generic MT introduces at scale.
If you're starting from scratch, begin with Stack A: a single FastAPI service with SQLite glossary, Mayura REST calls, and a Streamlit review UI. Validate the freeze/restore pattern on a small subset of your catalog (500–1,000 SKUs across 3 languages) before scaling. Once you're confident in translation quality, layer in Kafka, Redis, and the production Kubernetes deployment.
The fastest path from architecture to working code is the course at labs.codersarts.com—everything you need is already built, tested, and ready to run.



Comments