NMT


Last updated 11 months ago

The NMT module translates sentences. It can be invoked directly or via the Workflow Manager. It works in conjunction with the ETL Translator, which improves translation efficiency by reusing previous translations and by applying pre-provided glossaries and TMX (translation memory) entries (refer to the corresponding sections). The module supports batch inferencing and exposes APIs that return model details, which are used to populate language and other dropdown menus.
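The interplay of batch inferencing and reuse of previous translations described above can be sketched roughly as follows. This is a minimal illustration, not Anuvaad's actual code: the function names (`chunk`, `translate_with_memory`), the in-memory dictionary standing in for stored translations, and the `translate_batch` callable standing in for the model are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterator, List


def chunk(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_with_memory(
    sentences: List[str],
    memory: Dict[str, str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 2,
) -> List[str]:
    """Translate sentences, reusing stored translations where available.

    `memory` is a hypothetical stand-in for previously stored translations;
    `translate_batch` is a hypothetical stand-in for the model call.
    """
    # Deduplicate while preserving order, then keep only sentences
    # that have no stored translation.
    pending = [s for s in dict.fromkeys(sentences) if s not in memory]

    fresh: Dict[str, str] = {}
    for batch in chunk(pending, batch_size):
        outputs = translate_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("model returned a different number of outputs")
        fresh.update(zip(batch, outputs))

    # Reassemble results in the original input order.
    return [memory[s] if s in memory else fresh[s] for s in sentences]
```

In this sketch only sentences that are missing from the stored translations reach the model, and repeated sentences are sent once, which is one plausible way the efficiency gain mentioned above could arise.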

Initial Version

In the early days of Anuvaad, OpenNMT-py-based models trained on Anuvaad's proprietary data were used. These models focused primarily on judicial content. The inference code for this initial version is available here: OpenNMT-py Inference Code.

Intermediary Version

Through a collaboration between Anuvaad and AI4Bharat, data from Anuvaad and other sources was used to publish the Samanantar paper (https://arxiv.org/abs/2104.05596). IndicTrans, a more general-domain model, was trained on the Samanantar dataset. It also performed well on legal use cases, which led to the replacement of OpenNMT with IndicTrans. The IndicTrans-based inferencing code is available here: IndicTrans Inference Code.

Current Version

As the Sunbird ecosystem developed, hosting multiple ML models independently became resource-intensive. This led to the development of Dhruva, a centralized platform for hosting models. Applications can now use models hosted on Dhruva through its APIs. In Dhruva, models are wrapped with NVIDIA Triton, which facilitates a scalable architecture. The IndicTrans model was moved to Dhruva, and the NMT module now invokes models on Dhruva through wrapper APIs instead of running dedicated inference itself. The Dhruva-ported code is available here: Dhruva Ported Code.
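The wrapper pattern described above, where the module forwards requests to a centrally hosted model rather than running inference locally, might look roughly like the sketch below. Everything here is hypothetical: the payload fields (`model_id`, `source_language`, `target_language`, `inputs`, `outputs`), the `RemoteTranslator` class, and the injected `send` callable are illustrative assumptions and do not reflect Dhruva's actual request or response contract.

```python
import json
from typing import Callable, Dict, List


def build_payload(
    sentences: List[str], source_lang: str, target_lang: str, model_id: str
) -> Dict[str, object]:
    """Build a request body. The field names are hypothetical."""
    return {
        "model_id": model_id,
        "source_language": source_lang,
        "target_language": target_lang,
        "inputs": [{"source": s} for s in sentences],
    }


def parse_response(body: str) -> List[str]:
    """Extract translations from a JSON response of the assumed shape."""
    data = json.loads(body)
    return [item["target"] for item in data["outputs"]]


class RemoteTranslator:
    """Thin wrapper that delegates translation to a remote model service.

    `send` takes a JSON request string and returns a JSON response string.
    In a real deployment it would perform an authenticated HTTP POST;
    it is injected here so the wrapper can be exercised without a network.
    """

    def __init__(self, send: Callable[[str], str], model_id: str) -> None:
        self._send = send
        self._model_id = model_id

    def translate(
        self, sentences: List[str], source_lang: str, target_lang: str
    ) -> List[str]:
        if not sentences:
            return []  # nothing to translate, so skip the remote call
        payload = build_payload(sentences, source_lang, target_lang, self._model_id)
        targets = parse_response(self._send(json.dumps(payload)))
        if len(targets) != len(sentences):
            raise RuntimeError("remote service returned an unexpected number of results")
        return targets
```

Keeping the transport behind a single callable means the calling code does not need to know where the model is hosted, which is the main point of moving from dedicated inference to a wrapper.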


OpenNMT-py Inference Code
Ai4Bharat
IndicTrans
IndicTrans Inference Code
Dhruva
Dhruva Ported Code