Project Anuvaad

Architecture

Architecture of Anuvaad


Last updated 6 months ago

The architecture is built around two major blocks:

  • Document Digitization

  • Document Translation

Components

| Component | Details |
| --- | --- |
| Workflow Manager (WM) | Centralized orchestrator that runs workflows based on the user request. |
| Auditor | Python package/library used for formatting and exception handling. |
| File Uploader | Microservice to upload and maintain user documents. |
| File Converter | Microservice to convert files from one format to another, e.g. .doc to .pdf. |
| Aligner | Microservice that accepts source and target sentences and aligns them to form a parallel corpus. |
| Tokenizer | Microservice that tokenises paragraphs into independently translatable sentences. |
| Layout Detector | Microservice interface for the layout detection model. |
| Block Segmenter | Handles layout detection misclassifications and region unification. |
| Word Detector | Word detection. |
| Block Merger | An OCR system that extracts text, images, tables, blocks, etc. from the input file and makes them available in a format that downstream services can use to perform translation. It can also be used as an independent product that performs OCR on files, images, PPTs, etc. |
| Translator | Pushes sentences to the NMT module, which internally invokes the IndicTrans model hosted in Dhruva to translate them, and pushes the translated sentences back during the document translation flow. |
| Content Handler | Repository microservice that maintains and manages all the translated documents. |
| Translation Memory X (TMX) | System translation memory that allows an NMT translation to be overridden with a user-preferred translation. TMX provides three levels of caching: Global, User, and Organisation. |
| User Translation Memory (UTM) | Tracks and remembers each user's translations and corrections, and applies them automatically when the same sentences are encountered again. |
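
The TMX behaviour described above (overriding an NMT translation with a stored preferred translation at Global, User, or Organisation level) can be illustrated with a minimal Python sketch. It is not Anuvaad's implementation: the class `TranslationMemory`, the function `translate`, the in-memory dictionaries, and the `nmt` callable are all hypothetical, and the lookup order (User first, then Organisation, then Global) is an assumption, since the page does not state which level takes precedence.

```python
from typing import Callable, Dict, Optional, Tuple


class TranslationMemory:
    """Hypothetical in-memory stand-in for TMX; not the actual Anuvaad storage."""

    def __init__(self) -> None:
        self.global_tm: Dict[str, str] = {}
        self.org_tm: Dict[Tuple[str, str], str] = {}
        self.user_tm: Dict[Tuple[str, str], str] = {}

    def add(self, source: str, target: str,
            user_id: Optional[str] = None,
            org_id: Optional[str] = None) -> None:
        # Store the preferred translation at exactly one level.
        if user_id is not None:
            self.user_tm[(user_id, source)] = target
        elif org_id is not None:
            self.org_tm[(org_id, source)] = target
        else:
            self.global_tm[source] = target

    def lookup(self, source: str, user_id: str, org_id: str) -> Optional[str]:
        # ASSUMED precedence: User, then Organisation, then Global.
        if (user_id, source) in self.user_tm:
            return self.user_tm[(user_id, source)]
        if (org_id, source) in self.org_tm:
            return self.org_tm[(org_id, source)]
        return self.global_tm.get(source)


def translate(sentence: str, tm: TranslationMemory,
              nmt: Callable[[str], str],
              user_id: str, org_id: str) -> str:
    """Return the stored override if one exists, else fall back to the NMT callable."""
    override = tm.lookup(sentence, user_id, org_id)
    return override if override is not None else nmt(sentence)
```

In this sketch a User-level entry behaves much like the UTM row describes: once a user's corrected translation is stored, the same source sentence resolves to it on the next request without calling the NMT model.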

Block Diagram
Document Digitization Flow