
Modulewise Appendix

Summary of the purpose of each module and necessary links


Last updated 11 months ago

Key API contract:

| Sl. no. | Module name | Purpose | Code location | API contract |
|---|---|---|---|---|
| 1 | user management | Manages the user-side and admin-side functionality in Anuvaad. | GitHub | API Contract |
| 2 | file handler | Stores each file the user uploads in the Samba share so that downstream APIs can access it. | GitHub | API Contract |
| 3 | file converter | Consumes the input files and converts them to PDF. Best results are obtained only for file formats supported by LibreOffice. | GitHub | API Contract |
| 4 | file translator | Transforms the data in the file into a JSON file and lets the translated files be downloaded as docx, pptx, or html. | GitHub | API Contract |
| 5 | content handler | Handles and retrieves the contents (final result) of files translated in the Anuvaad system. | GitHub | API Contract |
| 6 | document converter | Generates the final document after translation and digitization. Currently supports pdf, txt, and xlsx document generation. | GitHub | API Contract |
| 7 | tokenizer | Tokenises the input paragraphs into independently translatable sentences, which downstream services consume to translate the entire input. | GitHub | API Contract |
| 8 | ocr tokenizer | Tokenises the input paragraphs into independently translatable sentences, which downstream services consume to translate the entire input. | GitHub | API Contract |
| 9 | ocr content handler | Handles and manipulates the digitized data from anuvaad-gv-document-digitize, which is part of the Anuvaad system. | GitHub | API Contract |
| 10 | Aligner | Aligns sentences, that is, finds similar sentence pairs from two lists of sentences. | GitHub | API Contract |
| 11 | workflow manager | Centralized orchestrator that directs the user input through the dataflow pipeline to achieve the desired output. | GitHub | API Contract |
| 12 | Block merger | Extracts text from a digital document in a structured format (paragraph, image, table), which is then used for translation. | GitHub | API Contract |
| 13 | translator | A wrapper over the NMT that sends the document to the NMT sentence by sentence for translation. | GitHub | API Contract |
| 14 | word detector | Takes a PDF or an image as input; a PDF is first converted into images. Uses a custom prima line model for line detection in the image. | GitHub | API Contract |
| 15 | layout detector | Takes the output of the word detector as input. Uses a prima layout model for layout detection in the image. | GitHub | API Contract |
| 16 | block segmenter | Takes the output of the layout detector as input. Collates lines and words at the layout level. | GitHub | API Contract |
| 17 | google vision ocr | Takes the output of the block segmenter as input. Uses Google Vision as the OCR engine. Collates text at the word, line, and paragraph level. | GitHub | API Contract |
| 18 | tesseract ocr | Takes the output of the block segmenter as input. Uses the Anuvaad OCR model as the OCR engine. Collates text at the word, line, and paragraph level. | GitHub | API Contract |
| 19 | NMT | Gets the translated content either by invoking the model directly or by fetching it from the Dhruva platform. | GitHub | API Contract |
| 20 | metrics | Displays analytics. | GitHub | API Contract |
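
Entries 14 to 18 above describe a chain in which each digitization module consumes the output of the previous one. The following is a minimal sketch of that dependency order in Python. The `UPSTREAM` mapping and the `chain_for` helper are hypothetical names introduced only for illustration; they are not part of Anuvaad's code or configuration, and the real orchestration is done by the workflow manager.

```python
from __future__ import annotations

# Hypothetical mapping, derived only from the "Purpose" descriptions above:
# each stage points at the stage whose output it takes as input.
UPSTREAM: dict[str, str | None] = {
    "word detector": None,  # takes the PDF or image directly
    "layout detector": "word detector",
    "block segmenter": "layout detector",
    "google vision ocr": "block segmenter",
    "tesseract ocr": "block segmenter",
}


def chain_for(stage: str) -> list[str]:
    """Return the stages to run, in order, ending with `stage`."""
    if stage not in UPSTREAM:
        raise KeyError(f"unknown stage: {stage}")
    chain = []
    current: str | None = stage
    # Walk backwards from the requested stage to the first stage.
    while current is not None:
        chain.append(current)
        current = UPSTREAM[current]
    chain.reverse()
    return chain


if __name__ == "__main__":
    print(" -> ".join(chain_for("tesseract ocr")))
```

Both OCR modules share the same three upstream stages, so under this reading only the final step differs between the Google Vision and Tesseract paths.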