Modulewise Appendix

Summary of the purpose of each module and necessary links

Key API contract: API Contract

| Sl. no. | Module name | Purpose | Code location | API contract |
| --- | --- | --- | --- | --- |
| 1 | User management | Manages the user-side and admin-side functionality in Anuvaad. | | |
| 2 | File handler | Stores the files uploaded by the user in the Samba share so that downstream APIs can access them. | | |
| 3 | File converter | Consumes the input files and converts them to PDF. Best results are obtained only for file formats supported by LibreOffice. | | |
| 4 | File translator | Transforms the data in the file into a JSON file and supports downloading the translated files in docx, pptx, and html formats. | | |
| 5 | Content handler | Handles and retrieves the contents (final results) of files translated in the Anuvaad system. | | |
| 6 | Document converter | Generates the final document after translation and digitization. Currently supports pdf, txt, and xlsx document generation. | | |
| 7 | Tokenizer | Tokenises the input paragraphs into independently translatable sentences, which downstream services consume to translate the entire input. | | |
| 8 | OCR tokenizer | Tokenises the input paragraphs into independently translatable sentences, which downstream services consume to translate the entire input. | | |
| 9 | OCR content handler | Handles and manipulates the digitized data from anuvaad-gv-document-digitize, which is part of the Anuvaad system. | | |
| 10 | Aligner | Performs "aligning", that is, finding similar sentence pairs from two lists of sentences. | | |
| 11 | Workflow manager | Centralized orchestrator that directs the user input through the dataflow pipeline to achieve the desired output. | | |
| 12 | Block merger | Extracts text from a digital document in a structured format (paragraph, image, table), which is then used for translation. | | |
| 13 | Translator | A wrapper over the NMT that sends the document to the NMT sentence by sentence for translation. | | |
| 14 | Word detector | Takes a PDF or an image as input. If the input is a PDF, it is first converted into images. Uses a custom prima line model for line detection in the image. | | |
| 15 | Layout detector | Takes the output of the word detector as input. Uses a prima layout model for layout detection in the image. | | |
| 16 | Block segmenter | Takes the output of the layout detector as input. Collates lines and words at the layout level. | | |
| 17 | Google Vision OCR | Takes the output of the block segmenter as input. Uses Google Vision as the OCR engine. Collates text at the word, line, and paragraph levels. | | |
| 18 | Tesseract OCR | Takes the output of the block segmenter as input. Uses the Anuvaad OCR model as the OCR engine. Collates text at the word, line, and paragraph levels. | | |
| 19 | NMT | Gets the translated content either by invoking the model directly or by fetching it from the Dhruva platform. | | |
| 20 | Metrics | Displays analytics. | | |
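
Modules 14 to 18 above form a chain in which each module takes the previous module's output as its input. The sketch below illustrates only that ordering, under stated assumptions: the stage functions are hypothetical in-process stand-ins (the real modules are separate services), and `make_stage`, `build_pipeline`, and `run_pipeline` are names invented for this example, not part of the Anuvaad codebase.

```python
from typing import Callable, Dict, List

# A stage takes the previous stage's payload and returns an extended payload.
Stage = Callable[[Dict], Dict]

# The two OCR options listed in the table (modules 17 and 18).
OCR_ENGINES = ("google vision ocr", "tesseract ocr")


def make_stage(name: str) -> Stage:
    """Return a stand-in stage that only records its name in the payload."""
    def run(payload: Dict) -> Dict:
        return {**payload, "stages": payload.get("stages", []) + [name]}
    return run


def build_pipeline(ocr_engine: str) -> List[Stage]:
    """Chain the stages in the order the table describes."""
    if ocr_engine not in OCR_ENGINES:
        raise ValueError(f"unknown OCR engine: {ocr_engine}")
    names = ["word detector", "layout detector", "block segmenter", ocr_engine]
    return [make_stage(name) for name in names]


def run_pipeline(document: str, ocr_engine: str) -> Dict:
    """Feed the document through every stage, passing each output onward."""
    payload: Dict = {"input": document}
    for stage in build_pipeline(ocr_engine):
        payload = stage(payload)
    return payload


result = run_pipeline("sample.pdf", "tesseract ocr")
print(result["stages"])
# prints ['word detector', 'layout detector', 'block segmenter', 'tesseract ocr']
```

Either OCR module can close the chain, since both are described as consuming the block segmenter's output; an unrecognised engine name raises `ValueError` in this sketch.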
