Module-wise Appendix
Summary of the purpose of each module, with the relevant links.
Key API contract: API Contract
Sl. No. | Module name | Purpose | Code location | API contract |
---|---|---|---|---|
1 | User management | Manages the user-side and admin-side functionality in Anuvaad. | ||
2 | File handler | Accepts files uploaded by the user and stores them on the Samba share so that downstream APIs can access them. | ||
3 | File converter | Consumes input files and converts them to PDF. Best results are obtained only for file formats supported by LibreOffice. | ||
4 | File translator | Transforms the data in the input file into JSON and provides the translated files for download as DOCX, PPTX, or HTML. | ||
5 | Content handler | Stores and retrieves the contents (final result) of files translated in the Anuvaad system. | ||
6 | Document converter | Microservice that generates the final document after translation and digitization. It currently supports PDF, TXT, and XLSX output. | ||
7 | Tokenizer | Tokenizes input paragraphs into independently translatable sentences, which downstream services consume to translate the entire input. | ||
8 | OCR tokenizer | Tokenizes input paragraphs into independently translatable sentences, which downstream services consume to translate the entire input. | ||
9 | OCR content handler | Handles and manipulates the digitized data produced by anuvaad-gv-document-digitize. | ||
10 | Aligner | Aligns sentences, that is, finds similar sentence pairs across two lists of sentences. | ||
11 | Workflow manager | Centralized orchestrator that routes user input through the dataflow pipeline to produce the desired output. | ||
12 | Block merger | Extracts text from a digital document in a structured format (paragraphs, images, tables), which is then used for translation. | ||
13 | Translator | Wrapper over the NMT service that sends the document to NMT sentence by sentence for translation. | ||
14 | Word detector | Takes a PDF or an image as input; if the input is a PDF, converts it into images. Uses a custom prima line model for line detection in the images. | ||
15 | Layout detector | Takes the output of the word detector as input. Uses a prima layout model for layout detection in the images. | ||
16 | Block segmenter | Takes the output of the layout detector as input. Collates lines and words at the layout level. | ||
17 | Google Vision OCR | Takes the output of the block segmenter as input. Uses Google Vision as the OCR engine. Collates text at the word, line, and paragraph levels. | ||
18 | Tesseract OCR | Takes the output of the block segmenter as input. Uses the Anuvaad OCR model as the OCR engine. Collates text at the word, line, and paragraph levels. | ||
19 | NMT | Obtains translated content either by invoking the model directly or by fetching it from the Dhruva platform. | ||
20 | Metrics | Displays analytics. | ||
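The Tokenizer and OCR tokenizer rows describe splitting paragraphs into independently translatable sentences. As a rough illustration only, the sketch below splits on sentence-final punctuation (`.`, `?`, `!`, and the Devanagari danda `।`) followed by whitespace. The function name `tokenize_paragraphs` and the splitting rule are assumptions, not the Anuvaad tokenizer's actual logic, which would also need to handle abbreviations, numbering, and other language-specific cases.

```python
import re

# Hypothetical rule: a sentence ends at ., ?, ! or the Devanagari danda (U+0964)
# when followed by whitespace. Abbreviations such as "Dr. Rao" are NOT handled
# and would be split incorrectly.
_SENTENCE_END = re.compile(r"(?<=[.?!\u0964])\s+")


def tokenize_paragraphs(paragraphs: list[str]) -> list[list[str]]:
    """Return, for each paragraph, its list of independently translatable sentences."""
    result = []
    for para in paragraphs:
        pieces = _SENTENCE_END.split(para.strip())
        # Drop empty fragments (e.g. from an empty paragraph).
        result.append([s.strip() for s in pieces if s.strip()])
    return result
```

For example, `tokenize_paragraphs(["Hello world. How are you? Fine!"])` yields one list of three sentences, each of which could be sent to the translator on its own.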
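The Aligner row describes finding similar sentence pairs across two lists of sentences. One common way to do this, shown here as an assumption rather than the Aligner's confirmed method, is to embed every sentence, score all pairs by cosine similarity, and greedily keep the best-scoring one-to-one pairs above a threshold. The names `align`, `letter_counts`, and the `0.7` threshold are hypothetical; `letter_counts` is a toy embedding that only works for near-identical text in the same script, so real cross-lingual alignment would need a multilingual sentence-embedding model plugged in as `embed`.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def align(
    source: list[str],
    target: list[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.7,
) -> list[tuple[int, int, float]]:
    """Greedy one-to-one alignment returning (source index, target index, score)."""
    src_vecs = [embed(s) for s in source]
    tgt_vecs = [embed(t) for t in target]
    # Score every (source, target) combination, highest similarity first.
    candidates = sorted(
        ((cosine(sv, tv), i, j)
         for i, sv in enumerate(src_vecs)
         for j, tv in enumerate(tgt_vecs)),
        reverse=True,
    )
    used_src: set[int] = set()
    used_tgt: set[int] = set()
    pairs: list[tuple[int, int, float]] = []
    for score, i, j in candidates:
        if score < threshold:
            break  # every remaining candidate scores even lower
        if i in used_src or j in used_tgt:
            continue  # each sentence may appear in at most one pair
        used_src.add(i)
        used_tgt.add(j)
        pairs.append((i, j, score))
    return sorted(pairs)  # order pairs by source index


def letter_counts(sentence: str) -> list[float]:
    """Toy embedding for demos only: counts of each ASCII letter a-z."""
    lowered = sentence.lower()
    return [float(lowered.count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]
```

Greedy matching is simple and keeps each sentence in at most one pair, but it can miss the globally best assignment; an optimal bipartite matching would be the stricter alternative.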
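The Workflow manager row describes a centralized orchestrator that routes user input through the pipeline, and rows 14 to 17 form such a chain: word detector, layout detector, block segmenter, then Google Vision OCR, each consuming the previous step's output. A minimal in-process sketch of that routing follows. The class `WorkflowManager`, the step names, the workflow code `OCR-GV`, and `make_stub` are all hypothetical; the actual modules run as separate services, whereas each step here is a local stub function.

```python
from typing import Any, Callable

Payload = dict[str, Any]
Step = Callable[[Payload], Payload]


class WorkflowManager:
    """Hypothetical in-process orchestrator that runs named steps in order."""

    def __init__(self) -> None:
        self._steps: dict[str, Step] = {}
        self._workflows: dict[str, list[str]] = {}

    def register_step(self, name: str, step: Step) -> None:
        self._steps[name] = step

    def define_workflow(self, code: str, step_names: list[str]) -> None:
        # Reject unknown steps up front instead of failing mid-run.
        missing = [n for n in step_names if n not in self._steps]
        if missing:
            raise ValueError(f"unknown steps: {missing}")
        self._workflows[code] = list(step_names)

    def run(self, code: str, payload: Payload) -> tuple[Payload, list[str]]:
        """Feed each step's output into the next; return final output and trace."""
        data: Payload = dict(payload)
        trace: list[str] = []
        for name in self._workflows[code]:
            data = self._steps[name](data)
            trace.append(name)
        return data, trace


def make_stub(name: str) -> Step:
    """Placeholder for a real module: copies the payload and marks `name` as done."""
    return lambda data: {**data, name: "done"}
```

Registering the four OCR steps with `make_stub`, defining `OCR-GV` as that ordered list, and calling `run("OCR-GV", {"file": "input.pdf"})` returns the accumulated payload plus the order in which steps ran. Validating step names when a workflow is defined means a misconfigured pipeline fails before any file is processed.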