This microservice generates the final document after translation and digitization. It currently supports PDF, TXT, and XLSX document generation.

GitHub

API Contract

tokenizer

Tokenises the input paragraphs into independently translatable sentences, which downstream services consume to translate the entire input.

GitHub

API Contract
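
To illustrate what "independently translatable sentences" means, here is a minimal sketch of paragraph-to-sentence splitting. It is not the Anuvaad tokenizer: the function names `split_sentences` and `tokenise` are hypothetical, and the rule (split on whitespace after `.`, `!`, `?` or the danda `।`) is an assumption chosen for brevity.

```python
import re
from typing import List

# Assumption: a sentence ends at '.', '!', '?' or the danda, followed by whitespace.
# The real service very likely applies richer, language-specific rules.
_SENTENCE_END = re.compile(r"(?<=[.!?।])\s+")


def split_sentences(paragraph: str) -> List[str]:
    """Split one paragraph into sentences (hypothetical helper)."""
    parts = _SENTENCE_END.split(paragraph.strip())
    # Drop empty strings, which appear when the paragraph itself is empty.
    return [part for part in parts if part]


def tokenise(paragraphs: List[str]) -> List[List[str]]:
    """Return one list of sentences per input paragraph."""
    return [split_sentences(paragraph) for paragraph in paragraphs]


if __name__ == "__main__":
    print(tokenise(["Hello world. How are you? Fine!"]))
```

A rule this simple breaks on abbreviations such as "Dr. Rao", which is one reason a dedicated tokenizer service is useful.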

ocr tokenizer

This service is used to tokenise the input paragraphs received into independently translatable sentences which can be consumed by downstream services to translate the entire input.

GitHub

API Contract

ocr content handler

Handles and manipulates the digitized data from anuvaad-gv-document-digitize, which is part of the Anuvaad system.

GitHub

API Contract

Aligner

This module "aligns" two lists of sentences, that is, it finds similar sentence pairs between them.

GitHub

API Contract
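
The idea of pairing similar sentences from two lists can be sketched as follows. This is only an illustration: it scores similarity with character overlap from Python's standard `difflib`, which is a stand-in and may differ from whatever measure the Aligner module actually uses. The names `similarity` and `align` and the `threshold` default are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Character-overlap score in [0, 1]; a stand-in similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(source: List[str], target: List[str],
          threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """For each source sentence, keep its best target match above threshold."""
    pairs = []
    for sentence in source:
        # default=None covers the case of an empty target list.
        best = max(target, key=lambda t: similarity(sentence, t), default=None)
        if best is None:
            continue
        score = similarity(sentence, best)
        if score >= threshold:
            pairs.append((sentence, best, score))
    return pairs


if __name__ == "__main__":
    src = ["The cat sat on the mat.", "It is raining today."]
    tgt = ["It is raining today!", "The cat sat on a mat.", "Completely unrelated text here"]
    for s, t, score in align(src, tgt):
        print(f"{score:.2f}  {s!r} <-> {t!r}")
```

Character overlap only works when both lists are in the same language; pairing sentences across languages needs a cross-lingual similarity measure instead.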

workflow manager

A centralized orchestrator that directs the user input through the dataflow pipeline to achieve the desired output.

GitHub

API Contract

Block merger

Extracts text from a digital document in a structured format (paragraph, image, table), which is then used for translation.

GitHub

API Contract

translator

Translator is a wrapper over the NMT service. It sends the document to NMT sentence by sentence for translation.

GitHub
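
The wrapper pattern described here can be sketched as below. This is a hypothetical illustration, not the Translator's actual code: `translate_document` and `fake_nmt` are invented names, and the NMT call is passed in as a plain function so that the example runs without any service.

```python
from typing import Callable, List


def translate_document(sentences: List[str],
                       translate_sentence: Callable[[str], str]) -> List[str]:
    """Send each sentence to the NMT callable and collect results in order."""
    translated = []
    for sentence in sentences:
        translated.append(translate_sentence(sentence))
    return translated


def fake_nmt(sentence: str) -> str:
    """Stand-in for a real NMT call; it only upper-cases the text."""
    return sentence.upper()


if __name__ == "__main__":
    print(translate_document(["hello", "world"], fake_nmt))
```

Keeping the NMT call behind a single function argument means the wrapper does not need to know how the translation is produced.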

word detector

Takes a PDF or an image as input. If the input is a PDF, it is first converted into images. A custom prima line model is then used for line detection in the image.

GitHub

API Contract

layout detector

Takes the output of the word detector as input. Uses a prima layout model for layout detection in the image.

GitHub

API Contract

block segmenter

Takes the output of the layout detector as input. Collates lines and words at the layout level.

GitHub

API Contract

google vision ocr

Takes the output of the block segmenter as input. Uses Google Vision as the OCR engine. Collates text at the word, line, and paragraph level.

GitHub

API Contract

tesseract ocr

Takes the output of the block segmenter as input. Uses the Anuvaad OCR model as the OCR engine. Collates text at the word, line, and paragraph level.

GitHub

API Contract
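
The word detector, layout detector, block segmenter, and OCR modules above each consume the previous module's output. A minimal sketch of that chaining follows. The stage functions are empty placeholders that only record their name, and every identifier (`run_pipeline`, `make_stage`, the `trace` key) is an assumption made for illustration, not part of the Anuvaad API.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that records its name in the document trace."""
    def stage(doc: Document) -> Document:
        # A real stage would add detected lines, layouts, blocks, or text here.
        return {**doc, "trace": doc.get("trace", []) + [name]}
    return stage


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Order taken from the module descriptions; the final OCR stage could be
# either the Google Vision or the Tesseract module.
STAGES = [
    make_stage("word_detector"),
    make_stage("layout_detector"),
    make_stage("block_segmenter"),
    make_stage("ocr"),
]

if __name__ == "__main__":
    print(run_pipeline({"input": "sample.pdf"}, STAGES)["trace"])
```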

NMT

This service gets the translated content either by invoking the model directly or by fetching it from the Dhruva platform.

GitHub

API Contract

metrics

Displays analytics.

GitHub

API Contract
