OCR Content handler
This microservice exposes multiple APIs to handle and manipulate the digitized data from anuvaad-gv-document-digitize, which is part of the Anuvaad system. It is functionally similar to the Content Handler service but differs because the structure of the output document (digitized doc) is different.
Modules
OCR Document Modules
DigitalDocumentSave
API to save translated documents. The JSON request object is generated by anuvaad-gv-document-digitizer and later updated by the tokenizer. This API is used internally.
Mandatory parameters: files, record_id
Actions:
Validating input params as per the policies
The document to be saved is converted into blocks of pages
Each block contains regions such as line, word, table, etc.
Every block is created with UUID
Saving blocks in the database
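The save steps above can be sketched as follows. This is a minimal illustration and not the service's actual code: the function name build_blocks, the field names (block_identifier, page_no, regions), the choice of one block per page, and the in-memory list standing in for the database are all assumptions made for the example.

```python
import uuid


def build_blocks(record_id, pages):
    """Convert a list of page dicts into block records, one block per page.

    All field names here are hypothetical; the real schema may differ.
    """
    if not record_id or not isinstance(pages, list):
        # Stand-in for the policy-based input validation step.
        raise ValueError("record_id and a list of pages are required")

    blocks = []
    for page_no, page in enumerate(pages, start=1):
        blocks.append({
            "block_identifier": str(uuid.uuid4()),  # every block gets a UUID
            "record_id": record_id,
            "page_no": page_no,
            # Regions such as lines, words, and tables are carried over as-is.
            "regions": page.get("regions", []),
        })
    return blocks


# An in-memory list stands in for the database in this sketch.
database = []
pages = [
    {"regions": [{"class": "LINE", "text": "Hello world"}]},
    {"regions": [{"class": "TABLE", "text": ""}]},
]
blocks = build_blocks("rec-001", pages)
database.extend(blocks)
```

Generating the UUID when the block is created, rather than at write time, means later update calls can address a block without depending on database-assigned keys.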
DigitalDocumentUpdateWord
API to update the text in the digitized doc. RBAC enabled.
Mandatory parameters: words, record_id, region_id, word_id, updated_word
Actions:
Validating input params as per the policies
Looping over the regions to locate the word to be updated
Updating the word and setting the flag save=True
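A rough sketch of the lookup-and-update loop described above, assuming each region holds a list of words keyed by word_id. The function name update_word, the nested layout, the text field, and placing the save flag on the word itself are assumptions for illustration; the actual document structure may differ.

```python
def update_word(regions, region_id, word_id, updated_word):
    """Find a word inside the given region and replace its text.

    Returns True if the word was found and updated, False otherwise.
    The dict layout used here is hypothetical.
    """
    for region in regions:
        if region.get("region_id") != region_id:
            continue
        for word in region.get("words", []):
            if word.get("word_id") == word_id:
                word["text"] = updated_word
                word["save"] = True  # mark the word as edited
                return True
    return False


regions = [
    {
        "region_id": "r1",
        "words": [
            {"word_id": "w1", "text": "teh"},
            {"word_id": "w2", "text": "cat"},
        ],
    }
]
found = update_word(regions, "r1", "w1", "the")
missing = update_word(regions, "r1", "w9", "dog")
```

Returning a boolean lets the caller report words that could not be located instead of failing silently.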
DigitalDocumentGet
API to fetch the document. RBAC enabled.
Mandatory parameters: record_id, start_page, end_page
Actions:
Validating input params as per the policies
Returning the document as an array of pages
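The page-range fetch can be illustrated with the sketch below. It assumes that start_page and end_page are 1-based and inclusive and that stored blocks carry record_id and page_no fields; neither assumption is confirmed by the service, and get_pages is a hypothetical name.

```python
def get_pages(stored_blocks, record_id, start_page, end_page):
    """Return the blocks of one record within a page range, ordered by page.

    The range is treated as 1-based and inclusive (an assumption).
    """
    if start_page < 1 or end_page < start_page:
        # Stand-in for the policy-based input validation step.
        raise ValueError("invalid page range")

    selected = [
        block for block in stored_blocks
        if block["record_id"] == record_id
        and start_page <= block["page_no"] <= end_page
    ]
    return sorted(selected, key=lambda block: block["page_no"])


stored_blocks = [
    {"record_id": "rec-001", "page_no": 3, "regions": []},
    {"record_id": "rec-001", "page_no": 1, "regions": []},
    {"record_id": "rec-002", "page_no": 2, "regions": []},
    {"record_id": "rec-001", "page_no": 2, "regions": []},
]
result = get_pages(stored_blocks, "rec-001", 2, 3)
```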