Document Digitization

This pipeline extracts text from digital or scanned documents. A custom-trained Prima layout model detects lines and layouts (header, footer, paragraph, table, cell, image), and the Anuvaad OCR model performs the OCR.

GitHub repo: Anuvaad Document Processor

API contract: API Contract

How to Use

Upload a PDF or image file using the upload API:
Upload URL: https://auth.anuvaad.org/anuvaad-api/file-uploader/v0/upload-file
Copy the returned upload ID into the "path" field of the DD2.0 input.
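The upload step can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the documented client: the form field name "file", the "auth-token" header, and the helper name build_upload_request are assumptions, so check them against the API contract before use. The request is built but not sent.

```python
import uuid
import urllib.request

UPLOAD_URL = "https://auth.anuvaad.org/anuvaad-api/file-uploader/v0/upload-file"


def build_upload_request(filename, content, auth_token):
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    # Assumption: the uploader reads the file from a form field named "file".
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        # Assumption: the token travels in an "auth-token" header.
        "auth-token": auth_token,
    }
    return urllib.request.Request(
        UPLOAD_URL, data=head + content + tail, headers=headers, method="POST"
    )


# Placeholder file content and token, for illustration only.
req = build_upload_request("sample.pdf", b"%PDF-1.4 placeholder", "<your-token>")
print(req.get_method(), req.full_url)
```

Sending the request with urllib.request.urlopen(req) should return a response containing the upload ID. The response layout is not described here, so the sketch leaves parsing it to the reader.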

Initiate the Workflow:

WF URL: https://auth.anuvaad.org/anuvaad-etl/wf-manager/v1/workflow/async/initiate

DD2.0 Input:

{
    "files": [
        {
            "locale": "language",
            "path": "file_name",
            "type": "file_format",
            "config": {
                "OCR": {
                    "option": "HIGH_ACCURACY",
                    "language": "language"
                }
            }
        }
    ],
    "workflowCode": "WF_A_FCWDLDBSOD20TESOTK"
}
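As a rough sketch, the DD2.0 input above could be filled in and wrapped in a request like this. The helper names, the "auth-token" header, and the sample values ("en", "pdf") are assumptions for illustration. Only the URL, the field names, and the workflow code come from this page. The request is built but not sent.

```python
import json
import urllib.request

WF_URL = "https://auth.anuvaad.org/anuvaad-etl/wf-manager/v1/workflow/async/initiate"
DD20_WORKFLOW_CODE = "WF_A_FCWDLDBSOD20TESOTK"


def build_dd20_payload(upload_id, file_format, language):
    """Fill the DD2.0 input template with concrete values."""
    return {
        "files": [
            {
                "locale": language,
                "path": upload_id,  # upload ID returned by the upload API
                "type": file_format,
                "config": {
                    "OCR": {"option": "HIGH_ACCURACY", "language": language}
                },
            }
        ],
        "workflowCode": DD20_WORKFLOW_CODE,
    }


def build_initiate_request(payload, auth_token):
    """Build (but do not send) the workflow initiation request."""
    return urllib.request.Request(
        WF_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the token travels in an "auth-token" header.
        headers={"Content-Type": "application/json", "auth-token": auth_token},
        method="POST",
    )


# Hypothetical values; the accepted language and format codes are not listed here.
payload = build_dd20_payload("<upload-id>", "pdf", "en")
request = build_initiate_request(payload, "<your-token>")
print(json.dumps(payload, indent=4))
```

Because the endpoint is asynchronous, the response to this call is presumably a job acknowledgement rather than the final OCR output.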

Microservices

Word Detector

Input: PDF or image
Output: List of pages with detected lines and page information.

GitHub repo: Word Detector Craft

API contract: Word Detector API Contract

How to use: Word Detector

Upload a PDF or image file using the upload API:
Upload URL: https://auth.anuvaad.org/anuvaad-api/file-uploader/v0/upload-file

Initiate the Word Detector Workflow:

WF URL: https://auth.anuvaad.org/anuvaad-etl/wf-manager/v1/workflow/async/initiate

Word Detector Input:

{
    "files": [
        {
            "locale": "language",
            "path": "file_name",
            "type": "file_format",
            "config": {
                "OCR": {
                    "option": "HIGH_ACCURACY",
                    "language": "language"
                }
            }
        }
    ],
    "workflowCode": "WF_A_WD"
}
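Before submitting an input such as the one above, it can help to check that it has the same shape as the template. The function below is a hypothetical client-side check that only mirrors the fields shown on this page. The service's actual validation rules are not documented here and may differ.

```python
def validate_workflow_input(payload):
    """Return a list of problems; an empty list means the shape matches the template."""
    problems = []
    code = payload.get("workflowCode")
    if not isinstance(code, str) or not code:
        problems.append("workflowCode must be a non-empty string")

    files = payload.get("files")
    if not isinstance(files, list) or not files:
        problems.append("files must be a non-empty list")
        return problems

    for i, entry in enumerate(files):
        if not isinstance(entry, dict):
            problems.append(f"files[{i}] must be an object")
            continue
        # Every file entry in the template carries these three string fields.
        for key in ("locale", "path", "type"):
            value = entry.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"files[{i}].{key} must be a non-empty string")
        # The template nests the OCR settings under config.OCR.
        ocr = (entry.get("config") or {}).get("OCR") or {}
        for key in ("option", "language"):
            if key not in ocr:
                problems.append(f"files[{i}].config.OCR.{key} is missing")
    return problems


# Hypothetical values for illustration.
word_detector_input = {
    "files": [
        {
            "locale": "en",
            "path": "<upload-id>",
            "type": "pdf",
            "config": {"OCR": {"option": "HIGH_ACCURACY", "language": "en"}},
        }
    ],
    "workflowCode": "WF_A_WD",
}
print(validate_workflow_input(word_detector_input))
```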

Layout Detector

Input: Output of word detector
Output: List of pages with detected layouts and lines.

GitHub repo: Layout Detector Prima

API contract: Layout Detector API Contract

How to use: Layout Detector

Use the word detector's output JSON file as the input path.

Initiate the Layout Detector Workflow:

WF URL: https://auth.anuvaad.org/anuvaad-etl/wf-manager/v1/workflow/async/initiate

Layout Detector Input:

{
    "files": [
        {
            "locale": "language",
            "path": "word_detector_output",
            "type": "json",
            "config": {
                "OCR": {
                    "option": "HIGH_ACCURACY",
                    "language": "language"
                }
            }
        }
    ],
    "workflowCode": "WF_A_LD"
}

Block Segmenter

Input: Output of layout detector
Output: Lines and words collated at the layout level.

GitHub repo: Block Segmenter

API contract: Block Segmenter API Contract

How to use: Block Segmenter

Use the layout detector's output JSON file as the input path.

Initiate the Block Segmenter Workflow:

WF URL: https://auth.anuvaad.org/anuvaad-etl/wf-manager/v1/workflow/async/initiate

Block Segmenter Input:

{
    "files": [
        {
            "locale": "language",
            "path": "layout_detector_output",
            "type": "json",
            "config": {
                "OCR": {
                    "option": "HIGH_ACCURACY",
                    "language": "language"
                }
            }
        }
    ],
    "workflowCode": "WF_A_BS"
}
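The word detector, layout detector, and block segmenter form a chain: each stage takes the previous stage's output JSON as its "path", with "type" set to "json". A minimal sketch of that chaining follows. The submit callable, the stub fake_submit, and the sample values are hypothetical stand-ins for a real client that initiates a workflow, waits for it, and returns the output file path.

```python
# Stage order and workflow codes as listed in this section.
STAGES = [
    ("word_detector", "WF_A_WD"),
    ("layout_detector", "WF_A_LD"),
    ("block_segmenter", "WF_A_BS"),
]


def build_input(path, file_type, language, workflow_code):
    """Fill the shared input template for one stage."""
    return {
        "files": [
            {
                "locale": language,
                "path": path,
                "type": file_type,
                "config": {
                    "OCR": {"option": "HIGH_ACCURACY", "language": language}
                },
            }
        ],
        "workflowCode": workflow_code,
    }


def run_stages(upload_path, file_type, language, submit):
    """Feed each stage's output path into the next stage.

    submit(payload) is a hypothetical callable returning the output JSON path.
    """
    path, current_type = upload_path, file_type
    sent = []
    for _name, code in STAGES:
        payload = build_input(path, current_type, language, code)
        sent.append(payload)
        path = submit(payload)
        current_type = "json"  # every stage after the first consumes JSON
    return path, sent


def fake_submit(payload):
    """Stub that fabricates an output file name instead of calling the service."""
    return payload["workflowCode"].lower() + "_output.json"


final_path, sent = run_stages("sample.pdf", "pdf", "en", fake_submit)
print(final_path)  # wf_a_bs_output.json (a name made up by the stub)
```

The path returned for the block segmenter is what an OCR stage would then take as its input.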

The output of the block segmenter can be passed to either of two OCR services: Tesseract OCR, which uses the Anuvaad OCR model, or Google Vision OCR. Both produce text collated at the word, line, and paragraph level.

Tesseract OCR

Input: Output of block segmenter
Output: Text collation at word, line, and paragraph level using Anuvaad OCR model.

GitHub repo: OCR Tesseract Server

API contract: Tesseract OCR API Contract

How to use: Tesseract OCR

Use the block segmenter's output JSON file as the input path.

Initiate the Tesseract OCR Workflow:

WF URL: https://auth.anuvaad.org/anuvaad-etl/wf-manager/v1/workflow/async/initiate

Tesseract OCR Input:

{
    "files": [
        {
            "locale": "language",
            "path": "block_segmenter_output",
            "type": "json",
            "config": {
                "OCR": {
                    "option": "HIGH_ACCURACY",
                    "language": "language"
                }
            }
        }
    ],
    "workflowCode": "WF_A_OD20TES"
}

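Google OCR (Tesseract Alternative)

Input: Output of block segmenter
Output: Text collation at word, line, and paragraph level using Google Vision as the OCR engine.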

How to use: Google Vision OCR

Use the block segmenter's output JSON file as the input path.

Initiate the Google OCR Workflow:

WF URL: https://auth.anuvaad.org/anuvaad-etl/wf-manager/v1/workflow/async/initiate

Google OCR Input:

{
    "files": [
        {
            "locale": "language",
            "path": "block_segmenter_output",
            "type": "json",
            "config": {
                "OCR": {
                    "option": "HIGH_ACCURACY",
                    "language": "language"
                }
            }
        }
    ],
    "workflowCode": "WF_A_OTES"
}

GitHub repo: OCR Google Vision Server

API contract: Google Vision API Contract
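Since the two OCR inputs differ only in their workflow code, choosing an engine can be reduced to a lookup. The sketch below copies the codes exactly as they appear in this section. The function name, the engine keys, and the sample values are assumptions for illustration.

```python
# Workflow codes exactly as listed in this section.
OCR_WORKFLOW_CODES = {
    "tesseract": "WF_A_OD20TES",
    "google": "WF_A_OTES",
}


def build_ocr_input(block_segmenter_output, language, engine="tesseract"):
    """Build the OCR input for the chosen engine from the block segmenter output."""
    if engine not in OCR_WORKFLOW_CODES:
        raise ValueError(
            f"unknown OCR engine {engine!r}; expected one of {sorted(OCR_WORKFLOW_CODES)}"
        )
    return {
        "files": [
            {
                "locale": language,
                "path": block_segmenter_output,
                "type": "json",  # OCR stages consume the block segmenter's JSON
                "config": {
                    "OCR": {"option": "HIGH_ACCURACY", "language": language}
                },
            }
        ],
        "workflowCode": OCR_WORKFLOW_CODES[engine],
    }


# Hypothetical file name and language code.
tesseract_input = build_ocr_input("block_segmenter_output.json", "en")
google_input = build_ocr_input("block_segmenter_output.json", "en", engine="google")
print(tesseract_input["workflowCode"], google_input["workflowCode"])
```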
