OCR

OCR plugin steps convert images to text using Optical Character Recognition (OCR) tools.

OCR: Google Vision

Description

The OCR: Google Vision plugin step detects and extracts text from an image and returns the result as text in JSON format.

Prerequisites:

Create a Google Cloud Vision API key:
https://cloud.google.com/docs/authentication/api-keys?hl=en&visit_id=637051029162974596-3924725435&rd=1#creating_an_api_key

Add restrictions to the API key:
https://cloud.google.com/docs/authentication/api-keys#api_key_restrictions

Fill in the billing details under Billing -> Payment Settings and Billing -> Payment Method, as shown in the snapshot. The API key works only after these details are provided.

Configurations

No.	Field Name	Description
1	Step Name	Name of the step. The name must be unique within a single workflow.
2	API Key:
3	Accept Value as variable/static	Leave the checkbox unchecked to select the API Key value from a field in the previous steps of the stream using a drop-down list. Enable the checkbox to enter the API Key in a text box.
4	API Key	Specify the API Key for authentication to Google Cloud Platform. This field is mandatory. The API Key is encrypted and is not stored in the .psw file. The API Key is entered using a widget that handles both Text (a static value or an environment variable) and Combo (a drop-down list of fields from previous steps) input. If the checkbox above is enabled, the field appears as a text box; if it is disabled, the field appears as a drop-down list.
5	Button: Test Connection	Tests the connection using the API Key provided and verifies whether the connection is available. Note: Test Connection does not work when the connection fields are supplied from a previous step.

Input Tab:

No.	Field Name	Description
	Input Fields:
1	Path/URL	Specify the path of the image file to be converted to text, or click the Browse button to select the file.
2	Button: Browse	Opens a dialog to browse for the image file to be converted to text.
3	Type	Specify one of the following annotation features:
		‘TEXT_DETECTION’ detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words and their bounding boxes.
		‘DOCUMENT_TEXT_DETECTION’ also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.
		‘OBJECT_LOCALIZATION’ detects multiple objects in an image and provides information about each object and where it was found in the image.
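
The Path/URL and Type fields map naturally onto the body of a Cloud Vision request. The sketch below shows how such a body could be assembled for the public REST endpoint (POST https://vision.googleapis.com/v1/images:annotate with the API key passed as the `key` query parameter). How the plugin step builds its request internally is not documented here, so treat this as an illustration only; the function name `build_annotate_request` and its parameters are hypothetical.

```python
import base64
import json

# Annotation features listed for the Type field of this step.
SUPPORTED_TYPES = {"TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION", "OBJECT_LOCALIZATION"}


def build_annotate_request(feature_type, *, image_bytes=None, image_url=None):
    """Build a JSON body for one images:annotate request.

    Hypothetical helper: exactly one of image_bytes (local file content)
    or image_url (publicly reachable image) must be given.
    """
    if feature_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported annotation feature: {feature_type}")
    if (image_bytes is None) == (image_url is None):
        raise ValueError("Provide exactly one of image_bytes or image_url")

    if image_url is not None:
        # Remote image: the service fetches it from the URL.
        image = {"source": {"imageUri": image_url}}
    else:
        # Local image: the raw bytes are sent base64-encoded.
        image = {"content": base64.b64encode(image_bytes).decode("ascii")}

    return {"requests": [{"image": image, "features": [{"type": feature_type}]}]}


# Usage with placeholder bytes instead of a real image file.
body = build_annotate_request("TEXT_DETECTION", image_bytes=b"abc")
print(json.dumps(body, indent=2))
```

The helper only builds the payload; sending it (and supplying a real API key) is left out so the example runs without network access.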

Output Tab:

No.	Field Name	Description
	Output Fields:
1	Result	Specify an output field to hold the converted JSON text on successful plugin execution. The default value is OutputText.
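
A downstream step usually needs the plain text rather than the whole JSON document. The sketch below assumes the Result field holds a Cloud Vision style response, in which `fullTextAnnotation.text` (or the first entry of `textAnnotations`) carries the entire extracted string. That assumption should be checked against the actual output of the step; the function name `extract_text` and the sample JSON are hand-written for illustration.

```python
import json


def extract_text(result_json: str) -> str:
    """Return the full extracted text from a Vision-style JSON response.

    Assumes the layout {"responses": [{...}]}; returns "" if no text is found.
    """
    data = json.loads(result_json)
    responses = data.get("responses", [])
    if not responses:
        return ""
    first = responses[0]

    # Prefer the document-level text when it is present.
    full = first.get("fullTextAnnotation")
    if full and "text" in full:
        return full["text"]

    # Otherwise fall back to the first annotation, which holds the whole string.
    annotations = first.get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""


# Hand-written sample, not real service output.
sample = json.dumps({
    "responses": [{
        "textAnnotations": [
            {"description": "STOP\nAHEAD"},
            {"description": "STOP"},
            {"description": "AHEAD"},
        ]
    }]
})
print(extract_text(sample))
```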

Common Buttons:

No.	Field Name	Description
	Buttons:
1	OK	Validates the field values. If any required field value is missing, a validation error message is displayed. If all required field values are provided, the values are saved.
2	Cancel	Closes the window without saving any values.

OCR: Tesseract

Description

The OCR: Tesseract plugin step detects and extracts text from an image and returns it as readable text. Supported image types: BMP, PNG, JPG, JPEG.
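
Because only four image types are supported, it can help to filter files before they reach the step. The following is a minimal sketch of such a pre-check based on the file extension alone; it does not inspect the file content, and whether the step itself validates files this way is an assumption. The name `is_supported_image` is hypothetical.

```python
from pathlib import Path

# Image types listed as supported by the OCR: Tesseract step.
SUPPORTED_EXTENSIONS = {".bmp", ".png", ".jpg", ".jpeg"}


def is_supported_image(file_path: str) -> bool:
    """Return True if the file extension is one of the supported image types."""
    return Path(file_path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("invoice_scan.PNG"))
print(is_supported_image("invoice_scan.pdf"))
```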

Compatibility: Tesseract version 4.0.0.

Prerequisites:

Download tessdata (tesseract-ocr) version 4.0.0:
https://github.com/tesseract-ocr/tessdata
After downloading, extract the archive and place it in a folder on the processing machine. Specify this folder in the ‘Data Folder Path’ field of the step.
Install Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, and 2019 (32 bit & 64 bit).

Configurations

No.	Field Name	Description
1	Step Name	Name of the step. The name must be unique within a single workflow.

Input Tab:

No.	Field Name	Description
	Input Fields:
1	Data Folder Path	Specify the Tesseract data folder path (the folder described in the prerequisites), or click the Browse button to select it. The data type is String. This field is mandatory.
2	Button: Browse	Opens a dialog to browse for the Tesseract data folder.
3	File Path	Specify the path of the input image file from which to extract readable text, or browse for the file. Note: Supported image types are BMP, PNG, JPG, JPEG. The data type is String. This field is mandatory.
4	Button: Browse	Opens a dialog to browse for the image file.
5	Language Code	Specify the language code (for example, eng for English, hin for Hindi, urd for Urdu). To extract multi-language output, join multiple codes with a ‘+’ sign (for example, eng+hin). For the list of language codes, refer to: https://muthu.co/all-tesseract-ocr-options/ The default value is eng. The data type is String.
6	Page Segment Mode	Select the Page Segmentation Mode that suits the input file. Allowed values are 0-13. The data type is String. Refer to the table below for the list of Page Segmentation Modes with descriptions.

Sr. No.	Page Segment Mode	Description
1	0	Orientation and script detection (OSD) only.
2	1	Automatic page segmentation with OSD.
3	2	Automatic page segmentation, but no OSD, or OCR.
4	3	Fully automatic page segmentation, but no OSD. (Default)
5	4	Assume a single column of text of variable sizes.
6	5	Assume a single uniform block of vertically aligned text.
7	6	Assume a single uniform block of text.
8	7	Treat the image as a single text line.
9	8	Treat the image as a single word.
10	9	Treat the image as a single word in a circle.
11	10	Treat the image as a single character.
12	11	Sparse text. Find as much text as possible in no particular order.
13	12	Sparse text with OSD.
14	13	Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
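
The input fields above correspond closely to options of the Tesseract command-line tool: `--tessdata-dir` for the data folder, `-l` for one or more language codes joined with `+`, and `--psm` for the Page Segmentation Mode. How the plugin step calls Tesseract internally is not documented here, so the sketch below uses the command-line tool as a stand-in to show how the values fit together. The function name `build_tesseract_command` and the example paths are hypothetical.

```python
def build_tesseract_command(image_path, tessdata_dir, languages=("eng",), psm=3):
    """Build an argument list for the tesseract command-line tool.

    Illustrative stand-in for the step's fields: File Path, Data Folder Path,
    Language Code, and Page Segment Mode. The step's defaults (eng, mode 3)
    are used when no value is given.
    """
    if psm not in range(0, 14):
        raise ValueError("Page Segment Mode must be between 0 and 13")
    if not languages:
        raise ValueError("At least one language code is required")

    return [
        "tesseract",
        image_path,
        "stdout",                   # write the recognized text to standard output
        "--tessdata-dir", tessdata_dir,
        "-l", "+".join(languages),  # e.g. eng+hin for multi-language output
        "--psm", str(psm),
    ]


# Usage with placeholder paths: English plus Hindi, single uniform block of text.
cmd = build_tesseract_command("page.png", "/opt/tessdata", ["eng", "hin"], psm=6)
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` on a machine where Tesseract 4.0.0 is installed; the example only prints the command so that it runs without Tesseract present.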

Output Tab:

No.	Field Name	Description
	Output Field:
1	Output Text	Specify an output field to hold the converted text on successful plugin execution. The default value is OutputText.

Common Buttons:

No.	Field Name	Description
	Buttons:
1	OK	Validates the field values. If any required field value is missing, a validation error message is displayed. If all required field values are provided, the values are saved.
2	Cancel	Closes the window without saving any values.

