OCR plugin steps convert images to text using Optical Character Recognition (OCR) tools.
The OCR: Google Vision plugin step detects and extracts text from an image and returns the output in JSON format.
Prerequisites:
https://cloud.google.com/docs/authentication/api-keys#api_key_restrictions
Billing must be set up in Google Cloud (Billing -> Payment Settings and Billing -> Payment Method) for the API Key to work.
No.
Field Name
Description
1
Step Name
Name of the step. This name has to be unique in a single workflow.
2
API Key:
3
Accept Value as variable/static
Leave the checkbox unchecked to select the API Key value from a field in the previous steps of the stream using a drop-down list.
Enable the checkbox to make the API Key field appear as a text box.
4
API Key
Specify the API Key for authentication to Google Cloud Platform. This field is mandatory. API Key is encrypted and is not stored in the .psw file.
The API Key is entered using a widget that handles both Text (static value or environment variable) and Combo (drop-down list containing values from previous steps). If the checkbox above is enabled, the API Key field appears as a text box. If the checkbox is disabled, the API Key field appears as a drop-down list for selecting fields from previous steps.
5
Button: Test Connection
Tests the connection using the API Key provided and verifies whether the connection is available.
Note: If the connection fields are supplied from a previous step, the Test Connection button does not work.
Input Tab:
No.
Field Name
Description
Input Fields:
1
Path/URL
Specify the path of the image file to be converted to text or click the Browse button to browse the file path.
2
Button: Browse
Clicking on this button brings up the dialog to browse the image file to be converted to text format.
3
Type
Specify the annotation feature to use. Specify one of the following annotation features; the first two perform optical character recognition (OCR):
- ‘TEXT_DETECTION’ detects and extracts text from any image, for example a photograph that contains a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words and their bounding boxes.
- ‘DOCUMENT_TEXT_DETECTION’ also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.
- ‘OBJECT_LOCALIZATION’ detects multiple objects in an image and reports information about each object and where it was found in the image.
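For orientation, the Type value corresponds to the feature type in a Google Cloud Vision `images:annotate` request. The following is a minimal sketch of how such a request could be assembled, not the plugin's actual implementation. The function names `build_annotate_request` and `build_url` are hypothetical, and the sketch assumes the image is sent inline as base64 content and the API Key is passed as the `key` query parameter.

```python
import base64
import urllib.parse

# Public REST endpoint of the Cloud Vision API.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# The three Type values listed for this plugin step.
SUPPORTED_TYPES = {"TEXT_DETECTION", "DOCUMENT_TEXT_DETECTION", "OBJECT_LOCALIZATION"}


def build_annotate_request(image_bytes: bytes, feature_type: str) -> dict:
    """Build the JSON body for one image and one annotation feature (hypothetical helper)."""
    if feature_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported Type: {feature_type}")
    return {
        "requests": [
            {
                # The image is embedded as base64-encoded content.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature_type}],
            }
        ]
    }


def build_url(api_key: str) -> str:
    """Append the API Key as the `key` query parameter (hypothetical helper)."""
    return f"{VISION_ENDPOINT}?{urllib.parse.urlencode({'key': api_key})}"
```

To send the request, the body would be serialized with `json.dumps` and posted to the URL from `build_url` with a `Content-Type: application/json` header.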
Output Tab:
No.
Field Name
Description
Output Fields:
1
Result
Specify an output field to hold the converted JSON text on successful plugin execution. The default value is OutputText.
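A downstream step usually needs only the recognized string rather than the whole JSON. The sketch below shows one way to pull it out, assuming the Result field holds the raw Cloud Vision response with a top-level `responses` array; the plugin may wrap the JSON differently, so treat the key paths as assumptions to verify against real output. The function name `extract_text` is hypothetical.

```python
import json


def extract_text(result_json: str) -> str:
    """Return the full recognized text from a Vision-style JSON result, or "" if none."""
    data = json.loads(result_json)
    responses = data.get("responses", [])
    if not responses:
        return ""
    first = responses[0]
    # Text detection responses may carry the whole text in fullTextAnnotation.text.
    full_text = first.get("fullTextAnnotation", {}).get("text")
    if full_text:
        return full_text
    # Otherwise the first textAnnotations entry holds the entire extracted string;
    # the remaining entries are the individual words.
    annotations = first.get("textAnnotations", [])
    return annotations[0].get("description", "") if annotations else ""
```

For OBJECT_LOCALIZATION the response carries object annotations instead of text, so this helper would return an empty string.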
Common Buttons:
No.
Field Name
Description
Buttons:
1
OK
Clicking this button validates the field values. If any required field value is missing, a validation error message is displayed.
If all the required field values are provided, the field values are saved.
2
Cancel
Clicking this button closes the window without saving any values.
The OCR: Tesseract plugin step detects and extracts text from an image as readable text. Supported image types: BMP, PNG, JPG, JPEG.
Compatibility: Tesseract version 4.0.0.
Prerequisites:
No.
Field Name
Description
1
Step Name
Name of the step. This name has to be unique in a single workflow.
Input Tab:
No.
Field Name
Description
Input Fields:
1
Data Folder Path
Specify the Tesseract data folder path or click the Browse button to browse the folder path (data folder path is mentioned in the prerequisites).
The data type is String. This field is mandatory.
2
Button: Browse
Clicking on this button brings up the dialog to browse the Tesseract data folder path.
3
File Path
Specify the path of the input image file from which to extract readable text, or click the Browse button to browse the file path.
Note: Supported image types are BMP, PNG, JPG, and JPEG.
The data type is String. This field is mandatory.
4
Button: Browse
Clicking on this button brings up the dialog to browse the image file path.
5
Language Code
Specify the language code (e.g. eng for English, hin for Hindi, urd for Urdu). Multiple languages can be passed: join the codes with a ‘+’ sign (e.g. eng+hin) to extract multi-language output.
For the list of language codes, refer to:
https://muthu.co/all-tesseract-ocr-options/
The default value is eng. The data type is String.
6
Page Segment Mode
Select the Page Segmentation Mode appropriate for the input file. Allowed values are 0 to 13. The data type is String.
The Page Segmentation Modes and their descriptions are listed below.
Page Segment Mode and Description:
- 0: Orientation and script detection (OSD) only.
- 1: Automatic page segmentation with OSD.
- 2: Automatic page segmentation, but no OSD, or OCR.
- 3: Fully automatic page segmentation, but no OSD. (Default)
- 4: Assume a single column of text of variable sizes.
- 5: Assume a single uniform block of vertically aligned text.
- 6: Assume a single uniform block of text.
- 7: Treat the image as a single text line.
- 8: Treat the image as a single word.
- 9: Treat the image as a single word in a circle.
- 10: Treat the image as a single character.
- 11: Sparse text. Find as much text as possible in no particular order.
- 12: Sparse text with OSD.
- 13: Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
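The input fields above map closely onto the options of the `tesseract` command-line tool, which can be handy for checking a combination of settings outside the workflow. The sketch below assembles an equivalent command line; it is an illustration only, since the plugin may call Tesseract through a library rather than the CLI. The function name `build_tesseract_command` and the example paths are hypothetical.

```python
from pathlib import Path

# Image types listed as supported by the plugin step.
SUPPORTED_EXTENSIONS = {".bmp", ".png", ".jpg", ".jpeg"}


def build_tesseract_command(data_folder, file_path, languages=("eng",), psm=3):
    """Return the tesseract CLI argument list matching the plugin's input fields."""
    if Path(file_path).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported image type: {file_path}")
    if not 0 <= int(psm) <= 13:
        raise ValueError("Page Segment Mode must be between 0 and 13")
    if not languages:
        raise ValueError("At least one language code is required")
    return [
        "tesseract",
        str(file_path),
        "stdout",                          # write recognized text to standard output
        "--tessdata-dir", str(data_folder),  # Data Folder Path
        "-l", "+".join(languages),           # Language Code, e.g. eng+hin
        "--psm", str(psm),                   # Page Segment Mode
    ]


# Example with hypothetical paths: two languages, single uniform block of text.
command = build_tesseract_command("/opt/tessdata", "invoice.png", ["eng", "hin"], 6)
print(" ".join(command))
```

If Tesseract is installed, the list can be passed to `subprocess.run(command, capture_output=True, text=True, check=True)` and the recognized text read from the result's `stdout` attribute.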
Output Tab:
No.
Field Name
Description
Output Field:
1
Output Text
Specify an output field to hold converted text on successful plugin execution. The default value is OutputText.
Common Buttons:
No.
Field Name
Description
Buttons:
1
OK
Clicking this button validates the field values. If any required field value is missing, a validation error message is displayed.
If all the required field values are provided, the field values are saved.
2
Cancel
Clicking this button closes the window without saving any values.