OCR Automation

Optical character recognition (OCR) is a technology that extracts text from images and converts it into a digital format that other applications can easily process, search, and use. Manual data entry from images, such as verifying information on identification documents (driver's licenses and passports), is time-consuming and error-prone. OCR automation streamlines these tasks by accurately extracting the textual content as digital strings, which can then be further processed or refined and used in your RPA flows.

Supported RPA Agent Platforms: Windows

Common use cases

Extract text from scanned or photographed IDs (driver's licenses, passports) for faster, more accurate verification.
Capture serial numbers, product codes, and labels from product images for inventory and data management.
Read vehicle license plates from images for security and parking systems.
Extract key information like serial numbers, model numbers, and calibration dates from equipment photos during inspections and maintenance.

Note:

OCR functionality currently supports the recognition of printed English text in images. Handwritten text is not supported. This feature uses the Tesseract OCR engine for image processing.

Get text with OCR

Configuration

Variable name: Specify the name of the output variable that will store the extracted text from the image. This variable can then be used in subsequent steps of your automation flow.

Template image path: The file path to a reference image that helps the bot understand where the data you want to extract is located. It is used only while configuring the OCR action.

Execution image path: This is the file path to the actual image that the bot will process to extract text when the workflow is run. The bot will use the configuration and area locations from the template file to read data from this image.

Set image scale: Adjust the scale of the image (e.g., 2 for 2x, 1.5 for 1.5x). Increase the scale to make the text clearer and improve the overall OCR accuracy. Values between 1 and 10 (including decimals) are supported.
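As a rough illustration of what a scale value means, the following Python sketch computes the pixel dimensions that result from a given scale factor and rejects values outside the documented range of 1 to 10. The function name scaled_size is hypothetical and is not part of the product. The sketch only shows the arithmetic and does not describe how the bot resizes images internally.

```python
def scaled_size(width, height, scale):
    """Return the (width, height) of an image after applying a scale factor."""
    # The documented range for the scale setting is 1 to 10, decimals allowed.
    if not 1 <= scale <= 10:
        raise ValueError("scale must be between 1 and 10")
    return round(width * scale), round(height * scale)


print(scaled_size(640, 480, 2))    # scale of 2 (2x)
print(scaled_size(640, 480, 1.5))  # scale of 1.5 (1.5x)
```

A larger scale gives the OCR engine more pixels per character, which is why small or low-resolution text often benefits from a higher value.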

Advanced options

Invert image colors: Enable this option to invert the colors of the image. This can be useful when the text color is too light compared to the background.
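To picture what color inversion does, here is a minimal Python sketch that inverts 8-bit grayscale pixel values, so light text on a dark background becomes dark text on a light background. The function name invert_grayscale and the sample pixel values are illustrative assumptions. The product's actual implementation is not documented here.

```python
def invert_grayscale(pixels):
    """Invert 8-bit grayscale values: 0 becomes 255 and 255 becomes 0."""
    return [[255 - value for value in row] for row in pixels]


# Hypothetical sample: light text (230) on a dark background (20).
image = [
    [20, 230, 20],
    [20, 230, 20],
]

# After inversion the text is dark (25) on a light background (235).
print(invert_grayscale(image))
```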

Apply image preprocessing: Enable this option to apply default image preprocessing techniques to improve OCR accuracy.

Extract text from:

Entire image
Extract all printed text from the image as a single string.

Specific areas in image
Extract only specific portions of text from the image.

Area image: An image preview of the area you selected in the image.
Variable name: Holds the extracted text value from the selected area.

Areas relative to keys in image
Extract text based on a reference or key text found in your image. The bot first locates the specified key text and then extracts the value based on its position relative to that key. You define where the value sits relative to the key, which is useful when the exact position of the data element might vary slightly.

Key text: The text the bot searches for in the image to locate the value you want to extract.
Variable name: Holds the extracted text value from the selected area.

Advanced settings:
Key matching pattern:
Exact match: The text in the image must be an exact match to the key text you provide.
Contains text: The text in the image must include the key text you provide anywhere within it.
Text starts with: The text in the image must begin with the key text you provide.
Text ends with: The text in the image must end with the key text you provide.
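The four key matching patterns correspond to ordinary string comparisons. The Python sketch below shows one way to think about them. The function name key_matches and the pattern identifiers ("exact", "contains", "starts_with", "ends_with") are hypothetical, and the comparisons are case-sensitive as an assumption, since case handling is not described above.

```python
def key_matches(candidate, key, pattern):
    """Check whether text read from the image matches the key text."""
    if pattern == "exact":
        return candidate == key
    if pattern == "contains":
        return key in candidate
    if pattern == "starts_with":
        return candidate.startswith(key)
    if pattern == "ends_with":
        return candidate.endswith(key)
    raise ValueError(f"unknown pattern: {pattern}")


text = "Invoice Number:"
print(key_matches(text, "Invoice Number:", "exact"))
print(key_matches(text, "Number", "contains"))
print(key_matches(text, "Invoice", "starts_with"))
print(key_matches(text, "Invoice", "ends_with"))
```

In this sketch only the last call fails to match, because the sample text begins with "Invoice" rather than ending with it.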

Key occurrence: Specify which instance of the key text to use if it appears multiple times (e.g., 1 for the first occurrence, 2 for the second).
Anchor text: Specific text in the image that is used to locate the key text, especially when the key text appears multiple times or its location varies. It acts as a reference point from which the bot starts looking for the key text, so that it does not have to search through the entire image.
For example, say you want to extract the billing address, which appears twice: once under "Billing Information" and again under "Shipping Information." You can use "Billing Information" as the anchor text to extract only the first instance.
Anchor occurrence: Specify which instance of the anchor text to use as the starting point if it appears multiple times (e.g., 1 for the first occurrence).
Data extraction coordinates: Define the precise location and size of the data you want to extract relative to the key, using pixel offsets for X (horizontal), Y (vertical), and the width and height of the extraction area.

X: The horizontal distance (in pixels) from the left edge of the image to the top-left corner of the data you want to extract.
Y: The vertical distance (in pixels) from the bottom edge of the image to the top-left corner of the data you want to extract.

Width: The horizontal size (in pixels) of the data you want to extract, measured from the X coordinate.
Height: The vertical size (in pixels) of the data you want to extract, measured from the Y coordinate.
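The key occurrence and anchor settings described above can be pictured with a small Python sketch. It assumes that the recognized text is available as a list of lines in reading order and that the search for the key starts right after the chosen anchor occurrence. Both are assumptions made for illustration. The function name find_key_index and the sample lines are hypothetical.

```python
def find_key_index(lines, key, key_occurrence=1, anchor=None, anchor_occurrence=1):
    """Return the index of the chosen key occurrence, or None if not found."""
    start = 0
    if anchor is not None:
        seen = 0
        for i, line in enumerate(lines):
            if anchor in line:
                seen += 1
                if seen == anchor_occurrence:
                    start = i + 1  # begin the key search after the anchor
                    break
        else:
            return None  # the requested anchor occurrence does not exist

    seen = 0
    for i in range(start, len(lines)):
        if key in lines[i]:
            seen += 1
            if seen == key_occurrence:
                return i
    return None


lines = [
    "Billing Information",
    "Address: 12 Main St",
    "Shipping Information",
    "Address: 98 Dock Rd",
]

# Anchoring on "Billing Information" selects the first address line.
print(lines[find_key_index(lines, "Address", anchor="Billing Information")])
# Without an anchor, a key occurrence of 2 selects the second address line.
print(lines[find_key_index(lines, "Address", key_occurrence=2)])
```

Either setting can disambiguate a repeated key. An anchor tends to be more robust when the number of repetitions before the value you want can change from one image to the next.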

Note:
Supported image formats are .jpg, .jpeg, .gif, .tif, .tiff, .png, and .bmp.
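If your flow receives image paths from an earlier step, you may want to check the file extension before passing the path to the OCR action. The following Python sketch shows one way to do that against the supported formats listed above. The function name is_supported_image is hypothetical, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions taken from the list of supported image formats.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".png", ".bmp"}


def is_supported_image(path):
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_image("license_scan.PNG"))
print(is_supported_image("passport.pdf"))
```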

Delay settings

Delay settings allow you to introduce a pause before or after an action. This is useful to ensure the bot waits for necessary processes to complete, such as file downloads, before proceeding with subsequent steps.

Delay before action (Time in ms): Specify the duration (in milliseconds) the bot should wait before executing the current action. This can prevent errors if the required elements or files are not immediately available.

Delay after action (Time in ms): Specify the duration (in milliseconds) the bot should wait after the current action has been completed. This can be useful for allowing systems to update or stabilize before the bot moves to the next step.
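Conceptually, the two delay settings wrap the action in a pause on either side. The Python sketch below illustrates the idea with a hypothetical helper named run_with_delays. It is not how the bot is implemented, only a way to reason about the order of events and the millisecond units.

```python
import time


def run_with_delays(action, delay_before_ms=0, delay_after_ms=0):
    """Run an action with optional pauses (in milliseconds) around it."""
    time.sleep(delay_before_ms / 1000)  # wait before executing the action
    result = action()
    time.sleep(delay_after_ms / 1000)   # wait after the action completes
    return result


# Hypothetical action standing in for an OCR step.
text = run_with_delays(lambda: "extracted text", delay_before_ms=50, delay_after_ms=50)
print(text)
```

A delay before the action is the one to reach for when the execution image is still being downloaded or saved by an earlier step.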

OCR Automation | Zoho RPA Help
