An Optical Character Recognition (OCR) model is a text-recognition model that identifies and extracts text (both printed and handwritten) from digital images and PDFs. You can train the model to scan either an image or a PDF and extract only the required information using machine learning. This is especially useful when you want to process and retrieve structured data from a large volume of unstructured data.
Businesses can then store and process this structured data whenever required, which helps them simplify and automate their data entry processes. For example, structured data is the date and time of an email, whereas unstructured data is the entire content of the email itself.
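To make the distinction concrete, here is a minimal plain-Python sketch (the email address and values are hypothetical, for illustration only) of how an email's structured metadata differs from its unstructured body:

```python
# Structured data: discrete, labeled values that are easy to store and query.
email_metadata = {
    "date": "2024-03-15",
    "time": "10:42",
    "sender": "billing@zylker.com",  # hypothetical address for illustration
}

# Unstructured data: free-form text whose useful values are buried inside.
email_body = (
    "Hi team, please find attached invoice INV-1042 dated 2024-03-15. "
    "The total due is $1,250 by 2024-04-15."
)

# An extraction model's job is to turn the latter into the former.
print(email_metadata["date"])    # a direct lookup works on structured data
print("INV-1042" in email_body)  # unstructured data must be searched/extracted
```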
Creator supports two types of OCR models: you can build custom models suited to your business needs, or choose a ready-to-use (prebuilt) model that is ready to deploy in your applications for many common business scenarios.
You can build custom OCR models trained to identify and extract only the required values. A custom OCR model utilizes an ensemble of industry-leading text-recognition technologies to identify and highlight the text. All the extractable text identified by the model is highlighted to show that the values are untagged. You can then add and tag the fields whose values you want to extract from the images. The model can then be trained to extract and process the required text found in your images.
- The OCR model can extract text from images irrespective of font types.
- The model can detect both printed and handwritten text. Printed text is recommended: if there's too much variation in the handwriting, the model might find it hard to process the required text.
The following GIF shows the required text values being extracted from the input image (Invoice). Similarly, you can extract text values from input PDFs.
OCR model prerequisites
You can build and train custom OCR models tailored to suit your business needs. Additionally, you can utilize our ready-to-use OCR model, which can be directly deployed into your applications.
Let's say you want to extract text from a certain set of input images; in this case, a custom OCR model is better suited. If instead you want to extract all the detected text from the input image, the ready-to-use OCR model can be used. The same applies to extracting text from PDFs, with both custom and ready-to-use OCR models.
As the admin, you can create and use OCR models, whereas your users can only consume the models you've created.
You should know enough about all your business requirements to determine the dataset that'll be used for training your model.
As a low-code platform, Creator doesn't require you and your users to have prior coding and machine learning (ML) skills to create and consume the prediction models.
What data do you need?
Image based OCR models
You must upload at least five images of a similar layout as training data.
The images can also be of different layouts, provided you tag the extraction values correctly.
PDF based OCR models
In the case of custom OCR models, you must have at least
PDFs containing up to
pages of similar layout to extract the required text.
Which pricing plan must you be in?
AI Models will be available for users in
plans. Refer to our
to know more.
Which version of Creator should you be using?
You must be using Zoho Creator 6 (C6) to be able to create custom AI models, whereas the
ready-to-use AI models
are available in both C6 and C5.
Let's assume that you've built
Zylker's Invoice Processing
app using Creator. You have a form named
, in which you add the details of your invoice along with a digital copy of those invoices. You need to extract certain data from the invoice such as the invoice date, invoice number, due date, and the billing address. This can be done manually by relying on paper invoices to process payments and maintain accounts. However, when multiple entries are involved, automating the extraction process saves a great deal of time and manual work.
Here's how you would use the OCR model in the above case:
Create a model
Identify the values to be extracted and select fields with their respective field types to store the values. Here,
will be the values to be extracted.
Upload sufficient training data of similar or different layouts and tag the defined fields whose values need to be extracted. Here, you'll need to upload
of your Invoices.
Train the model
Deploy the model
Select an image or a file upload field in the form which will contain the input for the model.
Add the fields defined earlier to store the extracted values from the image field. Here,
will be the fields that'll store the values to be extracted (refer to the above GIF).
in live to get the required values. The input in this case will be either an image or a PDF of your
OCR Model Flow
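The create, tag, train, and deploy steps above can be sketched as a small stateful object. The field names (Invoice Date, Invoice Number, and so on) come from the Zylker example, but the class and method names below are hypothetical illustrations, not Creator APIs:

```python
class OCRModel:
    """Minimal sketch of the create -> add fields -> train flow."""

    def __init__(self, name):
        self.name = name
        self.fields = {}        # field name -> field type
        self.training_data = []
        self.trained = False

    def add_field(self, name, field_type):
        # Creator allows at most 10 fields per model.
        if len(self.fields) >= 10:
            raise ValueError("maximum of 10 fields per model")
        self.fields[name] = field_type

    def add_training_data(self, documents):
        self.training_data.extend(documents)

    def train(self):
        # At least five documents of a similar layout are required.
        if len(self.training_data) < 5:
            raise ValueError("need at least five training documents")
        self.trained = True

model = OCRModel("Zylker Invoice OCR")
model.add_field("Invoice Date", "date")
model.add_field("Invoice Number", "number")
model.add_field("Due Date", "date")
model.add_field("Billing Address", "text")
model.add_training_data([f"invoice_{i}.png" for i in range(5)])
model.train()
print(model.trained)  # True
```

Once a model like this is trained, deployment in Creator amounts to pointing it at an image or file upload field in a form, as described below.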
Setting up your model
Add training data
Training data is the initial dataset that is used by the model to analyze data patterns, make interpretations, and arrive at a conclusion that helps it recognize text from the training data. To train an OCR model, you need to gather sufficient images or PDFs of similar and different layouts. Next, you need to identify the values that you want to extract from the images you've gathered or PDFs you've created. Once the training data has been finalized, you can proceed to add fields to your OCR model.
The model outcomes may not be always accurate, which is also the case with any AI.
The model outcomes are dynamic. The same input can produce different outcomes at different times based on how much the machine has learned. This implies that as you continuously retrain a model, it keeps learning.
In Creator, form fields store the values entered by users. Similarly, the values that you want to extract will be displayed in the respective fields. Adding fields establishes their definition so that when the model is implemented in your Creator application, these fields will be listed in that application's form. You can select or deselect the pre-defined fields as required. Now that you've identified the values you need to extract from your training data, you need to add fields and their corresponding data types.
The supported data types include
Add Training Data
The images that you gathered or the PDFs you created earlier will come in handy now. The images or PDFs can be documents such as bills, checks, invoices, passports, receipts, and so on. The text in these documents can be handwritten and/or printed, although printed text is preferable.
: You can upload both handwritten and printed images or PDFs. Make sure to tag the extraction values correctly in all images so that the model can identify and extract the values as desired.
After tagging all the added fields, you can add new fields directly from here if required. Just click
Add New Field
in the tag fields dropdown. This option is shown only if fewer than 10 fields have been added to the model.
Once you've uploaded the required training data, the text in each of them will appear highlighted. Next, you need to tag values for the fields you added earlier in all the uploaded images/PDFs. Tagging here refers to mapping or associating an added field with the value it must extract and display. You can tag a value by selecting and dragging over the corresponding value in all the images/PDF pages. For example, if you've added an
field whose datatype is number, you must tag the value of the invoice number in the image. This is done so that the OCR model recognizes that these are the field values that need to be extracted from the input data.
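Conceptually, tagging associates each field with the region of the document where its value appears, so the model learns which detected text belongs to which field. Here is a minimal sketch of that idea (the coordinates, word list, and helper names are hypothetical, not Creator internals):

```python
# Each OCR "word" comes with its text plus a bounding box (x, y, width, height).
detected_words = [
    {"text": "Invoice",    "box": (40, 20, 60, 12)},
    {"text": "No:",        "box": (105, 20, 30, 12)},
    {"text": "1042",       "box": (140, 20, 40, 12)},
    {"text": "Date:",      "box": (40, 40, 40, 12)},
    {"text": "2024-03-15", "box": (85, 40, 90, 12)},
]

# Tagging = drag-selecting a region of the image and mapping it to a field.
tags = {
    "Invoice Number": (130, 15, 60, 20),  # region drawn around "1042"
    "Invoice Date":   (80, 35, 100, 20),  # region drawn around the date
}

def inside(box, region):
    """True if the word's box lies entirely within the tagged region."""
    x, y, w, h = box
    rx, ry, rw, rh = region
    return rx <= x and ry <= y and x + w <= rx + rw and y + h <= ry + rh

def extract(field):
    region = tags[field]
    return " ".join(w["text"] for w in detected_words if inside(w["box"], region))

print(extract("Invoice Number"))  # 1042
print(extract("Invoice Date"))    # 2024-03-15
```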
When uploading PDFs, you'll be able to choose from the following actions.
Fit to Width
Fit to Page
Supported formats for images include
Maximum size of each uploaded image cannot exceed
Overall model size must not exceed
Maximum size of each uploaded PDF cannot exceed
pages of similar layout per PDF is required in case of custom OCR models.
(for print and handwritten text)
Currently only English is supported.
Upload clear images.
If you upload images of different layouts, ensure you tag the extraction values properly.
A minimum of five images of similar layout needs to be uploaded for proper recognition and extraction.
You have to tag each field to its corresponding values in a minimum of five images for the training to be successful.
A minimum of 1 and a maximum of 10 fields (with their field types) can be extracted per model.
To add an OCR field in your form, you must have an
field as the source field in that form.
here refers to the field in which the input image will be uploaded for the model to identify and extract text. If the supported field type isn't available in the form, you will need to first create one in order to deploy the OCR model.
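The count constraints documented above can be summarized in a small validation sketch. The size limits (per image, per PDF, and overall model) are elided in this excerpt, so only the counts are checked; the function name is a hypothetical illustration:

```python
def validate_training_setup(images, fields):
    """Check the documented count constraints before training an OCR model.

    Size limits are not hard-coded here because the exact numbers
    are not stated in this excerpt.
    """
    errors = []
    if len(images) < 5:
        errors.append("at least five images of a similar layout are required")
    if not (1 <= len(fields) <= 10):
        errors.append("between 1 and 10 fields can be extracted per model")
    return errors

# A valid setup passes; too few images and no fields yields two errors.
print(validate_training_setup(["a.png"] * 5, ["Invoice Number"]))  # []
print(validate_training_setup(["a.png"] * 3, []))
```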
After adding the training data, you can review the
number of images
added. If you need to make any modifications, you can go back and make them. Otherwise, you can proceed to train the model.
Before you can actually use your OCR model in your application, you have to train it to perform the way you want. After you've selected and reviewed your training data, click
to train your model.
: Training might take some time, so you can stay on the same page and wait, or close the page and come back later. The training time depends on the model size and the number of models queued for training.
View and manage model details
Once your model's training is complete, you can view the model details, the model's versions, and its deployment details, if any. The model is now ready to be published and deployed into your apps.
You can manage your model in the following ways:
- Since new data is always being generated, we recommend that you periodically retrain your model. This helps in improving the OCR model's reliability and accuracy in extracting values.
You can click
and your model will be retrained.
After each retraining is over, a new version of the model will be created. You can
between different versions according to your needs.
If you want to delete a version that is currently being used, you need to switch to another version before proceeding to delete it.
If the model training fails, you'll see 'Model training has failed!' Meanwhile, the previous working model will be used for prediction.
option is available only when you add training data via form fields. This is because you can train your model as and when new records are continuously added to your application.
When adding data via CSV file, you can delete the model, upload a new file, and then train the model again.
- You can rename your model if required.
- If you want to delete your model due to inconsistent or wrongly-added data, you can use the
It is recommended (but not mandatory) to test your model before publishing and deploying it in your applications.
After training and before publishing, you can test your model to check how it works and whether the training is satisfactory before deploying it in your applications. You can upload test data, and after testing your model, you'll get the extracted values as the outcome.
If your outcomes aren't as expected, you can either retrain your model, or edit your model details and train it again. Retraining refers to simply training the model again (without making any edits), as the model is continuously learning.
While testing your model, you can upload only one image at a time.
For OCR model guidelines, click
Improve your OCR model performance
After you've trained, tested, and evaluated your model, you can tweak it to improve its performance. Here are some things you can try:
While gathering images or creating PDFs, ensure that they contain well-aligned characters so that the model can easily recognize the text.
Ensure you gather and upload high-quality images and PDFs. The higher the quality of the original training data, the easier it is to separate the characters from the background, and the higher the accuracy of the OCR model will be.
You can try to increase the contrast between text and background to bring more clarity to the output.
When employed on printed text, the OCR engine is highly accurate.
Ensure that the images or PDF pages aren't turned upside down. Make sure the image is in the right orientation, with the text appearing horizontal rather than inclined.
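As an example of the contrast tip above, a simple way to increase contrast before uploading is min-max stretching of grayscale pixel values. This plain-Python sketch only illustrates the idea; in practice, image libraries such as Pillow offer equivalent operations:

```python
def stretch_contrast(pixels):
    """Rescale grayscale values so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(pixels), max(pixels)
    if lo == hi:                 # flat image: nothing to stretch
        return pixels[:]
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]

# A low-contrast scan: values bunched between 100 and 160.
scan = [100, 120, 140, 160]
print(stretch_contrast(scan))  # [0, 85, 170, 255]
```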
After you train your model, you can publish it to make it available to your users and start making predictions.
You can publish your model only once. In case you don’t want your users to use the model, you can delete the model.
To use your OCR model in an environment-enabled Creator application, you must have at least one version of that application published in the
. After deploying your model in the application, you can filter between different
stages of the environment
to check which stage the model is deployed in.
After you've published your model, you need to select the application and form in which you want to deploy your model. You'll be redirected to the chosen form, where the
that you defined earlier will be listed in the form builder. You can select or deselect the fields as required; deselected fields will not be added to that form. For example, you might be using the same OCR model in two forms but not require the same set of fields in both, in which case you can deselect the ones that aren't required.
A new OCR field will be added in your form, in which you can upload an image of your choice. The OCR model will analyze and display the extracted values in the defined fields.
Note: Ensure you add an image or a file upload field without references (i.e., one that hasn't been used anywhere) to your form before selecting a trained model.
Get started with sample data
To help you get started quickly with OCR, we provide sample data that you can readily use in your Creator applications. You can scroll down and download the attachment