You can build and train custom prediction models to predict outcomes tailored to your business needs.
Now that you're aware of the type of skills and data required, the next step is to narrow down and choose the values that will make up your training data.
Let's assume that you've built an application named Zylker Insurance using Creator. You've added a form named Insurance Claim in which your customers submit claim requests. You want to find out whether or not the insurance claim requests raised by your customers are legitimate. This is a binary prediction model, since the outcome can only be one of two values.
Now, think of the factors that'll influence the outcomes that you want the prediction model to make. For example, for the question "Is the insurance claim request legitimate or not?" think about questions like these:
You can use the above information to make your data selections.
Training data is the initial dataset that the model uses to find patterns, make interpretations, and arrive at a prediction. Once you've finalized the data that you want to feed into your model, you can add it in two ways: from the form fields in your application or from a CSV file. In prediction models, the training data consists of a base field and dependent fields (when adding from form fields) or a base column and dependent columns (when adding from a CSV file).
You might wonder when to choose which of these two methods. If you have sufficient records in your application, you can add fields from your forms, and those records will be used as training data. If you don't have sufficient records in your application but have your data stored in a file, you can add the data from a CSV file instead.
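Creator handles the column selection for you, but conceptually the CSV method boils down to designating one column as the base (outcome) and the rest as dependent columns. A minimal Python sketch, using invented column names that mirror the insurance example later in this article:

```python
import csv
import io

# Hypothetical CSV training data; the column names are invented for illustration.
csv_text = """Months as Customer,Incident Severity,Total Claim Amount,Fraud Claim
24,Major,5400,Yes
60,Minor,800,No
"""

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

BASE_COLUMN = "Fraud Claim"  # the historical outcome you want to predict
dependent_columns = [c for c in reader.fieldnames if c != BASE_COLUMN]

print(dependent_columns)
# → ['Months as Customer', 'Incident Severity', 'Total Claim Amount']
print(rows[0][BASE_COLUMN])  # → Yes
```

Every column except the base column becomes a candidate dependent column; in Creator you would make this same choice on the column selection page rather than in code.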
Note:
The most crucial thing to consider here is whether a field/column other than your historical outcome column is indirectly influenced by the outcome.
Let's say you want to predict whether an order is going to be delayed. You may have the actual delivered date in your data. This date is present only after the order is delivered. If you include this column, the model will show close to 100 percent accuracy during training. However, the orders whose delay you want to predict won't have been delivered yet, so their delivered date column won't be populated. To achieve accurate outcomes, you should deselect columns like this before training.
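This problem is commonly called data leakage, and the fix is to drop any column that is only known after the outcome. A minimal Python sketch of the idea, with invented field names ("delivered_date", "is_delayed"):

```python
# Hypothetical order records; field names are invented for this example.
records = [
    {"order_id": 1, "distance_km": 120, "delivered_date": "2023-01-05", "is_delayed": "Yes"},
    {"order_id": 2, "distance_km": 40,  "delivered_date": "2023-01-03", "is_delayed": "No"},
]

# "delivered_date" only exists after the outcome is known, so it leaks the answer.
LEAKY_COLUMNS = {"delivered_date"}

training_rows = [
    {key: value for key, value in row.items() if key not in LEAKY_COLUMNS}
    for row in records
]

print(training_rows[0])
# The leaky column is gone; only values known *before* delivery remain.
```

In Creator, the equivalent step is simply deselecting such a field/column on the selection page before you train.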
You can select the data stored in your application fields to be fed into the model as the training data.
The Prediction model supports the following field types, which can be added as the base field and dependent fields. Fields of unsupported types won't be shown on the field selection page.
While selecting application fields as your training data, you require two types of field data:
In the above example, the base field would be "Is this a Fraud Claim", while the dependent fields could be the "Months as Customer", "Incident Severity", and "Total Claim Amount" fields.
After you've chosen the records for your training data, the data from all of them is taken into consideration by default. Sometimes, you might want to focus on learning and making predictions from a specific set of records. You can define criteria that filter a specific set of records for training your prediction model; the criteria are set by laying out conditions as per your need. Use this step to filter your data if you know that the records you're using to train the model contain irrelevant information.
Let's assume an insurance company uses Creator to build an application that predicts whether an insurance claim request is fraudulent. To predict this accurately, the model should be trained with all the records found in the application form, giving it a broader base to learn from.
Now, let's say that the insurance company has declared that claim requests with an incident date before the year 2018 cannot be processed. When this is entered as a criteria, only the requests with incident dates in 2018 or later will be considered for fraud prediction. You can set the criteria in this case to filter the records accordingly.
Another way of selecting the training data is to use data stored in files of CSV format. If your data has columns of unsupported types, they won't be shown on the column selection page.
Column Selection
The Prediction model supports training data from CSV files in the following data types: number, text, date, and date-time. Columns of these types can be added as the base column and dependent columns. Columns whose data doesn't fall into one of these types are unsupported in CSV prediction and won't be shown on the column selection page.
The columns that you select from this CSV will act as base and dependent columns for training.
In the above example, the base column would be "Fraud Claim", whereas the dependent columns could be the "Months as Customer", "Incident Severity", and "Total Claim Amount" columns.
After adding the training data, you can review the model details such as Model Name, Base Field/Column, and Dependent Fields/Columns. If required, you can modify the Model Name, Base Field/Column, and Dependent Fields/Columns by going back. Otherwise, you can proceed to train the model.
Before you can actually use your prediction model in your application, you have to train it to perform and produce the expected outcomes. After you've selected and reviewed your data fields/columns, click Train to train your model.
Once your model's training is complete, you can view the model details, the model's versions, and its deployment details, if any. The model is now ready to be published and deployed to your apps.
You can manage your model in the following ways:
After training, you can test your model to check how it performs and whether the training is satisfactory before deploying it in your applications. You can upload test data; after testing, you'll get the predicted outcomes along with an accuracy score.
The prediction model calculates the accuracy score for your trained model based on the prediction results of your test dataset. For example, if your dataset has 500 records and the model correctly predicts 480 of them, an accuracy score of 96 percent is shown.
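The accuracy score is simply the fraction of test records predicted correctly. A small Python sketch of the calculation, using invented labels:

```python
# Accuracy = correct predictions / total records. The labels below are invented.
actual    = ["Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "Yes", "Yes", "No"]

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual) * 100

print(f"{accuracy:.1f}%")  # → 80.0%
```

Here 4 of the 5 predictions match the actual outcomes, so the score is 80 percent; Creator computes the same ratio over your uploaded test dataset.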
After you've trained, tested, and evaluated your model, if you find that the outcomes aren't as expected, you can optionally edit your model to improve its performance. Here are some things you can try to help improve your model's accuracy score.
Cleaning data is the process of removing inaccurate, incorrectly formatted, duplicate, or incomplete information from a training dataset. Combining two or more data sources increases the risk of data duplication and labeling errors. Even when prediction results appear to be accurate, data mistakes can make them unreliable.
Before feeding in the training data, it is crucial to clean up your data, as this will help improve your model's performance. Taking the time to carefully review each row of data for typos, missing numbers, spelling mistakes, and other errors is the best way to clean up faulty data. By doing this, you can get rid of data that is obviously unsuitable for model training.
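Two of the most common cleaning steps are removing exact duplicates and dropping rows with missing values. A minimal Python sketch of such a pass, with invented field names:

```python
# A minimal cleaning pass: drop exact duplicates and rows with missing values.
# Field names are invented for illustration.
raw_rows = [
    {"customer": "A", "claim_amount": 1200},
    {"customer": "A", "claim_amount": 1200},   # exact duplicate
    {"customer": "B", "claim_amount": None},   # missing value
    {"customer": "C", "claim_amount": 800},
]

seen = set()
clean_rows = []
for row in raw_rows:
    key = tuple(sorted(row.items()))
    if key in seen:
        continue                               # skip duplicate rows
    if any(value is None for value in row.values()):
        continue                               # skip rows with missing values
    seen.add(key)
    clean_rows.append(row)

print(len(clean_rows))  # → 2
```

Of the four raw rows, the duplicate and the incomplete row are dropped, leaving two clean rows. Real cleanup usually also involves fixing typos and normalizing formats, which typically needs a manual review pass rather than a blanket rule.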
After you train your model, you can publish it to make it available to your users and start making predictions. Learn how
To use your prediction model in an environment-enabled Creator application, you must have at least one version of that application published in the production environment. After deploying your model in the application, you can filter between the different environment stages to check which stage the model is deployed in. Learn how