Base and dependent fields
When you select application fields as your training data, you need two types of fields:
- Base field is the field for which you want to predict the outcome.
- Dependent fields are the fields that you want the prediction model to use for the prediction process.
In the above example, the base field would be "Is this a Fraud Claim", while the dependent fields could be "Months as Customer", "Incident Severity", and "Total Claim Amount".
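To make the split between base and dependent fields concrete, here is a minimal Python sketch that separates each record into the value to predict (base field) and the inputs the model learns from (dependent fields). It is not Creator's implementation: modeling records as plain dictionaries, the helper name `split_training_data`, and the sample values are all hypothetical. Only the field names come from the example.

```python
from typing import Any

# Hypothetical application records, modeled as plain dicts (values invented).
RECORDS: list[dict[str, Any]] = [
    {"Months as Customer": 120, "Incident Severity": "Major Damage",
     "Total Claim Amount": 71610, "Is this a Fraud Claim": "Yes"},
    {"Months as Customer": 8, "Incident Severity": "Minor Damage",
     "Total Claim Amount": 5070, "Is this a Fraud Claim": "No"},
]

BASE_FIELD = "Is this a Fraud Claim"  # the outcome to predict
DEPENDENT_FIELDS = ["Months as Customer", "Incident Severity", "Total Claim Amount"]


def split_training_data(records, base_field, dependent_fields):
    """Return (inputs, outcomes): dependent-field values and base-field value per record."""
    inputs = [[record[name] for name in dependent_fields] for record in records]
    outcomes = [record[base_field] for record in records]
    return inputs, outcomes


inputs, outcomes = split_training_data(RECORDS, BASE_FIELD, DEPENDENT_FIELDS)
print(inputs[0])  # [120, 'Major Damage', 71610]
print(outcomes)   # ['Yes', 'No']
```

Each record contributes one row of inputs and one outcome, which is the pairing a prediction model is trained on.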
After you've chosen the fields for your training data, the data from all your records is used by default. Sometimes, you might want the model to learn from and make predictions on a specific set of records. To do this, define criteria: a set of conditions, laid out as per your need, that filter the records used to train your prediction model. Use this step if you know that some of the records you are using to train a model contain irrelevant information.
Let's assume an insurance company uses Creator to build an application that predicts whether an insurance claim request is fraudulent. To predict this accurately, the model is trained with all the records in the application's form, which widens the model's comprehension.
Now, let's say that the insurance company has declared that claim requests with an incident date before 2018 cannot be processed. When you enter this as a criterion, only requests with an incident date in 2018 or later are considered when training the fraud-claim prediction model, and the remaining records are filtered out.
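The effect of such a criterion on the training records can be sketched in Python. This is a minimal illustration, not how Creator evaluates criteria: representing each criterion as a (field, condition) pair, combining multiple criteria with AND, the helper `apply_criteria`, and the sample claims are all assumptions.

```python
from datetime import date

# Hypothetical claim records; only the field the criterion checks is shown.
claims = [
    {"Claim ID": 1, "Incident Date": date(2017, 11, 3)},
    {"Claim ID": 2, "Incident Date": date(2018, 1, 15)},
    {"Claim ID": 3, "Incident Date": date(2019, 6, 30)},
]

# Each criterion is a (field name, condition) pair. Here: incident date in 2018 or later.
criteria = [
    ("Incident Date", lambda value: value >= date(2018, 1, 1)),
]


def apply_criteria(records, criteria):
    """Keep only the records that satisfy every criterion (AND semantics assumed)."""
    return [
        record
        for record in records
        if all(condition(record[field]) for field, condition in criteria)
    ]


training_records = apply_criteria(claims, criteria)
print([record["Claim ID"] for record in training_records])  # [2, 3]
```

The claim from 2017 is excluded, so the model never learns from requests the company would not process anyway.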
Data from CSV File
Another way to select training data is to use data stored in a CSV file.
The prediction model supports the following data types for CSV training data: number, text, date, and date-time. Columns of any of these types can be added as the base column and dependent columns.
Columns whose data doesn't fall into one of these types are unsupported in CSV prediction and won't be shown on the column selection page.
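The rule for which CSV columns appear on the column selection page can be sketched as a simple filter over column types. This assumes the column types have already been detected when the file is parsed; the `detected_types` map, the "image" type used as an unsupported example, and the helper `selectable_columns` are hypothetical. Only the four supported types come from this document.

```python
# Data types this document lists as supported for CSV training data.
SUPPORTED_TYPES = {"number", "text", "date", "date-time"}

# Hypothetical column types detected for an uploaded CSV file.
detected_types = {
    "Months as Customer": "number",
    "Incident Date": "date",
    "Incident Severity": "text",
    "Total Claim Amount": "number",
    "Claim Photo": "image",  # hypothetical unsupported type
    "Fraud Claim": "text",
}


def selectable_columns(column_types):
    """Keep only columns whose type is supported, preserving the file's column order."""
    return [name for name, kind in column_types.items() if kind in SUPPORTED_TYPES]


print(selectable_columns(detected_types))
# ['Months as Customer', 'Incident Date', 'Incident Severity', 'Total Claim Amount', 'Fraud Claim']
```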
Base and dependent columns
The columns that you select from the CSV file act as the base and dependent columns for training.
- Base column is the column for which you want to predict the outcome.
- Dependent columns are the columns that you want the prediction model to use for the prediction process.
In the above example, the base column would be "Fraud Claim", while the dependent columns could be "Months as Customer", "Incident Severity", and "Total Claim Amount".
- A base field/column is required to predict the outcome.
- A maximum of 20 dependent fields/columns can be added to your prediction model.
- You can add up to 20 criteria per prediction model.
- The maximum file size for CSV files is 4 MB.
- You need a minimum of 50 records to train the model. For higher accuracy, you can add more records (for example, more than 10,000) to make the data richer.
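The limits listed above can be expressed as a pre-flight check. This is a minimal sketch, not Creator's own validation: the function `check_model_setup` and its messages are hypothetical, and reading "4MB" as 4 × 1024 × 1024 bytes is an assumption (the limit could instead be 4,000,000 bytes).

```python
# Limits stated in this document.
MAX_DEPENDENT = 20
MAX_CRITERIA = 20
MAX_CSV_BYTES = 4 * 1024 * 1024  # assumption: "4MB" read as 4 MiB
MIN_RECORDS = 50


def check_model_setup(base, dependents, criteria, record_count, csv_size_bytes=None):
    """Return a list of problems; an empty list means every stated limit is met."""
    problems = []
    if not base:
        problems.append("A base field/column is required.")
    if len(dependents) > MAX_DEPENDENT:
        problems.append(
            f"At most {MAX_DEPENDENT} dependent fields/columns are allowed; got {len(dependents)}."
        )
    if len(criteria) > MAX_CRITERIA:
        problems.append(f"At most {MAX_CRITERIA} criteria are allowed; got {len(criteria)}.")
    # The size limit only applies when the training data comes from a CSV file.
    if csv_size_bytes is not None and csv_size_bytes > MAX_CSV_BYTES:
        problems.append(f"The CSV file is {csv_size_bytes} bytes; the limit is {MAX_CSV_BYTES}.")
    if record_count < MIN_RECORDS:
        problems.append(f"At least {MIN_RECORDS} records are needed; got {record_count}.")
    return problems


print(check_model_setup("Fraud Claim", ["Months as Customer", "Incident Severity"], [], 30))
# ['At least 50 records are needed; got 30.']
```

For a real CSV file, the size could be read with `os.path.getsize(path)` and passed as `csv_size_bytes`; for application fields, leave it as `None`.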
After adding the training data, you can review model details such as Model Name, Base Field/Column, and Dependent Fields/Columns. If required, go back to modify any of these details; otherwise, proceed to train the model.
Before you can use your prediction model in your application, you have to train it to produce the expected outcomes. After you've selected and reviewed your fields/columns, click Train to train your model.