Custom data types are used to validate organization-specific data, such as employee ID, invoice ID, shipment tracking ID, or asset ID. By creating a custom data type, you can set a standard for your organization-specific columns in your data.
This allows you to measure the data quality for those columns like any other common data column and improves the data quality as a result.
You can create your own data type using the Create custom data type option in the Change data type transform.
To change the data type of a column
1. Right-click the column name and select the Change data type option from the context menu.
2. Select the Create custom data type link from the side panel.
3. Provide a name for your custom data type in the Data type name & icon text box.
By default, an icon is created using the first two letters of the name to represent the new data type. You can customize this icon by changing the letters and the color.
4. Select an option from the Choose visibility drop-down. This determines the scope of the data type: whether it is visible within the dataset, the workspace, or the organization.
5. Select the required option from the Base type drop-down.
The base types upon which a custom data type can be created are: Text, Number, Decimal, and Date.
6. In the Conditions section, you can add conditions to define your data type with options to include match types and expressions.
7. To create a condition, select the match type and enter the value or an expression. The different match types are:
- Wildcard
- String length comparators
- String length values
8. Select the icon to add more conditions, if needed.
9. When you add multiple conditions, a logical operator (AND or OR) appears next to the conditions. Click the operator to toggle it between AND and OR.
Using the logical operators, you can combine the conditions and control the order in which they are evaluated. The resulting expression is displayed under the Criteria expression section.
10. To change the default expression, click Edit and use logical operators and parentheses to specify which condition should be evaluated first. Click Save after making the required changes.
11. Click the Create button to create the data type.
12. Select the Apply button to assign your custom data type to the selected column.
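The effect of parentheses in the criteria expression (steps 9 and 10) can be sketched outside DataPrep in Python. This is only an illustration: the three conditions, the function name `evaluate`, and the expression strings used as dictionary keys are hypothetical, and the snippet does not reproduce DataPrep's own syntax or evaluation rules.

```python
def evaluate(value: str) -> dict:
    """Evaluate three hypothetical conditions under two groupings."""
    c1 = value.startswith("ZYKLER")  # hypothetical condition 1: begins with
    c2 = value.endswith("MAC")       # hypothetical condition 2: ends with
    c3 = len(value) == 17            # hypothetical condition 3: string length
    return {
        "1 AND (2 OR 3)": c1 and (c2 or c3),
        "(1 AND 2) OR 3": (c1 and c2) or c3,
    }


print(evaluate("ZYKLER-002981-MAC"))
print(evaluate("ACMECO-002981-WIN"))
```

The second value fails conditions 1 and 2 but satisfies condition 3, so it is rejected by the first grouping and accepted by the second. This is why it can be worth editing the expression rather than relying on the default order.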
Examples
Case 1: Match the asset ID pattern. Macbook assets at Zykler & Co. are marked with an asset ID in the following format:
ZYKLER-002981-MAC
ZYKLER-002982-MAC
ZYKLER-002983-MAC
ZYKLER-002984-MAC
The pattern consists of the text 'ZYKLER' at the beginning, followed by a unique six-digit number and the text 'MAC' at the end. The data type "AssetID" can be created using the begins with and ends with conditions, joined by the AND operator.
begins with: ZYKLER
AND
ends with: MAC
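For readers who want to check sample values outside DataPrep, the two conditions can be approximated in Python as follows. This is a minimal sketch: the function name `matches_asset_id` is hypothetical, and it mirrors only the begins with and ends with conditions shown above, so it does not verify the six-digit number in the middle.

```python
def matches_asset_id(value: str) -> bool:
    """Approximate the AssetID conditions: begins with ZYKLER AND ends with MAC."""
    begins_with = value.startswith("ZYKLER")  # condition 1
    ends_with = value.endswith("MAC")         # condition 2
    return begins_with and ends_with          # joined with AND


for sample in ["ZYKLER-002981-MAC", "ZYKLER-002981-WIN", "ACME-002981-MAC"]:
    print(sample, matches_asset_id(sample))
```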
Case 2: Match Employee ID pattern. Zykler employees have their employee ID in the following format:
jones-1234
john-2211
Bane-123456
Michael-4233
To make sure the incoming data for the employee ID column follows this format, we can create a custom data type reflecting its pattern. The first part is a name made up of upper or lower case letters, followed by a hyphen and then a unique number of 4 to 6 digits.
The pattern is matched using the regex: "[a-zA-Z]*-[0-9]{4,6}".
With the regex option in DataPrep, you can create highly specific custom formats like these.
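The regex can be tried against sample values before it is entered in DataPrep. The sketch below uses Python's `re` module and assumes the whole value must match the pattern (hence `re.fullmatch`); whether DataPrep anchors the expression the same way is an assumption, and the function name `matches_employee_id` is hypothetical.

```python
import re

# Pattern taken from the example above.
EMPLOYEE_ID = re.compile(r"[a-zA-Z]*-[0-9]{4,6}")


def matches_employee_id(value: str) -> bool:
    """Return True when the entire value matches the employee ID pattern."""
    return EMPLOYEE_ID.fullmatch(value) is not None


for sample in ["jones-1234", "Bane-123456", "john-123", "john-1234567"]:
    print(sample, matches_employee_id(sample))
```

Note that `*` also permits an empty name, so a value such as "-1234" would match under this sketch; replacing `*` with `+` would require at least one letter.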