Split





DataPrep lets you divide a column into multiple columns using the Split transform. The column is split at the delimiter defined by the split option you choose and the input you provide.

To split a column

1. Right-click the column name and select Split from the context menu.

2. Choose one of the Split options and provide the input that determines how the column is split.

The options available are:
  1. Start and end index
  2. Start index and length
  3. Delimiter
  4. Regex
  5. Whitespaces 
3. Let's take a look at the inputs required for some of the options listed above.

Start and end index: 
Start index - The index at which the delimiter starts. The default start index is 1.
End index - The index at which the delimiter ends.
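The following Python sketch illustrates one reading of this option: the characters from the start index to the end index form the delimiter, and the text on either side becomes the new columns. It assumes 1-based, inclusive indexes; the function `split_by_index` is hypothetical and is not part of DataPrep.

```python
from typing import List, Optional


def split_by_index(value: str, start: int = 1, end: Optional[int] = None) -> List[str]:
    """Split `value` around the delimiter at positions start..end.

    Assumption: positions are 1-based and inclusive, and the delimiter
    characters are dropped from the output.
    """
    if end is None:
        end = start  # assume a one-character delimiter when no end index is given
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    before = value[: start - 1]  # text to the left of the delimiter
    after = value[end:]          # text to the right of the delimiter
    return [before, after]


print(split_by_index("2024-05-17", start=5, end=5))  # ['2024', '05-17']
```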

Start index and length: 
Start index - The index at which the delimiter starts. The default start index is 1.
Length - The character length of the delimiter.
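Under the same assumptions (1-based positions, delimiter characters dropped from the output), the start index and length option differs only in how the end of the delimiter is computed. The function name below is hypothetical.

```python
from typing import List


def split_by_index_and_length(value: str, start: int = 1, length: int = 1) -> List[str]:
    """Split `value` around a delimiter of `length` characters starting at `start` (1-based)."""
    if start < 1 or length < 1:
        raise ValueError("start and length must both be at least 1")
    end = start + length - 1  # last position (1-based, inclusive) covered by the delimiter
    return [value[: start - 1], value[end:]]


print(split_by_index_and_length("abc::def", start=4, length=2))  # ['abc', 'def']
```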

Delimiter: 
Delimiter - Split using the text or pattern as the delimiter.
Starting delimiter - Split using the text or pattern matched starting from the start delimiter.
Ending delimiter - Split using the text or pattern matched before the ending delimiter.
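For the plain Delimiter field, the behavior is comparable to splitting a string on a literal separator. This Python sketch shows the idea with `str.split`; it covers only the Delimiter field, not the Starting and Ending delimiter fields, and `split_by_delimiter` is a hypothetical name.

```python
from typing import List


def split_by_delimiter(value: str, delimiter: str) -> List[str]:
    """Split `value` on every occurrence of the literal text `delimiter`."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    return value.split(delimiter)


print(split_by_delimiter("John|Smith|London", "|"))  # ['John', 'Smith', 'London']
```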

Regex: 
Regex pattern - Enter the delimiter as a regular expression.
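A regex split treats every match of the pattern as a delimiter. The sketch below uses Python's `re.split`; DataPrep's regular expression dialect may differ from Python's, so treat this as an illustration only. `split_by_regex` is a hypothetical name.

```python
import re
from typing import List


def split_by_regex(value: str, pattern: str) -> List[str]:
    """Split `value` wherever `pattern` matches.

    Use non-capturing groups (?:...) in `pattern`: re.split also returns
    the text of any capturing groups, which would add extra parts.
    """
    return re.split(pattern, value)


print(split_by_regex("2024-05/17", r"[-/]"))  # ['2024', '05', '17']
```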



Ignore case - Turn off case sensitivity when matching the text or pattern.
Number of matches to split - Specify how many matches to split into columns. The minimum is 2.
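The sketch below shows how these two settings could map onto a regex split. It assumes that Number of matches to split caps the number of output columns, so any text after the last split stays in the final column. That interpretation, the parameter names, and the function are assumptions for illustration, not DataPrep's documented behavior.

```python
import re
from typing import List


def split_with_options(value: str, pattern: str, ignore_case: bool = False,
                       max_columns: int = 2) -> List[str]:
    """Regex split with optional case-insensitivity and a cap on output columns (hypothetical)."""
    if max_columns < 2:
        raise ValueError("max_columns must be at least 2")  # mirrors the documented minimum of 2
    flags = re.IGNORECASE if ignore_case else 0
    # maxsplit = max_columns - 1 yields at most max_columns parts
    return re.split(pattern, value, maxsplit=max_columns - 1, flags=flags)


print(split_with_options("redANDgreenandblue", "and", ignore_case=True, max_columns=3))
# ['red', 'green', 'blue']
print(split_with_options("redANDgreenandblue", "and", ignore_case=False, max_columns=3))
# ['redANDgreen', 'blue']
```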

4. You can also use the Store output as option to store the extracted values in columns or as a list.
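To illustrate the difference between the two Store output as choices, this sketch returns either one new column per part or a single column holding a list. The `_1`, `_2` suffixes used for the new column names are an assumption; DataPrep may name the columns differently.

```python
from typing import Dict, List, Union


def store_output(column: str, parts: List[str], as_list: bool) -> Dict[str, Union[str, List[str]]]:
    """Return the split parts as separate columns or as one list-valued column."""
    if as_list:
        return {column: parts}  # one column whose value is the whole list
    # one new column per part; the naming scheme is hypothetical
    return {f"{column}_{i}": part for i, part in enumerate(parts, start=1)}


print(store_output("full_name", ["John", "Smith"], as_list=False))
# {'full_name_1': 'John', 'full_name_2': 'Smith'}
print(store_output("full_name", ["John", "Smith"], as_list=True))
# {'full_name': ['John', 'Smith']}
```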
 
5. DataPrep shows a live preview of the changes made to the column.

6. You can apply this transform to multiple columns. Select the columns using the  option under the Columns to apply section.

To apply filters

You can also apply filters along with this transform.

1. Click the Filters tab.

2. Click the  icon and add the required columns to the Filters section. You can reorder the filters by dragging and dropping them.


3. For each column added, select one of the following options from the drop-down:
  1. Actual: Filter rows based on the actual values in the column.
  2. Data quality: Filter rows based on the quality of the data in the column.
  3. Patterns: Filter rows based on the data patterns in the selected column.
  4. Seasonal: Filter rows based on seasonal parameters such as quarter, month, or week.
  5. Outliers: Filter rows based on the outliers in the data of the selected column.
Note: The filter options displayed depend on the data type of the column added to the filter.

4. When you add more than one filter to the Filters section, the logical operator AND or OR appears next to the filters. Click an operator to toggle it between AND and OR.
  1. Logical operators let you combine conditions and control the order in which they are evaluated. The final expression is displayed in the Criteria expression box. To change the default expression, click Edit and use logical operators and parentheses to specify which conditions should be evaluated first. Click Save after making your changes.
  1. For example, in the expression ((1 OR 2) AND (3 OR 4)), the condition (1 OR 2) is evaluated first and the condition (3 OR 4) next. Because the two groups are joined by AND, the filter applies only when both conditions are true.
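To see how a criteria expression is evaluated for a row, here is a small Python evaluator for expressions such as ((1 OR 2) AND (3 OR 4)), where each number stands for a filter condition. It is a sketch, not DataPrep's implementation; it assumes AND binds more tightly than OR when no parentheses are given, which DataPrep may handle differently, so use parentheses to make the order explicit.

```python
from typing import Dict, List


def evaluate_criteria(expression: str, results: Dict[int, bool]) -> bool:
    """Evaluate an expression like "((1 OR 2) AND (3 OR 4))".

    `results` maps each condition number to whether it holds for a row.
    Assumption: without parentheses, AND is evaluated before OR.
    """
    tokens: List[str] = expression.upper().replace("(", " ( ").replace(")", " ) ").split()
    for token in tokens:
        if token not in ("(", ")", "AND", "OR") and not token.isdigit():
            raise ValueError(f"unexpected token: {token!r}")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "OR":
            take("OR")
            rhs = parse_and()  # parse first so tokens are consumed even if value is already True
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_atom()
        while peek() == "AND":
            take("AND")
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        if peek() == "(":
            take("(")
            value = parse_or()
            take(")")
            return value
        token = take()
        if not token.isdigit():
            raise ValueError(f"expected a condition number, got {token!r}")
        return results[int(token)]

    value = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected text after the end of the expression")
    return value


row = {1: False, 2: True, 3: False, 4: True}
print(evaluate_criteria("((1 OR 2) AND (3 OR 4))", row))  # True
```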
5. You can further drill down to choose specific values based on the filter option selected for each filter, in the next section.


For example, if the Data quality option is selected for the All columns filter in the Filters section, further options for filtering specific values are displayed in the All columns (Data quality) section.

6. You can choose to include or exclude the selected items in the last section.
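Including or excluding the selected items amounts to keeping either the matching rows or the non-matching ones. A minimal sketch, with rows modeled as Python dictionaries and `apply_selection` as a hypothetical helper:

```python
from typing import Any, Dict, Iterable, List, Set


def apply_selection(rows: Iterable[Dict[str, Any]], column: str,
                    selected: Set[Any], include: bool = True) -> List[Dict[str, Any]]:
    """Keep rows whose value is in `selected` (include=True) or not in it (include=False)."""
    return [row for row in rows if (row[column] in selected) == include]


rows = [{"city": "London"}, {"city": "Paris"}, {"city": "Rome"}]
print(apply_selection(rows, "city", {"London", "Rome"}, include=True))
# [{'city': 'London'}, {'city': 'Rome'}]
print(apply_selection(rows, "city", {"London", "Rome"}, include=False))
# [{'city': 'Paris'}]
```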

7. To remove all the filters, click the Clear button.

8. A live preview of the filter transform is shown as you make changes. 

9. Click the Apply button to apply the transform along with the filters.
