The filter transform allows you to selectively filter data based on filter conditions applied over one or more columns.
1. Click the Transform menu and select the Filter option. The Filter pane will slide open to view.
2. Choose any of the following options in the Filter options section:
3. Click the
icon to add columns to the filters. The All columns filter is added by default. However, you can add more filters or remove the existing ones. You can also reorder the filters using the drag and drop method.
-
The All columns filter will apply the filter depending on the values available in all the columns in your dataset.
-
The Any column filter will apply the filter depending on the value available in at least one of the columns in your dataset.
4. When you add more than one filter to the Filters section, the logical operators, AND or OR appear next to the filters. You can click to toggle the logical operator between AND and OR.
Using the logical operators, you can combine the conditions and apply logic to determine the rule of precedence. The final expression is displayed in the Criteria expression box. You can click Edit to alter the default expression using logical operators and parenthesis to specify the precedence or the sequential order as to which condition should be evaluated first. Click Save after making the required changes.
For example, in the expression, ((1 OR 2) AND (3 OR 4)) , at first the condition ( 1 OR 2 ) will be executed and the condition ( 3 OR 4 ) will be executed next. Thirdly, since, the AND operator is used, the filter will be applied when both the conditions are true.
5. You can use the Clear button to remove all the filters.
6. For every filter added, you can select one of the following options from the drop-down:
- Actual: This option lets you filter rows based on the actual values in the column.
- Data quality: This option lets you filter rows based on the quality of data in the column.
- Patterns: This option helps you filter rows based on the data patterns in the selected column.
- Seasonal: This option helps you filter rows based on the seasonal parameters such as quarter, month, week, etc.
- Outliers: This option allows you to filter rows based on the outliers present in the data of the selected column.
Note: The filter options are displayed based on the datatype of the column added for the filter.
Filter Options
Filter options are categories using which you can filter values. The filter options are displayed based on the datatype of the column added for the filter.
Let us discuss each filter option in detail.
Actual
If you had selected the Actual option for the filter, you can further drill down to choose one of the following to select specific values to filter.
1. Individual values
You can select the required individual values from here. You can also search for values using the Search bar. Click the + Add new data button to add new individual values.
The Individual values tab is displayed for
text,
decimal and
number base datatypes.
2. Range
If you choose the Range option, you can select the required range of values. To customise the range, click the +Add new range button, choose Above, Below or Between, enter the values and click add.
The Range tab is displayed for
number,
decimal,
date and
datetime base datatypes.
3. Wildcard
If you choose the wildcard option, you can construct the required condition using the String or expression field from the available options:
Contains
Doesn't contain
Starts with
Doesn't start with
Ends with
Doesn't end with
Is
Is not
Matches regex
You can also toggle case-sensitivity and add whitespace, tabspace, newline, and return.
Click the
icon to add more conditions. When you add more than one condition in the Wildcard tab, the logical operators, AND or OR appear next to the conditions. You can click to toggle the logical operator between AND and OR.
Using the logical operators, you can combine the conditions and apply logic to determine the rule of precedence. The final expression is displayed in the Criteria expression box. You can click Edit to alter the default expression using logical operators and parenthesis to specify the precedence or the sequential order as to which condition should be evaluated first. Click Save after making the required changes.
For example, In the expression, ((1 OR 2) AND (3 OR 4)), at first the condition (1 OR 2) will be executed and the condition (3 OR 4) will be executed next. Thirdly, since, the AND operator is used, the filter will be applied when both the conditions are true.
Seasonal
If you had selected the Seasonal option for the filter, you can further drill down to choose one of the following options to select the required values to be filtered.
Quarter
Month
Week
Weekday
Day of the month
The Seasonal option is displayed only for
date and
datetime base datatypes.
Data quality
If you had selected the Data quality option for the filter, you can choose to filter rows based on the data quality of the column i.e., valid data, invalid data, or missing data in the selected column.
Patterns
If you had selected the Patterns option for the filter, the patterns present in the column data will be displayed in the section below. You can select the required pattern from the list, or create your own custom pattern using the +Add new pattern option.
The Patterns option is displayed for all datatypes except List and Map datatypes.
Outliers
If you had selected the Outliers option for the filter, the outliers present in the column data will be displayed in the next section below. You can select the required values from the list to filter your data.
7. In the final step, in the selection box, you can choose to include or exclude the values selected in the filter values section using the Include items or Exclude items option.
8. Click the Preview button to view the preview of the filter in the Preview pane.
9. Click the Apply button to apply the filter transform.
SEE ALSO