Dataset operations

Dataset transforms

Dataset transform  consists of combining and transforming datasets to suit your needs. You can apply the following transforms in a dataset.
  • Deduplicate
    • Row-wise
    • Column-wise
  • Join
  • Append
  • Pivot
  • Unpivot
  • Calculated columns
  • Derive dataset
  • Mark as personal data
All the dataset transforms are present in the top bar transform menu



Deduplicate

You can remove duplicate records from your data using the Deduplicate transform. This can be done in two ways: row-wise and column-wise.

Row-wise

This method removes rows with duplicate data, allowing only unique rows to be present in your dataset.

To apply row-wise deduplication:

1. Click the Transform menu, click Deduplicate, then select Row-wise

2. You can choose to ignore case and whitespace while removing duplicate rows.

Note: There could be instances where no duplicates were found in the sample dataset. You can still apply the rule to remove duplicates rows when the entire dataset is processed during export.

3. A live preview will be shown with the duplicate rows highlighted in red.



4. Click Remove duplicates

Column-wise

You can also select single or multiple columns and choose to Deduplicate. The transform removes rows based on duplicate values present in the selected columns. 

To apply column-wise deduplication:

1. Click the Transform menu, click Deduplicate then select Column-wise

2. You can choose to ignore case and whitespace to find duplicates. 

3. You can choose one of the two methods to deduplicate your dataset based on the selected column: Automatic deduplication or Manual conditions

4. When you choose the Automatic deduplication method, DataPrep works for you to deduplicate your data based on the columns you've selected. 



5. When you choose the Manual conditions method, you will need to enter the conditions and expressions and construct the 'if' statements. You can then select which rows to keep, or remove, within each of the duplicate cluster if the condition is true.

 

6. The following table lists the available  If  conditions for all the data types.  Click here  to know more about data types.

Text

Numeric

Datetime

Duration

Boolean

List

Map

 contains

 =equal to

 = equal to

 is smallest

 is true

 has value

 has key

 doesn't contain

 != not equal to

 != not equal to

 is largest

 is false

 is empty list

 is empty map

 begins with

 > more than

 is earliest

 = equal to

 contains

 is not empty   list

 is not empty   map

 ends with

 < less than

 is latest

 != not equal to

 doesn't contain

 is cell empty

 is cell empty

 is

 >= more than or   equal

 is after

 is cell empty

 begins with

 is cell not   empty

 is cell not   empty

 is not

 <= less than or   equal to

 is before

 is cell not empty

 ends with

 use regex

 use regex

 is cell empty

 is smallest

 on or after

 use regex

 is

 use patterns

 use patterns

 is cell not   empty

 is largest

 on or before

 use patterns

 is not

 

 

 use regex

 is cell empty

 is cell empty

 

 is cell empty

 

 

 use patterns

 is cell not   empty

 is cell not   empty

 

 is cell not   empty

 

 

 

 use regex

 use regex

 

 use regex

 

 

 

 use patterns

 use patterns

 

 use patterns

 

 


7. You can also keep adding more conditions using the AND and OR operators to apply deduplication using a combination of conditions.
For example, you can write a condition that goes like this, "If the  mail  column contains  zoho.com keep  those rows", i.e., 
Enter conditions to select which 
If mail contains zoho.com

8. With the Advanced option, you can insert functions and provide conditions to remove duplicates. 



9. Click the Preview button to see which rows will be removed during the transformation.

10.  You can also select multiple columns for deduplication using (+) in Columns to de-duplicate

Join 

You can join two datasets together using common columns using the Join transform. Click here to see a quick video of Join transform.

For example, consider a dataset which has the purchase data of a store and another dataset with their customer information. These datasets can be combined using a common column such as the customer ID to identify and match the records from both the datasets.

DataPrep offers all four types of joins: inner join, left join, right join, and outer join. 

To perform a join transform:

1. Click the Transform menu, click Combine, then select Join

Info: You can also click the  icon or right-click the stage in the Pipeline builder page and choose the Join transform.



2. In the Join dataset  dialog, you can choose the dataset with which you are going to join the current dataset and the join type.    

3. Select the dataset that you want to join your current dataset using the Choose a dataset to join drop-down.

A new dataset will be created with the result of the join transform.


 
DataPrep automatically calculates and shows the Join potential between the current dataset and other datasets from your workspace. The join potential is shown in percentage, making it easier for you to choose a dataset for the Join transform. It is a calculation based on various factors like the amount of matching data in a column and matching column names between the datasets.

4.  Choose the type of join using the Join type option. You can also change the type of join from the Join type menu in the Transform panel. 

 

5.  Enter a name for the new dataset in the New dataset name box in the Transform panel. 

6. Select the columns using which you want to join the two datasets in the  Matched columns section.

Resolve repeated column names

 
7. Click the  Preview  button. The Resolve repeated column names  dialog appears, if there are any columns with the same name. You will have to rename or delete them to resolve duplicate columns.

8. In the preview, you can unselect the columns that you do not want to include in the resultant dataset using the check boxes available in the column headers of the preview.    

9. Click Join to join the two datasets.



10.  Once you've successfully joined the two datasets, you can click Open the joined dataset.



11. If you wish to view the join configuration, you can click open the Data source configuration from the Ruleset  pane from the joined dataset.



12. You can also click  Edit join  to update the join configuration.



Joins are typically used to combine two different datasets on the common items. Let's look into each type of join with an example. 

Inner join

An inner join combines two datasets using common columns and discards all the unmatched rows. 

The inner join is best represented as: 



Let us take the example of the following datasets. 

Orders dataset 



Customers dataset 



Order ID is the common column here. Inner join has combined the rows of the two datasets that shared values in the Order ID column and discarded the unmatched values from both Orders and Customers datasets. 



Left join


A left join combines the current dataset with another dataset using common columns, and discards all unmatched rows from the other dataset.

Left join can be represented as: 



Let's take the example of the following datasets:

Orders dataset



Customers dataset



Left join  has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Customers dataset. 



Right join 


A right join combines the current dataset with another dataset using common columns and discards all unmatched rows from the current dataset.
Right join can be represented as:



Let's take the example of the following datasets: 

Orders dataset



Customers dataset



Right join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Orders dataset. 


Outer join

An outer join combines two datasets using common columns and includes all the rows from both datasets (including all unmatched rows). An outer join can be represented as:



Let's take the example of the following datasets.

Orders dataset



Customers dataset



Outer join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values are kept from both Orders and Customers datasets. 


Append

In DataPrep, you can append one dataset with another and create a new dataset using the Append transform. Click here to see a quick video on Append transform.

To append datasets

1. Click the Transform menu, click Combine , then select Append

Info: You can also click the  icon or right-click the stage in the Pipeline builder page and choose the Append transform.



2.  In the Append dataset  dialog, you can choose the dataset that you want to append to the current dataset. 

A new dataset will be created with the result of the append transform.



3.  Enter a name for the new dataset in the New dataset name box in the transform panel.

4. DataPrep compares the column names of the two datasets and shows the unmatched columns so can choose to include or exclude them. For the matched ones, incoming data is added directly below the columns that have matched.



5. Click the Preview button and DataPrep will show a live preview of the append transform.



6. Click  Append  to apply the transform. Once you've successfully combined the two datasets by appending the rows, click Open the combined dataset.



7.  If you want to edit the append configuration, you can open the Data source configuration from the Ruleset pane from the newly created dataset and edit the configuration. 


Pivot 

Pivot Table distributes data for easy consumption. It spreads out the data in long, winding tables by converting categories into columns. Pivot can be created by selecting the Column , Row , and Data fields. Click here to see a quick video of Pivot transform.

To apply the pivot transform

1. Click the  Transform  menu and select the Pivot  option. The Pivot pane will slide open to view.  

Info: You can also click the  icon or right-click the stage in the Pipeline builder page and choose the Pivot transform.




2. Select the fields that you want to convert to columns in the   Columns  section. Choose the fields to feature as rows in the  Rows  section and the fields with data in the  Data  section.



Note: You can only choose fields with numerical data in the Data section.

3.  You can also select the  Filters  tab if you want to apply any filters. Using filters, you can selectively filter data based on the filter conditions applied over one or more columns. If you want to learn more about filters, click  here .



4. Select the  Sort  tab if you want to rearrange the order of your columns and rows. The fields added under the  Pivot  tab are also shown in the  Sort  tab. You can sort the data by rows and by columns using the  By rows  and  By columns  sections.

 You can sort data in ascending or descending order, or perform a custom sort by selecting the options in the  By columns  and  By rows  sections. 

 You can view a preview of the table by clicking the  Preview  button. Click  Reset  to set back  to the default selection (Ascending).

For example, you can select the  Ascending  and  Descending  options in the Product category and Region fields in the  By columns  sections, and select the  Ascending  option in the Customer Name filter in the  By rows  section to sort the Sales Data. The data sorted will look like as shown in the preview below.



You can also customize the arrangement by selecting the  Custom  option. You can drag and drop or use the up and down arrows to arrange the rows and columns.



Note: The filter and sort functionalities are optional.

5. You can add fields to the  Data  section, and you can choose one of the aggregate functions. The functions can get you the sum, count, average, etc. depending upon the data type of the fields added. The list of functions available for each data type is given in the table below.

Datatype

 Functions

Number

Sum

Maximum

Minimum

Average

Standard deviation

Median

Mode

Percentile

Variance

Count

Distinct count

Text

Count

Distinct count

Date

Count

Distinct count

Maximum date

Minimum date

List

Count

Distinct count

Map

Count

Distinct count


Note:  The functions for the list data type counts the values and t he functions for the map data type counts the keys.

6.  Click the   Preview  button and DataPrep will show a live preview of the pivot transformation.



7. When two or more fields are selected in the  Columns  box, the column names are joined using a dash or hyphen (-). 

8.  In the new dataset, you can open the data source settings from the  Ruleset  pane and edit the pivot configuration. 

9.  Click  Pivot  to apply the pivot configuration to your dataset. 

Unpivot

Unpivot converts columns to rows. The Unpivot transform is useful in condensing data and the data is often exported to analytics software for creating reports and dashboards. The result is saved as a new dataset when the transform is applied.

To apply unpivot transform:

1. Click the Transform menu and select Unpivot option.

Info: You can also click the  icon or right-click the stage in the Pipeline builder page and choose the Unpivot transform.



2. Enter a name for the new dataset in the New dataset name  field.  

3. Select the columns to unpivot in the Columns to apply box. These are the columns that will be converted into rows.

4.  A new column will be created using the column headers of the selected columns. Enter a name for this new column in the   Column name for headers field.  



5. The values under the columns selected will be added to a separate column. Provide a name for this column in the  Column name for values  field. 

6. Click on the Preview button and DataPrep will show a live preview of the transformation.

 

7. Click  Apply  and apply the unpivot transform.

Formula column

You can create new columns and customize them using the Formula column transform. Zoho DataPrep offers a variety of functions to suit your needs. Click  here to know more about the functions.  

To create a formula column

1. Click on the  Transform menu and select the  Formula column option.

2. Provide a name for new column name in the  New column name field.

3. You can insert the functions in the  Formula field by typing the function names, or use the intelliSense to complete the functions. You can also apply filters and search functions in the  Click to insert functions section.

4. You can also click the OpenAI ChatGPT tile to generate formula by sending prompts to ChatGPT. Click here to know more.

5. You can add parameters to the formula, or use intelliSense to choose columns. You can also search for column names in the Click to insert columns section.  




6. DataPrep shows a live preview of the changes made to the formula when you click on the  Preview button. 

7. Click  Apply to apply the changes.

Derive dataset

The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset.

To derive dataset                 

1. Click the  Transform menu and select the  Derive dataset option. 

2. Enter a name for your derived dataset under the New dataset name field.
 

3. Click  Apply to create your derived dataset. 

Now you've successfully created the derived dataset. The newly created dataset will have the most recent state of your data with an empty ruleset. 

4.  You can choose to click  Open and start using the new dataset.  

5. You can click the  Data source configuration in the derived dataset to view the source details of the parent dataset.



Mark as personal data

You can mark a column that contains PII (Personal Identifiable Information) or personal data and ePHI (Electronic Protected Health Information ) data using the Mark PII and ePHI data transform. You can also apply security methods on the columns to protect your data and choose to include or exclude these columns during export.

 

To mark columns with PII data or personal data      

1. Click the Transform menu and select the Mark PII and ePHI data option.

 You can also right-click on a column and select the Mark PII and ePHI data option from the context menu.

 

2. Add personal data columns to the Mark columns with personal data section.

 

3. Click Apply to mark the selected columns as personal data.




To mark columns with ePHI data or health data

1. Click the Transform menu and select the Mark PII and ePHI data option. 

Info: You can also right-click on a column and select the Mark PII and ePHI data option from the context menu.

2. Add columns containing health data under the Mark columns with ePHI data section.



3. Click Apply to mark the selected columns as ePHI data columns.

To protect personal data or ePHI data during export  

 

1. On the Pipeline Builder page, once you have completed creating your data flow and applying the necessary transforms in your stages, you can right-click a stage and select the Add Destination option.



 

 

2. From the side pane, you can choose the destination where you want to export data. For example, let us choose Files as the destination.

3. Choose the columns with personal data or ePHI data to be included during export using the corresponding check boxes.

Note : Columns not marked as personal data or ePHI data will be included by default.
 


4. Choose the required security method from the drop-down to protect your personal data and click Next.

There are three security measures that can be applied to the personal data or ePHI data columns. These security measures are used to protect sensitive data such as Personally Identifiable Information (PII).

Security measures to protect personal data or ePHI data

1. Data masking
Data masking hides original data with 'x' to protect personal information.
 
2. Data Tokenization
Data tokenization replaces each distinct value in your data with a random value. Hence the output is statistically identical to the original data.
 
3. None
You can select None if you do not want to use any security method.

Note :
1. Exporting datasets
without securing personal data will be restricted if Secure data exports option is enabled in your organization's privacy settings. Click here to know more.
2. Based on the Compliance settings of your organization, exporting datasets with ePHI data may be restricted or, exporting datasets without securing ePHI data with security measures and password protection would be restricted. Learn more
5. Provide the required values, and click Export.



Note: You can also review how personal data or ePHI data is protected in all your workspaces under the Review personal data in all workspaces section. Click here to know more.

SEE ALSO

    Access your files securely from anywhere

      Zoho CRM Training Programs

      Learn how to use the best tools for sales force automation and better customer engagement from Zoho's implementation specialists.

      Zoho CRM Training
        Redefine the way you work
        with Zoho Workplace

          Zoho DataPrep Personalized Demo

          If you'd like a personalized walk-through of our data preparation tool, please request a demo and we'll be happy to show you how to get the best out of Zoho DataPrep.

          Zoho CRM Training

            Create, share, and deliver

            beautiful slides from anywhere.

            Get Started Now


              Zoho Sign now offers specialized one-on-one training for both administrators and developers.

              BOOK A SESSION









                                            You are currently viewing the help pages of Qntrl’s earlier version. Click here to view our latest version—Qntrl 3.0's help articles.




                                                Manage your brands on social media

                                                  Zoho Desk Resources

                                                  • Desk Community Learning Series


                                                  • Digest


                                                  • Functions


                                                  • Meetups


                                                  • Kbase


                                                  • Resources


                                                  • Glossary


                                                  • Desk Marketplace


                                                  • MVP Corner


                                                  • Word of the Day


                                                    Zoho Marketing Automation

                                                      Zoho Sheet Resources

                                                       

                                                          Zoho Forms Resources


                                                            Secure your business
                                                            communication with Zoho Mail


                                                            Mail on the move with
                                                            Zoho Mail mobile application

                                                              Stay on top of your schedule
                                                              at all times


                                                              Carry your calendar with you
                                                              Anytime, anywhere




                                                                    Zoho Sign Resources

                                                                      Sign, Paperless!

                                                                      Sign and send business documents on the go!

                                                                      Get Started Now




                                                                              Zoho TeamInbox Resources



                                                                                      Zoho DataPrep Resources



                                                                                        Zoho DataPrep Demo

                                                                                        Get a personalized demo or POC

                                                                                        REGISTER NOW


                                                                                          Design. Discuss. Deliver.

                                                                                          Create visually engaging stories with Zoho Show.

                                                                                          Get Started Now









                                                                                                              • Related Articles

                                                                                                              • Derive dataset

                                                                                                                The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset. To derive dataset 1. In the DataPrep ...
                                                                                                              • Filter

                                                                                                                The filter option allows you to filter data based on custom conditions applied over one or more columns. The filter can also be used without applying them as a rule. However, you can choose to apply them as a rule by either keeping or deleting the ...
                                                                                                              • Deduplicate

                                                                                                                You can remove duplicate records from your data using the Deduplicate transform. This can be done in two ways: row-wise and column-wise. Row-wise This method removes rows with duplicate data, allowing only unique rows to be present in your dataset. ...
                                                                                                              • Mark PII and ePHI data

                                                                                                                You can mark a column that contains PII(Personal Identifiable Information) or personal data and ePHI (Electronic Protected Health Information ) data using the Mark PII and ePHI data transform. You can also apply security methods on the columns to ...
                                                                                                              • Window functions to add formula columns

                                                                                                                A window function is used to perform calculations across a group of rows on a dataset. A set of rows is termed a window. You can use window functions to perform summations and calculations based on a rolling window of data, relative to the current ...
                                                                                                                Wherever you are is as good as
                                                                                                                your workplace

                                                                                                                  Resources

                                                                                                                  Videos

                                                                                                                  Watch comprehensive videos on features and other important topics that will help you master Zoho CRM.



                                                                                                                  eBooks

                                                                                                                  Download free eBooks and access a range of topics to get deeper insight on successfully using Zoho CRM.



                                                                                                                  Webinars

                                                                                                                  Sign up for our webinars and learn the Zoho CRM basics, from customization to sales force automation and more.



                                                                                                                  CRM Tips

                                                                                                                  Make the most of Zoho CRM with these useful tips.



                                                                                                                    Zoho Show Resources