Dataset operations

Dataset transforms

Dataset transform  consists of combining and transforming datasets to suit your needs. You can apply the following transforms in a dataset.
  • Deduplicate
    • Row-wise
    • Column-wise
  • Join
  • Append
  • Pivot
  • Unpivot
  • Calculated columns
  • Derive dataset
  • Mark as personal data
All the dataset transforms are present in the topbar transform menu



Deduplicate

You can remove duplicate records from your data using the Deduplicate transform. This can be done in two ways: row-wise and column-wise.

Row-wise

This method removes rows with duplicate data, allowing only unique rows to be present in your dataset.

To apply row-wise deduplication:

1. Click the Transform menu, click Deduplicate , then select Row-wise

2. You can choose to ignore case and whitespace while removing duplicate rows.

Note: There could be instances where no duplicates were found in the sample dataset. You can still apply the rule to remove duplicates rows when the entire dataset is processed during export.

3. A live preview will be shown with the duplicate rows highlighted in red.



4. Click Remove duplicates

Column-wise

You can also select single or multiple columns and choose to Deduplicate. The transform removes rows based on duplicate values present in the selected columns. 

To apply column-wise deduplication:

1. Click the Transform menu, click Deduplicate then select Column-wise

2. You can choose to ignore case and whitespace to find duplicates. 

3. You can choose one of the two methods to deduplicate your dataset based on the selected column: Automatic deduplication or Manual conditions

4. When you choose the Automatic deduplication method, DataPrep works for you to deduplicate your data based on the columns you've selected. 



5. When you choose the Manual conditions method, you will need to enter the conditions and expressions and construct the 'if' statements. You can then select which rows to keep, or remove, within each of the duplicate cluster if the condition is true.

 

6. The following table lists the available  If  conditions for all the data types.  Click here  to know more about data types.

Text

Numeric

Datetime

Duration

Boolean

List

Map

 contains

 =equal to

 = equal to

 is smallest

 is true

 has value

 has key

 doesn't contain

 != not equal to

 != not equal to

 is largest

 is false

 is empty list

 is empty map

 begins with

 > more than

 is earliest

 = equal to

 contains

 is not empty   list

 is not empty   map

 ends with

 < less than

 is latest

 != not equal to

 doesn't contain

 is cell empty

 is cell empty

 is

 >= more than or   equal

 is after

 is cell empty

 begins with

 is cell not   empty

 is cell not   empty

 is not

 <= less than or   equal to

 is before

 is cell not empty

 ends with

 use regex

 use regex

 is cell empty

 is smallest

 on or after

 use regex

 is

 use patterns

 use patterns

 is cell not   empty

 is largest

 on or before

 use patterns

 is not

 

 

 use regex

 is cell empty

 is cell empty

 

 is cell empty

 

 

 use patterns

 is cell not   empty

 is cell not   empty

 

 is cell not   empty

 

 

 

 use regex

 use regex

 

 use regex

 

 

 

 use patterns

 use patterns

 

 use patterns

 

 


7. You can also keep adding more conditions using the AND and OR operators to apply deduplication using a combination of conditions.
For example, you can write a condition that goes like this, "If the  mail  column contains  zoho.com keep  those rows", i.e., 
Enter conditions to select which 
If mail contains zoho.com

8. With the Advanced option, you can insert functions and provide conditions to remove duplicates.



9. Click the Preview button to see which rows will be removed during the transformation.

10.  You can also select multiple columns for deduplication using (+) in Columns to de-duplicate

Join 

You can join two datasets together using common columns using the Join transform. Click here to see a quick video of Join transform.

For example, consider a dataset which has the purchase data of a store and another dataset with their customer information. These datasets can be combined using a common column such as the customer ID to identify and match the records from both the datasets.

DataPrep offers all four types of joins: inner join, left join, right join, and outer join. 

To perform a join transform:

1. Click the Transform menu, click Combine , then select Join

2. In the Join dataset  dialog, you can choose the dataset with which you are going to join the current dataset and the join type.    

3. Select the dataset that you want to join your current dataset using the Choose a dataset to join drop-down.

A new dataset will be created with the result of the join transform.


 
DataPrep automatically calculates and shows the Join potential between the current dataset and other datasets from your workspace. The join potential is shown in percentage, making it easier for you to choose a dataset for the Join transform. It is a calculation based on various factors like the amount of matching data in a column and matching column names between the datasets.

4. Choose the type of join using the Join type option. You can also change the type of join from the Join type menu in the Transform panel. 

 

5.  Enter a name for the new dataset in the New dataset name  box in the Transform panel. 

6. Select the columns using which you want to join the two datasets in the Matched columns section.
 
7. Click the Preview button. The Resolve repeated column names  dialog appears, if there are any columns with the same name. You will have to rename or delete them to resolve duplicate columns.

8. In the preview, you can unselect the columns that you do not want to include in the resultant dataset using the check boxes available in the column headers of the preview.    

9. Click Join to join the two datasets.



10.  Once you've successfully joined the two datasets, you can click Open the joined dataset.



11. If you wish to view the join configuration, you can click open the Data source configuration from the Ruleset  pane from the joined dataset.



12. You can also click  Edit join  to update the join configuration.



Joins are typically used to combine two different datasets on the common items. Let's look into each type of join with an example. 

Inner join

An inner join combines two datasets using common columns and discards all the unmatched rows. 

The inner join is best represented as: 



Let us take the example of the following datasets. 

Orders dataset 



Customers dataset 



Order ID is the common column here. Inner join has combined the rows of the two datasets that shared values in the Order ID column and discarded the unmatched values from both Orders and Customers datasets. 



Left join


A left join combines the current dataset with another dataset using common columns, and discards all unmatched rows from the other dataset.

Left join can be represented as: 



Let's take the example of the following datasets:

Orders dataset



Customers dataset



Left join  has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Customers dataset. 



Right join 


A right join combines the current dataset with another dataset using common columns and discards all unmatched rows from the current dataset.
Right join can be represented as:



Let's take the example of the following datasets: 

Orders dataset



Customers dataset



Right join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Orders dataset. 


Outer join

An outer join combines two datasets using common columns and includes all the rows from both datasets (including all unmatched rows). An outer join can be represented as:



Let's take the example of the following datasets.

Orders dataset



Customers dataset



Outer join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values are kept from both Orders and Customers datasets. 


Append

In DataPrep, you can append one dataset with another and create a new dataset using the Append transform. Click here to see a quick video on Append transform.

To append datasets

1. Click the Transform menu, click Combine , then select Append

2.  In the Append dataset dialog, you can choose the dataset that you want to append to the current dataset. 

A new dataset will be created with the result of the append transform.



3.  Enter a name for the new dataset in the New dataset name box in the transform panel.

4. DataPrep compares the column names of the two datasets and shows the unmatched columns so can choose to include or exclude them. For the matched ones, incoming data is added directly below the columns that have matched.



5. Click the Preview button and DataPrep will show a live preview of the append transform.



6. Click  Append  to apply the transform. Once you've successfully combined the two datasets by appending the rows, click Open the combined dataset.



7.  If you want to edit the append configuration, you can open the Data source configuration from the Ruleset pane from the newly created dataset and edit the configuration. 


Pivot 

Pivot Table distributes data for easy consumption. It spreads out the data in long, winding tables by converting categories into columns. Pivot can be created by selecting the Column , Row , and Data fields. Click here to see a quick video of Pivot transform.

To apply the pivot transform

1. Click the Transform menu and select the Pivot  option. The Pivot pane will slide open to view.  

2. Select the fields that you want to convert to columns in the Columns section. Choose the fields to feature as rows in the Rows section and the fields with data in the Data section.



Note: You can only choose fields with numerical data in the Data section.

3.  You can also select the Filters tab if you want to apply any filters. Using filters, you can selectively filter data based on the filter conditions applied over one or more columns. If you want to learn more about filters, click here .



4. Select the Sort  tab if you want to rearrange the order of your columns and rows. The fields added under the Pivot tab are also shown in the  Sort  tab. You can sort the data by rows and by columns using the  By rows and By columns  sections.

 You can sort data in ascending or descending order, or perform a custom sort by selecting the options in the  By columns and  By rows  sections. 

 You can view a preview of the table by clicking the  Preview  button. Click Reset  to set back  to the default selection (Ascending).

For example, you can select the  Ascending  and  Descending  options in the Product category and Region fields in the  By columns  sections, and select the  Ascending  option in the Customer Name filter in the  By rows  section to sort the Sales Data. The data sorted will look like as shown in the preview below.



You can also customize the arrangement by selecting the  Custom  option. You can drag and drop or use the up and down arrows to arrange the rows and columns.



Note: The filter and sort functionalities are optional.

5. You can add fields to the  Data  section, and you can choose one of the aggregate functions. The functions can get you the sum, count, average, etc. depending upon the data type of the fields added. The list of functions available for each data type is given in the table below.

Datatype

 Functions

Number

Sum

Maximum

Minimum

Average

Standard deviation

Median

Mode

Percentile

Variance

Count

Distinct count

Text

Count

Distinct count

Date

Count

Distinct count

Maximum date

Minimum date

List

Count

Distinct count

Map

Count

Distinct count


Note:  The functions for the list data type counts the values and the functions for the map data type counts the keys.

6.  Click the Preview button and DataPrep will show a live preview of the pivot transformation.



7. When two or more fields are selected in the  Columns  box, the column names are joined using a dash or hyphen (-). 

8.  In the new dataset, you can open the data source settings from the Ruleset  pane and edit the pivot configuration. 

9.  Click Pivot to apply the pivot configuration to your dataset. 

Unpivot

Unpivot converts columns to rows. The Unpivot transform is useful in condensing data and the data is often exported to analytics software for creating reports and dashboards. The result is saved as a new dataset when the transform is applied.

To apply unpivot transform:

1. Click the Transform menu and select Unpivot option.

2. Enter a name for the new dataset in the New dataset name  field.  

3. Select the columns to unpivot in the Columns to apply box. These are the columns that will be converted into rows.

4.  A new column will be created using the column headers of the selected columns. Enter a name for this new column in the Column name for headers field.  



5. The values under the columns selected will be added to a separate column. Provide a name for this column in the Column name for values  field. 

6. Click on the Preview button and DataPrep will show a live preview of the transformation.

 

7. Click  Apply  and apply the unpivot transform.

Formula column

You can create new columns and customize them using the Formula column transform. Zoho DataPrep offers a variety of functions to suit your needs. Click here to know more about the functions.  

To create a formula column

1. Click on the Transform menu and select the Formula column option.

2. Provide a name for new column name in the New column name field.

3. You can insert the functions in the  Formula field by typing the function names, or use the intelliSense to complete the functions. You can also apply filters and search functions in the  Click to insert functions section.

4. You can add parameters to the formula, or use intelliSense to choose columns. You can also search for column names in the Click to insert columns section.  




5. DataPrep shows a live preview of the changes made to the formula when you click on the Preview button. 

6. Click Apply to apply the changes.

Derive dataset

The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset.

To derive dataset                 

1. Click the Transform menu and select the Derive dataset  option. 

2. Enter a name for your derived dataset under the New dataset name field.

 

3. Click Apply to create your derived dataset. 

Now you've successfully created the derived dataset. The newly created dataset will have the most recent state of your data with an empty ruleset. 

4.  You can choose to click  Open and start using the new dataset.  

5. You can click the  Data source configuration in the derived dataset to view the source details of the parent dataset.



Mark as personal data

You can mark a column that contains personal data using the Mark as personal data option. You can also apply security measures on the column to protect your personal data, or choose to include or exclude this column during export.

To mark columns with personal data:
                 
1. Click the Transform menu and select the Mark as personal data option. 

2. Add the columns with personal data in the Mark columns with personal data section. 

3. Click Apply and mark the columns as personal data.

 

Security measures to protect personal data

During export, there are three security measures that can be applied to the columns marked as containing personal data:

1. Data masking  
Data masking hides original data with 'x' to protect personal information.

2. Data Tokenization 
Data tokenization replaces each distinct value in your data with a random value. Hence the output is statistically identical to the original data. 

3. None
You can select none if you do not want to use any security method. You can choose whether to export these column using the corresponding check boxes.  


SEE ALSO

    Zoho CRM Training Programs

    Learn how to use the best tools for sales force automation and better customer engagement from Zoho's implementation specialists.

    Zoho CRM Training
      Redefine the way you work
      with Zoho Workplace

        Zoho DataPrep Personalized Demo

        If you'd like a personalized walk-through of our data preparation tool, please request a demo and we'll be happy to show you how to get the best out of Zoho DataPrep.

        Zoho CRM Training

          Create, share, and deliver

          beautiful slides from anywhere.

          Get Started Now


            Zoho Sign now offers specialized one-on-one training for both administrators and developers.

            BOOK A SESSION





                        Still can't find what you're looking for?

                        Write to us:  support@zohoforms.com


                              



                            


                          New to Zoho Social?

                              Zoho Marketing Automation

                                Zoho Sheet Resources

                                 




                                    Zoho Forms Resources


                                      Secure your business
                                      communication with Zoho Mail


                                      Mail on the move with
                                      Zoho Mail mobile application

                                        Stay on top of your schedule
                                        at all times


                                        Carry your calendar with you
                                        Anytime, anywhere




                                              Zoho Sign Resources

                                                Sign, Paperless!

                                                Sign and send business documents on the go!

                                                Get Started Now


                                                    Zoho SalesIQ Resources



                                                        Zoho TeamInbox Resources



                                                                Zoho DataPrep Resources



                                                                  Zoho DataPrep Demo

                                                                  Get a personalized demo or POC

                                                                  REGISTER NOW


                                                                    Design. Discuss. Deliver.

                                                                    Create visually engaging stories with Zoho Show.

                                                                    Get Started Now











                                                                                            • Related Articles

                                                                                            • Derive dataset

                                                                                              The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset. To derive dataset 1. Click the Transform ...
                                                                                            • Filter

                                                                                              The filter transform allows you to selectively filter data based on filter conditions applied over one or more columns. To apply the filter transform 1. Click the Transform menu and select the Filter option. The Filter pane will slide open to view. ...
                                                                                            • Target Matching

                                                                                              Target matching allows you to set a target and align the source dataset to match with your target before exporting data. You can attach a target dataset to match the column structure, formats, and the data types in the existing source dataset. This ...
                                                                                            • Window functions to add formula columns

                                                                                              A window function is used to perform calculations across a group of rows on a dataset. A set of rows is termed a window. You can use window functions to perform summations and calculations based on a rolling window of data, relative to the current ...
                                                                                            • Context menu

                                                                                              Zoho DataPrep offers various options in the context menu to perform column level operations.  A few basic options here are Sort, Rename, Delete and Move. To access the context menu, right click the column name and click any one of the icons to ...
                                                                                            Wherever you are is as good as
                                                                                            your workplace

                                                                                              Resources

                                                                                              Videos

                                                                                              Watch comprehensive videos on features and other important topics that will help you master Zoho CRM.



                                                                                              eBooks

                                                                                              Download free eBooks and access a range of topics to get deeper insight on successfully using Zoho CRM.



                                                                                              Webinars

                                                                                              Sign up for our webinars and learn the Zoho CRM basics, from customization to sales force automation and more.



                                                                                              CRM Tips

                                                                                              Make the most of Zoho CRM with these useful tips.



                                                                                                Zoho Show Resources