Dataset operations

Dataset operations

Dataset operations consists of combining and transforming datasets to suit your needs. You can perform the following operations in a dataset.
  • Deduplicate
    • Row-wise
    • Column-wise
  • Join
  • Append
  • Pivot
  • Unpivot
  • Calculated columns
  • Derive dataset
  • Mark as personal data
All the dataset operations are present in the topbar transform menu



Deduplicate

Duplicate records from your data can be removed using the Deduplicate transformation. This can be done in two ways: row-wise and column-wise.

Row-wise

This method removes rows with duplicate data, allowing only unique rows to be present in your dataset.

To apply row-wise deduplication:

1. Click the Transform menu, click Deduplicate, then select Row-wise

2. You can choose to ignore case and whitespace while removing duplicate rows.

Note: There could be instances where no duplicates were found in the sample dataset. You can still apply the rule to remove duplicates rows when the entire dataset is processed during export.

3. A live preview will be shown with the duplicate rows highlighted in red.

4. Click Remove duplicates

Column-wise


You can also select single or multiple columns and choose to Deduplicate. The operation removes rows based on duplicate values present in the selected columns. 

To apply column-wise deduplication:

1. Click the Transform menu, click Deduplicate then select Column-wise

2. You can choose to ignore case and whitespace to find duplicates. 

3. You can choose one of the two methods to deduplicate your dataset based on the selected column: Automatic deduplication or Manual conditions

4. When you choose the Automatic deduplication method, DataPrep works for you to deduplicate your data based on the columns you've selected. 

5. When you choose the Manual conditions method, you will need to enter the conditions and expressions and construct the 'if' statements. You can then select which rows to keep, or remove, within each of the duplicate cluster if the condition is true. 

6. The following table lists available conditions for the if statement. 

Data types
Text
Numeric
Date
Boolean
If conditions
 contains
 = equal to
 is empty
 contains
If conditions
 doesn't contain
 > more than
 is not empty
 doesn't contain
If conditions
 begins with
 < less than

 begins with
If conditions
 ends with
 >= more than or equal

 ends with
If conditions
 is
 <= less than or equal

 is 
If conditions
 is not
 is empty

 is not
If conditions
 is empty
 is not empty

 is empty
If conditions
 is not empty


 is not empty

Note: You cannot use conditions on columns with array or map data types. 

7. You can also keep adding more conditions using the AND and OR operators to apply deduplication using a combination of conditions.
For example, you can write a condition that goes like this, "If the mail column contains zoho.comkeep those rows", i.e., 
Enter conditions to select which 
If mail contains zoho.com



8. With the Advanced option, you can insert functions and provide conditions to remove duplicates.



9. A live preview shows which rows will be removed during the transformation.

10.  You can also select multiple columns for deduplication using (+) in Columns to de-duplicate

Join 

You can join two datasets together using common columns.

For example, consider a dataset which has the purchase details of customers and another dataset with their contact information. These datasets can be combined using a common column such as the customer ID to identify and match the records from both the datasets.

DataPrep offers all four types of joins: inner join, left join, right join, and outer join. 

DataPrep automatically calculates and shows the Join potential between the current dataset and the other datasets from your project. The join potential is shown in percentage, making it easier for you to choose a dataset for the Join operation. 




To perform a join operation: 

1. Click the Transform menu, click Combine, then select Join

2. Choose the dataset with which you are going to join the current dataset with, along with the join type. At this point you can choose to create a new dataset with the result of the join operation or update the current dataset with the result. 

3. Select the dataset that you want to join your current dataset with. 

4. Choose the type of join using the Join type option. You can also change the type of join later from the Join type menu in the bottom-left of the operations bar.

5. If you choose to create a new dataset, enter a name for the new dataset in the New dataset name box in the operations bar after the join pop up closes. 

6. Click the Preview button and DataPrep will show a live preview of the join operation along with Join info that displays useful stats. 

7. Click Join to join the two datasets.



8.  You've now successfully joined the two datasets. 
9. If you wish to view the join configuration, you can click open the Data source configuration from the Ruleset pane.



10. You can also click Edit join to update the join configuration.



Joins are typically used to combine two different datasets on the common items. Let's look into each type of join with an example. 

Inner join

An inner join combines two datasets using common columns and discards all the unmatched rows. 

The inner join is best represented as: 



Let us take the example of the following datasets. 

Orders dataset 



Customers dataset 



Order ID is the common column here. Inner join has combined the rows of the two datasets that shared values in the Order ID column and discarded the unmatched values from both Orders and Customers datasets. 



Left join


A left join combines the current dataset with another dataset using common columns, and discards all unmatched rows from the other dataset.

Left join can be represented as: 



Let's take the example of the following datasets:

Orders daaset



Customers dataset



Left join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Customers dataset. 



Right join 


A right join combines the current dataset with another dataset using common columns and discards all unmatched rows from the current dataset.
Right join can be represented as:



Let's take the example of the following datasets: 

Orders dataset



Customers dataset



Right join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Orders dataset. 


Outer join

Outer join uses common columns to combine two datasets, including the unmatched rows from both the datasets. An outer join can be represented as:



Let's take the example of the following datasets.

Orders dataset



Customers dataset



Outer join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values are kept from both Orders and Customers datasets. 

Append

In DataPrep, you can append one dataset with another in two ways. For example, if you need to include the survey data of 2015 with your current dataset 2016, you can append your incoming dataset to your current dataset directly, or you can create a new dataset to append the individual datasets.



To append two datasets: 

1. Click the Transform menu, click Combine, then select Append


2.Click the Preview button and DataPrep will show a live preview of the operation once the datasets are selected.

3. It also compares the column names of the two datasets and shows the unmatched columns so you can choose to keep or delete them. For the matched ones, incoming data is added directly below the columns that have matched.



4. In the new dataset, you can open the data source settings from the Ruleset and edit the Append configuration.

Pivot 

Pivot Table distributes data for easy consumption. It spreads out the data in long, winding tables by converting categories into columns. Pivot can be created by selecting the Column, Row, and Data fields. 

To apply pivot:

1. Click the Transform menu, then select Pivot

2. Select the fields that you want to convert to columns in the Columns box. Choose the fields to feature as rows in the Rows box and the fields with data (only numerical values) in the Data box. 

3. You change to one of these functions from the dropdown in the Data shelf.
  1. ACTUAL
  2. COUNT
  3. DISTINCT COUNT
  4. SUM
  5. AVG
  6. MIN
  7. MAX
  8. STDDEV
  9. VARIANCE 
  10. MODE
  11. MEDIAN 
4. Click the Preview button and DataPrep will show a live preview of the pivot transformation. 



5. When two or more fields are selected in the Columns box, the column names are joined using a dash or hyphen (-). You can double-click to rename the column names in the Preview area. 

6. In the new dataset, you can open the data source settings from the Ruleset and edit the Pivot configuration.

Unpivot

Unpivot converts columns to rows. Unpivot is useful for condensing data and exporting to analytics software for creating reports and dashboards. 

To unpivot:

1. Click the Transform menu, then select Unpivot.

2. Select the columns to unpivot and enter the column names for headers and values. 

3. Click on the Preview button and DataPrep will show a live preview of the transformation. 



4. Enter a name for the new dataset and apply the transformation. 

5. In the new dataset, you can open the data source settings from the Ruleset and edit the Unpivot configuration. 

Formula column

You can create new columns and customize them using the Formula column option. 

To create a formula column:

1. Click on the Transform menu, then select Formula column

2. You can call the functions in the text field, or use the drop-down menus to select columns and apply functions directly. 

3. DataPrep shows a live preview of the changes made to this column when you click on the preview button. 




To view the list of all formula column functions available in Zoho DataPrep, click here.

Derive dataset

Derived dataset allows you to create a branch of your dataset from the last applied transformation in your dataset. The new dataset will have the most recent state of your data with an empty Ruleset.

To derive from your dataset: 

1. Click the Transform menu, then select Derive dataset

2. Enter a name for your derived dataset under New dataset name

3. Click Save to create your derived dataset. 

4. The data source details in your derived dataset will show derived from the name of your parent dataset.

Mark as personal data

You can mark a column as containing personal data and secure the column data using the Mark as personal data option.

To mark columns with personal data: 

1. Click the Transform menu, then select Mark as personal data

2. Select the columns with personal data in the operation panel. 

3. Click Apply

During export, there are three security measures that can be applied to the columns marked as containing personal data:

1. Data masking 
Data masking hides original data with 'x' to protect personal information.

2. Data Tokenization 
Data tokenization replaces each distinct value in your data with a random value. Hence the output is statistically identical to the original data. 

3. None
Marking none will mark the selected column as personal data column. You can then choose or not to export this column later while configuring export

    Zoho DataPrep Personalized Demo

    If you'd like a personalized walk-through of our data preparation tool, please request a demo and we'll be happy to show you how to get the best out of Zoho DataPrep.

    Zoho CRM Training

      Create, share, and deliver

      beautiful slides from anywhere.

      Get Started Now





              Zoho CRM Training Programs

              Learn how to use the best tools for sales force automation and better customer engagement from Zoho's implementation specialists.

              Zoho CRM Training

                Zoho SalesIQ Resources



                    Zoho TeamInbox Resources




                              Zoho DataPrep Resources

                                Zoho DataPrep Demo

                                Get a personalized demo or POC

                                REGISTER NOW


                                  Design. Discuss. Deliver.

                                  Create visually engaging stories with Zoho Show.

                                  Get Started Now











                                                        • Related Articles

                                                        • Create workspace and add new dataset

                                                          To import data, you need to create a new workspace, or open an existing workspace and add a dataset. To import data:  1. Create a new workspace from the home page. Alternatively, you can select an existing workspace if you have one, and create ...
                                                        • Share your datasets

                                                          You can share your datasets with other users and groups in multiple ways.  Click the Share menu from the topbar of the Operations page to start sharing your datasets.   Alternatively, you can click the share icon from the Datasets tab on hovering ...
                                                        • Export options

                                                          DataPrep offers a wide range of export options. Your data can be exported to BI tools, cloud storage services, cloud databases. They can also be downloaded as files in different formats. To export your dataset 1. Click on the Export icon in the top ...
                                                        • Target Matching

                                                          Target matching allows you to set a target and align the source dataset to match with your target before exporting data. You can import a target dataset to match the column structure, formats, and the data types in the existing source dataset. This ...
                                                        • Context menu

                                                          Zoho DataPrep offers various options in the context menu to perform column level operations.  A few basic options here are Sort, Rename, Delete and Move. To access the context menu, right click the column name and click any one of the icons to ...

                                                        Resources

                                                        Videos

                                                        Watch comprehensive videos on features and other important topics that will help you master Zoho CRM.



                                                        eBooks

                                                        Download free eBooks and access a range of topics to get deeper insight on successfully using Zoho CRM.



                                                        Webinars

                                                        Sign up for our webinars and learn the Zoho CRM basics, from customization to sales force automation and more.



                                                        CRM Tips

                                                        Make the most of Zoho CRM with these useful tips.



                                                          Zoho Show Resources