Dataset transforms

Dataset transforms

Dataset transform  consists of combining and transforming datasets to suit your needs. You can apply the following transforms in a dataset.
  • Deduplicate
    • Row-wise
    • Column-wise
  • Join
  • Append
  • Pivot
  • Unpivot
  • Calculated columns
  • Derive dataset
  • Mark as personal data
All the dataset transforms are present in the topbar transform menu



Deduplicate

You can remove duplicate records from your data using the Deduplicate transform. This can be done in two ways: row-wise and column-wise.

Row-wise

This method removes rows with duplicate data, allowing only unique rows to be present in your dataset.

To apply row-wise deduplication:

1. Click the Transform menu, click Deduplicate , then select Row-wise

2. You can choose to ignore case and whitespace while removing duplicate rows.

Note: There could be instances where no duplicates were found in the sample dataset. You can still apply the rule to remove duplicates rows when the entire dataset is processed during export.

3. A live preview will be shown with the duplicate rows highlighted in red.



4. Click Remove duplicates

Column-wise

You can also select single or multiple columns and choose to Deduplicate. The transform removes rows based on duplicate values present in the selected columns. 

To apply column-wise deduplication:

1. Click the Transform menu, click Deduplicate then select Column-wise

2. You can choose to ignore case and whitespace to find duplicates. 

3. You can choose one of the two methods to deduplicate your dataset based on the selected column: Automatic deduplication or Manual conditions

4. When you choose the Automatic deduplication method, DataPrep works for you to deduplicate your data based on the columns you've selected. 



5. When you choose the Manual conditions method, you will need to enter the conditions and expressions and construct the 'if' statements. You can then select which rows to keep, or remove, within each of the duplicate cluster if the condition is true.

 

6. The following table lists available conditions for the if statement. 

Data types
Text
Numeric
Date
Boolean
If conditions
 contains
 = equal to
 is empty
 contains
If conditions
 doesn't contain
 > more than
 is not empty
 doesn't contain
If conditions
 begins with
 < less than

 begins with
If conditions
 ends with
 >= more than or equal

 ends with
If conditions
 is
 <= less than or equal

 is 
If conditions
 is not
 is empty empty

 is not
If conditions
 is empty
 is not empty

 is empty
If conditions
 is not empty


 is not empty

Note: You cannot use conditions on columns with array or map data types. 

7. You can also keep adding more conditions using the AND and OR operators to apply deduplication using a combination of conditions.
For example, you can write a condition that goes like this, "If the  mail  column contains  zoho.com keep  those rows", i.e., 
Enter conditions to select which 
If mail contains zoho.com

8. With the Advanced option, you can insert functions and provide conditions to remove duplicates.



9. Click the Preview button to see which rows will be removed during the transformation.

10.  You can also select multiple columns for deduplication using (+) in Columns to de-duplicate

Join 

You can join two datasets together using common columns using the Join transform.

For example, consider a dataset which has the purchase data of a store and another dataset with their customer information. These datasets can be combined using a common column such as the customer ID to identify and match the records from both the datasets.

DataPrep offers all four types of joins: inner join, left join, right join, and outer join. 

To perform a join transform:

1. Click the Transform menu, click Combine , then select Join

2. In the   Join dataset  dialog, you can choose the dataset with which you are going to join the current dataset, along with the join type. At this point, you can choose to create a new dataset with the result of the join transform or update the current dataset with the result.  

3. Select the dataset that you want to join your current dataset with in the   Choose a dataset to join  drop-down.

 
DataPrep automatically calculates and shows the Join potential between the current dataset and other datasets from your workspace. The join potential is shown in percentage, making it easier for you to choose a dataset for the Join transform. It is a calculation based on various factors like the amount of matching data in a column and matching column names between the datasets.

4.  Choose the type of join using the Join type option. You can also change the type of join from the Join type  menu in the Transform panel.

5. The toggle  Join the two datasets to create a new one  is enabled by default. However, you can modify the selection if you do not want to create a new dataset.

 

6.  If you choose to create a new dataset, enter a name for the new dataset in the   New dataset name  box in the Transform panel.

7. Select the columns using which you want to join the two datasets i n the   Matched columns  section.
 
8. Click the   Preview  button. The   Resolve repeated column names  dialog appears, if there are any columns with the same name. You will have to rename them to resolve duplicate columns.

9. In the preview, you can unselect the columns that you do not want to include using the check boxes in the column headers.    

10. Click Join to join the two datasets.



11.  You've now successfully joined the two datasets. 

12. If you wish to view the join configuration, you can click open the Data source configuration from the Ruleset  pane.



13. You can also click  Edit join to update the join configuration.



Joins are typically used to combine two different datasets on the common items. Let's look into each type of join with an example. 

Inner join

An inner join combines two datasets using common columns and discards all the unmatched rows. 

The inner join is best represented as: 



Let us take the example of the following datasets. 

Orders dataset 



Customers dataset 



Order ID is the common column here. Inner join has combined the rows of the two datasets that shared values in the Order ID column and discarded the unmatched values from both Orders and Customers datasets. 



Left join


A left join combines the current dataset with another dataset using common columns, and discards all unmatched rows from the other dataset.

Left join can be represented as: 



Let's take the example of the following datasets:

Orders dataset



Customers dataset



Left join  has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Customers dataset. 



Right join 


A right join combines the current dataset with another dataset using common columns and discards all unmatched rows from the current dataset.
Right join can be represented as:



Let's take the example of the following datasets: 

Orders dataset



Customers dataset



Right join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Orders dataset. 


Outer join

An outer join combines two datasets using common columns and includes all the rows from both datasets (including all unmatched rows). An outer join can be represented as:



Let's take the example of the following datasets.

Orders dataset



Customers dataset



Outer join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values are kept from both Orders and Customers datasets. 

Append

In DataPrep, you can append one dataset with another using the Append transform. You can either append your incoming dataset to your current dataset directly, or you can create a new dataset to append the individual datasets.

To append datasets: 

1. Click the Transform menu, click Combine , then select Append

2.  In the   Append dataset  dialog, you can choose the dataset that you want to append to the current dataset.




3. The  Join the two datasets to create a new one  toggle is enabled by default. However, you can modify the selection if you do not want to create a new dataset.

4.  If you choose to create a new dataset, enter a name for the new dataset in the   New dataset name  box in the transform panel.

5. DataPrep compares the column names of the two datasets and shows the unmatched columns so can choose to include or exclude them. For the matched ones, incoming data is added directly below the columns that have matched.



6. Click the Preview  button and DataPrep will show a live preview of the append transform.



7. Click Append to apply the transform. You've now successfully combined the two datasets by appending the rows.

8.  If you want to edit the append configuration, you can open the data source configuration from the Ruleset pane of the newly resultant dataset and edit the configuration.


Pivot 

Pivot Table distributes data for easy consumption. It spreads out the data in long, winding tables by converting categories into columns. Pivot can be created by selecting the Column, Row, and Data fields. 

To apply the pivot transform:

1. Click the   Transform  menu and select the Pivot option. The Pivot pane will slide open to view.  

2. Select the fields that you want to convert to columns in the   Columns  section. Choose the fields to feature as rows in the Rows section and the fields with data in the Data section.

Note: You can only choose fields with numerical data in the Data section.

3.  You can also select the  Filters  tab if you want to apply any filters. Using filters, you can selectively filter data based on the filter conditions applied over one or more columns. If you want to learn more about filters, click here.


4. Select the  Sort  tab if you want to rearrange the order of your columns and rows. The fields added under the  Pivot  tab are also shown in the  Sort  tab. You can sort the data by rows and by columns using the  By rows  and  By columns  sections.

    You can sort data in ascending or descending order, or perform a custom sort by selecting the options in the  By columns  and  By rows  sections. 

    You can view a preview of the table by clicking the  Preview  button. Click  Reset  to set back  to the default selection (Ascending).

For example, you can select the  Ascending  and  Descending  options in the Product category and Region fields in the  By columns  sections, and select the  Ascending  option in the Customer Name filter in the  By rows  section to sort the Sales Data. The data sorted will look like as shown in the preview below.



You can also customize the arrangement by selecting the  Custom  option. You can drag and drop or use the up and down arrows to arrange the rows and columns.



Note: The filter and sort functionalities are optional.

5. You can add fields to the  Data  section, and you can choose one of the aggregate functions. The functions can get you the sum, count, average, etc. depending upon the data type of the fields added. The list of functions available for each data type is given in the table below.

Datatype

 Functions

Number

Sum

Maximum

Minimum

Average

Standard deviation

Median

Mode

Percentile

Variance

Count

Distinct count

Text

Count

Distinct count

Date

Count

Distinct count

Maximum date

Minimum date

List

Count

Distinct count

Map

Count

Distinct count


Note:  The functions for the list data type counts the values and the functions for the map data type counts the keys.

6.  Click the   Preview  button and DataPrep will show a live preview of the pivot transformation.


7. When two or more fields are selected in the  Columns  box, the column names are joined using a dash or hyphen (-). 

8.  In the new dataset, you can open the data source settings from the  Ruleset  pane and edit the pivot configuration. 

9.  Click  Pivot  to apply the pivot configuration to your dataset.

Unpivot

Unpivot converts columns to rows. The Unpivot transform is useful in condensing data and the data is often exported to analytics software for creating reports and dashboards. The result is saved as a new dataset when the transform is applied.

To apply unpivot transform:

1. Click the Transform menu and select Unpivot option.

2. Enter a name for the new dataset in the   New dataset name  field.  

3. Select the columns to unpivot in the   Columns to apply   box. These are the columns that will be converted into rows.

4.  A new column will be created using the column headers of the selected columns. Enter a name for this new column in the   Column name for headers   field.  



5. The values under the columns selected will be added to a separate column. Provide a name for this column in the  Column name for values  field. 

6. Click on the   Preview  button and DataPrep will show a live preview of the transformation.

 

7. Click Apply and apply the unpivot transform.

Formula column

You can create new columns and customize them using the Formula column transform. Zoho DataPrep offers a variety of functions to suit your needs. Click here to know more about the functions.  

To create a formula column

1. Click on the  Transform menu and select the  Formula column option.

2. Provide a name for new column name in the  New column name  field.

3. You can insert the functions in the  Formula field by typing the function names, or use the intelliSense to complete the functions. You can also apply filters and search functions in the  Click to insert functions section.

4. You can add parameters to the formula, or use intelliSense to choose columns. You can also search for column names in the Click to insert columns section.  




5. DataPrep shows a live preview of the changes made to the formula when you click on the Preview button. 

6. Click Apply to apply the changes.

Derive dataset

The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset.

To derive dataset: 

1. Click the Transform menu and select the Derive dataset option. 

2. Enter a name for your derived dataset under the New dataset name field.

 

3. Click  Apply  to create your derived dataset. 

Now you've successfully created the derived dataset. The newly created dataset will have the most recent state of your data with an empty ruleset. 

4.  You can choose to click  Open  and start using the new dataset.  

5. You can click the  Data source configuration  in the derived dataset to view the  source details of the parent dataset.



Mark as personal data

You can mark a column that contains personal data using the Mark as personal data option. You can also apply security measures on the column to protect your personal data, or choose to include or exclude this column during export.

To mark columns with personal data:
                 
1. Click the  Transform  menu and select the  Mark as personal data  option. 

2. Add the columns with personal data in the  Mark columns with personal data  section. 

3. Click Apply and mark the columns as personal data.

 

During export, there are three security measures that can be applied to the columns marked as containing personal data:

1. Data masking  
Data masking hides original data with 'x' to protect personal information.

2. Data Tokenization 
Data tokenization replaces each distinct value in your data with a random value. Hence the output is statistically identical to the original data. 

3. None
You can select none if you do not want to use any security method. You can choose whether to export these column using the corresponding check boxes.  




    Zoho CRM Training Programs

    Learn how to use the best tools for sales force automation and better customer engagement from Zoho's implementation specialists.

    Zoho CRM Training
      Redefine the way you work
      with Zoho Workplace

        Zoho DataPrep Personalized Demo

        If you'd like a personalized walk-through of our data preparation tool, please request a demo and we'll be happy to show you how to get the best out of Zoho DataPrep.

        Zoho CRM Training

          Create, share, and deliver

          beautiful slides from anywhere.

          Get Started Now


            Get started with Zoho Sign

            in a few quick steps!

            Download Help Guide





                      Still can't find what you're looking for?

                      Write to us: support@zohoforms.com


                            




                            

                        Zoho Marketing Automation

                          Zoho Sheet Resources

                           




                              Zoho Forms Resources


                                Secure your business
                                communication with Zoho Mail


                                Mail on the move with
                                Zoho Mail mobile application

                                  Stay on top of your schedule
                                  at all times


                                  Carry your calendar with you
                                  Anytime, anywhere




                                        Zoho Sign Resources

                                          Sign, Paperless!

                                          Sign and send business documents on the go!

                                          Get Started Now


                                              Zoho SalesIQ Resources



                                                  Zoho TeamInbox Resources



                                                          Zoho DataPrep Resources



                                                            Zoho DataPrep Demo

                                                            Get a personalized demo or POC

                                                            REGISTER NOW


                                                              Design. Discuss. Deliver.

                                                              Create visually engaging stories with Zoho Show.

                                                              Get Started Now











                                                                                    • Related Articles

                                                                                    • Derive dataset

                                                                                      The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset. To derive dataset 1. Click the Transform ...
                                                                                    • Window functions to add formula columns

                                                                                      A window function is used to perform calculations across a group of rows on a dataset. A set of rows is termed a window. You can use window functions to perform summations and calculations based on a rolling window of data, relative to the current ...
                                                                                    • Append transform

                                                                                      In DataPrep, you can append one dataset with another using the Append transform. You can either append your incoming dataset to your current dataset directly, or you can create a new dataset to append the individual datasets.  To append datasets 1. ...
                                                                                    • Deduplicate transform

                                                                                      You can remove duplicate records from your data using the Deduplicate transform. This can be done in two ways: row-wise and column-wise. Row-wise This method removes rows with duplicate data, allowing only unique rows to be present in your dataset. ...
                                                                                    • Context menu

                                                                                      Zoho DataPrep offers various options in the context menu to perform column level operations.  A few basic options here are Sort, Rename, Delete and Move. To access the context menu, right click the column name and click any one of the icons to ...
                                                                                    Wherever you are is as good as
                                                                                    your workplace

                                                                                      Resources

                                                                                      Videos

                                                                                      Watch comprehensive videos on features and other important topics that will help you master Zoho CRM.



                                                                                      eBooks

                                                                                      Download free eBooks and access a range of topics to get deeper insight on successfully using Zoho CRM.



                                                                                      Webinars

                                                                                      Sign up for our webinars and learn the Zoho CRM basics, from customization to sales force automation and more.



                                                                                      CRM Tips

                                                                                      Make the most of Zoho CRM with these useful tips.



                                                                                        Zoho Show Resources