Dataset transforms

Dataset transform consists of combining and transforming datasets to suit your needs. You can apply the following transforms in a dataset.

Deduplicate

Row-wise
Column-wise

Join
Append
Pivot
Unpivot
Calculated columns
Derive dataset
Mark as personal data

All the dataset transforms are present in the top bar transform menu

Deduplicate

You can remove duplicate records from your data using the Deduplicate transform. This can be done in two ways: row-wise and column-wise.

Row-wise

This method removes rows with duplicate data, allowing only unique rows to be present in your dataset.

To apply row-wise deduplication:

1. Click the Transform menu, click Deduplicate, then select Row-wise.

2. You can choose to ignore case and whitespace while removing duplicate rows.

Note: There could be instances where no duplicates were found in the sample dataset. You can still apply the rule to remove duplicates rows when the entire dataset is processed during export.

3. A live preview will be shown with the duplicate rows highlighted in red.

4. Click Remove duplicates.

Column-wise

You can also select single or multiple columns and choose to Deduplicate. The transform removes rows based on duplicate values present in the selected columns.

To apply column-wise deduplication:

1. Click the Transform menu, click Deduplicate then select Column-wise.

2. You can choose to ignore case and whitespace to find duplicates.

3. You can choose one of the two methods to deduplicate your dataset based on the selected column: Automatic deduplication or Manual conditions.

4. When you choose the Automatic deduplication method, DataPrep works for you to deduplicate your data based on the columns you've selected.

5. When you choose the Manual conditions method, you will need to enter the conditions and expressions and construct the 'if' statements. You can then select which rows to keep, or remove, within each of the duplicate cluster if the condition is true.

6. The following table lists the available If conditions for all the data types. Click here to know more about data types.

Text	Numeric	Datetime	Duration	Boolean	List	Map
contains	=equal to	= equal to	is smallest	is true	has value	has key
doesn't contain	!= not equal to	!= not equal to	is largest	is false	is empty list	is empty map
begins with	> more than	is earliest	= equal to	contains	is not empty list	is not empty map
ends with	< less than	is latest	!= not equal to	doesn't contain	is cell empty	is cell empty
is	>= more than or equal	is after	is cell empty	begins with	is cell not empty	is cell not empty
is not	<= less than or equal to	is before	is cell not empty	ends with	use regex	use regex
is cell empty	is smallest	on or after	use regex	is	use patterns	use patterns
is cell not empty	is largest	on or before	use patterns	is not
use regex	is cell empty	is cell empty		is cell empty
use patterns	is cell not empty	is cell not empty		is cell not empty
	use regex	use regex		use regex
	use patterns	use patterns		use patterns

7. You can also keep adding more conditions using the AND and OR operators to apply deduplication using a combination of conditions.

For example, you can write a condition that goes like this, "If the mail column contains zoho.com , keep those rows", i.e.,
Enter conditions to select which
If mail contains zoho.com

8. With the Advanced option, you can insert functions and provide conditions to remove duplicates.

9. Click the Preview button to see which rows will be removed during the transformation.

10. You can also select multiple columns for deduplication using (+) in Columns to de-duplicate.

Join

You can join two datasets together using common columns using the Join transform. Click here to see a quick video of Join transform.

For example, consider a dataset which has the purchase data of a store and another dataset with their customer information. These datasets can be combined using a common column such as the customer ID to identify and match the records from both the datasets.

DataPrep offers all four types of joins: inner join, left join, right join, and outer join.

To perform a join transform:

1. Click the Transform menu, click Combine, then select Join.

Info: You can also click the

icon or right-click the stage in the Pipeline builder page and choose the Join transform.

2. In the Join dataset dialog, you can choose the dataset with which you are going to join the current dataset and the join type.

3. Select the dataset that you want to join your current dataset using the Choose a dataset to join drop-down.

A new dataset will be created with the result of the join transform.

DataPrep automatically calculates and shows the Join potential between the current dataset and other datasets from your workspace. The join potential is shown in percentage, making it easier for you to choose a dataset for the Join transform. It is a calculation based on various factors like the amount of matching data in a column and matching column names between the datasets.

4. Choose the type of join using the Join type option. You can also change the type of join from the Join type menu in the Transform panel.

5. Enter a name for the new dataset in the New dataset name box in the Transform panel.

6. Select the columns using which you want to join the two datasets in the Matched columns section.

Resolve repeated column names

7. Click the Preview button. The Resolve repeated column names dialog appears, if there are any columns with the same name. You will have to rename or delete them to resolve duplicate columns.

8. In the preview, you can unselect the columns that you do not want to include in the resultant dataset using the check boxes available in the column headers of the preview.

9. Click Join to join the two datasets.

10. Once you've successfully joined the two datasets, you can click Open the joined dataset.

11. If you wish to view the join configuration, you can click open the Data source configuration from the Ruleset pane from the joined dataset.

12. You can also click Edit join to update the join configuration.

Joins are typically used to combine two different datasets on the common items. Let's look into each type of join with an example.

Inner join

An inner join combines two datasets using common columns and discards all the unmatched rows.

The inner join is best represented as:

Let us take the example of the following datasets.

Orders dataset

Customers dataset

Order ID is the common column here. Inner join has combined the rows of the two datasets that shared values in the Order ID column and discarded the unmatched values from both Orders and Customers datasets.

Left join

A left join combines the current dataset with another dataset using common columns, and discards all unmatched rows from the other dataset.

Left join can be represented as:

Let's take the example of the following datasets:

Orders dataset

Customers dataset

Left join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Customers dataset.

Right join

A right join combines the current dataset with another dataset using common columns and discards all unmatched rows from the current dataset.

Right join can be represented as:

Let's take the example of the following datasets:

Orders dataset

Customers dataset

Right join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values were only discarded from the Orders dataset.

Outer join

An outer join combines two datasets using common columns and includes all the rows from both datasets (including all unmatched rows). An outer join can be represented as:

Let's take the example of the following datasets.

Orders dataset

Customers dataset

Outer join has combined the rows of the two datasets that shared values in the Order ID column. The unmatched values are kept from both Orders and Customers datasets.

Append

In DataPrep, you can append one dataset with another and create a new dataset using the Append transform. Click here to see a quick video on Append transform.

To append datasets

1. Click the Transform menu, click Combine , then select Append.

Info: You can also click the

icon or right-click the stage in the Pipeline builder page and choose the Append transform.

2. In the Append dataset dialog, you can choose the dataset that you want to append to the current dataset.

A new dataset will be created with the result of the append transform.

3. Enter a name for the new dataset in the New dataset name box in the transform panel.

4. DataPrep compares the column names of the two datasets and shows the unmatched columns so can choose to include or exclude them. For the matched ones, incoming data is added directly below the columns that have matched.

5. Click the Preview button and DataPrep will show a live preview of the append transform.

6. Click Append to apply the transform. Once you've successfully combined the two datasets by appending the rows, click Open the combined dataset.

7. If you want to edit the append configuration, you can open the Data source configuration from the Ruleset pane from the newly created dataset and edit the configuration.

Pivot

Pivot Table distributes data for easy consumption. It spreads out the data in long, winding tables by converting categories into columns. Pivot can be created by selecting the Column , Row , and Data fields. Click here to see a quick video of Pivot transform.

To apply the pivot transform

1. Click the Transform menu and select the Pivot option. The Pivot pane will slide open to view.

Info: You can also click the

icon or right-click the stage in the Pipeline builder page and choose the Pivot transform.

2. Select the fields that you want to convert to columns in the Columns section. Choose the fields to feature as rows in the Rows section and the fields with data in the Data section. Click here to know about the functions

Note: You can only choose fields with numerical data in the Data section.

3. You can also select the Filters tab if you want to apply any filters. Using filters, you can selectively filter data based on the filter conditions applied over one or more columns. If you want to learn more about filters, click here .

4. Select the Sort tab if you want to rearrange the order of your columns and rows. The fields added under the Pivot tab are also shown in the Sort tab. You can sort the data by rows and by columns using the By rows and By columns sections.

You can sort data in ascending or descending order, or perform a custom sort by selecting the options in the By columns and By rows sections.

You can view a preview of the table by clicking the Preview button. Click Reset to set back to the default selection (Ascending).

For example, you can select the Ascending and Descending options in the Product category and Region fields in the By columns sections, and select the Ascending option in the Customer Name filter in the By rows section to sort the Sales Data. The data sorted will look like as shown in the preview below.

You can also customize the arrangement by selecting the Custom option. You can drag and drop or use the up and down arrows to arrange the rows and columns.

Note: The filter and sort functionalities are optional.

5. You can add fields to the Data section, and you can choose one of the aggregate functions. The functions can get you the sum, count, average, etc. depending upon the data type of the fields added. The list of functions available for each data type is given in the table below.

Datatype	Functions
Number	Sum Maximum Minimum Average Standard deviation Median Mode Percentile Variance Count Distinct count
Text	Count Distinct count
Date	Count Distinct count Maximum date Minimum date
List	Count Distinct count
Map	Count Distinct count

Note: The functions for the list data type counts the values and t he functions for the map data type counts the keys.

6. Click the Preview button and DataPrep will show a live preview of the pivot transformation.

7. When two or more fields are selected in the Columns box, the column names are joined using a dash or hyphen (-).

8. In the new dataset, you can open the data source settings from the Ruleset pane and edit the pivot configuration.

9. Click Pivot to apply the pivot configuration to your dataset.

Unpivot

Unpivot converts columns to rows. The Unpivot transform is useful in condensing data and the data is often exported to analytics software for creating reports and dashboards. The result is saved as a new dataset when the transform is applied.

To apply unpivot transform:

1. Click the Transform menu and select Unpivot option.

Info: You can also click the

icon or right-click the stage in the Pipeline builder page and choose the Unpivot transform.

2. Enter a name for the new dataset in the New dataset name field.

3. Select the columns to unpivot in the Columns to apply box. These are the columns that will be converted into rows.

4. A new column will be created using the column headers of the selected columns. Enter a name for this new column in the Column name for headers field.

5. The values under the columns selected will be added to a separate column. Provide a name for this column in the Column name for values field.

6. Click on the Preview button and DataPrep will show a live preview of the transformation.

7. Click Apply and apply the unpivot transform.

Formula column

You can create new columns and customize them using the Formula column transform. Zoho DataPrep offers a variety of functions to suit your needs. Click here to know more about the functions.

To create a formula column

1. Click on the Transform menu and select the Formula column option.

2. Provide a name for new column name in the New column name field.

3. You can insert the functions in the Formula field by typing the function names, or use the intelliSense to complete the functions. You can also apply filters and search functions in the Click to insert functions section.

4. You can also click the OpenAI ChatGPT tile to generate formula by sending prompts to ChatGPT. Click here to know more.

5. You can add parameters to the formula, or use intelliSense to choose columns. You can also search for column names in the Click to insert columns section.

6. DataPrep shows a live preview of the changes made to the formula when you click on the Preview button.

7. Click Apply to apply the changes.

Derive dataset

The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset.

To derive dataset

1. Click the Transform menu and select the Derive dataset option.

2. Enter a name for your derived dataset under the New dataset name field.

3. Click Apply to create your derived dataset.

Now you've successfully created the derived dataset. The newly created dataset will have the most recent state of your data with an empty ruleset.

4. You can choose to click Open and start using the new dataset.

5. You can click the Data source configuration in the derived dataset to view the source details of the parent dataset.

Mark as personal data

You can mark a column that contains PII (Personal Identifiable Information) or personal data and ePHI (Electronic Protected Health Information ) data using the Mark PII and ePHI data transform. You can also apply security methods on the columns to protect your data and choose to include or exclude these columns during export.

To mark columns with PII data or personal data

1. Click the Transform menu and select the Mark PII and ePHI data option.

You can also right-click on a column and select the Mark PII and ePHI data option from the context menu.

2. Add personal data columns to the Mark columns with personal data section.

3. Click Apply to mark the selected columns as personal data.

To mark columns with ePHI data or health data

1. Click the Transform menu and select the Mark PII and ePHI data option.

Info: You can also right-click on a column and select the Mark PII and ePHI data option from the context menu.

2. Add columns containing health data under the Mark columns with ePHI data section.

3. Click Apply to mark the selected columns as ePHI data columns.

To protect personal data or ePHI data during export

1. On the Pipeline Builder page, once you have completed creating your data flow and applying the necessary transforms in your stages, you can right-click a stage and select the Add Destination option.

2. From the side pane, you can choose the destination where you want to export data. For example, let us choose Files as the destination.

3. Choose the columns with personal data or ePHI data to be included during export using the corresponding check boxes.

Note : Columns not marked as personal data or ePHI data will be included by default.

4. Choose the required security method from the drop-down to protect your personal data and click Next.

There are three security measures that can be applied to the personal data or ePHI data columns. These security measures are used to protect sensitive data such as Personally Identifiable Information (PII).

Security measures to protect personal data or ePHI data

1. Data masking

Data masking hides original data with 'x' to protect personal information.

2. Data Tokenization

Data tokenization replaces each distinct value in your data with a random value. Hence the output is statistically identical to the original data.

3. None

You can select None if you do not want to use any security method.

Note :
1. Exporting datasets without securing personal data will be restricted if Secure data exports option is enabled in your organization's privacy settings. Click here to know more.
2. Based on the Compliance settings of your organization, exporting datasets with ePHI data may be restricted or, exporting datasets without securing ePHI data with security measures and password protection would be restricted. Learn more

5. Provide the required values, and click Export.

Note: You can also review how personal data or ePHI data is protected in all your workspaces under the Review personal data in all workspaces section. Click here to know more.

SEE ALSO

How does data preparation work in Zoho DataPrep?

How to apply filter on datasets?

Learn more about Pivot transform

Learn more about Unpivot transform

How to apply transforms on a set of rows?

Create. Review. Publish.

Write, edit, collaborate on, and publish documents to different content management platforms.

Get Started Now

Access your files securely from anywhere

Zoho CRM Training Programs

Learn how to use the best tools for sales force automation and better customer engagement from Zoho's implementation specialists.

Redefine the way you work

with Zoho Workplace

Start your free trial

Zoho DataPrep Personalized Demo

If you'd like a personalized walk-through of our data preparation tool, please request a demo and we'll be happy to show you how to get the best out of Zoho DataPrep.

Create, share, and deliver

beautiful slides from anywhere.

Get Started Now

Zoho Sign now offers specialized one-on-one training for both administrators and developers.

BOOK A SESSION

Zoho Workerly Home
Forums
Connect With Us:

Zoho Recruit Home
Forums
Connect With Us:

Start tracking your website metrics today

Use PageSense to understand what your visitors are looking for, how they are behaving, and where they are facing problems on your website.

Ready to improve your website's performance?

Install the PageSense code snippet on your site in a matter of minutes and start collecting in-depth data about the website visitors to grow your business.

Track and measure every business goal

Set up goals in PageSense to measure every single action performed by visitors on your website like button or link clicks, form submissions, and page engagements.

Understand the customer journey on your website

Create funnels in PageSense to quickly see which pages visitors use to enter your website, where they navigate to next, and which pages they decide to leave without converting.

Visualize your visitor's behavior with color codes

Set up heatmaps in PageSense to see where users have clicked more, how far they've scrolled, and on which parts of a page they've spent the most time using color-coded patterns in reports.

Measure customer engagement with your web forms

Use form analytics in PageSense to see how people interact with different fields in your form, whether they complete the form successfully or not, and where exactly they drop out on your form.

Record real visitor interactions on your website

Use session recordings in PageSense to watch a video of all the visitor actions performed on your website including the pages they navigate, the buttons they click, the UX issues they face, and more.

Test and optimize webpages for better conversions

Run A/B or Split URL tests in PageSense to figure out which version of your web page works best for your business and results in the best conversion rate.

Give each customer a unique website experience

Use personalization in PageSense to deliver customized versions of your website for every individual customer based on their demographics, local weather, browsing history, and more.

Grow your business through customer feedback

Run polls on your website using PageSense to understand what your customers think about your products/services and what needs improvement on your site.

Send timely updates that customers love

Use web push notifications in PageSense to schedule and notify your customers about an upcoming flash sale, product releases, promotional coupons, and a lot more that can spark conversions on your website.

Advertise your business to website visitors

Use pop-ups in PageSense to instantly grab the attention of visitors by showing attractive signup offers, coupon code discounts, or email newsletters that can eventually convert them into subscribers.

Access more advanced settings in PageSense

Use PageSense's advanced features like creating mutually exclusive groups, enabling cross-domain tracking, configuring customized project JS, and more to get deeper insights about your website.

Try easy-to-use extensions

Download the PageSense extension app available for your web browser with a few clicks and start collecting all of your required website metrics in real time.

Discover your favorite integrations with PageSense

Get a deeper look at your website's data by seamlessly integrating PageSense with a host of popular third-party apps like Google Analytics, Mixpanel, Intercom, and more.

Quick Links	Workflow Automation	Data Collection
Web Forms	Enterprise	Online Data Collection Tool
Embeddable Forms	Banking	Begin Data Collection
Interactive Forms	Workplace	Data Collection App
CRM Forms	Customer Service	Accessible Forms
Digital Forms	Marketing	Forms for Small Business
HTML Forms	Education	Forms for Enterprise
Contact Forms	E-commerce	Forms for any business
Lead Generation Forms	Healthcare	Forms for Startups
Wordpress Forms	Customer onboarding	Forms for Small Business
No Code Forms	Construction	RSVP tool for holidays
Free Forms	Travel	Features for Order Forms
Prefill Forms	Non-Profit
Intake Forms	Legal	Mobile App
Form Designer	HR	Mobile Forms
Card Forms	Food	Offline Forms
Assign Forms	Photography	Mobile Forms Features
Translate Forms	Real Estate	Kiosk in Mobile Forms
Electronic Forms
Drag & drop form builder
Notification Emails for Forms	Alternatives	Security & Compliance
Holiday Forms	Google Forms alternative	GDPR
Form to PDF	Jotform alternative	HIPAA Forms
Email Forms	Formstack alternative	Encrypted Forms
	Wufoo alternative	Secure Forms
		WCAG

Quick Links

New to Zoho Writer?

Signup Now

Zoho Writer

What's new

Help Guide

Blogs

Forums

Webinars

API

Developers

Build Extensions

Create. Review. Publish.

Write, edit, collaborate on, and publish documents to different content management platforms.

Get Started Now

Zoho Campaigns Resources

Campaigns Community Learning Series

New to Zoho CRM Plus?

Deliver unforgettable customer experiences

ACCESS ZOHO CRM PLUS

New to Zoho CRM Plus?

Deliver unforgettable customer experiences

ACCESS ZOHO CRM PLUS

New to Zoho Marketing Plus?

Everything you need to run your marketing

ACCESS ZOHO MARKETING PLUS

New to Zoho Marketing Plus?

Everything you need to run your marketing

ACCESS ZOHO MARKETING PLUS

Zoho Pagesense Resources

You are currently viewing the help pages of Qntrl’s earlier version. Click here to view our latest version—Qntrl 3.0's help articles.

New to Zoho Survey?

Access Zoho Survey

Latest Feature Updates

Explore our Pricing & Plans

Zoho Survey Resources

Android Mobile Surveys

Insurvey Blogs

Insight Blogs

Tips & Tricks

New to Zoho Social?

Manage your brands on social media

Access Zoho Social

Desk Community Learning Series
Digest
Functions
Meetups
Kbase
Resources
Glossary
Desk Marketplace
MVP Corner
Word of the Day
Ask the Experts

Zoho Sheet Resources

New to Zoho Forms?

Latest Feature Updates

Explore our Pricing & Plans

Zoho Forms Resources

Secure your business

communication with Zoho Mail

Get started

Mail on the move with

Zoho Mail mobile application

Go Mobile

Stay on top of your schedule

at all times

Get started

Carry your calendar with you

Anytime, anywhere

Get started

Latest Announcements

New to Zoho Sign?

Zoho Sign Resources

Sign, Paperless!

Sign and send business documents on the go!

Get Started Now

Zoho SalesIQ Resources

New to Zoho TeamInbox?

Zoho TeamInbox Resources

Connect with us

New to Zoho ZeptoMail?

LEARN MORE

Latest Feature Updates

Zoho DataPrep Resources

New to Zoho DataPrep?

Access Zoho DataPrep

Zoho DataPrep Demo

Get a personalized demo or POC

Design. Discuss. Deliver.

Create visually engaging stories with Zoho Show.

Get Started Now

New to Zoho Workerly?

Access Zoho Workerly

New to Zoho Recruit?

Access Zoho Recruit

New to Zoho CRM?

Signup Now

Access Zoho CRM

Latest Feature Updates

New to Zoho Projects?

Access Zoho Projects

New to Zoho Sprints?

Access Zoho Sprints

New to Zoho Assist?

Access Zoho Assist

New to Bigin?

Signup Now

Access Bigin

Latest Feature Updates

Related Articles
Derive dataset
The Derive dataset transform allows you to create a branch of your dataset from the last applied transform in your dataset. The new dataset will have the most recent state of your data with an empty ruleset. To derive dataset 1. In the DataPrep ...
Filter
The filter option allows you to filter data based on custom conditions applied over one or more columns. The filter can also be used without applying them as a rule. However, you can choose to apply them as a rule by either keeping or deleting the ...
Append
In DataPrep, you can append one dataset with another and create a new dataset using the Append transform. Here's a quick video on how to use the Append transform. To append datasets 1. In the DataPrep Studio page, click the Transform menu, click ...
Mark PII and ePHI data
You can mark a column that contains PII(Personal Identifiable Information) or personal data and ePHI (Electronic Protected Health Information ) data using the Mark PII and ePHI data transform. You can also apply security methods on the columns to ...
Window functions to add formula columns
A window function is used to perform calculations across a group of rows on a dataset. A set of rows is termed a window. You can use window functions to perform summations and calculations based on a rolling window of data, relative to the current ...