Data transformation using Code Studio in Zoho DataPrep involves manipulating and preparing raw data for analysis in a structured data pipeline. By writing custom code, you can clean, transform, prepare, enrich and restructure data to suit your needs. This process allows you to integrate data from multiple sources, handle missing values, create new calculated fields, and apply complex logic to enhance data quality. For example, consider CRM data flowing in from various sources—we can consolidate and organize this data efficiently to generate more accurate insights.
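For instance, the kind of consolidation and enrichment described above could look like the following pandas sketch. The source names and columns here are purely hypothetical, shown only to illustrate merging sources, filling missing values, and adding a calculated field.

```python
import pandas as pd

# Hypothetical CRM records from two sources; column names are assumptions.
crm_a = pd.DataFrame({"email": ["a@x.com", "b@x.com"], "deal_value": [1000, None]})
crm_b = pd.DataFrame({"email": ["b@x.com", "c@x.com"], "deal_value": [2500, 400]})

# Consolidate the sources into one table.
deals = pd.concat([crm_a, crm_b], ignore_index=True)

# Handle missing values.
deals["deal_value"] = deals["deal_value"].fillna(0)

# Create a new calculated field to enrich the data.
deals["deal_tier"] = deals["deal_value"].apply(lambda v: "high" if v >= 1000 else "low")
```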
Important: This option is supported in the US, IN, and EU DCs only.
How to access Code Studio?
1. Open an existing pipeline or create a new one and import data into it. This creates a stage for the data source. Right-click the stage and choose the Add Code Stage option.
2. This creates a code stage and a default output stage. Right-click the code stage to view the following options:
Edit: Opens the Code Studio Editor to modify the code.
Delete: Removes the code stage from the pipeline. If it was deleted by mistake, you can click the undo icon to revert the change.
Add Code Input stage: Highlights the code stage; you can connect any stage as input data for your code by clicking and dragging the arrow to the code stage.
Add Code Output stage: Adds an additional output stage connected to the code stage.
3. The Code Studio Editor page will open with a sample code in it.
Code Editor: The central section of the Code Studio Editor is a dedicated Python environment for writing, editing, and executing code to clean and transform data.
Library: Located on the left side of the Code Editor, this section includes the default libraries. Apart from these, you can also upload files and libraries into Code Studio. Click the + Add icon at the top to create a new file, upload a file, or add a library. The Library acts as a file directory for all your folders and files.
Important: Ensure that you update the input stage name in fetch_stage_input_as_dataframe() and the output stage name in save_stage_output_from_dataframe() to match the exact names used in your pipeline. Replace "Stage 1" and "StoreSalesProcessed" in the sample code with your actual code stage and output stage names. The code will execute successfully only if the stage names match; otherwise, the execution will fail with an error.
4. Enter your Python script in the Code Studio Editor to transform and prepare your data. Make sure you name and save the script.
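A minimal sketch of such a script is shown below. The transformation logic here (dropping empty rows and trimming text columns) is only an illustrative assumption; the stage I/O helpers are the ones referenced in the sample code, and the stage names must match your pipeline.

```python
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Example cleaning logic: drop fully empty rows and trim text columns."""
    df = df.dropna(how="all")
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.strip()
    return df

# Inside Code Studio, the stage I/O helpers from the sample code would
# wrap this logic (replace the stage names with your own):
# df = fetch_stage_input_as_dataframe("Stage 1")
# save_stage_output_from_dataframe("StoreSalesProcessed", transform(df))
```

Keeping the transformation in a plain function like this makes it easy to test the logic on a small DataFrame before wiring it to the pipeline stages.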
5. Once completed, click Test Run at the top to validate the script. The transforms will be applied to the first 100 rows of data.
Note: The Test Run makes changes only to the sample data, not to the complete dataset.
6. You can check the results of the Test Run in the Output section and the logs in the Console section at the bottom.
Console: A text-based interface used for debugging and troubleshooting. Any errors in the script will be displayed here. You can fix them and execute a test run again.
Output: Displays a preview table for the executed script.

7. Once you achieve the desired output, you can go ahead and deploy the changes.
8. Click Deploy, then provide a version name and description. Once deployed, the final output is saved in the output stage, which is then enabled. Deploying the code applies the changes to the full dataset. You can open the output stage and view the data in the DataPrep Studio page.
Note: The output stage remains disabled until the code is deployed. You must deploy the code at least once to enable and access this stage in the DataPrep Studio page.
9. Now you can add a destination to the output stage and execute the pipeline to export the data.
Note: Code Studio follows a pay-as-you-go pricing model. The compute size refers to the memory and CPU resources allocated to execute and deploy your code, and credits are deducted based on the selected compute capacity. For example, a 1 GB (2 CPU) configuration consumes 1 credit per minute, and each deployment consumes 1 credit. Please ensure your payment settings are configured before use.
By default, the compute configuration is set to 1 CPU and 1 GB RAM.
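As a worked example of the rates above, assuming the per-minute compute charge and the per-deployment charge are additive (an assumption; check your billing details):

```python
# Rates stated above: 1 GB (2 CPU) compute consumes 1 credit per minute,
# and each deployment consumes 1 credit.
CREDITS_PER_MINUTE = 1
CREDITS_PER_DEPLOY = 1

def estimate_credits(runtime_minutes: int, deployments: int) -> int:
    """Rough credit estimate for a Code Studio run (illustrative only)."""
    return runtime_minutes * CREDITS_PER_MINUTE + deployments * CREDITS_PER_DEPLOY

# A 10-minute run with one deployment:
print(estimate_credits(10, 1))  # 11 credits
```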
Go to the ellipses icon in the pipeline builder and click the Job history option. You can view the details of the execution here. You can filter these pipelines using the Code executions filter option.
Other options in the Code Editor
1. You can click the ellipses icon in the top-right corner to access the following features:
Run last deployed version: Executes the most recently deployed version of the script without requiring any new changes to be published.
View Last Run Logs: Displays the execution status, logs, and detailed information from the most recent run to help monitor performance and troubleshoot issues.
Version Details: Provides information about the deployed versions of the script.
Download: Downloads the script as a zip file.
2. Click the ⌘ icon to use the basic keyboard shortcuts in the Code Studio Editor.
Track changes in Code Studio
The Version history option helps you keep track of all the actions performed in a pipeline. You can access Version history from the Last saved link at the top of the pipeline builder. It tracks all changes made to the Code Studio stage, including renaming the stage, updating the code (every time you click Save), and adding or modifying input and output stages. If the pipeline is currently published or live, making any changes will automatically move it back to Draft so the updates can be reviewed before publishing again.
SEE ALSO