How to import data from Dataiku?

Import data from Dataiku [Beta]

Zoho DataPrep allows you to bring in data from Dataiku, an end-to-end platform for data science, machine learning, and artificial intelligence (AI) that helps organizations design, deploy, and manage data and AI projects. With this integration, you can prepare your Dataiku data, streamline data workflows, and enable informed decision-making through valuable insights.

Important: Only full data import is supported; incremental data fetch is not yet available for Dataiku.

To import data from Dataiku

1. Open an existing pipeline or create a pipeline from the Home page, Pipelines tab, or Workspaces tab, and click the Add data option.

Info: You can also click the Import data icon at the top of the pipeline builder to bring data from multiple sources into the pipeline.


2. Choose the Other Apps category from the left pane and click Dataiku. You can also search for Dataiku in the search box.



Note: If you have already added a connection, click the required connection and proceed to import. You can also find your saved connections under the Saved connections category in the left pane. To learn more about Saved connections, click here.

3. Select an account from the saved connections or connect a new account using the Add new option.



4. Provide a unique Connection name and the Domain name of your Dataiku project in the respective fields.
Note: You can get the domain name from the Home page URL of your Dataiku organization.



5. Enter the secret key from your Dataiku organization. 

To get the secret key 

  1. Click the Profile icon in the top-right corner and click the Profile and settings icon.
  2. Go to the API keys tab and click the New API key button.
  3. Provide a Label and Description for the secret key.
  4. Once your API key is created, copy and paste it into the DataPrep import screen.
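If you want to sanity-check the key before pasting it into DataPrep, Dataiku also exposes a public REST API that accepts the API key as the HTTP Basic auth username with an empty password. The sketch below only builds the request; the domain, key value, and the `/public/api/projects/` path are illustrative assumptions about Dataiku's API, not DataPrep features:

```python
import base64
import urllib.request

def build_auth_header(api_key: str) -> str:
    # Encode "key:" (empty password) for HTTP Basic authentication.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return f"Basic {token}"

def list_projects_request(domain: str, api_key: str) -> urllib.request.Request:
    # Build a GET request against the public project-listing endpoint.
    url = f"https://{domain}/public/api/projects/"
    return urllib.request.Request(url, headers={"Authorization": build_auth_header(api_key)})

# Hypothetical values -- substitute your own domain and secret key.
req = list_projects_request("your-org.dataiku.cloud", "dku-secret-key")
# urllib.request.urlopen(req) would return a JSON list of projects if the key is valid.
```

If the key is wrong, the server typically answers with HTTP 401, which is a quick way to catch a mistyped key before saving the connection in DataPrep.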


6. Choose the required Project; the corresponding datasets and columns will be displayed. Select the Datasets and Columns that you would like to import.
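For reference, the project and dataset names you select in this step map onto per-project endpoints in Dataiku's public REST API. A minimal sketch of that URL layout (the paths and example names are assumptions about Dataiku's API, not part of DataPrep):

```python
BASE_PATH = "/public/api/projects"

def project_datasets_url(domain: str, project_key: str) -> str:
    # Endpoint that lists all datasets of one project.
    return f"https://{domain}{BASE_PATH}/{project_key}/datasets/"

def dataset_url(domain: str, project_key: str, dataset_name: str) -> str:
    # Endpoint for a single dataset; its schema lists the importable columns.
    return f"https://{domain}{BASE_PATH}/{project_key}/datasets/{dataset_name}"

# Hypothetical project and dataset names.
print(project_datasets_url("your-org.dataiku.cloud", "SALES"))
print(dataset_url("your-org.dataiku.cloud", "SALES", "orders_2024"))
```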



7. Once you have completed importing data, the pipeline builder page opens and you can start applying transforms. You can also right-click a stage and choose the Prepare data option to prepare your data on the DataPrep Studio page. Click here to learn more about transforms.



Note: When you import more than one dataset from your Dataiku organization, each dataset is created as a separate stage in DataPrep, as shown above.

8. Once you are done creating your data flow and applying the necessary transforms to your stages, you can right-click a stage and add a destination to complete your data flow.

Note: After adding a destination to the pipeline, try executing your pipeline with a manual run first. Once you confirm the manual run works, you can set up a schedule to automate the pipeline. Learn about the different types of runs here.

Schedule

You can schedule your pipeline using the Schedule option. 

Schedule configuration

1. Select the Schedule option in the pipeline builder.

2. Select a Repeat method (hourly, daily, weekly, monthly) and set the frequency using the Perform every dropdown. The options in the Perform every dropdown change with the Repeat method. Click here to know more.




3. Select the time zone (GMT) in which you want to import new data found in the source. By default, your local time zone is selected.

4. Pause schedule after: This option allows you to pause the schedule after n consecutive failures.
Info: The range can be between 2 and 100. The default value is 2.

Import configuration

You can configure how data is imported from your Dataiku organization using the Import configuration option.
Note: The import configuration must be set up for every source in the pipeline; the schedule cannot be saved without it.

5. Select the Click here link to set the import configuration. 

6. Select the required option from the How to import data from source? dropdown. You can choose to import all data or not to import data.

Import all data  

This option will import all available data from your Dataiku project.

 

 


Do not import data   

The data is imported only once. On subsequent runs, the transforms are applied to the previously imported data and the results are exported.

 




7. Click Save to schedule import for your data.

Note: If you have already configured a schedule for Dataiku, the earlier configuration under the Import configuration section is loaded when you click the Edit schedule option to set a new schedule.

Schedule settings

Stop export if data has invalid values: Enabling this option stops the export when the prepared data still has invalid values.



Order exports

You can use this option when you have configured multiple destinations and want to determine the order in which data is exported to them.

If not enabled, export will run in the default order.
Note: This option is visible only if you have added more than one destination to your pipeline.

To rearrange the order of your export destinations

1. Click the Order exports toggle.

2. Drag and drop to change the order of the destinations, then click Save.



Note: Click the Edit order link if you want to rearrange the order again.

8. After you complete the schedule configuration, click Save to activate the schedule. This will start the pipeline.



Each scheduled run is saved as a job. When a pipeline is scheduled, the data will be fetched from your data sources, prepared using the series of transforms you have applied in each of the stages, and then data will be exported to your destination at regular intervals. This complete process is captured in the job history.


9. To go to the jobs list of a particular pipeline, click the ellipsis icon in the pipeline builder, then click the Job history menu to check the job status of your pipeline.

10. Click the required job ID on the Job history page to navigate to the Job summary of a particular job.

The Job summary shows the history of a job executed in a pipeline flow. Click here to know more.

11. When the scheduled run completes, the data prepared in your pipeline is exported to the configured destinations.

Info: You can also view the status of your schedules later on the Jobs page.

Note: If you make any further changes to the pipeline, the changes are saved as a draft version. Choose the Draft option and mark your pipeline as ready for the changes to reflect in the schedule.




After you set your schedule, you can Pause schedule or Resume schedule, Edit schedule, or Remove schedule using the Schedule Active option in the pipeline builder.

When you edit and save a schedule, the next job covers the period from the last scheduled run time to the next scheduled interval.


Important: Adding Dataiku as a destination and pushing data to Dataiku from DataPrep is not supported yet.

SEE ALSO