1. Choose the Cloud storage category from the left pane and click Google Drive. You can also search Google Drive in the search box.
Note: If you have already added a Google Drive connection earlier, click the Saved connections category in the left pane and proceed to import. To learn more about Saved connections, click here.
2. If you have already added a connection, click the existing connection and start importing data.
Note: Click the Add new link to add a new Google Drive account. You can create as many Google Drive connections as required.
3. Authenticate your Google Drive account. You will need to authorize DataPrep to access your files when you do this for the first time.
4. To incrementally import your data, click the Advanced selection link.
Advanced selection lets you perform dynamic file selection based on a regex pattern. This is useful for fetching new or incremental data from your Google Drive folder: on each sync, any newly added or modified file that matches the file pattern since the previous sync is fetched from your Google Drive folder.
The details required are:
Choose folder: Select the folder you want to import data from.
Include subfolders: Select the Include subfolders checkbox if you want subfolders to be included while searching for a file. Click here to learn about the limitations of this option.
Note: The file pattern match is a simple regex-type match. For example, to fetch files with names such as Sales_2022.csv, Sales_2023.csv, and Sales_2024.csv, you can input the pattern Sales_.*
Similarly, to fetch files such as PublicData1.csv, PublicData2.csv, and PublicData3.csv, use Public.*
If you want to import a single file, then specify the pattern using the exact file name.
Eg: leads_jan_2022.*
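As an illustration, the patterns above behave like standard regular expressions matched from the start of the file name. A quick sketch using Python's re module (the file names here are made up for the example):

```python
import re

filenames = [
    "Sales_2022.csv", "Sales_2023.csv",
    "PublicData1.csv", "PublicData2.csv",
    "leads_jan_2022.csv", "notes.txt",
]

# "Sales_.*" matches any file name beginning with "Sales_"
sales = [f for f in filenames if re.match(r"Sales_.*", f)]

# "Public.*" matches the PublicData files
public = [f for f in filenames if re.match(r"Public.*", f)]

# An exact-name pattern such as "leads_jan_2022.*" selects a single file
single = [f for f in filenames if re.match(r"leads_jan_2022.*", f)]

print(sales)   # ['Sales_2022.csv', 'Sales_2023.csv']
print(single)  # ['leads_jan_2022.csv']
```

Note that re.match anchors at the beginning of the name, which is why Sales_.* does not match notes.txt.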
Parse file as: Choose the required extension to parse the file. If your file's format is not a commonly used one, you can use this option to parse the file as one of the following formats so the data is imported in a readable form. The available formats are CSV, TSV, JSON, XML, and TXT.
Fill in the required details and click the Import button.
Note: We support only CSV, TSV, JSON, XML, and TXT file formats for incremental fetch from cloud storage.
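As a rough illustration of what parsing a file as CSV versus TSV involves, here is a sketch using Python's standard csv module. The sample text and the format decision are hypothetical and not DataPrep's internal logic:

```python
import csv

# A tab-separated sample whose file extension might not reveal its format
sample = "id\tname\tamount\n1\tAlice\t100\n2\tBob\t250\n"

# Sniff the delimiter to decide which format to parse the file as
dialect = csv.Sniffer().sniff(sample)
fmt = "TSV" if dialect.delimiter == "\t" else "CSV"
print(fmt)  # TSV

# Parse the rows using the detected dialect
rows = list(csv.reader(sample.splitlines(), dialect))
print(rows[0])  # ['id', 'name', 'amount']
```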
5. Once you have completed importing data, your dataset will open and you can start preparing your data right away.
6. When your dataset is ready, export it to the required destination before the next reload.
Schedule your dataset based on your pipeline's complexity, allowing enough time to import, process, and export the data.
7. When the dataset is scheduled for import, the import time (or the last scheduled time) is recorded. Initially, only the oldest file will be fetched. During every successful sync, the last sync time is updated, and the file created or modified after that time is imported. If there is no new or modified file in Google Drive, no data will be imported; the sync time is still updated, however, since a sync was attempted. In the next cycle, the file created or modified after this sync time will be fetched.
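The sync behaviour described above can be modeled roughly as follows. This is a simplified sketch, not DataPrep's implementation; the file names and times are invented, and in this model the sync time advances to the fetched file's modified time:

```python
from datetime import datetime

# Hypothetical folder listing: (file name, modified time in UTC)
files = [
    ("leads1.csv", datetime(2024, 1, 29, 13, 2)),
    ("leads2.csv", datetime(2024, 1, 29, 13, 10)),
    ("leads3.csv", datetime(2024, 1, 29, 13, 45)),
]

def next_sync(files, last_sync):
    """Pick the next file: oldest on the first sync, otherwise the
    oldest file modified after last_sync."""
    candidates = files if last_sync is None else [f for f in files if f[1] > last_sync]
    if not candidates:
        # Nothing new; the real service still records the sync attempt
        return None, last_sync
    name, mtime = min(candidates, key=lambda f: f[1])
    return (name, mtime), mtime

picked, t = next_sync(files, None)
print(picked[0])  # leads1.csv  (first sync fetches only the oldest file)
picked, t = next_sync(files, t)
print(picked[0])  # leads2.csv  (next cycle fetches the next newer file)
```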
You can verify the number of records fetched from Google Drive in the Operations history panel on the Sync Status page.
Click the Operations history icon near each sync status to view and track the changes made to the dataset, its previous states, and the import and export schedules in a timeline.
You can also verify the processed data for every sync in the Processing history panel. On clicking the Processing history option, the side pane will open up listing all the processed data IDs available for the dataset, along with the generated time.
You can also download and verify the processed data by clicking on the icon that appears when you hover over a record.
8. To fetch the next file after the last sync time manually, you can use the Reload data from source option.
From the DataPrep studio page, select the Import menu in the top bar and click Reload data from source. Using this option, you can refresh your dataset with the latest file by reloading data from your data source.
During a manual reload, only the newly added or modified file after the last sync time is imported to the dataset.
Note: All newly added or modified files are incrementally fetched based on Greenwich Mean Time (GMT)/UTC.
1) Using the Include subfolders option, you can only fetch files from a single subfolder, or from all the subfolders of the entire My Drive or Shared with me folders. You cannot fetch files from all the subfolders within a specific folder.
To fetch files from a single subfolder: Enter the exact path of the subfolder in the folder path, for example, 2024/jan/. Fill in the required details. The files that match the mentioned file pattern will be fetched from the specified folder.
To fetch files from all subfolders of Google Drive: Leave the folder path empty and select the Include subfolders checkbox. Fill in the required details. The files that match the mentioned file pattern will be fetched from all the sub-folders of the entire My Drive or Shared with me folders.
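A rough model of the two selection modes, with a hypothetical folder listing (this is illustrative only, not DataPrep's actual code):

```python
import re

# Hypothetical Drive listing as (subfolder path, file name) pairs;
# "" means the file sits at the top level of My Drive
listing = [
    ("", "leads_top.csv"),
    ("2024/jan/", "leads_a.csv"),
    ("2024/feb/", "leads_b.csv"),
    ("reports/", "summary.csv"),
]

def select(listing, folder, pattern, include_subfolders):
    out = []
    for path, name in listing:
        if folder and path != folder:
            continue  # an exact subfolder path was requested
        if not folder and not include_subfolders and path:
            continue  # empty path without subfolders: top level only
        if re.match(pattern, name):
            out.append(name)
    return out

# Single subfolder: give the exact path, e.g. 2024/jan/
print(select(listing, "2024/jan/", r"leads.*", False))  # ['leads_a.csv']

# All subfolders: leave the path empty and tick Include subfolders
print(select(listing, "", r"leads.*", True))  # ['leads_top.csv', 'leads_a.csv', 'leads_b.csv']
```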
For instance, suppose a folder in Google Drive has 10 files in total and you want to skip files 3 to 5. Follow the steps below to skip those particular files during the incremental fetch.
1) Import the file using a generic file pattern. Eg leads.*
2) Initially, only the oldest file will be fetched. i.e. leads1_2024-01-29_13-02-04.csv
During every successful sync, the last sync time is updated with the new value, and the file created/modified after the sync time is imported.
3) After importing data, click the Export now option from the Export menu on the DataPrep Studio page and export it to the required destination before reloading, or you'll lose your data.
4) From the DataPrep studio page, select the Import menu in the top bar and click Reload data from source.
5) The next file i.e. leads2_2024-01-29_13-10-20.csv will be fetched incrementally. Again, export it to the required destination before reloading, or you'll lose your data.
6) Click the ruleset icon in the top-right corner of the DataPrep Studio page to view the Ruleset pane.
7) In the Ruleset pane, click the data source configuration icon and open the Data source details page.
8) In the data source details page, enter the specific file pattern from where you want to import next in the File pattern field. Click Update. Eg leads6_2024-02-21_12-32-51.csv.*
9) Go to the DataPrep studio page, select the Import menu in the top bar and click Reload data from source.
The files leads3, leads4, and leads5 will be skipped, and the file leads6 will be fetched. Its modified time will be tracked as the last sync time.
Export this file to the required destination.
10) Now again, navigate to the data source details page and change the file pattern back to the generic form. Eg. leads.*
11) Schedule the data import and export to set a pipeline.
12) To schedule the import,
a. Click the Schedule import link.
b. In the Schedule config section, select a Repeat method (Every 'N' hours, Every day, Weekly once, Monthly once). Choose a time to repeat (i.e. set a frequency) using the Perform every option.
Select the Time zone for the schedule. By default, your local time zone will be selected.
c. Select the checkbox if you want to Import new columns found in the source data.
d. Click Save to schedule import for your dataset.
13) After scheduling the import, also schedule an export destination for your dataset; otherwise, the import will keep running, but without an export the data will be lost.
14) After scheduling, new files with the same pattern will be fetched incrementally based on the last synced time. Eg. leads7, leads8, etc. will be imported incrementally and exported at regular intervals.
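The skip sequence above can be sketched as switching the file pattern between reloads. This is an illustrative model only; the file names, times, and the reload_next helper are invented:

```python
import re
from datetime import datetime

# Hypothetical files with modified times, oldest first
files = [
    ("leads1.csv", datetime(2024, 1, 29, 13, 2)),
    ("leads2.csv", datetime(2024, 1, 29, 13, 10)),
    ("leads3.csv", datetime(2024, 1, 29, 13, 20)),
    ("leads4.csv", datetime(2024, 1, 29, 13, 30)),
    ("leads5.csv", datetime(2024, 1, 29, 13, 40)),
    ("leads6.csv", datetime(2024, 2, 21, 12, 32)),
    ("leads7.csv", datetime(2024, 2, 21, 13, 0)),
]

def reload_next(files, pattern, last_sync):
    """Oldest file matching the pattern and modified after last_sync."""
    hits = [f for f in files
            if re.match(pattern, f[0]) and (last_sync is None or f[1] > last_sync)]
    if not hits:
        return None, last_sync
    name, mtime = min(hits, key=lambda f: f[1])
    return name, mtime

name, t = reload_next(files, r"leads.*", None)   # leads1.csv
name, t = reload_next(files, r"leads.*", t)      # leads2.csv
name, t = reload_next(files, r"leads6.*", t)     # leads6.csv: leads3-5 are skipped
name, t = reload_next(files, r"leads.*", t)      # back to the generic pattern
print(name)  # leads7.csv
```

Because the last sync time has moved past the modified times of leads3 to leads5, switching back to the generic pattern picks up leads7 next rather than the skipped files.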
Follow the steps below to start importing from a file in the middle during incremental fetch.
1) Import the file using a specific file pattern.
Eg leads6_2024-02-21_12-32-51.csv.*
2) Initially, only the specified file will be fetched, i.e. leads6_2024-02-21_12-32-51.csv
During every successful sync, the last sync time is updated with the new value, and the file created/modified after the sync time is imported.
3) After importing the data, click the Export now option from the Export menu on the DataPrep Studio page and export it to the required destination before reloading, or you'll lose your data.
4) Click the ruleset icon in the top-right corner of the DataPrep Studio page to view the Ruleset pane.
5) In the Ruleset pane, click the data source configuration icon and open the Data source details page.
6) In the Data source details page, enter the generic file pattern you want to use for subsequent incremental imports in the File pattern field. Click Update. Eg leads.*
7) Schedule the data import and export to set up a pipeline.
To schedule the import,
a) Click the Schedule import link.
b) In the Schedule config section, select a Repeat method (Every 'N' hours, Every day, Weekly once, Monthly once). Choose a time to repeat (i.e. set a frequency) using the Perform every option.
Select the Time zone for the schedule. By default, your local time zone will be selected.
c) Select the checkbox if you want to import new columns found in the source data.
d) Click Save to schedule import for your dataset.
8) After scheduling the import, also schedule an export destination for your dataset; otherwise, the import will keep running, but without an export the data will be lost.
9) After scheduling, new files with the same pattern will be fetched incrementally based on the last synced time. Eg. leads7, leads8, etc. All the new files will be imported incrementally and exported at regular intervals.
SEE ALSO
How to import data from cloud databases?
How to import data from saved data connections?
What other cloud storage options are available in Zoho DataPrep?
How to import data from Google Drive?