Language detection in Zoho DataPrep

You can detect the language of the text in a selected text column using the Language detection operation, powered by DataPrep's own machine learning engine. For example, if the text value in the selected column is "Hello, World!", the Language detection transform returns 'English'.

 

The transform supports 178 languages. The supported languages are:


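| S.No | Language | Language Code |
|---|---|---|
| 1 | Afrikaans | af |
| 2 | Tosk Albanian | als |
| 3 | Amharic | am |
| 4 | Aragonese | an |
| 5 | Arabic | ar |
| 6 | Egyptian Arabic | arz |
| 7 | Assamese | as |
| 8 | Asturian | ast |
| 9 | Avaric | av |
| 10 | Azerbaijani | az |
| 11 | South Azerbaijani | azb |
| 12 | Bashkir | ba |
| 13 | Bavarian | bar |
| 14 | Central Bikol | bcl |
| 15 | Belarusian | be |
| 16 | Bulgarian | bg |
| 17 | Bihari languages | bh |
| 18 | Bangla | bn |
| 19 | Tibetan | bo |
| 20 | Bishnupriya | bpy |
| 21 | Breton | br |
| 22 | Bosnian | bs |
| 23 | Russia Buriat | bxr |
| 24 | Catalan | ca |
| 25 | Chavacano | cbk |
| 26 | Chechen | ce |
| 27 | Cebuano | ceb |
| 28 | Central Kurdish | ckb |
| 29 | Corsican | co |
| 30 | Czech | cs |
| 31 | Chuvash | cv |
| 32 | Welsh | cy |
| 33 | Danish | da |
| 34 | German | de |
| 35 | Dimli (individual language) | diq |
| 36 | Lower Sorbian | dsb |
| 37 | Dotyali | dty |
| 38 | Divehi | dv |
| 39 | Greek | el |
| 40 | Unknown language [eml] | eml |
| 41 | English | en |
| 42 | Esperanto | eo |
| 43 | Spanish | es |
| 44 | Estonian | et |
| 45 | Basque | eu |
| 46 | Persian | fa |
| 47 | Finnish | fi |
| 48 | French | fr |
| 49 | Northern Frisian | frr |
| 50 | Western Frisian | fy |
| 51 | Irish | ga |
| 52 | Scottish Gaelic | gd |
| 53 | Galician | gl |
| 54 | Guarani | gn |
| 55 | Goan Konkani | gom |
| 56 | Gujarati | gu |
| 57 | Manx | gv |
| 58 | Hebrew | he |
| 59 | Hindi | hi |
| 60 | Fiji Hindi | hif |
| 61 | Croatian | hr |
| 62 | Upper Sorbian | hsb |
| 63 | Haitian Creole | ht |
| 64 | Hungarian | hu |
| 65 | Armenian | hy |
| 66 | Interlingua | ia |
| 67 | Indonesian | id |
| 68 | Interlingue | ie |
| 69 | Iloko | ilo |
| 70 | Ido | io |
| 71 | Icelandic | is |
| 72 | Italian | it |
| 73 | Japanese | ja |
| 74 | Lojban | jbo |
| 75 | Javanese | jv |
| 76 | Georgian | ka |
| 77 | Kazakh | kk |
| 78 | Khmer | km |
| 79 | Kannada | kn |
| 80 | Korean | ko |
| 81 | Karachay-Balkar | krc |
| 82 | Kurdish | ku |
| 83 | Komi | kv |
| 84 | Cornish | kw |
| 85 | Kyrgyz | ky |
| 86 | Latin | la |
| 87 | Luxembourgish | lb |
| 88 | Lezghian | lez |
| 89 | Limburgish | li |
| 90 | Lombard | lmo |
| 91 | Lao | lo |
| 92 | Northern Luri | lrc |
| 93 | Lithuanian | lt |
| 94 | Latvian | lv |
| 95 | Maithili | mai |
| 96 | Malagasy | mg |
| 97 | Eastern Mari | mhr |
| 98 | Minangkabau | min |
| 99 | Macedonian | mk |
| 100 | Malayalam | ml |
| 101 | Mongolian | mn |
| 102 | Marathi | mr |
| 103 | Western Mari | mrj |
| 104 | Malay | ms |
| 105 | Maltese | mt |
| 106 | Mirandese | mwl |
| 107 | Burmese | my |
| 108 | Erzya | myv |
| 109 | Mazanderani | mzn |
| 110 | Nahuatl languages | nah |
| 111 | Neapolitan | nap |
| 112 | Low German | nds |
| 113 | Nepali | ne |
| 114 | Newari | new |
| 115 | Dutch | nl |
| 116 | Norwegian Nynorsk | nn |
| 117 | Norwegian Bokmål | no |
| 118 | Occitan | oc |
| 119 | Odia | or |
| 120 | Ossetic | os |
| 121 | Punjabi | pa |
| 122 | Pampanga | pam |
| 123 | Palatine German | pfl |
| 124 | Polish | pl |
| 125 | Piedmontese | pms |
| 126 | Western Panjabi | pnb |
| 127 | Pashto | ps |
| 128 | Portuguese | pt |
| 129 | Quechua | qu |
| 130 | Romansh | rm |
| 131 | Romanian | ro |
| 132 | Russian | ru |
| 133 | Rusyn | rue |
| 134 | Sanskrit | sa |
| 135 | Sakha | sah |
| 136 | Sardinian | sc |
| 137 | Sicilian | scn |
| 138 | Scots | sco |
| 139 | Sindhi | sd |
| 140 | Serbian (Latin) | sh |
| 141 | Sinhala | si |
| 142 | Slovak | sk |
| 143 | Slovenian | sl |
| 144 | Somali | so |
| 145 | Albanian | sq |
| 146 | Serbian | sr |
| 147 | Sundanese | su |
| 148 | Swedish | sv |
| 149 | Swahili | sw |
| 150 | Tamil | ta |
| 151 | Telugu | te |
| 152 | Tajik | tg |
| 153 | Thai | th |
| 154 | Turkmen | tk |
| 155 | Filipino | tl |
| 156 | Turkish | tr |
| 157 | Tatar | tt |
| 158 | Tuvinian | tyv |
| 159 | Uyghur | ug |
| 160 | Ukrainian | uk |
| 161 | Urdu | ur |
| 162 | Uzbek | uz |
| 163 | Venetian | vec |
| 164 | Veps | vep |
| 165 | Vietnamese | vi |
| 166 | West Flemish | vls |
| 167 | Volapük | vo |
| 168 | Walloon | wa |
| 169 | Waray | war |
| 170 | Wu Chinese | wuu |
| 171 | Kalmyk | xal |
| 172 | Mingrelian | xmf |
| 173 | Yiddish | yi |
| 174 | Yoruba | yo |
| 175 | Cantonese | yue |
| 176 | Chinese | zh |
| 177 | Chinese (Simplified) | zh-CN |
| 178 | Chinese (Traditional) | zh-TW |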
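
If you work with the transform's output outside DataPrep, the list above can be treated as a lookup from language code to language name. The following Python sketch is only an illustration: it copies a handful of entries from the list, and the names `LANGUAGE_NAMES` and `language_name` are hypothetical helpers, not part of DataPrep.

```python
# A small excerpt of the language list above; the full list has 178 entries.
LANGUAGE_NAMES = {
    "af": "Afrikaans",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "ta": "Tamil",
}


def language_name(code: str) -> str:
    """Return the language name for a code.

    Falls back to the code itself when it is not in the excerpt
    (an assumption made for this sketch).
    """
    return LANGUAGE_NAMES.get(code, code)


print(language_name("en"))  # English
print(language_name("ta"))  # Tamil
```

Extending the dictionary with the remaining rows gives a complete mapping in either direction.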

To detect languages in a column

1. Right-click the column and select the Language detection transform from the context menu.



2. Enter a name for the resulting column in the New column name section.


3. Select the type of output required. Language name returns the name of the detected language, and Language code returns its code.


4. For example, for English text, selecting Language name gives 'English' as the output, and selecting Language code gives 'en'.


5. DataPrep shows a live preview of the column during the transform. You can click the Preview button at the bottom of the side panel to preview the output column.


6. You can apply this transform to only one column at a time. Click the Apply button to apply the transform.


Note: The Language detection transform gives accurate results when the text is 50 characters or longer.
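
Because of the length guideline in the note above, it can help to check how many values in a column are shorter than 50 characters before relying on the detected language. The Python sketch below shows one way to do that check outside DataPrep. The function name `is_long_enough`, the sample rows, and the choice to ignore leading and trailing whitespace are all assumptions made for illustration.

```python
MIN_RELIABLE_LENGTH = 50  # threshold taken from the note above


def is_long_enough(text: str, min_length: int = MIN_RELIABLE_LENGTH) -> bool:
    """Return True when the text meets the length guideline.

    Surrounding whitespace is ignored; that is an assumption of this sketch.
    """
    return len(text.strip()) >= min_length


# Made-up sample values for illustration.
rows = [
    "Hello, World!",
    "The quick brown fox jumps over the lazy dog near the river bank.",
]

# Values that fall below the guideline and may be detected less reliably.
short_rows = [text for text in rows if not is_long_enough(text)]
print(short_rows)  # ['Hello, World!']
```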

To apply filters

To filter rows while applying this transform, use the Filters tab.

1. Click the Filters tab.

2. Click the icon and add the required columns in the Filters section. You can also reorder the filters by dragging and dropping them.



3. For every column added, you can select one of the following options from the drop-down:
  1. Actual: Filters rows based on the actual values in the column.
  2. Data quality: Filters rows based on the quality of data in the column.
  3. Patterns: Filters rows based on the data patterns in the selected column.
  4. Seasonal: Filters rows based on seasonal parameters such as quarter, month, and week.
  5. Outliers: Filters rows based on the outliers present in the data of the selected column.
Note: The filter options displayed depend on the data type of the column added for the filter.

4. When you add more than one filter to the Filters section, a logical operator, AND or OR, appears next to the filters. Click the operator to toggle it between AND and OR.
  1. Using the logical operators, you can combine conditions and control the order in which they are evaluated. The final expression is displayed in the Criteria expression box. Click Edit to alter the default expression, using logical operators and parentheses to specify which condition should be evaluated first. Click Save after making the required changes.
  2. For example, in the expression ((1 OR 2) AND (3 OR 4)), the condition (1 OR 2) is evaluated first and the condition (3 OR 4) next. Because the two are joined by the AND operator, the filter is applied only when both conditions are true.
5. In the next section, you can drill down further and choose specific values based on the filter option selected for each filter.



For example, when the Data quality option is selected for the All columns filter in the Filters section, further options to filter specific values are displayed in the All columns (Data quality) section.

6. You can choose to include or exclude the selected items in the last section.

7. To remove all the filters, click the Clear button.

8. A live preview of the filter transform is shown as you make changes. 

9. Click the  Apply  button to apply the transform along with the filters.
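
The effect of parentheses in the criteria expression from step 4 can be sketched in Python. Here the four numbered filter conditions are represented as booleans for a single row. The function names are hypothetical, and the ungrouped version follows Python's own precedence rule (`and` binds tighter than `or`), which is not necessarily how DataPrep evaluates an expression without parentheses.

```python
def grouped(c1: bool, c2: bool, c3: bool, c4: bool) -> bool:
    """Evaluate ((1 OR 2) AND (3 OR 4)) for one row."""
    return (c1 or c2) and (c3 or c4)


def ungrouped(c1: bool, c2: bool, c3: bool, c4: bool) -> bool:
    """Evaluate 1 OR 2 AND 3 OR 4 using Python's precedence.

    Python reads this as c1 or (c2 and c3) or c4.
    """
    return c1 or c2 and c3 or c4


# A row that satisfies condition 1 only.
row = (True, False, False, False)

print(grouped(*row))    # False: (3 OR 4) is not satisfied
print(ungrouped(*row))  # True: condition 1 alone is enough
```

The same row passes one expression and fails the other, which is why explicit parentheses are worth adding whenever AND and OR are mixed.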

To sort data

Under the Sort tab, you can sort data in ascending or descending order based on any column. Choose the column in the Sort by column drop-down, and then choose the sort order.

You can use this functionality only along with the transform and not as a standalone function. If you only want to sort data, use the Sort transform instead.
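
As a rough illustration of what sorting by a column in ascending or descending order means, the Python sketch below sorts a few made-up rows by a detected-language column. The rows, the column names, and the `sort_rows` helper are hypothetical and do not reflect how DataPrep implements the Sort tab.

```python
# Made-up rows: original text plus the detected language name.
rows = [
    {"text": "Bonjour le monde", "language": "French"},
    {"text": "Hello, World!", "language": "English"},
    {"text": "Hallo Welt", "language": "German"},
]


def sort_rows(rows, column, descending=False):
    """Return a new list of rows ordered by the given column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


ascending = sort_rows(rows, "language")
descending = sort_rows(rows, "language", descending=True)

print([row["language"] for row in ascending])   # ['English', 'French', 'German']
print([row["language"] for row in descending])  # ['German', 'French', 'English']
```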



SEE ALSO
Learn about keyword extraction
Learn about sentiment analysis