Text and pattern matching in Zoho DataPrep

Text and pattern matching

Zoho DataPrep supports the following text and pattern matching types. 

Text literals: Text literals are used to match the exact specified text in your data. 

Regular expressions: Regular expressions (regex) match data against the expression you provide. For example, the regex ^\d+ matches one or more digits at the starting position of the input data.

Patterns: Pattern tokens provide a simpler and more readable alternative to regular expressions. 
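To illustrate the difference between a text literal and a regular expression, the following sketch uses Python's standard re module as a stand-in. DataPrep's own matching engine may differ in details. Here a text literal is treated as regex-escaped text, so characters such as "." lose their special meaning.

```python
import re

cell = "Build 152 replaced 1.2"

# Regular expression: "." matches any character, so "1.2" also matches "152".
as_regex = re.search(r"1.2", cell).group()

# Text literal: re.escape makes the dot match only a literal dot.
as_literal = re.search(re.escape("1.2"), cell).group()

# The regex ^\d+ from the text: one or more digits from the starting position.
leading_digits = re.match(r"^\d+", "2024-05-01").group()

print(as_regex, as_literal, leading_digits)  # 152 1.2 2024
```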

The following section describes how pattern matching works in Zoho DataPrep.

Patterns

The following tables list the tokens you can use in DataPrep patterns.

Character patterns

These tokens match one or more characters in text data.

Pattern               Description
{alpha}               All letters [A-Za-z]
{alpha_numeric}       All letters and digits [A-Za-z0-9]
{lower}               All lowercase letters [a-z]
{upper}               All uppercase letters [A-Z]
{digit}               All digits [0-9]
{number}              All integers and decimal numbers
{special_character}   All special characters [e.g. -/,*&^%#@!]
{white_space}         A white space character [' ']
{any}                 All characters
{other}               All non-ASCII characters
{'constant'}          The exact text enclosed in single or double quotes
{word}                All letters, digits and underscores, i.e. {alpha_numeric} plus underscore
{username}            Characters prefixed with @
{hashtag}             Characters prefixed with #
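The single-character tokens above can be approximated as Python regular expression character classes. The sketch below reports which tokens match a given character. The class chosen for each token is an assumption based on the table (for example, {special_character} is taken to mean printable ASCII punctuation, and {any} excludes line breaks); the multi-character tokens {number}, {username}, {hashtag} and {'constant'} are left out. tokens_matching is a hypothetical helper, not part of DataPrep.

```python
import re

# Assumed regex equivalents of the single-character tokens (not DataPrep's implementation).
CHAR_TOKENS = {
    "alpha": "[A-Za-z]",
    "alpha_numeric": "[A-Za-z0-9]",
    "lower": "[a-z]",
    "upper": "[A-Z]",
    "digit": "[0-9]",
    "special_character": r"[!-/:-@\[-`{-~]",  # printable ASCII punctuation
    "white_space": " ",
    "word": "[A-Za-z0-9_]",
    "other": r"[^\x00-\x7F]",                  # any non-ASCII character
    "any": ".",                                # any character except a newline
}


def tokens_matching(ch: str) -> list:
    """Return the names of the character tokens that match a single character."""
    return [name for name, rx in CHAR_TOKENS.items() if re.fullmatch(rx, ch)]


for ch in ["a", "Q", "7", "_", "-", " ", "é"]:
    print(repr(ch), tokens_matching(ch))
```

Under this assumption, "_" matches both {word} and {special_character}; DataPrep may classify the underscore differently.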

Position patterns

These tokens match positions within text data rather than characters.

Pattern    Description
{start}    Start of the line
{end}      End of the line
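Assuming {start} and {end} behave like the regex anchors ^ and $ on a single-line cell value, this Python sketch shows how they change which part of the text matches.

```python
import re

cell = "123 456"

unanchored = re.search(r"[0-9]{3}", cell)   # {digit}{3}: leftmost match anywhere
at_start = re.search(r"^[0-9]{3}", cell)    # {start}{digit}{3}
at_end = re.search(r"[0-9]{3}$", cell)      # {digit}{3}{end}

print(unanchored.group(), at_start.group(), at_end.group())  # 123 123 456

# With {start}, the digits must begin at the very first character.
print(re.search(r"^[0-9]{3}", "x123"))  # None
```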

Pattern construction

The following rules apply while constructing patterns for matching data.
  1. Enclose a pattern token within '{ }'.
    E.g. {alpha}

  2. Append a '*' after the token to match zero or more occurrences.
    E.g. {alpha}*

  3. Append a '+' after the token to match one or more occurrences.
    E.g. {alpha}+

  4. Append a number enclosed in '{ }' after the token to match exactly that many occurrences.
    E.g. {alpha}{3}

  5. Append a lower and an upper bound, separated by a comma and enclosed in '{ }', after the token to match a number of occurrences within that range.
    E.g. {alpha}{2,5}

  6. Enclose constant text in single quotes inside '{ }' to form a token.
    E.g. {'Bob'} 

You can place tokens one after another to build a complete pattern with multiple matching conditions. To match either of two patterns, separate them with the logical OR operator ' | '.
E.g. {'Bob'} | {'Robert'}
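The construction rules can be sketched as a small translator from pattern syntax to a Python regular expression. pattern_to_regex and the TOKENS mapping are hypothetical: the regex chosen for each token is an approximation of the tables above, whitespace is only allowed around ' | ', and DataPrep does not necessarily compile patterns this way.

```python
import re

# Hypothetical regex equivalents for DataPrep tokens (approximations only).
# Multi-character tokens are wrapped in (?:...) so a quantifier applies to the whole token.
TOKENS = {
    "alpha": "[A-Za-z]",
    "alpha_numeric": "[A-Za-z0-9]",
    "lower": "[a-z]",
    "upper": "[A-Z]",
    "digit": "[0-9]",
    "number": r"(?:[0-9]+(?:\.[0-9]+)?)",
    "special_character": r"[!-/:-@\[-`{-~]",
    "white_space": " ",
    "any": ".",
    "other": r"[^\x00-\x7F]",
    "word": "[A-Za-z0-9_]",
    "username": "(?:@[A-Za-z0-9_]+)",
    "hashtag": "(?:#[A-Za-z0-9_]+)",
    "start": "^",
    "end": "$",
}

# One lexical unit per match: a quoted constant, a named token, a {n} or {m,n}
# bound, a * or + quantifier, or the | operator (optionally surrounded by spaces).
LEXER = re.compile(
    r"\{'([^']*)'\}"          # {'constant'}
    r'|\{"([^"]*)"\}'         # {"constant"}
    r"|\{([a-z_]+)\}"         # {token}
    r"|\{(\d+)(?:,(\d+))?\}"  # {n} or {m,n}
    r"|([*+])"                # * or +
    r"|\s*(\|)\s*"            # logical OR
)


def pattern_to_regex(pattern: str) -> str:
    """Translate a DataPrep-style pattern into an equivalent Python regex string."""
    parts, pos = [], 0
    while pos < len(pattern):
        m = LEXER.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected text at position {pos}: {pattern[pos:]!r}")
        single, double, name, low, high, quant, bar = m.groups()
        if single is not None or double is not None:
            text = single if single is not None else double
            parts.append("(?:" + re.escape(text) + ")")  # constants match literally
        elif name is not None:
            if name not in TOKENS:
                raise ValueError(f"unknown token: {name}")
            parts.append(TOKENS[name])
        elif low is not None:
            parts.append("{" + low + ("," + high if high else "") + "}")
        elif quant is not None:
            parts.append(quant)
        else:
            parts.append("|")
        pos = m.end()
    return "".join(parts)


print(pattern_to_regex("{start}{alpha}+"))        # ^[A-Za-z]+
print(pattern_to_regex("{alpha}{2,5}"))           # [A-Za-z]{2,5}
print(pattern_to_regex("{'Bob'} | {'Robert'}"))   # (?:Bob)|(?:Robert)
```

Wrapping quoted constants and multi-character tokens in (?:...) keeps a following quantifier applied to the whole token, so {'ab'}+ repeats ab rather than only the final b.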

Pattern examples

1. Pattern to match the first word in a text. 

Input data

DataPrep supports pattern matching.

Pattern

{start}{alpha}+

Matched data

DataPrep
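For comparison, here is the same match expressed with Python's re module, assuming {start} maps to ^ and {alpha} to [A-Za-z]. The second input is a hypothetical value showing that nothing matches when the text does not begin with a letter.

```python
import re

# Assumed regex equivalent of {start}{alpha}+
first_word = re.compile(r"^[A-Za-z]+")

match = first_word.search("DataPrep supports pattern matching.")
print(match.group())  # DataPrep

# No match when the value does not begin with a letter.
print(first_word.search("3 little pigs"))  # None
```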


2. Pattern to match the last 3 digits of a country calling code. 

Input data

+1 340

Pattern

{digit}{3}{end}

Matched data

340
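In Python regex terms, assuming {digit} maps to [0-9] and {end} to $, the pattern reads as below. The second cell is a hypothetical value added to show that fewer than three trailing digits produce no match.

```python
import re

# Assumed regex equivalent of {digit}{3}{end}
last_three = re.compile(r"[0-9]{3}$")

results = {}
for cell in ["+1 340", "+44 20"]:
    m = last_three.search(cell)
    results[cell] = m.group() if m else None

print(results)  # {'+1 340': '340', '+44 20': None}
```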

 
3. Pattern to match a credit card number.

Input data

1234-1234-1234-1234

Pattern

{start}{digit}{4}{special_character}{digit}{4}{special_character}{digit}{4}{special_character}{digit}{4}{end}

Matched data

1234-1234-1234-1234
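A Python approximation of this pattern, assuming {digit} maps to [0-9] and {special_character} to printable ASCII punctuation, can be built by joining four groups of four digits with the special-character class. The extra inputs are hypothetical values chosen to show what the anchored pattern accepts and rejects.

```python
import re

# Assumed equivalents: {digit}{4} -> [0-9]{4}, {special_character} -> ASCII punctuation.
DIGITS4 = "[0-9]{4}"
SPECIAL = r"[!-/:-@\[-`{-~]"

# {start} + four digit groups separated by a special character + {end}
card = re.compile("^" + SPECIAL.join([DIGITS4] * 4) + "$")

for cell in ["1234-1234-1234-1234", "1234/1234/1234/1234",
             "1234 1234 1234 1234", "1234-1234-1234-12345"]:
    print(cell, bool(card.search(cell)))
```

Under this translation each {special_character} is matched independently, so mixed separators such as 1234-1234/1234.1234 also match. Space-separated numbers do not, because a space is assumed to be {white_space} rather than {special_character}.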


Note: Include the {start} and {end} tokens to match the data only when the pattern spans the entire cell value.
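The effect of this note can be seen with Python's re module, assuming {start} and {end} correspond to ^ and $. Without the anchors, a three-digit run anywhere in the cell matches; with them, the whole cell must consist of exactly three digits.

```python
import re

partial = re.compile(r"[0-9]{3}")   # {digit}{3}
whole = re.compile(r"^[0-9]{3}$")   # {start}{digit}{3}{end}

for cell in ["340", "+1 340", "3400"]:
    print(repr(cell), bool(partial.search(cell)), bool(whole.search(cell)))
```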
