Text and pattern matching

Zoho DataPrep supports the following text and pattern matching types. 

Text literals: Text literals match the exact specified text in your data. They are written using single quotes or double quotes ( '...' or "..." ).

Regular expressions: Regular expressions (regex) match data based on the expression provided. For example, the regex ^\d+ matches one or more digits at the starting position of the input data.
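The behavior of ^\d+ can be sketched with Python's `re` module. This only illustrates the regex itself, not DataPrep's engine, and the helper name `leading_digits` is hypothetical.

```python
import re

# ^ anchors the match at the start of the text; \d+ matches one or more digits.
LEADING_DIGITS = re.compile(r"^\d+")

def leading_digits(text):
    """Return the run of digits at the start of text, or None if there is none."""
    match = LEADING_DIGITS.search(text)
    return match.group(0) if match else None

print(leading_digits("2024 annual report"))  # 2024
print(leading_digits("report 2024"))         # None
```

The second call returns None because the digits do not sit at the starting position.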

Patterns: Patterns provide a simpler and more readable alternative to regular expressions. Using pattern tokens, you can specify the data type, a sequence of characters, or a specific data pattern you want to match. Patterns are written using backticks (`...`) and each token is enclosed within { } curly brackets.

The following section describes how pattern matching works in Zoho DataPrep.

Patterns

The following tables list the tokens supported in DataPrep and describe what each one matches.

Character patterns

These tokens match one or more characters in data of the text data type.

Pattern                  Description

`{alpha}`                Set of all letters [A-Za-z]
`{alpha_numeric}`        Set of all letters and numbers [A-Za-z0-9]
`{lower}`                Set of all lowercase letters [a-z]
`{upper}`                Set of all uppercase letters [A-Z]
`{digit}`                Set of all digits [0-9]
`{number}`               Set of all integers and decimal numbers
`{special_character}`    Set of all special characters [e.g. -/,*&^%#@! etc.]
`{white_space}`          A white space character [' ']
`{any}`                  Set of all characters
`{other}`                Set of all non-ASCII characters
`{word}`                 Set of all letters, numbers, and the underscore, i.e. {alpha_numeric} plus underscore
`{username}`             Characters prefixed with @
`{email}`                A valid email address (e.g. abc@xyz.com)
`{url}`                  A valid URL address (e.g. https://example.com/)
`{ip}`                   A valid IP address (e.g. 175.16.252.1)
`{hashtag}`              Characters prefixed with #

Position patterns

These tokens match positions within the text data rather than characters.

Pattern                  Description

`{start}`                Asserts position at the start of a line
`{end}`                  Asserts position at the end of a line
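For readers who already know regular expressions, some of the tokens can be read as familiar character classes. The mapping below is a rough Python sketch, not DataPrep's implementation. The names `TOKEN_TO_REGEX` and `matches_token` are hypothetical, only tokens whose character set is spelled out in the descriptions are included, and the regex chosen for each token is an assumption based on those descriptions.

```python
import re

# Hypothetical mapping from DataPrep tokens to approximate Python regex
# equivalents, inferred only from the token descriptions.
TOKEN_TO_REGEX = {
    "{alpha}": r"[A-Za-z]",
    "{alpha_numeric}": r"[A-Za-z0-9]",
    "{lower}": r"[a-z]",
    "{upper}": r"[A-Z]",
    "{digit}": r"[0-9]",
    "{word}": r"[A-Za-z0-9_]",
    "{white_space}": r" ",
    "{start}": r"^",  # position token: start of a line
    "{end}": r"$",    # position token: end of a line
}

def matches_token(token, char):
    """Return True if a single character belongs to a character token's set."""
    return re.fullmatch(TOKEN_TO_REGEX[token], char) is not None

print(matches_token("{alpha}", "q"))  # True
print(matches_token("{digit}", "q"))  # False
print(matches_token("{word}", "_"))   # True
```

The position tokens are listed only for reference. They match a location rather than a character, so `matches_token` is meant for the character tokens.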


Pattern construction

The following rules apply while constructing patterns for matching data.
  1. Enclose a pattern token within '{ }'.
    E.g. `{alpha}`

  2. Append a '*' after the token to match zero or more occurrences.
    E.g. `{alpha}*`

  3. Append a '+' after the token to match one or more occurrences.
    E.g. `{alpha}+`

  4. Append a '?' after the token to match zero or one occurrence.
    E.g. `{alpha}?`

  5. Append a number enclosed within '{ }' after the token to match an exact number of occurrences.
    E.g. `{alpha}{3}`

  6. Enclose a lower and upper bound constraint within '{ }' after the token to match within a specified range.
    E.g. `{alpha}{2,5}`

  7. Enclose text literal within single quotes ' ' or double quotes " " to form a token containing only a text literal.
    E.g. 'Bob', "Bob"
Text literals can also be added in a pattern. For example, `{alpha}+Bob`.  

However, if the text to match contains predefined DataPrep pattern token characters, they must be prefixed using the escape sequence character ' \ ' to exactly match the text. For example, if the text to match is "1234Rob{ert", you need to specify the pattern this way, `{digit}{4}Rob\{ert`.
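The quantifier and escaping rules above resemble regular expression conventions, so their effect can be sketched with Python's `re` module. This is an illustration only: `[A-Za-z]` stands in for `{alpha}` and `[0-9]` for `{digit}` based on the token descriptions, and the `fully_matches` helper is hypothetical.

```python
import re

ALPHA = r"[A-Za-z]"  # assumed stand-in for {alpha}
DIGIT = r"[0-9]"     # assumed stand-in for {digit}

def fully_matches(regex, text):
    """Return True if the whole text matches the regex."""
    return re.fullmatch(regex, text) is not None

# {alpha}* : zero or more letters, so the empty string matches
print(fully_matches(ALPHA + "*", ""))            # True
# {alpha}+ : one or more letters, so the empty string does not match
print(fully_matches(ALPHA + "+", ""))            # False
# {alpha}? : zero or one letter, so two letters do not match
print(fully_matches(ALPHA + "?", "ab"))          # False
# {alpha}{3} : exactly three letters
print(fully_matches(ALPHA + "{3}", "abc"))       # True
# {alpha}{2,5} : between two and five letters, so six do not match
print(fully_matches(ALPHA + "{2,5}", "abcdef"))  # False

# {digit}{4}Rob\{ert : the literal brace is escaped; re.escape adds the backslash
escaped = DIGIT + "{4}" + re.escape("Rob{ert")
print(fully_matches(escaped, "1234Rob{ert"))     # True
```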

To match either of two patterns, place the logical OR operator ' | ' between them. For example, `({alpha}+Bob) | ({alpha}+Robert)`.
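The OR condition behaves like regex alternation. The Python sketch below uses an assumed regex equivalent of the pattern above. Treating the spaces around ' | ' as separators rather than characters to match is also an assumption, and `EITHER_NAME` is a hypothetical name.

```python
import re

# Assumed regex equivalent of `({alpha}+Bob) | ({alpha}+Robert)`.
# The spaces around | are treated as separators, not as text to match.
EITHER_NAME = re.compile(r"([A-Za-z]+Bob)|([A-Za-z]+Robert)")

for value in ["MrBob", "DrRobert", "Bob", "MrAlice"]:
    # fullmatch requires the whole value to satisfy one of the two alternatives
    print(value, bool(EITHER_NAME.fullmatch(value)))
```

"Bob" on its own does not match, because `{alpha}+` requires at least one letter before the literal.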

You can add tokens one after the other to construct a full length pattern with multiple matching conditions. Let us look at some of the examples below.

Pattern examples

1. Pattern to match the first word in a text. 

Input data: DataPrep supports pattern matching.

Pattern: `{start}{alpha}+`

Matched data: DataPrep


2. Pattern to match the last 3 digits of a country calling code. 

Input data: +1 340

Pattern: `{digit}{3}{end}`

Matched data: 340
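As a rough check, examples 1 and 2 can be reproduced with Python's `re` module. The regexes are assumed equivalents of the DataPrep patterns (`^` for `{start}`, `$` for `{end}`, `[A-Za-z]` for `{alpha}`, `[0-9]` for `{digit}`), and `first_match` is a hypothetical helper.

```python
import re

def first_match(regex, text):
    """Return the first substring of text that matches regex, or None."""
    match = re.search(regex, text)
    return match.group(0) if match else None

# Example 1: `{start}{alpha}+` on the sample sentence
print(first_match(r"^[A-Za-z]+", "DataPrep supports pattern matching."))  # DataPrep

# Example 2: `{digit}{3}{end}` on the sample calling code
print(first_match(r"[0-9]{3}$", "+1 340"))  # 340
```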


3. Pattern to match a credit card number.

Input data: 1234-1234-1234-1234

Pattern: `{start}{digit}{4}{special_character}{digit}{4}{special_character}{digit}{4}{special_character}{digit}{4}{end}`

Matched data: 1234-1234-1234-1234


You can also create groups using parentheses '( )' to further simplify writing patterns. For example, the pattern to match the credit card number repeats the {digit}{4}{special_character} tokens three times. They can be grouped together and the pattern rewritten as given below.

`{start}({digit}{4}{special_character}){3}{digit}{4}{end}`

Note: Include the {start} and {end} tokens to match the data only when the pattern spans the entire cell value.
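The equivalence of the long and grouped forms, and the effect of the {start} and {end} tokens, can be sketched in Python. The regexes are assumed equivalents rather than DataPrep's own translation, and `[^A-Za-z0-9\s]` is only a guess at what `{special_character}` covers.

```python
import re

DIGITS4 = r"[0-9]{4}"        # assumed stand-in for {digit}{4}
SPECIAL = r"[^A-Za-z0-9\s]"  # assumed stand-in for {special_character}

# Long form: the digit-and-separator pair is written out three times
LONG_FORM = re.compile("^" + (DIGITS4 + SPECIAL) * 3 + DIGITS4 + "$")
# Grouped form: the pair is wrapped in ( ) and repeated with {3}
GROUPED = re.compile("^(" + DIGITS4 + SPECIAL + "){3}" + DIGITS4 + "$")

samples = ["1234-1234-1234-1234", "1234-1234-1234", "card 1234-1234-1234-1234"]
for value in samples:
    print(value, bool(LONG_FORM.search(value)), bool(GROUPED.search(value)))
```

Both forms accept only the first sample. The last one is rejected because the anchors require the pattern to cover the entire value.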

