How to extract an email address from a string (Deluge)
The task was to extract an email address from a string variable or any other text-based field.
In JavaScript this is usually done with a regular expression search.
Surprisingly, Deluge lets you replace a substring using a regular expression, but does not let you search for a substring using one.
I finally came up with the solution below. It works.
string GetEmailFromText (string Txt)
{
    /* Extract the first email address from the text Txt */
    original_text = input.Txt;
    EMAIL_REGEX = ("[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?");
    /* Copy 1: the email address is replaced with a marker, which gives its position */
    masked_text1 = original_text.replaceFirst(EMAIL_REGEX,"##EML##");
    /* Copy 2: the email address is removed, which gives its length */
    masked_text2 = original_text.replaceFirst(EMAIL_REGEX,"");
    email_address_position = masked_text1.indexOf("##EML##");
    email_address_length = (original_text.length() - masked_text2.length());
    /* If nothing was removed, the text contains no email address */
    if(email_address_length == 0)
    {
        return "";
    }
    email_address = original_text.subString(email_address_position,(email_address_position + email_address_length));
    return email_address;
}
The idea is:
- The function uses a regular expression to match the email address. I did not write the regex myself, I just copied it from JavaScript discussions.
- I create a first copy of the original string in which the email address is replaced with an unlikely set of characters (in my case, ##EML##). Then I search for the position of ##EML## in that copy. It is the same as the position of the email address in the original text.
- The second task is to find the length of the email address. I create a second copy of the original string in which the email address is deleted entirely (that is, replaced with an empty string). This new string is shorter than the original by exactly the length of the email address, so the length is the difference between the two.
- Finally, knowing the position of the email address in the original text and its length, we can extract it with a plain substring call.
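For readers who want to try the steps above outside Deluge, here is a minimal sketch of the same logic in Python. It is only an illustration, not Deluge code: the names get_email_from_text and MARKER are made up for this example, and re.sub with count=1 stands in for replaceFirst. It deliberately avoids a regex search and uses only replace, find and length, as the Deluge function does.

```python
import re

# Same pattern as in the Deluge function (lowercase-only, as in the original).
EMAIL_REGEX = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
MARKER = "##EML##"  # hypothetical marker; assumed not to occur in the text


def get_email_from_text(txt: str) -> str:
    """Return the first email address in txt, or "" if there is none."""
    # Copy 1: first match replaced by the marker, used to find the position.
    masked_text1 = re.sub(EMAIL_REGEX, MARKER, txt, count=1)
    # Copy 2: first match removed, used to find the length.
    masked_text2 = re.sub(EMAIL_REGEX, "", txt, count=1)

    position = masked_text1.find(MARKER)
    length = len(txt) - len(masked_text2)
    if length == 0:
        # Nothing was removed, so there is no email address in the text.
        return ""
    return txt[position:position + length]


if __name__ == "__main__":
    print(get_email_from_text("Contact john.doe@example.com today"))
```

One limitation carries over from the original approach: if the text already contains the marker string before the email address, the computed position will be wrong, so the marker should be something that cannot appear in real input.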
Are there simpler solutions?
What is the most effective way to extract multiple email addresses from a text?