Regex for email

Regular expressions (regex) are a very handy way to find text that follows a certain pattern within another text. However, if you need a regex to find an email address in a text, things quickly get more complicated than they need to be. While researching online, I came across a great many regex patterns and long forum discussions about the right expression for finding an email address. Unfortunately, in my case very few of them produced a match where there should have been one, so I ended up putting together my own pattern.

In my specific case, I wanted a VBA macro to compose an automatic reply e-mail. For forwarded e-mails, the macro had to determine the original sender from the mail text, then mark the original mail as read and move it to another folder.

Regular expression to determine email address

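I used the following VBA code to extract the email address. The regular expression itself can, of course, also be used in any other programming language, such as C#. You only have to take a few specifics of the respective language into account. For example, in C# the backslashes in the pattern must either be doubled or the pattern must be written as a verbatim string (prefixed with @).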

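    ' Late binding via CreateObject: no reference to
    ' "Microsoft VBScript Regular Expressions 5.5" is needed.
    ' body holds the text of the mail, e.g. the .Body of the forwarded MailItem.
    Dim myRegex As Object
    Dim results As Object
    Dim mailMatch As Object

    Set myRegex = CreateObject("VBScript.RegExp")
    myRegex.Pattern = "[\w.-]*@[\w.-]*\.[\w]{2,6}"
    myRegex.Global = True   ' return all matches, not just the first one
    Set results = myRegex.Execute(body)

    For Each mailMatch In results
        Debug.Print mailMatch.Value
        If mailMatch.Value = "searched@mail.de" Then
            ' do something here
            Exit For
        End If
    Next mailMatch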
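
Here is the same search sketched in Python, to illustrate that the pattern carries over to other languages. The helper names find_addresses and contains_address and the sample mail text are hypothetical. One specific is worth knowing. In VBScript's RegExp, \w means [A-Za-z0-9_]. Python 3 treats \w as Unicode-aware by default, so the sketch adds the re.ASCII flag to get the same behaviour.

```python
import re

# The article's pattern, unchanged. re.ASCII restricts \w to [A-Za-z0-9_],
# which is what VBScript's RegExp understands by \w; without the flag,
# Python 3 would also count letters such as "ü" as word characters.
EMAIL_PATTERN = re.compile(r"[\w.-]*@[\w.-]*\.[\w]{2,6}", re.ASCII)


def find_addresses(body: str) -> list[str]:
    """Return all matches in body, like Execute() with Global = True."""
    return EMAIL_PATTERN.findall(body)


def contains_address(body: str, searched: str) -> bool:
    """Mirror the VBA loop: print each match and stop at the searched address."""
    for address in find_addresses(body):
        print(address)  # counterpart of Debug.Print
        if address == searched:
            return True  # this is where "do something here" would go
    return False


body = "Forwarded message from erwin@ekiwi-blog.de, please reply to searched@mail.de."
print(find_addresses(body))  # ['erwin@ekiwi-blog.de', 'searched@mail.de']
print(contains_address(body, "searched@mail.de"))  # True
```

The trailing comma and full stop in the sample text are not part of either match. The pattern backtracks until the TLD is followed by a character outside the class, or by the end of the text.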

The regular expression itself is therefore:

[\w.-]*@[\w.-]*\.[\w]{2,6}
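
To see each part of the pattern at a glance, the same expression can be written in Python's verbose mode. In that mode, whitespace is ignored and # starts a comment. This is only an annotated restatement of the pattern above, not a different one. Python is chosen for illustration, and re.ASCII keeps \w limited to [A-Za-z0-9_] as in VBScript.

```python
import re

# The article's pattern in verbose mode: whitespace and "#" comments are
# ignored, so each part can be annotated on its own line.
EMAIL_VERBOSE = re.compile(
    r"""
    [\w.-]*      # local part: letters, digits, underscore, dot, hyphen (any number)
    @            # the @ sign
    [\w.-]*      # domain name, including subdomains separated by dots
    \.           # escaped dot in front of the top-level domain
    [\w]{2,6}    # top-level domain: 2 to 6 word characters
    """,
    re.VERBOSE | re.ASCII,
)

# The compact one-line form, exactly as given in the article.
EMAIL_COMPACT = re.compile(r"[\w.-]*@[\w.-]*\.[\w]{2,6}", re.ASCII)

sample = "Contact erwin.mueller@ekiwi.de or pergamont@berlin.museum"
print(EMAIL_VERBOSE.findall(sample))  # ['erwin.mueller@ekiwi.de', 'pergamont@berlin.museum']
```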

Here is how it works:

You can clearly see the @ sign. In front of it is the expression [\w.-]*. The \w stands for any letter, any digit from 0 to 9 and the underscore. A dot “.” and a hyphen “-” have been added to it. The asterisk “*” after the square brackets states that these characters can occur any number of times, including zero times. The @ sign is followed by the same expression again, which now stands for the domain name. The domain name can therefore consist of letters, digits, underscores and hyphens, and can also contain subdomains separated by dots.
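This is followed by the dot that separates the top-level domain (TLD), e.g. .de, .us, .com, .fr or .es. Outside square brackets, the dot has a special meaning in a regex: it stands for almost any character. A literal dot must therefore be escaped with a backslash: “\.”. Inside square brackets, as in [\w.-], the dot is literal anyway and needs no backslash.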
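
The escaping matters in practice. Below is a small Python comparison between the pattern and a copy with the backslash before the TLD dot removed. The unescaped copy exists only for illustration, and user@localhost is a made-up input. Without the backslash, the dot consumes an arbitrary letter, so an address with no dot in the domain is still accepted.

```python
import re

ESCAPED = re.compile(r"[\w.-]*@[\w.-]*\.[\w]{2,6}", re.ASCII)
# Same pattern with the backslash before the TLD dot removed, for comparison only.
UNESCAPED = re.compile(r"[\w.-]*@[\w.-]*.[\w]{2,6}", re.ASCII)

for text in ["erwin@ekiwi-blog.de", "user@localhost"]:
    # findall() lists every non-overlapping match in the text.
    print(text, ESCAPED.findall(text), UNESCAPED.findall(text))
```

Both versions accept erwin@ekiwi-blog.de. Only the unescaped one also accepts user@localhost, where the dot has matched the letter “o”.
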
Finally comes the expression for the top-level domain itself, [\w]{2,6}. It states that the TLD may consist of word characters (letters, digits, underscore) and must be between 2 and 6 characters long. This can certainly be made a little more precise. I am not aware of any TLD that contains digits, and on the other hand there are already TLDs with more than 6 characters, such as .example or .hamburg. For such addresses the match is cut off after the sixth character of the TLD. In most cases, however, the above expression does a good job.
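
One way to make the TLD part more precise is to allow letters only and drop the upper length limit: [A-Za-z]{2,}. The Python sketch below tries that idea. It is not the pattern used above, and the address info@stadt.hamburg is a made-up example. Keep in mind that internationalized TLDs in their ASCII form, such as xn--p1ai for .рф, do contain digits and hyphens and would not be matched by this variant.

```python
import re

# Pattern from the article: TLD = 2 to 6 word characters.
ORIGINAL = re.compile(r"[\w.-]*@[\w.-]*\.[\w]{2,6}", re.ASCII)
# Sketch of a stricter variant: TLD = at least 2 ASCII letters, no upper limit.
LETTERS_TLD = re.compile(r"[\w.-]*@[\w.-]*\.[A-Za-z]{2,}", re.ASCII)

address = "info@stadt.hamburg"  # hypothetical address with a 7-letter TLD
print(ORIGINAL.search(address).group())     # info@stadt.hambur (TLD cut off)
print(LETTERS_TLD.search(address).group())  # info@stadt.hamburg
```
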

This captures common email addresses such as:

  • erwin@ekiwi-blog.de
  • erwin.mueller@ekiwi.de
  • pergamont@berlin.museum
  • funny_ow.1.outreach@gma1il.ffg.com

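Not covered are addresses with special characters or quoted text in the local part, IP addresses instead of domain names, and TLDs longer than six characters. For addresses like the following, the pattern either captures only part of the address or finds nothing at all. If you need to handle such cases, you can adjust the expression accordingly. For example, add the special characters to the character class in front of the @, or raise the maximum length of the TLD.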

  • jonny.doe+marry-ann@gmail.com
  • John.O'Connor@gmail.ie
  • "Daniel\ O'Miller"@googlemail.com
  • toIPDomain@[1.2.3.4]
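
The Python sketch below shows what actually happens with these addresses. It runs the original pattern and a hypothetical extension that adds “+” and “'” to the local part. The original pattern does not simply skip such addresses; it captures only a fragment. For the quoted address, even the extended pattern returns just “@googlemail.com”, because the “*” also allows an empty local part. The IP-address domain is not matched at all.

```python
import re

ORIGINAL = re.compile(r"[\w.-]*@[\w.-]*\.[\w]{2,6}", re.ASCII)
# Hypothetical extension: "+" and "'" added to the local-part class.
# Inside [...] both are literal; the hyphen stays last so it is literal as well.
EXTENDED = re.compile(r"[\w.+'-]*@[\w.-]*\.[\w]{2,6}", re.ASCII)

NOT_COVERED = [
    "jonny.doe+marry-ann@gmail.com",
    "John.O'Connor@gmail.ie",
    '"Daniel\\ O\'Miller"@googlemail.com',
    "toIPDomain@[1.2.3.4]",
]

for address in NOT_COVERED:
    print(address)
    print("  original:", ORIGINAL.findall(address))
    print("  extended:", EXTENDED.findall(address))
```

Quoted local parts and IP-address domains need more than an extra character in the class. Those cases call for a considerably more involved pattern.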
