{"id":54130,"date":"2023-08-13T16:37:44","date_gmt":"2023-08-13T15:37:44","guid":{"rendered":"https:\/\/ekiwi-blog.de\/54130\/regex-for-e-mail\/"},"modified":"2023-08-13T16:41:28","modified_gmt":"2023-08-13T15:41:28","slug":"regex-for-email","status":"publish","type":"post","link":"https:\/\/ekiwi-blog.de\/en\/54130\/regex-for-email\/","title":{"rendered":"Regex for email"},"content":{"rendered":"<p><abbr title=\"Regular Expression\">Regex<\/abbr> patterns (<em>regular expressions<\/em>) are a handy way to find text that follows a certain pattern within a larger text. However, if you need a regex to find an email address in a text, things quickly get more complicated than they need to be. While researching on the internet, I came across a great many example patterns and long forum discussions about the various <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regular_expression\" target=\"_blank\" rel=\"noopener\">regular expressions<\/a> for finding a mail address. Unfortunately, very few of them produced a match where there should have been one in my case, so I ended up putting together my own pattern.<\/p>\n<p>In my specific case, I wanted to compose an automatic reply email using a VBA macro. For forwarded emails, the original sender had to be determined from the mail text, and then the original <a title=\"VBA code to mark e-mail as read\" href=\"https:\/\/ekiwi-blog.de\/en\/53092\/vba-outlook-mark-e-mail-as-read\/\">mail was to be marked as read<\/a> and <a title=\"Automatically move mails to another folder with VBA and Outlook\" href=\"https:\/\/ekiwi-blog.de\/en\/54082\/vba-outlook-automatically-move-e-mail-to-another-folder-with-macro\/\">moved to another folder<\/a>.<\/p>\n<h2>Regular expression to determine email address<\/h2>\n<p>To get the email address, I used the following <abbr title=\"Visual Basic for Applications\">VBA<\/abbr> code. 
Of course, the pattern itself can also be used in other programming languages, such as C#. You only have to account for a few specifics of the respective language, for example how backslashes are escaped in string literals.<\/p>\n<pre><code>    <span style=\"color: #339966;\">' Requires a reference to \"Microsoft VBScript Regular Expressions 5.5\" (Tools &gt; References)<\/span>\r\n    <span style=\"color: #339966;\">' body is assumed to hold the mail text to be searched<\/span>\r\n    <span style=\"color: #0000ff;\">Dim<\/span> myRegex <span style=\"color: #0000ff;\">As<\/span> Object\r\n    <span style=\"color: #0000ff;\">Dim<\/span> results <span style=\"color: #0000ff;\">As<\/span> Object\r\n    <span style=\"color: #0000ff;\">Dim<\/span> match <span style=\"color: #0000ff;\">As<\/span> Object\r\n\r\n    <span style=\"color: #0000ff;\">Set<\/span> myRegex = <span style=\"color: #0000ff;\">New<\/span> RegExp\r\n    myRegex.Pattern = <mark>\"[\\w.-]*@[\\w.-]*\\.[\\w]{2,6}\"<\/mark>\r\n    <span style=\"color: #339966;\">' Global = True returns every match, not just the first one<\/span>\r\n    myRegex.Global = <span style=\"color: #0000ff;\">True<\/span>\r\n    <span style=\"color: #0000ff;\">Set<\/span> results = myRegex.Execute(body)\r\n\r\n    <span style=\"color: #0000ff;\">For Each<\/span> match In results\r\n        Debug.Print match.Value\r\n        <span style=\"color: #0000ff;\">If<\/span> match.Value = \"searched@mail.de\" <span style=\"color: #0000ff;\">Then<\/span>\r\n            <span style=\"color: #339966;\">' do something here<\/span>\r\n            <span style=\"color: #0000ff;\">Exit For<\/span>\r\n        <span style=\"color: #0000ff;\">End If<\/span>\r\n    <span style=\"color: #0000ff;\">Next<\/span>\r\n<\/code><\/pre>\n<p>The actual regular expression is:<\/p>\n<p style=\"text-align: center;\"><strong><mark>[\\w.-]*@[\\w.-]*\\.[\\w]{2,6}<\/mark><\/strong><\/p>\n<p><strong>Explanation:<\/strong><\/p>\n<p>You can clearly see the @ sign. In front of it is the expression <em>[\\w.-]*<\/em>. <em>\\w<\/em> stands for any letter, any digit from <em>0-9<\/em> and the underscore. A dot &#8220;.&#8221; and a hyphen have also been added to the character class. The <em>&#8220;*&#8221;<\/em> asterisk after the square brackets states that these characters may occur any number of times, including zero, so strictly speaking the pattern also accepts an empty part before the @ sign. 
This is followed by the <strong>@ sign<\/strong> and then the same expression again, which now stands for the domain name. The domain name can therefore consist of any letters, digits, underscores and hyphens, including subdomains separated by dots.<br \/>\nNext comes the dot that delimits the <strong>top-level domain (TLD)<\/strong>, e.g. <em>.de<\/em>, <em>.us<\/em>, <em>.com<\/em>, <em>.fr<\/em>, <em>.es<\/em>. Outside square brackets, the dot has a special meaning in a regex (it matches almost any character), so a literal dot must be written with a backslash: <em>&#8220;\\.&#8221;<\/em>.<br \/>\nFinally comes the expression for the top-level domain itself, <em>[\\w]{2,6}<\/em>. It states that the top-level domain may consist of word characters (letters, digits, underscore) and must be between 2 and 6 characters long. This can certainly be made a little more precise: I am not aware of any <abbr title=\"Top-Level Domain\">TLDs<\/abbr> that contain digits, and there are already TLDs with more than 6 characters, such as <em>.example<\/em> or <em>.hamburg<\/em>. In most cases, however, the above expression does a good job.<\/p>\n<p>The pattern therefore captures mail addresses such as:<\/p>\n<ul>\n<li>erwin@ekiwi-blog.de<\/li>\n<li>erwin.mueller@ekiwi.de<\/li>\n<li>pergamont@berlin.museum<\/li>\n<li>funny_ow.1.outreach@gma1il.ffg.com<\/li>\n<\/ul>\n<p><strong>Not<\/strong> covered are addresses with special characters or very long TLDs. 
If you need to match such addresses, you can adjust the regular expression accordingly by adding the special characters to the character class or raising the maximum number of characters in the TLD. Examples of addresses that are not matched in full:<\/p>\n<ul>\n<li>jonny.doe+marry-ann@gmail.com<\/li>\n<li>John.O&#8217;Connor@gmail.ie<\/li>\n<li>&#8220;Daniel\\ O&#8217;Miller&#8221;@googlemail.com<\/li>\n<li>toIPDomain@[1.2.3.4]<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Regex expressions (Regular Expression) are a very handy thing to find a text with a certain pattern within another text.<\/p>\n","protected":false},"author":2,"featured_media":24370,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1555],"tags":[1700,3239,1861,1862],"class_list":["post-54130","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software-en","tag-programming","tag-regex-en","tag-vba-en-2","tag-visual-basic-for-application-en-2"],"_links":{"self":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/posts\/54130","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/comments?post=54130"}],"version-history":[{"count":0,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/posts\/54130\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/media\/24370"}],"wp:attachment":[{"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/media?parent=54130"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/categories?post=54130"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ekiwi-blog.de\/en\/wp-json\/wp\/v2\/
tags?post=54130"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}