by Andreas

Email validation – finally a .NET regular expression that works!

Note: this is only relevant for .NET 3.5 and earlier. In .NET 4 the System.Net.Mail.MailAddress class includes validation and will throw an exception if instantiated with an invalid address. See the comments below.

Validating an email address sounds simple, and it really is! Until your fancy validator is released into the real world and people’s actual email addresses start pouring in, instead of the three email addresses you’ve been testing while developing: your work address, your Gmail address and an old Yahoo address you created in 2001.

chris-@somedomain.com
!habla-hobla%wow@somedomain.com
!def!xyz%abc@example.com
joe_blow_@somedomain.com
love/hate=relationship@example.com
$ilovemoney1234@example.com

They are, believe it or not, all valid according to the RFC spec. That doesn’t mean that you’ll be allowed to actually register them everywhere. Some (most?) email providers have their own rules that are much stricter than the specifications, simply because it makes no sense to allow all sorts of rubbish.

But if you’re writing a system that lets people register with their email address, and you confirm ownership of that address by sending a confirmation email, you really can’t be any stricter than the RFC, and this is where the problem lies. I also used to think “blah, how often would THAT become an issue?!”, but recently three different users told me within just a few weeks that they were unable to register. So I thought I’d better find a regular expression that actually works.

I’ve tried them all: examples from Microsoft, Stack Overflow and random forums, ones I cooked up myself, and a couple converted from other languages. It wasn’t until yesterday that I stumbled across an old blog post by Phil Haack called I Knew How To Validate An Email Address Until I Read The RFC. He did all the hard work (i.e. reading the #¤%&”¤ RFC specs), and came up with an expression that so far seems to be working. So I thought I’d better save it here, because I know I will need this one again:

string pattern = @"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|"
                + @"([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)"
                + @"@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$";

Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
return regex.IsMatch(emailAddress);
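
For reference, here is a minimal sketch of how Phil Haack’s expression could be wrapped and tried against the addresses mentioned in this post. The class name EmailRegexDemo, the method name IsValidEmail and the null check are assumptions made for illustration; they are not part of Phil’s post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailRegexDemo
{
    // Phil Haack's pattern, compiled once and reused for every call.
    private static readonly Regex EmailRegex = new Regex(
        @"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|"
        + @"([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)"
        + @"@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$",
        RegexOptions.IgnoreCase);

    // Assumption: a null input is treated as invalid instead of throwing.
    public static bool IsValidEmail(string emailAddress)
    {
        return emailAddress != null && EmailRegex.IsMatch(emailAddress);
    }

    public static void Main()
    {
        string[] samples =
        {
            "chris-@somedomain.com",
            "!def!xyz%abc@example.com",
            "love/hate=relationship@example.com",
            "$ilovemoney1234@example.com",
            "wo..oly@example.com",      // two consecutive dots in the local part
            "pootietang.@example.com"   // dot directly before the @
        };

        // Print each sample together with the result of the check.
        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, IsValidEmail(sample));
        }
    }
}
```

Keeping the Regex in a static readonly field avoids rebuilding it on every call, which matters if the check runs on each registration request.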

 

So thanks to Phil for his efforts! Let us know if anyone finds a flaw in this one as well…

  • http://blog.degree.no/bloggere/ Njål

    Interesting stuff and a difficult standard!

    But if you are actually sending emails (using System.Net.Mail) to the addresses you are validating, then you should use new System.Net.Mail.MailAddress() to validate them. If this fails, you will not be able to send them email anyway, and you might as well tell the user that he/she has a borderline email address. (Nik also points this out in the comments section of that blog post.)

    Another thing you can do to ensure valid email addresses is a DNS lookup of the domain name, to prevent email@asdfasdfasdf56456asdf.net from being accepted.
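
    A rough sketch of that DNS idea, under some loud assumptions: the class name DomainCheck and the helpers ExtractDomain and DomainResolves are hypothetical, and Dns.GetHostEntry only resolves ordinary host addresses, not MX records, so a domain that receives mail purely through MX records could be rejected by mistake. Treat it as an approximation, not a definitive check.

    ```csharp
    using System;
    using System.Net;
    using System.Net.Sockets;

    public static class DomainCheck
    {
        // Hypothetical helper: returns the text after the last '@', or null if there is none.
        public static string ExtractDomain(string emailAddress)
        {
            if (emailAddress == null) return null;
            int at = emailAddress.LastIndexOf('@');
            if (at < 0 || at == emailAddress.Length - 1) return null;
            return emailAddress.Substring(at + 1);
        }

        // Returns true if the domain part resolves to at least one address.
        // Note: this is a host lookup, not an MX lookup.
        public static bool DomainResolves(string emailAddress)
        {
            string domain = ExtractDomain(emailAddress);
            if (domain == null) return false;
            try
            {
                return Dns.GetHostEntry(domain).AddressList.Length > 0;
            }
            catch (SocketException)
            {
                return false; // the name could not be resolved
            }
            catch (ArgumentException)
            {
                return false; // the host name is malformed or too long
            }
        }
    }
    ```

    A lookup like this needs network access and can be slow, so it is probably best run after the syntax check has passed.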

    Anyway, I ran Phil Haack’s unit tests against this validation, and the only email address that .NET didn’t approve was

    • "test\rblah"@example.com (\r = carriage return).

    .NET actually allowed these email addresses, which Phil’s regex rejected:

    • "testblah"@example.com
    • wo..oly@example.com
    • pootietang.@example.com


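    // Requires: using System; using System.Net.Mail;
    // As an extension method, this must live inside a static class.
    public static Boolean IsValidEmail( this string input ) {
        try {
            // The constructor throws if the address cannot be parsed.
            new MailAddress( input );
            return true;
        }
        catch( Exception ) {
            return false;
        }
    }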

    UPDATE: After testing this approach further with Andreas, we found that it works a lot better in .NET 4 and 4.5 than in previous .NET versions.

  • http://blog.degree.no/bloggere/ Andreas

    There is also a way to send an email to the destination, and if the email request is marked with a certain flag (that I can’t remember at the moment) the recipient doesn’t actually get the email. Instead, a response message is sent back to the caller, indicating whether or not this user exists for the given domain.

    This has several flaws, one of them being delayed relay between the caller and the recipient.

    BUT: we have discovered a VERY important thing: Njål’s method using the constructor of System.Net.Mail.MailAddress behaves differently in .NET 3.5 and .NET 4.0. The latter does validate, while the former doesn’t.

    In other words: if you’re running on .NET 4 or later, use Njål’s code. If not, use Phil Haack’s regex.