by Wojciech Sura

Advanced regular expressions

The man who invented regular expressions surely deserves the Nobel Prize. I have lost track of how many tasks I have completed much faster thanks to this useful feature, which is built into most text editors.

Today I used a modern extension of regular expressions called negative lookbehind. Let me tell you what it is and how you can use it to your advantage.

Modern editors (including Visual Studio) support the following regular expression syntax:

  • expr2(?=expr1)
  • expr2(?!expr1)
  • (?<=expr1)expr2
  • (?<!expr1)expr2

In order, they are:

  • Positive lookahead – matches expr2 if it is immediately followed by expr1.
  • Negative lookahead – matches expr2 if it is not immediately followed by expr1.
  • Positive lookbehind – matches expr2 if it is immediately preceded by expr1.
  • Negative lookbehind – matches expr2 if it is not immediately preceded by expr1.

Most importantly, in each case expr1 does not become part of the match.
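To see the four forms side by side, here is a minimal sketch in Python. The sample text and the choice of "foo" as expr2 and "bar" as expr1 are made up for illustration. Python's re module is used only as a convenient way to run the patterns: it accepts the same syntax for these constructs, although it requires the lookbehind expression to have a fixed width.

```python
import re

# Hypothetical sample text: "foo" plays the role of expr2, "bar" of expr1.
text = "foobar foobaz barfoo"

patterns = {
    "positive lookahead": r"foo(?=bar)",
    "negative lookahead": r"foo(?!bar)",
    "positive lookbehind": r"(?<=bar)foo",
    "negative lookbehind": r"(?<!bar)foo",
}

# Record where each pattern matches and what text the matches contain.
starts = {}
matched = set()
for name, pattern in patterns.items():
    hits = list(re.finditer(pattern, text))
    starts[name] = [m.start() for m in hits]
    matched.update(m.group() for m in hits)
    print(f"{name}: {starts[name]}")

# Every match consists of "foo" alone; "bar" is only checked, never consumed.
print(matched)
```

The start positions differ from pattern to pattern, but the matched text is always just "foo", which is why a replacement never touches the surrounding "bar".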

What did I use it for today? I needed to clean up a piece of HTML – I had a lot of HTML tags on a single line and I wanted to break them up automatically. I used the following patterns:

  • Find: (?<![\n\r])<(?!/)
  • Replace: \r\n<

This regular expression finds every “<” character that is not preceded by a newline and not followed by a “/” character. The replacement then adds a newline before each of them.
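The same find and replace can be tried outside the editor. The following is a rough sketch in Python, assuming its re module treats this particular pattern the same way as the editor does. The function name break_tags and the sample HTML are made up for illustration.

```python
import re

# "<" not preceded by a newline character and not followed by "/".
OPENING_TAG = re.compile(r"(?<![\n\r])<(?!/)")


def break_tags(html: str) -> str:
    """Put every opening tag on its own line (hypothetical helper)."""
    # The replacement is a real CR LF pair followed by the "<" that was matched.
    return OPENING_TAG.sub("\r\n<", html)


# Hypothetical single-line input.
html = "<div><p>Hello</p><p>World</p></div>"
result = break_tags(html)
print(result)

# Running it again changes nothing: every opening "<" now follows a newline.
print(break_tags(result) == result)
```

Two things are worth noticing in this sketch. A tag at the very start of the text also gets a newline in front of it, because nothing precedes it. And thanks to the negative lookbehind the operation is safe to repeat, since tags that already start a line are skipped.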

Then it was only a matter of pressing Ctrl+K, Ctrl+F, and Visual Studio formatted the whole HTML for me automatically.