Optimizing regular expresions is an exceptionally complex field. A vast majority of the problems within this field is NP or NP-COMPLETE. Meaning that their solutions cannot be bound (time/space) under any definite polynomial.
I am writing about this topic because I am in the process of learning ASP.NET and I was just testing some of it’s regular expression capabilities so I tried (as you can see I wasn’t thinking):
And it took longer than I thought. As soon as I changed the regular expression to the equivalent regular expression:
It ran much much faster. This is a special case of nesting the Kleene star (*) which always takes precedence over the (+) operator. Most text-books actually don’t even define the (+) operator because it can be simplied to the following:
a+ = aa*
or in our case
[a-z]+ = [a-z][a-z]*
If we take it a little futher we get:
([a-z]+)*! = ([a-z][a-z]*)*! = ([a-z]*[a-z]*)! = [a-z]*! (COMPLEX STEP)
The last step I would assume is where .NET couldn’t reduce any further. The last step although easy to see why they are equivalent it’s hard to develop a general algorithm to detect that.