Regex Performance Tester

Most regular expressions are fast enough. The ones that are not usually fail in a boring way: they work on small input and then get slow when the input is longer or almost matches.

I test for that before using a regex on untrusted or large input.

The pattern I watch for

The classic problem is nested repetition:

^(a+)+$

The pattern looks small. It can get expensive on input that almost matches:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

A backtracking engine can try every way to split the a characters between the inner a+ and the outer group before it gives up, and the number of possible splits doubles with every extra a.
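To see how fast that grows, a small Ruby sketch can count the splits. Each way to divide n `a` characters into non-empty runs is one way `(a+)+` could assign characters to its groups, and there are 2^(n-1) of them. `split_count` is a hypothetical helper for illustration. It counts possibilities and does not model any particular engine, some of which prune these paths.

```ruby
# Hypothetical helper: counts the ways to split n "a" characters into
# non-empty runs, i.e. the ways (a+)+ could divide them between its groups.
# Illustration only: real engines may skip some of these paths.
def split_count(n)
  ways = Array.new(n + 1, 0)
  ways[0] = 1 # one way to split nothing: no runs at all
  (1..n).each do |i|
    # The last run is 1..i characters long; the rest is a smaller split.
    ways[i] = (1..i).sum { |last| ways[i - last] }
  end
  ways[n]
end

[5, 10, 20, 30].each do |n|
  puts "#{n} characters: #{split_count(n)} splits"
end
```

Each extra character doubles the count: 10 characters give 512 splits, and 30 give 536,870,912.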

I do not need a benchmark for every regex. I do need to pause when I see repetition inside repetition, especially if the input can be long.
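That pause can be partly automated. The sketch below is a rough heuristic, not a parser: `nested_repetition?` (a hypothetical helper, not a library method) scans `Regexp#source` and flags a group that contains `+`, `*`, or `{` and is itself followed by one of them. It misses other risky shapes, such as overlapping alternatives like `(a|aa)+`, and may misread unusual syntax.

```ruby
QUANTIFIERS = ["+", "*", "{"].freeze

# Hypothetical heuristic: true when a group that contains a quantifier
# is itself followed by a quantifier, as in (a+)+ or ((a+)b)*.
def nested_repetition?(source)
  stack = []       # one flag per open group: "contains a quantifier"
  in_class = false # inside [...], where ( ) + * are literal characters
  i = 0
  while i < source.length
    ch = source[i]
    if ch == "\\"
      i += 2 # skip the backslash and the escaped character
      next
    end
    if in_class
      in_class = false if ch == "]"
    elsif ch == "["
      in_class = true
    elsif ch == "("
      stack.push(false)
    elsif ch == ")"
      inner = stack.pop || false
      return true if inner && QUANTIFIERS.include?(source[i + 1])
      # A group with a quantifier inside means its parent contains one too.
      stack[-1] ||= inner unless stack.empty?
    elsif QUANTIFIERS.include?(ch)
      stack[-1] = true unless stack.empty?
    end
    i += 1
  end
  false
end

puts nested_repetition?(/^(a+)+$/.source)          # prints true
puts nested_repetition?(/^[A-Za-z0-9_-]+$/.source) # prints false
```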

Make the regex more specific

The first fix is usually to make the pattern less vague.

Instead of:

^(.+)+$

I look for something closer to the input I expect:

^[A-Za-z0-9_-]+$

Instead of matching “anything”, match the characters that are valid.
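A short Ruby check makes the difference concrete. The two patterns are the ones above, written with Ruby's whole-string anchors `\A` and `\z`; the sample inputs are made up. Both accept a plain identifier, but only the specific pattern rejects characters the input should never contain.

```ruby
# The vague and specific patterns from above. Inputs stay short on
# purpose: long near-misses are where the vague pattern gets slow.
VAGUE    = /\A(.+)+\z/
SPECIFIC = /\A[A-Za-z0-9_-]+\z/

["user_name-42", "user name", "<script>"].each do |input|
  puts "#{input.inspect}: vague=#{VAGUE.match?(input)} specific=#{SPECIFIC.match?(input)}"
end
```

Only `"user_name-42"` passes both; the vague pattern accepts all three inputs.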

Use anchors when the whole string should match

If the whole string must match, I anchor it:

^[A-Z]+-[0-9]+$

Without anchors, the engine may try a match starting at every position in the string. Sometimes that is what you want. When the whole string has to match, it is unnecessary work.
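Which anchors to use depends on the engine. In Ruby, `^` and `$` match at the start and end of each line, not only of the whole string, so a whole-string check needs `\A` and `\z`. A minimal sketch with a made-up input:

```ruby
LINE_ANCHORS   = /^[A-Z]+-[0-9]+$/   # ^ and $: start/end of any line
STRING_ANCHORS = /\A[A-Z]+-[0-9]+\z/ # \A and \z: start/end of the string
UNANCHORED     = /[A-Z]+-[0-9]+/     # may match anywhere in the string

input = "ABC-123\nextra text"
puts LINE_ANCHORS.match?(input)            # true: the first line matches
puts STRING_ANCHORS.match?(input)          # false: text follows 123
puts UNANCHORED.match?("see ABC-123 here") # true: found mid-string
```

Note that `\z` differs from `\Z`, which also allows a single trailing newline.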

Test with bad input

I test with:

valid short input
valid long input
invalid input that fails quickly
invalid input that almost matches
empty input

The fourth case matters. Many slow regex problems show up when the input is almost valid.
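The five cases fit in a tiny Ruby harness. This is a sketch: `PATTERN` is the identifier pattern from earlier, while the inputs, expected results, and the `run_case` helper are hypothetical examples to replace with your own. The timings come from a single wall-clock run, so treat them as rough.

```ruby
PATTERN = /\A[A-Za-z0-9_-]+\z/

# Hypothetical cases: name => [input, expected match result]
CASES = {
  "valid short"             => ["abc_123", true],
  "valid long"              => ["a" * 10_000, true],
  "invalid, fails quickly"  => ["!" + "a" * 10_000, false],
  "invalid, almost matches" => ["a" * 10_000 + "!", false],
  "empty"                   => ["", false],
}.freeze

# Hypothetical helper: returns [match result, elapsed milliseconds].
def run_case(pattern, input)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = pattern.match?(input)
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
  [result, elapsed_ms]
end

CASES.each do |name, (input, expected)|
  result, ms = run_case(PATTERN, input)
  status = result == expected ? "ok" : "UNEXPECTED"
  puts format("%-24s %-10s %.3f ms", name, status, ms)
end
```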

Keep the target engine in mind

Regex performance depends on the engine. JavaScript, Swift, Ruby, nginx, grep, and sed do not all use the same engine or the same optimizations, so a pattern that is fast in one tool can be slow in another.

I use Regex for macOS for the first pass. It is useful for building the expression and checking matches. If the pattern will run in a server, CLI tool, or app hot path, I also test it where it will run.
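When the target is Ruby code, one way to make a slow pattern fail loudly in tests is a per-regex timeout. This assumes Ruby 3.2 or later, where `Regexp.new` accepts a `timeout:` keyword and a match that runs too long raises `Regexp::TimeoutError`. The helper `check_with_timeout` and the one-second limit are hypothetical choices. Ruby 3.2 also added a matching optimization that keeps many nested patterns like `(a+)+` from backtracking exponentially, so the first call may well finish quickly there while the same pattern stays slow in another engine.

```ruby
# Hypothetical helper. Needs Ruby 3.2+, where Regexp.new accepts
# timeout: and a slow match raises Regexp::TimeoutError.
def check_with_timeout(source, input, seconds: 1.0)
  unless Regexp.respond_to?(:timeout)
    raise NotImplementedError, "Regexp timeouts need Ruby 3.2 or later"
  end
  regex = Regexp.new(source, timeout: seconds)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    matched = regex.match?(input)
  rescue Regexp::TimeoutError
    return { matched: nil, timed_out: true, seconds: seconds }
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  { matched: matched, timed_out: false, seconds: elapsed.round(4) }
end

near_miss = "a" * 30 + "!"
p check_with_timeout('\A(a+)+\z', near_miss)
p check_with_timeout('\A[A-Za-z0-9_-]+\z', near_miss)
```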

A small timing check

For a quick command-line check, I use the target language and a sample input.

Ruby example:

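require "benchmark"

# Anchored pattern with a specific character class.
pattern = /\A[A-Za-z0-9_-]+\z/

valid     = "a" * 10_000
near_miss = "a" * 10_000 + "!" # fails only on the last character

# Time 1,000 matches of each input; Benchmark.bm prints a labeled table.
Benchmark.bm(10) do |x|
  x.report("valid")     { 1_000.times { pattern.match?(valid) } }
  x.report("near miss") { 1_000.times { pattern.match?(near_miss) } }
end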

The script is a cheap way to catch obviously bad patterns. I still test the final code path if the regex matters.
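A rough way to spot super-linear growth with the same tools is to double the input length and compare timings. For a pattern that runs in linear time, each doubling should roughly double the time; a much larger jump is a warning sign. The sizes, the repeat count, and the `growth_ratios` helper are hypothetical, and timer noise means the ratios are only approximate.

```ruby
require "benchmark"

# Hypothetical helper: times `repeats` matches of `pattern` against
# near-miss inputs of each size and returns time(next) / time(previous).
# Ratios near 2 for doubling sizes suggest linear behavior.
def growth_ratios(pattern, sizes:, repeats: 100)
  times = sizes.map do |n|
    input = "a" * n + "!" # almost matches: fails on the last character
    Benchmark.realtime { repeats.times { pattern.match?(input) } }
  end
  times.each_cons(2).map { |before, after| (after / before).round(1) }
end

p growth_ratios(/\A[A-Za-z0-9_-]+\z/, sizes: [10_000, 20_000, 40_000])
```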

My regex performance checklist

Before I use a regex on large or untrusted input, I check:

no nested repetition unless I have a good reason
specific character classes instead of .* everywhere
anchors when I need a whole-string match
clear maximum lengths where the app can enforce them
bad input that almost matches
the actual engine that will run the regex
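For the maximum-length item, a minimal guard rejects oversized input before the regex runs. `MAX_USERNAME_LENGTH`, the limit of 64, and `valid_username?` are hypothetical; use whatever limit the app can actually enforce.

```ruby
MAX_USERNAME_LENGTH = 64 # hypothetical limit; pick one your app can enforce
USERNAME = /\A[A-Za-z0-9_-]+\z/

# Hypothetical helper: checks the cheap length limit first,
# so the regex never sees oversized input.
def valid_username?(input)
  return false if input.length > MAX_USERNAME_LENGTH
  USERNAME.match?(input)
end

puts valid_username?("octo_cat-7") # prints true
puts valid_username?("a" * 10_000) # prints false: rejected by the length check
```

The limit could also live in the pattern as `{1,64}`, but a plain length check is easier to read and stops the work before matching starts.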

If the pattern is hard to reason about, I simplify it or replace it with parsing code.
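As an illustration of parsing code, here is a sketch that checks the same shape as `^[A-Z]+-[0-9]+$` (uppercase letters, a hyphen, digits) without a regex. That pattern is simple enough to keep as a regex; the point is only the shape of the alternative. `parse_ticket_id` and the ticket-ID framing are hypothetical. Each character is examined a fixed number of times, so there is no backtracking.

```ruby
# Hypothetical parser: returns [prefix, digits] for input like "ABC-123",
# or nil if the input is not uppercase letters, one hyphen, then digits.
def parse_ticket_id(input)
  prefix, digits, extra = input.split("-", -1)
  return nil if prefix.nil? || digits.nil? || !extra.nil? # exactly one hyphen
  return nil if prefix.empty? || digits.empty?
  return nil unless prefix.each_char.all? { |c| c.between?("A", "Z") }
  return nil unless digits.each_char.all? { |c| c.between?("0", "9") }
  [prefix, digits]
end

p parse_ticket_id("ABC-123") # prints ["ABC", "123"]
p parse_ticket_id("ABC-12x") # prints nil
```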

You can get Regex Tester/Builder on the Mac App Store.