One very common parsing operation is to collect all consecutive characters that meet some condition. For example, the parser below collects all consecutive digits, and can be used to parse integers.
val digit: Parser[Char] = Parser.charIn('0', '1', '2', '3', '4', '5', '6', '7', '8', '9') val number: Parser[String] = digit.map(_.toString).oneOrMore
I decided to start benchmarking with this common case. I created four benchmarks, which you can find in
numberParsermeasures the baseline performance of the parser combinator library;
numberPatternLoopmeasures the same algorithm written without the parser combinator framework, giving an idea of the overhead the combinator library adds;
numberCharacterClassLoopreplaces the explicit conditional with a call to the
numberPatternStringBuilderLoopmeasures the effect of using a
StringBuilderinstead of concatenating
You can run the benchmarks using the following
project benchmarksto change into the benchmarks projects within sbt; and
Jmh / runto run the benchmarks with the default JMH settings.
The benchmarks will take a few minutes to run, which is good for getting accurate results but not for quick iteration when developing optimizations. The command
Jmh / run -h will output the many arguments you can pass to JMH to change how it runs the benchmarks. Using
Jmh / run -i 1 -wi 1 -f 1 will run many fewer iterations, giving results in a few seconds at the risk of more inaccuracy in measurements. In my testing this was accurate enough for the large performance gain we're looking for here, though I did verify the results with a longer run once I'd finished an optimization.
Results from my initial benchmark run are below. Larger numbers are better and I've ordered the results from slowest to fastest. The absolute values don't matter; what is important is the relative differences in performance.
We can see:
- the parser combinator approach is at least an order of magnitude slower than any other approach, suggesting the combinator library adds significant overhead;
- using a
StringBuilderis about twice as fast as concatenating
- using the character class method
isDigitis a slight improvement over the explicit test.
[info] Benchmark Mode Cnt Score Error Units [info] RepeatBenchmark.numberParser thrpt 25 181950.007 ± 1698.243 ops/s [info] RepeatBenchmark.numberPatternLoop thrpt 25 2583254.936 ± 285273.465 ops/s [info] RepeatBenchmark.numberCharacterClassLoop thrpt 25 3278595.556 ± 21373.975 ops/s [info] RepeatBenchmark.numberPatternStringBuilderLoop thrpt 25 6619869.553 ± 54507.164 ops/s
We can use these findings to inform API design and internal optimizations.