Flashes of unstyled content: C++ String Performance Benchmark

In the context of the wonderful Shed Skin Python compiler, we are looking for a good string library.

There are lots of benchmarks available for abstract data types, languages, compilers... But not for strings. At least, I was unable to find one, even for simple byte-strings, which are widely used and still a common pitfall in many applications.

So I benchmarked a few string libraries written in C/C++, namely:

std::string from the GNU C++ Standard Library (with and without garbage collection).
Better String Library, by Paul Hsieh.
Cord, by Hans Boehm.
boost::const_string, by Maxim Yegorushkin.
QString, from Qt 4.6.2.
Python Strings (with an evil interpreter hack™).
Rope (GNU C++ Standard Library extension).
A null string class (to estimate benchmark overhead and subtract it from other tests).
Perl scalars were added for reference. JavaScript V8, STLport and others have also been tried but seemed not well adapted to this benchmark, either too close or too far performance-wise.

I focused on the most common string operations:

Creation.
Naive concatenation.
Equality testing.
Slicing.

The dataset consists of 100,000 short, real-life, Python strings borrowed from the Python 2.7 test suite. They range from 0 to 16kB, accounting for a total of ~1MB (median size: 8 bytes). See how they were obtained.

I used a modified version of GNU time (accurate to the microsecond) to gather the results. 50 iterations were run on each benchmark. The best from 10 runs was taken, with high-priority FIFO scheduling (thanks Maxim). See the rationale for taking the best time.

Creation

Construct each input string (from a const char*) and get its size().

Details

Concatenation

Append each input string (with +=). Libraries have a chance to optimize for const char*.

Details

Equality testing

Compare each input string with the previous one (with ==). Does one assignment to switch strings. 14.5% of strings match.

Details

Slicing

Extract substrings at pseudo-random indexes (computed after the two previous string lengths). This is a super-set of indexing and possibly iteration.

Details

Results were obtained on an Intel Core 2 Quad CPU Q9550 (2.83GHz, 12M cache, 1333MHz FSB) with 8GB of RAM running Linux 2.6.32-33-generic.

Programs were compiled with g++ 4.4.3 -03 -march=native -DNDEBUG.

The code is available at github. Benchmarks and libraries are easy to add and tune (conventional C++ interfaces require almost no work).

It's too early to draw conclusions from these results so far. You have to try for yourself. A few things to note though:

boost::const_string has quadratic concatenation so it was removed from the test (it implements expression templates to get around this).
Python String concatenation is quadratic too, but an optimization performed by the interpreter was emulated.
Concatenation is more of a "string grow" or "append" than a genuine concatenation: operand sizes are not evenly distributed (left-hand side tends to be large). The benchmark also lets a chance to authors to optimize for concatenation of const char* (which is very common in practice).
Note that QStrings attempt UTF-8 conversion on input.
Slicing of QStrings is not optimized with midRef (so they actually have pretty good performance).
Note that bstring is a bit impaired by it's C++ wrapper.
Don't be tempted to take this as a Python/Perl comparison, since the Perl examples embed an interpreter (for memory management) whereas Python does not. Perl scalars also have a larger scope than Python byte-strings.
The results are overall pretty reproducible, except maybe for the Cord library which shows some noticeable variance (a few tens of milliseconds).
For a feature comparison of string libraries, you can see Paul Hsieh's comparison table.

Patches, critics and new libraries are welcome.