String theory, and practice.

Sunday, July 17, 2011

C++ String Performance Benchmark

In the context of the wonderful Shed Skin Python compiler, we are looking for a good string library.

There are lots of benchmarks available for abstract data types, languages, compilers... But not for strings. At least, I was unable to find one, even for simple byte-strings, which are widely used and still a common pitfall in many applications.

So I benchmarked a few string libraries written in C/C++, namely:
  • std::string from the GNU C++ Standard Library (with and without garbage collection).
  • Better String Library, by Paul Hsieh.
  • Cord, by Hans Boehm.
  • boost::const_string, by Maxim Yegorushkin.
  • QString, from Qt 4.6.2.
  • Python Strings (with an evil interpreter hack™).
  • Rope (GNU C++ Standard Library extension).
  • A null string class (to estimate benchmark overhead and subtract it from other tests).
  • Perl scalars were added for reference. JavaScript V8, STLport and others have also been tried but seemed not well adapted to this benchmark, either too close or too far performance-wise.
I focused on the most common string operations:
The dataset consists of 100,000 short, real-life, Python strings borrowed from the Python 2.7 test suite. They range from 0 to 16kB, accounting for a total of ~1MB (median size: 8 bytes). See how they were obtained.
I used a modified version of GNU time (accurate to the microsecond) to gather the results. 50 iterations were run on each benchmark. The best from 10 runs was taken, with high-priority FIFO scheduling (thanks Maxim). See the rationale for taking the best time.

Creation

Construct each input string (from a const char*) and get its size().

Details


Concatenation

Append each input string (with +=). Libraries have a chance to optimize for const char*.

Details


Equality testing

Compare each input string with the previous one (with ==). Does one assignment to switch strings. 14.5% of strings match.

Details


Slicing

Extract substrings at pseudo-random indexes (computed after the two previous string lengths). This is a super-set of indexing and possibly iteration.

Details


Results were obtained on an Intel Core 2 Quad CPU Q9550 (2.83GHz, 12M cache, 1333MHz FSB) with 8GB of RAM running Linux 2.6.32-33-generic.

Programs were compiled with g++ 4.4.3 -03 -march=native -DNDEBUG.

The code is available at github. Benchmarks and libraries are easy to add and tune (conventional C++ interfaces require almost no work).

It's too early to draw conclusions from these results so far. You have to try for yourself. A few things to note though:
  • boost::const_string has quadratic concatenation so it was removed from the test (it implements expression templates to get around this).
  • Python String concatenation is quadratic too, but an optimization performed by the interpreter was emulated.
  • Concatenation is more of a "string grow" or "append" than a genuine concatenation: operand sizes are not evenly distributed (left-hand side tends to be large). The benchmark also lets a chance to authors to optimize for concatenation of const char* (which is very common in practice).
  • Note that QStrings attempt UTF-8 conversion on input.
  • Slicing of QStrings is not optimized with midRef (so they actually have pretty good performance).
  • Note that bstring is a bit impaired by it's C++ wrapper.
  • Don't be tempted to take this as a Python/Perl comparison, since the Perl examples embed an interpreter (for memory management) whereas Python does not. Perl scalars also have a larger scope than Python byte-strings.
  • The results are overall pretty reproducible, except maybe for the Cord library which shows some noticeable variance (a few tens of milliseconds).
  • For a feature comparison of string libraries, you can see Paul Hsieh's comparison table.
Patches, critics and new libraries are welcome.

No comments:

Post a Comment