Flashes of unstyled content

String theory, and practice.

Sunday, July 17, 2011

C++ String Performance Benchmark

In the context of the wonderful Shed Skin Python compiler, we are looking for a good string library.

There are lots of benchmarks available for abstract data types, languages, compilers... But not for strings. At least, I was unable to find one, even for simple byte-strings, which are widely used and still a common pitfall in many applications.

So I benchmarked a few string libraries written in C/C++, namely:
  • std::string from the GNU C++ Standard Library (with and without garbage collection).
  • Better String Library, by Paul Hsieh.
  • Cord, by Hans Boehm.
  • boost::const_string, by Maxim Yegorushkin.
  • QString, from Qt 4.6.2.
  • Python Strings (with an evil interpreter hack™).
  • Rope (GNU C++ Standard Library extension).
  • A null string class (to estimate benchmark overhead and subtract it from other tests).
  • Perl scalars were added for reference. JavaScript V8, STLport and others were also tried but seemed ill-suited to this benchmark, being either too close or too far performance-wise.
I focused on the most common string operations: creation, concatenation, equality testing and slicing.
The dataset consists of 100,000 short, real-life, Python strings borrowed from the Python 2.7 test suite. They range from 0 to 16kB, accounting for a total of ~1MB (median size: 8 bytes). See how they were obtained.
I used a modified version of GNU time (accurate to the microsecond) to gather the results. 50 iterations were run on each benchmark. The best from 10 runs was taken, with high-priority FIFO scheduling (thanks Maxim). See the rationale for taking the best time.

Creation

Construct each input string (from a const char*) and get its size().


Concatenation

Append each input string (with +=). Libraries have a chance to optimize for const char*.


Equality testing

Compare each input string with the previous one (with ==). One assignment per iteration switches the strings. 14.5% of the strings match.


Slicing

Extract substrings at pseudo-random indexes (computed from the lengths of the two previous strings). This is a superset of indexing, and possibly of iteration.


Results were obtained on an Intel Core 2 Quad CPU Q9550 (2.83GHz, 12M cache, 1333MHz FSB) with 8GB of RAM running Linux 2.6.32-33-generic.

Programs were compiled with g++ 4.4.3 -O3 -march=native -DNDEBUG.

The code is available on GitHub. Benchmarks and libraries are easy to add and tune (conventional C++ interfaces require almost no work).

It's too early to draw conclusions from these results; you have to try for yourself. A few things to note, though:
  • boost::const_string has quadratic concatenation so it was removed from the test (it implements expression templates to get around this).
  • Python String concatenation is quadratic too, but an optimization performed by the interpreter was emulated.
  • Concatenation is more of a "string grow" or "append" than a genuine concatenation: operand sizes are not evenly distributed (the left-hand side tends to be large). The benchmark also gives authors a chance to optimize concatenation of const char* (which is very common in practice).
  • Note that QStrings attempt UTF-8 conversion on input.
  • Slicing of QStrings is not optimized with midRef (even so, they show pretty good performance).
  • Note that bstring is a bit impaired by its C++ wrapper.
  • Don't be tempted to read this as a Python/Perl comparison: the Perl examples embed an interpreter (for memory management) whereas the Python ones do not. Perl scalars also have a larger scope than Python byte-strings.
  • The results are overall pretty reproducible, except maybe for the Cord library which shows some noticeable variance (a few tens of milliseconds).
  • For a feature comparison of string libraries, you can see Paul Hsieh's comparison table.
Patches, critiques and new libraries are welcome.

Saturday, July 16, 2011

Harvesting Python strings

This article explains how the C++ String Performance Benchmark strings were obtained.

First, Python 2.7.1 was built without threading support.

Then, the string object was hacked to log the method names and operands of the most common str operations, appending them to a single file.

The operations were printed raw (not escaped) and separated with a highly unlikely end-of-record marker (~ĥùëÔr~, chosen to fit Perl's $/). The strings themselves were quoted {[(<like this>)]}.

Then, the test suite was run. This quickly yielded a 3GB file, at which point I decided to stop. Here are the results:
This file was in turn parsed with a little Perl script that turned \0 characters into \1's and the end-of-record separators into \0's, gathering some statistics along the way and picking 1% of the strings at random for the actual benchmark (which slightly biases Equality testing...).

I guess I would have to run this on Django or something to get real-world strings, but well... that's enough for now ;).

You can find more details on the benchmark page.