Wednesday, November 26, 2014

C++ Myths Debunking (part 1)

It is well known that small object allocation in C++ is slow. Here is a quote from Andrei Alexandrescu's book “Modern C++ Design”:
For occult reasons, the default allocator is notoriously slow. A possible reason is that it is usually implemented as a thin wrapper around the C heap allocator (malloc/realloc/free). The C heap allocator is not focused on optimizing small chunk allocations.

In addition to being slow, the genericity of the default C++ allocator makes it very space inefficient for small objects. The default allocator manages a pool of memory, and such management often requires some extra memory. Usually, the bookkeeping memory amounts to a few extra bytes (4 to 32) for each block allocated with new. If you allocate 1024-byte blocks, the per-block space overhead is insignificant (0.4% to 3%). If you allocate 8-byte objects, the per-object overhead becomes 50% to 400%, a figure big enough to make you worry if you allocate many such small objects.
The book states that memory allocation is slow and, moreover, that allocating small objects with malloc and new can waste a lot of memory. As far as I can tell, this is common knowledge among C++ programmers. Many of us believe that fancy allocators and manual memory management are a Good Thing. Maybe this was true when the book was first released (more than ten years ago), but not now! Let’s check the facts.
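The overhead percentages in the quote are plain arithmetic: bookkeeping bytes divided by requested bytes. A minimal sketch that reproduces them (the function name overhead_percent is mine, not from the book):

```cpp
#include <cstddef>

// Relative bookkeeping overhead, in percent, for a single allocated block.
// header_bytes:  per-block bookkeeping (4 to 32 bytes in the quoted text).
// payload_bytes: the size the caller actually asked for.
constexpr double overhead_percent(std::size_t header_bytes,
                                  std::size_t payload_bytes) {
    return 100.0 * static_cast<double>(header_bytes) /
           static_cast<double>(payload_bytes);
}

// The figures from the quote, checked at compile time.
static_assert(overhead_percent(4, 8) == 50.0, "4-byte header, 8-byte object");
static_assert(overhead_percent(32, 8) == 400.0, "32-byte header, 8-byte object");
static_assert(overhead_percent(4, 1024) < 0.4, "roughly 0.39% for 1024 bytes");
static_assert(overhead_percent(32, 1024) == 3.125, "roughly 3% for 1024 bytes");
```

So the book's numbers are right for an allocator that really spends 4 to 32 bytes per block. The question is whether a modern allocator still does that for small sizes.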

I’ve created this gist to show the state of things - link. This code allocates one million small objects using simple segregated storage (boost.pool), frees the memory, and then allocates another million small objects using jemalloc. Time and memory usage are tracked. The result may be surprising - link.
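For readers who want the general shape of such a measurement, here is a rough sketch. It is not the code from the gist. FixedPool is a hypothetical hand-written stand-in for boost.pool, time_alloc_free_ms is a hypothetical helper, and the chunk counts are arbitrary assumptions.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical fixed-size pool in the spirit of simple segregated storage:
// big blocks are carved into equal chunks that are linked into a free list.
class FixedPool {
public:
    explicit FixedPool(std::size_t chunk_size,
                       std::size_t chunks_per_block = 4096)
        : chunk_size_(round_up(chunk_size)),
          chunks_per_block_(chunks_per_block) {}

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    ~FixedPool() {
        for (void* block : blocks_) std::free(block);
    }

    // Pops one chunk off the free list, growing the pool when it is empty.
    void* allocate() {
        if (free_list_ == nullptr) grow();
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Pushes a chunk back onto the free list (LIFO reuse).
    void deallocate(void* p) {
        free_list_ = new (p) Node{free_list_};
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    struct Node { Node* next; };

    // A chunk must be able to hold a free-list pointer and keep its alignment.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = sizeof(Node);
        if (n < a) n = a;
        return (n + a - 1) / a * a;
    }

    void grow() {
        blocks_.reserve(blocks_.size() + 1);  // so push_back below cannot throw
        char* block =
            static_cast<char*>(std::malloc(chunk_size_ * chunks_per_block_));
        if (block == nullptr) throw std::bad_alloc();
        blocks_.push_back(block);
        // Link chunks in reverse so they are handed out in address order.
        for (std::size_t i = chunks_per_block_; i-- > 0;) {
            deallocate(block + i * chunk_size_);
        }
    }

    std::size_t chunk_size_;
    std::size_t chunks_per_block_;
    Node* free_list_ = nullptr;
    std::vector<void*> blocks_;
};

// Times n allocations followed by n deallocations, in milliseconds.
template <typename Alloc, typename Release>
double time_alloc_free_ms(std::size_t n, Alloc alloc, Release release) {
    std::vector<void*> ptrs(n);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) ptrs[i] = alloc();
    for (std::size_t i = 0; i < n; ++i) release(ptrs[i]);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare the two, call time_alloc_free_ms once with lambdas that forward to pool.allocate() and pool.deallocate(p), and once with lambdas that call std::malloc(16) and std::free(p). The ratio depends heavily on the machine, the compiler flags and the malloc implementation that is linked in, so treat any single run as an indication only.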

First, malloc is slower than the memory pool, but not drastically. On my machine it is five times slower than the memory pool when deallocation time is taken into account and only three times slower when it is not (this is relevant for some applications). Second, using jemalloc to allocate memory for small objects actually saves some space! The memory pool used 20 MB of RAM, while jemalloc managed to fit everything into 16 MB.

This isn’t surprising, because jemalloc implements simple segregated storage under the hood. It manages memory better than most of the fancy handwritten memory allocation schemes on Earth. It is a better option than a custom allocator most of the time because it’s stable, fast, and can give you some feedback. A custom allocation scheme can beat it in synthetic tests, but not in practice.

And finally, you can always switch between allocators (jemalloc, tcmalloc, whatever) using LD_PRELOAD. This is not the case with custom hand-coded allocators: if you make a mistake, you can’t fix it without rewriting your code.
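To illustrate the point, here is a small sketch of code that stays allocator-agnostic. Point and sum_of_small_objects are made-up names, and the library path in the comment is a placeholder. It assumes a dynamically linked Linux binary whose operator new ends up in malloc.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical small object, used only for illustration.
struct Point {
    int x;
    int y;
};

// Allocates n small objects through plain operator new.
// Nothing here names an allocator, so the one actually used is decided
// when the program is linked or loaded, not in the source code.
long long sum_of_small_objects(std::size_t n) {
    std::vector<std::unique_ptr<Point>> points;
    points.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        points.push_back(
            std::unique_ptr<Point>(new Point{static_cast<int>(i), 1}));
    }
    long long sum = 0;
    for (const auto& p : points) sum += p->x + p->y;
    return sum;
}

// Assumed usage on Linux (the .so path is a placeholder):
//   ./app                                       runs with the system malloc
//   LD_PRELOAD=/path/to/libjemalloc.so ./app    same binary, jemalloc
```

The same binary can be measured under several allocators without recompiling, which is exactly what a pool baked into the class hierarchy does not allow.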
