nedmalloc Homepage

This page has been accessed 16,032 times since the 1st January 2006.

 

View this page in: flag
English
Any language:
flag
Chinese
flag
French
flag
German
flag
Japanese
flag
Portuguese
flag
Russian
flag
Spanish

Translation to non-English languages provided by Google Language

 

 

   
nedmalloc is a VERY fast, VERY scalable, multithreaded memory allocator with little memory fragmentation. It is faster in real world code than Hoard, faster than tcmalloc, faster than ptmalloc2 and it scales with extra processing cores better than Hoard, better than tcmalloc and better than ptmalloc2 or ptmalloc3. Put another way, there is no faster portable memory allocator out there! Unlike other allocators, it is written in C and so can be used anywhere and it also comes under the Boost software license which permits commercial usage.

It has been tested on some very high end hardware with more than eight processing cores and more than 8Gb of RAM. It is in daily use by some of the world's major banks, root DNS servers, multinational airlines and consumer products (embedded). It also costs no money (though donations are welcome!)

It is more than 125 times faster than the standard Win32 memory allocator, 4-10 times faster than the standard FreeBSD memory allocator and up to twice as fast as ptmalloc2, the standard Linux memory allocator. It can sustain a minimum of between 7.3m and 8.2m malloc & free pair operations per second on a 3400 (2.20Ghz) AMD Athlon64 machine.

It scales with extra CPU's far better than either the standard Win32 memory allocator or ptmalloc2 and can cause significantly less memory bloating than ptmalloc2. It avoids processor serialisation (locking) entirely when the requested memory size is in the thread cache leading to the kind of scalability you can see in the graph on the right. In real world code:

  Memory Mapped   Packetised         nedmalloc's Improvement
Win32 (default) 123.72 46.29  45.38% 54.03%
nedmalloc v1.02 179.87 71.3 - -
nedmalloc v1.01 172.47 67.9 4.29% 5.01%
Win32 (low frag) 164.28 58.74 9.49% 21.38%
ptmalloc2 167.41 63.46 7.44% 12.35%
Hoard v3.4 167.4 64.65 7.45% 10.29%

If you want an explanation of the difference between the Packetised and Memory Mapped benchmarks, please see the Tn homepage (but basically, the Packetised involves performing a lot more memory ops in a more loaded multithreaded environment). As you can see above, the benefits of nedmalloc translate into real world code with more than a 50% speed increase over the default win32 allocator. The Tn speed test is very heavy on the memory bus, so you can expect your own applications to see greater improvements than this.

See below for a Frequently Asked Questions list. Below and to the right is a series of comparisons between nedmalloc, system allocators and a number of other replacement memory allocators such as tcmalloc and Hoard. The graphs below are for v1.00 but are still good for an idea of performance on a wide variety of systems, but note than nedmalloc has become much faster in recent revisions (as you can see on the right).

To my knowledge, nedmalloc is the fastest portable memory allocator available.

 

 

Downloads:

SourceForge.net Logo

ChangeLog (from SVN)

Current: v1.05 (svn 1078) of nedmalloc (80Kb)

 * { 1042 } Added error check for TLSSET() and TLSFREE() macros. Thanks to
Markus Elfring for reporting this.
 * { 1043 } Fixed a segfault when freeing memory allocated using
nedindependent_comalloc(). Thanks to Pavel Vozenilek for reporting this.

Previous: v1.04 (svn 1040) of nedmalloc (80Kb) v1.03 of nedmalloc (76.4Kb) v1.02 of nedmalloc (76.3Kb) v1.01 of nedmalloc (71.9Kb) v1.00 of nedmalloc (69.7Kb)

You can fetch nedmalloc from SVN here.

Frequently Asked Questions:

  1. When should I replace my memory allocator?
    If you want your program to run at the maximum possible speed, you should consider replacing your memory allocator. Fixing up your code to use a new memory allocator is usually easy for most C and C++ projects, but can become tricky if you must maintain compatibility with your system allocator (you must tag each memory block so you can discern between what has been allocated by the system and your custom allocator).
     
  2. Is nedmalloc faster than all other memory allocators?
    No, there are faster ones, especially for specialised circumstances eg; Hoard. However, nedmalloc is an excellent general-purpose allocator and it is based on dlmalloc, one of the most tried & tested memory allocators available as it is the core allocator in Linux. If you use nedmalloc, you will never be far from the best performing specialised allocator. As you might note in the real world benchmarks above, you get severely diminishing returns to allocator improvement once they get into a certain performance range.
     
  3. How space-efficient is nedmalloc?
    dlmalloc does not fragment the memory space as much as other allocators, but it does have a sixteen or thirty-two byte minimum allocation with an eight or sixteen byte granularity. nedmalloc's thread cache is a simple two power allocator which does cause bloating for items small enough to enter the thread cache (by default, 8Kb or less) but in general, this wastage across the entire program is small. You can configure nedmalloc to use finer grained bins to quarter the average wastage but this comes at a performance cost. When configured to only permit one memory space per thread, memory bloating is considerably less than that of ptmalloc2.
     
  4. Is tcmalloc better or worse than nedmalloc?
    As you can see in the graph above, nedmalloc is about equal to tcmalloc for threadcache-only ops and substantially beats it for non-threadcache ops. nedmalloc is also written in C rather than C++ and v0.5 of tcmalloc only works on Unix systems and not win32.
     
  5. Is Hoard better or worse than nedmalloc?
    As of v1.01, nedmalloc is close enough to Hoard to make little difference in real world code (see real world benchmarks above). nedmalloc's synthetic test seems to trigger a bug in Hoard causing dismal performance, however I trust its author and its design enough to say that Hoard may be slightly faster in certain circumstances eg; if code allocates a large block in one thread and frees it in another. However, Hoard is licensed under the GPL unless you pay which is not the case with nedmalloc.
     
  6. Is ptmalloc3 better or worse than nedmalloc?
    ptmalloc3 is also a new implementation of ptmalloc2 and is also based on a newer dlmalloc. ptmalloc3 currently outperforms nedmalloc for a low number of threads especially on uniprocessor hardware, but on dual processor and above or with a lot of threads nedmalloc is faster. nedmalloc also runs fine on Windows whereas ptmalloc3 would (to my knowledge) require extra support code.

Contact the webmaster: Niall Douglas @ webmaster2<at symbol>nedprod.com (Last updated: 15 June 2008 20:50:43 +0100)