nedmalloc Homepage


    nedmalloc is a VERY fast, VERY scalable, multithreaded memory allocator with little memory fragmentation. In real world code it is faster than Hoard, tcmalloc and ptmalloc2, and it scales with extra processing cores better than Hoard, tcmalloc, ptmalloc2 or ptmalloc3. Put another way, there is no faster portable memory allocator out there! Unlike some other allocators, it is written in C and so can be used almost anywhere, and it is released under the Boost Software License, which permits commercial use.

It has been tested on some very high end hardware with more than eight processing cores and more than 8GB of RAM. It is in daily use by some of the world's major banks, root DNS servers and multinational airlines, and in embedded consumer products. It also costs no money (though donations are welcome!). Thanks to work generously sponsored by Applied Research Associates, nedmalloc can patch itself into existing binaries to replace the system allocator on Windows. For example, Microsoft Word is noticeably quicker with very large documents after the nedmalloc DLL has been injected into it!

It is more than 125 times faster than the standard Win32 memory allocator, 4-10 times faster than the standard FreeBSD memory allocator and up to twice as fast as ptmalloc2, the standard Linux memory allocator. It can sustain a minimum of 7.3 million to 8.2 million malloc & free pair operations per second on an AMD Athlon64 3400+ (2.20GHz) machine.
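To get a rough malloc & free pairs-per-second figure on your own hardware, a single-threaded timing loop is a reasonable starting point. This is a minimal sketch, not the benchmark behind the numbers quoted above: `pairs_per_second` is a hypothetical helper, it measures CPU time with `clock()`, and it times whichever `malloc`/`free` the program is linked against, so swap in the allocator you want to test.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Writing each pointer to a volatile makes the allocation observable,
   so the compiler cannot optimise the malloc/free pair away. */
static void *volatile sink;

/* Hypothetical helper: single-threaded malloc+free pairs per second of
   CPU time for blocks of `size` bytes. */
double pairs_per_second(size_t iterations, size_t size) {
    assert(iterations > 0);
    clock_t start = clock();
    for (size_t i = 0; i < iterations; i++) {
        void *p = malloc(size);
        sink = p;
        free(p);
    }
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)                  /* timer too coarse to register the run */
        secs = 1.0 / CLOCKS_PER_SEC;
    return (double)iterations / secs;
}
```

A loop like this only exercises the single-threaded fast path; the scalability claims on this page concern many threads allocating concurrently, which needs one such loop per thread.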

It scales with extra CPUs far better than either the standard Win32 memory allocator or ptmalloc2, and it can cause significantly less memory bloating than ptmalloc2. It avoids processor serialisation (locking) entirely when the requested memory size is in the thread cache, which leads to the kind of scalability you can see in the graph on the right. In real world code:

                   Benchmark score (higher is better)   nedmalloc v1.02's improvement
Allocator          Memory Mapped   Packetised           Memory Mapped   Packetised
Win32 (default)    123.72          46.29                45.38%          54.03%
nedmalloc v1.02    179.87          71.3                 -               -
nedmalloc v1.01    172.47          67.9                 4.29%           5.01%
Win32 (low frag)   164.28          58.74                9.49%           21.38%
ptmalloc2          167.41          63.46                7.44%           12.35%
Hoard v3.4         167.4           64.65                7.45%           10.29%
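The improvement columns appear to be the ratio of nedmalloc v1.02's score to each other allocator's score, minus one. Writing S for a benchmark score from the table, the first row checks out as follows:

```latex
\text{improvement} = \left( \frac{S_{\text{nedmalloc v1.02}}}{S_{\text{other}}} - 1 \right) \times 100\%

\frac{179.87}{123.72} - 1 \approx 0.4538 \;\Rightarrow\; 45.38\% \quad \text{(Win32 default, Memory Mapped)}

\frac{71.3}{46.29} - 1 \approx 0.5403 \;\Rightarrow\; 54.03\% \quad \text{(Win32 default, Packetised)}
```

The other rows follow the same formula, e.g. 71.3 / 64.65 - 1 ≈ 10.29% for Hoard v3.4 on the Packetised benchmark.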

If you want an explanation of the difference between the Packetised and Memory Mapped benchmarks, please see the Tn homepage (but basically, the Packetised benchmark performs many more memory operations in a more heavily loaded multithreaded environment). As you can see above, the benefits of nedmalloc translate into real world code, with a 45-54% speed increase over the default Win32 allocator. The Tn speed test is very heavy on the memory bus, which limits how much any allocator can help, so applications that are less memory-bound can expect to see greater improvements than this.
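The lock-free thread cache fast path described earlier (no locking when the requested size is in the thread cache) can be illustrated with a deliberately simplified sketch. It is not nedmalloc's code: the names `tc_malloc`/`tc_free`, the 16-byte smallest bin, the per-block size header and the spinlock are all assumptions, and the system `malloc` stands in for the shared, locked backend (dlmalloc in nedmalloc's case).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TC_MIN_SHIFT 4    /* smallest bin: 16 bytes (assumed) */
#define TC_MAX_SHIFT 13   /* largest bin: 8KB, the default thread cache limit */
#define TC_BINS (TC_MAX_SHIFT - TC_MIN_SHIFT + 1)

/* Hypothetical per-block header recording the bin; the union keeps the
   user pointer that follows it suitably aligned. */
typedef union tc_header { size_t bin; max_align_t align; } tc_header;
typedef struct tc_node { struct tc_node *next; } tc_node;

static _Thread_local tc_node *tc_bins[TC_BINS];     /* per-thread free lists */
static atomic_flag backend_lock = ATOMIC_FLAG_INIT; /* guards the shared backend */

static void lock_backend(void) {
    while (atomic_flag_test_and_set_explicit(&backend_lock, memory_order_acquire)) { }
}
static void unlock_backend(void) {
    atomic_flag_clear_explicit(&backend_lock, memory_order_release);
}

/* Smallest power-of-two bin that fits `size`, or TC_BINS if it is too big. */
static size_t tc_bin_for(size_t size) {
    size_t bin = 0;
    while (bin < TC_BINS && ((size_t)1 << (bin + TC_MIN_SHIFT)) < size) bin++;
    return bin;
}

void *tc_malloc(size_t size) {
    size_t bin = tc_bin_for(size);
    if (bin < TC_BINS && tc_bins[bin] != NULL) {   /* fast path: no lock taken */
        tc_node *n = tc_bins[bin];
        tc_bins[bin] = n->next;
        return n;
    }
    if (size > SIZE_MAX - sizeof(tc_header)) return NULL;
    size_t payload = bin < TC_BINS ? ((size_t)1 << (bin + TC_MIN_SHIFT)) : size;
    lock_backend();                                /* slow path: serialised */
    tc_header *h = malloc(sizeof(tc_header) + payload);
    unlock_backend();
    if (h == NULL) return NULL;
    h->bin = bin;
    return h + 1;
}

void tc_free(void *p) {
    if (p == NULL) return;
    tc_header *h = (tc_header *)p - 1;
    assert(h->bin <= TC_BINS);                     /* header should be intact */
    if (h->bin < TC_BINS) {                        /* cacheable: push, no lock */
        tc_node *n = p;
        n->next = tc_bins[h->bin];
        tc_bins[h->bin] = n;
        return;
    }
    lock_backend();
    free(h);
    unlock_backend();
}
```

In this sketch, blocks parked in a thread's cache are never handed back to the backend; a real allocator also needs a way to trim the caches (the ChangeLog below mentions nedtrimthreadcache()).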

See below for a Frequently Asked Questions list. Below and to the right is a series of comparisons between nedmalloc, system allocators and a number of other replacement memory allocators such as tcmalloc and Hoard. The graphs below are for v1.00; they still give a good idea of performance on a wide variety of systems, but note that nedmalloc has become much faster in recent revisions (as you can see on the right).

To my knowledge, nedmalloc is the fastest portable memory allocator available.


Downloads:


ChangeLog (from SVN)

Current: v1.05 (svn 1078) of nedmalloc (80KB). Beta 1 of v1.06 (svn 1151) is also available. Changes in v1.06:

 E. ChangeLog:
-=-=-=-=-=-=-
v1.06 beta 1 13th January 2010:
* { 1079 } Fixed misdeclaration of struct mallinfo as C++ type. Thanks to James Mansion for reporting this.
* { 1082 } Fixed dlmalloc bug which caused header corruption in mmap() allocations when running under multiple threads.
* { 1088 } Fixed assertion failure for nedblksize() with latest dlmalloc. Thanks to Anteru for reporting this.
* { 1088 } Added neddestroysyspool(). Thanks to Lars Wehmeyer for suggesting this.
* { 1088 } Fixed thread id high bit set bug causing SIGABRT on Mac OS X. Thanks to Chris Dillman for reporting this.
* { 1094 } Integrated dlmalloc v2.8.4 final.
* { 1095 } Added nedtrimthreadcache(). Thanks to Hayim Hendeles for suggesting this.
* { 1095 } Fixed silly assertion of null pointer dereference. Thanks to Ullrich Heinemann for reporting this.
* { 1096 } Fixed lots of level 4 warnings on MSVC. Thanks to Anteru for suggesting this.
* { 1098 } Improved non-nedmalloc block detection to a 6.25% probability of being wrong. Thanks to Applied Research Associates for
sponsoring this.
* { 1099 } Added USE_MAGIC_HEADERS which allows nedmalloc to handle freeing a system allocated block. Added USE_ALLOCATOR which
allows the changing of which backend allocator to use (with choices between the system allocator and dlmalloc - choosing the system
allocator is intended for debug situations only e.g. valgrind). Thanks to Applied Research Associates for sponsoring this.
* { 1105 } Added ability to build nedmalloc as a DLL. Added support for a run time PE binary patcher which can patch all usage of
the system allocator replacing it with nedmalloc. Thanks to Applied Research Associates for sponsoring this.
* { 1108 } Added patcher loader which can load any arbitrary program injecting the nedmalloc DLL which then patches in its replacement
for the system allocator. Doesn't work on all programs, but does on most e.g. Microsoft Word. Thanks to Applied Research Associates
for sponsoring this.
* { 1116 } Finished debugging and optimising the latest additions to the codebase. The patcher now works well on x64 as well as x86.
Added support for large pages on Windows. Thanks to Applied Research Associates for sponsoring this.
* { 1125 } Added nedpoollist() which returns a snapshot of the nedpools currently in existence. The Windows DLL thread exit code now
disables the thread cache for all currently existing nedpools. Thanks to Applied Research Associates for sponsoring this.
* { 1126 } Added ENABLE_TOLERANT_NEDMALLOC which allows nedmalloc to recognise system allocator blocks and to do the right thing with them.
* { 1139 } Added link time code generation support for Windows builds. This currently gives no performance improvement on x64 (on
MSVC9) but can add 15% to x86 performance (on MSVC9). Also added SCons SConstruct and SConscript files.
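Several ChangeLog entries above (USE_MAGIC_HEADERS, ENABLE_TOLERANT_NEDMALLOC and the improved non-nedmalloc block detection) deal with telling nedmalloc's blocks apart from system allocator blocks when they are freed. nedmalloc does this by inspecting block headers, with a small probability of being wrong. The minimal sketch below uses a different, simpler technique, an address-range check against a private arena, purely to show the dispatch on free; `custom_alloc`, `is_custom_block` and `any_free` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SIZE ((size_t)1 << 16)   /* 64KB private arena (arbitrary size) */

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;              /* bump offset; not thread-safe */

/* Bump-allocate from the private arena; returns NULL once it is exhausted. */
void *custom_alloc(size_t size) {
    const size_t align = _Alignof(max_align_t);
    if (size == 0) size = 1;
    if (size > ARENA_SIZE - arena_used) return NULL;
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > ARENA_SIZE - arena_used) return NULL;
    void *p = arena + arena_used;
    arena_used += rounded;
    assert(arena_used <= ARENA_SIZE);
    return p;
}

/* True if p lies inside the arena, i.e. it came from custom_alloc(). */
int is_custom_block(const void *p) {
    uintptr_t u = (uintptr_t)p, lo = (uintptr_t)arena;
    return u >= lo && u - lo < ARENA_SIZE;
}

/* Free a block from either allocator by checking where it lives. */
void any_free(void *p) {
    if (p == NULL || is_custom_block(p))
        return;        /* arena blocks are reclaimed all at once in this sketch */
    free(p);           /* everything else came from the system allocator */
}
```

A range check like this is exact but needs the custom allocator to own a known address range; header-based detection avoids that constraint at the cost of the small error probability noted in the ChangeLog.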

Previous: v1.04 (svn 1040) of nedmalloc (80KB), v1.03 (76.4KB), v1.02 (76.3KB), v1.01 (71.9KB), v1.00 (69.7KB)

You can fetch nedmalloc from SVN, and a web view of the SVN repository is also available.

Frequently Asked Questions:

  1. When should I replace my memory allocator?
    If you want your program to run at the maximum possible speed, you should consider replacing your memory allocator. Switching your code to a new memory allocator is usually easy for most C and C++ projects, but can become tricky if you must maintain compatibility with your system allocator (you must tag each memory block so you can tell which blocks were allocated by the system allocator and which by your custom allocator).
     
  2. Is nedmalloc faster than all other memory allocators?
    No, there are faster ones, especially in specialised circumstances, e.g. Hoard. However, nedmalloc is an excellent general-purpose allocator and it is based on dlmalloc, one of the most tried and tested memory allocators available, as it forms the core of ptmalloc2, the standard Linux allocator. If you use nedmalloc, you will never be far from the best performing specialised allocator. As you might note in the real world benchmarks above, returns from allocator improvements diminish severely once allocators reach a certain performance range.
     
  3. How space-efficient is nedmalloc?
    dlmalloc does not fragment the memory space as much as other allocators, but it does have a sixteen or thirty-two byte minimum allocation with an eight or sixteen byte granularity, depending on whether the platform is 32-bit or 64-bit. nedmalloc's thread cache is a simple power-of-two allocator, which does cause bloating for items small enough to enter the thread cache (by default, 8KB or less), but in general this wastage across the entire program is small. You can configure nedmalloc to use finer-grained bins to quarter the average wastage, but this comes at a performance cost. When configured to permit only one memory space per thread, memory bloating is considerably less than that of ptmalloc2.
     
  4. Is tcmalloc better or worse than nedmalloc?
    As you can see in the graph above, nedmalloc is about equal to tcmalloc for thread-cache-only operations and substantially beats it for non-thread-cache operations. nedmalloc is also written in C rather than C++, and v0.5 of tcmalloc only works on Unix systems, not on Win32.
     
  5. Is Hoard better or worse than nedmalloc?
    As of v1.01, nedmalloc is close enough to Hoard to make little difference in real world code (see the real world benchmarks above). nedmalloc's synthetic test seems to trigger a bug in Hoard that causes dismal performance; however, I trust Hoard's author and its design enough to say that Hoard may be slightly faster in certain circumstances, e.g. if code allocates a large block in one thread and frees it in another. Note, though, that Hoard is licensed under the GPL unless you pay for a commercial licence, whereas nedmalloc's Boost Software License permits commercial use at no cost.
     
  6. Is ptmalloc3 better or worse than nedmalloc?
    ptmalloc3 is the successor to ptmalloc2 and, like nedmalloc, is based on a newer dlmalloc. ptmalloc3 currently outperforms nedmalloc for a low number of threads, especially on uniprocessor hardware, but on dual-processor hardware and above, or with a lot of threads, nedmalloc is faster. nedmalloc also runs fine on Windows, whereas ptmalloc3 would (to my knowledge) require extra support code.
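The space-efficiency answer (question 3) says the thread cache rounds requests up to a power of two and that finer-grained bins can quarter the wastage. The sketch below shows both rounding rules side by side. The function names and the 8KB limit constant are hypothetical, and splitting each power-of-two range into four equal sub-bins is an assumed scheme, not necessarily how nedmalloc's finer-grained option is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LIMIT_BYTES 8192   /* hypothetical name for the default 8KB limit */

/* Power-of-two binning: round a cacheable request up to the next power of two. */
size_t round_pow2(size_t size) {
    assert(size <= CACHE_LIMIT_BYTES);   /* only thread-cache sizes reach here */
    size_t r = 1;
    while (r < size) r <<= 1;
    return r;
}

/* Assumed finer-grained binning: split each range (p/2, p] into four
   equal sub-bins of width p/8 and round up to the next sub-bin boundary. */
size_t round_quarter(size_t size) {
    size_t p = round_pow2(size);
    size_t step = p >= 8 ? p / 8 : 1;
    return (size + step - 1) / step * step;
}
```

For a 100-byte request, power-of-two binning hands out 128 bytes (28 wasted) while the quarter-width bins hand out 112 (12 wasted). In the worst case the waste drops from nearly half a power-of-two range to under an eighth of it, at the price of four times as many bins to manage.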

Contact the webmaster: Niall Douglas @ webmaster2<at symbol>nedprod.com (Last updated: 13 January 2010 14:27:34 -0000)