nedalloc v1.06 ?:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
by Niall Douglas (http://www.nedprod.com/programs/portable/nedmalloc/)
Enclosed is nedalloc, an alternative malloc implementation for multiple
threads without lock contention based on dlmalloc v2.8.4. It is more
or less a newer implementation of ptmalloc2, the standard allocator in
Linux (which is based on dlmalloc v2.7.0) but also contains a per-thread
cache for maximum CPU scalability.
It is licensed under the Boost Software License which basically means
you can do anything you like with it. This does not apply to the malloc.c.h
file which remains copyright to others.
It has been tested on win32 (x86), win64 (x64), Linux (x64), FreeBSD (x64)
and Apple Mac OS X (x86). It works very well on all of these and is very
significantly faster than the system allocator on Windows and FreeBSD. If
you are using a recent Apple Mac OS X then you probably won't see much
improvement (and kudos to Apple for adopting an excellent allocator).
By literally dropping in this allocator as a replacement for your system
allocator, you can see real world improvements of up to three times in normal
code!
Table of Contents:
A: How to use
B: Notes
C: Speed Comparisons
D: Troubleshooting
E: Changelog
A. To use:
-=-=-=-=-=
Drop nedmalloc.h, nedmalloc.c and malloc.c.h into your project.
Configure using the instructions in nedmalloc.h. Make sure that you call
neddisablethreadcache() for every pool you use on thread exit, and don't
forget neddisablethreadcache(0) for the system pool if necessary. Run and
enjoy!
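
As a rough illustration of the steps above, the sketch below shows the body
of a worker thread that allocates through nedmalloc and releases its
system-pool thread cache before returning. USE_NEDMALLOC, xmalloc, xfree,
xthread_done and worker_body are hypothetical names introduced here and are
not part of nedmalloc. Without USE_NEDMALLOC the code falls back to the
system allocator so that it builds standalone. test.c remains the
authoritative usage example.

```c
#include <stdlib.h>
#include <string.h>

#ifdef USE_NEDMALLOC            /* hypothetical build switch, not from nedmalloc */
#include "nedmalloc.h"
#define xmalloc(n)      nedmalloc(n)
#define xfree(p)        nedfree(p)
/* 0 selects the system pool, as described above */
#define xthread_done()  neddisablethreadcache(0)
#else
#define xmalloc(n)      malloc(n)
#define xfree(p)        free(p)
#define xthread_done()  ((void)0)
#endif

/* Allocate n bytes (n > 0), touch them, free them, then release this
   thread's cache. Returns 0 on success, -1 on failure. */
int worker_body(size_t n)
{
    unsigned char *buf;
    int ok;

    if (n == 0)
        return -1;
    buf = xmalloc(n);
    if (buf == NULL) {
        xthread_done();
        return -1;
    }
    memset(buf, 0xAB, n);
    ok = (buf[0] == 0xAB && buf[n - 1] == 0xAB);
    xfree(buf);
    xthread_done();             /* last allocator-related call before thread exit */
    return ok ? 0 : -1;
}
```

If you use pools of your own, the same final step would presumably be
repeated once per pool before the thread exits.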
To test, compile test.c. It will run a comparison between your system
allocator and nedalloc and tell you how much faster nedalloc is. It also
serves as an example of usage.
If you'd like nedmalloc as a Windows DLL or ELF shared object, the easiest
thing to do is to use scons (http://www.scons.org/) to build nedmalloc (or
use the enclosed MSVC project files).
Windows-only features:
-=-=-=-=-=-=-=-=-=-=-=
If you are running on Windows, there are quite a few extra options available
thanks to work generously sponsored by Applied Research Associates (USA).
Firstly you can build nedmalloc as a DLL and link that into your application
- this has the particular advantage that the DLL can trap thread exits in
your application and therefore call neddisablethreadcache() on all currently
existing nedpools for you.
If you define REPLACE_SYSTEM_ALLOCATOR when building the DLL, the DLL
replaces most usage of the MSVCRT allocator within any process it is
loaded into with nedmalloc's routines, whilst remaining able to handle the
odd free() of an MSVCRT block allocated during CRT init. This conveniently
allows you to simply link with the nedmalloc DLL and have your application
use it with no code changes required. The following code is suggested:
#pragma comment(lib, "nedmalloc.lib")
This auto-patching feature can also be combined with Microsoft's Detours
(http://research.microsoft.com/en-us/projects/detours/) to run any
arbitrary application using nedmalloc instead of the system allocator:
withdll /d:nedmalloc.dll program.exe
For those not able to use Microsoft Detours, there is an enclosed
nedmalloc_loader program which does one variant of the same thing. It may
or may not be useful to you - it is not intended to be maintained.
When building the nedmalloc DLL for the purposes of DLL insertion, you NEED
to match MSVCRT versions or you will have a CRT heap conflict. In other words,
if the program using nedmalloc is linked against MSVCRTD, then so must be
nedmalloc or vice versa. As a result of this issue, by default nedmalloc
ALWAYS LINKS TO MSVCRT EVEN IN DEBUG BUILDS unless configured otherwise.
This allows problem-free usage with release build applications which is
where nedmalloc tends to be most commonly deployed.
Lastly for some applications defining ENABLE_LARGE_PAGES can give a 10-15%
performance increase by having nedmalloc allocate using large pages only.
Large pages take much less space in the TLB cache and can greatly benefit
programs with a large working set. For this to work, your computer must
have the "Lock pages in memory" local security setting enabled for the
process' user as well as be running on Windows Server 2003/Vista or later.
If you are using the DLL then the DLL attempts to enable the
SeLockMemoryPrivilege during initialisation - therefore if you are not
using the DLL you will have to do this manually yourself.
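
For the manual case, the sketch below shows one common way to enable
SeLockMemoryPrivilege with the documented Win32 token APIs. It is not taken
from the nedmalloc DLL source; the function name
enable_lock_memory_privilege is made up for this example, and the
non-Windows branch exists only so the file compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>

/* Returns 1 if SeLockMemoryPrivilege was enabled for this process, else 0. */
int enable_lock_memory_privilege(void)
{
    HANDLE token;
    TOKEN_PRIVILEGES tp;
    BOOL adjusted;
    DWORD err;

    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return 0;

    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME,
                              &tp.Privileges[0].Luid)) {
        CloseHandle(token);
        return 0;
    }

    /* AdjustTokenPrivileges can return TRUE without granting anything,
       so GetLastError() has to be checked as well. */
    adjusted = AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
    err = GetLastError();
    CloseHandle(token);
    return (adjusted && err == ERROR_SUCCESS) ? 1 : 0;
}
#else
/* This privilege is a Windows concept; there is nothing to enable here. */
int enable_lock_memory_privilege(void)
{
    return 0;
}
#endif
```

On Windows this needs to be linked against advapi32.lib and would be called
once, before the first allocation. It can only succeed if the account
already holds the "Lock pages in memory" right described above.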
B. Notes:
-=-=-=-=-
If you want the very latest version of this allocator, get it from the
TnFOX SVN repository at svn://svn.berlios.de/viewcvs/tnfox/trunk/src/nedmalloc
Because of how nedalloc allocates an mspace per thread, it can cause
severe bloating of memory usage under certain allocation patterns.
You can substantially reduce this wastage by setting MAXTHREADSINPOOL
or the threads parameter to nedcreatepool() to a fraction of the number of
threads which would normally be in a pool at once. This will reduce
bloating at the cost of an increase in lock contention. If allocated size
is less than THREADCACHEMAX, locking is avoided 90-99% of the time and
if most of your allocations are below this value, you can safely set
MAXTHREADSINPOOL to one.
You will suffer memory leakage unless you call neddisablethreadcache()
per pool for every thread which exits. This is because nedalloc cannot
portably know when a thread exits and thus when its thread cache can
be returned for use by other code. Don't forget pool zero, the system pool.
This of course is not an issue if you use nedmalloc as a DLL on Windows.
On some POSIX threads implementations there exists a pthread_atexit() which
registers a termination handler for thread exit - if you don't have one of
these then you'll have to do it manually.
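
One portable way to do it manually is the destructor argument of the
standard pthread_key_create(), which POSIX runs at thread exit for any
thread holding a non-NULL value for the key. The sketch below is
illustrative only: set_thread_exit_hook, arm_thread_exit_hook and the demo
names are invented for this example. In real use the hook would call
neddisablethreadcache(0) and then neddisablethreadcache() for each of your
own pools.

```c
#include <pthread.h>
#include <stddef.h>

typedef void (*thread_exit_hook)(void);

static pthread_key_t hook_key;
static pthread_once_t hook_key_once = PTHREAD_ONCE_INIT;
static int hook_key_error;
static thread_exit_hook exit_hook;

/* Key destructor: called at thread exit for threads that armed the key. */
static void run_exit_hook(void *unused)
{
    (void)unused;
    if (exit_hook != NULL)
        exit_hook();
}

static void create_hook_key(void)
{
    hook_key_error = pthread_key_create(&hook_key, run_exit_hook);
}

/* Call once, before starting any threads. */
void set_thread_exit_hook(thread_exit_hook hook)
{
    exit_hook = hook;
}

/* Call at the start of each thread. Returns 0 on success, -1 on failure. */
int arm_thread_exit_hook(void)
{
    if (pthread_once(&hook_key_once, create_hook_key) != 0 ||
        hook_key_error != 0)
        return -1;
    /* Any non-NULL value makes the destructor fire for this thread. */
    return pthread_setspecific(hook_key, &hook_key) == 0 ? 0 : -1;
}

/* Demo: a counting hook stands in for the neddisablethreadcache() calls. */
static int demo_hook_calls;

static void demo_hook(void)
{
    demo_hook_calls++;
}

static void *demo_worker(void *arg)
{
    (void)arg;
    arm_thread_exit_hook();
    return NULL;                /* returning here triggers the destructor */
}

/* Runs one worker thread and returns how often the hook fired, or -1. */
int run_demo(void)
{
    pthread_t t;

    set_thread_exit_hook(demo_hook);
    if (pthread_create(&t, NULL, demo_worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return demo_hook_calls;
}
```

Key destructors run when a thread returns from its start routine or calls
pthread_exit(). They do not run when the whole process ends via exit(), so
the main thread still needs an explicit call if that matters to you.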
Equally if you use nedmalloc from a dynamically loaded DLL or shared object
which you later kick out of memory, you will leak memory if you don't disable
all thread caches for all pools (as per the preceding paragraph), destroy all
thread pools using neddestroypool() and destroy the system pool using
neddestroysyspool().
For C++ type allocation patterns (where the same sizes of memory are
regularly allocated and deallocated as objects are created and destroyed),
the threadcache always benefits performance. If however your allocation
patterns are different, searching the threadcache may significantly slow
down your code - as a rule of thumb, if cache utilisation is below 80%
(see the source for neddisablethreadcache() for how to enable debug
printing in release mode) then you should disable the thread cache for
that thread. You can compile out the threadcache code by setting
THREADCACHEMAX to zero.
C. Speed comparisons:
-=-=-=-=-=-=-=-=-=-=-
See Benchmarks.xls for details.
The enclosed test.c can do two things: it can be a torture test or a speed
test. The speed test is designed to be a representative synthetic
memory allocator test. It works by randomly mixing allocations with frees
with half of the allocation sizes being a two power multiple less than
512 bytes (to mimic C++ stack instantiated objects) and the other half
being a simple random value less than 16KB.
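
The size mix described above could be generated along the following lines.
This is a sketch and not the code from test.c: it reads "two power multiple
less than 512 bytes" as a power of two below 512, which is an assumption,
and the function names are invented for this example.

```c
#include <stdlib.h>
#include <stddef.h>

/* A power of two from 1 to 256, i.e. always below 512. */
size_t small_pow2_size(void)
{
    return (size_t)1 << (rand() % 9);
}

/* A plain random size from 1 to 16383, i.e. always below 16KB. */
size_t random_size(void)
{
    return (size_t)(1 + rand() % 16383);
}

/* Picks one of the two distributions with roughly equal probability. */
size_t next_alloc_size(void)
{
    return (rand() % 2) ? small_pow2_size() : random_size();
}
```

Seeding with srand() before the first call makes a run repeatable, which
helps when comparing two allocators on the same sequence of sizes.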
The real world code results are from Tn's TestIO benchmark. This is a
heavily multithreaded and memory intensive benchmark with a lot of branching
and other stuff modern processors don't like so much. As you'll note, the
test doesn't show the benefits of the threadcache mostly due to the saturation
of the memory bus being the limiting factor.
D. Troubleshooting:
-=-=-=-=-==-=-=-=-=
I get quite a few bug reports about code not working properly under nedmalloc.
I do not wish to sound presumptuous; however, in the overwhelming majority of cases the
problem is in your application code and not nedmalloc (see below for all th