A Simple Tool to Test malloc Performance

I was porting this very interesting feature to an older Glibc version for ARMv7 and needed a simple tool to gauge any performance boost (or drop, for that matter). Surprisingly little information is available on this topic.

I didn't search hard, but most papers on malloc performance date back many years. Other benchmark suites are too heavy for my simple task. I found some smaller tools, but they lack either portability or a credible test methodology.

I learned that Glibc's dev team maintains a set of very well-engineered benchtests for their benchmarking and regression testing. And this person has extracted the necessary bits from benchtests and made them compile standalone on x86 Linux.

I prefer my tool to be minimal and simple, so what I really need is the Glibc team's test routine. With a little tinkering, I cross-compiled it for ARMv7 Entware as well as for the stock firmware's uClibc. Then I wrote a script to direct the tests and drive the binaries.

So here is the extremely simple package, 'bench-malloc', for the RT-AC56U and compatible machines.

About the Tests

The tool exercises the memory allocator in the system's LIBC library under a simulated multi-threaded application workload and reports its performance.

The included script tests with 1, 2, 4, 8 and 16 threads. Going beyond 16 threads may be crash-prone on small systems. Each test lasts one minute. The thread(s) perform as many 'malloc/free' pairs as possible within the minute. Each 'malloc' requests a random size between 4 and 32768 bytes.
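To make the shape of such a test concrete, here is a minimal sketch of what one worker thread's loop could look like. It is not the Glibc benchtests code: the names `now_ns`, `next_rand`, `malloc_loop` and the `BATCH` constant are hypothetical, and sizes are drawn uniformly for brevity. Only the 4 to 32768 byte bounds come from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define MIN_SIZE 4u       /* lower bound quoted in the text */
#define MAX_SIZE 32768u   /* upper bound quoted in the text */
#define BATCH    1000     /* hypothetical: pairs between two clock reads */

/* Monotonic clock reading in nanoseconds. */
long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Tiny linear congruential generator so each worker owns its state.
   Returns a value in 0..32767. */
unsigned int next_rand(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Run malloc/free pairs for at least duration_ns nanoseconds.
   Returns the number of pairs completed and stores the time
   actually spent in *elapsed_ns. */
unsigned long long malloc_loop(long long duration_ns, unsigned int seed,
                               long long *elapsed_ns)
{
    unsigned long long ops = 0;
    long long start = now_ns();
    long long now = start;

    assert(elapsed_ns != NULL);
    while (now - start < duration_ns) {
        /* Read the clock once per batch to keep timing overhead low. */
        for (int i = 0; i < BATCH; i++) {
            size_t size = MIN_SIZE + next_rand(&seed) % (MAX_SIZE - MIN_SIZE + 1u);
            volatile char *p = malloc(size);
            if (p != NULL)
                p[0] = 1;   /* touch the block so the pair is not optimised away */
            free((void *)p);
            ops++;
        }
        now = now_ns();
    }
    *elapsed_ns = now - start;
    return ops;
}
```

A multi-threaded run would start this loop in each of the N threads with a different seed and collect every thread's pair count and elapsed time.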

The probability distribution of the random sizes seems to favor small values. Hence, the per-thread cache feature is expected to perform very well under such a workload. Small allocations are also typical of real-world applications.
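One way to get sizes that favor small values is inverse-transform sampling of a density that falls off as 1/x^2 between the two bounds. The sketch below illustrates that idea only. Whether the tool uses exactly this density is an assumption, and `skewed_size` is a hypothetical name.

```c
#include <assert.h>

#define MIN_SIZE 4.0
#define MAX_SIZE 32768.0

/* Map a uniform r in [0, 1] to a size in [MIN_SIZE, MAX_SIZE] whose
   density is proportional to 1/x^2 (inverse-transform sampling). */
unsigned int skewed_size(double r)
{
    assert(r >= 0.0 && r <= 1.0);
    double inv_min = 1.0 / MIN_SIZE;
    double inv_max = 1.0 / MAX_SIZE;
    /* Inverting the CDF of 1/x^2 makes 1/x linear in r. */
    double inv = inv_min + r * (inv_max - inv_min);
    return (unsigned int)(1.0 / inv);
}
```

Under this assumed density, `skewed_size(0.5)` already falls just below 8 bytes, so half of all requests would be tiny. That is the kind of workload where a small per-thread cache gets hit most of the time.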

At the end of each test, the actual time taken (by all threads) is divided by the total number of 'malloc/free' pairs performed by all the threads.

The script presents this number as "per malloc(ns)", which is the average time one 'malloc' takes in nanoseconds. A lower "per malloc(ns)" indicates better performance!
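The arithmetic behind "per malloc(ns)" can be sketched as follows. I read "time taken (by all threads)" as the sum of each thread's own elapsed time, which is an assumption. The `thread_result` struct and its field names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread result record. */
struct thread_result {
    double elapsed_ns;        /* time this thread spent in its loop */
    unsigned long long ops;   /* malloc/free pairs it completed */
};

/* Average cost of one malloc/free pair across all threads, in ns. */
double per_malloc_ns(const struct thread_result *res, size_t n)
{
    double total_ns = 0.0;
    unsigned long long total_ops = 0;

    for (size_t i = 0; i < n; i++) {
        total_ns += res[i].elapsed_ns;
        total_ops += res[i].ops;
    }
    assert(total_ops > 0);   /* guard against dividing by zero */
    return total_ns / (double)total_ops;
}
```

As a sanity check on the scale: a single thread that runs for exactly 60 s and reports 284.2 ns would have completed roughly 211 million pairs.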

Test Runs

Hardware: 1.2GHz Cortex-A9 ASUS RT-AC56U

Stock firmware (uClibc 0.9.33.2)    Stock Entware (Glibc 2.23)
# th  per malloc(ns)  max rss(kB)   # th  per malloc(ns)  max rss(kB)
1     534.4           752           1     284.2           744
2     3156.3          756           2     596.0           872
4     6924.8          812           4     1203.6          1180
8     14555.6         1136          8     2568.0          1628
16    32240.9         1744          16    5418.9          2240

The per-thread cache patch for Entware's LIBC is still under test. Below are some preliminary numbers!

Entware Glibc 2.23 w/ per-thread cache
# th    per malloc(ns)  max rss(kB)
1       158.9           756
2       315.4           956
4       646.9           1324
8       1415.4          1892
16      3131.5          2812

Hardware: 1.8GHz Cortex-A53 ASUS RT-AC86U

Stock firmware (Glibc 2.22)         Stock Entware (Glibc 2.27)
# th  per malloc(ns)  max rss(kB)   # th  per malloc(ns)  max rss(kB)
1     201.0           2108          1     107.6           1976
2     414.7           2120          2     221.9           1984
4     826.5           2124          4     444.4           1988
8     1687.5          2124          8     894.2           3084
16    3460.5          2584          16    1805.2          4816

Test performed by SNBforum member Asad Ali. Entware uses a newer Glibc version for armv8 devices. I believe the per-thread cache feature is already in that Glibc. Hence, it outperforms the stock firmware, whose Glibc 2.22 doesn't have this feature.
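Upstream Glibc first shipped its per-thread cache (tcache) in release 2.26, so a quick version comparison gives a hint about which builds should have it. The helper below is a hypothetical sketch. A matching version alone does not prove the feature is enabled, since a build or a runtime tunable can turn it off.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a "major.minor" Glibc version string is 2.26 or newer,
   0 otherwise (including strings that do not parse). */
int version_has_tcache(const char *ver)
{
    int major = 0, minor = 0;

    assert(ver != NULL);
    if (sscanf(ver, "%d.%d", &major, &minor) != 2)
        return 0;
    return major > 2 || (major == 2 && minor >= 26);
}
```

On a Glibc system the string can be obtained at run time from gnu_get_libc_version(), declared in <gnu/libc-version.h>. For the versions in the tables, "2.27" passes the check while "2.22" and an unpatched "2.23" do not.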

Get bench-malloc package

Download & Extract

cd /opt/local
wget -qO- https://gitlab.com/kvic/Entware-Goodies/raw/master/bench-malloc.tgz | tar xzf -

Package Content

README
bench-malloc-thread.Glibc-Entware
bench-malloc-thread.Glibc-Entware-aarch64
bench-malloc-thread.Glibc-FW-aarch64
bench-malloc-thread.Glibc-FW-armv8
bench-malloc-thread.uClibc-FW
bench-malloc-thread.uClibc-NPTL
runbench.sh

To Run

cd /opt/local/bench-malloc
./runbench.sh

Without arguments, the script prints its usage and quits.

Last update: Aug 5, 2018
