A Simple Tool to Test malloc() Performance

I was porting a very interesting feature, the per-thread cache (tcache) introduced in Glibc 2.26, to an older Glibc version for ARMv7. I needed a simple tool to gauge any performance boost (or drop, for that matter). Surprisingly little information is available on this topic.

I didn't search hard, but most of the papers on malloc performance I found date back many years. Other benchmark suites are too heavy for my simple task, and the smaller tools I found lack either portability or a credible test methodology.

I learned that the Glibc dev team has a very well-engineered set of benchtests for benchmarking and regression testing, and that one developer had already extracted the necessary bits from benchtests and made them compile standalone on x86 Linux.

I wanted my tool to be minimal and simple, so all I really needed was the Glibc team's test routine. With a little tinkering, I cross-compiled it for ARMv7 Entware as well as for the stock firmware's uClibc. Then I wrote a script to direct the tests and drive the binaries.

So here it is: an extremely simple package, 'bench-malloc', for the RT-AC56U and compatible machines.

About the Tests

The tool measures and reports the performance of memory allocation in the system's libc under a simulated, multi-threaded application workload.

The included script tests with 1, 2, 4, 8 and 16 threads; going beyond 16 threads may crash small systems. Each test lasts one minute, during which the thread(s) perform as many 'malloc/free' operations as possible. Each 'malloc' requests a random size between 4 and 32768 bytes.

The probability distribution (pdf) of the random sizes favors small values. Since the per-thread cache only holds small blocks, it is expected to perform very well under such a workload, which is also typical of real-world applications.

At the end of each test, the total time taken by all threads is divided by the total number of 'malloc/free' operations performed by all threads.

The script reports this number as "per malloc(ns)": the average time one 'malloc' takes, in nanoseconds. A lower "per malloc(ns)" means better performance.

Test Runs

Hardware: 1.2GHz Cortex-A9 ASUS RT-AC56U

        Stock firmware                Stock Entware
        (uClibc 0.9.33.2)             (Glibc 2.23)
# th    per malloc(ns)  max rss(kB)   per malloc(ns)  max rss(kB)
1       534.4           752           284.2           744
2       3156.3          756           596.0           872
4       6924.8          812           1203.6          1180
8       14555.6         1136          2568.0          1628
16      32240.9         1744          5418.9          2240

The per-thread cache patch for Entware's libc is still being tested. Below are some preliminary numbers.

Entware Glibc 2.23 w/ per-thread cache
# th    per malloc(ns)  max rss(kB)
1       158.9           756
2       315.4           956
4       646.9           1324
8       1415.4          1892
16      3131.5          2812

Get the bench-malloc Package

Download & Extract

cd /opt/local  
wget -qO- https://gitlab.com/kvic/Entware-Goodies/raw/master/bench-malloc.tgz | tar xzf -  

Package Contents

README  
bench-malloc-thread.Glibc-Entware  
bench-malloc-thread.uClibc-FW  
runbench.sh  

To Run

cd /opt/local/bench-malloc  
./runbench.sh

Run without arguments, the script prints its usage and quits.

Last update: Jul 19, 2018

Author

Stephen Yip

