Quick Benchmark: CBC vs GCM

In my previous post, we touched on OpenVPN 2.4 and its new support for GCM ciphers. SFX2000 over at SNBforums reminded me to check GCM performance in current OpenSSL. Here are my quick and dirty benchmarks.

64-bit Sandy Bridge @2.7GHz single-thread

AES-128-CBC with AES-NI
$ openssl speed -evp aes-128-cbc
OpenSSL 1.0.1t  3 May 2016
built on: Thu Jan 26 23:29:15 2017
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     646925.71k   687550.87k   700152.06k   704402.09k   703504.38k

AES-128-CBC without AES-NI
$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc

aes-128-cbc     316612.58k   357779.50k   364188.42k   370465.45k   373915.65k

AES-128-CBC-HMAC-SHA1 with AES-NI
$ openssl speed -evp aes-128-cbc-hmac-sha1

aes-128-cbc-hmac-sha1   240811.82k   316779.16k   468535.21k   546526.55k   573931.52k

AES-128-GCM with AES-NI
$ openssl speed -evp aes-128-gcm

aes-128-gcm     338365.83k   814063.89k  1062716.33k  1200949.25k  1220588.89k

AES-128-GCM without AES-NI
$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-gcm

aes-128-gcm      76655.27k    87150.40k   211574.53k   227477.50k   235880.45k

To disable AES-NI, see this post on the OpenSSL mailing list or this post on StackOverflow.
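For the curious, here is a small sketch that decodes the mask used in the commands above. The bit meanings are my reading of the OPENSSL_ia32cap documentation (bit 57 for AES-NI, bit 33 for PCLMULQDQ, with a leading "~" clearing the named bits), so treat the labels as an assumption and check the man page for your OpenSSL version.

```python
# Decode the mask passed to OPENSSL_ia32cap in the commands above.
MASK = 0x200000200000000

# Bit labels as I understand them from the OPENSSL_ia32cap documentation.
# Only the two bits relevant to this mask are listed (assumption).
KNOWN_BITS = {
    33: "PCLMULQDQ (carry-less multiply, used for GHASH in GCM)",
    57: "AES-NI",
}


def set_bits(mask):
    """Return the positions of all bits set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


for bit in set_bits(MASK):
    print(bit, KNOWN_BITS.get(bit, "not listed here"))
```

If that reading is right, the "without AES-NI" runs also lose the PCLMULQDQ instruction, which matters for GCM in particular.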

So we see AES-NI roughly doubles CBC throughput across the board (about 1.9x to 2.0x). AES-NI also speeds up GCM by 4.4 to 9.3 times. Without AES-NI, CBC is faster than GCM at all block sizes. With AES-NI, GCM takes back the crown of raw speed in every category except "16 bytes".
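The ratios can be recomputed directly from the Sandy Bridge figures. This is only a quick sketch: the numbers are copied from the output above, and the variable names are mine.

```python
# Throughput in kbyte/s, copied from the Sandy Bridge results above.
SIZES = [16, 64, 256, 1024, 8192]
CBC_AESNI = [646925.71, 687550.87, 700152.06, 704402.09, 703504.38]
CBC_SOFT = [316612.58, 357779.50, 364188.42, 370465.45, 373915.65]
GCM_AESNI = [338365.83, 814063.89, 1062716.33, 1200949.25, 1220588.89]
GCM_SOFT = [76655.27, 87150.40, 211574.53, 227477.50, 235880.45]


def ratios(fast, slow):
    """Element-wise fast/slow, one ratio per block size."""
    return [f / s for f, s in zip(fast, slow)]


cbc_gain = ratios(CBC_AESNI, CBC_SOFT)
gcm_gain = ratios(GCM_AESNI, GCM_SOFT)
print("CBC gain: %.2fx to %.2fx" % (min(cbc_gain), max(cbc_gain)))
print("GCM gain: %.2fx to %.2fx" % (min(gcm_gain), max(gcm_gain)))

# Block sizes where GCM beats plain CBC with AES-NI enabled.
gcm_wins = [n for n, g, c in zip(SIZES, GCM_AESNI, CBC_AESNI) if g > c]
print("GCM beats CBC at:", gcm_wins)
```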

In applications like VPNs, we need to account for HMAC hashing when using CBC ciphers. Hence, a fairer comparison is between AES-128-CBC-HMAC-SHA1 and AES-128-GCM. Look at the numbers. GCM beats CBC-HMAC-SHA1 in every category: it is more than 2x faster (up to about 2.6x) everywhere except the "16 bytes" category, where GCM is still 40% faster. Figure 1 in this article by Intel corroborates the result.
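To put numbers on that comparison, here is a rough sketch that expresses the GCM advantage over CBC-HMAC-SHA1 as "percent faster" per block size. The figures are copied from the results above; the helper name is my own.

```python
# Throughput in kbyte/s with AES-NI, copied from the results above.
SIZES = [16, 64, 256, 1024, 8192]
CBC_HMAC_SHA1 = [240811.82, 316779.16, 468535.21, 546526.55, 573931.52]
GCM = [338365.83, 814063.89, 1062716.33, 1200949.25, 1220588.89]


def percent_faster(a, b):
    """How much faster a is than b, in percent (100 means twice as fast)."""
    return (a / b - 1) * 100


gains = [percent_faster(g, c) for g, c in zip(GCM, CBC_HMAC_SHA1)]
for size, gain in zip(SIZES, gains):
    print("%5d bytes: GCM is %.0f%% faster" % (size, gain))
```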

GCM looks very promising!

ARM Cortex-A9 @800MHz single-thread

AES-128-CBC without HW acceleration
Phaeo:~$ openssl speed -evp aes-128-cbc
OpenSSL 1.0.2k  26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: arm-openwrt-linux-gnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_SMALL_FOOTPRINT -DOPENSSL_NO_ERR -DTERMIOS -O2 -pipe -march=armv7-a -mtune=cortex-a9 -fno-caller-saves -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=soft  -fpic -I/media/ware4/Entware-ng.2017.02/package/libs/openssl/include -fomit-frame-pointer -Wall

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      17323.98k    19870.02k    21037.92k    21193.69k    21137.01k

AES-128-GCM without HW acceleration
Phaeo:~$ openssl speed -evp aes-128-gcm

aes-128-gcm       7069.21k     7351.37k     7416.55k     7444.96k     7405.68k

The Cortex-A9 is in my RT-AC56U, where I run an OpenVPN server. It doesn't have any crypto acceleration in hardware. CBC is faster than GCM by 145% to 185%. I thought I would have to eat my words and not switch to GCM ciphers. Looking closer, I might get away with it.
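If you want to repeat the comparison on your own router, a small parser for the result rows saves some typing. This is a minimal sketch that assumes each row looks like the ones above (a cipher name followed by numbers with a trailing "k"); the function names are mine.

```python
# Result rows pasted verbatim from the Cortex-A9 output above.
CBC_ROW = "aes-128-cbc      17323.98k    19870.02k    21037.92k    21193.69k    21137.01k"
GCM_ROW = "aes-128-gcm       7069.21k     7351.37k     7416.55k     7444.96k     7405.68k"


def parse_row(line):
    """Split one result row into (name, [kbyte/s per block size])."""
    name, *cells = line.split()
    return name, [float(cell.rstrip("k")) for cell in cells]


def percent_faster(a, b):
    """How much faster a is than b, in percent."""
    return (a / b - 1) * 100


_, cbc = parse_row(CBC_ROW)
_, gcm = parse_row(GCM_ROW)
cbc_lead = [percent_faster(c, g) for c, g in zip(cbc, gcm)]
print("CBC lead over GCM: %.0f%% to %.0f%%" % (min(cbc_lead), max(cbc_lead)))
```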

My Conclusion

The RT-AC56U @800MHz achieves at most 50Mbps on OpenVPN with AES-128-CBC (at most 70Mbps @1400MHz). Now look at the "16 bytes" category, where GCM performs slowest. The throughput is 7069.21 kbyte/s, which translates to about 57Mbps. Hence, GCM is not the weakest link in the overall throughput. OpenVPN has other bottlenecks that limit its maximum throughput.
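The conversion from the openssl speed figures to Mbps is just a factor of 8/1000, since the tool reports thousands of bytes per second. A quick sketch, using the two slowest GCM results from the Cortex-A9 run and the 50Mbps OpenVPN ceiling mentioned above (the names are mine):

```python
OPENVPN_MAX_MBPS = 50  # observed OpenVPN ceiling @800MHz, from the text above

# Slowest GCM results on the Cortex-A9, in kbyte/s, keyed by block size.
GCM_KBYTES = {16: 7069.21, 64: 7351.37}


def kbytes_to_mbps(kbytes_per_s):
    """Convert thousands of bytes per second to megabits per second."""
    return kbytes_per_s * 8 / 1000


for size, rate in GCM_KBYTES.items():
    mbps = kbytes_to_mbps(rate)
    verdict = "above" if mbps > OPENVPN_MAX_MBPS else "below"
    print("%d bytes: %.1f Mbps (%s the OpenVPN ceiling)" % (size, mbps, verdict))
```

Either way, the raw GCM rate stays above what OpenVPN actually delivers on this box.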

In addition, the computation saved by no longer doing a 160-bit SHA1 HMAC (OpenVPN's default) or a 128-bit MD5 HMAC (my choice for saving space) might improve the overall throughput a bit. Some real-world tests would be good to see.

Updated 2017-4-13: added data for AES-128-CBC-HMAC-SHA1 with AES-NI.
