In my previous post, we touched on OpenVPN 2.4 and its new support for GCM ciphers. SFX2000 over at SNBforums reminded me to check performance in current OpenSSL. Here are my quick and dirty benchmarks.
64-bit Sandy Bridge @2.7GHz single-thread
AES-128-CBC with AES-NI
$ openssl speed -evp aes-128-cbc
OpenSSL 1.0.1t 3 May 2016
built on: Thu Jan 26 23:29:15 2017
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     646925.71k   687550.87k   700152.06k   704402.09k   703504.38k
AES-128-CBC without AES-NI
$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-cbc
aes-128-cbc     316612.58k   357779.50k   364188.42k   370465.45k   373915.65k
AES-128-CBC-HMAC-SHA1 with AES-NI
$ openssl speed -evp aes-128-cbc-hmac-sha1
aes-128-cbc-hmac-sha1   240811.82k   316779.16k   468535.21k   546526.55k   573931.52k
AES-128-GCM with AES-NI
$ openssl speed -evp aes-128-gcm
aes-128-gcm     338365.83k   814063.89k  1062716.33k  1200949.25k  1220588.89k
AES-128-GCM without AES-NI
$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -evp aes-128-gcm
aes-128-gcm      76655.27k    87150.40k   211574.53k   227477.50k   235880.45k
For how to disable AES-NI, read this post on the OpenSSL mailing list or this post on StackOverflow.
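As a rough illustration of what that mask does: OpenSSL reads OPENSSL_ia32cap as a 64-bit capability vector, and a leading ~ clears the listed bits rather than setting them. The small Python sketch below decodes the mask used in the runs above. The bit meanings (33 = PCLMULQDQ, 57 = AES-NI) follow my reading of the OPENSSL_ia32cap documentation, and the names BIT_NAMES and cleared_bits are my own labels for illustration, not OpenSSL identifiers.

```python
# Decode which capability bits OPENSSL_ia32cap="~0x200000200000000" clears.
# Bit meanings are an assumption based on the OPENSSL_ia32cap documentation.
MASK = 0x200000200000000

BIT_NAMES = {
    33: "PCLMULQDQ (carry-less multiply, used by GHASH in GCM)",
    57: "AES-NI",
}

def cleared_bits(mask):
    """Return the sorted list of bit positions that are set in mask."""
    return [bit for bit in range(64) if (mask >> bit) & 1]

for bit in cleared_bits(MASK):
    print(f"bit {bit}: {BIT_NAMES.get(bit, 'unknown')}")
```

If that reading is right, the "without AES-NI" runs also lose the carry-less multiply instruction, which would penalize GCM's GHASH step in addition to the AES rounds.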
So we see AES-NI roughly doubles CBC throughput across the board (1.9x to 2.0x). AES-NI also speeds up GCM by roughly 4.4 to 9.3 times, going by the numbers above. Without AES-NI, CBC is faster than GCM at all packet sizes. With AES-NI, GCM takes the crown of raw speed in every category except "16 bytes", where CBC is still almost twice as fast.
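To check these ratios against the figures listed above, here is a minimal Python sketch that divides the AES-NI throughput by the non-AES-NI throughput for each block size. The numbers are copied from the runs above; the list and function names are mine.

```python
# Throughput in 1000s of bytes per second, copied from the runs above.
SIZES = ["16", "64", "256", "1024", "8192"]
CBC_AESNI = [646925.71, 687550.87, 700152.06, 704402.09, 703504.38]
CBC_SOFT = [316612.58, 357779.50, 364188.42, 370465.45, 373915.65]
GCM_AESNI = [338365.83, 814063.89, 1062716.33, 1200949.25, 1220588.89]
GCM_SOFT = [76655.27, 87150.40, 211574.53, 227477.50, 235880.45]

def speedup(fast, slow):
    """Per-size ratio of accelerated to non-accelerated throughput."""
    return [round(f / s, 2) for f, s in zip(fast, slow)]

cbc_gain = speedup(CBC_AESNI, CBC_SOFT)
gcm_gain = speedup(GCM_AESNI, GCM_SOFT)

for size, c, g in zip(SIZES, cbc_gain, gcm_gain):
    print(f"{size:>5} bytes: CBC x{c}, GCM x{g}")
```

This only compares single-threaded openssl speed runs on one machine, so treat the ratios as indicative rather than general.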
In applications like VPN, we need to account for HMAC hashing when using CBC ciphers. Hence, a fairer comparison is between AES-128-CBC-HMAC-SHA1 and AES-128-GCM. Look at the chart. GCM beats CBC categorically: it is more than 2x faster (up to about 2.6x) in every category except "16 bytes", where GCM is still 40% faster than CBC. Figure 1 in this article by Intel corroborates the result.
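The comparison between the two authenticated setups can be reproduced with a few lines of Python. This is just a sketch over the numbers quoted above (both runs with AES-NI enabled); the names are mine.

```python
# Throughput in 1000s of bytes per second, both runs with AES-NI enabled.
SIZES = [16, 64, 256, 1024, 8192]
CBC_HMAC_SHA1 = [240811.82, 316779.16, 468535.21, 546526.55, 573931.52]
GCM = [338365.83, 814063.89, 1062716.33, 1200949.25, 1220588.89]

def advantage(gcm, cbc_hmac):
    """Per-size ratio of GCM throughput to CBC-HMAC-SHA1 throughput."""
    return [g / c for g, c in zip(gcm, cbc_hmac)]

ratios = advantage(GCM, CBC_HMAC_SHA1)

for size, r in zip(SIZES, ratios):
    print(f"{size:>5} bytes: GCM is {r:.2f}x CBC-HMAC-SHA1")
```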
GCM looks very promising!
ARM Cortex-A9 @800MHz single-thread
AES-128-CBC without HW acceleration
Phaeo:~$ openssl speed -evp aes-128-cbc
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: arm-openwrt-linux-gnueabi-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_SMALL_FOOTPRINT -DOPENSSL_NO_ERR -DTERMIOS -O2 -pipe -march=armv7-a -mtune=cortex-a9 -fno-caller-saves -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=soft -fpic -I/media/ware4/Entware-ng.2017.02/package/libs/openssl/include -fomit-frame-pointer -Wall
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      17323.98k    19870.02k    21037.92k    21193.69k    21137.01k
AES-128-GCM without HW acceleration
Phaeo:~$ openssl speed -evp aes-128-gcm
aes-128-gcm       7069.21k     7351.37k     7416.55k     7444.96k     7405.68k
The Cortex-A9 is in my RT-AC56U, where I run my OpenVPN server. It doesn't have any crypto acceleration in HW. CBC is faster than GCM by 145% to 185% (about 2.45x to 2.85x the throughput). I thought I would have to eat my words and not switch to GCM ciphers. Looking closer, I might get away with it.
The RT-AC56U @800MHz achieves max. 50Mbps on OpenVPN with AES-128-CBC (@1400MHz max. 70Mbps). Now look at the "16 bytes" category, where GCM performs slowest. The throughput is 7069.21kbyte/s, which translates to about 57Mbps. Hence, GCM is not the weakest link in the overall throughput. OpenVPN has other bottlenecks that limit its maximum throughput.
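The unit conversion behind that comparison can be sketched in Python as follows. It relies on openssl speed reporting its "k" figures in 1000s of bytes per second; the 50Mbps ceiling is the OpenVPN figure quoted above, and the function and variable names are mine.

```python
def kbytes_to_mbps(kbytes_per_s):
    """Convert openssl speed output (1000s of bytes/s) to megabits/s."""
    return kbytes_per_s * 1000 * 8 / 1_000_000

# AES-128-GCM on the Cortex-A9 @800MHz, copied from the run above.
GCM_ARM = {16: 7069.21, 64: 7351.37, 256: 7416.55, 1024: 7444.96, 8192: 7405.68}
OPENVPN_CBC_MBPS = 50  # observed OpenVPN maximum at 800MHz, from the text

for size, rate in GCM_ARM.items():
    mbps = kbytes_to_mbps(rate)
    above = mbps > OPENVPN_CBC_MBPS
    print(f"{size:>5} bytes: {mbps:.1f} Mbps (above OpenVPN ceiling: {above})")
```

Keep in mind that openssl speed measures the cipher in isolation, so this says only that GCM alone does not cap throughput below 50Mbps, not what OpenVPN will actually deliver.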
In addition, the computation saved by not doing a separate 160-bit SHA1 HMAC (OpenVPN's default) or 128-bit MD5 HMAC (my choice for saving space) might improve the overall throughput a bit. Some real-world tests would be good to see.
Updated 2017-4-13: added data for AES-128-CBC-HMAC-SHA1 with AES-NI.