This blog post deals with two separate technologies that work together to make encrypted web communications from your servers both faster and more secure. With the correct server configuration, you and your clients get both benefits. And for free too, if you already have modern CPUs. Faster, better, cheaper. A CIO's dream.
With end-to-end HTTPS becoming more of a requirement, you may have hardware sitting in your datacenter or *aaS with untapped performance. Whilst Transport Layer Security (TLS) and the now outdated and insecure Secure Sockets Layer (SSL) are the protocols for securing HTTP traffic, Advanced Encryption Standard (AES) is the most popular specification for encrypting the data itself. But it is computationally intensive and can increase the load on your systems. So where possible we should use the specialised on-chip AES instructions built into modern CPUs to accelerate encryption and decryption and mitigate this load.
AES can be implemented in software, but what we’re interested in here are the much faster hardware implementations. We’ll deal with Intel, although similar functionality can be found in others including AMD, POWER7+, SPARC and ARM.
Intel introduced a handful of extra CPU instructions in 2010, starting with the Westmere CPU family, to optimise the complex, performance-intensive steps of the AES algorithm. Unimaginatively it is termed AES New Instructions (AES-NI). These hardware instructions execute in considerably fewer cycles than the software equivalent.
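Before going any further it is worth confirming that your own CPU advertises the instructions. What follows is a minimal sketch, assuming a Linux host on x86 where /proc/cpuinfo carries a "flags" line; the function name has_aesni is purely illustrative.

```shell
#!/bin/sh
# has_aesni [FILE]: succeed if the CPU flags in FILE (default /proc/cpuinfo)
# include the "aes" feature flag. The function name is illustrative only.
has_aesni() {
    cpuinfo=${1:-/proc/cpuinfo}
    # -w matches "aes" as a whole word, so a flag such as "vaes" does not count.
    grep '^flags' "$cpuinfo" 2>/dev/null | grep -qw aes
}

if has_aesni; then
    echo "AES-NI: reported by the CPU"
else
    echo "AES-NI: not reported (or this is not a Linux x86 host)"
fi
```

On an Intel Mac the equivalent information should be available from sysctl machdep.cpu.features, where the flag appears as AES, though I have only sketched the Linux case here.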
What’s a practical use of AES-NI?
In 2013 Edward Snowden released thousands of classified NSA documents to journalists. One of the first revelations was a programme called PRISM. This programme allowed the NSA to make sweeping collections of emails, chats, photos, file transfers etc. from cloud providers. It was able to do this partly due to the lack of HTTPS, but also because where HTTPS was employed they also managed to obtain the “master” private key that companies held to unlock it.
This is where we introduce Forward Secrecy (also known as Perfect Forward Secrecy). Forward Secrecy is a method of exchanging ephemeral session specific keys that are only valid for that session. The simple idea here is that even when the Elbonians crack the key at a later date, it'll only do them good for that particular TLS session (unless the host has been storing them). But Forward Secrecy is computationally hard work; fortunately implementing it is not.
How do I use Forward Secrecy?
When the TLS handshake occurs, one of the steps is to agree on the cipher suite. Since the handshake settles on a suite supported by both server and client, if your server offers suites whose key exchange is ephemeral Diffie-Hellman (DHE) or, preferably, the elliptic-curve variant (ECDHE), and the browser/client also supports them, you get the benefit of FS. Note that DHE is significantly more computationally intensive than ECDHE. Re-enter AES-NI, which offsets that extra cost by making the bulk AES encryption that follows much cheaper.
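To see which suite a given server actually negotiates, openssl s_client prints a line containing "Cipher is" followed by the suite name. Here is a small sketch that pulls the name out of that output; the helper name negotiated_cipher and the host example.com in the usage line are placeholders of mine, not anything from a real deployment.

```shell
#!/bin/sh
# negotiated_cipher: read `openssl s_client` output on stdin and print the
# suite name from the first "... Cipher is <suite>" line it finds.
# The helper name is illustrative only.
negotiated_cipher() {
    sed -n 's/^.*Cipher is \([^ ]*\).*$/\1/p' | head -n 1
}

# Usage (needs network access, so it is left commented out):
#   openssl s_client -connect example.com:443 </dev/null 2>/dev/null | negotiated_cipher
```

A name beginning with ECDHE- or DHE- means the session key came from an ephemeral exchange. TLS 1.3 suites are named differently (TLS_AES_...) and always use an ephemeral exchange, so the prefix check only applies to older protocol versions.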
There are many good guides for deploying Forward Secrecy; just use your favourite search engine. But essentially you ensure that cipher suites starting with ECDHE are listed first and that the server's order is honoured (for example, nginx: ssl_ciphers with ssl_prefer_server_ciphers; apache httpd: SSLCipherSuite with SSLHonorCipherOrder). And within ECDHE try to use AES in Galois/Counter Mode (GCM), as this benefits from a further hardware optimisation called Carry-less Multiplication (CLMUL). As you dig further in for additional performance gains it gets technical very quickly.
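As a rough illustration of that ordering, the sketch below filters a suite list down to the forward-secret entries and moves the GCM ones to the front. The helper names fs_suites and gcm_first are my own, and the 'HIGH:!aNULL' starting list is just an assumption for the demo; treat the result as a starting point rather than a vetted cipher policy.

```shell
#!/bin/sh
# fs_suites: read a colon- or newline-separated list of OpenSSL suite names
# on stdin and keep only those whose key exchange is ECDHE or DHE.
fs_suites() {
    tr ':' '\n' | grep -E '^(ECDHE|DHE)-'
}

# gcm_first: print the GCM suites first, then everything else, keeping the
# original relative order inside each group.
gcm_first() {
    list=$(cat)
    printf '%s\n' "$list" | grep 'GCM'
    printf '%s\n' "$list" | grep -v 'GCM'
    return 0
}

# Demo: build a colon-separated list from whatever the local OpenSSL offers.
if command -v openssl >/dev/null 2>&1; then
    openssl ciphers 'HIGH:!aNULL' | fs_suites | gcm_first | paste -sd: -
fi
```

The printed string is in the format that both ssl_ciphers and SSLCipherSuite accept.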
VMware and other hypervisors do expose AES-NI to virtualised guests, but you may need to check the configuration. Web servers generally use the underlying SSL library. Be aware that the older OpenSSL versions shipped with some operating systems (e.g. RHEL5) do not support AES-NI, but you may be able to patch or replace them. Run openssl version to see if you are running version 1.0.1 or later.
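If you want to script that check across a fleet, something like the following would do. It is a sketch only: version_at_least is a hypothetical helper, it ignores the trailing letter in versions such as 1.0.1e, and it assumes OpenSSL's own numbering (LibreSSL reports an unrelated version scheme).

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeed if dotted version HAVE >= WANT.
# Any letter suffix on HAVE (the "e" in 1.0.1e) is dropped before comparing.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        sub(/[^0-9.].*$/, "", have)
        n = split(have, h, "."); m = split(want, w, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

if command -v openssl >/dev/null 2>&1; then
    # Second field of e.g. "OpenSSL 1.0.1e 11 Feb 2013" is the version.
    ver=$(openssl version | awk '{print $2}')
    if version_at_least "$ver" 1.0.1; then
        echo "$ver: new enough for built-in AES-NI support"
    else
        echo "$ver: predates 1.0.1, AES-NI is unlikely to be used"
    fi
fi
```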
In the chart below, created with openssl speed, you can see a comparison of AES-NI disabled (orange bars) and enabled (blue) for three different AES-NI capable CPUs. The Y-axis shows the i7-4800MQ encrypting AES-256-CBC at almost 560MB/sec on a single core using hardware acceleration, whereas without AES-NI it manages just 243MB/sec, so the hardware path is roughly 2.3 times faster.
Your server's Xeon CPU will show further potential. To get an indication of a server's full ability, try openssl speed -multi n to run n benchmarks in parallel.
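For anyone wanting to reproduce a comparison like this, here is one way it could be done. This is an assumption on my part rather than a record of how the chart was produced: OpenSSL reads the OPENSSL_ia32cap environment variable, and clearing bit 57 (AES-NI) and bit 33 (PCLMULQDQ) makes it fall back to the software code path. The helper names are illustrative, and other libraries such as LibreSSL may not honour the variable.

```shell
#!/bin/sh
# aesni_off_mask: build the OPENSSL_ia32cap value that hides two CPU
# capability bits from OpenSSL: bit 57 (AES-NI) and bit 33 (PCLMULQDQ).
aesni_off_mask() {
    printf '~0x%x\n' $(( (1 << 57) | (1 << 33) ))
}

# bench CIPHER [off]: run a single-process `openssl speed` for CIPHER.
# Pass "off" as the second argument to mask out the hardware instructions.
bench() {
    if [ "${2:-}" = off ]; then
        OPENSSL_ia32cap=$(aesni_off_mask) openssl speed -elapsed -evp "$1"
    else
        openssl speed -elapsed -evp "$1"
    fi
}
```

Running bench aes-256-cbc followed by bench aes-256-cbc off gives the two figures to compare; each run takes a little while because openssl speed times several block sizes in turn.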
As a comparison of two 256-bit AES ciphers, AES-256-GCM is significantly faster than AES-256-CBC thanks to the CLMUL optimisation mentioned above. The rate of AES-256-GCM is about 1.5GB/sec on the faster CPU. So make sure you put that at the front of your cipher list.
You already use Forward Secrecy, what’s another practical use of AES-NI?
OpenSSH can make use of AES-NI too, to provide a substantial throughput enhancement to your file transfers. To illustrate the difference I used a rudimentary test copying 4GB using dd over the loopback device.
In this case the tests were done on OS X 10.11. For the non-AES-NI test I compiled OpenSSH-7.2p2 with the old OpenSSL version 0.9.8zh. For AES-NI it was the OS X default of OpenSSH_6.9p1 with LibreSSL 2.1.8.
The command was along these lines:
dd if=/dev/zero bs=4096 count=1000000 | ssh -x -c aes128-ctr -m email@example.com jeremy@localhost 'dd of=/dev/null'
I tested aes128-ctr and the default chacha20 (a software-only cipher implementation) on the i7-2720QM processor (the slowest of the three above). In this case chacha20 was negotiated by default when I didn't specify a cipher with the -c option, and hence did not make use of AES-NI. In OpenSSH you can view the available ciphers or MACs with ssh -Q cipher or ssh -Q mac. You then modify your ssh_config files and add something similar to this:
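A minimal sketch of such an entry, printed by a small helper so it can be appended wherever you keep your client configuration. The host alias fileserver and the helper name sample_ssh_config are hypothetical, and MACs is left at its default here.

```shell
#!/bin/sh
# sample_ssh_config: print an ssh_config stanza that pins the cipher.
# "fileserver" is a hypothetical host alias; substitute your own.
sample_ssh_config() {
    cat <<'EOF'
Host fileserver
    # Ask for AES in counter mode so AES-NI is used where available.
    Ciphers aes128-ctr
EOF
}

sample_ssh_config
```

You could append it with sample_ssh_config >> ~/.ssh/config and, on OpenSSH 6.8 or later, confirm what the client will offer with ssh -G fileserver. Bear in mind that listing a single cipher means the connection fails if the server does not support it, so you may prefer to list a fallback after it.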
In the chart below you can see AES-NI enables far higher throughput in OpenSSH, going from 85MB/sec to 260MB/sec using aes128-ctr: more than 3x faster, and almost 2x the throughput of the default ssh configuration on this CPU. Again, your Xeon will be much better for raw performance, but remember that if you're still transferring across a 1Gbit LAN (roughly 125MB/sec at best) you won't be able to fully utilise it.
If you want to supercharge your ssh further, take a look at the high performance patches supplied by a team at the Pittsburgh Supercomputing Center. They have provided some stunning improvements using dynamic TCP windows and multithreaded AES-CTR. To use these, you need to download the OpenSSH source and apply their .diff patch.