Equinox IT Blog

Using your existing hardware, Forward Secrecy and AES-NI to enhance system speed and security for free

This blog post deals with two separate technologies that combine to make encrypted web communications from your servers both faster and more secure. By selecting the correct configuration on your server, you and your clients can benefit from both. And for free too, if you already have modern CPUs. Faster, better, cheaper. A CIO’s dream.

With end-to-end HTTPS becoming more of a requirement, you may have hardware sitting in your datacenter or *aaS with untapped performance potential. Whilst Transport Layer Security (TLS) and its outdated, insecure predecessor Secure Sockets Layer (SSL) are the protocols for securing HTTP communication, Advanced Encryption Standard (AES) is the most popular specification for encrypting the data itself. But it is computationally intensive and can increase the load on your systems. So where possible we should use the specialised on-chip AES instructions built into modern CPUs to accelerate encryption and decryption and mitigate this load.

AES can be implemented in software, but what we’re interested in here are the much faster hardware implementations. We’ll deal with Intel, although similar functionality can be found in others including AMD, POWER7+, SPARC and ARM.

Intel introduced a handful of extra CPU instructions in 2010, starting with the Westmere CPU family, to optimise the complex, performance-intensive steps of the AES algorithm. Unimaginatively, they are termed AES New Instructions (AES-NI). These hardware instructions can execute in considerably fewer cycles than the software equivalent.

What’s a practical use of AES-NI?

In 2013 Edward Snowden released thousands of classified NSA documents to journalists. One of the first revelations was a programme called PRISM. This programme allowed the NSA to make sweeping collections of emails, chats, photos, file transfers etc. from cloud providers. It was able to do this partly due to the lack of HTTPS, but also because where HTTPS was employed they also managed to obtain the “master” private key that companies held to unlock it.

This is where we introduce Forward Secrecy (also known as Perfect Forward Secrecy). Forward Secrecy is a method of exchanging ephemeral, session-specific keys that are only valid for that session. The simple idea here is that even if the Elbonians obtain the server’s long-term private key at a later date, it won’t unlock traffic they recorded earlier; cracking one session key only does them good for that particular TLS session (unless the host has been storing the session keys). But Forward Secrecy is computationally hard work; fortunately implementing it is not.
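To make the idea of ephemeral keys concrete, here is a minimal sketch of a Diffie-Hellman exchange in which both sides generate a fresh key pair for every session. It is a toy under stated assumptions: the prime P, the generator G and the function names are illustrative choices and not taken from any TLS library, and the parameters are far too small for real use.

```python
import secrets

# Toy parameters for illustration only (assumed, not from any standard).
# Real TLS uses standardised groups or elliptic curves, and authenticates
# the exchange with the server certificate.
P = 2**127 - 1  # a Mersenne prime, far too small for real use
G = 3


def ephemeral_keypair():
    """Generate a fresh private/public pair intended for one session only."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def run_session():
    """Return the shared secret as computed by the client and by the server."""
    client_private, client_public = ephemeral_keypair()
    server_private, server_public = ephemeral_keypair()
    # Each side combines its own private value with the other's public value.
    client_secret = pow(server_public, client_private, P)
    server_secret = pow(client_public, server_private, P)
    # The private values go out of scope here, so no long-lived key
    # remains that could recreate this session's secret later.
    return client_secret, server_secret
```

Calling run_session() twice models two separate connections: each one derives its secret from key pairs that exist only for that call, which is the property that stops a later key compromise from unlocking old traffic.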

How do I use Forward Secrecy?

When the TLS handshake occurs, one of the steps is to agree on the cipher suite. The server picks a suite that both it and the client support, so if your server lists suites that use ephemeral Diffie-Hellman key exchange (DHE), or preferably the elliptic-curve variant (ECDHE), and the browser/client is also capable of them, you get the benefit of FS. Note that DHE is significantly more computationally intensive than ECDHE. Re-enter AES-NI: it does not accelerate the key exchange itself, but it reduces the cost of the AES encryption that follows, which helps offset the extra work.

There are many good guides for deploying Forward Secrecy; just use your favourite search engine. But essentially you ensure that ciphers starting with ECDHE are listed first and that the server’s order is honoured (for example, nginx: ssl_ciphers with ssl_prefer_server_ciphers; Apache httpd: SSLCipherSuite with SSLHonorCipherOrder). And within ECDHE, try to use AES in Galois/Counter Mode (GCM), as this benefits from a further hardware optimisation called Carry-less Multiplication (CLMUL). As you dig further in for additional performance gains it gets technical very quickly.
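The ordering rule described above can be sketched as a small helper that sorts OpenSSL-style suite names so that ECDHE comes first and, within each key exchange, GCM comes before other modes. The function names and the sample list are assumptions for illustration; check the result against what your own OpenSSL build actually supports before using it.

```python
def preference_key(suite):
    """Rank a suite: ECDHE before DHE before the rest, GCM before other modes."""
    if suite.startswith("ECDHE-"):
        key_exchange = 0
    elif suite.startswith("DHE-"):
        key_exchange = 1
    else:
        key_exchange = 2  # no forward secrecy
    mode = 0 if "-GCM-" in suite else 1
    return (key_exchange, mode)


def order_suites(suites):
    """Return the suites sorted by preference (stable for equal ranks)."""
    return sorted(suites, key=preference_key)


def cipher_string(suites):
    """Join the ordered suites with ':' as OpenSSL cipher lists expect."""
    return ":".join(order_suites(suites))


# Hypothetical starting list, deliberately in a poor order.
example = [
    "AES256-SHA",
    "DHE-RSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
]
print(cipher_string(example))
```

The resulting colon-separated string is the form that ssl_ciphers and SSLCipherSuite accept; the server-preference directive still has to be switched on for the order to take effect.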

On the browser or client, you can check https://www.howsmyssl.com/ to see which cipher suites it offers. Use https://www.ssllabs.com/ssltest/ to see what your own and other sites are using.

VMware and other hypervisors do expose AES-NI to virtualised guests, but you may need to check the configuration. Web servers generally use the underlying SSL library. Be aware that the older versions of OpenSSL shipped with some operating systems (e.g. RHEL5) do not support AES-NI, but you may be able to patch or replace them. Run openssl version to check that you are running version 1.0.1 or later.
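As a rough way to check both conditions from inside a guest, the sketch below looks for the aes and pclmulqdq flags in Linux-style /proc/cpuinfo text and compares an OpenSSL version tuple against 1.0.1. It assumes an x86 Linux system, the helper names are made up for this example, and ssl.OPENSSL_VERSION_INFO reports the library Python was built against, which may differ from the one your web server uses.

```python
import ssl


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_aes_ni(cpuinfo_text):
    """True if the CPU (as seen by this guest) advertises AES-NI."""
    return "aes" in cpu_flags(cpuinfo_text)


def has_clmul(cpuinfo_text):
    """True if carry-less multiplication (used by GCM) is advertised."""
    return "pclmulqdq" in cpu_flags(cpuinfo_text)


def openssl_new_enough(version_info=ssl.OPENSSL_VERSION_INFO):
    """True if the given OpenSSL version tuple is 1.0.1 or later."""
    return tuple(version_info[:3]) >= (1, 0, 1)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text; only meaningful on Linux."""
    with open(path) as handle:
        return handle.read()
```

On a Linux guest, has_aes_ni(read_cpuinfo()) returning False while the host CPU supports AES-NI suggests the hypervisor is masking the feature.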

On the chart below, created with openssl speed, you can see a comparison of AES-NI disabled (orange bars) and enabled (blue) for three different AES-NI enabled CPUs. The Y-axis shows the i7-4800MQ able to encrypt AES-256-CBC at a rate of almost 560MB/second on a single core using hardware acceleration, whereas without AES-NI it manages just 243MB/sec.

Your server’s Xeon CPU will show further potential. To get an indication of a server’s full ability, try openssl speed -multi n to run n benchmarks in parallel.

Figure: AES-256-CBC CPU comparison

As a comparison of two 256-bit AES ciphers, AES-256-GCM is significantly faster than AES-256-CBC thanks to the CLMUL optimisation mentioned above. The rate of AES-256-GCM is about 1.5GB/sec on the faster CPU. So make sure you put that at the front of your cipher list.

Figure: AES-256-CBC vs GCM
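If you want to try a similar with/without comparison yourself, one possible approach is sketched below. It is an assumption on my part that the OPENSSL_ia32cap environment variable is a suitable way to switch AES-NI off; the post does not say how the charts were produced. The mask clears the AES-NI and PCLMULQDQ capability bits as documented by OpenSSL, and the helper names are hypothetical.

```python
import os

# Capability bit positions from OpenSSL's OPENSSL_ia32cap documentation.
AESNI_BIT = 57
PCLMULQDQ_BIT = 33


def no_aesni_mask():
    """Value for OPENSSL_ia32cap that clears the AES-NI and CLMUL bits."""
    return "~" + hex((1 << AESNI_BIT) | (1 << PCLMULQDQ_BIT))


def speed_command(cipher="aes-256-cbc"):
    """Build an 'openssl speed' command that goes through the EVP layer."""
    return ["openssl", "speed", "-evp", cipher]


def speed_env(disable_aesni, base_env=None):
    """Copy the environment, optionally masking out hardware AES support."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_aesni:
        env["OPENSSL_ia32cap"] = no_aesni_mask()
    return env


# To run the two benchmarks (not executed here):
#   import subprocess
#   subprocess.run(speed_command(), env=speed_env(disable_aesni=False))
#   subprocess.run(speed_command(), env=speed_env(disable_aesni=True))
```

Swapping the cipher argument for aes-256-gcm gives the matching GCM figures, so the same two helpers cover both charts.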

You already use Forward Secrecy, what’s another practical use of AES-NI?

OpenSSH can make use of AES-NI too, to provide a substantial throughput enhancement to your file transfers. To illustrate the difference I used a rudimentary test copying 4GB using dd over the loopback device.

In this case the tests were done on OS X 10.11. For the non-AES-NI test I compiled OpenSSH-7.2p2 with the old OpenSSL version 0.9.8zh. For AES-NI it was the OS X default of OpenSSH_6.9p1 with LibreSSL 2.1.8.

The command was like this:

dd if=/dev/zero bs=4096 count=1000000 | ssh -x -c aes128-ctr -m umac-128-etm@openssh.com jeremy@localhost 'dd of=/dev/null'

I tested aes128-ctr and the default chacha20 (a software-only cipher implementation) on the i7-2720QM processor (the slowest of the three above). chacha20 was negotiated by default when I didn’t specify a cipher with the -c option, and hence that run did not make use of AES-NI. In OpenSSH you can view the available ciphers or MACs with ssh -Q cipher or ssh -Q mac. You can then modify your sshd_config and ssh_config files and add something similar to this:

Ciphers aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,chacha20-poly1305@openssh.com,arcfour128,arcfour256,arcfour

MACs umac-128-etm@openssh.com,umac-128@openssh.com,hmac-sha1-etm@openssh.com,hmac-sha1,hmac-sha2-512-etm@openssh.com,hmac-sha2-512,hmac-sha2-256-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha2-256,hmac-ripemd160@openssh.com,hmac-ripemd160

In the chart below you can see AES-NI enables far higher throughput in OpenSSH, going from 85MB/sec to 260MB/sec using aes128-ctr: more than 3x faster, and almost 2x the default ssh configuration on this CPU. Again, your Xeon will be much better for raw performance, but remember that if you’re still transferring across a 1Gbit LAN you won’t be able to fully utilise it.

Figure: SSH time to transfer 4GB
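The arithmetic behind those figures can be worked through as follows. This sketch assumes that MB means 10^6 bytes and that the throughput numbers apply to the whole transfer; the function names are illustrative only.

```python
# Size of the dd test: bs=4096 count=1000000.
TOTAL_BYTES = 4096 * 1_000_000

BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def seconds_to_transfer(total_bytes, mb_per_sec):
    """Time in seconds to move total_bytes at the given throughput."""
    return total_bytes / (mb_per_sec * BYTES_PER_MB)


def speedup(fast_mb_per_sec, slow_mb_per_sec):
    """How many times faster the first throughput is than the second."""
    return fast_mb_per_sec / slow_mb_per_sec


def link_limit_mb_per_sec(bits_per_sec):
    """Raw ceiling of a network link in MB/sec, ignoring protocol overhead."""
    return bits_per_sec / 8 / BYTES_PER_MB


without_aesni = seconds_to_transfer(TOTAL_BYTES, 85)   # roughly 48 seconds
with_aesni = seconds_to_transfer(TOTAL_BYTES, 260)     # roughly 16 seconds
gigabit = link_limit_mb_per_sec(1_000_000_000)         # 125 MB/sec
```

Because the raw gigabit ceiling of 125MB/sec is below the 260MB/sec the cipher can sustain, the network rather than the CPU becomes the limit on such a LAN.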

If you want to supercharge your ssh further, take a look at the high-performance patches supplied by a team at the Pittsburgh Supercomputing Center. They have achieved some stunning improvements using dynamic TCP windows and multithreaded AES-CTR. To use these, you need to download the OpenSSH source and apply their .diff patch.
