Using your existing hardware, Forward Secrecy and AES-NI to enhance system speed and security for free
by Jeremy Cook on 16 June 2016
This blog post deals with two separate technologies that work together to make encrypted web communications from your servers both faster and more secure. By selecting the correct configuration on your server, you and your clients can benefit from faster and more secure encrypted communications. And for free too, if you already have modern CPUs. Faster, better, cheaper. A CIO's dream.
With end-to-end HTTPS becoming more of a requirement, you may have hardware sitting in your datacenter or *aaS with untapped performance. Transport Layer Security (TLS) and its outdated, insecure predecessor Secure Sockets Layer (SSL) are the protocols used to secure HTTP traffic, and Advanced Encryption Standard (AES) is the most popular specification for encrypting that data. But AES is computationally intensive and can increase the load on your systems. So, where possible, we should use the specialised on-chip AES instructions built into modern CPUs to accelerate encryption and decryption and mitigate this load.
AES can be implemented in software, but what we’re interested in here are the much faster hardware implementations. We’ll deal with Intel, although similar functionality can be found in others including AMD, POWER7+, SPARC and ARM.
Intel introduced a handful of extra CPU instructions in 2010, starting with the Westmere CPU family, to optimise the complex, performance-intensive steps of the AES algorithm. Unimaginatively, they are termed AES New Instructions (AES-NI). These hardware instructions execute in considerably fewer cycles than the software equivalent.
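Before relying on AES-NI it helps to confirm that the CPU advertises it. The following is a minimal sketch, assuming an x86 Linux host where /proc/cpuinfo lists feature flags on a "flags" line and reports AES-NI as "aes"; the function names are my own and other platforms expose this information differently.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supports_aes_ni(cpuinfo_text):
    # Assumption: x86 Linux, where AES-NI shows up as the "aes" flag.
    return "aes" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("AES-NI advertised:", supports_aes_ni(text))
```

An empty or missing /proc/cpuinfo simply reports False, so treat a negative answer on non-Linux systems as "unknown" rather than "unsupported".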
What’s a practical use of AES-NI?
In 2013 Edward Snowden released thousands of classified NSA documents to journalists. One of the first revelations was a programme called PRISM. This programme allowed the NSA to make sweeping collections of emails, chats, photos, file transfers etc. from cloud providers. It was able to do this partly due to the lack of HTTPS, but also because, where HTTPS was employed, they managed to obtain the “master” private key that companies held to unlock it.
This is where we introduce Forward Secrecy (also known as Perfect Forward Secrecy). Forward Secrecy is a method of exchanging ephemeral, session-specific keys that are only valid for that session. The simple idea here is that even when the Elbonians crack a key at a later date, it'll only do them good for that particular TLS session (unless the host has been storing the session keys). But Forward Secrecy is computationally hard work; fortunately implementing it is not.
How do I use Forward Secrecy?
When the TLS handshake occurs, one of the steps is to agree on the cipher suite. The handshake selects a suite supported by both server and client, so if your server offers suites whose key exchange is ephemeral Diffie-Hellman (DHE) or, preferably, the elliptic-curve variant (ECDHE), and the browser/client also supports them, you get the benefit of Forward Secrecy. Note that DHE is significantly more computationally intensive than ECDHE. Re-enter AES-NI: it does not speed up the key exchange itself, but it cuts the cost of the AES bulk encryption that follows, which helps offset the extra work.
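The selection step can be sketched in a few lines. This is a simplified model rather than a real TLS implementation: the function names are hypothetical, and the suite names follow OpenSSL's naming, where an ECDHE- or DHE- prefix indicates an ephemeral key exchange.

```python
def choose_suite(server_prefs, client_offers, honor_server_order=True):
    """Return the first mutually supported suite, walking the preferred side's list."""
    if honor_server_order:
        primary, other = server_prefs, client_offers
    else:
        primary, other = client_offers, server_prefs
    other_set = set(other)
    for suite in primary:
        if suite in other_set:
            return suite
    return None  # no common suite: the handshake would fail


def has_forward_secrecy(suite):
    # OpenSSL-style names: ephemeral key exchanges start with ECDHE- or DHE-.
    return suite.startswith(("ECDHE-", "DHE-"))


server = ["ECDHE-RSA-AES128-GCM-SHA256", "DHE-RSA-AES128-GCM-SHA256", "AES128-SHA"]
client = ["AES128-SHA", "ECDHE-RSA-AES128-GCM-SHA256"]

picked = choose_suite(server, client)
print(picked, has_forward_secrecy(picked))
```

With the server's order honoured, the ECDHE suite wins even though the client listed the non-ephemeral AES128-SHA first, which is why the ordering on the server side matters.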
There are many good guides for deploying Forward Secrecy; just use your favourite search engine. But essentially you ensure that cipher suites starting with ECDHE are listed first and that the server's order is honoured (for example, nginx: ssl_ciphers with ssl_prefer_server_ciphers; apache httpd: SSLCipherSuite with SSLHonorCipherOrder). And within ECDHE, try to use AES in Galois/Counter Mode (GCM), as this uses a further hardware optimisation called Carry-less Multiplication (CLMUL). As you dig further in for additional performance gains it gets technical very quickly.
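The same idea can be tried out with Python's standard ssl module, which wraps OpenSSL. This is a sketch under my own assumptions, not a recommended production configuration: the cipher string "ECDHE+AESGCM:ECDHE+AES" is an illustrative choice that puts ECDHE with AES-GCM ahead of other ECDHE AES suites, and the helper names are hypothetical.

```python
import ssl


def make_server_context(cipher_string="ECDHE+AESGCM:ECDHE+AES"):
    """Build a server-side context that prefers ECDHE suites, AES-GCM first."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.set_ciphers(cipher_string)
    # Ask the library to honour the server's ordering, not the client's.
    ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE
    return ctx


def suite_names(ctx):
    # Depending on the library build, TLS 1.3 suites may be listed as well;
    # they are configured separately from the cipher string above.
    return [c["name"] for c in ctx.get_ciphers()]


if __name__ == "__main__":
    for name in suite_names(make_server_context()):
        print(name)
```

Printing the resulting list is a quick way to see what a given cipher string expands to on your particular SSL library before copying it into a web server configuration.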
On the browser or client, you can check https://www.howsmyssl.com/ to see which cipher suites it offers. Use https://www.ssllabs.com/ssltest/ to see what other sites are using.
VMware and other hypervisors do expose AES-NI to virtualised guests, but you may need to check the configuration. Web servers generally use the underlying SSL library. Be aware that the older OpenSSL versions shipped with some operating systems (e.g. RHEL5) do not support AES-NI, but you may be able to patch or replace them. Run openssl version to see if you are running version 1.0.1 or later.
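The version check can be automated. The sketch below parses a banner in the format printed by openssl version and compares it against 1.0.1; the function names are hypothetical, and LibreSSL is deliberately reported as undetermined because its version numbering is not comparable. The final lines inspect the library that Python itself links against, which may differ from the one your web server uses.

```python
import re
import ssl


def parse_openssl_version(banner):
    """Split a banner like 'OpenSSL 1.0.1e-fips 11 Feb 2013' into (name, (1, 0, 1))."""
    m = re.match(r"(OpenSSL|LibreSSL)\s+(\d+)\.(\d+)\.(\d+)", banner)
    if m is None:
        return None
    return m.group(1), tuple(int(part) for part in m.groups()[1:])


def meets_aes_ni_minimum(banner):
    """True/False for OpenSSL banners, None when the answer cannot be determined."""
    parsed = parse_openssl_version(banner)
    if parsed is None:
        return None
    name, version = parsed
    if name != "OpenSSL":
        return None  # e.g. LibreSSL uses its own version numbers
    return version >= (1, 0, 1)


if __name__ == "__main__":
    # The library Python was built against, not necessarily the system one.
    print(ssl.OPENSSL_VERSION, "->", meets_aes_ni_minimum(ssl.OPENSSL_VERSION))
```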
On the chart below, created with openssl speed, you can see a comparison of AES-NI disabled (orange bars) and enabled (blue) for three different AES-NI enabled CPUs. The Y-axis shows the i7-4800MQ encrypting AES-256-CBC at almost 560MB/sec on a single core using hardware acceleration, whereas without AES-NI it manages just 243MB/sec.
Your server’s Xeon CPU will show further potential. To get an indication of a server's full ability, try openssl speed -multi n to run n benchmarks in parallel.
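A small helper can build that benchmark command with one worker per CPU. This is a sketch with assumptions: the helper name is my own, and the -evp option (which selects the cipher through OpenSSL's high-level interface) is my addition, since the exact options used for the chart are not stated. The script only prints the command; paste it into a shell to run the benchmark.

```python
import os
import shlex


def speed_command(cipher="aes-256-gcm", workers=None):
    """Build an 'openssl speed' argument list with one worker per CPU by default."""
    if workers is None:
        workers = os.cpu_count() or 1
    # -multi runs that many benchmarks in parallel; -evp picks the cipher by name.
    return ["openssl", "speed", "-multi", str(workers), "-evp", cipher]


if __name__ == "__main__":
    print(shlex.join(speed_command()))
    print(shlex.join(speed_command("aes-256-cbc", workers=1)))
```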
As a comparison of two 256-bit AES ciphers, AES-256-GCM is significantly faster than AES-256-CBC thanks to the CLMUL optimisation mentioned above. The rate of AES-256-GCM is about 1.5GB/sec on the faster CPU. So make sure you put that at the front of your cipher list.
You already use Forward Secrecy, what’s another practical use of AES-NI?
OpenSSH can make use of AES-NI too, to provide a substantial throughput enhancement to your file transfers. To illustrate the difference I used a rudimentary test copying 4GB using dd over the loopback device.
In this case the tests were done on OS X 10.11. For the non-AES-NI test I compiled OpenSSH-7.2p2 with the old OpenSSL version 0.9.8zh. For AES-NI it was the OS X default of OpenSSH_6.9p1 with LibreSSL 2.1.8.
The command looked like this:
dd if=/dev/zero bs=4096 count=1000000 | ssh -x -c aes128-ctr -m umac-128-etm@openssh.com jeremy@localhost 'dd of=/dev/null'
I tested aes128-ctr and the default chacha20 (a software-only cipher implementation) on the i7-2720QM processor (the slowest of the three above). In this case chacha20 was negotiated by default when I didn’t specify a cipher with the -c option, and hence the transfer did not make use of AES-NI. In OpenSSH you can view the available ciphers or MACs with ssh -Q cipher or ssh -Q mac. You then modify your sshd_config and ssh_config files and add something similar to this:
Ciphers aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,chacha20-poly1305@openssh.com
MACs umac-128-etm@openssh.com,umac-128@openssh.com,hmac-sha1-etm@openssh.com,hmac-sha1,hmac-sha2-512-etm@openssh.com,hmac-sha2-512,hmac-sha2-256-etm@openssh.com,hmac-sha2-256
In the chart below you can see that AES-NI enables far faster throughput in OpenSSH, going from 85MB/sec to 260MB/sec using aes128-ctr: more than 3x faster than the non-AES-NI build, and almost 2x the default ssh configuration on this CPU. Again, your Xeon will be much better for raw performance, but remember that a 1Gbit LAN tops out at about 125MB/sec, so if you’re still transferring across one you won’t be able to fully utilise it.
If you want to supercharge your ssh further, take a look at the high performance patches supplied by a team at the Pittsburgh Supercomputing Center. They have provided some stunning improvements using dynamic TCP windows and multithreaded AES-CTR. To use these, you need to download the OpenSSH source and apply their .diff patch.
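The arithmetic behind those figures is worth a quick check. The sketch below uses only the 85MB/sec and 260MB/sec rates quoted above, assumes decimal units (1 Gbit/s = 1000 Mbit/s), and ignores protocol overhead; the function names are my own.

```python
def speedup(before_mb_s, after_mb_s):
    """How many times faster the second rate is than the first."""
    return after_mb_s / before_mb_s


def link_limit_mb_s(gbit_per_s):
    # 1 Gbit/s = 1000 Mbit/s = 125 MB/s, before any protocol overhead.
    return gbit_per_s * 1000 / 8


def effective_rate(cipher_mb_s, link_mb_s):
    # A transfer can go no faster than its slowest stage.
    return min(cipher_mb_s, link_mb_s)


print(f"{speedup(85, 260):.2f}x")  # prints 3.06x
print(effective_rate(260, link_limit_mb_s(1)))  # prints 125.0
```

On a 1Gbit link the network, not the cipher, becomes the bottleneck once the cipher sustains more than about 125MB/sec, so the gain from AES-NI shows up mainly as lower CPU load rather than shorter transfers.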
References
- https://en.wikipedia.org/wiki/Advanced_Encryption_Standard
- https://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni
- https://en.wikipedia.org/wiki/AES_instruction_set
- http://ark.intel.com/search/advanced?AESTech=true
- https://www-ssl.intel.com/content/www/us/en/processors/carry-less-multiplication-instruction-in-gcm-mode-paper.html
- https://en.wikipedia.org/wiki/Forward_secrecy
- http://vincent.bernat.im/en/blog/2011-ssl-perfect-forward-secrecy.html
- http://www.wreck.net/ssh_speed