Whitepaper
Exploring Intel® QAT on MX series blade servers
Computing offloads for encryption and compression

Andy Butcher, Technical Staff, Server Advanced Engineering, Dell EMC
Gordon McFadden, Lead Architect, Intel® QuickAssist Technology

Abstract
Integrated Intel® QuickAssist Technology on the MX blade servers provides beneficial CPU offloads for encryption and compression operations. Quantitative examples are described, including information on how to enable and use this feature of the chipset.
Table of contents
1. Introduction
1.1 Encryption and Key Generation
1.2 Data Compression and Decompression
1. Introduction
PowerEdge MX is the first Dell EMC server to offer a software licensing option to enable Intel® QuickAssist Technology. It provides a software-enabled foundation for security, authentication, and compression, and significantly increases the performance and efficiency of standard platform solutions. This paper explores uses of Intel® QAT with two examples.
1.1 Encryption and Key Generation
Many users will be familiar with the "https" prefix on frequently visited websites.
1.4 Software
Software for Intel® QuickAssist Technology is provided through the Intel open source site.1 The applicable drivers are associated with the C62x chipset. Application and library examples are posted on 01.org, along with the Quick Start Guide, API Programmer's Guide, and other useful collateral, allowing users to build upon these open source libraries and examples or to build their own applications. Release notes identify operating system compatibility.
2. Intel® QAT on MX7000
This section includes several examples of Intel® QAT applications, but the list of potential uses of Intel® QAT is by no means exhaustive. Drivers and APIs are available for custom applications that require encryption or compression. Not included in this section is the NGINX web server, which has been integrated with Intel® QAT encryption and PKE (public key encryption) for superior performance measured in connections per second.
3. Example 1 – Compression
3.1 Background – Platform Hardware and Capability
The latest generation of the Intel® QAT device provides three separate PCIe endpoints, each with 10 compression/decompression engines. In the Dell EMC MX740c, the device is integrated into the Intel® C62x chipset Platform Controller Hub (PCH) and is connected to the CPU with a 16-lane PCIe Gen3 link. The engines use high-speed DMA (Direct Memory Access) transactions to transfer data.
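As a back-of-envelope check on the x16 PCIe Gen3 link described above: the 8 GT/s per-lane rate and 128b/130b encoding used below are standard PCIe Gen3 parameters (not figures from this paper), and they put the link's usable bandwidth at roughly 15.75 GB/s.

```shell
# Back-of-envelope throughput of the x16 PCIe Gen3 link to the PCH.
# 8 GT/s per lane and 128b/130b encoding are standard Gen3 parameters.
lanes=16
gt_per_lane=8                                          # GT/s per lane
# usable MB/s = lanes * GT/s * 1000 * (128/130) / 8 bits-per-byte
mb_per_s=$(( lanes * gt_per_lane * 1000 * 128 / (130 * 8) ))
echo "${mb_per_s} MB/s"                                # ~15.75 GB/s before protocol overhead
```

This is raw link bandwidth; DMA descriptor and TLP overheads reduce what the compression engines actually see.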
3.3 Programming Example
QATzip has been carefully designed to provide a simple programming API. The following code segment is complete:

[gmcfadde@localhost small]$ cat Makefile
INC_DIR=/opt/intel/QATzip/include/

smallQz: main.c
	gcc -g -O0 -I$(INC_DIR) main.c -o smallQz -lqatzip

[gmcfadde@localhost small]$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "qatzip.h"
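QATzip can emit standard gzip (RFC 1952) framing, so output from the accelerated path can be inspected and decompressed with ordinary tools. A software-only sketch of that framing check, using plain gzip as a stand-in for QATzip output (file names here are illustrative):

```shell
# Create a small file and compress it with software gzip as a stand-in
# for QATzip output; both can produce RFC 1952 framed streams.
printf 'hello, QATzip' > sample.txt
gzip -c sample.txt > sample.txt.gz
# RFC 1952 mandates the two magic bytes 0x1f 0x8b at the start of the stream.
magic=$(od -An -tx1 -N2 sample.txt.gz | tr -d ' \t\n')
echo "magic bytes: $magic"
gzip -t sample.txt.gz && echo "valid gzip stream"
```

The same check applies to files produced by the smallQz example, which is one easy way to confirm that hardware-compressed output remains interoperable.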
3.4 Example Software Stack
Figure 6 shows the integration of QAT into NGINX, providing both encryption and compression for the web server. This diagram puts the QATzip library in context, showing its relation to application software and the driver.
3.5 Experiment Results
The results shown in Table 2, Figure 7, and Figure 8 were obtained on an MX740c blade server with two Intel® Xeon® Gold 5117 CPUs running at 2 GHz.
4. Example 2 – IPsec
This experiment demonstrates the performance benefit of offloading encryption to the Intel® QAT device in a simulated VPN tunnel. The tunneling machine, running VPP,6 was exercised with a traffic generator running TRex. Three methods of performing the IPsec encryption at the tunnel were compared:
1) the OpenSSL library without offload
2) Intel® AES-NI
3) Intel® QAT offload
4.1 Lab Setup
Figure 10 shows the lab setup.
5. Conclusion
The acceleration and offload functions of Intel® QuickAssist Technology (QAT) demonstrated in this paper are encryption/decryption and compression/decompression. IPsec tunneling was used as a workload to show the benefit of encryption and decryption offload, and QATzip was used to demonstrate the performance of accelerated compression and decompression. In both cases, offloading operations onto the QAT engines delivered faster performance.
6.3 Server Setup for Traffic Generator
This section describes the steps required to run TRex on the traffic-generating server. See the links for additional information on DPDK and getting started with IPsec.9 There is additional reference material for TRex online.10
1. Disable the network ports being used for DPDK:
   cd /etc/sysconfig/network-scripts
   ifdown p1p1
   ifdown p1p2
   service network restart
2. Download and untar DPDK in directory "dpdk":
   cd /home/dell
   git clone http://dpdk.org/git/dpdk
3.
6.4 Server Setup for the Tunneling Server
This section describes the steps required to run VPP on the tunneling server that performs the IPsec encryption. A command line reference for VPP can be found online,11 with additional reference material from Intel.12
6.4.1 Install and Enable Intel® QAT
These steps enable Intel® QAT on the tunneling server ("sun").
1. Check for PCI devices (information only):
   [root@sun dell]# lspci | grep processor
   60:00.
6.4.3 Run VPP with OpenSSL
To run VPP with OpenSSL, in a Linux shell, execute:
   vpp -c vpp_config_hw.txt_withIPSec_withopenssl_works
6.4.4 Run VPP with Intel® AES-NI (AES New Instructions for x86 Processors)
To run VPP with AES-NI, in a Linux shell, execute:
   vpp -c vpp_config_hw.txt_withIPSec_withaesni_works
6.4.5 Run VPP with Intel® QAT
To run VPP with QAT, in a Linux shell, execute:
   vpp -c vpp_config_hw.txt_withIPSec_withQAT_works
6.
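The three invocations differ only in the configuration file passed to `vpp -c`. A hypothetical wrapper (the function name and the print-instead-of-exec behavior are illustrative, not from the paper; the config file names are those used above) makes the selection explicit:

```shell
# Hypothetical helper: print the vpp command line for a given crypto backend.
# Config file names are taken from sections 6.4.3-6.4.5 above.
run_vpp() {
  case "$1" in
    openssl) cfg=vpp_config_hw.txt_withIPSec_withopenssl_works ;;
    aesni)   cfg=vpp_config_hw.txt_withIPSec_withaesni_works ;;
    qat)     cfg=vpp_config_hw.txt_withIPSec_withQAT_works ;;
    *)       echo "unknown backend: $1" >&2; return 1 ;;
  esac
  # Print rather than exec, so the sketch has no side effects.
  echo "vpp -c $cfg"
}
run_vpp qat
```

Running each backend in turn this way keeps the experiment reproducible, since only the configuration file changes between runs.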
6.6 VPP Configuration for AES-NI
unix {
  exec /home/dell/vpp_manual_cfgs/vpp_with_ipsec/vpp_config_with_ipsec.txt
  nodaemon
  cli-listen /run/vpp/cli.sock
  log /tmp/vpp.log
  interactive
}
cpu {
  main-core 6
  corelist-workers 2,4
}
dpdk {
  socket-mem 2048,2048
  log-level debug
  no-tx-checksum-offload
  dev default {
    num-tx-desc 1024
    num-rx-desc 1024
  }
  dev 0000:3b:00.0 { workers 0 }
  dev 0000:3b:00.1 { workers 1 }
  #dev 0000:60:01.0
  #dev 0000:61:01.
6.7 VPP Configuration for Intel® QAT
This configuration is identical to the AES-NI configuration except that the Intel® QAT virtual function devices are uncommented in the dpdk section.
unix {
  exec /home/dell/vpp_manual_cfgs/vpp_with_ipsec/vpp_config_with_ipsec.txt
  nodaemon
  cli-listen /run/vpp/cli.sock
  log /tmp/vpp.log
  interactive
}
cpu {
  main-core 6
  corelist-workers 2,4
}
dpdk {
  socket-mem 2048,2048
  log-level debug
  no-tx-checksum-offload
  dev default {
    num-tx-desc 1024
    num-rx-desc 1024
  }
  dev 0000:3b:00.0 { workers 0 }
  dev 0000:3b:00.1 { workers 1 }
  dev 0000:60:01.0
  dev 0000:61:01.
6.8 VPP Common IPsec Configuration for OpenSSL/AES-NI/Intel® QAT
set interface ip address TwentyFiveGigabitEthernet3b/0/1 10.10.1.1/24
set interface ip address TwentyFiveGigabitEthernet3b/0/0 10.10.2.1/24
set ip arp TwentyFiveGigabitEthernet3b/0/1 10.10.1.2 24:6e:96:9c:e5:df
set ip arp TwentyFiveGigabitEthernet3b/0/0 10.10.2.
6.9 VPP Common Settings – Huge Pages
Configure huge pages for VPP. Contents of file /etc/sysctl.d/80-vpp.conf:
# Number of 2MB hugepages desired
#vm.nr_hugepages=1024
vm.nr_hugepages=4096
# Must be greater than or equal to (2 * vm.nr_hugepages).
#vm.max_map_count=3096
vm.max_map_count=9216
# All groups allowed to access hugepages
vm.hugetlb_shm_group=0
# Shared Memory Max must be greater than or equal to the total size of hugepages.
# For 2MB pages, TotalHugepageSize = vm.
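The constraints in the sysctl file above can be sanity-checked arithmetically: 4096 huge pages of 2 MB each give 8 GiB of huge page memory, and the configured vm.max_map_count of 9216 exceeds the required minimum of 2 × 4096 = 8192. A sketch of that check:

```shell
# Sanity-check the hugepage values from /etc/sysctl.d/80-vpp.conf.
nr_hugepages=4096
page_bytes=$((2 * 1024 * 1024))             # 2 MB pages
total_bytes=$((nr_hugepages * page_bytes))  # total hugepage memory; shmmax must cover this
min_map_count=$((2 * nr_hugepages))         # lower bound for vm.max_map_count
max_map_count=9216                          # value configured above
echo "total hugepage bytes: $total_bytes"
echo "min max_map_count: $min_map_count (configured: $max_map_count)"
```

Per the comments in the file, kernel.shmmax must therefore be at least 8589934592 bytes for this configuration.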
7. Acknowledgements
This paper would not have been possible without the capable efforts of Imran Masud.
8. References
1 https://01.org/intel-quickassist-technology
2 https://doc.dpdk.org/guides/cryptodevs/qat.html
3 https://software.intel.com/en-us/articles/get-started-with-ipsec-acceleration-in-the-fdio-vpp-project
4 https://doc.dpdk.org/guides-16.04/sample_app_ug/ipsec_secgw.html
5 https://www.ietf.org/rfc/rfc1952.txt
6 https://fd.io/technology/
7 https://wiki.fd.