DELL EMC ISILON F800 AND H600 I/O PERFORMANCE

ABSTRACT
This white paper provides F800 and H600 performance data. It is intended for performance-minded administrators of large compute clusters that access Isilon storage.
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any software described in this publication requires an applicable software license. Copyright © 2018 Dell Inc. or its subsidiaries. All Rights Reserved.
TABLE OF CONTENTS
ABSTRACT
EXECUTIVE SUMMARY
INTRODUCTION
DELL EMC ISILON
EXECUTIVE SUMMARY
HPC applications vary widely in their I/O profiles. For instance, applications may read from and write to files in an N-to-1 or N-to-N manner. A failure to match these I/O needs with the proper filesystem(s) can result in poor application performance, underutilized systems, and frustrated users. This Dell EMC technical white paper describes sequential and random I/O performance results for the Dell EMC Isilon F800 and H600 node types.
PERFORMANCE EVALUATION
This section presents sequential and random I/O performance results for the Dell EMC Isilon F800 and H600 node types and compares them, where possible, to other storage offerings such as the Dell HPC Lustre Storage Solution and the Dell NFS Storage Solution (NSS). The data is intended to help administrators judge the suitability of Isilon storage clusters for various HPC workloads.
DELL HPC INNOVATION LABS ZENITH COMPUTE CLUSTER

Compute clients: 64 x PowerEdge C6320s
Processor: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz; 18 cores per processor (36 per node); base frequency 2.3 GHz; AVX base frequency 2.0 GHz
Memory: 128 GB @ 2400 MHz per node
Operating system: Red Hat Enterprise Linux Server release 7.2 (Maipo)
Kernel: 3.10.0-327.13.1.el7
Network connectivity
The Zenith cluster and the F800 storage system were connected via 8 x 40GbE links. Figure 2 shows the network topology used in the tests. The H600 was configured in exactly the same way as the F800. Figure 3 shows the network configuration of the NSS-7.0-HA.

Figure 2. Network diagram of the F800 benchmark configuration
Figure 3. Network diagram of the NSS-7.0-HA benchmark configuration

Test tools
iperf 2.0.5 was used for testing network bandwidth. IOR v3.0 was used for the sequential and random I/O benchmarks.
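The clients accessed storage at the /mnt/nfs mount point that appears in the IOR command lines in the appendix. The exact export and mount options are not listed in this paper; purely as an illustration, an NFSv3 mount of an Isilon export from a client node might look like the following (the hostname and export path below are hypothetical):

# Hypothetical example - substitute the actual SmartConnect zone name and export path
mount -t nfs -o vers=3,hard,tcp isilon.example.com:/ifs/data /mnt/nfs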
Eight Zenith nodes to a single F800 node: ~40.06 Gbits/sec
Twenty Zenith nodes to four F800 nodes: ~154.29 Gbits/sec

The iperf results demonstrate that the bandwidth between a single Zenith node and a single F800 node is approximately 9.9 Gbits/sec (~1.2 GB/sec), the maximum bandwidth between Zenith and a single F800 node is approximately 40 Gbits/sec (~5 GB/sec), and the aggregate bandwidth between the Zenith nodes and the F800 cluster is approximately 154.29 Gbits/sec (~19 GB/sec).
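The exact iperf invocations used in the lab are not reproduced in this paper. As a sketch only, a multi-stream point-to-point test of the type that produces numbers like these could be run as follows (the hostname and stream count are illustrative):

# On the F800 node (server side) - hostname is hypothetical
iperf -s

# On a Zenith client: 8 parallel TCP streams for 60 seconds against the F800 node
iperf -c f800-node-1 -P 8 -t 60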
Figure 4. Sequential write performance (N-N)

Sequential write performance summary:
• Peak write performance of F800 was ~8 GB/sec
• Peak write performance of H600 was ~7.4 GB/sec
• Peak write performance of NSS-7.0-HA was ~2 GB/sec
• Both F800 and H600 scale similarly

Future tests will utilize more than 64 client compute nodes in an attempt to maximize write throughput on the F800 and H600. The IOR sequential read results for all three storage systems are shown in Figure 5.
Figure 5. Sequential read performance (N-N)

Sequential read performance summary:
• Peak read performance of F800 was ~12 GB/sec
• Peak read performance of H600 was ~6 GB/sec
• Peak read performance of NSS-7.0-HA was ~5.9 GB/sec
• F800 scaled well

Future tests will utilize more than 64 client compute nodes in an attempt to maximize read throughput on the F800 and H600.
Figure 6. Sequential write performance (N-1)

Write performance for both the F800 and H600 peaked with the 2-client test case at approximately 1.1 GB/sec and changed little as more compute nodes were added. It appears that file synchronization overhead limits write performance as an increasing number of client nodes attempt to concurrently write to the same file. In contrast, N-to-1 read performance generally increases with the number of client nodes, and peak read performance was approximately 5.6 GB/sec.
Figure 7. Sequential read performance (N-1)

Sequential read performance summary:
• Peak read performance of F800 was ~5.6 GB/sec
• Peak read performance of H600 was ~2 GB/sec
• Read performance of H600 did not increase with more than 2 clients

Future tests will utilize more than 64 client compute nodes in an attempt to maximize read throughput on the F800.
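The N-to-1 tests direct all client processes at one shared file rather than one file per process. The exact N-to-1 command lines are not reproduced in the appendix; a plausible 4-client write variant, following the same pattern as the appendix's N-to-N write commands but omitting IOR's -F (file-per-process) flag so that all ranks share /mnt/nfs/test, would look like this (illustrative only):

# Hypothetical 4-client N-to-1 write: no -F flag, so all ranks write to a single shared file
mpirun --allow-run-as-root -np 4 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -k -o /mnt/nfs/test -w -s 1 -t 1m -b 512g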
Figure 8A. Random write performance (N-N)
Figure 8B. Random write performance (N-N)

Random write performance summary:
• Peak write performance of F800 was ~1.
• Peak write performance of H600 was ~800 MB/sec (32 GB file)
• Peak write performance of H600 was ~5.8 GB/sec (8 GB file)
• Peak write performance of NSS-7.0-HA was ~1.5 GB/sec
• H600 performed better with an 8 GB file than with a 32 GB file

Future tests will utilize more than 64 client compute nodes in an attempt to maximize write throughput on the F800 and H600. The random F800 and H600 tests in Figure 9A used a 4 KB block size, and each client reads a 32 GB file. The random H600 and NSS-7.0-HA read tests are shown in Figure 9B.
Figure 9B. Random read performance (N-N)

Random read performance summary:
• Peak read performance of F800 was ~2.7 GB/sec
• Peak read performance of H600 was ~800 MB/sec (32 GB file)
• Peak read performance of H600 was ~3.7 GB/sec (8 GB file)
• Peak read performance of NSS-7.0-HA was ~2.6 GB/sec
• H600 performed better with an 8 GB file than with a 32 GB file

Future tests will utilize more than 64 client compute nodes in an attempt to maximize read throughput on the F800 and H600.
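The appendix lists only the sequential command lines. A random-I/O run of the same shape would add IOR's -z flag (access random, rather than sequential, offsets within the file) and use the 4 KB transfer size noted above. As an illustrative sketch only, since the actual random-test command lines are not given in this paper:

# Hypothetical 4-client random read: -z randomizes offsets, -t 4k matches the 4 KB block size, -b 32g gives a 32 GB file per client
mpirun --allow-run-as-root -np 4 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -F -k -z -o /mnt/nfs/test -r -s 1 -t 4k -b 32g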
Figure 10. Sequential Read Performance

Figure 11 illustrates that sequential write performance on all three filesystems is similar up to the 2-client case, where the NSS peaks at approximately 1.9 GB/sec. As more clients are added, the Lustre filesystem improves to a peak of approximately 14.2 GB/sec while the F800 peaks at 8 GB/sec.
Figure 11. Sequential Write Performance

Figures 12 and 13 illustrate that F800 random read and write performance is better than that of Lustre and NSS for every test case. F800 random read performance is 7x greater than Lustre's (140K vs. 20K IOPS) at the 64-client case, while F800 random write performance is 3x greater than Lustre's (45K vs. 15K IOPS). Future tests will utilize more than 64 client compute nodes in an attempt to maximize F800 random read/write performance.
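To relate these IOPS figures to the bandwidth numbers elsewhere in this paper, and assuming the same 4 KB transfer size used in the earlier random tests (the transfer size for Figures 12 and 13 is not restated here), the conversion is simple arithmetic:

# 140,000 read IOPS x 4 KiB = ~547 MiB/sec; 45,000 write IOPS x 4 KiB = ~176 MiB/sec
# (integer division below rounds down to 546 and 175)
echo "$(( 140000 * 4 / 1024 )) MiB/sec read, $(( 45000 * 4 / 1024 )) MiB/sec write"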
Figure 12. Random Read Performance
Figure 13. Random Write Performance
Summary

Sequential I/O Performance
o N-to-N tests
  - F800 had better sequential read and write performance than NSS-7.0-HA and H600
  - F800 and H600 write performance was up to 400% better than NSS-7.0-HA
  - F800 write performance was up to 20% better than H600
  - F800 read performance was up to 251% better than NSS-7.0-HA
  - F800 read performance was up to 111% better than H600
  - H600 and NSS-7.0-HA had similar sequential read performance
#2-client write
mpirun --allow-run-as-root -np 2 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -F -k -o /mnt/nfs/test -w -s 1 -t 1m -b 1024g

#4-client write
mpirun --allow-run-as-root -np 4 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -F -k -o /mnt/nfs/test -w -s 1 -t 1m -b 512g

…

#64-client write
mpirun --allow-run-as-root -np 64 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -F -k -o /mnt/nfs/test -w -s 1 -t 1m -b 32g
#1-client read
mpirun --allow-run-as-root -np 1 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -k -o /mnt/nfs/test -r -s 1 -t 1m -b 2048g

#2-client read
mpirun --allow-run-as-root -np 2 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -k -o /mnt/nfs/test -r -s 1 -t 1m -b 1024g

#4-client read
mpirun --allow-run-as-root -np 4 -npernode 1 -hostfile hosts -nolocal /home/xin/bin/ior -a POSIX -v -i 1 -d 3 -e -k -o /mnt/nfs/test -r -s 1 -t 1m -b 512g
...