ResNet-50’s Performance with TF 1.14 + XLA
In this section, we evaluated the performance of the ResNet-50 model trained with TF 1.14, both with and without XLA enabled. The tests were run with 1, 4, and 8 GPUs, and the results were compared with those obtained with TF 1.10 in our previous paper [0]. We also explored performance with batch sizes of 128 and 256. See Figure 9 and Figure 10.
Figure 9: Multi-node PowerEdge C4140-M, ResNet-50 BS 128: TF 1.10 vs TF 1.14 vs TF 1.14 + XLA
As we saw in the previous section, ResNet-50 with batch size 128 on 8 GPUs ran ~3% faster with TF 1.14 than with TF 1.10, and ~35% faster with TF 1.14 + XLA than with TF 1.10; see Figure 9. ResNet-50 with batch size 256 on 8 GPUs ran ~2% faster with TF 1.14 than with TF 1.10, and ~46% faster with TF 1.14 + XLA than with TF 1.10; see Figure 10. Because batch size 256 delivered the higher performance, we selected it for further optimization.
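The exact benchmark harness and flags used for these runs are not listed here, but for reference, XLA just-in-time compilation can be enabled in TF 1.14 through the session configuration. The snippet below is a minimal sketch under that assumption; it is illustrative only and not necessarily the configuration used to produce the results above.

    import tensorflow as tf

    # Minimal sketch: turn on XLA auto-clustering (JIT) for a TF 1.14 session.
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

    with tf.Session(config=config) as sess:
        # Build and run the ResNet-50 training graph here; with the global JIT
        # level set, XLA compiles eligible subgraphs automatically.
        pass

Setting the environment variable TF_XLA_FLAGS=--tf_xla_auto_jit=2 before launching training is another way this TensorFlow release exposes auto-clustering.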