ResNet-50’s Performance with TF 1.14 + XLA
In this section, we evaluated the performance of the ResNet-50 model trained with TF 1.14, both with and without XLA enabled. The tests were run with 1, 4, and 8 GPUs, and the results were compared with those obtained with TF 1.10 in our previous paper [0]. We also explored performance with batch sizes of 128 and 256. See Figure 9 and Figure 10.
Figure 9: Multi-node PowerEdge C4140-M, ResNet-50 BS 128: TF 1.10 vs TF 1.14 vs TF 1.14 + XLA
As we saw in the previous section, ResNet-50 with batch size 128 on 8 GPUs ran ~3% faster with TF 1.14 than with TF 1.10, and ~35% faster with TF 1.14 + XLA than with TF 1.10; see Figure 9. ResNet-50 with batch size 256 on 8 GPUs ran ~2% faster with TF 1.14 than with TF 1.10, and ~46% faster with TF 1.14 + XLA than with TF 1.10; see Figure 10. Because batch size 256 delivered the higher performance, we selected it for further optimization.
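The exact benchmark harness and flags used for these runs are not listed here, but for reference, XLA just-in-time compilation can be enabled in TF 1.14 through the session configuration. The snippet below is a minimal sketch under that assumption; it is illustrative only and not necessarily the configuration used to produce the results above.

    import tensorflow as tf

    # Minimal sketch: turn on XLA auto-clustering (JIT) for a TF 1.14 session.
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

    with tf.Session(config=config) as sess:
        # Build and run the ResNet-50 training graph here; with the global JIT
        # level set, XLA compiles eligible subgraphs automatically.
        pass

Setting the environment variable TF_XLA_FLAGS=--tf_xla_auto_jit=2 before launching training is another way this TensorFlow release exposes auto-clustering.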