Whitepaper

Retail Analytics with Malong RetailAI® on DELL EMC PowerEdge servers

Revision: 1.1
Issue Date: 10/14/2019

Abstract

This whitepaper evaluates the performance and efficiency of running the Malong RetailAI® software stack on the Dell EMC PowerEdge R7425 server for retail analytics. The objective is to show how the stack can deliver high-throughput, low-latency inference on NVIDIA's AI software platform, powered by NVIDIA T4 Tensor Core GPUs.
Revisions

Date               Description
02 October 2019    Initial release

Acknowledgements

This paper was produced by the following people:

Name             Role
Bhavesh Patel    Server Advanced Engineering, Dell EMC
Matt Scott       CTO, Malong
Hao Wei          VP of Engineering, Malong
Overview of Deep Learning

Deep learning consists of two phases: training and inference. As illustrated in Figure 1, training involves learning a neural network model from a given training dataset over a certain number of training iterations, guided by a loss function [1]. The output of this phase, the learned model, is then used in the inference phase to make predictions on new data.
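The split between the two phases can be illustrated with a minimal sketch. It assumes a toy one-feature linear model trained by gradient descent on a mean-squared-error loss; the function names `train` and `infer`, the data, and the hyperparameters are hypothetical and are not part of the Malong RetailAI® stack.

```python
# Illustrative only: a toy linear model, not the RetailAI model.

def train(samples, iterations=2000, learning_rate=0.05):
    """Training phase: fit weight w and bias b by gradient descent.

    The loss function is the mean squared error between the model
    output (w * x + b) and the target y over the training dataset.
    """
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(iterations):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b  # the learned model


def infer(model, x):
    """Inference phase: apply the learned model to new data."""
    w, b = model
    return w * x + b


# Training dataset generated from y = 2x + 1.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = train(training_data)

# New input that was not part of the training dataset.
prediction = infer(model, 4.0)
print(f"prediction for x=4.0: {prediction:.3f}")
```

Training is iterative and compute-heavy, while inference is a single forward evaluation per input, which is why the two phases are typically optimized and deployed separately.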
Figure 2. Inference Flow.

Why NVIDIA T4 GPU?

The NVIDIA® Tesla® T4 is a single-slot, low-profile PCI Express Gen3 deep learning accelerator card based on the TU104 NVIDIA graphics processing unit (GPU). The NVIDIA T4 has 16 GB of GDDR6 memory and a 70 W maximum power limit. It is a passively cooled board. The Tesla T4 is powered by NVIDIA Turing™ Tensor Cores to accelerate inference, video transcoding, and virtual desktops.
Figure 4. DELL EMC PowerEdge R7425

The Dell™ EMC PowerEdge™ R7425 is Dell’s latest 2-socket, 2U rack server, designed to run complex workloads using highly scalable memory, I/O, and network options. The system is built around AMD's high-performance processor for the SP3 socket, which supports up to 32 AMD “Zen” x86 cores (AMD Naples Zeppelin SP3), up to 16 DIMMs, PCI Express® (PCIe) 3.0 enabled expansion slots, and a choice of OCP technologies.
large percentages of false positives, which inconvenience shoppers and add operational overhead to the business. Malong addresses these problems by using novel computer vision algorithms to perform intelligent video analytics (IVA) for loss prevention at large scale, accurately detecting mis-scans and ticket-switching at SCOs and staffed lanes in near real time. Scalability is key.
products into a trace. Every trace is represented as a sequence of two-part elements: the first part of each element is a “location” within the field of view, and the second part is a “timestamp” indicating approximately when the product moved to that location.

3. The POS message handler interfaces with the register messaging system to incorporate scan signals into the processing pipeline. Each scan signal must contain when the scan happened and which product was scanned.
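The two data shapes described above, a trace of (location, timestamp) elements and a POS scan signal, can be sketched as follows. This is a minimal illustration, not the RetailAI® implementation: the class names, the (x, y) encoding of a location, the product identifier format, and the time-window rule in `has_matching_scan` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TraceElement:
    # Assumption: a location is an (x, y) position in the camera's field of view.
    location: Tuple[float, float]
    # Approximate time, in seconds, at which the product reached this location.
    timestamp: float


@dataclass(frozen=True)
class ScanSignal:
    timestamp: float  # when the scan happened
    product_id: str   # which product was scanned (hypothetical identifier format)


def has_matching_scan(trace: List[TraceElement],
                      scans: List[ScanSignal],
                      tolerance_s: float = 2.0) -> bool:
    """Hypothetical rule: a trace is matched if any scan falls within the
    trace's time span, widened by tolerance_s seconds on each side."""
    if not trace:
        return False
    start = min(e.timestamp for e in trace) - tolerance_s
    end = max(e.timestamp for e in trace) + tolerance_s
    return any(start <= s.timestamp <= end for s in scans)


# One product moving across the field of view between t=10.0 s and t=11.5 s.
trace = [
    TraceElement((0.10, 0.50), 10.0),
    TraceElement((0.45, 0.52), 10.8),
    TraceElement((0.80, 0.55), 11.5),
]

matched = has_matching_scan(trace, [ScanSignal(10.9, "sku-0001")])
unmatched = has_matching_scan(trace, [ScanSignal(30.0, "sku-0002")])
print(matched, unmatched)
```

Carrying a timestamp on both the trace elements and the scan signals is what allows the video-derived events and the register events to be aligned in time; a trace with no nearby scan would be a candidate for further review under this assumed rule.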
Conclusions

The Dell EMC PowerEdge R7425 with AMD EPYC processors is a powerful platform. In conjunction with NVIDIA T4 GPUs running the Malong container stack, it provides end-to-end retail analytics capability. Deploying a Dell EMC PowerEdge server in a store with Malong RetailAI® brings industry-leading computer vision technology to bear on retail business problems.
Server                                                  R7425-T4

Global Memory Size (GB)                                 16
Constant Memory Size (KB)                               65
L2 Cache Size (MB)                                      4

Bus Interface
  PCIe Generation                                       3
  Link Width                                            16

Peak Performance: Floating Point Operations (FLOP) and TOPS
  Single-Precision - FP32 (Teraflop/s)                  8.1
  Mixed Precision - FP16/FP32 (Teraflop/s)              65
  Integer 8 - INT8 (Tera Operations/s)                  130
  Integer 4 - INT4-16GB (Tera Operations/s)             260

Power
  Min Power Limit (W)                                   60
  Max Power Limit (W)                                   70

References

[1] H. Zhao, O. Gallo, I. Frosio and J.