PowerEdge Product Group
Direct from Development
Elastic AI Infrastructure using Dell EMC PowerEdge and Bitfusion FlexDirect
Tech Note by:
Ramesh Radhakrishnan, Dell EMC
Subbu Rama, Bitfusion
SUMMARY
Bitfusion FlexDirect disaggregates GPU accelerators and re-aggregates them in
real time over an Ethernet, InfiniBand RDMA or RoCE network to create an
elastic AI infrastructure.
Just as storage can be network attached, FlexDirect lets customers use
network-attached GPUs.
FlexDirect on Dell EMC PowerEdge servers offers a seamless way for any machine
in the network to access any arbitrary fraction of a GPU, or multiple GPUs, at
any time.
Application performance demands have increasingly been outpacing Moore’s
Law in a variety of fields, particularly AI and deep learning. Co-processors such
as GPUs offer immense speedups over CPUs for applications in these fields.
AI and deep learning applications require truly elastic compute infrastructure,
from dev-test to model training and inference, in order to achieve high
utilization of infrastructure resources. In many organizations, GPU-accelerated
servers are operated as siloed, stand-alone assets, which increases CAPEX and
OPEX and slows datacenter modernization.
Combining Dell EMC’s powerful portfolio of compute, storage and networking
with Bitfusion’s FlexDirect software allows customers to consolidate multiple
siloed GPU clusters into a single shared resource pool, decreasing CAPEX and
OPEX and increasing productivity.
Composable Elastic AI Compute Platform
Bitfusion FlexDirect makes GPUs available as a first-class resource on any
machine in a PowerEdge cluster, one that can be abstracted, partitioned, automated
and shared much like traditional compute or storage resources. GPU accelerators
can be partitioned into multiple virtual GPUs of any size and accessed remotely
by any machine over the network. GPU accelerators thus become part of a
common infrastructure resource pool, available for use by anyone in the
environment.
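The partitioning idea described above can be illustrated with a small model. This is only a sketch of the concept and not the FlexDirect API: the names `Gpu`, `GpuPool`, `Lease`, `allocate` and `release` are hypothetical, and first-fit placement is an assumed strategy chosen for brevity. Exact fractions are used so that repeated allocation and release do not accumulate rounding error.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Gpu:
    """One physical GPU in the shared pool (hypothetical model)."""
    host: str
    index: int
    free: Fraction = Fraction(1)  # share of the device not yet allocated


@dataclass(frozen=True)
class Lease:
    """Record of a fractional GPU handed to a client machine."""
    client: str
    host: str
    index: int
    share: Fraction


@dataclass
class GpuPool:
    """Toy pool from which any client can request a fraction of a GPU."""
    gpus: list = field(default_factory=list)

    def allocate(self, client, share):
        """Reserve `share` (0 < share <= 1) of one GPU using first fit.

        Returns a Lease, or None if no single GPU has enough free capacity.
        """
        if not 0 < share <= 1:
            raise ValueError("share must be in the range (0, 1]")
        for gpu in self.gpus:
            if gpu.free >= share:
                gpu.free -= share
                return Lease(client, gpu.host, gpu.index, share)
        return None

    def release(self, lease):
        """Return a leased share to the GPU it was taken from."""
        for gpu in self.gpus:
            if gpu.host == lease.host and gpu.index == lease.index:
                gpu.free += lease.share
                return
        raise KeyError("lease does not match any GPU in this pool")
```

In this model, two clients that each request `Fraction(1, 2)` end up sharing one physical device, while a third client asking for `Fraction(3, 4)` is placed on the next GPU with enough free capacity.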
Organizations can scale operations with policies and business logic (time-of-day
policies, classes of users, per-user-class permission to access the
top-performance GPUs, etc.) for AI development and production use cases. GPUs
from different departments can be pooled to create bigger clusters, increasing
compute performance and infrastructure utilization.
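The kinds of policies listed above (time of day, user class, access to the top-performance GPUs) could be expressed as simple rules. The following is a minimal sketch under stated assumptions, not FlexDirect's policy mechanism: `Policy`, `is_allowed`, the tier names and the user class names are all hypothetical, and time windows are assumed not to wrap past midnight.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Policy:
    """Hypothetical access rule for one class of users."""
    user_class: str
    allowed_tiers: frozenset  # GPU tiers this class may use, e.g. {"standard", "top"}
    start: time               # window opens (inclusive)
    end: time                 # window closes (exclusive); assumed later than start


def is_allowed(policies, user_class, gpu_tier, now):
    """Return True if any policy grants user_class access to gpu_tier at time now."""
    for policy in policies:
        if (
            policy.user_class == user_class
            and gpu_tier in policy.allowed_tiers
            and policy.start <= now < policy.end
        ):
            return True
    return False  # deny by default when no rule matches


# Example rules: researchers may use any tier all day, developers only
# standard GPUs during working hours.
EXAMPLE_POLICIES = [
    Policy("researcher", frozenset({"standard", "top"}), time(0, 0), time(23, 59, 59)),
    Policy("developer", frozenset({"standard"}), time(8, 0), time(20, 0)),
]
```

Denying by default keeps the sketch conservative: a user class with no matching rule gets no GPU access.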
Figure 1: FlexDirect on Dell EMC PowerEdge Servers to create Elastic AI Infrastructure
© 2018 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries
