Whitepaper

Balanced Memory with 2nd Generation AMD EPYC™ Processors for PowerEdge Servers
Optimizing Memory Performance

Revision: 1.4
Issue Date: 4/21/2020

Abstract

Properly configuring a server with balanced memory is critical to ensure memory bandwidth is maximized and latency is minimized. When server memory is configured incorrectly, unwanted variables are introduced into the memory controllers’ algorithm, which inadvertently slows down overall system performance.
Revisions

Date                 Description
12 September 2019    Initial release for 1st wave of AMD CPUs
21 April 2020        Includes all AMD CPU SKUs

Acknowledgements

This paper was produced by the following people:

Name            Role
Matt Ogle       Technical Product Marketing, Dell EMC
Trent Bates     Product Management, Dell EMC
Jose Grande     Software Senior Principal Engineer, Dell EMC
Andres Fadul    Software Senior Principal Engineer, Dell EMC
Table of Contents

1. Introduction
2. Memory Topography and Terminology
3. Memory Interleaving
   3.1 NPS and Quadrant Pairing
1. Introduction

Understanding the relationship between a server processor (CPU) and its memory subsystem is critical when optimizing overall server performance. Every processor generation has a unique architecture, with its own memory controllers, channels and slot population guidelines. These guidelines must be satisfied to attain high memory bandwidth and low memory access latency.
2. Memory Topography and Terminology

Figure 1: CPU-to-memory subsystem connectivity for Rome processors

To understand the relationship between the CPU and memory, the terminology illustrated in Figure 1 must first be defined:

• Memory controllers are digital circuits that manage the flow of data going from the computer’s main memory to the corresponding memory channels [2]. Rome processors have eight memory controllers in the processor I/O die, with one controller assigned to each channel.
3. Memory Interleaving

Memory interleaving allows a CPU to efficiently spread memory accesses across multiple DIMMs. When memory is put in the same interleave set, contiguous memory accesses go to different memory banks, so an access no longer has to wait until the prior access is completed before the next memory operation is initiated. For most workloads, performance is maximized when all DIMMs are in one interleave set, creating a single uniform memory region that is spread across as many DIMMs as possible.
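As a rough illustration of interleaving, the Python sketch below maps consecutive blocks of addresses to channels in round-robin order. The function name, the 256-byte interleave granule and the plain modulo mapping are assumptions made for illustration only; the actual address mapping used by Rome memory controllers is not described in this paper. The default of eight channels follows the eight memory controllers described in Section 2.

```python
def interleave_channel(address: int, num_channels: int = 8, granule_bytes: int = 256) -> int:
    """Return the channel index serving `address` in a simple round-robin model.

    Hypothetical model: consecutive granules of memory rotate across the channels.
    """
    if address < 0:
        raise ValueError("address must be non-negative")
    return (address // granule_bytes) % num_channels


# Eight consecutive 256-byte blocks land on eight different channels,
# so one access does not have to wait for the previous one to finish.
channels = [interleave_channel(block * 256) for block in range(8)]
```

With fewer channels in the interleave set, the same run of addresses wraps around sooner and revisits a channel earlier, which is the intuition behind preferring one large interleave set.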
3.2 NPS and Quadrant Pairing

NPS 0 and NPS 1 will typically yield the best memory performance, followed by NPS 2 and then NPS 4. The Dell EMC default setting for BIOS NUMA NPS is NPS 1; it may need to be manually adjusted to an NPS option that the CPU model supports. As seen below in Figure 3, various CPUs do not support NPS 2 or NPS 4, so users need to be aware of which memory configurations are optimized for each CPU.
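The performance ordering stated above can be sketched as a small selection helper. This is a minimal illustration, not a Dell EMC or AMD tool: the function name and the `supported` argument are hypothetical, the set of NPS options a given CPU supports must come from Figure 3, and the helper ignores the DIMM count, which also affects the recommended setting. Placing NPS 1 ahead of NPS 0 is an assumption made here only because NPS 1 is the stated default.

```python
from typing import Iterable

# Order from the text: NPS 0 and NPS 1 first, then NPS 2, then NPS 4.
# Assumption: NPS 1 is listed before NPS 0 because it is the BIOS default.
NPS_PREFERENCE = (1, 0, 2, 4)


def pick_nps(supported: Iterable[int]) -> int:
    """Return the most preferred NPS value among those the CPU supports."""
    supported_set = set(supported)
    for nps in NPS_PREFERENCE:
        if nps in supported_set:
            return nps
    raise ValueError(f"no known NPS option in {sorted(supported_set)}")
```

For example, `pick_nps({0, 1})` returns 1 and `pick_nps({2, 4})` returns 2.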
Figure 4: Recommended NPS setting for each # of DIMMs per CPU

If the NPS setting for a memory configuration will limit performance (as seen in Figure 5), Dell EMC BIOS will return the following informative prompts to the user:

UEFI0391: Memory configuration supported but not optimal for the enabled NUMA node Per Socket (NPS) setting. Please consider the following actions:
1) Changing NPS setting under System Setup>System BIOS>Processor Settings>NUMA Nodes Per Socket, if supported.
4. Memory Population Guidelines

4.1 Overview

DIMMs must be populated in a balanced configuration to yield the highest memory bandwidth and lowest memory access latency. Various factors dictate whether a configuration is balanced or not.
Figure 6: DIMM population order, starting with A1 and ending with A16

4.3 Identical CPU and DIMM Parts

Identical DIMMs must be used across all DIMM slots (i.e., the same Dell part number). Dell EMC does not support DIMM mixing in Rome systems. This means that only one rank, speed, capacity and DIMM type shall exist within the system. This principle applies to the processors as well; multi-socket Rome systems shall be populated with identical CPUs.
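The no-mixing rule can be checked mechanically against a DIMM inventory. The sketch below is a hypothetical helper, not a Dell EMC utility: the `Dimm` record, its field names and the placeholder part numbers are assumptions chosen to mirror the attributes listed above (part number, rank, speed, capacity and DIMM type).

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Dimm:
    part_number: str   # placeholder values below, not real Dell part numbers
    ranks: int
    speed_mts: int
    capacity_gb: int
    dimm_type: str


def mixed_attributes(dimms: List[Dimm]) -> List[str]:
    """Return the names of attributes that differ across the installed DIMMs."""
    return [
        f.name
        for f in fields(Dimm)
        if len({getattr(d, f.name) for d in dimms}) > 1
    ]


dimm_a = Dimm("PN-A", 2, 3200, 32, "RDIMM")
dimm_b = Dimm("PN-B", 2, 3200, 64, "RDIMM")

identical = mixed_attributes([dimm_a, dimm_a])   # no differences
mixed = mixed_attributes([dimm_a, dimm_b])       # part number and capacity differ
```

An empty result means the inventory satisfies the rule; any returned attribute name points at the property that is mixed.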
5. Balanced Configurations (Recommended)

Balanced configurations satisfy NPS 0/1 conditions by requiring each memory channel to be populated with one or two identical DIMMs. By doing this, one interleave set can optimally distribute memory access requests across all the available DIMM slots, thereby maximizing performance. Memory controller logic was designed around fully populated memory channels, so it should come as no surprise that eight or sixteen populated DIMMs are recommended.
6. Near Balanced Configurations

Near balanced configurations satisfy NPS 1 or 2 conditions by populating either four or twelve identical DIMMs per CPU. These configurations are not fully optimized because the channels are only partially populated, which creates disjointed memory regions that reduce performance (hence the term near balanced). As a result, near balanced configurations perform worse than balanced configurations.
7. Unbalanced Configurations

Unbalanced configurations can only satisfy NPS 4 conditions. More than two interleave sets can be introduced to the memory controller algorithm, which causes very disjointed memory regions. Memory performance for the unbalanced configurations below is significantly lower than for balanced or near balanced configurations, and they are not recommended.
Figure 14: Three DIMMs are populated in an unbalanced configuration
Figure 15: Five DIMMs are populated in an unbalanced configuration
Figure 16: Six DIMMs are populated in a near balanced configuration
Figure 17: Seven DIMMs are populated in an unbalanced configuration
Figure 18: Nine DIMMs are populated in an unbalanced configuration
Figure 19: Ten DIMMs are populated in a near balanced configuration
Figure 20: Eleven DIMMs are populated in an unbalanced configuration
Figure 21: Thirteen DIMMs are populated in an unbalanced configuration
Figure 22: Fourteen DIMMs are populated in a near balanced configuration
Figure 23: Fifteen DIMMs are populated in an unbalanced configuration
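The categories described in Sections 5 through 7 can be summarized as a lookup on the number of DIMMs per CPU. The sketch below is an illustration with a hypothetical function name, not a definitive rule set: eight and sixteen come from the balanced section, four and twelve from the near balanced section, and the remaining counts are taken from the figure captions above. Counts that this text does not classify (for example one or two DIMMs) return `None` rather than a guess.

```python
from typing import Optional

BALANCED = {8, 16}                        # Section 5
NEAR_BALANCED = {4, 12} | {6, 10, 14}     # Section 6 text plus figure captions
UNBALANCED = {3, 5, 7, 9, 11, 13, 15}     # figure captions in Section 7


def classify(dimms_per_cpu: int) -> Optional[str]:
    """Return the configuration class for a DIMM count, or None if not covered."""
    if dimms_per_cpu in BALANCED:
        return "balanced"
    if dimms_per_cpu in NEAR_BALANCED:
        return "near balanced"
    if dimms_per_cpu in UNBALANCED:
        return "unbalanced"
    return None
```

For example, `classify(16)` returns `"balanced"`, `classify(10)` returns `"near balanced"` and `classify(7)` returns `"unbalanced"`.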
8. Conclusion

Balancing memory with 2nd Generation EPYC™ server processors increases memory bandwidth and reduces memory access latency. When memory modules are configured so that the memory subsystems are identical and the channels are fully populated with one or two DIMMs, one interleave set creates a single uniform memory region that is spread across as many DIMMs as possible. This allows data to be distributed most efficiently on Dell EMC PowerEdge servers.