Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Guide March 2012 Reference Number: 327043-001
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
Contents
1 Introduction
1.1 Introduction
1.2 Uncore PMON Overview
1.3 Section References
2.5.4.1 MC Box Level PMON State
2.5.4.2 MC PMON state - Counter/Control Pairs
2.5.5 iMC Performance Monitoring Events
2.5.5.1 An Overview
2.5.6 iMC Box Events Ordered By Code
Figures
1-1 Uncore Sub-system Block Diagram of Intel Xeon Processor E5-2600 Family
1-2 Perfmon Control/Counter Block Diagram

Tables
1-1 Per-Box Performance Monitoring Capabilities
2-93 QPI_RATE_STATUS Register – Field Definitions
2-104 R2PCIe Performance Monitoring Registers
2-105 R2_PCI_PMON_BOX_CTL Register – Field Definitions
2-106 R2_PCI_PMON_CTL{3-0} Register – Field Definitions
Revision History
Revision | Description
327043-001 | Initial release.
1 Introduction

1.1 Introduction

The uncore subsystem of the Intel® Xeon® processor E5-2600 product family is shown in Figure 1-1. The uncore subsystem also applies to the Intel® Xeon® processor E5-1600 product family in a single-socket platform. The uncore subsystem consists of a variety of components, ranging from the CBox caching agent to the power controller unit (PCU), integrated memory controller (iMC) and home agent (HA), to name a few.
Events can be collected by reading a set of local counter registers. Each counter register is paired with a dedicated control register used to specify what to count (i.e., through the event select/umask fields) and how to count it. Some units provide the ability to specify additional information that can be used to ‘filter’ the monitored events (e.g., the CBo; see Section 2.3.3.3, “CBo Filter Register (Cn_MSR_PMON_BOX_FILTER)”).
1.4 Uncore PMON - Typical Control/Counter Logic

Following is a diagram of the standard perfmon counter block illustrating how event information is routed and stored within each counter and how its paired control register helps to select and filter the incoming information. Details of how control bits affect event information are presented in each of the box subsections of Chapter 2, with some summary information below. Note: The PCU uses an adaptation of this block (refer to Section 2.6.
Additional control bits include:

Applying a Threshold to Incoming Events: .thresh - since most counters can increment by a value greater than 1, a threshold can be applied to generate an event based on the outcome of the comparison. If .thresh is set to a non-zero value, that value is compared against the incoming count for that event in each cycle. If the incoming count is >= the threshold value, then the event count captured in the data register will be incremented by 1.
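The threshold comparison described above can be modelled in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the behavior for a zero threshold (the counter accumulates the raw per-cycle increment) is an assumption drawn from the statement that counters can increment by more than 1 per cycle.

```python
from typing import Iterable


def counter_increment(incoming: int, thresh: int) -> int:
    """Amount added to the data register in one cycle (illustrative model)."""
    if thresh == 0:
        # Assumption: with no threshold programmed, the raw increment is counted.
        return incoming
    # With a non-zero threshold, count 1 when the incoming count is >= thresh.
    return 1 if incoming >= thresh else 0


def accumulate(per_cycle_counts: Iterable[int], thresh: int) -> int:
    """Sum the per-cycle increments over a sequence of cycles."""
    return sum(counter_increment(c, thresh) for c in per_cycle_counts)
```

For example, with per-cycle occupancies of 0, 3, 7 and 12 and a threshold of 4, only the last two cycles satisfy the comparison, so the modelled counter advances by 2.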
Table 1-2.
Table 1-3.
• e.g., POWER_THROTTLE_CYCLES.RANKx / MC_Chy_PCI_PMON_CTR_FIXED

Requires more input to software to determine the specific event/subevent
• In some cases, there may be multiple events/subevents that cover the same information across multiple like hardware units. Rather than manufacturing a derived event for each combination, the derived event will use a lower case variable in the event name.
• e.g., POWER_CKE_CYCLES.
2 Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring

2.1 Uncore Per-Socket Performance Monitoring Control

The uncore PMON does not support interrupt based sampling.
Program the .ev_sel and .umask bits in the control register with the encodings necessary to capture the requested event, along with any signal conditioning bits (.thresh/.edge_det/.invert) used to qualify the event. For example, set C0_MSR_PMON_CTL2.{ev_sel, umask} to {0x37, 0x1} in order to capture LLC_VICTIMS.M_STATE in CBo 0’s C0_MSR_PMON_CTR2.
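As a minimal sketch of the field packing described above, the following Python helper assembles the low 16 bits of a control register value. It assumes the layout shown in this guide's field-definition tables (.ev_sel in bits 7:0, .umask in bits 15:8). The name encode_ctl is hypothetical, every other field (.thresh, .edge_det, .invert, enable) is left at zero, and actually writing the value to the MSR is outside the scope of the example. The sample call uses the LLC_VICTIMS event code listed in Table 2-14 (0x37) with the M_STATE umask (0x1).

```python
def encode_ctl(ev_sel: int, umask: int) -> int:
    """Pack .ev_sel (bits 7:0) and .umask (bits 15:8) into a control value.

    Hypothetical helper: all other control fields are left at zero.
    """
    if not 0 <= ev_sel <= 0xFF:
        raise ValueError("ev_sel must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit in 8 bits")
    return (umask << 8) | ev_sel


# LLC_VICTIMS (0x37 per Table 2-14) with the M_STATE umask (0x1).
llc_victims_m_state = encode_ctl(0x37, 0x01)
print(hex(llc_victims_m_state))
```

Keeping the packing in one helper makes it easier to add the signal conditioning fields later without touching callers.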
2.2 UBox Performance Monitoring

2.2.1 Overview of the UBox

The UBox serves as the system configuration controller for the Intel Xeon Processor E5-2600 family uncore. In this capacity, the UBox acts as the central unit for a variety of functions:
• The master for reading and writing physically distributed registers across the uncore using the Message Channel.
2.2.3.2 UBox PMON state - Counter/Control Pairs

The following table defines the layout of the UBox performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask). Additional control bits are provided to shape the incoming events (e.g. .invert, .edge_det, .
Table 2-4. U_MSR_PMON_FIXED_CTL Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
rsv | 31:23 | RV | 0 | Reserved (?)
en | 22 | RW | 0 | Enable counter when global enable is set.
rsv | 21:20 | RV | 0 | Reserved. SW must write to 0 for proper operation.
rsv | 19:0 | RV | 0 | Reserved (?)

Table 2-5. U_MSR_PMON_FIXED_CTR Register – Field Definitions
Field rsv event_count 2.2.
Table 2-7. Unit Masks for EVENT_MSG
Extension | umask [15:8] | Description
VLW_RCVD | bxxxxxxx1 |
MSI_RCVD | bxxxxxx1x |
IPI_RCVD | bxxxxx1xx |
DOORBELL_RCVD | bxxxx1xxx |
INT_PRIO | bxxx1xxxx |

LOCK_CYCLES
• Title: IDI Lock/SplitLock Cycles
• Category: LOCK Events
• Event Code: 0x44
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Number of times an IDI Lock/SplitLock sequence was started

2.
2.3.2 CBo Performance Monitoring Overview

Each of the CBos in the uncore supports event monitoring through four 44-bit wide counters (Cn_MSR_PMON_CTR{3:0}). Event programming in the CBo is restricted such that each event can only be measured in certain counters within the CBo. For example, counter 0 is dedicated to occupancy events. No other counter may be used to capture occupancy events.
2.3.3 CBo Performance Monitors

Table 2-8.
Table 2-8. CBo Performance Monitoring MSRs (Sheet 4 of 4)
MSR Name | MSR Address | Size (bits) | Description
C7_MSR_PMON_CTL3 | 0x0DF3 | 32 | CBo 7 PMON Control for Counter 3
C7_MSR_PMON_CTL2 | 0x0DF2 | 32 | CBo 7 PMON Control for Counter 2
C7_MSR_PMON_CTL1 | 0x0DF1 | 32 | CBo 7 PMON Control for Counter 1
C7_MSR_PMON_CTL0 | 0x0DF0 | 32 | CBo 7 PMON Control for Counter 0
Box-Level Control/Status
C7_MSR_PMON_BOX_CTL

2.3.3.
Table 2-10. Cn_MSR_PMON_CTL{3-0} Register – Field Definitions (Sheet 2 of 2)
Field | Bits | Attr | HW Reset Val | Description
invert | 23 | RW-V | 0 | Invert comparison against Threshold. 0 - comparison will be ‘is event increment >= threshold?’. 1 - comparison is inverted - ‘is event increment < threshold?’ NOTE: .invert is in series following .thresh. Due to this, the .thresh field must be set to a non-0 value.
Note: Not all transactions can be associated with a specific thread. For example, when a snoop triggers a WB, it does not have an associated thread. Transactions that are associated with PCIe will come from “0x1E” (b11110).

Note: Only one of these filtering criteria may be applied at a time.

Table 2-12.
Table 2-13. Opcode Match by IDI Packet Type for Cn_MSR_PMON_BOX_FILTER.
AK (Acknowledge) Ring - Acknowledges Intel® QPI to CBo and CBo to Core. Carries snoop responses from Core to CBo.
IV (Invalidate) Ring - CBo Snoop requests of core caches

Internal CBo Queues:
IRQ - Ingress Request Queue on AD Ring. Associated with requests from core.
IPQ - Ingress Probe Queue on AD Ring. Associated with snoops from Intel® QPI LL.
ISMQ - Ingress Subsequent Messages (response queue).
Table 2-14. Performance Monitor Events for CBO (Sheet 2 of 2)
Symbol Name | Event Code | Ctrs | Max Inc/Cyc | Description
RxR_ISMQ_RETRY | 0x33 | 0-1 | 1 | ISMQ Retries
LLC_LOOKUP | 0x34 | 0-1 | 1 | Cache Lookups
TOR_INSERTS | 0x35 | 0-1 | 1 | TOR Inserts
TOR_OCCUPANCY | 0x36 | 0 | 20 | TOR Occupancy
LLC_VICTIMS | 0x37 | 0-1 | 1 | Lines Victimized
MISC | 0x39 | 0-1 | 1 | Cbo Misc

2.3.
Table 2-15. Metrics Derived from CBO Events (Sheet 2 of 3)
Symbol Name: Definition | Equation
AVG_TOR_DRD_REM_MISS_LATENCY: Average latency of Data Reads through the TOR that miss the LLC and were satisfied by a Remote cache or Remote Memory. Only valid at the processor level, i.e., do not add counts across CBos. | (TOR_OCCUPANCY.MISS_OPCODE / TOR_INSERTS.MISS_OPCODE) with:Cn_MSR_PMON_BOX_FILTER.
Table 2-15. Metrics Derived from CBO Events (Sheet 3 of 3)
Symbol Name: Definition | Equation
PCIE_DATA_BYTES: Data from PCIe in Number of Bytes | (TOR_INSERTS.OPCODE with:Cn_MSR_PMON_BOX_FILTER.opc=0x194 + TOR_INSERTS.OPCODE with:Cn_MSR_PMON_BOX_FILTER.opc=0x19C) * 64
RING_THRU_DNEVEN_BYTES: Ring throughput in the Down direction, Even polarity in Bytes | RING_BL_USED.
ISMQ_DRD_MISS_OCC
• Title:
• Category: ISMQ Events
• Event Code: 0x21
• Max. Inc/Cyc: 20, Register Restrictions: 0-1
• Definition:

LLC_LOOKUP
• Title: Cache Lookups
• Category: CACHE Events
• Event Code: 0x34
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Counts the number of times the LLC was accessed - this includes code, data, prefetches and hints coming from L2. This has numerous filters available.
Table 2-17. Unit Masks for LLC_VICTIMS (Sheet 2 of 2)
Extension | umask [15:8] | Filter Dep | Description
S_STATE | bxxxxx1xx | | Lines in S State
MISS | bxxxx1xxx | |
NID | bx1xxxxxx | CBoFilter[17:10] | Victimized Lines that Match NID: The NID is programmed in Cn_MSR_PMON_BOX_FILTER.nid. In conjunction with STATE = I, it is possible to monitor misses to specific NIDs in the system.
Table 2-19. Unit Masks for RING_AD_USED
Extension | umask [15:8] | Description
UP_EVEN | bxxxxxxx1 | Up and Even: Filters for the Up and Even ring polarity.
UP_ODD | bxxxxxx1x | Up and Odd: Filters for the Up and Odd ring polarity.
DOWN_EVEN | bxxxxx1xx | Down and Even: Filters for the Down and Even ring polarity.
DOWN_ODD | bxxxx1xxx | Down and Odd: Filters for the Down and Odd ring polarity.
Table 2-21. Unit Masks for RING_BL_USED
Extension | umask [15:8] | Description
UP_EVEN | bxxxxxxx1 | Up and Even: Filters for the Up and Even ring polarity.
UP_ODD | bxxxxxx1x | Up and Odd: Filters for the Up and Odd ring polarity.
DOWN_EVEN | bxxxxx1xx | Down and Even: Filters for the Down and Even ring polarity.
DOWN_ODD | bxxxx1xxx | Down and Odd: Filters for the Down and Odd ring polarity.
RxR_EXT_STARVED
• Title: Ingress Arbiter Blocking Cycles
• Category: INGRESS Events
• Event Code: 0x12
• Max. Inc/Cyc: 1, Register Restrictions: 0-1
• Definition: Counts cycles in external starvation. This occurs when one of the ingress queues is being starved by the other queues.

Table 2-24.
Table 2-26. Unit Masks for RxR_IPQ_RETRY
Extension | umask [15:8] | Description
ANY | bxxxxxxx1 | Any Reject: Counts the number of times that a request from the IPQ was retried because of a TOR reject. TOR rejects from the IPQ can be caused by the Egress being full or Address Conflicts.
Table 2-27. Unit Masks for RxR_IRQ_RETRY (Sheet 2 of 2)
Extension | umask [15:8] | Description
RTID | bxxxx1xxx | No RTIDs: Counts the number of times that requests from the IRQ were retried because there were no RTIDs available. RTIDs are required after a request misses the LLC and needs to send snoops and/or requests to memory. If there are no RTIDs available, requests will queue up in the IRQ and retry until one becomes available.
RxR_OCCUPANCY
• Title: Ingress Occupancy
• Category: INGRESS Events
• Event Code: 0x11
• Max. Inc/Cyc: 20, Register Restrictions: 0
• Definition: Counts number of entries in the specified Ingress queue in each cycle.
• NOTE: IRQ_REJECTED should not be ORed with the other umasks.

Table 2-29.
Table 2-30. Unit Masks for TOR_INSERTS (Sheet 2 of 2)
Extension | umask [15:8] | Filter Dep | Description
MISS_OPCODE | b00000011 | CBoFilter[31:23] | Miss Opcode Match: Miss transactions inserted into the TOR that match an opcode.
MISS_ALL | b00001010 | |
NID_OPCODE | b01000001 | CBoFilter[31:23], CBoFilter[17:10] | NID and Opcode Matched: Transactions inserted into the TOR that match a NID and an opcode.
Table 2-31. Unit Masks for TOR_OCCUPANCY (Sheet 2 of 2)
Extension | umask [15:8] | Filter Dep | Description
ALL | b00001000 | | Any: All valid TOR entries. This includes requests that reside in the TOR for a short time, such as LLC Hits that do not need to snoop cores or requests that get rejected and have to be retried through one of the ingress queues.
Table 2-32. Unit Masks for TxR_INSERTS
Extension | umask [15:8] | Description
AD_CACHE | bxxxxxxx1 | AD - Cachebo: Ring transactions from the Cachebo destined for the AD ring. Some examples include outbound requests, snoop requests, and snoop responses.
AK_CACHE | bxxxxxx1x | AK - Cachebo: Ring transactions from the Cachebo destined for the AK ring. This is commonly used for credit returns and GO responses.
The Home Agent supports Intel® QPI’s home snoop protocol by initiating snoops on behalf of requests. Closely tied to the directory feature, the home agent has the ability to issue snoops to the peer caching agents for requests based on the directory information.
In the case of the HA, the HA_PCI_PMON_BOX_CTL register governs what happens when a freeze signal is received (.frz_en). It also provides the ability to manually freeze the counters in the box (.frz).

Table 2-34. HA_PCI_PMON_BOX_CTL Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
rsv | 31:18 | RV | 0 | Reserved (?)
rsv | 17 | RV | 0 | Reserved; SW must write to 0 else behavior is undefined.
Table 2-35. HA_PCI_PMON_CTL{3-0} Register – Field Definitions (Sheet 2 of 2)
Field | Bits | Attr | HW Reset Val | Description
rsv | 17:16 | RV | 0 | Reserved. SW must write to 0 else behavior is undefined.
umask | 15:8 | RW-V | 0 | Select subevents to be counted within the selected event.
ev_sel | 7:0 | RW-V | 0 | Select event to be counted.

The HA performance monitor data registers are 48 bits wide.
Table 2-39. HA_PCI_PMON_BOX_ADDRMATCH0 Register – Field Definitions
Field | Bits | Attr | HW Reset Val | Description
lo_addr | 31:6 | RWS | 0 | Match to this System Address - Least Significant 26b of cache aligned address [31:6]
rsv | 5:0 | RV | 0 | Reserved (?)

Note: The address comparison always ignores the lower 12 bits of the physical address, even if the system is interleaving between sockets at the cache-line level.
2.4.5 HA Box Events Ordered By Code

The following table summarizes the directly measured HA Box events.

Table 2-40. Performance Monitor Events for HA
Symbol Name 2.4.
Table 2-41. Metrics Derived from HA Events (Sheet 2 of 2)
Symbol Name: Definition | Equation
PCT_RD_REQUESTS: Percentage of HA traffic that is from Read Requests | REQUESTS.READS / (REQUESTS.READS + REQUESTS.WRITES)
PCT_WR_REQUESTS: Percentage of HA traffic that is from Write Requests | REQUESTS.WRITES / (REQUESTS.READS + REQUESTS.
Table 2-43. Unit Masks for CONFLICT_CYCLES
Extension | umask [15:8] | Description
NO_CONFLICT | bxxxxxxx1 | No Conflict: Counts the number of cycles that we are NOT handling conflicts.
CONFLICT | bxxxxxx1x | Conflict Detected: Counts the number of cycles that we are handling conflicts.

DIRECT2CORE_COUNT
• Title: Direct2Core Messages Sent
• Category: DIRECT2CORE Events
• Event Code: 0x11
• Max.
DIRECTORY_UPDATE
• Title: Directory Updates
• Category: DIRECTORY Events
• Event Code: 0x0D
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of directory updates that were required. These result in writes to the memory controller. This can be filtered by directory sets and directory clears.
• NOTE: Only valid for parts that implement the Directory

Table 2-45.
Table 2-47. Unit Masks for IMC_WRITES
Extension | umask [15:8] | Description
FULL | bxxxxxxx1 | Full Line Non-ISOCH
PARTIAL | bxxxxxx1x | Partial Non-ISOCH
FULL_ISOCH | bxxxxx1xx | ISOCH Full Line
PARTIAL_ISOCH | bxxxx1xxx | ISOCH Partial
ALL | b00001111 | All Writes

REQUESTS
• Title: Read and Write Requests
• Category: REQUESTS Events
• Event Code: 0x01
• Max.
TAD_REQUESTS_G0
• Title: HA Requests to a TAD Region - Group 0
• Category: TAD Events
• Event Code: 0x1B
• Max. Inc/Cyc: 2, Register Restrictions: 0-3
• Definition: Counts the number of HA requests to a given TAD region. There are up to 11 TAD (target address decode) regions in each home agent. All requests destined for the memory controller must first be decoded to determine which TAD region they are in.
Table 2-51. Unit Masks for TAD_REQUESTS_G1 (Sheet 2 of 2)
Extension | umask [15:8] | Description
REGION10 | bxxxxx1xx | TAD Region 10: Filters requests made to TAD Region 10
REGION11 | bxxxx1xxx | TAD Region 11: Filters requests made to TAD Region 11

TRACKER_INSERTS
• Title: Tracker Allocations
• Category: TRACKER Events
• Event Code: 0x06
• Max.
Table 2-54. Unit Masks for TxR_AD_CYCLES_FULL
Extension | umask [15:8] | Description
SCHED0 | bxxxxxxx1 | Scheduler 0: Filter for cycles full from scheduler bank 0
SCHED1 | bxxxxxx1x | Scheduler 1: Filter for cycles full from scheduler bank 1
ALL | bxxxxxx11 | All: Cycles full from both schedulers

TxR_AK_CYCLES_FULL
• Title: AK Egress Full
• Category: AK_EGRESS Events
• Event Code: 0x32
• Max.
Table 2-56. Unit Masks for TxR_BL
Extension | umask [15:8] | Description
DRS_CACHE | bxxxxxxx1 | Data to Cache: Filter for data being sent to the cache.
DRS_CORE | bxxxxxx1x | Data to Core: Filter for data being sent directly to the requesting core.
DRS_QPI | bxxxxx1xx | Data to Intel® QPI: Filter for data being sent to a remote socket over Intel® QPI.
2.5 Memory Controller (iMC) Performance Monitoring

2.5.1 Overview of the iMC

The integrated Memory Controller provides the interface to DRAM and communicates to the rest of the uncore through the Home Agent (i.e., the iMC does not connect to the Ring).
For information on how to set up a monitoring session, refer to Section 2.1, “Uncore Per-Socket Performance Monitoring Control”.

2.5.4 iMC Performance Monitors

Table 2-59.
Table 2-60. MC_CHy_PCI_PMON_BOX_CTL Register – Field Definitions (Sheet 2 of 2)
Field | Bits | Attr | HW Reset Val | Description
rsv | 15:9 | RV | 0 | Reserved (?)
frz | 8 | WO | 0 | Freeze. If set to 1 and the .frz_en is 1, the counters in this box will be frozen.
rsv | 7:2 | RV | 0 | Reserved (?)
rsv | 1:0 | RV | 0 | Reserved; SW must write to 0 else behavior is undefined.

2.5.4.
All MC performance monitor data registers are 48 bits wide. Should a counter overflow (a carry out from bit 47), the counter will wrap and continue to collect events. If accessible, software can continuously read the data registers without disabling event collection.

This is a counter that always tracks the number of DRAM clocks (dclks - half of DDR speed) in the iMC.
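Because a 48-bit data register wraps on overflow and can be read while counting continues, software that samples it periodically has to compute differences modulo 2^48. The following Python sketch shows one way to do that. The names are hypothetical, and the approach assumes the counter wraps at most once between two consecutive reads.

```python
COUNTER_BITS = 48
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def counter_delta(previous: int, current: int) -> int:
    """Events counted between two reads of a free-running 48-bit counter.

    Assumes at most one wrap occurred between the two reads.
    """
    return (current - previous) & COUNTER_MASK


# A read taken 5 counts before the wrap, followed by a read 10 counts after it.
before_wrap = COUNTER_MASK - 4
after_wrap = 10
print(counter_delta(before_wrap, after_wrap))
```

Masking the subtraction handles the wrapped and non-wrapped cases with the same expression, so no explicit overflow check is needed as long as the sampling interval is short enough.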
RPQ - Read Pending Queue.
NOTE: HA also tracks some information related to the iMC’s RPQ.
WPQ - Write Pending Queue.
NOTE: HA also tracks some information related to the iMC’s WPQ.

2.5.6 iMC Box Events Ordered By Code

The following table summarizes the directly measured iMC Box events.

Table 2-64.
Table 2-65. Metrics Derived from iMC Events
Symbol Name: Definition | Equation
MEM_BW_READS: Memory bandwidth consumed by reads. Expressed in bytes. | (CAS_COUNT.RD * 64)
MEM_BW_TOTAL: Total memory bandwidth. Expressed in bytes. | MEM_BW_READS + MEM_BW_WRITES
MEM_BW_WRITES: Memory bandwidth consumed by writes. Expressed in bytes. | (CAS_COUNT.
• Definition: Counts the number of DRAM Activate commands sent on this channel. Activate commands are issued to open up a page on the DRAM devices so that it can be read or written to with a CAS. One can calculate the number of Page Misses by subtracting the number of Page Miss precharges from the number of Activates.

CAS_COUNT
• Title: DRAM RD_CAS and WR_CAS Commands.
• Category: CAS Events
• Event Code: 0x04
• Max.
Table 2-67. Unit Masks for DRAM_REFRESH
Extension | umask [15:8] | Description
PANIC | bxxxxxx1x |
HIGH | bxxxxx1xx |

ECC_CORRECTABLE_ERRORS
• Title: ECC Correctable Errors
• Category: ECC Events
• Event Code: 0x09
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of ECC errors detected and corrected by the iMC on this channel. This counter is only useful with ECC DRAM devices.
POWER_CHANNEL_PPD
• Title: Channel PPD Cycles
• Category: POWER Events
• Event Code: 0x85
• Max. Inc/Cyc: 4, Register Restrictions: 0-3
• Definition: Number of cycles when all the ranks in the channel are in PPD mode. If IBT=off is enabled, then this can be used to count those cycles. If it is not enabled, then this can count the number of cycles when that could have been taken advantage of.
• Definition: Counts the number of cycles when the iMC is in self-refresh and the iMC still has a clock. This happens in some package C-states. For example, the PCU may ask the iMC to enter self-refresh even though some of the cores are still processing. One use of this is for Monroe technology.
PRE_COUNT
• Title: DRAM Precharge commands.
• Category: PRE Events
• Event Code: 0x02
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of DRAM Precharge commands sent on this channel.

Table 2-72. Unit Masks for PRE_COUNT
Extension | umask [15:8] | Description
PAGE_MISS | bxxxxxxx1 | Precharges due to page miss: Counts the number of DRAM Precharge commands sent on this channel as a result of page misses.
being sent from the HA to the iMC. They deallocate after the CAS command has been issued to memory. This includes both ISOCH and non-ISOCH requests.

RPQ_OCCUPANCY
• Title: Read Pending Queue Occupancy
• Category: RPQ Events
• Event Code: 0x80
• Max. Inc/Cyc: 22, Register Restrictions: 0-3
• Definition: Accumulates the occupancies of the Read Pending Queue each cycle.
WPQ_OCCUPANCY
• Title: Write Pending Queue Occupancy
• Category: WPQ Events
• Event Code: 0x81
• Max. Inc/Cyc: 32, Register Restrictions: 0-3
• Definition: Accumulates the occupancies of the Write Pending Queue each cycle. This can then be used to calculate both the average queue occupancy (in conjunction with the number of cycles not empty) and the average latency (in conjunction with the number of allocations).
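The two derived quantities mentioned in the WPQ_OCCUPANCY definition can be sketched as follows. This is an illustration only: the function and parameter names are hypothetical, the inputs are assumed to be counter deltas collected over the same interval, and returning 0.0 for an empty denominator is a convenience choice rather than documented behavior. The latency result is in the clock domain in which occupancy is accumulated.

```python
def average_occupancy(occupancy_sum: int, cycles_not_empty: int) -> float:
    """Average queue entries per cycle while the queue was not empty."""
    if cycles_not_empty == 0:
        return 0.0  # Convenience choice: no non-empty cycles observed.
    return occupancy_sum / cycles_not_empty


def average_latency(occupancy_sum: int, allocations: int) -> float:
    """Average cycles each entry spent in the queue."""
    if allocations == 0:
        return 0.0  # Convenience choice: nothing was allocated.
    return occupancy_sum / allocations


# Hypothetical sample: 1000 accumulated entry-cycles, 250 non-empty cycles, 50 allocations.
print(average_occupancy(1000, 250), average_latency(1000, 50))
```

Both metrics share the same numerator, so a single occupancy counter plus one cycle counter and one allocation counter is enough to derive them.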
2.6 Power Control (PCU) Performance Monitoring

2.6.1 Overview of the PCU

The PCU is the primary Power Controller. The uncore implements a power control unit acting as a core/uncore power and thermal manager. It runs its firmware on an internal micro-controller and coordinates the socket’s power states. The PCU algorithmically governs the P-state of the processor, C-state of the core and the package C-state of the socket.
Table 2-73.
2.6.3.2 PCU PMON state - Counter/Control Pairs

The following table defines the layout of the PCU performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask). Additional control bits are provided to shape the incoming events (e.g. .invert, .edge_det, .
Table 2-75. PCU_MSR_PMON_CTL{3-0} Register – Field Definitions (Sheet 2 of 2)
Field | Bits | Attr | HW Reset Val | Description
invert | 23 | RW-V | 0 | Invert comparison against Threshold. 0 - comparison will be ‘is event increment >= threshold?’. 1 - comparison is inverted - ‘is event increment < threshold?’ NOTE: .invert is in series following .thresh. Due to this, the .thresh field must be set to a non-0 value.
• For frequency/voltage band filters, the multiplier is at 100MHz granularity. So, a value of 32 (0x20) would represent a frequency of 3.2GHz.
• Support for limited Frequency/Voltage Band histogramming.
• Core State Transitions - there are a large number of events provided to track when cores transition C-state, when they enter/exit specific C-states, when they receive a C-state demotion, etc.
• Frequency/Voltage Banding - ability to measure the number of cycles the uncore was operating within a frequency or voltage ‘band’ that can be specified in a separate filter register.
2.6.5 PCU Box Events Ordered By Code

The following table summarizes the directly measured PCU Box events.

Table 2-81.
Table 2-81. Performance Monitor Events for PCU (Sheet 2 of 2)
Symbol Name | Event Code | Extra Select Bit | Ctrs | Max Inc/Cyc | Description
CORE7_TRANSITION_CYCLES | 0x0A | 1 | 0-3 | 1 | Core C State Transition Cycles
TOTAL_TRANSITION_CYCLES | 0x0B | 1 | 0-3 | 1 | Total Core C State Transition Cycles

2.6.6 PCU Box Common Metrics (Derived Events)

The following table summarizes metrics commonly calculated from PCU Box events.

Table 2-82.
CORE1_TRANSITION_CYCLES
• Title: Core C State Transition Cycles
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x04
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of cycles spent performing core C state transitions. There is one event per core.
CORE6_TRANSITION_CYCLES
• Title: Core C State Transition Cycles
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x09
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Number of cycles spent performing core C state transitions. There is one event per core.
DEMOTIONS_CORE4
• Title: Core C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x22
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Filter Dependency: PCUFilter[7:0]
• Definition: Counts the number of times a configurable core had a C-state demotion

DEMOTIONS_CORE5
• Title: Core C State Demotions
• Category: CORE_C_STATE_TRANSITION Events
• Event Code: 0x23
• Max.
FREQ_BAND1_CYCLES
• Title: Frequency Residency
• Category: FREQ_RESIDENCY Events
• Event Code: 0x0C
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Filter Dependency: PCUFilter[15:8]
• Definition: Counts the number of cycles that the uncore was running at a frequency greater than or equal to the frequency that is configured in the filter.
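As a rough sketch of how the band 1 threshold might be prepared, the Python helper below converts a frequency into the 100MHz-granularity multiplier used by the frequency band filters and shifts it into bits 15:8, the filter field this event depends on. The helper name is hypothetical, other filter fields are left at zero, and writing the result to the PCU filter register is not shown.

```python
def band1_filter_value(freq_mhz: int) -> int:
    """Place a frequency threshold into filter bits 15:8 (100MHz units).

    Hypothetical helper: assumes the band value is a plain multiplier
    of 100MHz and leaves every other filter field at zero.
    """
    if freq_mhz < 0 or freq_mhz % 100 != 0:
        raise ValueError("frequency must be a non-negative multiple of 100MHz")
    multiplier = freq_mhz // 100
    if multiplier > 0xFF:
        raise ValueError("multiplier does not fit in the 8-bit field")
    return multiplier << 8


# 3.2GHz corresponds to a multiplier of 32 (0x20).
print(hex(band1_filter_value(3200)))
```

Rejecting frequencies that are not multiples of 100MHz avoids silently rounding the threshold to a different band edge.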
FREQ_MAX_CURRENT_CYCLES
• Title: Current Strongest Upper Limit Cycles
• Category: FREQ_MAX_LIMIT Events
• Event Code: 0x07
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles when current is the upper limit on frequency.
• NOTE: This is the fast path and will clear out other limits when it happens. The slow loop portion, which covers the other limits, can double count EDP.
FREQ_MIN_PERF_P_CYCLES
• Title: Perf P Limit Strongest Lower Limit Cycles
• Category: FREQ_MIN_LIMIT Events
• Event Code: 0x02
• Extra Select Bit: Y
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition: Counts the number of cycles when Perf P Limit is preventing us from dropping the frequency lower. Perf P Limit is an algorithm that takes input from remote sockets when determining if a socket should drop its frequency down.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring PROCHOT_EXTERNAL_CYCLES • • • • • Title: External Prochot Category: PROCHOT Events Event Code: 0x0A Max. Inc/Cyc: 1, Register Restrictions: 0-3 Definition: Counts the number of cycles that we are in external PROCHOT mode. This mode is triggered when a sensor off the die determines that something off-die (like DRAM) is too hot and must throttle to avoid damaging the chip.
• Definition: Counts the number of cycles when the system is increasing voltage. There is no filtering supported with this event. One can use it as a simple event, or use it in conjunction with the occupancy events to monitor the number of cores or threads that were impacted by the transition.

VR_HOT_CYCLES
• Title: VR Hot
• Category: VR_HOT Events
• Event Code: 0x32
• Max. Inc/Cyc: 1, Register Restrictions: 0-3
• Definition:
2.
2.7.3 Intel® QPI Performance Monitors
Table 2-84.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-85. Q_Py_PCI_PMON_BOX_CTL Register – Field Definitions Field rsv Bits Attr 31:18 RV HW Reset Val 0 Description Reserved (?) rsv 17 RV 0 Reserved; SW must write to 0 else behavior is undefined. frz_en 16 WO 0 Freeze Enable. If set to 1 and a freeze signal is received, the counters will be stopped or ‘frozen’, else the freeze signal will be ignored. rsv 15:9 RV 0 Reserved (?) frz 8 WO 0 Freeze.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-86. Q_Py_PCI_PMON_CTL{3-0} Register – Field Definitions (Sheet 2 of 2) Field Bits edge_det 18 Attr RW-V HW Reset Val 0 Description When set to 1, rather than measuring the event in each cycle it is active, the corresponding counter will increment when a 0 to 1 transition (i.e. rising edge) is detected. When 0, the counter will increment in each cycle that the event is asserted. NOTE: .edge_det is in series following .
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-88. Q_Py_PCI_PMON_PKT_MATCH1 Registers Bits HW Reset Val --- 31:20 0x0 Reserved; Must write to 0 else behavior is undefined. RDS 19:16 0x0 Response Data State (valid when MC == DRS and Opcode == 0x02). Bit settings are mutually exclusive.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-90. Q_Py_PCI_PMON_PKT_MASK1 Registers Field Bits HW Reset Val Description --- 31:20 0x0 Reserved; Must write to 0 else behavior is undefined. RDS 19:16 0x0 Response Data State (valid when MC == DRS and Opcode == 0x02). Bit settings are mutually exclusive.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-92. Message Events Derived from the Match/Mask filters (Sheet 1 of 2) Match [12:0] Mask [12:0] DRS.AnyDataC 0x1C00 0x1F80 Any Data Response message containing a cache line in response to a core request. The AnyDataC messages are only sent to an S-Box. The metric DRS.AnyResp - DRS.AnyDataC will compute the number of DRS writeback and non snoop write messages. DRS.
Table 2-92. Message Events Derived from the Match/Mask filters (Sheet 2 of 2)
Event              Match [12:0]  Mask [12:0]  Description
NCB.AnyMsg9flits   0x1800        0x1F00       Any Non-Coherent Bypass message that is 9 flits in length. A 9 flit NCB message contains a full 64 byte cache line.
NCB.AnyMsg11flits  0x1900        0x1F00       Any Non-Coherent Bypass message that is 11 flits in length. An 11 flit NCB message contains either partial data or an interrupt.
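The Match/Mask pairs in Table 2-92 can be read as a masked comparison. The following is a minimal sketch, assuming the usual match/mask convention in which a packet is counted when the header bits selected by Mask equal the corresponding Match bits. The helper name `filter_matches` and the sample header values are hypothetical and are used only for illustration; the Match/Mask constants are the ones listed in the table.

```python
def filter_matches(header_bits: int, match: int, mask: int) -> bool:
    """Return True when the header bits selected by `mask` equal the match value.

    ASSUMPTION: this models the conventional match/mask comparison; the exact
    hardware behavior is defined by the match/mask register descriptions.
    """
    return (header_bits & mask) == (match & mask)


# (Match [12:0], Mask [12:0]) pairs taken from Table 2-92.
DRS_ANY_DATA_C = (0x1C00, 0x1F80)
NCB_ANY_MSG_9_FLITS = (0x1800, 0x1F00)
NCB_ANY_MSG_11_FLITS = (0x1900, 0x1F00)

if __name__ == "__main__":
    # Hypothetical 13-bit header value whose upper bits select a 9 flit NCB message.
    sample = 0x18A5
    print(filter_matches(sample, *NCB_ANY_MSG_9_FLITS))
    print(filter_matches(sample, *NCB_ANY_MSG_11_FLITS))
```

Bits that are 0 in the mask are "don't care" bits, which is why a single Match/Mask pair can cover a whole family of messages.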
2.7.4.2 Acronyms frequently used in Intel® QPI Events:
RxL (aka IGR) - “Receive from Link” referring to Ingress (requests from the Ring) queues.
TxL (aka EGR) - “Transmit to Link” referring to Egress (requests headed for the Ring) queues.

2.7.5 Intel® QPI LL Box Events Ordered By Code
The following table summarizes the directly measured Intel QPI LL Box events.
Table 2-94.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-94. Performance Monitor Events for Intel® QPI LL (Sheet 2 of 2) 2.7.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-95. Metrics Derived from Intel QPI LL Events (Sheet 2 of 3) Symbol Name: Definition Equation DRS_F_OR_E_FROM_QPI: DRS response in F or E states received from QPI in bytes. To calculate the total data response for each cache line state, it's necessary to add the contribution from three flavors {DataC, DataC_FrcAckCnflt, DataC_Cmp} of data response packets for each cache line state.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-95. Metrics Derived from Intel QPI LL Events (Sheet 3 of 3) Symbol Name: Definition 2.7.7 Equation PCT_LINK_FULL_POWER_CYCLES: Percent of Cycles the QPI link is at Full Power RxL0_POWER_CYCLES / CLOCKTICKS PCT_LINK_HALF_DISABLED_CYCLES: Percent of Cycles the QPI link in power mode where half of the lanes are disabled.
Table 2-96. Unit Masks for DIRECT2CORE
Extension        umask [15:8]  Description
SUCCESS          bxxxxxxx1     Spawn Success: The spawn was successful. There were sufficient credits, and the message was marked to spawn direct2core.
FAILURE_CREDITS  bxxxxxx1x     Spawn Failure - Egress Credits: The spawn failed because there were not enough Egress credits. Had there been enough credits, the spawn would have worked as the RBT bit was set.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring RxL_BYPASSED • • • • • Title: Rx Flit Buffer Bypassed Category: RXQ Events Event Code: 0x09 Max. Inc/Cyc: 1, Register Restrictions: 0-3 Definition: Counts the number of times that an incoming flit was able to bypass the flit buffer and pass directly across the BGF and into the Egress. This is a latency optimization, and should generally be the common case.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring • Definition: Counts the number of cycles that the Intel® QPI RxQ was not empty. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface. If things back up getting transmitted onto the ring, however, it may need to allocate into this buffer, thus increasing the latency.
"transfers" here refer to "flits". Therefore, in L0, the system will transfer 1 "flit" at the rate of 1/4th the Intel® QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as "data" bandwidth.
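The flits*80b/time relation can be sketched as follows. This is an illustrative helper, not part of the guide: the function names are hypothetical, the measurement interval is assumed to come from the caller, and the 8.0 GT/s figure is just the example link speed used elsewhere in this chapter.

```python
FLIT_BITS = 80          # each flit is 80 bits (from flits*80b/time)
TRANSFERS_PER_FLIT = 4  # in L0, 1 flit moves at 1/4th of the QPI speed


def peak_flits_per_second(transfers_per_second: float) -> float:
    """Upper bound on the flit rate in L0 for a given link speed (transfers/s)."""
    return transfers_per_second / TRANSFERS_PER_FLIT


def link_bandwidth_bytes_per_second(flits: float, seconds: float) -> float:
    """Raw link bandwidth (headers included) from a flit count over an interval."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flits * FLIT_BITS / 8 / seconds


if __name__ == "__main__":
    # Example: an 8.0 GT/s link running flat out in L0 for one second.
    flits = peak_flits_per_second(8.0e9)
    print(link_bandwidth_bytes_per_second(flits, 1.0))
```

The result is raw link bandwidth. It includes header and protocol flits, so it overstates the "data" bandwidth.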
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-100. Unit Masks for RxL_FLITS_G2 Extension umask [15:8] Description NDR_AD bxxxxxxx1 Non-Data Response Rx Flits - AD: Counts the total number of flits received over the NDR (Non-Data Response) channel. This channel is used to send a variety of protocol flits including grants and completions. This is only for NDR packets to the local socket which use the AK ring.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring RxL_INSERTS_HOM • • • • • • Title: Rx Flit Buffer Allocations - HOM Category: RXQ Events Event Code: 0x0C Extra Select Bit: Y Max. Inc/Cyc: 1, Register Restrictions: 0-3 Definition: Number of allocations into the Intel® QPI Rx Flit Buffer. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring RxL_INSERTS_SNP • • • • • • Title: Rx Flit Buffer Allocations - SNP Category: RXQ Events Event Code: 0x0D Extra Select Bit: Y Max. Inc/Cyc: 1, Register Restrictions: 0-3 Definition: Number of allocations into the Intel® QPI Rx Flit Buffer. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring RxL_OCCUPANCY_NCB • • • • • • Title: RxQ Occupancy - NCB Category: RXQ Events Event Code: 0x16 Extra Select Bit: Y Max. Inc/Cyc: 128, Register Restrictions: 0-3 Definition: Accumulates the number of elements in the Intel® QPI RxQ in each cycle. Generally, when data is transmitted across Intel® QPI, it will bypass the RxQ and pass directly to the ring interface.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring TxL0P_POWER_CYCLES • • • • • Title: Cycles in L0p Category: POWER_TX Events Event Code: 0x0D Max. Inc/Cyc: 1, Register Restrictions: 0-3 Definition: Number of Intel® QPI qfclk cycles spent in L0p power mode. L0p is a mode where we disable 1/2 of the Intel® QPI lanes, decreasing our bandwidth in order to save power. It increases snoop and data transfer latencies and decreases overall bandwidth.
"speed" (for example, 8.0 GT/s), the "transfers" here refer to "flits". Therefore, in L0, the system will transfer 1 "flit" at the rate of 1/4th the Intel® QPI speed. One can calculate the bandwidth of the link by taking: flits*80b/time. Note that this is not the same as "data" bandwidth.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-102. Unit Masks for TxL_FLITS_G1 Extension umask [15:8] Description SNP bxxxxxxx1 SNP Flits: Counts the number of snoop request flits transmitted over Intel® QPI. These requests are contained in the snoop channel. This does not include snoop responses, which are transmitted on the home channel. HOM_REQ bxxxxxx1x HOM Request Flits: Counts the number of data request transmitted over Intel® QPI on the home channel.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-103. Unit Masks for TxL_FLITS_G2 Extension umask [15:8] Description NDR_AD bxxxxxxx1 Non-Data Response Tx Flits - AD: Counts the total number of flits transmitted over the NDR (Non-Data Response) channel. This channel is used to send a variety of protocol flits including grants and completions. This is only for NDR packets to the local socket which use the AK ring.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring VNA_CREDIT_RETURNS • • • • • • Title: VNA Credits Returned Category: VNA_CREDIT_RETURN Events Event Code: 0x1C Extra Select Bit: Y Max. Inc/Cyc: 1, Register Restrictions: 0-3 Definition: Number of VNA credits returned. VNA_CREDIT_RETURN_OCCUPANCY • • • • • • Title: VNA Credits Pending Return - Occupancy Category: VNA_CREDIT_RETURN Events Event Code: 0x1B Extra Select Bit: Y Max.
2.8.3 R2PCIe Performance Monitors
Table 2-104.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring 2.8.3.2 R2PCIe PMON state - Counter/Control Pairs The following table defines the layout of the R2PCIe performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask). Additional control bits are provided to shape the incoming events (e.g. .invert, .edge_det, .
2.8.4 R2PCIe Performance Monitoring Events
2.8.4.1 An Overview
R2PCIe provides events to track information related to all the traffic passing through its boundaries.
• IIO credit tracking - credits rejected, acquired and used, all broken down by message class.
• Ring Stop Events - to track Ingress/Egress traffic and Ring Utilization (broken down by direction and ring type) statistics.
2.8.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-109. Metrics Derived from R2PCIe Events (Sheet 2 of 2) Symbol Name: Definition 2.8.7 Equation IIO_RDS_TO_RING_IN_BYTES: IIO Reads, data transmitted to Ring in Bytes TxR_INSERTS.BL * 32 RING_THRU_DNEVEN_BYTES: Ring throughput in the Down direction, Even polarity in Bytes RING_BL_USED.CCW_EVEN * 32 RING_THRU_DNODD_BYTES: Ring throughput in the Down direction, Odd polarity in Bytes RING_BL_USED.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring RING_AK_USED • • • • • Title: R2 AK Ring in Use Category: RING Events Event Code: 0x08 Max. Inc/Cyc: 1, Register Restrictions: 0-3 Definition: Counts the number of cycles that the AK ring is being used at this ring stop. This includes when packets are passing by and when packets are being sunk, but does not include when packets are being sent from the ring stop. Table 2-111.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-113. Unit Masks for RING_IV_USED Extension ANY umask [15:8] b00001111 Description Any: Filters any polarity RxR_AK_BOUNCES • • • • • Title: AK Ingress Bounced Category: INGRESS Events Event Code: 0x12 Max. Inc/Cyc: 1, Register Restrictions: 0 Definition: Counts the number of times when a request destined for the AK ingress bounced.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring TxR_CYCLES_NE • • • • • Title: Egress Cycles Not Empty Category: EGRESS Events Event Code: 0x23 Max. Inc/Cyc: 1, Register Restrictions: 0 Definition: Counts the number of cycles when the R2PCIe Egress is not empty. This tracks one of the three rings that are used by the R2PCIe agent. This can be used in conjunction with the R2PCIe Egress Occupancy Accumulator event in order to calculate average queue occupancy.
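The average queue occupancy mentioned above can be sketched as a ratio of two counter readings. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, the occupancy value is assumed to be the raw reading of the Egress Occupancy Accumulator event, and dividing by the not-empty cycle count gives the average occupancy while the queue holds at least one entry (dividing by total clockticks instead would give the average over the whole interval).

```python
def average_occupancy_when_not_empty(occupancy_accumulated: int,
                                     cycles_not_empty: int) -> float:
    """Average number of queue entries over the cycles in which the queue was not empty.

    occupancy_accumulated: sum of per-cycle occupancy (accumulator event reading).
    cycles_not_empty: TxR_CYCLES_NE reading over the same interval.
    """
    if cycles_not_empty == 0:
        # The queue was never occupied during the interval.
        return 0.0
    return occupancy_accumulated / cycles_not_empty


if __name__ == "__main__":
    # Hypothetical readings taken over the same measurement window.
    print(average_occupancy_when_not_empty(1200, 400))
```

Both counters need to cover the same interval and track the same ring, since the event tracks only one of the three rings at a time.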
2.9 R3QPI Performance Monitoring
2.9.1 Overview of the R3QPI Box
R3QPI is the interface between the Ring and the Intel® QPI Link Layer, which packetizes requests. It is responsible for translating between ring protocol packets and flits that are used for transmitting data across the Intel® QPI interface.
2.9.3 R3QPI Performance Monitors
Table 2-118.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring 2.9.3.2 R3QPI PMON state - Counter/Control Pairs The following table defines the layout of the R3QPI performance monitor control registers. The main task of these configuration registers is to select the event to be monitored by their respective data counter (.ev_sel, .umask). Additional control bits are provided to shape the incoming events (e.g. .invert, .edge_det, .
2.9.4 R3QPI Performance Monitoring Events
2.9.4.1 An Overview
R3QPI provides events to track information related to all the traffic passing through its boundaries.
• VN/IIO credit tracking - in addition to tracking the occupancy of the full VNA queue, R3QPI provides a great deal of additional information: credits rejected, acquired and used, often broken down by Message Class.
2.9.6 R3QPI Box Common Metrics (Derived Events)
The following table summarizes metrics commonly calculated from R3QPI Box events.

Table 2-123. Metrics Derived from R3QPI Events
Symbol Name: Definition                                                   Equation
QPI_RDS_TO_RING_IN_BYTES: QPI Reads, data transmitted to Ring in Bytes   TxR_INSERTS.BL * 32

2.9.7 R3QPI Box Performance Monitor Event List
This section enumerates the performance monitoring events for the R3QPI Box.
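As a worked illustration of the QPI_RDS_TO_RING_IN_BYTES metric from Table 2-123, the sketch below applies the TxR_INSERTS.BL * 32 equation to a counter reading. The function names and the throughput helper are hypothetical additions for illustration; only the multiplication by 32 comes from the table.

```python
BYTES_PER_BL_INSERT = 32  # multiplier from the Table 2-123 equation


def qpi_rds_to_ring_in_bytes(txr_inserts_bl: int) -> int:
    """QPI reads, data transmitted to the Ring, in bytes (TxR_INSERTS.BL * 32)."""
    return txr_inserts_bl * BYTES_PER_BL_INSERT


def bytes_per_second(total_bytes: int, seconds: float) -> float:
    """Hypothetical helper: convert a byte total into a rate over an interval."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return total_bytes / seconds


if __name__ == "__main__":
    # Hypothetical TxR_INSERTS.BL reading collected over 0.5 seconds.
    total = qpi_rds_to_ring_in_bytes(1000)
    print(total, bytes_per_second(total, 0.5))
```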
used for transferring data with coherency (cacheable PCI transactions). This event can only track one message class at a time.

Table 2-125. Unit Masks for IIO_CREDITS_REJECT
Extension  umask [15:8]  Description
DRS        bxxxx1xxx
NCB        bxxx1xxxx
NCS        bxx1xxxxx

IIO_CREDITS_USED
• Title: to IIO BL Credit In Use
• Category: IIO_CREDITS Events
• Event Code: 0x22
• Max.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring RING_AK_USED • • • • • Title: R3 AK Ring in Use Category: RING Events Event Code: 0x08 Max. Inc/Cyc: 1, Register Restrictions: 0-2 Definition: Counts the number of cycles that the AK ring is being used at this ring stop. This includes when packets are passing by and when packets are being sent, but does not include when packets are being sunk into the ring stop. Table 2-128.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-130. Unit Masks for RING_IV_USED Extension ANY umask [15:8] b00001111 Description Any: Filters any polarity RxR_BYPASSED • • • • • Title: Ingress Bypassed Category: INGRESS Events Event Code: 0x12 Max. Inc/Cyc: 1, Register Restrictions: 0-1 Definition: Counts the number of times when the Ingress was bypassed and an incoming transaction was bypassed directly across the BGF and into the qfclk domain. Table 2-131.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring • Definition: Counts the number of allocations into the Intel® QPI Ingress. This tracks one of the three rings that are used by the Intel® QPI agent. This can be used in conjunction with the Intel® QPI Ingress Occupancy Accumulator event in order to calculate average queue latency. Multiple ingress buffers can be tracked at a given time using multiple counters. Table 2-133.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring TxR_CYCLES_NE • • • • • Title: Egress Cycles Not Empty Category: EGRESS Events Event Code: 0x23 Max. Inc/Cyc: 1, Register Restrictions: 0-1 Definition: Counts the number of cycles when the Intel® QPI Egress is not empty. This tracks one of the three rings that are used by the Intel® QPI agent. This can be used in conjunction with the Intel® QPI Egress Occupancy Accumulator event in order to calculate average queue occupancy.
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring Table 2-139. Unit Masks for VN0_CREDITS_REJECT (Sheet 2 of 2) Extension umask [15:8] Description NDR bxxxxx1xx NDR Message Class: NDR packets are used to transmit a variety of protocol flits including grants and completions (CMP). DRS bxxxx1xxx DRS Message Class: Filter for Data Response (DRS). DRS is generally used to transmit data with coherency.
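The unit mask notation in these tables (for example bxxxxx1xx) names a single bit inside the umask field, which occupies bits [15:8] of the counter control register. The sketch below shows one way to turn such an entry into a control register value. It is a minimal illustration under stated assumptions: `encode_ctl` is a hypothetical helper, the event select is assumed to occupy bits [7:0] (that field is not shown in this excerpt), and `PLACEHOLDER_EVENT_CODE` is not a real event code and must be replaced with the code listed for the event being programmed.

```python
UMASK_SHIFT = 8  # umask occupies bits [15:8], per the unit mask tables

# Unit mask bits from Table 2-139.
VN0_UMASK_NDR = 0b00000100  # bxxxxx1xx
VN0_UMASK_DRS = 0b00001000  # bxxxx1xxx

# Hypothetical value for illustration only; look up the real event code.
PLACEHOLDER_EVENT_CODE = 0xAB


def encode_ctl(ev_sel: int, umask: int) -> int:
    """Pack an event select and unit mask into a control register value.

    ASSUMPTION: ev_sel occupies bits [7:0]; other control bits are left at 0.
    """
    if not 0 <= ev_sel <= 0xFF:
        raise ValueError("ev_sel must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit in 8 bits")
    return ev_sel | (umask << UMASK_SHIFT)


if __name__ == "__main__":
    print(hex(encode_ctl(PLACEHOLDER_EVENT_CODE, VN0_UMASK_DRS)))
```

Whether several message-class bits may be set at once depends on the event, so the example selects a single class.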
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring VNA_CREDITS_ACQUIRED • • • • • Title: VNA credit Acquisitions Category: LINK_VNA_CREDITS Events Event Code: 0x33 Max. Inc/Cyc: 4, Register Restrictions: 0-1 Definition: Number of Intel® QPI VNA Credit acquisitions. This event can be used in conjunction with the VNA In-Use Accumulator to calculate the average lifetime of a credit holder. VNA credits are used by all message classes in order to communicate across Intel® QPI.
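The average credit lifetime mentioned for VNA_CREDITS_ACQUIRED can be sketched as a ratio of two readings. This is an illustrative calculation, not a formula stated in this excerpt: it assumes the VNA In-Use Accumulator sums the number of credits in use on every cycle, so that dividing by the number of acquisitions gives the mean number of cycles a credit is held. The function and argument names are hypothetical.

```python
def average_vna_credit_lifetime_cycles(vna_in_use_accumulated: int,
                                       vna_credits_acquired: int) -> float:
    """Mean number of cycles a VNA credit is held.

    vna_in_use_accumulated: reading of the VNA In-Use Accumulator event.
    vna_credits_acquired: VNA_CREDITS_ACQUIRED reading over the same interval.
    """
    if vna_credits_acquired == 0:
        # No credits were acquired, so there is no lifetime to report.
        return 0.0
    return vna_in_use_accumulated / vna_credits_acquired


if __name__ == "__main__":
    # Hypothetical readings taken over the same measurement window.
    print(average_vna_credit_lifetime_cycles(5000, 250))
```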
Intel® Xeon® Processor E5-2600 Product Family Uncore Performance Monitoring can be transmitted, as those holding VN0 credits will still (potentially) be able to transmit. Generally it is the goal of the uncore that VNA credits should not run out, as this can substantially throttle back useful Intel® QPI bandwidth. VNA_CREDIT_CYCLES_USED • • • • • Title: Cycles with 1 or more VNA credits in use Category: LINK_VNA_CREDITS Events Event Code: 0x32 Max.
Table 2-143.
Table 2-144. Opcodes (Alphabetical Listing) (Sheet 2 of 4)
Name                        Opc   MC    Gen By?  Desc
DataC_(FEIMS)_Cmp           0010  DRS            Data Response in (FEIMS) state, Complete. NOTE: Set RDS field to specify which state is to be measured. - Supports getting data in E, F or I state
DataC_(FEIMS)_FrcAckCnflt   0001  DRS            Data Response in (FEIMS) state, Force Acknowledge. NOTE: Set RDS field to specify which state is to be measured.
Table 2-144. Opcodes (Alphabetical Listing) (Sheet 3 of 4)
Name          Opc   MC    Gen By?  Desc
PrefetchHint  1111  SNP            Snoop Prefetch Hint
RdCode        0001  HOM0           Read cache line in F (or S, if the F state not supported)
RdCur         0000  HOM0           Request a cache line in I. Typically issued by I/O proxy entities, RdCur is used to obtain a coherent snapshot of an uncached cache line.
Table 2-144. Opcodes (Alphabetical Listing) (Sheet 4 of 4)
Name    Opc   MC    Gen By?  Desc
WbMtoI  1100  HOM0           Write a cache line in M state back to memory and transition its state to I.
WbMtoE  1101  HOM0           Write a cache line in M state back to memory and transition its state to E.
WbMtoS  1110  HOM0           Write a cache line in M state back to memory and transition its state to S.