White Paper

Telemetry Streaming with iDRAC9: What you Need to Get Started

Abstract

Dell EMC PowerEdge servers with iDRAC9 4.0 Datacenter stream data to help IT administrators better understand the inner workings of their server environment. This white paper explains the Telemetry Streaming feature, walks through the basic steps to configure iDRAC9, and provides troubleshooting tips.
Revisions

Date            Description
November 2019   Initial release
June 2020       Errata update

Acknowledgments

Authors: Sankara Gara, Cyril Jose, Sailaja Mahendrakar, Praveen Thangavelu, MaheshBabu Ramaiah, Doug Iler

The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Executive summary

With iDRAC9 v4.00.00.00 firmware and a Datacenter license, IT managers can integrate advanced server hardware operation telemetry into their existing analytics solutions. Telemetry is delivered as granular time-series data that iDRAC9 streams (pushes) to the collector, instead of the collector having to poll (pull) for it, which is the less efficient legacy method. The advanced agent-free architecture in iDRAC9 provides over 180 data metrics related to server and peripheral operations.
1 Telemetry overview

Telemetry streaming is an automated communications process by which measurements and other data are collected at remote or inaccessible points. With iDRAC9 4.0 Datacenter, it is possible to stream a wide variety of metric reports from one or more PowerEdge servers to an ingress collector such as Splunk or ELK Stack. These and other tools can then perform remote server monitoring and analysis.
Remote syslog (RSyslog): Remote syslog implements the basic syslog protocol and extends it with content-based filtering, rich filtering capabilities, and flexible configuration options.

EEMI: The Event and Error Message Information is a reference guide that lists the messages shown in the user interface, command-line interface, and log files. Messages are displayed or stored as a result of user action, automatic event occurrence, or for data logging purposes.
2 Configuring telemetry

Telemetry configuration allows you to configure telemetry data streaming behavior and report generation behavior. It includes the global settings common to all reports, and those settings specific to each available report. Enabling or disabling telemetry at the global setting level enables or disables all reports for telemetry streaming. By default, the telemetry feature is disabled at the global setting level and for all reports individually.
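The relationship between the global switch and the per-report switches can be sketched as a single attributes payload. This is a minimal illustration, not the documented procedure: the per-report name follows the TelemetryPowerMetrics.1.EnableTelemetry pattern used in this paper, while the global attribute name "Telemetry.1.EnableTelemetry", the "ReportInterval" attribute name, its integer value type, and the helper function itself are assumptions to verify against your firmware's attribute registry.

```python
import json


def telemetry_attributes(reports, enable=True, interval_s=None):
    """Build an {"Attributes": {...}} body for the global and per-report switches.

    Assumed names (not confirmed by this paper): "Telemetry.1.EnableTelemetry"
    for the global switch and "Telemetry<Report>.1.ReportInterval" for the
    per-report interval in seconds.
    """
    state = "Enabled" if enable else "Disabled"
    attrs = {"Telemetry.1.EnableTelemetry": state}  # global switch (assumed name)
    for report in reports:
        # Per-report switch, following the PowerMetrics pattern in this paper.
        attrs[f"Telemetry{report}.1.EnableTelemetry"] = state
        if interval_s is not None:
            attrs[f"Telemetry{report}.1.ReportInterval"] = interval_s
    return {"Attributes": attrs}


body = telemetry_attributes(["PowerMetrics"], interval_s=60)
print(json.dumps(body, indent=2))
```

The resulting body is the kind of JSON document that would be sent with an HTTP PATCH to the iDRAC attributes URI; because everything is disabled by default, both the global and the per-report entries are included.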
• PowerMetrics
• StorageDiskSMARTData
• CPUSensor
• FPGASensor
• NICSensor
• PowerStatistics
• StorageSensor

Supported triggers

CPUCriticalTrigger    MEMWarnTrigger         TMPCpuWarnTrigger
CPUWarnTrigger        NVMeCriticalTrigger    TMPCriticalTrigger
FANCriticalTrigger    NVMeWarnTrigger        TMPDiskCriticalTrigger
FANWarnTrigger        PDRCriticalTrigger     TMPDiskWarnTrigger
IERRCriticalTrigger   PDRWarnTrigger         TMPWarnTrigger
MEMCriticalTrigger    TMPCpuCriticalTrigger  VLTCriticalTrigger

Workflow example configuring telemetry using Redfish
-H 'Content-Type: application/json' -d '{"Attributes": {"TelemetryPowerMetrics.1.EnableTelemetry":"Enabled"}}'

Workflow example configuring telemetry using RACADM

The global and per-report configuration can be done using iDRAC RACADM get and set operations on the Telemetry configuration attributes. The following example enables telemetry globally and for one report (PowerMetrics), and sets the report interval, in seconds, on that report.

Global:
racadm get idrac.
HTTP POST https://<iDRAC-IP>/redfish/v1/Dell/Managers/iDRAC.Embedded.1/DelliDRACCardService/Actions/DelliDRACCardService.TestRsyslogServerConnection

HTTP PATCH /redfish/v1/Managers/iDRAC.Embedded.1/Attributes
Payload: {"Attributes":{"Telemetry.1.RsyslogTarget": "True"}}

e.g.
curl -s -k -u user:pw -X PATCH https://<iDRAC-IP>/redfish/v1/Managers/iDRAC.Embedded.1/Attributes -H 'Content-Type: application/json' -d '{"Attributes":{"Telemetry.1.RsyslogServer1": "1.1.1.1", "Telemetry.1.
-H 'Content-Type: application/json' -d '{ "Attributes": {"TelemetryPowerMetrics.1.ReportTriggers": "CPUCriticalTrigger, CPUWarnTrigger"}}'

RACADM:
racadm set idrac.telemetry.1.ReportTriggers ""

e.g.
racadm set idrac.telemetryPowerMetrics.1.
3 Receiving telemetry reports

After telemetry streaming is configured on the iDRAC, telemetry reports are streamed to the configured Redfish clients or Remote Syslog servers. A Redfish client can also pull the reports on demand. The following sections describe the methods through which clients can receive the data.

Redfish client using subscription method

A Redfish client receives the telemetry reports using the subscription method.
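A subscription request can be sketched as follows. This is a hedged illustration built on the standard DMTF Redfish EventDestination properties (Destination, EventFormatType, Protocol, Context); the exact body that iDRAC expects is not reproduced in this section, so treat the property values, the subscriptions path, the example addresses, and the helper name as assumptions. The request is only prepared, not sent.

```python
import base64
import json
import urllib.request

# Standard Redfish subscriptions collection; assumed to apply to iDRAC9 as well.
SUBSCRIPTIONS_PATH = "/redfish/v1/EventService/Subscriptions"


def build_subscription_request(idrac_host, user, password, listener_url):
    """Prepare (but do not send) a POST that subscribes a listener to metric reports."""
    body = {
        "Destination": listener_url,        # HTTPS endpoint that receives the reports
        "EventFormatType": "MetricReport",  # request metric reports rather than alerts
        "Protocol": "Redfish",
        "Context": "telemetry-demo",        # free-form label, hypothetical value
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{idrac_host}{SUBSCRIPTIONS_PATH}",
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# 192.0.2.x addresses are documentation placeholders, not real systems.
req = build_subscription_request(
    "192.0.2.10", "user", "pw", "https://192.0.2.20:8443/events"
)
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would submit it; deleting a subscription is typically an HTTP DELETE on the URI of the subscription resource that the create call returns.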
Figure: Create subscription request
Figure: Delete subscription request
Figure: Getting subscription collection
Figure: Getting subscription details

3.1.1 Redfish client using SSE method

The SSE (Server-Sent Events) method is another way of streaming telemetry data: the Redfish client and the iDRAC event service communicate using the HTML5 SSE feature over HTTP.
performing a GET on the SSE URI. The streaming URI contains the event format type as metric report, which directs the iDRAC event service to stream only the enabled metric reports. The client-triggered SSE URI can also use $filter to select the specific metric reports to stream.
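On the client side, an SSE response is a long-lived text/event-stream body in which each event is a group of "field: value" lines ended by a blank line. The sketch below shows one way to split such a stream into event payloads. It is a simplified parser that handles only data lines and comments, the function name is hypothetical, and the sample lines are invented for illustration; whether iDRAC sends each report on one data line or several is not stated here, so the parser joins multiple data lines as the SSE format prescribes.

```python
import json


def iter_sse_events(lines):
    """Yield the data payload of each complete event from an iterable of text lines."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates the current event.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (id, event, retry) are ignored in this sketch.
    # An event that is not terminated by a blank line is discarded.


# Invented sample stream; a real one would come from the open HTTP response.
sample_stream = [
    ": keep-alive\n",
    "id: 1\n",
    'data: {"Id": "PowerMetrics", "MetricValues@odata.count": 2}\n',
    "\n",
]

for payload in iter_sse_events(sample_stream):
    report = json.loads(payload)
    print(report["Id"], report["MetricValues@odata.count"])
```

Because the function accepts any iterable of lines, it can be fed from a decoded HTTP response object as easily as from the list used here.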
3.1.2 Redfish client using pull method

The Redfish client can pull a single report or the report collection on demand by performing an HTTP GET operation on the metric report URI, as specified below.

Pull One Report:
HTTP GET /redfish/v1/TelemetryService/MetricReports/
e.g.
curl -s -k -u user:pw -X GET https://<iDRAC-IP>/redfish/v1/TelemetryService/MetricReports/PowerMetrics

Pull Report Collection (URI list only):
HTTP GET /redfish/v1/TelemetryService/MetricReports
e.g.
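The pull method can be sketched in Python as two small steps: preparing the GET request and flattening the returned report. The URI comes from this section and the sample values from the PowerMetrics sample report in this paper; the helper names and the example address are hypothetical, and the request is only built, not sent.

```python
import base64
import urllib.request

REPORTS_PATH = "/redfish/v1/TelemetryService/MetricReports"


def build_pull_request(host, user, password, report=None):
    """Prepare (but do not send) a GET for one report, or for the collection."""
    url = f"https://{host}{REPORTS_PATH}"
    if report:
        url = f"{url}/{report}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Basic {token}"}
    )


def flatten_metric_values(report):
    """Return (MetricId, Timestamp, value) tuples from a metric report dict."""
    rows = []
    for item in report.get("MetricValues", []):
        raw = item.get("MetricValue")
        try:
            value = float(raw)  # values arrive as strings, e.g. "108"
        except (TypeError, ValueError):
            value = raw         # keep non-numeric values unchanged
        rows.append((item.get("MetricId"), item.get("Timestamp"), value))
    return rows


# 192.0.2.10 is a documentation placeholder address.
req = build_pull_request("192.0.2.10", "user", "pw", "PowerMetrics")
print(req.get_method(), req.full_url)

sample_report = {
    "Id": "PowerMetrics",
    "MetricValues": [
        {"MetricId": "SystemPowerConsumption",
         "Timestamp": "2020-02-03T20:10:24-06:00", "MetricValue": "108"},
        {"MetricId": "TotalPciePower",
         "Timestamp": "2020-02-03T20:10:24-06:00", "MetricValue": "0.0"},
    ],
}
rows = flatten_metric_values(sample_report)
for row in rows:
    print(row)
```

The curl examples use -k to skip certificate verification; the equivalent in Python is to pass an unverified SSL context to urllib.request.urlopen, which is reasonable only in a lab environment.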
• The FanSensor report is generated only for monolithic servers. For modular servers, the report is empty (with "MetricValues@odata.count": 0). When a report is enabled but the device hardware is not present, no metric data is generated. For instance, if a GPU card is not present in the system and the GPUMetrics report is pulled, the result is an empty report with "MetricValues@odata.count": 0.
b. For reports such as CUPS, PowerMetrics, CPUMemMetrics, ThermalMetrics, and GPUMetrics, it is recommended to set a minimum ReportInterval of 60 s, even though a ReportInterval as low as 5 s is allowed.
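The two notes above lend themselves to simple client-side checks. The sketch below flags empty reports and intervals below the recommended value; the function names are hypothetical, and applying the 5 s lower bound to every report, not only the five listed ones, is an assumption.

```python
# Reports for which this paper recommends a ReportInterval of at least 60 s.
HEAVY_REPORTS = {"CUPS", "PowerMetrics", "CPUMemMetrics", "ThermalMetrics", "GPUMetrics"}
MIN_ALLOWED_S = 5       # lowest interval the paper says is allowed (assumed for all reports)
RECOMMENDED_MIN_S = 60  # recommended minimum for the reports listed above


def is_empty_report(report):
    """True when a pulled report carries no metric values."""
    count = report.get("MetricValues@odata.count")
    if count is None:
        # Fall back to counting the array when the annotation is missing.
        count = len(report.get("MetricValues", []))
    return count == 0


def check_interval(report_name, interval_s):
    """Return a warning string, or None when the interval follows the guidance."""
    if interval_s < MIN_ALLOWED_S:
        raise ValueError(f"ReportInterval {interval_s} s is below the {MIN_ALLOWED_S} s minimum")
    if report_name in HEAVY_REPORTS and interval_s < RECOMMENDED_MIN_S:
        return f"{report_name}: {interval_s} s is allowed, but {RECOMMENDED_MIN_S} s or more is recommended"
    return None


print(is_empty_report({"Id": "GPUMetrics", "MetricValues@odata.count": 0}))
print(check_interval("PowerMetrics", 10))
print(check_interval("CPUSensor", 10))
```

A collector could run is_empty_report before indexing a report, so that servers without the relevant hardware (for example, no GPU) do not produce misleading gaps in dashboards.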
A Technical support and resources

• iDRAC Telemetry Workflow Examples
  https://github.com/dell/iDRAC-Telemetry-Scripting/
• Open-source iDRAC REST API with Redfish Python and PowerShell examples
  https://github.com/dell/iDRAC-Redfish-Scripting
• The iDRAC support home page provides access to product documents, technical white papers, how-to videos, and more.
  www.dell.com/support/idrac
• iDRAC User Guides and other manuals
  www.dell.
B MetricIDs

B.1 AggregationMetrics Report
• SystemAvgInletTempHour
• SystemMaxInletTempHour
• SystemMaxPowerConsumption

B.2 CPUMemMetrics Report
• CPUC0ResidencyHigh
• CPUC0ResidencyLow
• CUPSIIOBandwidthDMI
• CUPSIIOBandwidthPort0
• CUPSIIOBandwidthPort1
• CUPSIIOBandwidthPort2
• CUPSIIOBandwidthPort3
• NonC0ResidencyHigh
• NonC0ResidencyLow

B.3 CPUSensor Report
• TemperatureReading

B.4 CUPS Report
• CPUUsage
• IOUsage
• MemoryUsage
• SystemUsage

B.

B.6 FCSensor Report
• TemperatureReading

B.7 FPGASensor Report
• TemperatureReading

B.8 GPUMetrics Report
• BoardPowerSupplyStatus
• BoardTemperature
• GPUHealth
• GPUStatus
• MemoryTemperature
• PowerBrakeState
• PowerConsumption
• PowerSupplyStatus
• PrimaryTemperature
• SecondaryTemperature
• ThermalAlertState

B.

B.11 NICSensor Report
• TemperatureReading

B.

• TxErrorPktExcessiveCollision
• TxErrorPktLateCollision
• TxErrorPktMultipleCollision
• TxErrorPktSingleCollision
• TxMutlicast
• TxPauseXOFFFrames
• TxPauseXONFrames
• TxUnicast

B.

• TotalCPUPower
• TotalFanPower
• TotalMemoryPower
• TotalPciePower
• TotalStoragePower

B.

• RPMReading
• SystemUsagePctReading
• TemperatureReading
• VoltageReading
• WattsReading

B.
• SysAirflowUtilization
• SysNetAirflow
• SysRackTempDelta
• TotalPSUHeatDissipation

B.21 ThermalSensor Report
• TemperatureReading

C Sample Metric Report - PowerMetrics

{
  "@odata.type": "#MetricReport.v1_2_0.MetricReport",
  "@odata.context": "/redfish/v1/$metadata#MetricReport.MetricReport",
  "@odata.id": "/redfish/v1/TelemetryService/MetricReports/PowerMetrics",
  "Id": "PowerMetrics",
  "Name": "Power Metrics Metric Report",
  "ReportSequence": "1",
  "MetricReportDefinition": {
    "@odata.
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "94",
      "Oem": {
        "Dell": {
          "ContextID": "PowerMetrics",
          "Label": "PowerMetrics SystemOutputPower"
        }
      }
    },
    {
      "MetricId": "SystemPowerConsumption",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "108",
      "Oem": {
        "Dell": {
          "ContextID": "PowerMetrics",
          "Label": "PowerMetrics SystemPowerConsumption"
        }
      }
    },
    {
      "MetricId": "TotalCPUPower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "58.
      "MetricId": "TotalPciePower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "0.0",
      "Oem": {
        "Dell": {
          "ContextID": "PowerMetrics",
          "Label": "PowerMetrics TotalPciePower"
        }
      }
    },
    {
      "MetricId": "TotalStoragePower",
      "Timestamp": "2020-02-03T20:10:24-06:00",
      "MetricValue": "13.2001953125",
      "Oem": {
        "Dell": {
          "ContextID": "PowerMetrics",
          "Label": "PowerMetrics TotalStoragePower"
        }
      }
    }
  ],
  "MetricValues@odata.