White Paper
Telemetry Streaming with iDRAC9: Custom Reports Get Started

Abstract
Dell EMC PowerEdge servers with iDRAC9 4.0 Datacenter stream data to help IT administrators better understand the inner workings of their server environment. This white paper explains the Telemetry Streaming feature and the basic steps to configure iDRAC9, including adding custom report definitions on iDRAC9 4.40 or above.
Revisions
Date        Description
May 2021    Initial release

Acknowledgments
Authors: Sankara Gara, Sailaja Mahendrakar, Heidi Maeder, Praveen Thangavelu, Doug Iler

The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Executive summary
With iDRAC9 v4.00.00.00 firmware and the Datacenter license, IT managers can integrate advanced server hardware operation telemetry into their existing analytics solutions. Telemetry is provided as granular time-series data that is streamed (pushed) rather than collected through less efficient legacy polling (pull) methods. The agent-free architecture in iDRAC9 provides over 180 data metrics related to server and peripheral operations.
1 Telemetry Overview
Telemetry streaming is an automated communications process by which measurements and other data are collected at remote or inaccessible points. With iDRAC9 4.0 Datacenter, it is possible to stream a wide variety of metric reports to an ingress collector such as Splunk or ELK Stack. These and other tools can then perform remote server monitoring and analysis.
EEMI: Event and Error Message Information. It is available in a reference guide that lists the messages shown in the user interface, command-line interface, and log files. Messages are displayed or stored as a result of user action, automatic event occurrence, or for data logging purposes.
MRD: Metric Report Definition
FQDD: Fully Qualified Device Descriptor

1.2 Prerequisites
The Custom Telemetry feature is available on iDRAC9 firmware version 4.40.00.00 or above and requires the Datacenter license.
-H 'Content-Type: application/json' -d '{"ServiceEnabled": true }'

To disable Telemetry Service:
curl -s -k -u : -X PATCH https:////redfish/v1/TelemetryService -H 'Content-Type: application/json' -d '{"ServiceEnabled": false }'

Configuring Telemetry Streaming Content
The system is shipped with pre-canned report definitions with a default configuration for periodic reporting.
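The enable and disable requests for the Telemetry Service can also be prepared from a script. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: it only builds the PATCH request and does not send it. The function name build_service_patch and the host, user, and password values are hypothetical. The URI path and the ServiceEnabled property are the ones used in the curl commands.

```python
import base64
import json
import urllib.request


def build_service_patch(host, user, password, enabled):
    """Build (but do not send) a PATCH request that sets ServiceEnabled."""
    url = f"https://{host}/redfish/v1/TelemetryService"
    body = json.dumps({"ServiceEnabled": enabled}).encode("utf-8")
    # HTTP Basic authentication, the equivalent of curl's -u option.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PATCH")


# Hypothetical address and credentials, for illustration only.
request = build_service_patch("192.0.2.10", "user", "password", enabled=True)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request is deliberately left out. The curl -k option skips certificate verification, which a script would have to handle explicitly when it opens the connection.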
2.3 Configuring Telemetry Report Triggers (optional)
Telemetry triggers are a means to generate and stream reports based on an error or warning condition. These reports are predefined based on Lifecycle log (LCL) events for error or warning conditions. If a trigger is configured and occurs, a new report is generated without waiting for the scheduled report interval. The default configuration includes the triggers that are relevant for a report. You can modify the trigger association.
"EventTypes": ["MetricReport"], "SubscriptionType":"RedfishEvent“, "MetricReportDefinitions":[ { "@odata.id": "/redfish/v1/TelemetryService/MetricReportDefinitions/", "@odata.
2.4.3 Pull Method
The Redfish client can pull a report, or the list of report URIs, on demand by performing an HTTP GET operation on the metric report URI as specified below.

Pull the report collection (URI list only) to see the available reports:
curl -s -k -u : -X GET https:///redfish/v1/TelemetryService/MetricReports

Pull one report:
curl -s -k -u : -X GET https:///redfish/v1/TelemetryService/MetricReports/
e.g. = PowerMetrics

2.
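Once the report collection has been pulled, the report names can be read from the member URIs. The following Python sketch assumes the response follows the standard Redfish collection layout (a Members array of @odata.id links). The function name report_names and the sample response are hypothetical and hand-written, not captured from an iDRAC.

```python
import json


def report_names(collection):
    """Return the report names listed in a MetricReports collection."""
    names = []
    for member in collection.get("Members", []):
        uri = member.get("@odata.id", "")
        # The report name is the last path segment of the member URI.
        names.append(uri.rstrip("/").rsplit("/", 1)[-1])
    return names


# Hand-written sample response, for illustration only.
sample = json.loads("""
{
  "Members": [
    {"@odata.id": "/redfish/v1/TelemetryService/MetricReports/PowerMetrics"},
    {"@odata.id": "/redfish/v1/TelemetryService/MetricReports/ThermalMetrics"}
  ]
}
""")
print(report_names(sample))
```

Each returned name can then be appended to the MetricReports URI to pull that single report.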
"MetricReportDefinitionType": "Periodic", "MetricReportHeartbeatInterval": "PT0H0M0S", "SuppressRepeatedMetricValue": false, "ReportTimespan": "PT0H0M0S", "ReportUpdates": "Overwrite", "ReportActions": [ "RedfishEvent" ], "Schedule": { "RecurrenceInterval": "PT0H2M0S" }, "Metrics": [ { "MetricId": "TxBytes", "MetricProperties": [], "MetricProperties@odata.count": 0, "CollectionFunction": null, "CollectionDuration": null, "CollectionTimeScope": "Point", "Oem": { "Dell": { "@odata.type": "#DellMetric.v1_1_0.
"Triggers": [] } } When the above POST command successful the new custom report will be added to the report definition collection. To get report definition collection (URI list only): curl -s -k -u : -X GET https:///redfish/v1/TelemetryService/MetricReportDefinitions To get one report definition detail: curl -s -k -u : -X GET https:///redfish/v1/TelemetryService/MetricReports/ e.g. = TxRxBytesNicSlot1 2.5.
Metric reports are generated, and values are added to a report, at the rate at which the backend services produce them. Reports that contain metrics with different reporting characteristics therefore contain different numbers of metric values, matching the rate at which the backend daemons report data for each metric. The metric value count may vary because the rate at which metrics are ingested is clocked by the backend services that report them.
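One way to observe this variance is to count how many values a pulled report holds for each metric. The following Python sketch is illustrative only: the function name values_per_metric is hypothetical, and the sample report is hand-written (a hypothetical custom report mixing two MetricIDs), using the MetricValues, MetricId, Timestamp, and MetricValue fields that appear in the sample report in this paper.

```python
from collections import Counter


def values_per_metric(report):
    """Count how many MetricValues entries a report holds for each MetricId."""
    return Counter(entry["MetricId"] for entry in report.get("MetricValues", []))


# Hand-written sample; timestamps and values are illustrative only.
sample_report = {
    "MetricValues": [
        {"MetricId": "CPUUsage", "Timestamp": "2021-05-26T20:00:01.378Z", "MetricValue": "3"},
        {"MetricId": "CPUUsage", "Timestamp": "2021-05-26T20:00:06.378Z", "MetricValue": "4"},
        {"MetricId": "RPMReading", "Timestamp": "2021-05-26T20:00:01.378Z", "MetricValue": "5400"},
    ]
}
for metric_id, count in sorted(values_per_metric(sample_report).items()):
    print(metric_id, count)
```

Differing counts per MetricId in the same report are expected and do not by themselves indicate missing data.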
2.7 Troubleshooting and Tips
Issues | Possible Causes
Service is not enabled. Property “ServiceEnabled” is set to false. Applies to TelemetryService/EventService.
POST, PATCH operations failure. Property that is added in the input payload is not allowed or the value added is invalid.
GET Metric Report failure. No Metric Report in the SSE or subscription stream.
a) Check Redfish documentation for allowed properties and valid values.
Service is not enabled. Property “ServiceEnabled” is set to false.
2.8 Best practices
1. A Server Configuration Profile (SCP) is a better option for configuring all the metric reports, by including the “Custom Telemetry” option. Once an SCP file is created, the same file can be applied to multiple servers that support the Telemetry feature and have the Datacenter license.
2. Configure the “report interval” based on the system configuration and the number of configured telemetry reports.
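A report definition expresses its interval in the RecurrenceInterval property as an ISO 8601 duration, for example PT0H2M0S for two minutes. When intervals are tuned across many reports, it can help to convert between seconds and that notation. The following Python sketch is a simplified illustration: the function names are hypothetical, and it assumes durations that use only hours, minutes, and whole seconds, which is the form shown in this paper.

```python
import re

# Matches durations of the form PT<h>H<m>M<s>S; each component is optional.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def interval_to_seconds(value):
    """Convert a duration such as 'PT0H2M0S' to a number of seconds."""
    match = _DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_interval(total):
    """Format a number of seconds in the 'PTxHyMzS' style."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"PT{hours}H{minutes}M{seconds}S"


print(interval_to_seconds("PT0H2M0S"))
print(seconds_to_interval(300))
```

Which interval values a given report accepts is not covered by this sketch.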
SYS406    Dell MessageRegistry    Unable to start the configuration operation because the System Lockdown mode is enabled.
SYS413    Dell MessageRegistry    The operation successfully completed.
SYS414    Dell MessageRegistry    A new resource is successfully created.
SYS419    Dell MessageRegistry    Unable to complete the operation because the Redfish attribute is disabled.
A Technical support and resources
• iDRAC Telemetry Workflow Examples
  https://github.com/dell/iDRAC-Telemetry-Scripting/
• Open-source iDRAC REST API with Redfish Python and PowerShell examples
  https://github.com/dell/iDRAC-Redfish-Scripting
• The iDRAC support home page provides access to product documents, technical white papers, how-to videos, and more.
  www.dell.com/support/idrac
• iDRAC User Guides and other manuals
  www.dell.
B MetricIDs
Following are the currently available metrics (MetricIDs) and the associated pre-canned reports. Details of each metric (MetricDefinition), such as description, type, units, and sensing interval, can be obtained using the following command.

curl -s -k -u : -X GET https:///redfish/v1/TelemetryService/MetricDefinitions/
e.g. = SystemMaxPowerConsumption

{
  "@odata.type": "#MetricDefinition.v1_1_1.MetricDefinition",
  "@odata.
B.4 SystemUsage Report
• CPUUsage
• IOUsage
• MemoryUsage
• AggregateUsage

B.5 FanSensor Report
• RPMReading

B.6 FCPortStatistics Report
• FCInvalidCRCs
• FCLinkFailures
• FCLossOfSignals
• FCRxKBCount
• FCRxSequences
• FCRxTotalFrames
• FCTxKBCount
• FCTxSequences
• FCTxTotalFrames
• FCStatOSDriverState
• PortSpeed
• PortStatus

B.6 FCSensor Report
• TemperatureReading

B.7 FPGASensor Report
• TemperatureReading
• TotalFPGAPower

B.
• PowerConsumption
• PowerSupplyStatus
• PrimaryTemperature
• SecondaryTemperature
• ThermalAlertState

B.9 GPUStatistics Report
• CumulativeDBECounterFB
• CumulativeDBECounterGR
• CumulativeSBECounterFB
• CumulativeSBECounterGR
• DBECounterFB
• DBECounterFBL2Cache
• DBECounterGRL1Cache
• DBECounterGRRF
• DBECounterGRTex
• DBERetiredPages
• SBECounterFB
• SBECounterFBL2Cache
• SBECounterGRL1Cache
• SBECounterGRRF
• SBECounterGRTex
• SBERetiredPages

B.
• LinkStatus
• OSDriverState
• PartitionLinkStatus
• PartitionOSDriverState
• RDMARxTotalBytes
• RDMARxTotalPackets
• RDMATotalProtectionErrors
• RDMATotalProtocolErrors
• RDMATxTotalBytes
• RDMATxTotalPackets
• RDMATxTotalReadReqPkts
• RDMATxTotalSendPkts
• RDMATxTotalWritePkts
• RxBroadcast
• RxBytes
• RxErrorPktAlignmentErrors
• RxErrorPktFCSErrors
• RxFalseCarrierDetection
• RxJabberPkt
• RxMutlicast
• RxPauseXOFFFrames
• RxPauseXONFrames
• RxRuntPk
• DataUnitsReadLower
• DataUnitsReadUpper
• DataUnitsWrittenLower
• DataUnitsWrittenUpper
• HostReadCommandsLower
• HostReadCommandsUpper
• HostWriteCommandsLower
• HostWriteCommandsUpper
• MediaDataIntegrityErrorsLower
• MediaDataIntegrityErrorsUpper
• NumOfErrorInfoLogEntriesLower
• NumOfErrorInfoLogEntriesUpper
• PercentageUsed
• PowerCyclesLower
• PowerCyclesUpper
• PowerOnHoursLower
• PowerOnHoursUpper
• UnsafeShutdownsLower
• UnsafeShutdownsUpper

B.
• LastHourMinPowerTime
• LastMinuteAvgPower
• LastMinuteMaxPower
• LastMinuteMaxPowerTime
• LastMinuteMinPower
• LastMinuteMinPowerTime
• LastWeekAvgPower
• LastWeekMaxPower
• LastWeekMaxPowerTime
• LastWeekMinPower
• LastWeekMinPowerTime

B.16 PSUMetrics Report
• FanSpeed
• Temperature

B.
• ReallocatedBlockCount
• UncorrectableErrorCount
• UncorrectableLBACount
• UnusedReservedBlockCount
• UsedReservedBlockCount
• VolatileMemoryBackupSourceFailures

B.19 StorageSensor Report
• TemperatureReading

B.20 ThermalMetrics Report
• ComputePower
• ITUE
• PowerToCoolRatio
• PSUEfficiency
• SysAirFlowEfficiency
• SysAirflowPerFanPower
• SysAirflowPerSysInputPower
• SysAirflowUtilization
• SysNetAirflow
• SysRackTempDelta
• TotalPSUHeatDissipation

B.
"MetricValues": [
  {
    "MetricId": "TotalMemoryPower",
    "Timestamp": "2021-05-26T20:00:01.378Z",
    "MetricValue": "1",
    "Oem": {
      "Dell": {
        "@odata.type": "#DellMetricValue.v1_0_0.DellMetricValue",
        "ContextID": "PowerMetrics",
        "Label": "PowerMetrics TotalMemoryPower",
        "Source": "powermetrics",
        "FQDD": "PowerMetrics"
      }
    }
  },
  {
    "MetricId": "TotalFanPower",
    "Timestamp": "2021-05-26T20:00:01.378Z",
    "MetricValue": "8.421875",
    "Oem": {
      "Dell": {
        "@odata.type": "#DellMetricValue.v1_0_0.
    "MetricId": "SystemOutputPower",
    "Timestamp": "2021-05-26T20:00:01.378Z",
    "MetricValue": "112",
    "Oem": {
      "Dell": {
        "@odata.type": "#DellMetricValue.v1_0_0.DellMetricValue",
        "ContextID": "PowerMetrics",
        "Label": "PowerMetrics SystemOutputPower",
        "Source": "powermetrics",
        "FQDD": "PowerMetrics"
      }
    }
  },
  {
    "MetricId": "TotalStoragePower",
    "Timestamp": "2021-05-26T20:00:01.378Z",
    "MetricValue": "12.890625",
    "Oem": {
      "Dell": {
        "@odata.type": "#DellMetricValue.v1_0_0.
    "Oem": {
      "Dell": {
        "@odata.type": "#DellMetricValue.v1_0_0.DellMetricValue",
        "ContextID": "PowerMetrics",
        "Label": "PowerMetrics TotalFPGAPower",
        "Source": "powermetrics",
        "FQDD": "PowerMetrics"
      }
    }
  },
  {
    "MetricId": "TotalCPUPower",
    "Timestamp": "2021-05-26T20:00:01.378Z",
    "MetricValue": "43",
    "Oem": {
      "Dell": {
        "@odata.type": "#DellMetricValue.v1_0_0.
        "@odata.type": "#DellMetricValue.v1_0_0.DellMetricValue",
        "ContextID": "PowerMetrics",
        "Label": "PowerMetrics SystemInputPower",
        "Source": "powermetrics",
        "FQDD": "PowerMetrics"
      }
    }
  }
],
"MetricValues@odata.