Dell EMC Storage Systems Events and Alerts Guide for the metro node appliance 7.
Notes, cautions, and warnings
NOTE: A NOTE indicates important information that helps you make better use of your product.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.
© 2021 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Contents
Chapter 1: Events and Alerts
UI overview
Platform alerts
1 Events and Alerts
Topics:
• UI overview
• Platform alerts
• Hardware alerts
• Alert states
• Scope of the events
• Sorting and filtering alerts
• Download .
Column Description
Severity: Indicates the urgency of the alert:
● CRITICAL: A condition has occurred that can obstruct functionality or lead to the failure of other components.
● ERROR: An error has occurred that has a significant impact on the system and must be rectified immediately.
● WARNING: An error has occurred that you should be aware of but that does not have a significant impact on the system. For example, a component is working, but its performance may not be optimal.
Hardware alerts
iDRAC alerts
For each iDRAC or hardware event, the notification service generates a corresponding alert. You can monitor the status of the metro node hardware, including alerts that are generated at the hardware level. By default, you can view alerts that were created during the last 48 hours. In the details of each alert, you can see more information, including Severity, Message, and other properties.
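To illustrate the default 48-hour view, the following Python sketch filters alert records by creation time. The `Alert` class, its field names, and the `recent_alerts` function are hypothetical and are not part of any metro node API; only the 48-hour default comes from this guide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Default look-back window stated in this guide (48 hours).
DEFAULT_WINDOW = timedelta(hours=48)


@dataclass
class Alert:
    # Hypothetical record; the field names are illustrative only.
    severity: str
    message: str
    created_utc: datetime


def recent_alerts(alerts, now, window=DEFAULT_WINDOW):
    """Return the alerts created within `window` before `now` (inclusive)."""
    cutoff = now - window
    return [a for a in alerts if cutoff <= a.created_utc <= now]
```

Passing a different `window` value models a user widening or narrowing the time range; how the real UI stores that setting is not described here.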
Column Description
Last Updated (UTC): Date and time when the status of the alert last changed.
User Note: Shows the notes added by the user.
Monitor alerts
Monitor alerts notify the customer when specific use cases occur. These alerts keep watch on the hardware functionality. Monitor alerts are generated for the following scenarios:
● If any of the storage partitions becomes 80% full.
● If any of the storage partitions becomes 90% full.
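The two storage-partition scenarios above amount to a threshold check on partition usage. The sketch below is a minimal illustration under that assumption; the function name and its return convention are hypothetical, and the guide does not state which severity each threshold maps to.

```python
# Usage thresholds named in this guide, as percentages, highest first.
THRESHOLDS = (90, 80)


def crossed_threshold(used_bytes, total_bytes):
    """Return the highest threshold (90 or 80) that usage has reached, or None."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    percent_used = 100 * used_bytes / total_bytes
    for threshold in THRESHOLDS:
        if percent_used >= threshold:
            return threshold
    return None
```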
Column Description
Host Name: The IP address or network name of the remote host.
Creation Date (UTC): Date and time when the alert was generated.
Last Updated (UTC): Date and time when the status of the alert last changed.
User Note: Shows the notes added by the user.
Alert states
The alerts can be in any of the following states:
OPEN: The state when an alert has been raised.
For Platform alerts, whenever the scope incarnation value for the events changes, all director scope alerts with the previous scope incarnation value are closed on that node. The scope incarnation value for a director changes when that node of the metro node, or its firmware, is restarted. The cluster scope incarnation value changes when both nodes in the cluster are restarted.
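The closing rule for director scope alerts can be sketched as follows. This is an illustration only: the dictionary keys (`scope`, `node`, `state`, `incarnation`) and the function name are hypothetical, not the product's data model.

```python
def close_stale_director_alerts(alerts, node, current_incarnation):
    """Close director-scope alerts on `node` that carry an older incarnation.

    `alerts` is a list of dicts with the hypothetical keys "scope", "node",
    "state", and "incarnation". Returns the number of alerts closed.
    """
    closed = 0
    for alert in alerts:
        if (
            alert["scope"] == "director"
            and alert["node"] == node
            and alert["state"] != "CLOSED"
            and alert["incarnation"] != current_incarnation
        ):
            alert["state"] = "CLOSED"
            closed += 1
    return closed
```

Cluster scope alerts are deliberately left untouched here, because the text above ties the closing behavior to director scope alerts only.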
Alerts roll-up
Alerts roll-up is the process of consolidating alerts based on the conditions that exist when platform alerts are generated and when live alerts are moved to historical alerts. Currently, roll-up applies only to platform alerts.
● Disabled-CLOSED rolled-up historical alert
● Entries for the Live alerts that were rolled up
Alerts on remote director
The user can also view the alerts on the remote directors. Select the director from the drop-down option on the alerts listing page. Alerts on the remote director can be viewed, and their state can be changed. The user can also disable an alert on the remote director.
Figure 1. Disabling particular component of platform alerts
If the user enables the component again, future alerts associated with that component level are shown in the UI again.
Disable or enable hardware alerts
In hardware alerts, the user can disable all iDRAC alerts, all Monitor alerts, or both.
Steps:
1. In the UI, go to Settings.
2. Select Notifications from the drop-down. The Configure Alerts page is displayed.
3. Click the CONFIGURE NOTIFICATIONS button.
Steps:
1. In the UI, go to Settings.
2. Select Notifications from the drop-down. The Configure Alerts page is displayed.
3. Click the CONFIGURE NOTIFICATIONS button. A Configure Notification window is displayed.
4. To disable Platform Alerts, Hardware Alerts, or both, switch the Platform Alerts button, the Hardware Alerts button, or both buttons as shown in the following figure, and then click CLOSE to close the window.
Figure 3.
If the user enables them again, future alerts associated with the hardware alerts and platform alerts are shown in the UI again.
Service Monitoring Alert
If any monitoring service stops or fails at any time, a service monitoring alert is generated. Under Notifications, this alert is displayed in the monitor window of the metro node. Initially, the severity of the alert is Warning. If the service is down for more than 5 minutes, the severity changes to Critical.
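The severity escalation described above can be expressed as a small function. This is a minimal sketch: the function name is hypothetical, and only the Warning and Critical severities and the 5-minute limit come from this guide.

```python
from datetime import timedelta

# Escalation delay stated in this guide.
ESCALATION_DELAY = timedelta(minutes=5)


def service_alert_severity(downtime):
    """Return the alert severity for a service that has been down for `downtime`."""
    if downtime < timedelta(0):
        raise ValueError("downtime cannot be negative")
    # "More than 5 minutes" is read as a strict comparison.
    return "Critical" if downtime > ESCALATION_DELAY else "Warning"
```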
Figure 5. Selecting the Test Alerts
4. A confirmation window is displayed. Select the type of alert: Platform, Monitor, or SMS.
5. A Test Alert Result window is displayed. Click CLOSE to close the window.
Before you generate test alerts, ensure that the previous test alerts are closed. If they are not closed, close them from the live alerts listing page.
alert. The user can also send alert notifications to a specified email address or SMTP server. To configure the SMTP server, see the System Configuration guide available at SolVe (https://solveonline.emc.com/solve/home/74).
Configure emailing notification
You can disable email notifications as a whole, or disable them individually for platform, iDRAC, or monitor alerts.
Supported events
Supported platform event
See the following table:
Condition ID Severity Call home Alert name Description RCA Corrective action Event source Alert type

Condition ID: 0x100001
Severity: CRITICAL
Call home: True
Alert name: High Memory Usage
Description: Memory usage on this director exceeds the threshold.
RCA: Either a memory leak has occurred or the metro node has exceeded its configuration limits.
Corrective action: Contact Dell Customer Support.
Event source: DIRECTOR
it is enabled.
2. Check the cable and the SFP, and ensure they are properly plugged in.
3. Check the switch if applicable, and ensure it is operational and the corresponding port is enabled.
4. If the link remains down, contact Dell Customer Support.

Condition ID: 0x20001
Severity: CRITICAL
Call home: True
Alert name: Array No Access
Description: Storage Array is not seen through this director.
RCA: Storage Array is not reachable through this director.
switches, and array. Contact Dell Customer Support if the problem persists.

Condition ID: 0x20004
Severity: WARNING
Call home: True
Alert name: Unit Busy Condition
Description: The logical unit is busy more often than is normal and may impact performance.
RCA: The array has returned the SCSI BUSY status to metro node I/O requests for this storage-volume more often than what is considered acceptable.

Condition ID: 0x20005
Severity: WARNING
Call home: True

Condition ID: 0x20006
Severity: ERROR
Call home: True
Condition ID: 0x20007
Severity: WARNING
Call home: True

Condition ID: 0x20008
Severity: ERROR

Condition ID: 0x20009

Condition ID: 0x2000a

Alert name: Array Not Supported SPC Version
Description: Array supports an SPC version NOT matching 2, 3 or 4.
RCA: Target device advertised a behavior which is not supported by the metro node initiator.
identifiers. If either of these conditions occurs, do not use this array. The array name and version number should be reported to Dell Customer Support. Continuing to use the array could lead to Data Unavailability and Data Loss conditions and unreliable array behavior.

Condition ID: 0x2000b
Severity: ERROR
Call home: True
Alert name: Logical Unit Changed
Description: Logical Unit mappings change detected.
changes in order to prevent LUN swapping from occurring.

Condition ID: 0x2000c
Severity: ERROR
Call home: True
Alert name: Array No Failover Executor Director
Description: No suitable executor director to perform the failover of a group of Logical Units to a specific array controller.
RCA: It is likely indicative of severely degraded hardware and/or fabric condition.
Corrective action: Restore connectivity to the array controllers for all directors.
already returned. It can lead to Data Unavailability/Data Loss.
condition on the target.

Description: Virtual Volume redundancy has changed.
RCA: One or more factors have contributed to changing the redundancy of the given virtual volume.
Corrective action: Contact Dell Customer Support for assistance with restoring the redundancy of the virtual volume.
Event source: VIRTUALVOLUME
Alert type: Alarm
maintaining a list of changes to the reported distributed device. The specified mirror of the distributed device is marked out of date and rebuilt when possible.
the accessibility of the underlying storage-volume of the mirror that is marked out of date. If necessary, take corrective actions to reconnect the inaccessible storage-volumes. Contact Dell Customer Support for assistance.
, if storage devices from separate metro node systems with existing devices are merged into one metro node system or if a device becomes visible to both clusters and happens to have the same name as a cluster-local device on the other cluster.

Condition ID: 0x3000c
Severity: ERROR
Call home: True
Alert name: Virtual Volume Name Conflict
Description: Name conflict detected between two virtual volumes, renaming the second occurrence.
as a cluster-local virtual volume on the other cluster.

Condition ID: 0x3000d
Severity: ERROR
Call home: True
Alert name: Automatic Detach Disallowed
Description: Automatic detach of the given cluster from the given device is disallowed to preserve consistency and avoid data loss.
RCA: The automatic detach of the given cluster from the given device was disallowed, in order to preserve consistency on the device and avoid losing data.
cluster is doing I/O to the storage volume, data corruption is likely.
the other cluster.
3. Remove visibility to the storage volume from the other cluster.

Condition ID: 0x30010
Severity: WARNING
Call home: True
Alert name: Mirror Marked Out Of Date
Description: A mirror of a RAID 1 device has been marked fully out of date.
and the changes can be written successfully to disk, the changes are lost. The system configuration information that is associated with those metadata write not written to the disk may be lost.
volume cannot be restored to 'ok' state, create and move to a new metadata volume as soon as possible.
investigate the cause of the performance degradation.
3. Compare the storage-volume latency stats to the latency on the storage array for the one or more volumes in question. If the latency on the array is not as high, investigate the fabrics between the storage array and metro node.
4. If the issue persists and you are unable to determine the cause, engage Dell Customer Service.
full rebuild occurs. This condition is necessary to avoid data corruption.

Condition ID: 0x30015
Severity: CRITICAL
Call home: True
Alert name: Device Bad Config
Description: Metadata persisted to the metadata volume relating to the device, or storage volume in question has been detected to be inconsistent.
RCA: Access to the device or storage volume has been disabled until the problem can be examined and remedied.
it was created.
it was created.
back-end devices to their original size.
3. Contact Dell Customer Support.

Condition ID: 0x30018
Severity: CRITICAL
Call home: True
Alert name: Active Metadata Volume Unhealthy
Description: The active metadata volume has become unhealthy and is at risk.
RCA: The active metavolume has become unhealthy and is at risk. It is in cache only, and must be written to storage volume.
Support to restore from the backup.

Condition ID: 0x3001a
Severity: WARNING
Call home: True
Alert name: Storage Volume Latency Degraded
Description: Storage volume I/O latency has increased above an acceptable threshold.
RCA: The average I/O latency on a storage volume has exceeded the acceptable limit, likely due to increased latency on the backend storage array or fabrics between the metro node and storage array.
Corrective action: 1.
latency on the cluster where the remote device resides, and investigate further as needed.
2. Create storage-volume performance monitors in VPlexcli to investigate individual storage-volume latency stats to further investigate the cause of the performance degradation.
3. Compare the storage-volume latency stats to the latency on the storage array for the one or more volumes in question.
device has been isolated due to severe performance degradation of its storage volume components.
volumes supporting this mirror are performing poorly, causing severe degradation in the RAID-1 performance. To improve the RAID-1 performance through the healthy legs, the I/Os to the poorly performing mirror leg have been blocked.

Condition ID: 0x3001d
Severity: INFO
Call home: False
Alert name: Metadata Copy Succeeded
metadata to a metadata volume.
has been written out to the given metadata volume.
No action is required.

Condition ID: 0x3001e
Severity: CRITICAL
Call home: False
Alert name: Metadata Copy Failed
Description: Failed to copy in-memory metadata to a metadata volume.
RCA: An attempt to write out all in-memory metadata to a metadata volume failed.
reporting cluster.
reporting cluster.
Check that the distributed-devices and consistency-groups are running on the winning cluster. A winner must be manually chosen.
1. Check for problems with the network link to the indicated cluster.
2. Check the equipment at the indicated cluster for malfunctions. Check the management network cables and the corresponding management modules.
the cluster. Also a communications link could have been brought up by mistake, allowing a cluster of a different version to be visible.
firmware was inserted into the cluster and booted, then shut down that director. If the situation arose because a communications link has been brought up by mistake, then take down that communications link.
Support if problem persists.

Condition ID: 0x90002
Severity: CRITICAL
Call home: True
Alert name: Possible Stuck I/O detected on virtual volume
Description: Possible Stuck I/O detected on virtual volume.
RCA: An I/O failed to complete or be properly aborted and cleaned up.
Corrective action: Contact Dell Customer Support for assistance.
Event source: VIRTUALVOLUME
Alert type: Operational
allocation failure occurred. Contact Dell Customer Support.

Condition ID: 0x90006
Severity: ERROR
Call home: True
Alert name: SCSI Xcopy command failed on virtual volume due to an internal memory allocation issue
Corrective action: Run the collect-diagnostics utility to collect system information to determine why an internal firmware memory allocation failure occurred. Contact Dell Customer Support.
and Write Command.
the maximum transfer size advertised by the metro node.
Consult the Troubleshooting Entry in the VAAI section of SolVe Desktop for this event. If the problem persists, contact Dell Customer Support.
view is disabled
view is disabled.
true. Refer to the metro node CLI guide.

Condition ID: 0x9000d
Severity: WARNING
Call home: True
Alert name: Failed to process a Write Same (16) command on volume as write-same-16-enabled attribute is disabled on the storage view
Description: Failed to process a Write Same (16) command on volume as write-same-16-enabled attribute is disabled on the storage view.
took too long
took too long.
all metro node volumes used by that application are in a healthy state. The applications affected will need to go through their recovery process. It is an informational event only. No further action is required.
Event source: SYSTEM

Condition ID: 0xb0002
Severity: INFO
Call home: True
Alert name: All Failure Recovery Complete
Description: Failure recovery has completed for all volumes.
RCA: Failure recovery has completed for all volumes.
Consistency Group Failed
failure, the configured winner settings on a consistency group were not able to come into effect, and I/O remains suspended on both clusters.
automatic detach on the given consistency group, in order to preserve consistency on the volumes in the set and avoid losing data.
not written to the nonvolatile file system occurred and data was not written to the nonvolatile file system occurred to the nonvolatile file system and data was not written to disk.
to the metavolume failed. If the failure cannot be corrected, create a new metavolume and copy the in-memory data to the new metavolume. Contact Dell Customer Support for assistance if necessary.
Check the WAN COM or LOCAL COM path that was disconnected, then check the switch logs for errors that will help pinpoint the root cause. If the errors point to a hardware issue, check, clean, or replace the cables and SFPs along the path. Engage Dell Customer Support if you are unable to determine the root cause.
zoning configuration apply. If you are unable to resolve the issue, engage Dell Customer Support.

Condition ID: 0x15000d
Severity: ERROR
Call home: True
Alert name: When connecting to a switch, the port received a non-spec-compliant response indicating that the switch does not support any protocol version that metro node supports.
diagnostics for chip vendor analysis.
diagnostics for chip vendor analysis.
diagnostics for chip vendor analysis.

Condition ID: 0x150018
Severity: WARNING
Call home: True
Alert name: An attempt to communicate with the switch has timed out.
Description: An attempt to communicate with the switch has timed out.
RCA: It likely indicates either a physical communication issue with the switch or a misbehaving switch.
reset from Dell Customer Support.
reset from Dell Customer Support.
and no automated recovery is possible. The chip may now be unresponsive, resulting in stuck I/O. The chip must be manually reset by Dell Customer Support, and if that fails the director must be rebooted to recover from this issue.
reset the chip.
that the switch does not support the default protocol version that metro node supports. However, the switch responded with an older protocol version that metro node does support.
e with the switch.

Condition ID: 0x70001
Severity: Warning
Call home: False
Alert name: Cluster Witness Disabled
Description: Cluster Witness is disabled.
RCA: This event is generated when Cluster Witness is administratively disabled.
the problem persists, contact Dell EMC Customer Support.

Condition ID: 0x70003
Severity: Warning
Call home: False
Alert name: CW Cluster Partition Guidance
Description: Communication between clusters is broken.
RCA: Cluster Witness Server has detected and reported an inter-cluster partition. This marks the loss of connectivity between the remote cluster and the reporting cluster.
Customer Support.

Condition ID: 0x70005
Severity: Error
Call home: True
Alert name: CW Cluster Isolation Guidance Or No Guidance
Description: CW Cluster Isolation Guidance Or No Guidance
RCA: The cluster reporting this event has been unable to receive any guidance from the Cluster Witness Server for the last 10 seconds. This may be due to failure of the Cluster Witness Server or loss of network connectivity.
communication with Cluster Witness Server. This may be due to the failure of the server or loss of network connectivity.
Cluster Witness Server. Check whether Cluster Witness Server is running. If connectivity is lost from other directors, disable the Cluster Witness Server until connectivity is restored in order to prevent data unavailability on cluster partition.
replace the SFP.

Condition ID: 0xa0002
Severity: Critical
Call home: True
Alert name: SFP Rx Power Low
Description: A port RX power is below the warning or alarm threshold.
RCA: A port's RX power is below the warning or alarm threshold.
Corrective action: The hardware attached to this port must be carefully investigated, and the switch port SFP and cable should be reseated, cleaned, and swapped as needed.
Event source: INTERFACE
determine the source of the corrupt packet. Contact Dell EMC Customer Support for assistance.

Condition ID: 0x1300c8
Severity: ERROR
Call home: True
Alert name: Path Indictment
Description: A tcpcom path has been indicted.
RCA: A tcpcom path has been indicted.
Corrective action: The system should recover automatically. However, the underlying network and hardware must be investigated to determine the cause of the error.
Event source: COMMUNICATIONSPATH
Alert type: Operational
message length.
message length.
must be investigated to determine the cause of the invalid packets. Contact Dell EMC Customer Support for assistance.

Condition ID: 0x1300cb
Severity: ERROR
Call home: True
Alert name: Path Indictment Timeout
Description: A tcpcom path has been indicted due to a timeout.
RCA: A tcpcom path has been indicted due to a timeout.
Corrective action: The system should recover automatically.
Event source: COMMUNICATIONSPATH
Alert type: Operational
question one at a time by disabling and reenabling the port in the /clusters/cluster-x/directors/director-x/ports context of VPlexcli may relieve the issue.

Condition ID: 0x170193
Severity: WARNING
Call home: True
Alert name: Port High I/O Error Rate
Description: {vportName}: In the last {seconds} seconds at least {numLogins} logins observed a high I/O failure rate.
Condition ID: 0x1701f5
Severity: WARNING
Call home: True
Alert name: Login High I/O Error Rate
Description: {vportName}: In the last {seconds} seconds {errorPercent}% of I/O failed on login (npid {npid}, wwpn {wwpn})
RCA: Either there are frame drop issues on the fabric or there is an internal issue in the metro node.
Condition ID: 0x180002
Severity: ERROR
Call home: True

Sustained Success
encountered an I/O failure due to retry exhaustion after multiple consecutive I/O completions.
I/Os to a given disk failed after retry exhaustion. There might be faulty hardware (cable, backend switch, array).
array's BE disk health, LUN masking, array configuration and physical connection. If the problem persists, contact Dell EMC Customer Support.
Supported iDRAC events
Supported hardware events
Ports to metro node port-mapping
The following table was generated by pulling the cables out of the system in real time. For more details about iDRAC alerts, see https://qrl.dell.com/LCDError/Lookup.
Columns: Condition ID (Platform Alerts), HW Label, PortRole, UDEV (metro node), VS5 EndUser (UI/CLI) PortName, Physical Port location (Controller ID), Physical Port location (Port ID), Message
FC102 FC1 front-end - IO-00 FC.Slot.
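connected or the FC device is not functioning.

Condition ID (Platform Alerts): 0x110001
HW Label: LCOM1
PortRole: local-com
UDEV (metro node): LC-00
VS5 EndUser (UI/CLI) PortName: LC-00
Physical Port location (Controller ID): NIC.Integrated.1-1-1
Physical Port location (Port ID): 1
Message: The Integrated NIC 1 Port 1 network link is down.

Condition ID (Platform Alerts): 0x110001
HW Label: LCOM2
PortRole: local-com
UDEV (metro node): LC-01
VS5 EndUser (UI/CLI) PortName: LC-01
Physical Port location (Controller ID): NIC.Integrated.1-2-1
Physical Port location (Port ID): 2
Message: The Integrated NIC 1 Port 2 network link is down.

Condition ID (Platform Alerts): 0x110001
HW Label: WAN1
PortRole: wan-com
UDEV (metro node): WC-00
VS5 EndUser (UI/CLI) PortName: WC-00
Physical Port location (Controller ID): NIC.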
Condition ID Severity Call home Alert name Description Event source

Condition ID: 0x8A000111
Severity: Error
Call home: True
Alert name: Sms Automated Meta Volume Backup Failed
Description: The automated backup of the metavolume could not be completed: {exception}.
Event source: METAVOLUME

Condition ID: 0x8A4a61F6
Severity: Error
Call home: True
Alert name: SMS_HOST_CERTIFICATE_30_DAYS_UNTIL_EXPIRATION
Description: The host certificate expires within a month.
Event source: CERTIFICATES

Condition ID: 0x8A4a61F7
Severity: Error
Call home: True
Alert name: SMS_CA_CERTIFICATE_30_DAYS_UNTIL_EXPIRATION
Description: The host certificate expires within a month.