Users Guide
Table Of Contents
- Dell EMC PowerEdge Servers Troubleshooting Guide
- Contents
- Introduction
- Diagnostic indicators
- Status LED indicators
- System health and system ID indicator codes
- iDRAC Quick Sync 2 indicator codes
- iDRAC Direct LED indicator codes
- NIC indicator codes
- Power supply unit indicator codes
- Non-redundant power supply unit indicator codes
- Hard drive indicator codes
- uSATA SSD indicator codes
- Internal dual SD module indicator codes
- Running diagnostics
- Troubleshooting hardware issues
- Troubleshooting system startup failure
- Troubleshooting external connections
- Troubleshooting the video subsystem
- Troubleshooting a USB device
- Troubleshooting a serial Input Output device
- Troubleshooting a NIC
- Troubleshooting a wet system
- Troubleshooting a damaged system
- Troubleshooting the system battery
- Troubleshooting cooling problems
- Troubleshooting cooling fans
- Troubleshooting an internal USB key
- Troubleshooting a micro SD card
- Troubleshooting expansion cards
- Troubleshooting processors
- Troubleshooting a storage controller
- OMSA flagging PERC driver
- Importing or clearing foreign configurations using the foreign configuration view screen
- Importing or clearing foreign configurations using the VD mgmt menu
- RAID controller L1, L2 and L3 cache error
- PERC controllers do not support NVME PCIe drives
- 12 Gbps hard drive does not support in SAS 6ir RAID controllers
- Hard drives cannot be added to the existing RAID 10 Array
- PERC battery discharging
- PERC battery failure message is displayed in ESM log
- Creating non-raid disks for storage purpose
- Firmware or Physical disks out-of-date
- Cannot boot to Windows due to foreign configuration
- Offline or missing virtual drives with preserved cache error message
- Expanding RAID array
- LTO-4 Tape drives are not supported on PERC
- Limitations of HDD size on H310
- System logs show failure entry for a storage controller even though it is working correctly
- Troubleshooting hard drives
- Troubleshooting an optical drive
- Troubleshooting a tape backup unit
- Troubleshooting no power issues
- Troubleshooting power supply units
- Troubleshooting RAID
- RAID configuration using PERC
- RAID configuration using OpenManage Server Administrator
- RAID configuration by using Unified Server Configurator
- Downloading and installing the RAID controller log export by using PERCCLI tool on ESXi hosts on Dell’s 13th generation of PowerEdge servers
- Configuring RAID by using Lifecycle Controller
- Starting and target RAID levels for virtual disk reconfiguration and capacity expansion
- Replacing physical disks in RAID1 configuration
- Thumb rules for RAID configuration
- Reconfiguring or migrating virtual disks
- Foreign Configuration Operations
- Viewing Patrol Read report
- Check Consistency report
- Virtual disk troubleshooting
- Rebuilding of virtual disk does not work
- Rebuilding of virtual disk completes with errors
- Cannot create a virtual disk
- A virtual disk of minimum size is not visible to Windows Disk Management
- Virtual disk errors on systems running Linux
- Problems associated with using the same physical disks for both redundant and nonredundant virtual disks
- Enable the alarm on PERC 5/E adapter to alert in case of physical disk failures
- RAID controller displays multibit ECC errors
- PERC goes offline with an error message
- Reconfiguring the RAID level and virtual disks
- Lost shared storage access
- Troubleshooting memory or battery errors on the PERC controller on Dell PowerEdge servers
- Slicing
- RAID puncture
- Troubleshooting thermal issue
- Input/Output errors while reseating SAS IOM storage sled on hardware configurations
- Server management software issues
- What are the different types of iDRAC licenses
- How to activate license on iDRAC
- Can I upgrade the iDRAC license from express to enterprise and BMC to express
- How to find out missing licenses
- How to export license using iDRAC web interface
- How to set up e-mail alerts
- System time zone is not synchronized
- How to set up Auto Dedicated NIC feature
- How to configure network settings using Lifecycle Controller
- Assigning hot spare with OMSA
- Storage Health
- How do I configure RAID using operating system deployment wizard
- Foreign drivers on physical disk
- Physical disk reported as Foreign
- How to update BIOS on 13th generation PowerEdge servers
- Why am I unable to update firmware
- Which are the operating systems supported on Dell EMC PowerEdge servers
- Unable to create a partition or locate the partition and unable to install Microsoft Windows Server 2012
- JAVA support in iDRAC
- How to specify language and keyboard type
- Message Event ID - 2405
- Installing Managed System Software On Microsoft Windows Operating Systems
- Installing Managed System Software On Microsoft Windows Server and Microsoft Hyper-V Server
- Installing Systems Management Software On VMware ESXi
- Processor TEMP error
- PowerEdge T130, R230, R330, and T330 servers may report a critical error during scheduled warm reboots
- SSD is not detected
- OpenManage Essentials does not recognize the server
- Unable to connect to iDRAC port through a switch
- Lifecycle Controller is not recognizing USB in UEFI mode
- Guidance on remote desktop services
- Troubleshooting operating system issues
- How to install the operating system on a Dell PowerEdge Server
- Locating the VMware and Windows licensing
- Troubleshooting blue screen errors or BSODs
- Troubleshooting a Purple Screen of Death or PSOD
- Troubleshooting no boot issues for Windows operating systems
- No POST issues in iDRAC
- Troubleshooting a No POST situation
- Migrating to OneDrive for Business using Dell Migration Suite for SharePoint
- Windows
- Installing and reinstalling Microsoft Windows Server 2016
- FAQs
- Why are the USB keyboard and mouse not detected during the Windows Server 2008 R2 SP1 installation
- Why does the installation wizard stop responding during the Windows OS installation
- Why does Windows OS installation using Lifecycle Controller, on PowerEdge Servers fail at times with an error message
- Why does Windows Server 2008 R2 SP1 display a blank screen in UEFI mode after installation
- Symptoms
- Troubleshooting system crash at cng.sys with watchdog Error violation
- Host bus adapter mini is missing physical disks and backplane in Windows
- Converting evaluation OS version to retail OS version
- Partitions on disk selected for installation of Hyper-V server 2012
- Install Microsoft Hyper-V Server 2012 R2 with the Internal Dual SD module
- VMware
- Linux
- Installing operating system through various methods
- Getting help
method is that while the array has a RAID puncture in it, uncorrectable errors will continue to be encountered whenever the
impacted data (if any) is accessed.
A RAID puncture can occur in the following three locations:
● In blank space that contains no data. That stripe will be inaccessible, but since there is no data in that location, it will have no
significant impact. Any attempts to write to a RAID punctured stripe by an OS will fail and data will be written to a different
location.
● In a stripe that contains data that isn't critical, such as a README.TXT file. If the impacted data is not accessed, no errors
are generated during normal I/O. Attempts to perform a file system backup will fail to back up any files impacted by a RAID
puncture. Performing a Check Consistency or Patrol Read operation will generate Sense code: 3/11/00 for the applicable
LBA and/or stripes.
● In data space that is accessed. In such a case, the lost data can cause a variety of errors. The errors can be minor errors
that do not adversely impact a production environment. The errors can also be more severe and can prevent the system
from booting to an operating system, or cause applications to fail.
An array that is RAID punctured will eventually have to be deleted and recreated to eliminate the RAID puncture. This procedure
causes all data to be erased. The data would then need to be recreated or restored from backup after the RAID puncture is
eliminated. The resolution for a RAID puncture can be scheduled for a time that is more advantageous to the needs of the business.
If the data within a RAID punctured stripe is accessed, errors will continue to be reported against the affected bad LBAs with
no possible correction available. Eventually (this could be minutes, days, weeks, months, and so on), the Bad Block Management
(BBM) Table will fill up, causing one or more drives to become flagged as predictive failure. As seen in the figure, drive 0 will
typically be the drive that gets flagged as predictive failure due to the errors on drive 1 and drive 2 being propagated to it. Drive
0 may actually be working normally, and replacing drive 0 will only cause that replacement to eventually be flagged predictive
failure as well.
A Check Consistency performed after a RAID puncture is induced will not resolve the issue. This is why it is very important to
perform a Check Consistency on a regular basis, so that a single data error is corrected before a second error event can turn it
into a RAID puncture. It becomes especially important prior to replacing drives, when possible. The
array must be in an optimal state to perform the Check Consistency.
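The value of a regular Check Consistency can be illustrated with a minimal Python sketch. It is not the controller's actual algorithm: it assumes simple XOR parity, a fixed parity position (no rotation), and models an unreadable block as None. The names xor_blocks and check_consistency are hypothetical and chosen only for this illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def check_consistency(stripes):
    """Repair single unreadable blocks while the array is still optimal.

    Each stripe is a list of blocks (data plus parity), one per drive.
    None models a block that cannot be read (an assumption of this sketch).
    Returns the indexes of the stripes that were repaired.
    """
    repaired = []
    for index, stripe in enumerate(stripes):
        bad = [i for i, block in enumerate(stripe) if block is None]
        if len(bad) == 1:
            # All other members are readable, so the lost block can be recomputed.
            good = [block for block in stripe if block is not None]
            stripe[bad[0]] = xor_blocks(good)
            repaired.append(index)
    return repaired

# Three-drive array; the last block of each stripe is the XOR parity.
d0, d1 = b"\x01\x02", b"\x10\x20"
stripes = [
    [d0, d1, xor_blocks([d0, d1])],
    [d0, None, xor_blocks([d0, d1])],  # single data error on drive 1
]
print(check_consistency(stripes))  # [1]
print(stripes[1][1] == d1)         # True
```

In this model the error on drive 1 is repaired only because every other member of the stripe is still readable, which is why the check has to run before a drive fails.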
A RAID array that contains a single data error in conjunction with an additional error event such as a hard drive failure causes
a RAID puncture when the failed or replacement drive is rebuilt into the array. As an example, an optimal RAID 5 array includes
three members: drive 0, drive 1 and drive 2. If drive 0 fails and is replaced, the data and parity remaining on drives 1 and 2 are
used to rebuild the missing information on to the replacement drive 0. However, if a data error exists on drive 1 when the rebuild
operation reaches that error, there is insufficient information within the stripe to rebuild the missing data in that stripe. Drive 0
has no data yet because it is still being rebuilt, drive 1 has bad data, and drive 2 has good data. That stripe therefore contains multiple errors.
Drive 0 and drive 1 do not contain valid data, so any data in that stripe cannot be recovered and is therefore lost. The result, as
shown in Figure 24, is that RAID punctures (in stripes 1 and 2) are created during the rebuild. The errors are propagated to drive
0.
Figure 24. RAID punctures
Puncturing the array restores the redundancy and returns the array to an optimal state. This provides for the array to be
protected from additional data loss in the event of additional errors or drive failures.
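The RAID 5 rebuild scenario described above can be sketched in Python. This is a simplified model under stated assumptions, not the PERC firmware's behavior: parity is plain XOR, None stands for a block with no valid data, and the stripe layout (errors on drive 1 in stripe 1 and on drive 2 in stripe 2) is assumed for illustration. The names xor_blocks and rebuild_drive are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_drive(stripes, failed):
    """Rebuild the replaced drive and return the indexes of punctured stripes."""
    punctured = []
    for index, stripe in enumerate(stripes):
        survivors = [block for i, block in enumerate(stripe) if i != failed]
        if any(block is None for block in survivors):
            # Two blocks in this stripe are unknown, so XOR cannot recover either one.
            stripe[failed] = None
            punctured.append(index)
        else:
            stripe[failed] = xor_blocks(survivors)
    return punctured

a, b = b"\xaa\xbb", b"\x0f\xf0"
p = xor_blocks([a, b])
stripes = [
    [None, b, a],     # survivors intact: the drive 0 block is recomputed
    [None, None, a],  # data error on drive 1
    [None, b, None],  # data error on drive 2
]
punctured = rebuild_drive(stripes, failed=0)
print(punctured)           # [1, 2]
print(stripes[0][0] == p)  # True
```

In this model, stripes 1 and 2 end the rebuild with an invalid block on drive 0 as well, which mirrors how the errors on drives 1 and 2 are propagated to the rebuilt drive.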
How to fix a RAID puncture
Issue:
How to fix RAID arrays that have been subjected to a puncture?
Solution: Complete the following steps to resolve the issue: