After HDFS Disk Replacement with Multiple Disk Failures bdacheckhw Raises "Wrong slot mapping to HBA target" Errors for Slots After the Replaced Disk and "ERROR: Wrong number of virtual disks : 13" (Doc ID 2227128.1)

Last updated on JANUARY 27, 2017

Applies to:

Big Data Appliance X4-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

The general scenario leading to this condition is that more than one HDFS disk reports a non-healthy state and the affected disks have been replaced.

In the example scenario used in this note, the disk in slot 2 was in a "bad"/"failing" state and the disk in slot 3 was "Unconfigured(good)", as shown in the "MegaCli64 pdlist a0" output below:

Slot Number: 2
...
Firmware state: Unconfigured(bad)
...
Foreign State: Foreign

and

Slot Number: 3
...
Firmware state: Unconfigured(good), Spun Up
...
Foreign State: Foreign
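
As an illustration only, the per-slot fields shown above can be extracted with a short script. This is a minimal sketch that assumes the pdlist output has already been saved as text and that only the three field names shown in this note ("Slot Number", "Firmware state", "Foreign State") are of interest. The function names parse_pdlist and unhealthy_slots are hypothetical helpers, not part of any Oracle tool.

```python
import re

def parse_pdlist(text):
    """Return {slot: {"firmware_state": ..., "foreign_state": ...}} from pdlist-style text."""
    disks = {}
    slot = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Slot Number:\s*(\d+)$", line)
        if m:
            # A new "Slot Number" line starts the record for the next disk.
            slot = int(m.group(1))
            disks[slot] = {}
            continue
        if slot is None or ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        if key == "Firmware state":
            disks[slot]["firmware_state"] = value.strip()
        elif key == "Foreign State":
            disks[slot]["foreign_state"] = value.strip()
    return disks

def unhealthy_slots(disks):
    """Slots whose firmware state does not start with 'Online' (hypothetical health rule)."""
    return sorted(
        s for s, d in disks.items()
        if not d.get("firmware_state", "").startswith("Online")
    )

# Sample text mirroring the excerpt in this note (elided fields omitted).
SAMPLE = """\
Slot Number: 2
Firmware state: Unconfigured(bad)
Foreign State: Foreign
Slot Number: 3
Firmware state: Unconfigured(good), Spun Up
Foreign State: Foreign
"""

if __name__ == "__main__":
    disks = parse_pdlist(SAMPLE)
    for s in unhealthy_slots(disks):
        print(s, disks[s])
```

Listing the slots in ascending order makes it easier to see at a glance that more than one disk needs attention before any of them is reconfigured.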

In this example, after the disk in "Slot Number: 2" was replaced, the disks were configured in order from lowest to highest slot, i.e. "Slot Number: 2" then "Slot Number: 3", as per: Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1). The final steps of that process are to run bdacheckhw and bdachecksw. In this case, however, both fail.

bdacheckhw fails with two main issues:

1. 13 virtual disks are reported instead of the expected 12.
and
2. Disks in slots 3-11, the slots after the one where the disk was replaced, are not mapped correctly.

bdacheckhw output looks like:

SUCCESS: Correct disk 0 status : Online, Spun Up No alert
SUCCESS: Correct disk 1 status : Online, Spun Up No alert
SUCCESS: Correct disk 2 status : Online, Spun Up No alert
SUCCESS: Correct disk 3 status : Online, Spun Up No alert
SUCCESS: Correct disk 4 status : Online, Spun Up No alert
SUCCESS: Correct disk 5 status : Online, Spun Up No alert
SUCCESS: Correct disk 6 status : Online, Spun Up No alert
SUCCESS: Correct disk 7 status : Online, Spun Up No alert
SUCCESS: Correct disk 8 status : Online, Spun Up No alert
SUCCESS: Correct disk 9 status : Online, Spun Up No alert
SUCCESS: Correct disk 10 status : Online, Spun Up No alert
SUCCESS: Correct disk 11 status : Online, Spun Up No alert
...
ERROR: Wrong number of virtual disks : 13
INFO: Expected number of virtual disks : 12
SUCCESS: Correct slot 0 mapping to HBA target : 0
SUCCESS: Correct slot 1 mapping to HBA target : 1
SUCCESS: Correct slot 2 mapping to HBA target : 2
ERROR: Wrong slot 4 mapping to HBA target : 3
INFO: Expected slot 4 mapping to HBA target : 4
ERROR: Wrong slot 5 mapping to HBA target : 4
INFO: Expected slot 5 mapping to HBA target : 5
ERROR: Wrong slot 6 mapping to HBA target : 5
INFO: Expected slot 6 mapping to HBA target : 6
ERROR: Wrong slot 7 mapping to HBA target : 6
INFO: Expected slot 7 mapping to HBA target : 7
ERROR: Wrong slot 8 mapping to HBA target : 7
INFO: Expected slot 8 mapping to HBA target : 8
ERROR: Wrong slot 9 mapping to HBA target : 8
INFO: Expected slot 9 mapping to HBA target : 9
ERROR: Wrong slot 10 mapping to HBA target : 9
INFO: Expected slot 10 mapping to HBA target : 10
ERROR: Wrong slot 11 mapping to HBA target : 10
INFO: Expected slot 11 mapping to HBA target : 11
ERROR: Wrong slot 3 mapping to HBA target : 11
INFO: Expected slot 3 mapping to HBA target : 3
SUCCESS: Correct Host Channel Adapter model : Mellanox Technologies MT27500 Family [ConnectX-3]
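
The pattern in the output above (slots 4-11 each mapped one HBA target too low, and slot 3 mapped to the last target) is easier to see when the ERROR lines are collected into a table. The following is a minimal sketch, assuming the bdacheckhw output has been captured as text and that the line formats match those shown in this note. The function name summarize is a hypothetical helper, not part of bdacheckhw.

```python
import re

# Line formats taken from the bdacheckhw output shown in this note.
MAPPING_RE = re.compile(
    r"^(SUCCESS: Correct|ERROR: Wrong) slot (\d+) mapping to HBA target : (\d+)$"
)
VDISK_RE = re.compile(r"^ERROR: Wrong number of virtual disks : (\d+)$")

def summarize(text):
    """Return (reported_virtual_disks_or_None, {slot: wrong_target})."""
    vdisks = None
    mismatches = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = MAPPING_RE.match(line)
        if m:
            if m.group(1).startswith("ERROR"):
                mismatches[int(m.group(2))] = int(m.group(3))
            continue
        m = VDISK_RE.match(line)
        if m:
            vdisks = int(m.group(1))
    return vdisks, mismatches

# Abbreviated sample copied from the output above.
SAMPLE = """\
ERROR: Wrong number of virtual disks : 13
INFO: Expected number of virtual disks : 12
SUCCESS: Correct slot 2 mapping to HBA target : 2
ERROR: Wrong slot 4 mapping to HBA target : 3
INFO: Expected slot 4 mapping to HBA target : 4
ERROR: Wrong slot 5 mapping to HBA target : 4
INFO: Expected slot 5 mapping to HBA target : 5
ERROR: Wrong slot 3 mapping to HBA target : 11
INFO: Expected slot 3 mapping to HBA target : 3
"""

if __name__ == "__main__":
    vdisks, mismatches = summarize(SAMPLE)
    print("virtual disks reported:", vdisks)
    for slot in sorted(mismatches):
        print(f"slot {slot}: mapped to target {mismatches[slot]}, expected {slot}")
```

The expected target is printed as the slot number itself because every INFO line in the output above pairs slot N with HBA target N.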

Other symptoms include:

1. From lsscsi: 

  

Cause
