Exadata: Failed to Start Reconstruction of Virtual Drive - Making optimal the logical group used by the Linux OS (Doc ID 1921079.1)

Last updated on AUGUST 28, 2014

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.1.2.0 to 12.1.1.1.1 [Release 11.2 to 12.1]
Information in this document applies to any platform.

Symptoms

When reclaimdisks.sh fails or is interrupted, the Exadata compute node is left in an unexpected configuration.
Logical group 0 (RAID 1), used by the Linux OS, is in degraded mode with only one disk (slot 0).
A new logical group 1 (RAID 5) exists, using the disks in slots 1, 2 and 3.

Before the reclaimdisks procedure can be run again (once the cause of the interruption or failure has been found and fixed), logical group 0
must be restored to the optimal state. This document describes a failure that occurs during this restore.

This document applies if the following symptoms are present:

Running /opt/oracle.SupportTools/reclaimdisks.sh -check reports that logical group 0, used by Linux, has only one member:

2014-08-16 03:14:31 -0700  Started from /opt/oracle.SupportTools/reclaimdisks.sh
2014-08-16 03:14:31 -0700  [INFO] Free mode is set
2014-08-16 03:14:31 -0700  [INFO] Reclaim mode is set
2014-08-16 03:14:31 -0700  [INFO] This is SUN FIRE X4170 M2 SERVER machine
2014-08-16 03:14:31 -0700  [INFO] Number of LSI controllers: 1
2014-08-16 03:14:31 -0700  [INFO] Physical disks found: 4 (252:0 252:1 252:2 252:3)
2014-08-16 03:14:31 -0700  [INFO] Logical drives found: 2
2014-08-16 03:14:31 -0700  [INFO] Linux logical drive: 0
2014-08-16 03:14:31 -0700  [INFO] RAID Level for the Linux logical drive: 1
2014-08-16 03:14:31 -0700  [INFO] Dual boot installation: yes
2014-08-16 03:14:31 -0700  [INFO] LVM based installation: yes
2014-08-16 03:14:31 -0700  [INFO] Physical disks in the Linux logical drive: 1 (252:0)
2014-08-16 03:14:31 -0700  [INFO] Dedicated Hot Spares for the Linux logical drive: 0
2014-08-16 03:14:32 -0700  [INFO] Global Hot Spares: 0
2014-08-16 03:14:33 -0700  [ERROR] For X2-2 db node expected RAID 1 from 2 physical disks with no dedicated and no global hot spare
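
The one-member condition can be spotted in the -check output with a small filter. This is a minimal sketch: linux_ld_disk_count and linux_ld_is_degraded are hypothetical helper names, not Oracle tools, and the parsing assumes the "Physical disks in the Linux logical drive" line has exactly the format shown above.

```shell
# Hypothetical helpers (not shipped with Exadata). They read the output of
# "reclaimdisks.sh -check" on stdin and look for a line such as
#   [INFO] Physical disks in the Linux logical drive: 1 (252:0)
linux_ld_disk_count() {
  # Print only the number after "Linux logical drive:", first match only.
  sed -n 's/.*Physical disks in the Linux logical drive: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds (exit status 0) when fewer than 2 disks are reported, i.e. the
# RAID 1 Linux logical drive is running without its mirror.
linux_ld_is_degraded() {
  local count
  count=$(linux_ld_disk_count)
  [ -n "$count" ] && [ "$count" -lt 2 ]
}
```

For example, assuming the check messages are written to the terminal: /opt/oracle.SupportTools/reclaimdisks.sh -check 2>&1 | linux_ld_is_degraded && echo "Linux logical drive has only one member"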
  
The following output, from the command /opt/MegaRAID/MegaCli/MegaCli64 -cfgdsply -a0 | more, shows virtual drive 0 of disk group 0 in the Degraded state:

DISK GROUP: 0

Number of Spans: 1
SPAN: 0
Span Reference: 0x00
Number of PDs: 2
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :DBSYS
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 278.875 GB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 278.875 GB
State               : Degraded
Strip Size          : 1.0 MB
Number Of Drives    : 2
Span Depth          : 1
  
Physical Disk:1
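
The State field can also be read from saved -cfgdsply output with awk. This is a sketch under the assumption that the layout matches the output above (a "Virtual Drive: N" line followed later by a "State : value" line); vd_state is a hypothetical helper name.

```shell
# Hypothetical helper: prints the State of virtual drive N (default 0)
# from MegaCli64 -cfgdsply output read on stdin. Assumes the layout shown
# in this note: "Virtual Drive: N (Target Id: N)" followed later by
# "State               : <value>".
vd_state() {
  awk -v want="${1:-0}" '
    BEGIN { cur = -1 }                 # no virtual drive seen yet
    /^Virtual Drive:/ {
      split($0, f, /[: ]+/)            # f[3] is the drive number
      cur = f[3]
    }
    /^State[[:space:]]*:/ && cur == want {
      sub(/^State[[:space:]]*:[[:space:]]*/, "")   # drop the label
      sub(/[[:space:]]+$/, "")                      # drop trailing blanks/CR
      print
      exit
    }
  '
}
```

For example: /opt/MegaRAID/MegaCli/MegaCli64 -cfgdsply -a0 | vd_state 0 prints Degraded on the node described here, and should print Optimal once the rebuild has finished.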

If a sundiag collection is available, use the command 'grep GROUP megacli64-CfgDsply.out'; otherwise use '/opt/MegaRAID/MegaCli/MegaCli64 -cfgdsply -a0 | grep GROUP'. If two logical groups (DISK GROUPs) are present, the output reports:

Number of DISK GROUPS: 2
DISK GROUP: 0
DISK GROUP: 1
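
The same grep can be turned into a yes/no check. This is a minimal sketch: disk_group_count and check_disk_groups are hypothetical names, and the pattern assumes each group is listed on its own "DISK GROUP: N" line as above.

```shell
# Hypothetical helper: counts "DISK GROUP: N" lines in MegaCli64 -cfgdsply
# output read from stdin. The summary line "Number of DISK GROUPS: 2" does
# not start with "DISK GROUP:", so it is not counted.
disk_group_count() {
  grep -c '^[[:space:]]*DISK GROUP:'
}

# Returns 1 and prints a warning when more than one disk group is present,
# as in the symptom described above.
check_disk_groups() {
  local n
  n=$(disk_group_count)
  if [ "$n" -gt 1 ]; then
    echo "more than one disk group found ($n)"
    return 1
  fi
  echo "disk groups found: $n"
}
```

It works on either source mentioned above, for example check_disk_groups < megacli64-CfgDsply.out.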

Attempting to add the disk in slot 1 back to logical group 0 fails with the following error:

Adapter: 0: Failed to replace Missing PD at Array 0, Row 1.

FW error description:
The specified physical disk doesn't have enough capacity to complete the requested command.

The complete procedure to restore logical group 0 is:

1. Drop logical group 1 (RAID 5):

# /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdDel -L1 -Force -a0

(In older MegaCli versions, -Force may not be needed.)

2. Check the missing member of logical group 0 (Linux), that is, the original member that was removed:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdgetmissing -a0

3. Add the disk in slot 1 back to logical group 0:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdreplacemissing -physdrv [252:1] -array0 -row1 -a0

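The array and row values are taken from the output of the command in step 2. The arguments are:

-array0  : logical group 0
-row1    : row 1 of that array, the position left by the missing physical disk 1
-a0      : the disk array controller (adapter 0)
[252:1]  : enclosure 252, slot 1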

4. Start the rebuild of logical group 0 (RAID 1):

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -start -physdrv [252:1] -a0

5. Check the rebuild progress:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [252:1] -a0
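
The five steps can be reviewed as one sequence with a small wrapper. This is a hypothetical sketch (restore_linux_raid1, run and the environment variable names are illustrative, not Oracle tooling) that uses only the MegaCli64 commands listed above with the values from this note. By default it only prints the commands; with DRY_RUN=0 it executes them and pauses after each one so the output, in particular the -pdgetmissing result, can be checked before continuing.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for steps 1-5 above. DRY_RUN=1 (default) only prints
# the commands; DRY_RUN=0 executes them. Defaults match this note; override
# them through the environment if your -pdgetmissing output differs.
MEGACLI=${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}
ADAPTER=${ADAPTER:-0}     # -a0: disk array controller
RAID5_LD=${RAID5_LD:-1}   # logical drive 1 (RAID 5) left by reclaimdisks.sh
ARRAY=${ARRAY:-0}         # logical group 0 (Linux, RAID 1)
ROW=${ROW:-1}             # position of the missing member (step 2)
PD=${PD:-252:1}           # enclosure:slot of the disk to add back
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; otherwise run it, then wait for the
# operator. MegaCli exit codes are not relied on here; the operator reviews
# each command's output before the next one runs.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
    return 0
  fi
  "$@"
  read -r -p "Review the output above; Enter to continue, Ctrl-C to abort: " _
}

restore_linux_raid1() {
  # Step 1: drop logical drive 1 (RAID 5).
  run "$MEGACLI" -CfgLdDel -L"$RAID5_LD" -Force -a"$ADAPTER"
  # Step 2: list the missing member of logical group 0.
  run "$MEGACLI" -pdgetmissing -a"$ADAPTER"
  # Step 3: put the slot 1 disk back into the missing position.
  run "$MEGACLI" -pdreplacemissing -physdrv "[$PD]" \
    -array"$ARRAY" -row"$ROW" -a"$ADAPTER"
  # Step 4: start the RAID 1 rebuild on that disk.
  run "$MEGACLI" -pdrbld -start -physdrv "[$PD]" -a"$ADAPTER"
  # Step 5: show rebuild progress.
  run "$MEGACLI" -pdrbld -showprog -physdrv "[$PD]" -a"$ADAPTER"
}
```

After sourcing the file, restore_linux_raid1 prints the sequence for review, and DRY_RUN=0 restore_linux_raid1 runs it. Quoting "[$PD]" also keeps the shell from treating [252:1] as a filename pattern.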

Changes

reclaimdisks.sh was executed to reclaim the unused disks.

Cause
