ODA : Old Disk Path Information Still Exists in ASM Outputs Will Prevent Adding a Replacement Disk (Doc ID 1637898.1)

Last updated on NOVEMBER 12, 2023

Applies to:

Oracle Database Appliance - Version All Versions and later
Information in this document applies to any platform.

Symptoms

After a disk replacement on an ODA machine, old disk information still exists in the environment.
This can prevent the old or failed disk from being successfully replaced with a new disk.

The OLD disk information may exist on one node but not the other.
Either node can see the mismatch.
Sometimes the problem is MISSING information for the new disk.

Regardless, one or more of the following symptoms for the old disk information can appear on one or both nodes.

1. Output of v$asm_disk still shows the old disk path with group_number=0. (Group 0 means the disk is not owned by any diskgroup, that is, it does not belong to the Data, Reco, Flash or Redo diskgroup.)

2. vgs output still shows I/O errors for the old disk.

3. /etc/multipath.conf still contains the old disk information.

4. /dev/mapper/* may show some or all of the old disk information.

5. /dev also shows device information for the old disk.

6. /dev/mapper may have partial information or multiple partition entries for the disk slot. For example, you may see partition p2 for the disk but not p1, or multiple p1 and/or p2 entries for the same slot.

7. /var/log/messages also records I/O errors.

8. ASM alert logs also display I/O errors.

9. /opt/oracle/extapi/asmappl.config may have no references to the disk partition(s), or the disk may not be seen in one of the enclosures.

Entries for the new disk may also be present alongside the entries for the old disk, and both may coexist for the same diskgroups.

*****These symptoms can appear on either node or on both nodes.*****

Example: Multiple disk references for the same slot (03): ...875 vs. ...843

------------

Check /dev/mapper for old disk information -->>

[root@NODE1 mapper]# ls -altr *S03*
brw-rw---- 1 grid asmadmin 253, 24 Mar 27  2013 HDD_E1_S03_#########5     << notice that there are multiple named entries for the disk in Enclosure 1 (E1) and Slot S03
brw-rw---- 1 grid asmadmin 253, 30 Mar  7 13:23 HDD_E1_S03_#########5p1   << ...875p*
brw-rw---- 1 grid asmadmin 253, 31 Mar  7 13:30 HDD_E1_S03_#########5p2   << ...875p*
brw-rw---- 1 grid asmadmin 253, 72 Mar 10 14:19 HDD_E1_S03_#########3
brw-rw---- 1 grid asmadmin 253, 74 Mar 10 16:29 HDD_E1_S03_#########3p2   << ...843p*
brw-rw---- 1 grid asmadmin 253, 73 Mar 10 16:29 HDD_E1_S03_#########3p1   << ...843p*

Disk pd_03 was replaced in this environment but the "ls" output from /dev/mapper is still showing old and new disk information.

PATH HDD_E1_S03_#########5 is the old disk and PATH HDD_E1_S03_#########3 is the new disk.
- Check the date and timestamps as another possible indicator of age, confirming which disk is older and which is newer.
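
The check above can also be scripted. The following is a minimal Python sketch, not part of the original note, that groups /dev/mapper names by enclosure and slot and reports any slot that has more than one disk identifier. It assumes the names follow the pattern seen in the listing (type, E<enclosure>, S<slot>, identifier, optional p<partition>). The regular expression, the function names, and the acceptance of "#" as a masked digit are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the listing above:
#   <TYPE>_E<enclosure>_S<slot>_<identifier>[p<partition>]
# "#" is accepted in the identifier only so that masked names can be parsed.
NAME_RE = re.compile(r"^(HDD|SSD)_E(\d+)_S(\d+)_([0-9#]+)(?:p(\d+))?$")


def devices_by_slot(names):
    """Map (type, enclosure, slot) -> {identifier: sorted partition numbers}."""
    slots = defaultdict(dict)
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # not a name of the assumed form; ignore it
        dtype, enclosure, slot, disk_id, part = match.groups()
        parts = slots[(dtype, int(enclosure), int(slot))].setdefault(disk_id, [])
        if part is not None:
            parts.append(int(part))
    return {
        slot: {disk_id: sorted(parts) for disk_id, parts in ids.items()}
        for slot, ids in slots.items()
    }


def slots_with_multiple_disks(names):
    """Return only the slots for which more than one identifier is present."""
    return {
        slot: ids
        for slot, ids in devices_by_slot(names).items()
        if len(ids) > 1
    }
```

On an appliance node the input could come from os.listdir("/dev/mapper"). A slot reported by this sketch only shows that two device names coexist; it does not say which one is stale, so the timestamps still need to be checked.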

The output of "vgs" shows I/O errors like those below -->>
Notice that the errors are for the older disk name ..2875

[root@NODE1 mapper]# vgs
/dev/mpath/HDD_E1_S03_#########5: read failed after 0 of 4096 at 600127176704: Input/output error
/dev/mpath/HDD_E1_S03_#########5: read failed after 0 of 4096 at 600127258624: Input/output error
/dev/mpath/HDD_E1_S03_#########5: read failed after 0 of 4096 at 0: Input/output error
/dev/mpath/HDD_E1_S03_#########5: read failed after 0 of 4096 at 4096: Input/output error
/dev/mpath/HDD_E1_S03_#########5p1: read failed after 0 of 4096 at 515396009984: Input/output error
/dev/mpath/HDD_E1_S03_#########5p1: read failed after 0 of 4096 at 515396067328: Input/output error
/dev/mpath/HDD_E1_S03_#########5p1: read failed after 0 of 4096 at 0: Input/output error
/dev/mpath/HDD_E1_S03_#########5p1: read failed after 0 of 4096 at 4096: Input/output error
/dev/mpath/HDD_E1_S03_#########5p2: read failed after 0 of 512 at 84726382592: Input/output error
/dev/mpath/HDD_E1_S03_#########5p2: read failed after 0 of 512 at 84726472704: Input/output error
/dev/mpath/HDD_E1_S03_#########5p2: read failed after 0 of 512 at 0: Input/output error
/dev/mpath/HDD_E1_S03_#########5p2: read failed after 0 of 512 at 4096: Input/output error
/dev/mpath/HDD_E1_S03_#########5p2: read failed after 0 of 2048 at 0: Input/output error
VG          #PV #LV #SN Attr   VSize   VFree
VolGroupSys   1   4   0 wz--n- 465.66G 251.66G

These error messages relate to the old disk.
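
To see at a glance which device paths the errors belong to, the captured "vgs" text can be summarised per device. The following is a hedged Python sketch, not part of the original note. It assumes the error lines have exactly the "read failed after ... Input/output error" form shown above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines of the form seen in the vgs output above, for example:
#   /dev/mpath/<name>: read failed after 0 of 4096 at 0: Input/output error
ERROR_RE = re.compile(
    r"^\s*(/dev/\S+): read failed after \d+ of \d+ at \d+: Input/output error\s*$"
)


def io_errors_by_device(vgs_output):
    """Count read-failure lines per device path in captured vgs text."""
    counts = Counter()
    for line in vgs_output.splitlines():
        match = ERROR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The text passed in is assumed to include the error lines as well as the volume group table, so when capturing it on a node both output streams would need to be collected. Lines that do not match, such as the VG table itself, are ignored.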

ASM alert log error message :--

Mon Mar 10 13:59:39 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_pz99_29322.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Read Failed. group:0 disk:51 AU:0 offset:0 size:4096
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_pz99_29322.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
ORA-15080: synchronous I/O operation to a disk failed
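
The "WARNING: Read Failed" line carries the group and disk numbers, and group:0 here matches the group_number=0 symptom described earlier. The following is a small Python sketch, not part of the original note, that pulls those fields out of alert log text. It assumes the warning line has exactly the format shown in this excerpt, and the function name is hypothetical.

```python
import re

# Field layout taken from the alert log excerpt above:
#   WARNING: Read Failed. group:0 disk:51 AU:0 offset:0 size:4096
READ_FAILED_RE = re.compile(
    r"WARNING: Read Failed\. group:(\d+) disk:(\d+) AU:(\d+) offset:(\d+) size:(\d+)"
)

FIELDS = ("group", "disk", "au", "offset", "size")


def read_failures(alert_text):
    """Return one dict of integer fields per 'Read Failed' warning found."""
    return [
        dict(zip(FIELDS, (int(value) for value in match.groups())))
        for match in READ_FAILED_RE.finditer(alert_text)
    ]


def unowned_disk_numbers(alert_text):
    """Disk numbers whose read failures were reported against group 0."""
    return sorted({f["disk"] for f in read_failures(alert_text) if f["group"] == 0})
```

The disk numbers returned are only the values printed in the log; mapping them back to a path still requires the v$asm_disk output.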

V$ASM_DISK output :--

SQL> select path, name, header_status, mode_status, mount_status, state, failgroup, group_number from v$asm_disk order by path;

PATH                     NAME                    HEADER_STATUS  MODE_ST  MOUNT_S  STATE   FAILGROUP               GROUP_NUMBER
-----------------------  ----------------------  -------------  -------  -------  ------  ----------------------  ------------
...
/HDD_E0_S19_#########p1  HDD_E0_S19_#########P1  MEMBER         ONLINE   CACHED   NORMAL  HDD_E0_S19_#########P1             2   << New disk already added back to diskgroup 2
/HDD_E0_S19_#########p1  HDD_E0_S19_#########p1  MEMBER         ONLINE   CLOSED   NORMAL                                     0   << Group #0 (Old disk no longer belongs to a group)
... etc

Comment:

In this example the disk is in slot 19, and for this hardware type it is an HDD. On some hardware types the same slot holds an SSD. Pay attention to the disk type, diskgroup, disk name, and partition number.
Disk partition numbering (p#): Data = p1, Redo = p1, and Flash = p1, whereas Reco is p2 on the disk it shares with Data.

***Where Group 0 means the disk does not belong to any of the expected disk groups.

Changes

The failed disk was replaced and a new disk was inserted, but the new disk was not successfully added to the disk group(s).

Cause
