Grid Upgrade to 19c Fails During rootupgrade.sh. Nodes Reboot, Voting File Relocates, ASM Listener, and ASM Don't Start Automatically.
(Doc ID 2651972.1)
Last updated on OCTOBER 18, 2023
Applies to:
Oracle Database - Enterprise Edition - Version 19.1.0.0.0 and later
Information in this document applies to any platform.
Symptoms
Upgrading from 12.1.0.2 to 19.5
- Script rootupgrade.sh is run as part of upgrade
- Nodes are evicted
- Permission errors
- ASM fails to start
- Cluster fails to start
- ASM starts when started manually
- rootupgrade.sh rerun on the node where it didn't complete
- Cluster starts
- Upgrade verified to be successful
The details below may vary depending on the order in which the servers were upgraded and the number of nodes in the cluster.
++ Upgrade to 19c completed on nodes 1,3,5,7
++ When running rootupgrade.sh on node 2, all nodes except node 4 were evicted.
++ After that, the cluster would not start on any of the nodes.
++ Verified and found the cluster up and running only on node 4, but it was not responding to any crsctl commands.
++ Tried restarting node 1, but it still failed because of a connectivity failure with node 8.
++ Stopped the clusterware on all the nodes and started it one by one.
Stopped the cluster on all the nodes, but ocssd processes were still running as orphan processes.
Killed these processes manually on all the nodes using the kill -9 command (a command sketch follows the verification output below).
++ Followed the workaround in the document below to resolve this issue and rebooted the node once.
GI Upgrade From 12c To 18c Fails (Doc ID 2497014.1)
++ Now the cluster was up and running on all the nodes without any issues.
++ Restarted rootupgrade.sh on node 2.
rootupgrade.sh completed without any issues on nodes 2 and 4.
++ Upgrade completed successfully on all the nodes:
$ dcli -l oracle -g /home/oracle/dbs_group ' /u01/app/19.0.0.0/grid/bin/crsctl query crs activeversion -f '
Node1: Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [2737317647].
Node2: Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [2737317647].
...
Node8: Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [2737317647].
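A minimal sketch of the stop-and-kill sequence described above, assuming the 19c grid home at /u01/app/19.0.0.0/grid as shown in the command output (verify the PIDs before killing anything):

# As root, force-stop the clusterware stack on each node
/u01/app/19.0.0.0/grid/bin/crsctl stop crs -f

# Check for ocssd processes that survived the stop
ps -ef | grep '[o]cssd'

# As root, kill any leftover ocssd processes by PID
kill -9 <pid>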
The same scenario has occurred on multiple clusters and for different customers.
Various Messages Seen:
=================
CRS-41053: checking Oracle Grid Infrastructure for file permission issues <-------- These privilege checking messages are a new feature of 19c and were not the cause of the issue.
> PRVH-0111 : Path "/u01/app/oracle/diag" with permissions "rwxr-x---" does not have read permissions for others on node "Node2".
> PRVH-0109 : Path "/u01/app/oracle/diag" with permissions "rwxr-x---" does not have write permissions for the file's group on node "Node2".
> PRVH-0113 : Path "/u01/app/oracle/diag" with permissions "rwxr-x---" does not have execute permissions for others on node "Node2".
"PRVF-5157 : Could not verify ASM group "DATA" for Voting Disk location" issue on Node 8 and Node 6 <-------- ASM was down and had to be started manually using "sqlplus sys / as sysasm"
CRS upgrade has completed.
Rolling upgrade of CRS is finished. Turning ON New PE
Node Node1 upgraded to version 19.0.0.0.0
CRS-8013: reboot advisory message text: oracssdmonitor is about to reboot this node due to no I/O completions with majority of voting disks.
osysmond shows:
CRFM:4011058944: Version mismatch. Expected: major=19 minor=3, Got: major=12 minor=0, j=0
Because ASM is not started, there will also be errors related to anything that requires a file on ASM, such as:
KFOD-00301: Unable to contact Cluster Synchronization Services (CSS). Return code 2 from kgxgncin.
KFOD-00105: Could not open pfile 'init@.ora'
PRVG-2071 : Disk group for ocr location "+DATA" is not available on "Node8"
PRVF-4557 : Node application "ora.Node8.vip" is offline on node "Node8"
PRVF-4557 : Node application "ora.net1.network" is offline on node "Node8"
PRVF-5155 : Failure to retrieve ASM Disk Groups on node "Node8"
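Once ASM is started, the disk group and resource state can be re-checked from the grid home. A minimal sketch, using the DATA disk group named in the messages above:

$ srvctl status asm
$ asmcmd lsdg
$ crsctl stat res ora.DATA.dg -t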
Messages seen in ocssd.trc prior to the reboots:
CSSD:2398549760: (:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 5 configured voting disks available, need 3 ocssd_73779.trc [ Node8 ]
CSSD:2398549760: (:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 57570 ms for voting file ...[ Node8]
CSSD:2398549760: (:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 57570 ms for voting file ...[ Node8]
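A minimal sketch for locating these voting-disk I/O messages across the traces, assuming a grid ORACLE_BASE of /u01/app/grid (the hostname portion of the path varies per node):

$ grep -E 'CSSNM000(18|58)' /u01/app/grid/diag/crs/*/crs/trace/ocssd*.trc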
Messages showing the voting files relocating in the ASM alert logs:
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA. Found 5 voting file(s).
NOTE: Voting file relocation is required in diskgroup DATA <------------------ This is the main contributing issue - voting file relocation during the upgrade when the servers are temporarily on different GI versions
NOTE: Attempting voting file relocation on diskgroup DATA
NOTE: Successful voting file relocation on diskgroup DATA
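After the upgrade, the voting file locations and states can be confirmed with crsctl; each voting file should be reported as ONLINE in the DATA disk group. A minimal sketch:

$ crsctl query css votedisk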
/var/log/messages on Node 1 shows 12 reboots of that node alone over the course of 3 hours during the upgrades.
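A minimal sketch for confirming the reboot count from the OS logs on a Linux node, assuming standard syslog (each kernel boot banner corresponds to one reboot):

$ last reboot | head
$ grep -c 'Linux version' /var/log/messages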