11gR2 Grid Infrastructure Fails to Start After OS Upgrade or root.sh or rootupgrade.sh Hangs if Voting Disk is on OCFS2

(Doc ID 1321757.1)

Last updated on FEBRUARY 19, 2012

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1 and later [Release: 11.2 and later]
Information in this document applies to any platform.

Symptoms

If the voting file is on OCFS2, the 11gR2 GI root.sh or rootupgrade.sh script hangs:

If running root.sh

  • root.sh output
..
Creating OCR Keys for user 'root', privgrp 'root'..
Operation successful.
==> then stuck here

  • $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<node>.log
..
2011-03-16 17:23:28: Creating voting files
2011-03-16 17:23:28: Adding voting files /ocfs2/storage/voting
2011-03-16 17:23:28: Executing crsctl add css votedisk /ocfs2/storage/voting
2011-03-16 17:23:28: Executing /ocw/grid/bin/crsctl add css votedisk /ocfs2/storage/voting
2011-03-16 17:23:28: Executing cmd: /ocw/grid/bin/crsctl add css votedisk /ocfs2/storage/voting

  • "ps" output, strace and call stack
ps -ef| grep votedisk
root     22026 17304  0 13:18 pts/2    00:00:00 /ocw/grid/bin/crsctl.bin add css votedisk /ocfs2/storage/voting

strace -ftt -p <pid-of-crsctl-from-above-ps-command>
Process 22026 attached - interrupt to quit
13:59:35.663332 times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 429525744
13:59:35.663538 times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 429525744
13:59:35.663638 io_getevents(47273557176320, 1, 128, <unfinished ...>
Process 22026 detached

pstack <pid-of-crsctl-from-above-ps-command>
Thread 2 (Thread 0x41b70940 (LWP 22028)):
#0  0x0000003d0640aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b09a1e73548 in sltspcwait () from /ocw/grid/lib/libclntsh.so.11.1
#2  0x00002b099fcf69b8 in clsd_logThread () from /ocw/grid/lib/libhasgen11.so
#3  0x0000003d0640673d in start_thread () from /lib64/libpthread.so.0
#4  0x0000003d058d44bd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b09a4493dc0 (LWP 22026)):
#0  0x00002b09a42915b4 in ?? () from /usr/lib64/libaio.so.1
#1  0x00002b09a116cf6c in skgfrliopo () from /ocw/grid/lib/libclntsh.so.11.1
#2  0x00002b09a116cd75 in skgfospo () from /ocw/grid/lib/libclntsh.so.11.1
#3  0x00002b09a284f37f in skgfrwat () from /ocw/grid/lib/libclntsh.so.11.1
#4  0x00002b09a326b51e in kgfkWaitIO () from /ocw/grid/lib/libasmclntsh11.so
#5  0x00000000004cb848 in clsfDiscoverReap ()
#6  0x00000000004ca62a in clsfDiscover ()
#7  0x00000000004cad8a in clsfskidCreate ()
#8  0x00000000004c8256 in clsscfg_vhcreate ()
#9  0x00000000004c8d3e in clsscfg_vcreate ()
#10 0x0000000000434e61 in crsctl_vformat ()
#11 0x0000000000436257 in crsctl_css_votedisk ()
#12 0x000000000053a1c5 in cls_crsctl_parser::AddParser::addCss() ()
#13 0x0000000000535c23 in cls_crsctl_parser::AddParser::performOperation() ()
#14 0x00000000004dd2df in cls_crsctl_parser::Command::performParsingAndOperation(int, unsigned char**)()
#15 0x00000000004d202b in crsctl_main(int, unsigned char**) ()
#16 0x00000000004306c8 in main ()
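
The ps, strace and pstack steps above can be chained with a small helper. The following is a minimal sketch, not part of the original note: the function name pids_matching is hypothetical, and it assumes the standard "ps -ef" column layout in which the command line starts at column 8.

```shell
# Hypothetical helper: read `ps -ef` lines on stdin and print the PID
# (column 2) of each process whose command line matches the pattern,
# skipping the grep process that a plain `ps -ef | grep` also lists.
pids_matching() {
    awk -v pat="$1" '
        $8 ~ /(^|\/)grep$/ { next }      # column 8 is the start of CMD
        {
            cmd = ""
            for (i = 8; i <= NF; i++) cmd = cmd " " $i
            if (cmd ~ pat) print $2
        }'
}

# Usage (as root, on the node where root.sh is stuck):
#   for pid in $(ps -ef | pids_matching votedisk); do
#       pstack "$pid"
#   done
```

The same helper can be used with the pattern cssvfupgd when rootupgrade.sh is the script that hangs.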


If running rootupgrade.sh

  • rootupgrade.sh output
..
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
==> then stuck here

  • $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<node>.log
..
2011-05-12 13:54:38: Upgrading the existing voting disks!
2011-05-12 13:54:38: Executing /ocw/grid/bin/cssvfupgd
2011-05-12 13:54:38: Executing cmd: /ocw/grid/bin/cssvfupgd

  • "ps" output, strace and call stack
ps -ef| grep cssvfupgd
root     19530 14802  0 13:54 pts/0    00:00:00 /ocw/grid/bin/cssvfupgd.bin

strace -ftt -p <pid-of-cssvfupgd-from-above-ps-command>
Process 19530 attached - interrupt to quit
13:59:35.663332 times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 429525744
13:59:35.663538 times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 429525744
13:59:35.663638 io_getevents(47273557176320, 1, 128, <unfinished ...>
Process 19530 detached

pstack <pid-of-cssvfupgd-from-above-ps-command>
#0  0x00002afebb6c45b4 in ?? () from /usr/lib64/libaio.so.1
#1  0x00002afeb8d756d4 in skgfrliopo () from /ocw/grid/lib/libclntsh.so.11.1
#2  0x00002afeb8d754dd in skgfospo () from /ocw/grid/lib/libclntsh.so.11.1
#3  0x00002afeba454eb7 in skgfrwat () from /ocw/grid/lib/libclntsh.so.11.1
#4  0x00002afebae7351e in kgfkWaitIO () from /ocw/grid/lib/libasmclntsh11.so
#5  0x0000000000407bbc in clsfDiscoverReap ()
#6  0x000000000040699e in clsfDiscover ()
#7  0x0000000000406b66 in clsfpath2skid ()
#8  0x00000000004029dc in cssvfupgd_GetGUID ()
#9  0x00000000004032d3 in cssvfupgd_GetConfig ()
#10 0x000000000040376d in cssvfupgd_main ()
#11 0x00000000004042c6 in clsutlmain ()
#12 0x0000000000401fc8 in main ()

For both root.sh and rootupgrade.sh

  • $GRID_HOME/log/<node>/client/crsctl_root.log for root.sh or $GRID_HOME/log/<node>/client/cssvfupgdn.log for rootupgrade.sh
..
2011-03-16 17:23:28.143: [   SKGFD][2754190768]Discovery with str:/ocfs2/storage/voting:

2011-03-16 17:23:28.143: [   SKGFD][2754190768]UFS discovery with :/ocfs2/storage/voting:

2011-03-16 17:23:28.143: [   SKGFD][2754190768]Fetching UFS disk :/ocfs2/storage/voting:

2011-03-16 17:23:28.143: [   SKGFD][2754190768]OSS discovery with :/ocfs2/storage/voting:

2011-03-16 17:23:28.143: [   SKGFD][2754190768]Handle 0x154724a0 from lib :UFS:: for disk :/ocfs2/storage/voting:

2011-03-16 18:35:03.761: [   SKGFD][2754190768]WARNING:io_getevents timed out 4294 sec
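
To check whether a client log contains the io_getevents warning shown above, a search along these lines may help. This is a minimal sketch: the function name find_io_timeouts is hypothetical, and the search string is copied from the log excerpt above.

```shell
# Hypothetical helper: print every "io_getevents timed out" warning
# found in the given log files, each prefixed with its file name.
find_io_timeouts() {
    # -H forces the file name prefix even when only one file is given
    grep -H 'WARNING:io_getevents timed out' "$@"
}

# Usage (substitute the real node name for <node>):
#   find_io_timeouts $GRID_HOME/log/<node>/client/crsctl_root.log
```

The function returns a non-zero status when no warning is found, so it can also be used in a shell condition.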


  • mount options and voting file listing
mount | grep ocfs2
/dev/sdc1 on /ocfs2 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)

ls -l /ocfs2/storage/voting
-rw-r--r-- 1 oracle oinstall 10240000 Jun 19  2009 /ocfs2/storage/voting
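
When recording the mount options from several nodes, the mount output can be reduced to one line per OCFS2 file system. The following is a minimal sketch: the function name ocfs2_mount_options is hypothetical, and it assumes Linux-style mount output as shown above with no spaces in the mount point.

```shell
# Hypothetical helper: read `mount`-style lines on stdin and print
# "<mount point> <options>" for every ocfs2 file system.
ocfs2_mount_options() {
    awk '$4 == "type" && $5 == "ocfs2" {
        opts = $6
        gsub(/[()]/, "", opts)   # strip the surrounding parentheses
        print $3, opts
    }'
}

# Usage:
#   mount | ocfs2_mount_options
```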

OR:

After an OS upgrade, GI fails to start: ocssd.bin logs the following messages repeatedly for about 10 minutes and then aborts:

2011-08-05 12:02:50.917: [   SKGFD][1083201856]Handle 0x16472a60 from lib :UFS:: for disk :/ocfs2/vote/unicadpr_vote1.crs:
..
2011-08-05 12:02:51.438: [    CSSD][1077500224]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x16422dd0) client(0x16428550)
..
..
2011-08-05 12:12:44.643: [    CSSD][1077500224]clssgmExecuteClientRequest: MAINT recvd from proc 3 (0x16423de0)
2011-08-05 12:12:44.643: [    CSSD][1077500224]clssgmShutDown: Received abortive shutdown request from client.
2011-08-05 12:12:44.643: [    CSSD][1077500224]###################################
2011-08-05 12:12:44.643: [    CSSD][1077500224]clssscExit: CSSD aborting from thread GMClientListener
2011-08-05 12:12:44.643: [    CSSD][1077500224]###################################


Cause
