Create a file system with ACFS hang at mkfs -t acfs

(Doc ID 2331497.1)

Last updated on NOVEMBER 29, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 12.2.0.1 and later
Information in this document applies to any platform.

Symptoms

+ OS Version : Oracle Linux 7
+ GI Version : Oracle Grid Infrastructure 12c 12.2.0.1.0
+ Create ADVM volume successfully .

ASMCMD> volcreate -G ACFS -s 1024M data
ASMCMD> volinfo -G acfs data
Diskgroup Name: ACFS

Volume Name: DATA
Volume Device: /dev/asm/data-211
State: ENABLED
Size (MB): 1024
Resize Unit (MB): 512
Redundancy: UNPROT
Stripe Columns: 8
Stripe Width (K): 1024
Usage:
Mountpath:

+ Create ACFS filesystem hang

$ /sbin/mkfs -t acfs /dev/asm/data-211

+ strace report following message repeatedly:


7541 09:44:20 futex(0x1e7db24, FUTEX_WAIT_BITSET_PRIVATE, 9, {174, 309629424}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
7541 09:44:20 futex(0x1e7da50, FUTEX_WAKE_PRIVATE, 1) = 0
7541 09:44:20 futex(0x1e7db24, FUTEX_WAIT_BITSET_PRIVATE, 11, {174, 335038304}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
7541 09:44:20 futex(0x1e7da50, FUTEX_WAKE_PRIVATE, 1) = 0
7541 09:44:20 futex(0x1e7db24, FUTEX_WAIT_BITSET_PRIVATE, 13, {174, 360489084}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
7541 09:44:20 futex(0x1e7da50, FUTEX_WAKE_PRIVATE, 1) = 0

+ OS process of mkfs is in 'D' state, but dd the disk by grid user is fine,and disk permission is correct, so it is not a disk I/O issue

[grid@rac1 ~]$ ps auxf|grep mkfs
root 7534 0.0 0.0 4708 952 pts/0 S+ 09:44 0:03 \_ strace -fto /tmp/strace_mkfs_new.log /sbin/mkfs -t acfs /dev/asm/data-211
root 7539 0.0 0.1 235552 12336 pts/0 D+ 09:44 0:01 \_ /sbin/mkfs.acfs.bin /dev/asm/data-211

+ acfs.log show following message:

V 4294841.555 mkfs.acfs.bin[7539] Asm_getVbgCounts: busy/runnable 0/4
V 4294841.555 mkfs.acfs.bin[7539] Asm_dgUseMsgAsm: ACFS: DGUSE message sent
V 4294841.555 mkfs.acfs.bin[7539] Asm_getVbgCounts: busy/runnable 0/4
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00033: WARNING. Possible hung thread tid:0xffff88015766c080 name:mkfs.acfs.bin pid:7539 tsd:0xffff880213d337d0 Volume Number:-1 record sequence count:18
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00034: WARNING. Possible hung thread tid:0xffff88015766c080 total elapsed secs:129 hang elapsed secs:129 record cnt:8 lock cnt:3 function cnt:5 last record removed:Asm_getVbgCounts
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 0 L LH_WaitEventGeneric type:294 LHV:LH_WaitEventGeneric:294 LHV2:LH_WaitEventGeneric:294 mode:0x200:LH_WAIT_EVENT
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 0 L LH_WaitEventGeneric id1:0xffff88019d30b718 id2:0x00000000fffe14d3 ks.c line:1464
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 0 L LH_WaitEventGeneric state:held elapsed secs:129 blocked ticks:0 held ticks:129542 seq_num:18
K 4294971.097/171117094630 KsLHHMThread[3317] Hang Analyzer: Spawning a callback to handle hung thread.
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 1 F AsmMessage asmMsg.c line:440 elapsed secs:129 seq_num: 16
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 2 F Asm_dgUseMsgAsm asmdg.c line:1976 elapsed secs:129 seq_num: 11
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 3 L LH_IgnoreLock type:292 LHV:LH_IgnoreLock:292 LHV2:LH_IgnoreLock:292 mode:0x8:LH_MUTEX
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 3 L LH_IgnoreLock id1:0xffff8801906f8268 id2:0x0000000000000000 asmdg.c line:1852
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 3 L LH_IgnoreLock state:held elapsed secs:129 blocked ticks:0 held ticks:129542 seq_num:6
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 4 F Asm_dgUse asmdg.c line:1851 elapsed secs:129 seq_num: 5
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 5 F AsmDgStateVolOpen asmdg.c line:634 elapsed secs:129 seq_num: 4
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 6 F AsmRootStateVolOpen asmroot.c line:663 elapsed secs:129 seq_num: 3
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 7 L LH_ADVM_voldevLock type:233 LHV:LH_ADVM_voldevLock:233 LHV2:LH_ADVM_voldevLock:233 mode:0x8:LH_MUTEX
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 7 L LH_ADVM_voldevLock id1:0xffff88020b71f800 id2:0x0000000000000000 asmosd.c line:1435
U 4294971.097/171117094630 KsLHHMThread[3317] OKSK-00035: 7 L LH_ADVM_voldevLock state:held elapsed secs:129 blocked ticks:0 held ticks:129542 seq_num:1

+ dmesg show following message:

[ 719.288155] INFO: task apx_rbal_+apx1:4985 blocked for more than 120 seconds.
[ 719.288158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 719.288159] apx_rbal_+apx1 D ffff880272453100 0 4985 1 0x00000080
[ 719.288162] ffff880158ecfb40 0000000000000086 ffff880190554380 ffff880158ecffd8
[ 719.288164] ffff880158ecffd8 ffff880158ecffd8 ffff880190554380 ffff8802111ae298
[ 719.288165] ffff8802111ae29c ffff880190554380 00000000ffffffff ffff8802111ae2a0
[ 719.288168] Call Trace:
[ 719.288175] [<ffffffff81571c39>] schedule_preempt_disabled+0x29/0x70
[ 719.288177] [<ffffffff81570667>] __mutex_lock_slowpath+0x117/0x1d0
[ 719.288181] [<ffffffff8156ffdf>] mutex_lock+0x1f/0x30
[ 719.288184] [<ffffffff811bbd63>] __blkdev_get+0x63/0x4c0
[ 719.288185] [<ffffffff811bc385>] blkdev_get+0x1c5/0x340
[ 719.288187] [<ffffffff8157297e>] ? _raw_spin_lock+0xe/0x20
[ 719.288188] [<ffffffff811bc5ab>] blkdev_open+0x5b/0x80
[ 719.288192] [<ffffffff81181e1f>] do_dentry_open+0x1ef/0x2a0
[ 719.288195] [<ffffffff8118ec22>] ? __inode_permission+0x52/0xc0
[ 719.288196] [<ffffffff811bc550>] ? blkdev_get_by_dev+0x50/0x50
[ 719.288198] [<ffffffff81181f01>] finish_open+0x31/0x40
[ 719.288199] [<ffffffff81191cd1>] do_last+0x4e1/0xe60
[ 719.288201] [<ffffffff811926fe>] path_openat+0xae/0x500
[ 719.288205] [<ffffffff8114e225>] ? handle_mm_fault+0x185/0x530
[ 719.288206] [<ffffffff8118ebb9>] ? putname+0x29/0x40
[ 719.288208] [<ffffffff8119313f>] ? user_path_at_empty+0x5f/0x90
[ 719.288210] [<ffffffff811931c8>] do_filp_open+0x38/0x80
[ 719.288211] [<ffffffff8157297e>] ? _raw_spin_lock+0xe/0x20
[ 719.288213] [<ffffffff8119fd0f>] ? __alloc_fd+0xaf/0x140
[ 719.288215] [<ffffffff81183174>] do_sys_open+0xe4/0x1c0
[ 719.288217] [<ffffffff81183271>] sys_open+0x21/0x30
[ 719.288219] [<ffffffff8157af19>] system_call_fastpath+0x16/0x1b
[ 719.288224] INFO: task mkfs.acfs.bin:7539 blocked for more than 120 seconds.
[ 719.288225] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 719.288225] mkfs.acfs.bin D ffff8802724d3100 0 7539 7534 0x00000087
[ 719.288227] ffff88019d30b660 0000000000000082 ffff88015766c080 ffff88019d30bfd8
[ 719.288228] ffff88019d30bfd8 ffff88019d30bfd8 ffff88015766c080 ffff880213d337d0
[ 719.288230] 0000000000000001 ffff880263d3b2d8 0000000000000000 0000000000000001
[ 719.288231] Call Trace:
[ 719.288233] [<ffffffff81571589>] schedule+0x29/0x70
[ 719.288242] [<ffffffffa046f9d5>] KsWaitEvent_OSD+0x355/0x4f0 [oracleoks]
[ 719.288245] [<ffffffff8107f610>] ? wake_up_bit+0x30/0x30
[ 719.288250] [<ffffffffa04732cc>] KsCondWaitTimeoutV2+0xbc/0x1d0 [oracleoks]
[ 719.288255] [<ffffffffa0473418>] KsCondWaitV2+0x18/0x20 [oracleoks]
[ 719.288262] [<ffffffffa0525dc6>] AsmMessage+0x79b/0xf3c [oracleadvm]
[ 719.288266] [<ffffffffa0516628>] Asm_dgUseMsgAsm+0x615/0x938 [oracleadvm]
[ 719.288270] [<ffffffffa0516d19>] AsmDgStateVolOpen+0x3ce/0x739 [oracleadvm]
[ 719.288277] [<ffffffffa057efa5>] ? AsmBlogText+0xef/0xfe [oracleadvm]
[ 719.288281] [<ffffffffa051acd5>] AsmDgAction+0x20d/0x2dc [oracleadvm]
[ 719.288285] [<ffffffffa0510573>] AsmRootStateVolOpen+0x1e2/0x574 [oracleadvm]
[ 719.288289] [<ffffffffa050fd92>] AsmRootAction+0x424/0x504 [oracleadvm]
[ 719.288295] [<ffffffffa05868e6>] asmOpen+0x4ba/0x633 [oracleadvm]
[ 719.288297] [<ffffffff811bbdcb>] __blkdev_get+0xcb/0x4c0
[ 719.288298] [<ffffffff811bc385>] blkdev_get+0x1c5/0x340
[ 719.288300] [<ffffffff8157297e>] ? _raw_spin_lock+0xe/0x20
[ 719.288301] [<ffffffff811bc5ab>] blkdev_open+0x5b/0x80
[ 719.288303] [<ffffffff81181e1f>] do_dentry_open+0x1ef/0x2a0
[ 719.288305] [<ffffffff8118ec22>] ? __inode_permission+0x52/0xc0
[ 719.288306] [<ffffffff811bc550>] ? blkdev_get_by_dev+0x50/0x50
[ 719.288307] [<ffffffff81181f01>] finish_open+0x31/0x40
[ 719.288309] [<ffffffff81191cd1>] do_last+0x4e1/0xe60
[ 719.288311] [<ffffffff81097398>] ? dequeue_task_fair+0x4c8/0x660
[ 719.288313] [<ffffffff811926fe>] path_openat+0xae/0x500
[ 719.288315] [<ffffffff811931c8>] do_filp_open+0x38/0x80
[ 719.288316] [<ffffffff8157297e>] ? _raw_spin_lock+0xe/0x20
[ 719.288318] [<ffffffff8119fd0f>] ? __alloc_fd+0xaf/0x140
[ 719.288319] [<ffffffff81183174>] do_sys_open+0xe4/0x1c0
[ 719.288321] [<ffffffff81183271>] sys_open+0x21/0x30
[ 719.288322] [<ffffffff8157b127>] tracesys+0xdd/0xe2

+ Following UDEV rules was used to set device name

ACTION=="add|change", KERNEL=="sd?1", ENV{DEVTYPE}=="partition", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c29a6ac2f962224c83cffa0afb00", RUN+="/bin/sh -c '/usr/bin/mkdir /dev/asm; mknod /dev/asm/ASM_OCR_DISK b $major $minor; chown grid:asmadmin /dev/asm/ASM_OCR_DISK; chmod 0660 /dev/asm/ASM_OCR_DISK'"
ACTION=="add|change", KERNEL=="sd?1", ENV{DEVTYPE}=="partition", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c297d1cc334d6fd2501dc55b0b37", RUN+="/bin/sh -c '/usr/bin/mkdir /dev/asm; mknod /dev/asm/ASM_GIMR_DISK b $major $minor; chown grid:asmadmin /dev/asm/ASM_GIMR_DISK; chmod 0660 /dev/asm/ASM_GIMR_DISK'"
ACTION=="add|change", KERNEL=="sd?1", ENV{DEVTYPE}=="partition", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d $devnode", RESULT=="36000c29f217670e8d3507d16343c81a4", RUN+="/bin/sh -c '/usr/bin/mkdir /dev/asm; mknod /dev/asm/ASM_DATA_DISK b $major $minor; chown grid:asmadmin /dev/asm/ASM_DATA_DISK; chmod 0660 /dev/asm/ASM_DATA_DISK'"

 



Changes

 + Create OS group and user as following for installation

# useradd oracle -u 1001 -g oinstall -G dba,asmdba,backupdba,dgdba,kmdba,racdba,oper
# useradd grid -u 1002 -g oinstall -G asmadmin,asmdba,asmoper,dba

+ ASM mknod devices at /dev/asm to set deivce name and set owner&group "grid:asmadmin" , permission is 0660

+ Newly install the GI 12.2 on Oracle Linux 7, following error occured during run root.sh

2017/11/21 09:29:51 CLSRSC-482: Running command: '/grid/app/12.2.0.1/grid/bin/ocrconfig -upgrade grid oinstall'
Errors in file :
ORA-27091: unable to queue I/O
ORA-06512: at line 4
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 410
ORA-06512: at line 2
PROT-1: Failed to initialize ocrconfig
PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]
2017/11/21 09:29:51 CLSRSC-157: Failed to create Oracle Cluster Registry configuration
2017/11/21 09:29:51 CLSRSC-259: Oracle Clusterware Repository configuration failed
Died at /grid/app/12.2.0.1/grid/crs/install/oraocr.pm line 202.
The command '/grid/app/12.2.0.1/grid/perl/bin/perl -I/grid/app/12.2.0.1/grid/perl/lib -I/grid/app/12.2.0.1/grid/crs/install /grid/app/12.2.0.1/grid/crs/install/rootcrs.pl ' execution failed

+ Change the group of disk to "grid:oinstall" , rerun the root.sh ,finish without any error.

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms