
A Fragmented And Near Full Zpool Could Cause RAC Panic (Doc ID 2163827.1)

Last updated on JULY 02, 2018

Applies to:

Solaris x64/x86 Operating System - Version 10 11/06 U3 and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions to All Versions [Release All Releases]
Solaris Operating System - Version 10 11/06 U3 and later
Information in this document applies to any platform.

Symptoms

If the zpool where the RAC software is installed becomes fragmented and nearly full, it can cause a RAC-induced panic.

Disclaimer: A RAC panic can have many causes. The scenario described in this document is only one possibility. Any RAC-induced panic must first be diagnosed by the RAC support team.

In this example, the RAC software is installed in the zpool 'data'. The following symptoms can be observed in the Solaris crash dump:

1. High zpool usage

WARNING: ZFS pool "data" @ 0xffffc1c06773a540 usage > 90% 509G/556G
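
On a live system, the same condition can be checked with standard ZFS commands; a minimal sketch, assuming the pool is named 'data' as in this example:

# Pool-level size, allocated space, free space and capacity percentage
zpool list data

# Per-dataset space accounting, including space held by snapshots
zfs list -o space -r data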

2. Overdue zios

WARNING: 213 deadline (103 overdue) 4 read 209 write 10 pending zios for pool "data" vdev /dev/dsk/c3t3d0s0 (run "zfs [-l] zio pending")
WARNING: 138 deadline (30 overdue) 2 read 136 write 10 pending zios for pool "data" vdev /dev/dsk/c3t0d0s0 (run "zfs [-l] zio pending")
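
The warnings suggest running the CAT command "zfs [-l] zio pending" against the crash dump for the full list of pending zios. On a live system, a comparable backlog on the pool's vdevs would typically show up as large wait and active queues; a minimal sketch, assuming pool 'data':

# Per-vdev I/O statistics for the pool, sampled every 5 seconds
zpool iostat -v data 5

# Extended device statistics (queue lengths and service times)
iostat -xnz 5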

3. RAC grid process(es) waiting to write to the zpool

==== user (LWP_SYS) thread: 0xffffc1c0fda4a7e0 PID: 20382 ====
cmd: /u01/app/12.1.0.2/grid/bin/oraagent.bin

t_wchan: 0xffffc1c06bbd58e6 sobj: condition var (from zfs:txg_wait_open+0x78)

idle: 4285160124 hrticks (4.285160124s)
start: Tue Jun 28 00:46:34 2016
age: 1218 seconds (20 minutes 18 seconds)

unix:_resume_from_idle+0xf5 resume_return()
unix:swtch - frame recycled
void genunix:cv_wait+0x60
void zfs:txg_wait_open+0x78
void zfs:dmu_tx_wait+0xa1
int zfs:zfs_setattr+0x18d4
int genunix:fop_setattr+0x115
int genunix:fsetattrat+0xfa
int genunix:fchownat+0x13b
unix:_syscall_invoke+0x30()
-- switch to user thread's user stack --

CAT()> getpath 0xffffc1c0f0ef4b00
/u01/app/grid/diag/crs/snr3db01/crs/alert/log.xml
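
The stack above was taken from the crash dump with CAT. On a live system, kernel threads blocked waiting for the next open transaction group can be located with mdb, and the user-level stack of the affected Grid Infrastructure process can be captured with pstack; a minimal sketch (the PID is the one from this example and would differ on another system):

# Kernel thread stacks that include txg_wait_open (requires root)
echo "::stacks -c txg_wait_open" | mdb -k

# User-level stack of the blocked oraagent.bin process
pstack 20382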

4. Writes in flight

In the pool state below, dp_writes_inflight (100M) is roughly three times dp_write_limit (32M), and the txg quiesce thread has been runnable but off CPU for over 4 seconds:

"data" @ 0xffffc1c06773a540:
synced ub_timestamp: -4s txg: 10909751
current ub_timestamp: -4s txg: 10909751
sync thread:
0xfffffffc83000c20 SLEEP idle:0.360338689s cv:0xffffc1c0edd4b438 (from zfs:zio_wait+0x5c) sched(zfs:txg_sync_thread)
quiesce thread:
0xfffffffc837d6c20 RUNNABLE idle:4.281476232s sched(zfs:txg_quiesce_thread) (in CPU46 pri 60 dispq)
open[2] quiesced[1] syncing[0] synced[3]
============ ============ ============ ============
txg 10909754 10909753 10909752 10909751
dp_space_towrite 33841152 35979264 35364864 0
dp_tempreserved 0 0 0 0
dp_sync_write_time: 3.332925963s
dp_read_overhead: 0s
dp_written_in_sync: 35315712 (33.6M)
dp_writes_inflight: 105185280 (100M)
dp_throughput: 5005 B/ms
dp_write_limit: 33554432 (32M)
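
If the system is still up, the time writers such as the Grid Infrastructure processes spend in txg_wait_open() can be quantified with DTrace; a minimal sketch using the fbt provider on the function seen in the stack in symptom 3:

# Distribution of time (in ms) threads spend waiting for an open txg
dtrace -n '
    fbt:zfs:txg_wait_open:entry { self->ts = timestamp; }
    fbt:zfs:txg_wait_open:return /self->ts/ {
        @["txg_wait_open wait (ms)"] = quantize((timestamp - self->ts) / 1000000);
        self->ts = 0;
    }'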

5. Fragmentation

In the metaslab report below, note the 'maxsize' column: although each 4G metaslab still has tens to hundreds of megabytes free, the largest contiguous free segment in any of them is only 7.5K to 11K, which indicates severe free-space fragmentation.

CAT()> zfs -p data metaslab
spa "data" @ 0xffffc1c06773a540 (ACTIVE) 91% used (509G/556G)

vdev mirror-0 @ 0xffffc1c06b1f2800
Total Alloc Free Usage
556G 509G 46.9G 91%
=============================================
address number size alloc free offset spacemap freepct loaded segments maxsize
0xffffc1c06f1b2040 0 4G 3.77G 229M 0 30 5% loaded 64823 10.5K
0xffffc1c06a88b580 1 4G 3.81G 190M 0x100000000 41 4% loaded 56443 10.5K
0xffffc1c06af97ac0 2 4G 3.67G 330M 0x200000000 42 8% loaded 90702 11K
0xffffc1c06ab95580 3 4G 3.65G 352M 0x300000000 43 8% loaded 92048 10.5K
0xffffc1c0677a6540 4 4G 3.62G 381M 0x400000000 44 9% loaded 99480 11K
0xffffc1c0671dc540 5 4G 3.47G 537M 0x500000000 45 13% loaded 137358 11K
0xffffc1c069b61540 6 4G 3.74G 262M 0x600000000 47 6% loaded 69857 11K
0xffffc1c06b1ed040 7 4G 3.78G 219M 0x700000000 48 5% loaded 58044 11K
0xffffc1c06bdf5580 8 4G 3.64G 368M 0x800000000 49 8% loaded 93489 11K
0xffffc1c069b16040 9 4G 3.62G 386M 0x900000000 50 9% loaded 99216 11K
0xffffc1c08519cac0 10 4G 3.66G 340M 0xa00000000 56 8% loaded 89827 11K
0xffffc1c08519c580 11 4G 3.85G 152M 0xb00000000 57 3% loaded 38434 11K
0xffffc1c08519c040 12 4G 3.71G 287M 0xc00000000 58 7% loaded 75771 11K
0xffffc1c08519da80 13 4G 3.91G 82.1M 0xd00000000 61 2% loaded 21545 10.5K
0xffffc1c08519d540 14 4G 3.97G 27.5M 0xe00000000 62 0% loaded 11213 7.50K
0xffffc1c08519d000 15 4G 3.88G 113M 0xf00000000 64 2% loaded 32868 10.5K
0xffffc1c08519fac0 16 4G 3.84G 158M 0x1000000000 66 3% loaded 40496 11K
0xffffc1c08519f580 17 4G 3.65G 356M 0x1100000000 67 8% loaded 89386 11K
0xffffc1c08519f040 18 4G 3.62G 379M 0x1200000000 68 9% loaded 94097 11K
0xffffc1c0851a2a80 19 4G 3.66G 346M 0x1300000000 69 8% loaded 85095 11K

....
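
The pool in this example crossed the 90% usage threshold flagged in the WARNING in symptom 1, so it can be worth alerting on capacity well before a pool gets that full. A minimal sketch, with the pool name and threshold as assumptions:

#!/bin/sh
# Hypothetical capacity check: warn when the pool reaches the given usage.
POOL=data
THRESHOLD=90
CAP=`zpool list -H -o capacity $POOL | tr -d '%'`
if [ "$CAP" -ge "$THRESHOLD" ]; then
    echo "zpool $POOL is ${CAP}% full" | mailx -s "zpool $POOL capacity warning" root
fi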

 

Cause
