Solaris 10 With Patch 142909-17 or 142910-17 May Panic or Experience Significant ZFS I/O Performance Degradation (Doc ID 1358527.1)

Last updated on JULY 29, 2016

Applies to:

Solaris Operating System - Version 10 9/10 U9 and later
Information in this document applies to any platform.

Symptoms

A system affected by this issue may either panic or suffer a significant reduction in ZFS I/O performance.
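
One way to confirm whether one of the kernel patches named in this document is installed is to inspect the patch inventory with showrev (an illustrative check, assuming a standard Solaris 10 patch inventory; the exact revision reported will vary by system):

# List installed patches and filter for the two kernel patch IDs
showrev -p | egrep '142909|142910'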

The panic produces a stack trace similar to the following:

(SPARC Platform)
-- trap data type: 0x31 (data access MMU miss) rp: 0x2a11048cd70 --
addr: 0x68
pc: 0x12c2ed0 zfs:arc_get_data_buf+0x224: ldx [%i4 + 0x68], %i2
npc: 0x12c2ed4 zfs:arc_get_data_buf+0x228: subcc %i2, %l7, %g0 ( cmp %i2, %l7 )
global: %g1 0xc
%g2 0x200 %g3 0x18855c
%g4 0x18855b %g5 0x3011e232bc8
%g6 0x15 %g7 0x3010b306a40
out: %o0 0x506e98e %o1 0x1
%o2 0x506e98d %o3 0x506e98d
%o4 0xf %o5 0x3000
%sp 0x2a11048c611 %o7 0x12c2ec0
loc: %l0 0x604c23655b8 %l1 0x2006
%l2 0x19651b0 %l3 0x1965150
%l4 0x19651b0 %l5 0x195d400
%l6 0x1965008 %l7 0x1965150
in: %i0 0x6038aa7e0f8 %i1 0
%i2 0x80d73c58600 %i3 0x2000
%i4 0 %i5 0x195d760
%fp 0x2a11048c6c1 %i7 0x12c3a24

zfs:arc_get_data_buf+0x224(0x6038aa7e0f8)
zfs:arc_read_nolock+0x48c(0x0, 0x602c3f10040, 0x602e97b0100, 0x0, 0x0, 0x6, 0xc0, 0x2a11048d19c, 0x2a11048d178)
zfs:arc_read+0x48(0x0, 0x602c3f10040, 0x602e97b0100, 0xc0, 0x0, 0x0, 0x6, 0xc0, 0x2a11048d19c, 0x2a11048d178)
zfs:dbuf_prefetch+0x13c(, 0x2dc2, , , , 0x602e97b0100)
zfs:dmu_zfetch_fetch+0x2c(0x6041e407a48, 0x2dbe, 0x8)
zfs:dmu_zfetch_dofetch+0xb8(0x6041e407ca8, 0x604224f70a8)
zfs:dmu_zfetch_find+0x480(0x6041e407ca8, 0x2a11048d478, 0x20)
zfs:dmu_zfetch+0x7c(0x6041e407ca8, 0x5abc000, 0x2000, 0x20)
zfs:dbuf_read+0x1b8(0x6040d2c46d0, 0x603fdb592c0, 0x16)
zfs:dmu_buf_hold_array_by_dnode+0x19c(0x6041e407a48, 0x5a80200, 0x7fe00, 0x1, 0x13408c6, 0x2a11048d7fc, 0x2a11048d7f0, 0x0)
zfs:dmu_buf_hold_array+0x64(0x602e9840fc0, 0x2a11048d7fc, 0x5a80200, 0x7fe00, 0x1, 0x13408c6, , 0x2a11048d7f0)
zfs:dmu_read_uio+0x30(0x602e9840fc0, 0xa24, 0x2a11048da10, 0x7fe00, 0x2a11048d7fc)
zfs:zfs_read+0x224()
zfs:zfs_shim_read(0x603ba494040, 0x2a11048da10, 0x0, 0x603ed711400, 0x0) - frame recycled
genunix:fop_read+0x20(0x603ba494040, 0x2a11048da10, 0x0, , 0x0)
genunix:pread+0x244(0x105, , 0x80000?, 0x5a80200?)
unix:syscall_trap+0xac()

(x86 Platform)
panic[cpu29]/thread=fffffee68cefd700:
BAD TRAP: type=e (#pf Page fault) rp=fffffe80095f7780 addr=68 occurred in module "zfs" due to a NULL pointer dereference

oracle:
#pf Page fault
Bad kernel fault at addr=0x68
pid=12100, pc=0xfffffffff7f5ffaa, sp=0xfffffe80095f7870, eflags=0x10206
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 68 cr3: 260a43e000 cr8: d
rdi: fffffffff3c7d310 rsi: 1 rdx: fffffe894669e000
rcx: 0 r8: 14b98 r9: 0
rax: fffffe894669e000 rbx: 0 rbp: fffffe80095f7890
r10: 1 r11: 1 r12: 2000
r13: fffffee2b91535d0 r14: 0 r15: 0
fsb: fffffd7ffcbf0200 gsb: ffffffffae885000 ds: 0
es: 0 fs: 0 gs: 0
trp: e err: 0 rip: fffffffff7f5ffaa
cs: 28 rfl: 10206 rsp: fffffe80095f7870
ss: 30

fffffe80095f7690 unix:die+da ()
fffffe80095f7770 unix:trap+5e6 ()
fffffe80095f7780 unix:_cmntrap+140 ()
fffffe80095f7890 zfs:arc_get_data_buf+11a ()
fffffe80095f7940 zfs:arc_read_nolock+365 ()
fffffe80095f79a0 zfs:arc_read+6e ()
fffffe80095f7a20 zfs:dbuf_prefetch+115 ()
fffffe80095f7a50 zfs:dmu_zfetch_fetch+40 ()
fffffe80095f7a80 zfs:dmu_zfetch_dofetch+7c ()
fffffe80095f7ae0 zfs:dmu_zfetch_find+369 ()
fffffe80095f7b90 zfs:dmu_zfetch+c8 ()
fffffe80095f7bd0 zfs:dbuf_read+27a ()
fffffe80095f7c50 zfs:dmu_buf_hold_array_by_dnode+284 ()
fffffe80095f7ca0 zfs:dmu_buf_hold_array+7f ()
fffffe80095f7d00 zfs:dmu_read_uio+49 ()
fffffe80095f7d60 zfs:zfs_read+15e ()
fffffe80095f7d70 zfs:zfs_shim_read+c ()
fffffe80095f7dc0 genunix:fop_read+31 ()
fffffe80095f7f00 genunix:pread+178 ()
fffffe80095f7f10 unix:brand_sys_syscall+21d ()



The performance degradation appears as a very high percentage of kernel (system) CPU time in vmstat or mpstat output, together with heavy contention on ARC locks during ZFS I/O. The example lockstat -C output below shows contention on an ARC_mru lock; a sketch of the commands used to gather this kind of data follows the output.


Adaptive mutex spin: 30074 events in 2.261 seconds (13300 events/sec)

---------------------------------------------------------------------
Count indv cuml rcnt nsec   Lock Hottest Caller
24958  88%  88% 0.00 264141 ARC_mru[96]  arc_change_state+0x134

nsec ------ Time Distribution ------ count   Stack
512    |                               158   arc_read_done+0x114
1024   |                               584   zio_done+0x444
2048   |                               771   zio_execute+0xb4
4096   |@                             1600   zio_done+0x4a0
8192   |@@@@                          3681   zio_execute+0xb4
16384  |@@@@@                         4234   zio_done+0x4a0
32768  |@@@                           2963   zio_execute+0xb4
65536  |@                             1267   taskq_thread+0x3b8
131072 |@@@                           2572   thread_start+0x4
262144 |@@                            2348
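
The following is a minimal sketch of the Solaris commands that can be used to collect this kind of diagnostic data on a live system (assuming root access in the global zone; the 5-second intervals and event counts are illustrative, not prescriptive):

# System-wide CPU breakdown; sustained high values in the cpu "sy" column indicate kernel time
vmstat 5 5

# Per-CPU statistics; look for high "sys" together with high "smtx" (spins on mutexes)
mpstat 5 5

# Top 20 contended kernel locks over a 5-second window
lockstat -C -D 20 sleep 5

# Same, but also record caller stacks (10 frames deep), similar to the output shown above
lockstat -C -s 10 -D 20 sleep 5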

Changes



Cause
