High CPU Utilization and ZFS Write Latency Observed When the ARC Grows to its Maximum Size. (Doc ID 1355094.1)

Last updated on AUGUST 01, 2016

Applies to:

Solaris Operating System - Version 10 5/09 U7 and later
Information in this document applies to any platform.
***Checked for relevance on 11-03-2013***
*** Checked for relevance on 08-12-2014 ***
*** Checked for relevance on 05-03-2016 ***

Symptoms

When this issue occurs, the ARC reclaim thread evicts very large amounts of data in single passes, driving up CPU utilization and ZFS write latency.

The following one-line DTrace script will show the problem; arg2 of arc_evict_ghost() is the number of bytes requested for eviction:

# dtrace -n 'arc_evict_ghost:entry{trace(arg2);@a[stack()]=quantize(arg2);}'
dtrace: description 'arc_evict_ghost:entry' matched 1 probe
CPU    ID                    FUNCTION:NAME
 11 56002            arc_evict_ghost:entry        3604216332
  1 56002            arc_evict_ghost:entry       22691599127
  4 56002            arc_evict_ghost:entry       20399602440
^C

zfs`arc_adjust+0x15d
zfs`arc_shrink+0xe2
zfs`arc_kmem_reap_now+0x43
zfs`arc_reclaim_thread+0xf3
unix`thread_start+0x8

      value ------------- Distribution ------------- count
 1073741824 |                                        0
 2147483648 |@@@@@@@@@@@@@                           1
 4294967296 |                                        0
 8589934592 |                                        0
17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@             2
34359738368 |                                        0

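The multi-gigabyte byte counts above are easier to interpret next to the ARC's current size and targets, which Solaris exposes through kstat(1M) as the zfs:0:arcstats module. A minimal sketch; the sample values below are illustrative (a 48 GiB ARC that has grown to arc_c_max), not output from the affected system:

```shell
# On a live Solaris system, read the counters directly:
#   kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max
# Illustrative sample of that output (whitespace-separated name/value pairs):
kstat_sample='zfs:0:arcstats:size 51539607552
zfs:0:arcstats:c 51539607552
zfs:0:arcstats:c_max 51539607552'

# Convert the raw byte counts to GiB for quick comparison.
printf '%s\n' "$kstat_sample" | awk '
  { printf "%-24s %6.1f GiB\n", $1, $2 / (1024 * 1024 * 1024) }'
```

When size has reached c and c_max, the ARC is fully grown and further cache growth can only come from evicting existing buffers, which is the state this document describes.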


Also, lock profiling information as collected by lockstat(1M) will show contention with a stack similar to the one below (syntax example "lockstat -CcwP -n 50000 -D 20 -s 40 sleep 5"):


Count  indv cuml rcnt    nsec Lock        Hottest Caller
  4724 76%  76%  0.00 1346171 ARC_mru[96] remove_reference+0x3c

  nsec ------ Time Distribution ------ count Stack
  2048 |                               7     arc_buf_remove_ref+0x80
  4096 |                               49    dbuf_rele_and_unlock+0x128
  8192 |@                              245   dmu_buf_rele_array+0x30
 16384 |@@@                            536   dmu_read_uio+0xe8
 32768 |@@@@                           716   zfs_read+0x224
 65536 |@@@@@                          821   fop_read+0x20
131072 |@@@                            554   read+0x274
262144 |@@                             354   syscall_trap32+0xcc




Hold-event profiling can report stacks similar to the one below (syntax example "lockstat -HcwP -n 50000 -D 20 -s 40 sleep 5"):


  Count indv  cuml rcnt     nsec  Lock                   Hottest Caller
 106500  67%  67%  0.00   601774  buf_hash_table[16400]  arc_evict_ghost+0x114

 nsec ------ Time Distribution ------ count   Stack
 4096 |                               1       arc_adjust+0x1a8
 8192 |@@@@@@@@@@@@@@@@@@@@           71655   arc_reclaim_thread+0x140
16384 |@@@@@@@@                       31607   thread_start+0x4
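The two lockstat(1M) syntax examples above differ only in the event type (-C for contention, -H for holds); the remaining flags are the same. A commented sketch of the contention form, with a guard so it is safe to paste on a non-Solaris host where lockstat is absent:

```shell
# -C        watch lock contention events (-H watches hold events instead)
# -c        coalesce statistics for lock arrays (e.g. ARC_mru[96]) into one entry
# -w        "wherever": distinguish events by lock rather than by caller
# -P        sort by count * time, so the costliest locks appear first
# -n 50000  maximum number of data records to buffer
# -D 20     display only the top 20 entries of each event type
# -s 40     record call stacks up to 40 frames deep
# "sleep 5" is the command whose runtime bounds the sampling interval.
if [ -x /usr/sbin/lockstat ]; then
    /usr/sbin/lockstat -CcwP -n 50000 -D 20 -s 40 sleep 5
else
    echo 'lockstat(1M) is Solaris-specific; run this on the affected host'
fi
```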




Note:  The lockstat(1M) outputs can be collected by the GUDS script, see <Document 1285485.1>  GUDS - A Script for Gathering Solaris Performance Data.




Cause
