ZFS Write Performance Degrades With Threads Held Up By space_map_load_wait() (Doc ID 1359269.1)

Last updated on JULY 29, 2016

Applies to:

Sun Storage 7720 Unified Storage System - Version Not Applicable and later
Sun ZFS Storage 7420 - Version Not Applicable and later
Sun Storage 7310 Unified Storage System - Version Not Applicable and later
OpenSolaris Operating System - Version 2008.05 and later
Sun Storage 7110 Unified Storage System - Version Not Applicable and later
Information in this document applies to any platform.
***Checked for relevance on 26-Aug-2013***

Symptoms

Several checks are required to confirm this issue.

A threadlist taken from the system during the performance issue should contain threads with a stack similar to the following:

ffffff00f5253c60 fffffffffbc2bbb0 0 0 99 ffffff8215990f3e
PC: _resume_from_idle+0xf1 TASKQ: spa_zio_write_issue
stack pointer for thread ffffff00f5253c60: ffffff00f5253790
[ ffffff00f5253790 _resume_from_idle+0xf1() ]
  swtch+0x160()
  cv_wait+0x61()
  space_map_load_wait+0x2e()
  space_map_load+0x4c()
  metaslab_activate+0x6e()
  metaslab_group_alloc+0x269()
  metaslab_alloc_dva+0x287()
  metaslab_alloc+0x9b()
  zio_dva_allocate+0x3e()
  zio_execute+0xa0()
  taskq_thread+0x1b7()
  thread_start+8()


Alternatively, count the number of stacks containing space_map_load_wait() from within mdb:

> ::stacks -c space_map_load_wait

THREAD           STATE  SOBJ COUNT
ffffff007cd61c60 SLEEP  CV   8
         swtch+0x147
         cv_wait+0x61
         space_map_load_wait+0x2e
         metaslab_activate+0x60
         metaslab_group_alloc+0x246
         metaslab_alloc_dva+0x2a6
         metaslab_alloc+0x9c
         zio_dva_allocate+0x57
         zio_execute+0x89
         taskq_thread+0x1b7
         thread_start+8
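If the threadlist has already been saved to a file, the same count can be produced offline without attaching mdb to the system. A minimal sketch in Python; the sample text below is illustrative, not output from a real dump:

```python
# Count threads whose stack contains a space_map_load_wait frame in a
# saved threadlist. Illustrative sample only, not real dump output.
sample_threadlist = """\
ffffff007cd61c60 SLEEP CV
         swtch+0x147
         cv_wait+0x61
         space_map_load_wait+0x2e
         thread_start+8
ffffff007cd61d80 SLEEP CV
         swtch+0x147
         taskq_thread+0x1b7
         thread_start+8
"""

def count_waiting_threads(threadlist: str) -> int:
    """Count stacks that include a space_map_load_wait frame.

    A stack is assumed to begin at any non-indented line (the thread
    header); indented lines are taken to be stack frames.
    """
    count = 0
    in_waiting_stack = False
    for line in threadlist.splitlines():
        if line and not line[0].isspace():   # thread header: new stack
            in_waiting_stack = False
        elif "space_map_load_wait" in line and not in_waiting_stack:
            in_waiting_stack = True          # count each stack once
            count += 1
    return count

print(count_waiting_threads(sample_threadlist))  # 1 in the sample
```

A high count of such threads, as in the mdb output above (COUNT 8), indicates writers blocked waiting for space maps to load.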



Measuring the time spent in zio_dva_allocate() (in microseconds, since the DTrace script divides the nanosecond timestamp delta by 1000) shows several outliers taking a very long time:


$ dtrace -n 'fbt:zfs:zio_dva_allocate:entry {self->ts = timestamp;} fbt:zfs:zio_dva_allocate:return /self->ts/ {@[probefunc] = quantize((timestamp - self->ts)/1000); self->ts = 0;}'
^C

         value   ------------- Distribution ------------- count
      2048 |                                         0
      4096 |                                         1
      8192 |@                                        1944
     16384 |@@@@@@@@@@@@@@                           27018
     32768 |@@@@@@@@@@@@@@@@                         31038
     65536 |@@@@                                     7144
    131072 |@@                                       3470
    262144 |@                                        2331
    524288 |@                                        1150
   1048576 |                                         510
   2097152 |                                         211
   4194304 |                                         81
   8388608 |                                         103
  16777216 |                                         248
  33554432 |                                         424
  67108864 |                                         588
 134217728 |                                         636
 268435456 |                                         543
 536870912 |                                         223
1073741824 |                                         28
2147483648 |                                         0
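Because the quantize() bucket values above are in microseconds, the upper buckets correspond to allocations taking minutes, not milliseconds. A minimal sketch converting the bucket lower bounds to seconds, using the populated tail buckets from the distribution above:

```python
# Convert DTrace quantize() bucket lower bounds (microseconds, because
# the one-liner divides the nanosecond delta by 1000) into seconds so
# the outliers are readable. Bucket values are powers of two.
def bucket_to_seconds(bucket_us: int) -> float:
    """Lower bound of a quantize() bucket, converted from us to s."""
    return bucket_us / 1_000_000

# The slowest heavily-populated buckets from the distribution above:
for bucket, count in [(134217728, 636), (268435456, 543), (536870912, 223)]:
    print(f"{count} allocations took at least "
          f"{bucket_to_seconds(bucket):.0f} s")
```

Healthy DVA allocations sit in the 8192-65536 us buckets; the hundreds of allocations in the 100+ second buckets are the symptom of blocked space map loads.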



Running '::zio_state' in mdb will show several write zios stalled in the WAIT_FOR_CHILDREN_READY and DVA_ALLOCATE stages:

> ::zio_state

ADDRESS                           TYPE   STAGE                    WAITER
60038a57130                       NULL   DONE                     3017d34daa0
301eb1ebc40                       NULL   DONE                     3000c5a14e0
60047aa0060                       NULL   DONE                     3011d097800
6004dd35c40                       NULL   WAIT_FOR_CHILDREN_READY  2a10627fca0
 600471ac450                      WRITE  WAIT_FOR_CHILDREN_READY  -
  600498aa5b8                     WRITE  WAIT_FOR_CHILDREN_READY  -
   300960f0008                    WRITE  WAIT_FOR_CHILDREN_READY  -
    302a7915888                   WRITE  WAIT_FOR_CHILDREN_READY  -
     300bbd28d78                  WRITE  WAIT_FOR_CHILDREN_READY  -
      3006e484ac0                 WRITE  WAIT_FOR_CHILDREN_READY  -
       3006289f1d0                WRITE  WAIT_FOR_CHILDREN_READY  -
        3009b726120               WRITE  WAIT_FOR_CHILDREN_READY  -
         30181d8f720              WRITE  WAIT_FOR_CHILDREN_READY  -
          6004a1f6f10             WRITE  DVA_ALLOCATE             -
          6004a1f6700             WRITE  DVA_ALLOCATE             -
         30387051168              WRITE  WAIT_FOR_CHILDREN_READY  -
          30047b738a0             WRITE  DVA_ALLOCATE             -
          30047b72b30             WRITE  DVA_ALLOCATE             -
          30097f45468             WRITE  DVA_ALLOCATE             -


Running 'zpool iostat -v' will show which vdevs have limited free space.  The following example shows how the free capacity varies between the original vdevs (emcpower16g and emcpower17g) and the new vdevs (emcpower1g and emcpower23c).

                capacity     operations    bandwidth
pool           alloc  free   read  write   read   write
-------------  -----  -----  ----  -----  -----  -----
DATA2021-02    372G   94.8G  212   152    1.97M  1.68M
  emcpower1g   198G   50.4G  88    30     934K   553K
  emcpower16g  56.3G  3.22G  36    37     318K   179K
  emcpower17g  56.2G  3.34G  35    39     307K   202K
  emcpower23c  61.6G  37.9G  51    46     462K   789K
-------------  -----  -----  ----  -----  -----  -----
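The imbalance is easier to see as a percentage of each vdev's capacity in use. A minimal sketch computing it from the alloc/free figures in the iostat output above:

```python
# Percent of each vdev's capacity allocated, taken from the
# 'zpool iostat -v' output above (alloc and free in GiB).
vdevs = {
    "emcpower1g":  (198.0, 50.4),
    "emcpower16g": (56.3, 3.22),
    "emcpower17g": (56.2, 3.34),
    "emcpower23c": (61.6, 37.9),
}

for name, (alloc, free) in vdevs.items():
    pct_full = 100.0 * alloc / (alloc + free)
    print(f"{name:12s} {pct_full:5.1f}% full")
```

The original vdevs (emcpower16g and emcpower17g) come out over 94% full, while the newer vdevs sit well below that; this skew is what drives allocations into repeated space map loads.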



Changes

This issue can occur when additional top-level vdevs have been added to an existing zpool to increase storage.  This leaves the zpool with imbalanced vdevs, where the original vdevs hold far more data than the new ones.  As the older vdevs fill up, ZFS spends a long time in space_map_load_wait() loading metaslab space maps while searching for free space across the vdevs.
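The effect of vdev fullness on allocation cost can be illustrated with a toy model (this is not ZFS's actual metaslab allocator): if each metaslab tried must have its space map loaded, and a metaslab on a nearly full vdev only has usable space with low probability, the expected number of blocking loads per allocation grows sharply with fullness:

```python
import random

# Toy model (not ZFS's real allocator): treat each metaslab as usable
# with probability (1 - fraction_full), and count how many metaslabs
# must be activated, each implying a space map load, before an
# allocation succeeds.
def expected_loads(fraction_full: float, trials: int = 10_000,
                   seed: int = 0) -> float:
    """Average space map loads per allocation at a given fullness."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        loads = 1
        while rng.random() < fraction_full:  # metaslab too full, try next
            loads += 1
        total += loads
    return total / trials

print(f"~{expected_loads(0.60):.1f} loads/alloc at 60% full")
print(f"~{expected_loads(0.95):.1f} loads/alloc at 95% full")
```

Under this model the expected loads per allocation is roughly 1/(1 - fullness), so a vdev at 95% full costs around eight times as many space map loads as one at 60%, which matches the multi-second zio_dva_allocate() outliers seen above.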

Cause
