My Oracle Support Banner

4 Seconds of IO Disruption While New LUNs are Being Added to a Server. (Doc ID 2222153.1)

Last updated on OCTOBER 16, 2018

Applies to:

Solaris Operating System - Version 10 3/05 to 11.3 [Release 10.0 to 11.0]
Information in this document applies to any platform.

Symptoms

Solaris SPARC server running Solaris 10 / Solaris 11 connected to an external array through a SAN setup.
The connection details are irrelevant here.

A 4 seconds IO disruption is observed when new LUs are added to the server, while using some third party multipathing solution, as for example Veritas DMP.

The case exposed below demonstrates this issue. On this particular test scenario, the customer was running a simple dd command on the following drives

c3t500601613CE024FFd0
c3t500601613CE024FFd1

while adding a new LUN to the server.

The IO was interrupted for about 4 seconds on device c3t500601613CE024FFd1.

The new LU was mapped to the same storage port of the LUs above, this is 500601613CE024FF.
This new LU had instance number ssd52.
On the messages file of the host, we can see the addition of this LU at 14:13:46

Jun 15 14:13:43 host1 scsi: [ID 107833 kern.warning] WARNING: /pci@2,600000/SUNW,qlc@0/fp@0,0/ssd@w500601613ce024ff,0 (ssd1):
Jun 15 14:13:43 host1 Error for Command: write(10) Error Level: Retryable
Jun 15 14:13:43 host1 scsi: [ID 107833 kern.notice] Requested Block: 6943232 Error Block: 6943232
Jun 15 14:13:43 host1 scsi: [ID 107833 kern.notice] Vendor: DGC Serial Number: C900001568CL
Jun 15 14:13:43 host1 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jun 15 14:13:43 host1 scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0

Jun 15 14:13:46 host1 scsi: [ID 107833 kern.warning] WARNING: /pci@2,600000/SUNW,qlc@0/fp@0,0/ssd@w500601613ce024ff,0 (ssd1):
Jun 15 14:13:46 host1 Error for Command: write(10) Error Level: Retryable
Jun 15 14:13:46 host1 scsi: [ID 107833 kern.notice] Requested Block: 7916032 Error Block: 7916032
Jun 15 14:13:46 host1 scsi: [ID 107833 kern.notice] Vendor: DGC Serial Number: C900001568CL
Jun 15 14:13:46 host1 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jun 15 14:13:46 host1 scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0

Jun 15 14:13:46 host1 scsi: [ID 583861 kern.info] ssd52 at fp7: unit-address w500601613ce024ff,b: 10000
Jun 15 14:13:46 host1 genunix: [ID 936769 kern.info] ssd52 is /pci@2,600000/SUNW,qlc@0/fp@0,0/ssd@w500601613ce024ff,b
Jun 15 14:13:46 host1 scsi: [ID 107833 kern.warning] WARNING: /pci@2,600000/SUNW,qlc@0/fp@0,0/ssd@w500601613ce024ff,b (ssd52):
Jun 15 14:13:46 host1 Corrupt label; wrong magic number

Jun 15 14:13:46 host1 genunix: [ID 408114 kern.info] /pci@2,600000/SUNW,qlc@0/fp@0,0/ssd@w500601613ce024ff,b (ssd52) online
Jun 15 14:13:46 host1 scsi: [ID 107833 kern.warning] WARNING: /pci@2,600000/SUNW,qlc@0/fp@0,0/ssd@w500601613ce024ff,b (ssd52):
Jun 15 14:13:46 host1 Corrupt label; wrong magic number

 

To show this issue, the customer was collecting an iostat every second while the new LUN was added.
As seen the ongoing IO drops to 0 on disk c3t500601613CE024FFd1 only at 14:13:45. The IO resumed at 14:13:49.

Wed Jun 15 14:13:44 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 33.0 0.0 220.3 0.0 0.0 0.0 0.3 0 1 c0t1d0
1.0 121.2 0.0 124075.2 0.0 0.8 0.0 6.6 0 81 c3t500601613CE024FFd0
1.0 14.0 0.0 14355.7 0.0 0.9 0.0 62.8 0 94 c3t500601613CE024FFd1

Wed Jun 15 14:13:45 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 91.0 0.0 692.1 0.0 1.4 0.0 15.5 0 28 c0t1d0
0.0 152.0 0.0 155670.3 0.0 0.9 0.0 5.8 0 89 c3t500601613CE024FFd0
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c3t500601613CE024FFd1 -------> IO drop here

Wed Jun 15 14:13:46 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 0.0 0.5 0.0 0.0 0.0 0.0 4.5 0 0 c0t1d0
0.0 137.1 0.0 140353.1 0.0 0.9 0.0 6.6 0 90 c3t500601613CE024FFd0
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c3t500601613CE024FFd1
<>

Wed Jun 15 14:13:47 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 28.9 0.0 215.5 0.0 0.0 0.0 0.2 0 1 c0t1d0
1.0 133.7 0.0 136928.9 0.0 0.8 0.0 6.0 0 81 c3t500601613CE024FFd0
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c3t500601613CE024FFd1
52.7 9.4 8.6 0.0 0.0 0.0 0.0 0.0 0 0 ssd52

Wed Jun 15 14:13:48 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 5.0 0.0 152.4 0.0 0.0 0.0 1.5 0 1 c0t1d0
0.0 156.4 0.0 160165.2 0.0 0.9 0.0 5.7 0 88 c3t500601613CE024FFd0
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c3t500601613CE024FFd1
Wed Jun 15 14:13:49 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 12.0 0.5 24.0 0.0 0.0 0.0 0.4 0 1 c0t1d0
0.0 121.0 0.0 123942.0 0.0 0.9 0.0 7.5 0 91 c3t500601613CE024FFd0
1.0 152.0 0.0 155694.8 0.0 0.8 0.0 5.1 0 78 c3t500601613CE024FFd1 ------> IO recovered here

 

On the other hand, the same test was repeated but with MPxIO installed on the server. For testing purposes, only one path was used.
Surprisingly, with MPxIO this IO interruption wasn't observed.

If we look at the iostats supplied, at the time LUs were added or removed, the reads and writes are running normally. There are no IO drops.
Just taking an example, LU instance ssd34 was added at 13:56:53:

Logs on the messages file:

Jun 15 13:56:53 host1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060160bcc022009a1b3f00352de611 (ssd21):
Jun 15 13:56:53 host1 Error for Command: write(10) Error Level: Retryable
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] Requested Block: 13249024 Error Block: 13249024
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] Vendor: DGC Serial Number: D100001595CL
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0

Jun 15 13:56:53 host1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060160bcc022009b1b3f00352de611 (ssd20):
Jun 15 13:56:53 host1 Error for Command: write(10) Error Level: Retryable
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] Requested Block: 13279744 Error Block: 13279744
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] Vendor: DGC Serial Number: D20000159BCL
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jun 15 13:56:53 host1 scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0

Jun 15 13:56:54 host1 scsi: [ID 583861 kern.info] ssd34 at scsi_vhci0: unit-address g60060160bcc0220008ad4506352de611: g60060160bcc0220008ad4506352de611
Jun 15 13:56:54 host1 genunix: [ID 936769 kern.info] ssd34 is /scsi_vhci/ssd@g60060160bcc0220008ad4506352de611
Jun 15 13:56:54 host1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060160bcc0220008ad4506352de611 (ssd34):
Jun 15 13:56:54 host1 Corrupt label; wrong magic number
Jun 15 13:56:54 host1 genunix: [ID 408114 kern.info] /scsi_vhci/ssd@g60060160bcc0220008ad4506352de611 (ssd34) online
Jun 15 13:56:54 host1 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060160bcc0220008ad4506352de611 (ssd34) multipath status: degraded: path 35 fp7/ssd@w500601613ce024ff,10 is online
Jun 15 13:56:54 host1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060160bcc0220008ad4506352de611 (ssd34):
Jun 15 13:56:54 host1 Corrupt label; wrong magic number
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060160bcc022009a1b3f00352de611 (ssd21):

Jun 15 13:56:59 host1 Error for Command: write(10) Error Level: Retryable
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] Requested Block: 14215680 Error Block: 14215680
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] Vendor: DGC Serial Number: D100001595CL
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060160bcc022009b1b3f00352de611 (ssd20):
Jun 15 13:56:59 host1 Error for Command: write(10) Error Level: Retryable
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] Requested Block: 14252544 Error Block: 14252544
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] Vendor: DGC Serial Number: D20000159BCL
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
Jun 15 13:56:59 host1 scsi: [ID 107833 kern.notice] ASC: 0x3f (reported LUNs data has changed), ASCQ: 0xe, FRU: 0x0

 

If we now look on iostat, there's no IO drop on this period:

Wed Jun 15 13:56:52 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 91.0 0.0 93230.2 0.0 0.9 0.0 10.3 0 94 c6t60060160BCC022009B1B3F00352DE611d0
0.0 91.0 0.0 93230.2 0.0 0.9 0.0 10.2 0 0 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff
0.0 91.0 0.0 93230.2 0.0 0.9 0.0 10.2 0 93 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff.fp7
0.0 92.0 0.0 94254.8 0.0 0.9 0.0 10.2 0 94 c6t60060160BCC022009A1B3F00352DE611d0
0.0 92.0 0.0 94254.8 0.0 0.9 0.0 10.1 0 0 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff
0.0 92.0 0.0 94254.8 0.0 0.9 0.0 10.1 0 93 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff.fp7

Wed Jun 15 13:56:53 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 95.0 0.0 97308.7 0.0 0.9 0.0 9.8 0 93 c6t60060160BCC022009B1B3F00352DE611d0
0.0 95.0 0.0 97308.7 0.0 0.9 0.0 9.8 0 0 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff
0.0 95.0 0.0 97308.7 0.0 0.9 0.0 9.8 0 93 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff.fp7
0.0 94.0 0.0 96284.3 0.0 0.9 0.0 9.9 0 93 c6t60060160BCC022009A1B3F00352DE611d0
0.0 94.0 0.0 96284.3 0.0 0.9 0.0 9.9 0 0 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff
0.0 94.0 0.0 96284.3 0.0 0.9 0.0 9.9 0 93 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff.fp7

<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>
<>

Wed Jun 15 13:56:54 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 43.9 0.0 339.2 0.0 0.0 0.0 0.2 0 1 c0t1d0
1.0 92.8 0.0 95010.6 0.0 0.8 0.0 8.9 0 84 c6t60060160BCC022009B1B3F00352DE611d0
1.0 93.8 0.0 96032.3 0.0 0.8 0.0 8.8 0 0 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff
1.0 93.8 0.0 96032.3 0.0 0.8 0.0 8.8 0 84 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff.fp7
1.0 89.8 0.0 91946.1 0.0 0.8 0.0 9.2 0 84 c6t60060160BCC022009A1B3F00352DE611d0
1.0 90.8 0.0 92967.8 0.0 0.8 0.0 9.1 0 0 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff
1.0 90.8 0.0 92967.8 0.0 0.8 0.0 9.1 0 83 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff.fp7
95.8 17.1 15.6 0.0 0.0 0.0 0.0 0.0 0 0 ssd34
25.5 0.0 7.6 0.0 0.0 0.0 0.0 0.1 0 0 ssd34.t1
25.5 0.0 7.6 0.0 0.0 0.0 0.0 0.1 0 0 ssd34.t1.fp7

Wed Jun 15 13:56:55 2016

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 92.3 0.0 94473.3 0.0 0.9 0.0 10.1 0 9 c6t60060160BCC022009B1B3F00352DE611d0
0.0 92.3 0.0 94473.3 0.0 0.9 0.0 10.1 0 0 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff
0.0 92.3 0.0 94473.3 0.0 0.9 0.0 10.1 0 93 c6t60060160BCC022009B1B3F00352DE611d0.t500601613ce024ff.fp7
0.0 93.3 0.0 95500.2 0.0 0.9 0.0 10.0 0 93 c6t60060160BCC022009A1B3F00352DE611d0
0.0 93.3 0.0 95500.2 0.0 0.9 0.0 10.0 0 0 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff
0.0 93.3 0.0 95500.2 0.0 0.9 0.0 10.0 0 93 c6t60060160BCC022009A1B3F00352DE611d0.t500601613ce024ff.fp7

 

Changes

 

Cause

To view full details, sign in with your My Oracle Support account.

Don't have a My Oracle Support account? Click to get started!


In this Document
Symptoms
Changes
Cause
Solution
References

My Oracle Support provides customers with access to over a million knowledge articles and a vibrant support community of peers and Oracle experts.