My Oracle Support Banner

Multipath I/O Stalled Indefinitely Until Reboot (Doc ID 2263513.1)

Last updated on AUGUST 20, 2018

Applies to:

Linux OS - Version Oracle Linux 6.1 with Unbreakable Enterprise Kernel [2.6.39] and later
Linux x86-64

Symptoms

The following entries found in the /var/log/messages:

Apr 18 18:52:27 hostname kernel: qla2xxx [0000:20:00.0]-801c:7: Abort command issued nexus=7:0:266 -- 1 2002.
Apr 18 18:52:28 hostname kernel: qla2xxx [0000:20:00.0]-801c:7: Abort command issued nexus=7:0:266 -- 1 2002.
Apr 18 18:52:29 hostname kernel: qla2xxx [0000:20:00.0]-801c:7: Abort command issued nexus=7:0:204 -- 1 2002.
Apr 18 18:52:30 hostname kernel: qla2xxx [0000:20:00.0]-801c:7: Abort command issued nexus=7:0:42 -- 1 2002.
Apr 18 18:52:31 hostname kernel: qla2xxx [0000:20:00.0]-801c:7: Abort command issued nexus=7:0:221 -- 1 2002.
Apr 18 18:52:31 hostname kernel: INFO: task multipathd:118326 blocked for more than 120 seconds.
Apr 18 18:52:31 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 18 18:52:32 hostname kernel: multipathd D ffff89a6f6396aa8 0 118326 1 0x00000080
Apr 18 18:52:32 hostname kernel: ffff89a1f9bf1d78 0000000000000082 0000004a00000002 ffff8b0000000001
Apr 18 18:52:32 hostname kernel: 0000000000011180 ffff89a1f9bf1fd8 ffff89a1f9bf0010 0000000000011180
Apr 18 18:52:32 hostname kernel: ffff89a1f9bf1fd8 0000000000011180 ffff89a6fa3044c0 ffff89a6f6396500
Apr 18 18:52:32 hostname kernel: Call Trace:
Apr 18 18:52:32 hostname kernel: [<ffffffff8150b0ef>] schedule+0x3f/0x60
Apr 18 18:52:32 hostname kernel: [<ffffffff81367025>] scsi_block_when_processing_errors+0xd5/0x100
Apr 18 18:52:32 hostname kernel: [<ffffffff81091590>] ? wake_up_bit+0x40/0x40
Apr 18 18:52:32 hostname kernel: [<ffffffff81363bf8>] scsi_nonblockable_ioctl+0x78/0xe0
Apr 18 18:52:32 hostname kernel: [<ffffffffa011bc7c>] sd_ioctl+0x7c/0x110 [sd_mod]
Apr 18 18:52:32 hostname kernel: [<ffffffff812431f8>] __blkdev_driver_ioctl+0x28/0x30
Apr 18 18:52:32 hostname kernel: [<ffffffff812436ce>] blkdev_ioctl+0x1fe/0x6e0
Apr 18 18:52:32 hostname kernel: [<ffffffff811a102c>] block_ioctl+0x3c/0x40
Apr 18 18:52:32 hostname kernel: [<ffffffff8117effc>] do_vfs_ioctl+0x8

 

After enabling extended logging on the HBA, we can see the FC port state oscillating between ONLINE and LOST every few seconds.:

 

Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-3841:7: Port down status: port-state=0x4.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-287d:7: FCPort state transitioned from ONLINE to LOST - portid=01c500.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-2867:7: Port login retry 50060e8007df9901, id = 0x0001 retry cnt=30.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-3822:7: FCP command status: 0x29-0x0 (0xe0000) nexus=7:0:87 portid=01c500 oxid=0x0 cdb=2a000426858000001000 len=0x2000 rsp_info=0x0 resid=0x0 fw_resid=0x0.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-3841:7: Port down status: port-state=0x3.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-3822:7: FCP command status: 0x29-0x0 (0xe0000) nexus=7:0:14 portid=01c500 oxid=0x0 cdb=2a000d8340c000002000 len=0x4000 rsp_info=0x0 resid=0x0 fw_resid=0x0.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-480d:7: Relogin scheduled.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-480e:7: Relogin end.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-4800:7: DPC handler sleeping.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-2872:7: Async-login - hdl=213, loopid=1 portid=01c500 retries=29.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-4800:7: DPC handler sleeping.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-5836:7: Async-login complete - hdl=213 portid=01c500 iop0=12.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-287d:7: FCPort state transitioned from LOST to ONLINE - portid=01c500.
Apr 20 03:20:53 hostname kernel: qla2xxx [0000:20:00.0]-4800:7: DPC handler sleeping.

 

Checking with the vmstat(8) report from OSWatcher  shows a large number of blocked processes:

 

zzz ***Tue Apr 18 18:52:09 PDT 2017
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
7 1801 0 619694912 1164876 44046568 0 0 14 1 0 0 0 0 99 0 0
8 1873 0 619367680 1164876 44048508 0 0 1897 3928 74072 110771 2 2 63 32 0                      ==>High number of blocked process
7 1949 0 619044480 1164876 44049676 0 0 1816 52 66119 100817 2 1 63 34 0
zzz ***Tue Apr 18 18:52:19 PDT 2017
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
6 2517 0 616770048 1164880 44049832 0 0 14 1 0 0 0 0 99 0 0
4 2574 0 616711296 1164880 44051388 0 0 1743 0 59280 94987 1 2 62 36 0
4 2631 0 616657408 1164880 44052816 0 0 1530 56 59751 94591 1 1 62 36 0
zzz ***Tue Apr 18 18:52:29 PDT 2017
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 3096 0 615147968 1164880 44052464 0 0 14 1 0 0 0 0 99 0 0
7 3182 0 614754944 1164880 44054096 0 0 2410 0 73300 104292 2 2 61 35 0
1 3268 0 614536832 1164880 44055640 0 0 2118 164 62354 99420 2 1 61 37 0

 

 

SLCG1000_9_25a9 (360060e8007df99000030df99000025a9) dm-400 HITACHI,OPEN-V
size=544G features='1 queue_if_no_path' hwhandler='0' wp=rw                                  =====>Queue if no path feature enabled
'-+- policy='round-robin 0' prio=0 status=active
|- 7:0:0:395 sdbax 71:1424 active undef running
|- 9:0:0:395 sdbdk 131:1440 active undef running
|- 11:0:0:395 sdbfx 135:1456 active undef running
'- 13:0:0:395 sdbik 67:1728 active undef running
SLCG1000_9_24ef (360060e8007df99000030df99000024ef) dm-348 HITACHI,OPEN-V
size=544G features='1 queue_if_no_path' hwhandler='0' wp=rw
'-+- policy='round-robin 0' prio=0 status=active
|- 7:0:0:343 sdayx 68:1360 active undef running
|- 9:0:0:343 sdbbk 128:1376 active undef running
|- 11:0:0:343 sdbdx 132:1392 active undef running
'- 13:0:0:343 sdbgk 8:1664 active undef running

 

Cause

To view full details, sign in with your My Oracle Support account.

Don't have a My Oracle Support account? Click to get started!


In this Document
Symptoms
Cause
Solution


My Oracle Support provides customers with access to over a million knowledge articles and a vibrant support community of peers and Oracle experts.