
NVMe Timeout Errors/Abort Reset Resulted in Server Crash (Doc ID 2382645.1)

Last updated on MARCH 24, 2023

Applies to:

Linux OS - Version Oracle Linux 5.4 and later
Oracle Cloud Infrastructure - Version N/A and later
Linux x86-64

Symptoms

The system crashed with the call traces below, which indicate that the NVMe device became unreachable and that its removal led to a kernel panic.

Call trace:

PID: 4656 TASK: ffff880ee9850000 CPU: 7 COMMAND: "nvme2"
#0 [ffff880ee7dd3c28] __schedule at ffffffff81699270
#1 [ffff880ee7dd3c80] schedule at ffffffff8169993e
#2 [ffff880ee7dd3ca0] blk_mq_freeze_queue_wait at ffffffff812f51bd
#3 [ffff880ee7dd3d00] blk_mq_freeze_queue at ffffffff812f710e
#4 [ffff880ee7dd3d20] blk_cleanup_queue at ffffffff812ea822
#5 [ffff880ee7dd3d60] nvme_ns_remove at ffffffffa002e645 [nvme]
#6 [ffff880ee7dd3d80] nvme_dev_remove at ffffffffa002e720 [nvme]
#7 [ffff880ee7dd3da0] nvme_remove at ffffffffa002e7c8 [nvme]
#8 [ffff880ee7dd3dc0] pci_device_remove at ffffffff8135eec6
#9 [ffff880ee7dd3df0] __device_release_driver at ffffffff8145cf57
#10 [ffff880ee7dd3e10] device_release_driver at ffffffff8145d0ed
#11 [ffff880ee7dd3e30] pci_stop_bus_device at ffffffff81357e6c
#12 [ffff880ee7dd3e60] pci_stop_and_remove_bus_device at ffffffff81357ff6
#13 [ffff880ee7dd3e80] pci_stop_and_remove_bus_device_locked at ffffffff8135802e
#14 [ffff880ee7dd3ea0] nvme_remove_dead_ctrl at ffffffffa002e50e [nvme]
#15 [ffff880ee7dd3ec0] kthread at ffffffff810a46be
#16 [ffff880ee7dd3f50] ret_from_fork at ffffffff8169df62

Logs:

[ 396.634130] nvme 0000:96:00.0: Abort status:0 result:1
[...]
[ 396.638885] nvme 0000:96:00.0: Abort status:0 result:1
[ 426.621567] nvme 0000:96:00.0: Timeout I/O 50 QID 6
[ 426.621877] nvme 0000:96:00.0: I/O 50 QID 6 timeout, reset controller
[ 426.622149] nvme 0000:96:00.0: Timeout I/O 64 QID 6
[ 426.622415] nvme 0000:96:00.0: I/O 64 QID 6 timeout, reset controller
[ 426.622683] nvme 0000:96:00.0: Timeout I/O 65 QID 6
[...]
[ 427.626374] nvme 0000:96:00.0: Timeout I/O 772 QID 6
[ 427.626663] nvme 0000:96:00.0: Timeout I/O 773 QID 6
[ 429.047461] nvme 0000:96:00.0: Cancelling I/O 50 QID 6
[ 429.047758] nvme 0000:96:00.0: completing aborted command with status:0007
[ 429.048026] blk_update_request: I/O error, dev nvme2n1, sector 4295376872
[ 429.048295] Buffer I/O error on dev dm-0, logical block 536921853, lost async page write
[ 429.048752] nvme 0000:96:00.0: Cancelling I/O 64 QID 6
[ 429.049012] nvme 0000:96:00.0: completing aborted command with status:0007
[ 429.049282] blk_update_request: I/O error, dev nvme2n1, sector 4295376888
[ 429.049558] Buffer I/O error on dev dm-0, logical block 536921855, lost async page write
[...]
[ 429.064009] nvme 0000:96:00.0: Cancelling I/O 635 QID 6
[ 429.064253] nvme 0000:96:00.0: completing aborted command with status:0007
[...]
[ 429.070077] nvme 0000:96:00.0: Cancelling I/O 773 QID 6
[ 429.070319] nvme 0000:96:00.0: completing aborted command with status:0007
[ 429.070620] xen: registering gsi 69 triggering 0 polarity 1
[ 429.070636] Already setup the GSI :69
[ 429.171640] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 429.172227] IP: [<ffffffffa0317587>] nvme_wait_ready+0x57/0xf0 [nvme]
[ 429.172558] PGD 0
[ 429.172873] Oops: 0000 [#1] SMP
[ 429.173269] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd bnx2fc fcoe libfcoe libfc scsi_transport_fc 8021q mrp garp bridge stp llc bonding dm_multipath iTCO_wdt iTCO_vendor_support pcspkr shpchp sb_edac edac_core i2c_i801 i2c_core cdc_ether usbnet mii lpc_ich mfd_core sg mlx5_ib ib_core ib_addr mlx5_core ipmi_devintf ipmi_si ipmi_msghandler ext4 jbd2 mbcache sr_mod cdrom sd_mod mxm_wmi nvme ahci libahci ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel megaraid_sas wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_mirror
[ 429.180908] dm_region_hash dm_log dm_mod
[ 429.181291] CPU: 9 PID: 158 Comm: kworker/9:1 Not tainted 4.1.12-61.1.10.el6uek.x86_64 #2
[ 429.181720] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38030200 03/21/2016
[ 429.182162] Workqueue: events nvme_probe_work [nvme]
[ 429.182486] task: ffff880477910e00 ti: ffff88047799c000 task.ti: ffff88047799c000
[ 429.182915] RIP: e030:[<ffffffffa0317587>] [<ffffffffa0317587>] nvme_wait_ready+0x57/0xf0 [nvme]
[ 429.183402] RSP: e02b:ffff88047799fd08 EFLAGS: 00010216
[ 429.183646] RAX: 0000000000000000 RBX: ffff88047782a000 RCX: 0000000080000200
[ 429.183898] RDX: 0000000000000000 RSI: 0000000080000200 RDI: 0000000080000200
[ 429.184171] RBP: ffff88047799fd38 R08: ffff88047799c000 R09: 0000000000000001
[ 429.184423] R10: 0000000000007ff0 R11: 0000000000000001 R12: 0000000000000000
[ 429.184673] R13: ffff880477910e00 R14: 00000001000234e5 R15: 0000000000000000
[ 429.184929] FS: 0000000000000000(0000) GS:ffff880487040000(0000) knlGS:0000000000000000
[ 429.185357] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 429.185632] CR2: 000000000000001c CR3: 0000000001a8a000 CR4: 0000000000042660
[ 429.185883] Stack:
[ 429.186116] ffff88047799fd18 ffff88047782a000 ffff88047782a190 00f00020500103ff
[ 429.186741] ffff88048705b800 0000000000000000 ffff88047799fd78 ffffffffa031bad5
[ 429.187392] ffff88047799fd78 ffffffffa0318251 ffff88047782a000 ffff88047782a000
[ 429.188016] Call Trace:
[ 429.188255] [<ffffffffa031bad5>] nvme_configure_admin_queue+0x85/0x210 [nvme]
[ 429.188700] [<ffffffffa0318251>] ? nvme_dev_map+0xe1/0x330 [nvme]
[ 429.188949] [<ffffffffa031cb2b>] nvme_probe_work+0x3b/0x330 [nvme]
[ 429.189199] [<ffffffff810ac0b8>] ? finish_task_switch+0x78/0x1c0
[ 429.189448] [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[ 429.189695] [<ffffffff8109f500>] worker_thread+0x120/0x480
[ 429.189941] [<ffffffff816c69a9>] ? __schedule+0x309/0x880
[ 429.190188] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[ 429.190453] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[ 429.190701] [<ffffffff810a46de>] kthread+0xce/0xf0
[ 429.190946] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[ 429.191197] [<ffffffff816cb662>] ret_from_fork+0x42/0x70
[ 429.191442] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[ 429.191693] Code: 00 44 0f b6 e2 48 83 c6 01 48 69 f6 e8 03 00 00 65 4c 8b 2c 25 80 b8 00 00 48 d1 ee 4e 8d 34 36 0f 1f 40 00 48 8b 83 48 01 00 00 <8b> 40 1c 83 e0 01 44 39 e0 74 66 bf 64 00 00 00 e8 e4 7b dd e0
[ 429.195632] RIP [<ffffffffa0317587>] nvme_wait_ready+0x57/0xf0 [nvme]
[ 429.195933] RSP <ffff88047799fd08>
[ 429.196170] CR2: 000000000000001c

Changes

 

Cause

To view full details, sign in with your My Oracle Support account.


