NVMe Timeout Errors/Abort Reset Resulted in Server Crash
(Doc ID 2382645.1)
Last updated on MARCH 24, 2023
Applies to:
Linux OS - Version Oracle Linux 5.4 and later
Oracle Cloud Infrastructure - Version N/A and later
Linux x86-64
Symptoms
The system crashed with the call traces below, which indicate that the NVMe device became unreachable and the resulting teardown led to a kernel panic.
Call trace:
PID: 4656  TASK: ffff880ee9850000  CPU: 7  COMMAND: "nvme2"
 #0 [ffff880ee7dd3c28] __schedule at ffffffff81699270
 #1 [ffff880ee7dd3c80] schedule at ffffffff8169993e
 #2 [ffff880ee7dd3ca0] blk_mq_freeze_queue_wait at ffffffff812f51bd
 #3 [ffff880ee7dd3d00] blk_mq_freeze_queue at ffffffff812f710e
 #4 [ffff880ee7dd3d20] blk_cleanup_queue at ffffffff812ea822
 #5 [ffff880ee7dd3d60] nvme_ns_remove at ffffffffa002e645 [nvme]
 #6 [ffff880ee7dd3d80] nvme_dev_remove at ffffffffa002e720 [nvme]
 #7 [ffff880ee7dd3da0] nvme_remove at ffffffffa002e7c8 [nvme]
 #8 [ffff880ee7dd3dc0] pci_device_remove at ffffffff8135eec6
 #9 [ffff880ee7dd3df0] __device_release_driver at ffffffff8145cf57
#10 [ffff880ee7dd3e10] device_release_driver at ffffffff8145d0ed
#11 [ffff880ee7dd3e30] pci_stop_bus_device at ffffffff81357e6c
#12 [ffff880ee7dd3e60] pci_stop_and_remove_bus_device at ffffffff81357ff6
#13 [ffff880ee7dd3e80] pci_stop_and_remove_bus_device_locked at ffffffff8135802e
#14 [ffff880ee7dd3ea0] nvme_remove_dead_ctrl at ffffffffa002e50e [nvme]
#15 [ffff880ee7dd3ec0] kthread at ffffffff810a46be
#16 [ffff880ee7dd3f50] ret_from_fork at ffffffff8169df62
Logs:
[ 396.634130] nvme 0000:96:00.0: Abort status:0 result:1
[...]
[ 396.638885] nvme 0000:96:00.0: Abort status:0 result:1
[ 426.621567] nvme 0000:96:00.0: Timeout I/O 50 QID 6
[ 426.621877] nvme 0000:96:00.0: I/O 50 QID 6 timeout, reset controller
[ 426.622149] nvme 0000:96:00.0: Timeout I/O 64 QID 6
[ 426.622415] nvme 0000:96:00.0: I/O 64 QID 6 timeout, reset controller
[ 426.622683] nvme 0000:96:00.0: Timeout I/O 65 QID 6
[...]
[ 427.626374] nvme 0000:96:00.0: Timeout I/O 772 QID 6
[ 427.626663] nvme 0000:96:00.0: Timeout I/O 773 QID 6
[ 429.047461] nvme 0000:96:00.0: Cancelling I/O 50 QID 6
[ 429.047758] nvme 0000:96:00.0: completing aborted command with status:0007
[ 429.048026] blk_update_request: I/O error, dev nvme2n1, sector 4295376872
[ 429.048295] Buffer I/O error on dev dm-0, logical block 536921853, lost async page write
[ 429.048752] nvme 0000:96:00.0: Cancelling I/O 64 QID 6
[ 429.049012] nvme 0000:96:00.0: completing aborted command with status:0007
[ 429.049282] blk_update_request: I/O error, dev nvme2n1, sector 4295376888
[ 429.049558] Buffer I/O error on dev dm-0, logical block 536921855, lost async page write
[...]
[ 429.064009] nvme 0000:96:00.0: Cancelling I/O 635 QID 6
[ 429.064253] nvme 0000:96:00.0: completing aborted command with status:0007
[...]
[ 429.070077] nvme 0000:96:00.0: Cancelling I/O 773 QID 6
[ 429.070319] nvme 0000:96:00.0: completing aborted command with status:0007
[ 429.070620] xen: registering gsi 69 triggering 0 polarity 1
[ 429.070636] Already setup the GSI :69
[ 429.171640] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 429.172227] IP: [<ffffffffa0317587>] nvme_wait_ready+0x57/0xf0 [nvme]
[ 429.172558] PGD 0
[ 429.172873] Oops: 0000 [#1] SMP
[ 429.173269] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd bnx2fc fcoe libfcoe libfc scsi_transport_fc 8021q mrp garp bridge stp llc bonding dm_multipath iTCO_wdt iTCO_vendor_support pcspkr shpchp sb_edac edac_core i2c_i801 i2c_core cdc_ether usbnet mii lpc_ich mfd_core sg mlx5_ib ib_core ib_addr mlx5_core ipmi_devintf ipmi_si ipmi_msghandler ext4 jbd2 mbcache sr_mod cdrom sd_mod mxm_wmi nvme ahci libahci ixgbe dca ptp pps_core vxlan udp_tunnel ip6_udp_tunnel megaraid_sas wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_mirror
[ 429.180908]  dm_region_hash dm_log dm_mod
[ 429.181291] CPU: 9 PID: 158 Comm: kworker/9:1 Not tainted 4.1.12-61.1.10.el6uek.x86_64 #2
[ 429.181720] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38030200 03/21/2016
[ 429.182162] Workqueue: events nvme_probe_work [nvme]
[ 429.182486] task: ffff880477910e00 ti: ffff88047799c000 task.ti: ffff88047799c000
[ 429.182915] RIP: e030:[<ffffffffa0317587>] [<ffffffffa0317587>] nvme_wait_ready+0x57/0xf0 [nvme]
[ 429.183402] RSP: e02b:ffff88047799fd08 EFLAGS: 00010216
[ 429.183646] RAX: 0000000000000000 RBX: ffff88047782a000 RCX: 0000000080000200
[ 429.183898] RDX: 0000000000000000 RSI: 0000000080000200 RDI: 0000000080000200
[ 429.184171] RBP: ffff88047799fd38 R08: ffff88047799c000 R09: 0000000000000001
[ 429.184423] R10: 0000000000007ff0 R11: 0000000000000001 R12: 0000000000000000
[ 429.184673] R13: ffff880477910e00 R14: 00000001000234e5 R15: 0000000000000000
[ 429.184929] FS: 0000000000000000(0000) GS:ffff880487040000(0000) knlGS:0000000000000000
[ 429.185357] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 429.185632] CR2: 000000000000001c CR3: 0000000001a8a000 CR4: 0000000000042660
[ 429.185883] Stack:
[ 429.186116]  ffff88047799fd18 ffff88047782a000 ffff88047782a190 00f00020500103ff
[ 429.186741]  ffff88048705b800 0000000000000000 ffff88047799fd78 ffffffffa031bad5
[ 429.187392]  ffff88047799fd78 ffffffffa0318251 ffff88047782a000 ffff88047782a000
[ 429.188016] Call Trace:
[ 429.188255]  [<ffffffffa031bad5>] nvme_configure_admin_queue+0x85/0x210 [nvme]
[ 429.188700]  [<ffffffffa0318251>] ? nvme_dev_map+0xe1/0x330 [nvme]
[ 429.188949]  [<ffffffffa031cb2b>] nvme_probe_work+0x3b/0x330 [nvme]
[ 429.189199]  [<ffffffff810ac0b8>] ? finish_task_switch+0x78/0x1c0
[ 429.189448]  [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[ 429.189695]  [<ffffffff8109f500>] worker_thread+0x120/0x480
[ 429.189941]  [<ffffffff816c69a9>] ? __schedule+0x309/0x880
[ 429.190188]  [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[ 429.190453]  [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[ 429.190701]  [<ffffffff810a46de>] kthread+0xce/0xf0
[ 429.190946]  [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[ 429.191197]  [<ffffffff816cb662>] ret_from_fork+0x42/0x70
[ 429.191442]  [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[ 429.191693] Code: 00 44 0f b6 e2 48 83 c6 01 48 69 f6 e8 03 00 00 65 4c 8b 2c 25 80 b8 00 00 48 d1 ee 4e 8d 34 36 0f 1f 40 00 48 8b 83 48 01 00 00 <8b> 40 1c 83 e0 01 44 39 e0 74 66 bf 64 00 00 00 e8 e4 7b dd e0
[ 429.195632] RIP  [<ffffffffa0317587>] nvme_wait_ready+0x57/0xf0 [nvme]
[ 429.195933] RSP <ffff88047799fd08>
[ 429.196170] CR2: 000000000000001c
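Note on the crash signature: the faulting address 000000000000001c matches the offset of the NVMe Controller Status register (CSTS, offset 0x1C of BAR0), and the faulting instruction in the Code dump (<8b> 40 1c, a 32-bit read at offset 0x1c of RAX, with RAX = 0) sits inside nvme_wait_ready(). Together with the first trace, where nvme_remove_dead_ctrl is tearing the controller down, and the oops workqueue running nvme_probe_work, this suggests the ready-poll read CSTS through a BAR mapping that was already gone. The following is a minimal, hypothetical C sketch of that access pattern only; the structure, field names, and function name are simplified for illustration and are not the actual driver source.

#include <stdint.h>

/* Simplified NVMe controller register layout (per the NVMe register map). */
struct nvme_bar {
    uint64_t cap;    /* 0x00: Controller Capabilities */
    uint32_t vs;     /* 0x08: Version */
    uint32_t intms;  /* 0x0c: Interrupt Mask Set */
    uint32_t intmc;  /* 0x10: Interrupt Mask Clear */
    uint32_t cc;     /* 0x14: Controller Configuration */
    uint32_t rsvd;   /* 0x18: Reserved */
    uint32_t csts;   /* 0x1c: Controller Status -- the faulting offset */
};

struct nvme_dev {
    volatile struct nvme_bar *bar;  /* MMIO mapping of BAR0 */
};

/*
 * Hypothetical sketch: poll CSTS.RDY until the controller reaches the
 * requested ready state.  The real nvme_wait_ready() also sleeps between
 * polls and enforces a timeout derived from CAP.TO.
 */
int nvme_wait_ready_sketch(struct nvme_dev *dev, int enabled)
{
    uint32_t bit = enabled ? 1u : 0u;

    /*
     * If a concurrent removal path (nvme_remove_dead_ctrl in the first
     * trace) has already released the BAR mapping, dev->bar is NULL and
     * this read faults at address 0x1c, matching CR2 in the oops above.
     */
    while ((dev->bar->csts & 1u) != bit)
        ;  /* omitted: sleep and timeout handling */

    return 0;
}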
Changes
Cause