Exadata Cell Node Crashed at "memcpy_erms+0x6/0x10" After a Megaraid HBA Firmware Crash due to Disk Failure (Doc ID 2581914.1)

Last updated on AUGUST 30, 2019

Applies to:

Linux OS - Version Oracle Linux 7.6 and later
Linux x86-64

Symptoms

The Exadata cell node crashed after a disk failure, and the vmcore shows the following backtrace and kernel log:

> bt
PID: 48089 TASK: ffff882fb65b9c00 CPU: 36 COMMAND: "mrdiagd"
#0 [ffff88099e963930] machine_kexec at ffffffff81062a6b
#1 [ffff88099e9639a0] crash_kexec at ffffffff8111cff2
#2 [ffff88099e963a70] oops_end at ffffffff8101b938
#3 [ffff88099e963aa0] no_context at ffffffff81746b7c
#4 [ffff88099e963b00] __bad_area_nosemaphore at ffffffff81746c64
#5 [ffff88099e963b50] bad_area_nosemaphore at ffffffff81746dd0
#6 [ffff88099e963b60] __do_page_fault at ffffffff810723a6
#7 [ffff88099e963bd0] do_page_fault at ffffffff810727b0
#8 [ffff88099e963c10] page_fault at ffffffff8175aeef
[exception RIP: memcpy_erms+6]
RIP: ffffffff81358456 RSP: ffff88099e963cc0 RFLAGS: 00010086
RAX: ffff8822c826f000 RBX: ffff8817ca1704c8 RCX: 0000000000000eff
RDX: 0000000000000fff RSI: ffffc900323fc000 RDI: ffff8822c826f100
RBP: ffff88099e963cf8 R8: ffff8817ca1704d8 R9: 0000000000000000
R10: 0000000000001000 R11: 0000000000000293 R12: ffff8817ca171904
R13: 0000000000000292 R14: ffff8822c826f000 R15: 0000000000000fff
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff88099e963cc0] megasas_fw_crash_buffer_show at ffffffffa00fa6aa [megaraid_sas]
#10 [ffff88099e963d00] dev_attr_show at ffffffff8149e913
#11 [ffff88099e963d30] sysfs_kf_seq_show at ffffffff812a134f
#12 [ffff88099e963d50] kernfs_seq_show at ffffffff8129f8d9
#13 [ffff88099e963d60] seq_read at ffffffff81243300
#14 [ffff88099e963de0] kernfs_fop_read at ffffffff812a00e5
#15 [ffff88099e963e30] __vfs_read at ffffffff8121d4ca
#16 [ffff88099e963ec0] vfs_read at ffffffff8121dbe6
#17 [ffff88099e963f00] sys_read at ffffffff8121eb65
#18 [ffff88099e963f50] system_call_fastpath at ffffffff8175579e
RIP: 00007f78cd3d06fd RSP: 00007ffe755b3ae0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f78cd3d06fd
RDX: 0000000000000fff RSI: 0000000001c6ac70 RDI: 0000000000000012
RBP: 00007ffe755b3c50 R8: 00007f78cd7f2740 R9: 00007f78cc61f20d
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000405080
R13: 00007ffe755b3ec0 R14: 00000000022c85d0 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
> log
[ 3509.884284] megaraid_sas 0000:5e:00.0: 2826 (617304407s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 1b(e0xfc/s5) at 37b955800
[ 3540.805684] sd 8:2:5:0: [sdh] tag#28 task abort called for scmd(ffff882fcd7d08c0)
[ 3540.805691] sd 8:2:5:0: [sdh] tag#28 CDB: Read(16) 88 00 00 00 00 03 7b 95 28 00 00 00 08 00 00 00
[ 3540.805695] sd 8:2:5:0: task abort: FAILED scmd(ffff882fcd7d08c0)
[ 3540.805700] sd 8:2:5:0: [sdh] tag#29 task abort called for scmd(ffff8822caf94c40)
[ 3540.805704] sd 8:2:5:0: [sdh] tag#29 CDB: Read(16) 88 00 00 00 00 03 7b 95 30 00 00 00 08 00 00 00
[ 3540.805707] sd 8:2:5:0: task abort: FAILED scmd(ffff8822caf94c40)
[ 3540.805710] sd 8:2:5:0: [sdh] tag#30 task abort called for scmd(ffff8809c6003800)
...
........
[ 3591.574961] megaraid_sas 0000:5e:00.0: 2968 (617304461s/0x0020/CRIT) - Controller encountered a fatal error and was reset
[ 3591.576672] megaraid_sas 0000:5e:00.0: 3011 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b955800
[ 3591.576719] megaraid_sas 0000:5e:00.0: 3012 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b952000
[ 3591.576753] megaraid_sas 0000:5e:00.0: 3013 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b955000
[ 3591.576789] megaraid_sas 0000:5e:00.0: 3014 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b954800
[ 3591.576826] megaraid_sas 0000:5e:00.0: 3015 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b954000
[ 3591.576862] megaraid_sas 0000:5e:00.0: 3016 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b953800
[ 3591.576910] megaraid_sas 0000:5e:00.0: 3017 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b953000
[ 3591.576947] megaraid_sas 0000:5e:00.0: 3018 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b952800
[ 3591.576982] megaraid_sas 0000:5e:00.0: 3019 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b956000
[ 3591.577019] megaraid_sas 0000:5e:00.0: 3020 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b956800
[ 3591.577056] megaraid_sas 0000:5e:00.0: 3021 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b957800
[ 3591.577094] megaraid_sas 0000:5e:00.0: 3022 (617304475s/0x0002/FATAL) - Puncturing bad block on PD 1b(e0xfc/s5) at 37b957000
[ 3591.577228] megaraid_sas 0000:5e:00.0: scanning for scsi8...
[ 3591.693438] megaraid_sas 0000:5e:00.0: OCR done for IO timeout case
[ 3591.880480] paging request
[ 3591.883568] at ffffc900323fc000
[ 3591.885750] IP: [<ffffffff81358456>] memcpy_erms+0x6/0x10
[ 3591.891894] PGD 17e01ad067 PUD 2fde402067 PMD 2fcae45067 PTE 0
[ 3591.898577] Oops: 0000 [#1] SMP
[ 3591.902267] Modules linked in: ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables iptable_filter ipmi_poweroff ipmi_ssif 8021q garp mrp stp llc rds_rdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_sa ib_mad ib_core ib_addr rds ksplice_f1ccb11a_vmlinux_new(O) ksplice_f1ccb11a(O) ksplice_cs87ajef(O) ksplice_78es7dsn(O) ksplice_e1qlw09e(O) ksplice_sx0v1lwp_vmlinux_new(O) ksplice_sx0v1lwp(O) ksplice_b990zaf2_vmlinux_new(O) ksplice_b990zaf2(O) ksplice_cqze8g21_vmlinux_new(O) ksplice_cqze8g21(O) vfat fat coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg ipmi_devintf shpchp wmi ipmi_si ipmi_msghandler acpi_cpufreq acpi_pad binfmt_misc ip_tables ext4 mbcache2 jbd2 raid1
[ 3591.983552] raid0 sd_mod igb ptp pps_core i2c_algo_bit nvme i2c_core crc32c_intel megaraid_sas nvme_core mlx4_core dca ahci libahci libata [last unloaded: ksplice_f1ccb11a_vmlinux_old]
[ 3592.015613] CPU: 36 PID: 48089 Comm: mrdiagd Tainted: G O 4.1.12-124.24.3.el7uek.x86_64 #2
[ 3592.033750] Hardware name: Oracle Corporation ORACLE SERVER X7-2L/ASM, MB MECH, X7-2L, BIOS 42050000 11/30/2018
[ 3592.052551] task: ffff882fb65b9c00 ti: ffff88099e960000 task.ti: ffff88099e960000
[ 3592.068362] RIP: 0010:[<ffffffff81358456>] [<ffffffff81358456>] memcpy_erms+0x6/0x10
[ 3592.084575] RSP: 0018:ffff88099e963cc0 EFLAGS: 00010086
[ 3592.097691] RAX: ffff8822c826f000 RBX: ffff8817ca1704c8 RCX: 0000000000000eff
[ 3592.112785] RDX: 0000000000000fff RSI: ffffc900323fc000 RDI: ffff8822c826f100
[ 3592.127785] RBP: ffff88099e963cf8 R08: ffff8817ca1704d8 R09: 0000000000000000
[ 3592.142698] R10: 0000000000001000 R11: 0000000000000293 R12: ffff8817ca171904
[ 3592.157862] R13: 0000000000000292 R14: ffff8822c826f000 R15: 0000000000000fff
[ 3592.172786] FS: 00007f78cd7f2740(0000) GS:ffff882fdec00000(0000) knlGS:0000000000000000
[ 3592.188448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3592.201355] CR2: ffffc900323fc000 CR3: 00000009c1f6a000 CR4: 0000000000360670
[ 3592.215704] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3592.229922] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3592.244058] Stack:
[ 3592.252189] ffffffffa00fa6aa 000000002ab647f2 ffffffffa0117400 ffffffff817e7bb0
[ 3592.266423] 0000000000000001 ffff88099e963f18 ffff882fcee57140 ffff88099e963d28
[ 3592.280588] ffffffff8149e913 ffff88099e963d38 ffffffff81752ae6 0000000000000000
[ 3592.294555] Call Trace:
[ 3592.302768] [] ? megasas_fw_crash_buffer_show+0x8a/0x140 [megaraid_sas]
[ 3592.317594] [] dev_attr_show+0x23/0x60
[ 3592.329265] [] ? mutex_lock+0x16/0x3f
[ 3592.341148] [] sysfs_kf_seq_show+0xcf/0x1f0
[ 3592.353446] [] kernfs_seq_show+0x29/0x30
[ 3592.365493] [] seq_read+0x100/0x3f0
[ 3592.377216] [] kernfs_fop_read+0x115/0x180
[ 3592.389596] [] ? audit_filter_rules.isra.9+0x67d/0xf20
[ 3592.403295] [] __vfs_read+0x3a/0x110
[ 3592.415224] [] ? security_file_permission+0x94/0xb0
[ 3592.428670] [] ? rw_verify_area+0x56/0xe0
[ 3592.441301] [] vfs_read+0x86/0x140
[ 3592.453221] [] SyS_read+0x55/0xd0
[ 3592.464925] [] ? system_call_after_swapgs+0xe9/0x190
[ 3592.479450] [] ? system_call_after_swapgs+0xe2/0x190
[ 3592.494173] [] system_call_fastpath+0x18/0xd8
[ 3592.507686] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[ 3592.544269] RIP [<ffffffff81358456>] memcpy_erms+0x6/0x10
[ 3592.558132] RSP <ffff88099e963cc0>
[ 3592.570331] CR2: ffffc900323fc000
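
The register values in the exception frame can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the original note. It assumes the usual behavior of memcpy_erms (RAX holds the original destination, RCX counts the bytes still to copy, RDX holds the requested length) and a 4 KiB page size. Under those assumptions, the copy moved 0x100 bytes and then faulted exactly when the source pointer crossed onto the page at the CR2 address, for which the log reports "PTE 0".

```python
# Register values copied verbatim from the exception frame and oops output above.
RAX = 0xffff8822c826f000  # original destination (memcpy return value)
RDI = 0xffff8822c826f100  # current destination pointer
RSI = 0xffffc900323fc000  # current source pointer
RCX = 0x0eff              # bytes still to copy
RDX = 0x0fff              # total length requested by the caller
CR2 = 0xffffc900323fc000  # faulting address

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages

# Bytes copied before the fault, computed from the counter.
copied = RDX - RCX

# Address at which the source buffer read began.
source_start = RSI - copied

# Consistency checks: the destination advanced by the same amount,
# the fault address is the current source pointer, and it is page aligned.
assert copied == RDI - RAX
assert RSI == CR2
assert CR2 % PAGE_SIZE == 0

print(f"bytes copied before fault: {copied:#x}")
print(f"source read started at:    {source_start:#x}")
print(f"faulting page starts at:   {CR2:#x}")
```

The result is consistent with a source buffer whose first page (ending at ffffc900323fbfff) was mapped while the following page was not. This is an observation from the dump, not a statement of the root cause.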

Changes

 

Cause




