Exalogic Compute Node Hung with "rcu_sched_state detected stall" Messages on the Console

(Doc ID 2348632.1)

Last updated on January 18, 2018

Applies to:

Oracle VM - Version 3.2.11 and later
Linux x86-64

Symptoms

An Exalogic compute node hung and could not be reached. The virtual machines running on that compute node were also unreachable.

The following messages were seen on the ILOM console up until the node was crashed to collect a vmcore:

Mon Oct 23 11:58:52 2017 | mlx4_core 0000:13:00.0: Received reset from slave:1
Mon Oct 23 11:58:52 2017 | INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0 11} (detected by 2, t=60002 jiffies)
Mon Oct 23 11:58:52 2017 | NMI backtrace for cpu 2
Mon Oct 23 11:58:52 2017 | CPU 2
Mon Oct 23 11:58:52 2017 | Modules linked in: iptable_filter ip_tables mptctl mptbase xen_pciback ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm dm_nfs xen_blkback xen_netback xen_gntdev xen_evtchn i2c_dev i2c_core ipmi_devintf ipmi_si nfs fscache auth_rpcgss nfs_acl lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp rds_rdma ib_sdp ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp scsi_tgt rds ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 mlx4_vnic mlx4_vnic_helper mlx4_ib ib_sa ib_mad ib_core mlx4_core xenfs xen_privcmd ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport cdc_ether usbnet mii wmi ixgbe hwmon dca ghes pcspkr hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libahci sg shpchp megaraid_sas sd_mod crc_t10dif ext3 jbd mbcache [last unloaded: ocfs2_dlm]
Mon Oct 23 11:58:52 2017 |
Mon Oct 23 11:58:52 2017 | Pid: 0, comm: swapper Not tainted 2.6.39-400.294.1.el5uek #1 Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U
Mon Oct 23 11:58:52 2017 | RIP: e030:[] [] xen_hypercall_vcpu_op+0xa/0x20
Mon Oct 23 11:58:52 2017 | RSP: e02b:ffff880365843c00 EFLAGS: 00000046
Mon Oct 23 11:58:52 2017 | RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffffffff8100130a
Mon Oct 23 11:58:52 2017 | RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000000000b
Mon Oct 23 11:58:52 2017 | RBP: ffff880365843c18 R08: 0000000000000000 R09: ffffffff819b1840
Mon Oct 23 11:58:52 2017 | R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff819b1840
Mon Oct 23 11:58:52 2017 | R13: 0000000000000005 R14: ffffffff819b1840 R15: ffff880365843e28
Mon Oct 23 11:58:52 2017 | FS: 00007fb4988596e0(0000) GS:ffff880365840000(0000) knlGS:0000000000000000
Mon Oct 23 11:58:52 2017 | CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
Mon Oct 23 11:58:52 2017 | CR2: 00000000006b3480 CR3: 00000002aeba0000 CR4: 0000000000002660
Mon Oct 23 11:58:52 2017 | DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mon Oct 23 11:58:52 2017 | DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mon Oct 23 11:58:52 2017 | Process swapper (pid: 0, threadinfo ffff88034148c000, task ffff88034148a100)
Mon Oct 23 11:58:52 2017 | Stack:
Mon Oct 23 11:58:52 2017 | 0000000000000001 0000000000000002 ffffffff812fb564 ffff880365843c48
Mon Oct 23 11:58:52 2017 | ffffffff81011f8e 0000000000000000 0000000000000010 ffffffff817c8600
Mon Oct 23 11:58:52 2017 | ffffffff817d8b00 ffff880365843c58 ffffffff810124e5 ffff880365843c78
Mon Oct 23 11:58:52 2017 | Call Trace:
Mon Oct 23 11:58:52 2017 |
Mon Oct 23 11:58:52 2017 | [] ? xen_send_IPI_one+0x44/0x70
Mon Oct 23 11:58:52 2017 | [] __xen_send_IPI_mask+0x2e/0x50
Mon Oct 23 11:58:52 2017 | [] xen_send_IPI_all+0x65/0x90
Mon Oct 23 11:58:52 2017 | [] arch_trigger_all_cpu_backtrace+0x6c/0xb0
Mon Oct 23 11:58:52 2017 | [] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Mon Oct 23 11:58:52 2017 | [] print_other_cpu_stall+0x145/0x160
Mon Oct 23 11:58:52 2017 | [] check_cpu_stall+0xc0/0xe0
Mon Oct 23 11:58:52 2017 | [] __rcu_pending+0x30/0x140
Mon Oct 23 11:58:52 2017 | [] rcu_pending+0x37/0x90
Mon Oct 23 11:58:52 2017 | [] rcu_check_callbacks+0x85/0xa0
Mon Oct 23 11:58:52 2017 | [] update_process_times+0x46/0x90
Mon Oct 23 11:58:52 2017 | [] tick_sched_timer+0x66/0xd0
Mon Oct 23 11:58:52 2017 | [] ? tick_clock_notify+0x60/0x60
Mon Oct 23 11:58:52 2017 | [] __run_hrtimer+0x83/0x1e0
Mon Oct 23 11:58:52 2017 | [] hrtimer_interrupt+0xe6/0x240
Mon Oct 23 11:58:52 2017 | [] ? notifier_call_chain+0x4a/0x90
Mon Oct 23 11:58:52 2017 | [] xen_timer_interrupt+0x27/0x40
Mon Oct 23 11:58:52 2017 | [] handle_irq_event_percpu+0x5d/0x1a0
Mon Oct 23 11:58:52 2017 | [] handle_percpu_irq+0x48/0x70
Mon Oct 23 11:58:52 2017 | [] __xen_evtchn_do_upcall+0x306/0x310
Mon Oct 23 11:58:52 2017 | [] xen_evtchn_do_upcall+0x2f/0x50
Mon Oct 23 11:58:52 2017 | [] xen_do_hypervisor_callback+0x1e/0xa0
Mon Oct 23 11:58:52 2017 |
Mon Oct 23 11:58:52 2017 | [] ? xen_hypercall_sched_op+0xa/0x20
Mon Oct 23 11:58:52 2017 | [] ? xen_hypercall_sched_op+0xa/0x20
Mon Oct 23 11:58:52 2017 | [] ? xen_safe_halt+0x10/0x20
Mon Oct 23 11:58:52 2017 | [] ? default_idle+0x5b/0x170
Mon Oct 23 11:58:52 2017 | [] ? cpu_idle+0xc6/0xf0
Mon Oct 23 11:58:52 2017 | [] ? xen_irq_enable_direct_reloc+0x4/0x4
Mon Oct 23 11:58:52 2017 | [] ? cpu_bringup_and_idle+0xe/0x10
Mon Oct 23 11:58:52 2017 | Code: cc 51 41 53 50 b8 17 00 00 00 0f 05 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 18 00 00 00 0f 05
Mon Oct 23 11:58:57 2017 | 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Mon Oct 23 11:58:57 2017 | Call Trace:
Mon Oct 23 11:58:57 2017 | [] ? xen_send_IPI_one+0x44/0x70
Mon Oct 23 11:58:57 2017 | [] __xen_send_IPI_mask+0x2e/0x50
Mon Oct 23 11:58:57 2017 | [] xen_send_IPI_all+0x65/0x90
Mon Oct 23 11:58:57 2017 | [] arch_trigger_all_cpu_backtrace+0x6c/0xb0
Mon Oct 23 11:58:57 2017 | [] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Mon Oct 23 11:58:57 2017 | [] print_other_cpu_stall+0x145/0x160
Mon Oct 23 11:58:57 2017 | [] check_cpu_stall+0xc0/0xe0
Mon Oct 23 11:58:57 2017 | [] __rcu_pending+0x30/0x140
Mon Oct 23 11:58:57 2017 | [] rcu_pending+0x37/0x90
Mon Oct 23 11:58:57 2017 | [] rcu_check_callbacks+0x85/0xa0
Mon Oct 23 11:58:57 2017 | [] update_process_times+0x46/0x90
Mon Oct 23 11:58:57 2017 | [] tick_sched_timer+0x66/0xd0
Mon Oct 23 11:58:57 2017 | [] ? tick_clock_notify+0x60/0x60
Mon Oct 23 11:58:57 2017 | [] __run_hrtimer+0x83/0x1e0
Mon Oct 23 11:58:57 2017 | [] hrtimer_interrupt+0xe6/0x240
Mon Oct 23 11:58:57 2017 | [] ? notifier_call_chain+0x4a/0x90
Mon Oct 23 11:58:57 2017 | [] xen_timer_interrupt+0x27/0x40
Mon Oct 23 11:58:57 2017 | [] handle_irq_event_percpu+0x5d/0x1a0
Mon Oct 23 11:58:57 2017 | [] handle_percpu_irq+0x48/0x70
Mon Oct 23 11:58:57 2017 | [] __xen_evtchn_do_upcall+0x306/0x310
Mon Oct 23 11:58:57 2017 | [] xen_evtchn_do_upcall+0x2f/0x50
Mon Oct 23 11:58:57 2017 | [] xen_do_hypervisor_callback+0x1e/0xa0
Mon Oct 23 11:58:57 2017 | [] ? xen_hypercall_sched_op+0xa/0x20
Mon Oct 23 11:58:57 2017 | [] ? xen_hypercall_sched_op+0xa/0x20
Mon Oct 23 11:58:57 2017 | [] ? xen_safe_halt+0x10/0x20
Mon Oct 23 11:58:57 2017 | [] ? default_idle+0x5b/0x170
Mon Oct 23 11:58:57 2017 | [] ? cpu_idle+0xc6/0xf0
Mon Oct 23 11:58:57 2017 | [] ? xen_irq_enable_direct_reloc+0x4/0x4
Mon Oct 23 11:58:57 2017 | [] ? cpu_bringup_and_idle+0xe/0x10
Mon Oct 23 11:58:57 2017 | NMI backtrace for cpu 0
Mon Oct 23 11:58:57 2017 | CPU 0
Mon Oct 23 11:58:57 2017 | Modules linked in: iptable_filter ip_tables mptctl mptbase xen_pciback ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm dm_nfs xen_blkback xen_netback xen_gntdev xen_evtchn i2c_dev i2c_core ipmi_devintf ipmi_si nfs fscache auth_rpcgss nfs_acl lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp rds_rdma ib_sdp ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp scsi_tgt rds ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 mlx4_vnic mlx4_vnic_helper mlx4_ib ib_sa ib_mad ib_core mlx4_core xenfs xen_privcmd ocfs2 jbd2 ocfs2_nodemanager configfs ocfs2_stackglue video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport cdc_ether usbnet mii wmi ixgbe hwmon dca ghes pcspkr hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ahci libahci sg shpchp megaraid_sas sd_mod crc_t10dif ext3 jbd mbcache [last unloaded: ocfs2_dlm]
Mon Oct 23 11:58:57 2017 |
Mon Oct 23 11:58:57 2017 | Pid: 0, comm: swapper Not tainted 2.6.39-400.294.1.el5uek #1 Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U
Mon Oct 23 11:58:57 2017 | RIP: e030:[] [] xen_hypercall_sched_op+0xa/0x20
Mon Oct 23 11:58:57 2017 | RSP: e02b:ffff880365803910 EFLAGS: 00000002
Mon Oct 23 11:58:57 2017 | RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff810013aa
Mon Oct 23 11:58:57 2017 | RDX: 0000000000000002 RSI: ffff880365803928 RDI: 0000000000000003
Mon Oct 23 11:58:57 2017 | RBP: ffff880365803958 R08: 0000000000000000 R09: 0000000000000000
Mon Oct 23 11:58:57 2017 | R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000059
Mon Oct 23 11:58:57 2017 | R13: ffff8803388f1ca0 R14: 0000000000000000 R15: 0000000000000000
Mon Oct 23 11:58:57 2017 | FS: 00007f35836856e0(0000) GS:ffff880365800000(0000) knlGS:0000000000000000
Mon Oct 23 11:58:57 2017 | CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Mon Oct 23 11:58:57 2017 | CR2: 0000003142203080 CR3: 0000000310355000 CR4: 0000000000002660
Mon Oct 23 11:58:57 2017 | DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mon Oct 23 11:58:57 2017 | DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mon Oct 23 11:58:57 2017 | Process swapper (pid: 0, threadinfo ffffffff8179a000, task ffffffff817a5020)
Mon Oct 23 11:58:57 2017 | Stack:
Mon Oct 23 11:58:57 2017 | 0000000000000246 ffff880365000c20 ffffffff812f9d90 ffff880365803944
Mon Oct 23 11:58:57 2017 | ffffffff00000001 0000000000000000 00000002812f960e ffff880365803958
Mon Oct 23 11:58:57 2017 | 0000000000000000 ffff880365803968 ffffffff812f9db0 ffff8803658039b8
Mon Oct 23 11:58:57 2017 | Call Trace:
Mon Oct 23 11:58:57 2017 |
Mon Oct 23 11:58:57 2017 | [] ? xen_poll_irq_timeout+0x40/0x50
Mon Oct 23 11:58:57 2017 | [] xen_poll_irq+0x10/0x20
Mon Oct 23 11:58:57 2017 | [] xen_spin_lock_slow+0x89/0x100
Mon Oct 23 11:59:02 2017 | [] ? kfree+0x11a/0x250
Mon Oct 23 11:59:02 2017 | [] xen_spin_lock_flags+0x72/0x80
Mon Oct 23 11:59:02 2017 | [] _raw_spin_lock_irqsave+0x34/0x50
Mon Oct 23 11:59:02 2017 | [] mlx4_ib_tunnel_comp_handler+0x38/0x90 [mlx4_ib]
Mon Oct 23 11:59:02 2017 | [] mlx4_ib_cq_comp+0x17/0x20 [mlx4_ib]
Mon Oct 23 11:59:02 2017 | [] mlx4_cq_completion+0x6b/0xc0 [mlx4_core]
Mon Oct 23 11:59:02 2017 | [] mlx4_eq_int+0x160/0x970 [mlx4_core]
Mon Oct 23 11:59:02 2017 | [] ? __alloc_pages_nodemask+0x122/0x200
Mon Oct 23 11:59:02 2017 | [] mlx4_msi_x_interrupt+0x14/0x20 [mlx4_core]
Mon Oct 23 11:59:02 2017 | [] handle_irq_event_percpu+0x5d/0x1a0
Mon Oct 23 11:59:02 2017 | [] handle_irq_event+0x4f/0x80
Mon Oct 23 11:59:02 2017 | [] handle_edge_irq+0xa5/0x100
Mon Oct 23 11:59:02 2017 | [] __xen_evtchn_do_upcall+0x218/0x310
Mon Oct 23 11:59:02 2017 | [] xen_evtchn_do_upcall+0x2f/0x50
Mon Oct 23 11:59:02 2017 | [] xen_do_hypervisor_callback+0x1e/0xa0
Mon Oct 23 11:59:02 2017 | [] ? xen_hypercall_xen_version+0xa/0x20
Mon Oct 23 11:59:02 2017 | [] ? xen_hypercall_xen_version+0xa/0x20
Mon Oct 23 11:59:02 2017 | [] ? xen_force_evtchn_callback+0xd/0x10
Mon Oct 23 11:59:02 2017 | [] ? check_events+0x12/0x20
Mon Oct 23 11:59:02 2017 | [] ? xen_restore_fl_direct_reloc+0x4/0x4
Mon Oct 23 11:59:02 2017 | [] ? _raw_spin_unlock_irqrestore+0x1e/0x30
Mon Oct 23 11:59:02 2017 | [] ? __queue_work+0xeb/0x290
Mon Oct 23 11:59:02 2017 | [] ? flush_delayed_work+0x50/0x50
Mon Oct 23 11:59:02 2017 | [] ? delayed_work_timer_fn+0x2a/0x40
Mon Oct 23 11:59:02 2017 | [] ? call_timer_fn+0x4a/0x110
Mon Oct 23 11:59:02 2017 | [] ? flush_delayed_work+0x50/0x50
Mon Oct 23 11:59:02 2017 | [] ? run_timer_softirq+0x13a/0x220
Mon Oct 23 11:59:02 2017 | [] ? _raw_spin_lock+0xe/0x20
Mon Oct 23 11:59:02 2017 | [] ? __do_softirq+0xb9/0x1d0
Mon Oct 23 11:59:02 2017 | [] ? call_softirq+0x1c/0x30
Mon Oct 23 11:59:02 2017 | [] ? do_softirq+0x65/0xa0
Mon Oct 23 11:59:02 2017 | [] ? irq_exit+0xab/0xc0
Mon Oct 23 11:59:02 2017 | [] ? xen_evtchn_do_upcall+0x35/0x50
Mon Oct 23 11:59:02 2017 | [] ? xen_do_hypervisor_callback+0x1e/0xa0
Mon Oct 23 11:59:02 2017 |
Mon Oct 23 11:59:02 2017 | [] ? xen_hypercall_sched_op+0xa/0x20
Mon Oct 23 11:59:02 2017 | [] ? xen_hypercall_sched_op+0xa/0x20
Mon Oct 23 11:59:02 2017 | [] ? xen_safe_halt+0x10/0x20
Mon Oct 23 11:59:02 2017 | [] ? default_idle+0x5b/0x170
Mon Oct 23 11:59:02 2017 | [] ? cpu_idle+0xc6/0xf0
Mon Oct 23 11:59:02 2017 | [] ? rest_init+0x72/0x80
Mon Oct 23 11:59:02 2017 | [] ? start_kernel+0x2aa/0x390
Mon Oct 23 11:59:02 2017 | [] ? x86_64_start_reservations+0x6a/0xa0
Mon Oct 23 11:59:02 2017 | [] ? xen_start_kernel+0x325/0x450
Mon Oct 23 11:59:02 2017 | Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05
Mon Oct 23 11:59:02 2017 | 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
Mon Oct 23 11:59:02 2017 | Call Trace:
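The stall line carries three facts: the CPUs RCU considers stalled ({ 0 11}), the CPU that detected the stall (2), and, in t=60002 jiffies, roughly how long the current RCU grace period has been outstanding. The following is a minimal Python sketch for pulling those fields out of a console capture. The regular expression is written against the line format shown in the log above, and the conversion to seconds assumes CONFIG_HZ=1000, which the log itself does not state; pass a different hz if the kernel was built otherwise. The function name parse_rcu_stall is hypothetical and not part of any Oracle tool.

```python
import re

# Matches the RCU stall report printed by this kernel, e.g.
# "INFO: rcu_sched_state detected stalls on CPUs/tasks: { 0 11} (detected by 2, t=60002 jiffies)"
# \s+ tolerates the line being wrapped, as console captures often are.
STALL_RE = re.compile(
    r"rcu_sched_state\s+detected\s+stalls?\s+on\s+CPUs/tasks:\s*"
    r"\{\s*([\d\s]*)\}\s*\(detected\s+by\s+(\d+),\s*t=(\d+)\s+jiffies\)"
)


def parse_rcu_stall(text, hz=1000):
    """Extract the fields of an RCU CPU stall report, or return None.

    hz is an ASSUMED CONFIG_HZ value; the console log does not state it,
    so the "seconds" figure is only as good as that assumption.
    """
    m = STALL_RE.search(text)
    if m is None:
        return None
    jiffies = int(m.group(3))
    return {
        "stalled_cpus": [int(cpu) for cpu in m.group(1).split()],
        "detected_by": int(m.group(2)),
        "jiffies": jiffies,
        "seconds": jiffies / hz,
    }


line = ("INFO: rcu_sched_state detected stalls on CPUs/tasks: "
        "{ 0 11} (detected by 2, t=60002 jiffies)")
print(parse_rcu_stall(line))
# {'stalled_cpus': [0, 11], 'detected_by': 2, 'jiffies': 60002, 'seconds': 60.002}
```

Keeping hz as a parameter rather than a constant makes the assumption visible at the call site: with hz=250 the same 60002 jiffies would correspond to about 240 seconds.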



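In the capture, CPU 2 (the detecting CPU) is only sending the backtrace IPIs, while the NMI backtrace for CPU 0 shows it waiting in xen_spin_lock_slow, reached through _raw_spin_lock_irqsave from mlx4_ib_tunnel_comp_handler inside an mlx4 MSI-X interrupt. Reading this out of a long ILOM capture by hand is tedious, so here is a rough Python sketch that reduces it to per-CPU frame lists. It assumes the "timestamp | text^M" capture format seen above, strips the prefix and the literal ^M, and keeps only frames without the "?" marker, which the kernel prints for addresses found on the stack that it could not confirm as part of the call chain. The helper names clean_console and nmi_backtraces are hypothetical.

```python
import re

# ASSUMED ILOM capture format, matching the log above:
#   "Mon Oct 23 11:58:52 2017 | <console text>^M"
PREFIX_RE = re.compile(r"^\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4} \| ?")
# A frame such as "[] ? xen_poll_irq_timeout+0x40/0x50"; group 1 holds the
# "?" marker for unconfirmed frames, group 2 the function name.
FRAME_RE = re.compile(r"\[\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
NMI_RE = re.compile(r"NMI backtrace for cpu (\d+)")


def clean_console(raw):
    """Drop the timestamp prefix and the literal "^M" left by the capture."""
    cleaned = []
    for line in raw.splitlines():  # splitlines() also breaks on real CRs
        line = PREFIX_RE.sub("", line).rstrip()
        if line.endswith("^M"):
            line = line[:-2].rstrip()
        cleaned.append(line)
    return cleaned


def nmi_backtraces(lines):
    """Map each CPU with an NMI backtrace to its confirmed frames.

    The RIP symbol comes first, then call-trace frames, innermost first.
    Collection for a CPU stops at its "Code:" line, so a repeated
    "Call Trace:" block printed afterwards is not attributed to it.
    """
    traces = {}
    cpu = None
    for line in lines:
        header = NMI_RE.match(line)
        if header:
            cpu = int(header.group(1))
            traces[cpu] = []
        elif line.startswith("Code:"):
            cpu = None
        elif cpu is not None:
            frame = FRAME_RE.search(line)
            if frame and not frame.group(1):
                traces[cpu].append(frame.group(2))
    return traces
```

Fed the CPU 0 section of the log above, the list for CPU 0 would start with xen_hypercall_sched_op (from the RIP line), then xen_poll_irq, xen_spin_lock_slow, xen_spin_lock_flags, _raw_spin_lock_irqsave and mlx4_ib_tunnel_comp_handler; the "? kfree" frame between them is skipped because it is marked unreliable.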
Cause
