EXADATA - Upgrade of Compute nodes to 11.2.3.3.0 using 2.6.39-400.126.1.el5uek requires disabling APM (Doc ID 1623834.1)

Last updated on AUGUST 29, 2014

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.3.3.0 to 12.1.1.1.0 [Release 11.2 to 12.1]
Information in this document applies to any platform.

Symptoms

 

Example 1:

Jan 25 22:19:07 db03 kernel: INFO: task kworker/u:0:65901 blocked for more than 120 seconds.
Jan 25 22:19:07 db03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 25 22:19:07 db03 kernel: kworker/u:0     D ffffffff8151d620     0 65901      2 0x00000080
Jan 25 22:19:07 db03 kernel:  ffff8818f51a7ce0 0000000000000046 0000000000000001 0000000000012180
Jan 25 22:19:07 db03 kernel:  ffff8818eb37a180 ffff881fc1846080 0000000000000000 0000000000000000
Jan 25 22:19:07 db03 kernel:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
Jan 25 22:19:07 db03 kernel: Call Trace:
Jan 25 22:19:07 db03 kernel:  [<ffffffff815047f5>] schedule+0x45/0x60
Jan 25 22:19:07 db03 kernel:  [<ffffffff81505466>] __mutex_lock_slowpath+0xd6/0x150
Jan 25 22:19:07 db03 kernel:  [<ffffffff810147eb>] ? __switch_to+0x12b/0x310
Jan 25 22:19:07 db03 kernel:  [<ffffffff815050fb>] mutex_lock+0x2b/0x50
Jan 25 22:19:07 db03 kernel:  [<ffffffffa039c9a1>] cma_sap_work_handler+0x121/0x290 [rdma_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa0160515>] ? ib_mad_completion_handler+0x45/0xa0 [ib_mad]
Jan 25 22:19:07 db03 kernel:  [<ffffffff8108c0d9>] process_one_work+0xf9/0x370
Jan 25 22:19:07 db03 kernel:  [<ffffffffa039c880>] ? cma_query_handler+0x360/0x360 [rdma_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffff8108ca1a>] worker_thread+0xca/0x240
Jan 25 22:19:07 db03 kernel:  [<ffffffff8108c950>] ? manage_workers+0x90/0x90
Jan 25 22:19:07 db03 kernel:  [<ffffffff81090ff7>] kthread+0x97/0xa0
Jan 25 22:19:07 db03 kernel:  [<ffffffff8150fe84>] kernel_thread_helper+0x4/0x10
Jan 25 22:19:07 db03 kernel:  [<ffffffff81090f60>] ? kthread_bind+0x80/0x80
Jan 25 22:19:07 db03 kernel:  [<ffffffff8150fe80>] ? gs_change+0x13/0x13

Example 2:

Jan 25 22:19:07 db03 kernel: INFO: task kworker/4:0:69988 blocked for more than 120 seconds.
Jan 25 22:19:07 db03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 25 22:19:07 db03 kernel: kworker/4:0     D ffffffff8151d620     0 69988      2 0x00000080
Jan 25 22:19:07 db03 kernel:  ffff8818f5229860 0000000000000046 00000001014357cb 0000000000012180
Jan 25 22:19:07 db03 kernel:  ffff8818ea15c140 ffff881fc1846080 ffffffffffffffff 0000000000000000
Jan 25 22:19:07 db03 kernel:  0000000000000000 0000000000000000 ffff8818f52297c0 00000000948d943d
Jan 25 22:19:07 db03 kernel: Call Trace:
Jan 25 22:19:07 db03 kernel:  [<ffffffff81262dc1>] ? list_del+0x11/0x40
Jan 25 22:19:07 db03 kernel:  [<ffffffff815045cc>] ? wait_for_common+0x5c/0x160
Jan 25 22:19:07 db03 kernel:  [<ffffffff81041009>] ? default_spin_lock_flags+0x9/0x10
Jan 25 22:19:07 db03 kernel:  [<ffffffff815065b4>] ? _raw_spin_lock_irqsave+0x34/0x50
Jan 25 22:19:07 db03 kernel:  [<ffffffff815047f5>] schedule+0x45/0x60
Jan 25 22:19:07 db03 kernel:  [<ffffffff81427b8d>] __lock_sock+0x6d/0xa0
Jan 25 22:19:07 db03 kernel:  [<ffffffff81091ae0>] ? add_wait_queue_exclusive+0x60/0x60
Jan 25 22:19:07 db03 kernel:  [<ffffffff81427bfd>] lock_sock_nested+0x3d/0x60
Jan 25 22:19:07 db03 kernel:  [<ffffffffa0415247>] sdp_cma_handler+0x57/0xaa0 [ib_sdp]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa0242441>] ? mlx4_free_cmd_mailbox+0x31/0x40 [mlx4_core]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa0258aec>] ? __mlx4_qp_modify+0x1bc/0x340 [mlx4_core]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa02f8668>] ? to_ib_ah_attr+0x68/0x130 [mlx4_ib]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa02d9a4b>] ? rdma_port_link_layer+0x1b/0x50 [ib_core]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa02fc256>] ? __mlx4_ib_modify_qp+0x166/0xc50 [mlx4_ib]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa02fce39>] ? mlx4_ib_modify_qp+0xf9/0x2f0 [mlx4_ib]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa039b144>] cma_qp_set_alt_path+0x304/0x380 [rdma_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa03a1c1a>] cma_lap_handler+0x28a/0x610 [rdma_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffff8105b190>] ? account_entity_dequeue+0x90/0xa0
Jan 25 22:19:07 db03 kernel:  [<ffffffffa03a2645>] cma_ib_handler+0x2a5/0x3d0 [rdma_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa038b5b7>] cm_process_work+0x27/0xf0 [ib_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa038be68>] cm_lap_handler+0x248/0x270 [ib_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffffa038e5cc>] cm_work_handler+0x9c/0xf0 [ib_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffff8108c0d9>] process_one_work+0xf9/0x370
Jan 25 22:19:07 db03 kernel:  [<ffffffffa038e530>] ? cm_req_handler+0x3e0/0x3e0 [ib_cm]
Jan 25 22:19:07 db03 kernel:  [<ffffffff8108ca1a>] worker_thread+0xca/0x240
Jan 25 22:19:07 db03 kernel:  [<ffffffff8108c950>] ? manage_workers+0x90/0x90


Example 3:

[<ffffffffa0503247>] sdp_cma_handler+0x57/0xaa0 [ib_sdp]
 [<ffffffffa0508675>] ? sdp_reuse_sdp_buf+0x1c5/0x1e0 [ib_sdp]
 [<ffffffffa0509c79>] sdp_handle_disconn+0x269/0x390 [ib_sdp]
 [<ffffffffa050a095>] sdp_process_rx_ctl_skb+0x2f5/0x360 [ib_sdp]
 [<ffffffffa050a141>] sdp_do_posts+0x41/0x240 [ib_sdp]
 [<ffffffffa050a3d9>] sdp_rx_comp_work+0x99/0x240 [ib_sdp]
 [<ffffffffa050a340>] ? sdp_do_posts+0x240/0x240 [ib_sdp]
 [<ffffffffa04fe7dc>] sdp_destruct+0x14c/0x2e0 [ib_sdp]
 [<ffffffffa04ffa19>] sdp_destroy_work+0xc9/0x100 [ib_sdp]
 [<ffffffffa04ff950>] ? sdp_dreq_wait_timeout_work+0x1e0/0x1e0 [ib_sdp]
 [<ffffffffa04fec00>] sdp_connect+0x70/0x210 [ib_sdp]
 [<ffffffffa04fec00>] sdp_connect+0x70/0x210 [ib_sdp]
 [<ffffffffa04fec00>] sdp_connect+0x70/0x210 [ib_sdp]
 [<ffffffffa04fec00>] sdp_connect+0x70/0x210 [ib_sdp]
 [<ffffffffa0503247>] sdp_cma_handler+0x57/0xaa0 [ib_sdp]
 [<ffffffffa0508675>] ? sdp_reuse_sdp_buf+0x1c5/0x1e0 [ib_sdp]
 [<ffffffffa0509c79>] sdp_handle_disconn+0x269/0x390 [ib_sdp]
 [<ffffffffa050a095>] sdp_process_rx_ctl_skb+0x2f5/0x360 [ib_sdp]
 [<ffffffffa050a141>] sdp_do_posts+0x41/0x240 [ib_sdp]
 [<ffffffffa050a3d9>] sdp_rx_comp_work+0x99/0x240 [ib_sdp]
 [<ffffffffa050a340>] ? sdp_do_posts+0x240/0x240 [ib_sdp]
 [<ffffffffa04fe7dc>] sdp_destruct+0x14c/0x2e0 [ib_sdp]
 [<ffffffffa04ffa19>] sdp_destroy_work+0xc9/0x100 [ib_sdp]
 [<ffffffffa04ff950>] ? sdp_dreq_wait_timeout_work+0x1e0/0x1e0 [ib_sdp]
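
The traces above share a small set of recognizable frames in the rdma_cm and ib_sdp modules. The following is a minimal sketch for scanning a syslog file for hung-task reports that contain those frames. It assumes the log format shown in the examples. The function name find_hung_tasks and the choice of signature frames are illustrative assumptions and are not part of any Oracle tooling.

```python
import re

# Frames taken from the example traces above. Treating them as a signature
# for this issue is an assumption of this sketch, not an Oracle-documented rule.
SIGNATURE_FRAMES = ("cma_sap_work_handler", "sdp_cma_handler", "cma_lap_handler")

# Matches the hung-task header line, e.g.
# "INFO: task kworker/u:0:65901 blocked for more than 120 seconds."
HUNG_TASK = re.compile(r"INFO: task (\S+) blocked for more than (\d+) seconds")

# Matches one call-trace frame, e.g.
# "[<ffffffffa039c9a1>] cma_sap_work_handler+0x121/0x290 [rdma_cm]".
# Group 1 captures the optional "?" that marks an unreliable frame.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x")


def find_hung_tasks(lines):
    """Return (task, [signature frames seen]) for each hung-task report."""
    reports = []
    current = None
    for line in lines:
        header = HUNG_TASK.search(line)
        if header:
            # A new report starts; later frames belong to it.
            current = {"task": header.group(1), "frames": []}
            reports.append(current)
            continue
        if current is None:
            continue
        frame = FRAME.search(line)
        if frame is None or frame.group(1):
            # Not a frame line, or a "?" frame the kernel marks as unreliable.
            continue
        name = frame.group(2)
        if name in SIGNATURE_FRAMES and name not in current["frames"]:
            current["frames"].append(name)
    return [(r["task"], r["frames"]) for r in reports]
```

To use it, pass the lines of the system log (for example, an open file object) to find_hung_tasks and review any report whose frame list is not empty. Frames marked with "?" are skipped because the kernel flags them as unreliable stack entries.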

 

Changes

A combination (both) of:

This has also been observed on 12c installations:

 

This is typical of a configuration where Exadata RAC is interconnected to Exalogic or Exalytics.

Cause
