Linux: Node Reboots After "kernel: INFO: task ocssd.bin:<pid> blocked for more than 120 seconds." in /var/log/messages (Doc ID 1423693.1)

Last updated on AUGUST 09, 2014

Applies to:

Oracle Server - Enterprise Edition - Version 10.2.0.1 and later
Generic Linux

Symptoms

Oracle Clusterware on Linux, node hangs and reboots after the following OS messages:

Feb 13 20:21:45 racnode2 kernel: INFO: task pdflush:1793 blocked for more than 120 seconds.
Feb 13 20:21:45 racnode2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 13 20:21:45 racnode2 kernel: pdflush       D ffffffff801538ac     0  1793   1163          1794  1792 (L-TLB)
..
Feb 13 20:21:45 racnode2 kernel: Call Trace:
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8849d002>] :jbd:start_this_handle+0x2e5/0x36c
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a28ec>] autoremove_wake_function+0x0/0x2e
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8849d152>] :jbd:journal_start+0xc9/0x100
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff884bb511>] :ext3:ext3_ordered_writepage+0x7d/0x198
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8001d18f>] mpage_writepages+0x1bf/0x37d
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff884bb494>] :ext3:ext3_ordered_writepage+0x0/0x198
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8815ed5c>] :dm_mod:dm_any_congested+0x38/0x3f
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff88160acf>] :dm_mod:dm_table_any_congested+0x53/0xa3
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8005aebc>] do_writepages+0x29/0x2f
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8002fe38>] __writeback_single_inode+0x19e/0x318
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff80021104>] sync_sb_inodes+0x1b5/0x26f
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a26d4>] keventd_create_kthread+0x0/0xc4
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff80051332>] writeback_inodes+0x82/0xd8
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800cbbce>] wb_kupdate+0xd4/0x14e
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800568c8>] pdflush+0x0/0x1fb
..

## >> note: the task name can be any running processes

Feb 13 20:21:45 racnode2 kernel:
Feb 13 20:21:45 racnode2 kernel: INFO: task kjournald:22190 blocked for more than 120 seconds.
Feb 13 20:21:45 racnode2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 13 20:21:45 racnode2 kernel: kjournald     D ffffffff801538ac     0 22190   1163         22197 22185 (L-TLB)
..
Feb 13 20:21:45 racnode2 kernel: Call Trace:
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8849e5cf>] :jbd:journal_commit_transaction+0x16d/0x1066
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a28ec>] autoremove_wake_function+0x0/0x2e
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8004b425>] try_to_del_timer_sync+0x7f/0x88
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff884a25d3>] :jbd:kjournald+0xc1/0x213
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a28ec>] autoremove_wake_function+0x0/0x2e
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff884a2512>] :jbd:kjournald+0x0/0x213
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a26d4>] keventd_create_kthread+0x0/0xc4
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff80032b26>] kthread+0xfe/0x132
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a26d4>] keventd_create_kthread+0x0/0xc4
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a26d4>] keventd_create_kthread+0x0/0xc4
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff80032a28>] kthread+0x0/0x132
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
..
Feb 13 20:21:45 racnode2 kernel: INFO: task multipathd:24551 blocked for more than 120 seconds.
..
Feb 13 20:21:45 racnode2 kernel: INFO: task ohasd.bin:9040 blocked for more than 120 seconds.
..
Feb 13 20:21:45 racnode2 kernel: INFO: task oracle:8977 blocked for more than 120 seconds.
..
Feb 13 20:21:45 racnode2 kernel: INFO: task tnslsnr:6129 blocked for more than 120 seconds.
..
Feb 13 20:21:45 racnode2 kernel: INFO: task ocssd.bin:7644 blocked for more than 120 seconds.
Feb 13 20:21:45 racnode2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 13 20:21:45 racnode2 kernel: ocssd.bin     D ffffffff801538ac     0  7644      1          7645  5800 (NOTLB)
..
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8001c41a>] generic_make_request+0x211/0x228
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800637ca>] io_schedule+0x3f/0x67
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800f74c6>] __blockdev_direct_IO+0x8da/0xa82
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8004fdcf>] __filemap_fdatawrite_range+0x50/0x5b
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800e7fbc>] blkdev_direct_IO+0x32/0x37
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800e7ef4>] blkdev_get_blocks+0x0/0x96
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800c878a>] generic_file_direct_IO+0x108/0x122
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8001f0ff>] generic_file_direct_write+0x60/0xf2
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800166b8>] __generic_file_aio_write_nolock+0x2b8/0x3b6
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800c890f>] generic_file_aio_write_nolock+0x20/0x6c
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800c8ce6>] generic_file_write_nolock+0x8f/0xa8
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800a28ec>] autoremove_wake_function+0x0/0x2e
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff800e6d70>] blkdev_file_write+0x1a/0x1f
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff80016b71>] vfs_write+0xce/0x174
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8004419e>] sys_pwrite64+0x50/0x70
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8005d229>] tracesys+0x71/0xe0
Feb 13 20:21:45 racnode2 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0

## >> note: node rebooted

Feb 13 20:35:33 racnode2 syslogd 1.4.1: restart.
Feb 13 20:35:33 racnode2 kernel: klogd 1.4.1, log source = /proc/kmsg started
2012-02-13 20:18:13.243: [    CSSD][1116629312]clssnmSendingThread: sending status msg to all nodes
2012-02-13 20:18:13.243: [    CSSD][1116629312]clssnmSendingThread: sent 4 status msgs to all nodes

## >> note: node rebooted

2012-02-13 20:42:13.392: [    CSSD][1825956736]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2012-02-13 20:42:13.392: [    CSSD][1825956736]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
zzz ***Mon Feb 13 20:17:41 CST 2012
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 5  0      0 29418300 1508316 26174916    0    0   130   335    1    1  2  1 97  1  0
 1  0      0 29412920 1508316 26174956    0    0   520   358 1499 20298  2  2 96  0  0
 0  0      0 29414320 1508316 26174976    0    0   432   253 1464 19237  1  1 98  0  0
zzz ***Mon Feb 13 20:18:01 CST 2012
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 6  0      0 29420676 1508316 26175428    0    0   130   335    1    1  2  1 97  1  0
 0  1      0 29418056 1508316 26175652    0    0   272  2566 1837 21831  1  2 94  3  0
 0  1      0 29418956 1508316 26175476    0    0   352  2906 2067 21219  0  1 96  3  0

## >> note: no more record after this, node got rebooted a few minutes later
2012-02-13 20:26:32.825
[cssd(13407)]CRS-1607:Node racnode2 is being evicted in cluster incarnation 206713158; details at (:CSSNM00007:) in /ocw/grid/log/racnode1/cssd/ocssd.log.
2012-02-13 20:27:16.332
[ohasd(2158)]CRS-8011:reboot advisory message from host: racnode2, component: cssmonit, with time stamp: L-2012-02-13-20:27:16.331
[ohasd(2158)]CRS-8013:reboot advisory message text: Rebooting after limit 28000 exceeded; disk timeout 28000, network timeout 12630, last heartbeat from CSSD at epoch seconds 1329186408.287, 28040 milliseconds ago based on invariant clock value of 1229052344
2012-02-13 20:27:18.980
[cssd(13407)]CRS-1601:CSSD Reconfiguration complete. Active nodes are racnode1 .
2012-02-13 20:27:19.073
[crsd(23378)]CRS-5504:Node down event reported for node 'racnode2'
.



Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms