
Oracle Linux: Hung NFS Tasks Reported At "nfs_initiate_write" After RDS / IB Errors (Doc ID 2098760.1)

Last updated on AUGUST 04, 2018

Applies to:

Linux OS - Version Enterprise Linux 3.0 and later
Linux x86-64

Symptoms

On Oracle Linux, error messages like the following are first reported in the /var/log/messages file:

kernel: ib0: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=379 vend_err 85)
kernel: ib0: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=203 vend_err 81)

 

Error messages like the following may be logged as well:

kernel: RDS/IB: connection <10.110.144.23,10.110.144.42,0> dropped
kernel: RDS/IB: send completion <10.110.144.23,10.110.144.42,0> had status 12, disconnecting and reconnecting

After a while, hung task stack traces like the following are reported:

kernel: INFO: task tr:112298 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
kernel: tr D ffff8833dce5a6a8 0 112298 110975 0x0000008
kernel: ffff883403b67c88 0000000000000086 ffff883403b67c88 ffffffffa03cb
kernel: 0000000000011180 ffff883403b67fd8 ffff883403b66010 0000000000011
kernel: ffff883403b67fd8 0000000000011180 ffff8835132e8140 ffff8833dce5a
kernel: Call Trace:
kernel: [<ffffffffa03cbdcd>] ? nfs_initiate_write+0xdd/0x150 [nfs]
kernel: [<ffffffff8109c10d>] ? ktime_get_ts+0xad/0xe0
kernel: [<ffffffff81110a00>] ? __lock_page+0x70/0x70
kernel: [<ffffffff8150b84f>] schedule+0x3f/0x60
kernel: [<ffffffff8150b8fc>] io_schedule+0x8c/0xd0
kernel: [<ffffffff81110a0e>] sleep_on_page+0xe/0x20
kernel: [<ffffffff8150bfef>] __wait_on_bit+0x5f/0x90
kernel: [<ffffffff81110c53>] wait_on_page_bit+0x73/0x80
kernel: [<ffffffff810916a0>] ? autoremove_wake_function+0x50/0x50
kernel: [<ffffffff8111caa5>] ? pagevec_lookup_tag+0x25/0x40
kernel: [<ffffffff81111213>] filemap_fdatawait_range+0x113/0x1a0
kernel: [<ffffffff811113a0>] filemap_write_and_wait_range+0x70/0x80
kernel: [<ffffffff81198aad>] vfs_fsync_range+0x5d/0xa0
kernel: [<ffffffff81198b5c>] vfs_fsync+0x1c/0x20
kernel: [<ffffffffa03baef4>] nfs_file_flush+0x54/0x80 [nfs]
kernel: [<ffffffff8116b23c>] filp_close+0x3c/0x90
kernel: [<ffffffff8116b347>] sys_close+0xb7/0x120
kernel: [<ffffffff81515d02>] system_call_fastpath+0x16/0x1b
kernel: INFO: task date:113772 blocked for more than 120 seconds.
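
To check whether a system log shows the same pattern, the messages above can be matched mechanically. The following is a minimal sketch, not an Oracle-provided tool: the function names `scan` and `matches_symptoms` are hypothetical, and the regular expressions are derived only from the sample messages in this note, so they may need adjusting for other kernel versions.

```python
import re

# Patterns derived from the sample messages in this note (assumption:
# other kernels may word these messages slightly differently).
IPOIB_RE = re.compile(r"ipoib_cm_handle_tx_wc: failed cm send event \(status=\d+")
RDS_RE = re.compile(
    r"RDS/IB: (?:connection <[^>]*> dropped|send completion <[^>]*> had status \d+)"
)
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than \d+ seconds")
NFS_WRITE_RE = re.compile(r"nfs_initiate_write\+")

# A few lines copied from the symptoms above, used as sample input.
SAMPLE = [
    "kernel: ib0: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=379 vend_err 85)",
    "kernel: RDS/IB: connection <10.110.144.23,10.110.144.42,0> dropped",
    "kernel: RDS/IB: send completion <10.110.144.23,10.110.144.42,0> had status 12, disconnecting and reconnecting",
    "kernel: INFO: task tr:112298 blocked for more than 120 seconds.",
    "kernel: [<ffffffffa03cbdcd>] ? nfs_initiate_write+0xdd/0x150 [nfs]",
]


def scan(lines):
    """Count the symptom messages and collect (name, pid) of hung tasks."""
    summary = {
        "ipoib_errors": 0,
        "rds_errors": 0,
        "nfs_write_traces": 0,
        "hung_tasks": [],
    }
    for line in lines:
        if IPOIB_RE.search(line):
            summary["ipoib_errors"] += 1
        elif RDS_RE.search(line):
            summary["rds_errors"] += 1
        elif NFS_WRITE_RE.search(line):
            summary["nfs_write_traces"] += 1
        else:
            match = HUNG_RE.search(line)
            if match:
                summary["hung_tasks"].append((match.group(1), int(match.group(2))))
    return summary


def matches_symptoms(summary):
    """True if IPoIB send failures, hung tasks, and nfs_initiate_write frames
    all appear. RDS/IB messages are optional, as the note says they are only
    possibly logged."""
    return (
        summary["ipoib_errors"] > 0
        and len(summary["hung_tasks"]) > 0
        and summary["nfs_write_traces"] > 0
    )


# Usage sketch (reading the log file usually requires root):
# with open("/var/log/messages", errors="replace") as fh:
#     print(matches_symptoms(scan(fh)))
```

The sketch only counts occurrences; it does not verify that the IPoIB errors precede the hung tasks, which would require parsing the syslog timestamps as well.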

 

Cause



