Journal Nodes Exhibit Fsync Latency Alerts When Under Load on the BDA Cluster (Doc ID 2142854.1)

Last updated on August 8, 2017

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Symptoms

On BDA V4.2 with CDH 5.4.0, Cloudera Manager raises JournalNode alerts for the JOURNAL_NODE_FSYNC_LATENCY health test when the cluster is under load. The alert reports:

The health test result for JOURNAL_NODE_FSYNC_LATENCY has become bad:
The 99th percentile fsync latency over the previous minute is 3.2 second(s). Critical threshold: 3 second(s).
Time: Apr 12, 2016 1:44:18 PM
View Details on bdanode0x.example.com
Monitor Startup: false
Role: journalnode (bdanode0x)
Role Type: JournalNode
Cluster: <cluster-name>
Cluster Display Name: <name>
Service: hdfs
Service Display Name: hdfs
Service Type: HDFS
Hosts: bdanode0x.example.com
Health Test Results:
Health Test Name: JOURNAL_NODE_FSYNC_LATENCY
Event Code: Role health test bad
Severity: Critical
Content: The health test result for JOURNAL_NODE_FSYNC_LATENCY has become bad: The 99th percentile fsync latency over the previous minute is 3.2 second(s). Critical threshold: 3 second(s).
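The alert compares the 99th percentile of JournalNode fsync latencies over the previous minute with a 3-second critical threshold. The Python sketch below illustrates that comparison on a list of latency samples. It is not Cloudera Manager's implementation: the nearest-rank percentile method, the strict "greater than" comparison, and the function names are assumptions chosen for illustration.

```python
from typing import Sequence

# Critical threshold taken from the alert text above (3 seconds).
CRITICAL_THRESHOLD_S = 3.0


def p99_nearest_rank(samples: Sequence[float]) -> float:
    """Return the 99th percentile of `samples` using the nearest-rank method.

    ASSUMPTION: nearest-rank is used for illustration only; Cloudera Manager's
    exact percentile computation is not described in this note.
    """
    if not samples:
        raise ValueError("no fsync latency samples in the window")
    ordered = sorted(samples)
    # ceil(0.99 * n) as a 1-based rank, computed with integer math to avoid float rounding
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def fsync_health(samples: Sequence[float],
                 critical_s: float = CRITICAL_THRESHOLD_S) -> str:
    """Classify one window of fsync latencies (in seconds) as "bad" or "good".

    ASSUMPTION: the result is "bad" only when the p99 is strictly above the threshold.
    """
    return "bad" if p99_nearest_rank(samples) > critical_s else "good"
```

For example, in a one-minute window of 100 fsyncs, two fsyncs at 3.2 seconds are enough to push the nearest-rank p99 to 3.2 seconds and mark the test bad, while a single slow fsync is not.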

Additional information:

1. The "Java Heap Size of JournalNode in Bytes" setting for the JournalNode (JN) on the affected node (bdanode0x) is 256 MB.

2. Running 'iostat -d' on the node to display the device utilization report shows:

Linux 2.6.39-400.249.3.el6uek.x86_64 (bdanode0x.example.com) 04/12/2016 _x86_64_ (72 CPU)

Device:      tps  Blk_read/s  Blk_wrtn/s     Blk_read     Blk_wrtn
sdb        27.94     1720.18      628.38  18274377089   6675628188
sdd         0.83       45.35      284.50    481762154   3022368272
sde         0.82       45.90      281.68    487615106   2992446328
sdf         0.82       46.19      283.38    490666178   3010498776
sdg         0.82       47.42      283.59    503799010   3012737152
sdh         0.82       45.67      285.37    485229162   3031647352
sdi         0.82       46.12      283.11    489904874   3007661160
sdk         0.82       46.96      282.74    498860002   3003670808
sdl         0.82       45.85      284.88    487070114   3026456576
sdj         0.82       49.00      284.70    520594482   3024470656
sdc         0.81       46.90      282.51    498223690   3001230744
sda        27.94     1718.80      627.72  18259760191   6668636692
sdm         0.00        0.00        0.00        41456            0
md0         0.00        0.00        0.00         5310         1212
md2        21.46       35.00      335.91    371812572   3568545392

Block reads are significantly higher on the OS disks sda and sdb (about 1,720 blocks/s each) than on the other disks sdc through sdl (about 45 to 49 blocks/s each).
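The comparison of OS-disk and other-disk read rates can be checked mechanically. The Python sketch below is a minimal illustration, not an Oracle or Cloudera tool: it parses the device table of an 'iostat -d' report in the Blk_read/s format shown above and flags disks whose read rate exceeds a multiple of the median across the sd* devices. The function names, the 10x factor, and the median baseline are assumptions chosen for illustration. Keep in mind that 'iostat -d' run without an interval reports averages since boot, so an interval run (for example 'iostat -d 5') is more representative of behavior under load; other sysstat versions may also print kB_read/s columns instead of Blk_read/s.

```python
from __future__ import annotations

from statistics import median


def parse_iostat_d(report: str) -> dict[str, dict[str, float]]:
    """Parse the device table of an 'iostat -d' report into {device: {column: value}}."""
    devices: dict[str, dict[str, float]] = {}
    columns: list[str] = []
    for line in report.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            # Header row, e.g. "Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn"
            columns = fields[1:]
            continue
        # Data rows are a device name plus one value per column; the banner line
        # printed before the header is skipped because no columns are known yet.
        if columns and len(fields) == len(columns) + 1:
            devices[fields[0]] = {c: float(v) for c, v in zip(columns, fields[1:])}
    return devices


def read_outliers(devices: dict[str, dict[str, float]],
                  factor: float = 10.0, prefix: str = "sd") -> list[str]:
    """Return disks whose Blk_read/s exceeds `factor` times the median of all `prefix` disks.

    ASSUMPTION: the 10x factor and the median baseline are illustrative choices.
    """
    rates = {d: s["Blk_read/s"] for d, s in devices.items() if d.startswith(prefix)}
    if not rates:
        return []
    baseline = median(rates.values())
    return sorted(d for d, r in rates.items() if r > factor * baseline)
```

Applied to the full table above, the median Blk_read/s across the sd* devices is 46.19, so the sketch flags only sda and sdb. The md devices are excluded by the "sd" prefix because they are software RAID devices rather than physical disks.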

Note that Blk_read/s is the number of blocks per second read from the device.
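To put these rates in more familiar units, assume 512-byte blocks (the iostat man page describes its blocks as 512-byte sectors). A worked conversion for sdb (an OS disk) and sdd (one of the other disks) gives:

```latex
\begin{aligned}
\text{sdb: } 1720.18~\tfrac{\text{blocks}}{\text{s}} \times 512~\tfrac{\text{B}}{\text{block}} &= 880{,}732.16~\text{B/s} \approx 860~\text{KiB/s} \\
\text{sdd: } 45.35~\tfrac{\text{blocks}}{\text{s}} \times 512~\tfrac{\text{B}}{\text{block}} &= 23{,}219.2~\text{B/s} \approx 22.7~\text{KiB/s}
\end{aligned}
```

That is roughly a 38-fold difference in read throughput between the OS disks and the other disks.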

Cause
