Node Crash With Error 2341 DbtcMain.cpp; Watchdog Is Killing The Hard Way (Doc ID 1983412.1)

Last updated on DECEMBER 28, 2016

Applies to:

MySQL Cluster - Version 7.2 to 7.3 [Release 7.2 to 7.3]
Information in this document applies to any platform.

Symptoms

Data node crashes with the following error:

Time: Thursday 29 January 2015 - 04:35:22
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 8327) 0x00000002
Program: ndbmtd
Pid: 34407 thr: 6
Version: mysql-5.5.35 ndb-7.2.15
Trace: /user/database/log/ndb_4_trace.log.2 [t1..t11]
***EOM***

---------------------

Another example of the same issue is shown below.

MySQL Cluster was operating normally with no recent changes: no new schema, users, or load had been added.

The ndbmtd nodes were under their typical baseline load, and standard OS metrics (load average, etc.) showed nothing that appeared to overwhelm the operating system.

The entire NDB cluster crashed. Investigation shows that a few nodes crashed with a specific error, which led to a cascading failure throughout the cluster: once node groups became unavailable, the cluster was forced down to protect the data.

Errors in the ndb_error_reporter file show a hard failure of the first node. Subsequent nodes then fail because of the already high load, which leaves the remaining nodes likewise overloaded, until a complete shutdown is forced.

2015-10-17 14:53:23 [ndbd] WARNING -- Ndb kernel thread 10 is stuck in: Packing Send Buffers elapsed=302
2015-10-17 14:53:23 [ndbd] INFO -- Watchdog: User time: 61838125 System time: 13439294
2015-10-17 14:53:23 [ndbd] WARNING -- Ndb kernel thread 11 is stuck in: Packing Send Buffers elapsed=302
2015-10-17 14:53:23 [ndbd] INFO -- Watchdog: User time: 61838125 System time: 13439294
2015-10-17 14:53:23 [ndbd] WARNING -- Ndb kernel thread 12 is stuck in: Packing Send Buffers elapsed=1347
2015-10-17 14:53:23 [ndbd] INFO -- Watchdog: User time: 61838125 System time: 13439294
2015-10-17 14:53:23 [ndbd] WARNING -- thr: 7: Overslept 1575 ms, expected ~10ms
2015-10-17 14:53:23 [ndbd] WARNING -- thr: 6: Overslept 1710 ms, expected ~10ms
2015-10-17 14:53:23 [ndbd] INFO -- Internal program error (failed ndbrequire)
2015-10-17 14:53:23 [ndbd] INFO -- DBTC (Line: 7526) 0x00000002
2015-10-17 14:53:23 [ndbd] INFO -- Error handler shutting down system
2015-10-17 14:53:23 [ndbd] INFO -- Error handler shutdown completed - exiting
2015-10-17 14:53:25 [ndbd] ALERT -- Node 1: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2015-10-17 15:33:48 [ndbd] INFO -- Angel pid: 30653 started child: 30654
2015-10-17 15:33:48 [ndbd] INFO -- Configuration fetched from 'cn-db-ndbmgr01:1190', generation: 1
ThreadConfig: input: main,ldm,ldm,ldm,ldm,ldm,ldm,ldm,ldm,recv,recv,rep,tc,tc,tc,send LockExecuteThreadToCPU: => parsed: main,ldm,ldm,ldm,ldm,ldm,ldm,ldm,ldm,recv,recv,rep,tc,tc,tc,send
NDBMT: MaxNoOfExecutionThreads=16
NDBMT: workers=8 threads=8 tc=3 send=1 receive=2
2015-10-17 15:33:48 [ndbd] INFO -- NDB Cluster -- DB node 1
2015-10-17 15:33:48 [ndbd] INFO -- mysql-5.5.35 ndb-7.2.15 --
2015-10-17 15:33:48 [ndbd] INFO -- numa_set_interleave_mask(numa_all_nodes) : OK
2015-10-17 15:33:48 [ndbd] INFO -- Ndbd_mem_manager::init(1) min: 37838Mb initial: 37966Mb
Adding 873Mb to ZONE_LO (1,27916)
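
When reviewing a node log such as the one above, it can help to pull out the watchdog warnings per thread before reading the rest. The following is a minimal sketch, not an official tool: the function name watchdog_summary and both regular expressions are assumptions based only on the line layout shown in this excerpt, and other NDB versions may format these lines differently.

```python
import re

# Patterns are assumptions derived from the log excerpt in this document.
STUCK = re.compile(r"Ndb kernel thread (\d+) is stuck in: (.+?) elapsed=(\d+)")
OVERSLEPT = re.compile(r"thr: (\d+): Overslept (\d+) ms")


def watchdog_summary(lines):
    """Return (stuck, overslept) lists extracted from node-log lines.

    stuck:     (thread_number, activity, elapsed_value) tuples
    overslept: (thread_number, overslept_ms) tuples
    """
    stuck, overslept = [], []
    for line in lines:
        m = STUCK.search(line)
        if m:
            stuck.append((int(m.group(1)), m.group(2), int(m.group(3))))
            continue
        m = OVERSLEPT.search(line)
        if m:
            overslept.append((int(m.group(1)), int(m.group(2))))
    return stuck, overslept


# Sample lines copied from the log excerpt above.
sample = [
    "2015-10-17 14:53:23 [ndbd] WARNING -- Ndb kernel thread 10 is stuck in: Packing Send Buffers elapsed=302",
    "2015-10-17 14:53:23 [ndbd] INFO -- Watchdog: User time: 61838125 System time: 13439294",
    "2015-10-17 14:53:23 [ndbd] WARNING -- Ndb kernel thread 12 is stuck in: Packing Send Buffers elapsed=1347",
    "2015-10-17 14:53:23 [ndbd] WARNING -- thr: 7: Overslept 1575 ms, expected ~10ms",
]

stuck, overslept = watchdog_summary(sample)
for thread, activity, elapsed in stuck:
    print(f"thread {thread} stuck in '{activity}' (elapsed={elapsed})")
for thread, ms in overslept:
    print(f"thread {thread} overslept {ms} ms")
```

In this excerpt every stuck thread reports the same activity (Packing Send Buffers), which is the kind of pattern such a summary makes easy to spot.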

The management node's cluster logs show the initial failure, followed by the cascading node failures.

[ronan@hp ndb_error_report_20151017163646]$ grep ALERT cluster_merge.log| grep Forced | awk '{$1=$3=$4=""; print}'| cut -c1-160
14:53:25 -- Node 1: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or mi
14:53:27 -- Node 10: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or m
14:53:27 -- Node 18: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or m
14:53:27 -- Node 20: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or m
14:53:29 -- Node 17: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or m
14:53:30 -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or mi
14:53:32 -- Node 13: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or m
14:53:34 -- Node 9: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, plea
14:53:36 -- Node 22: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, ple
14:53:38 -- Node 21: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, ple
14:53:40 -- Node 11: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, ple
14:53:42 -- Node 19: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, pl
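
The same filtering can be sketched in Python to make the cascade easier to see: the nodes that failed with error 2341 come first, followed by those that shut down with error 2305. This is an illustrative sketch only. The names shutdown_sequence and group_by_error are hypothetical, and the regular expression assumes the "Forced node shutdown completed" wording shown in the output above.

```python
import re

# Assumed line layout: a HH:MM:SS time somewhere before the "Node N:" text.
SHUTDOWN = re.compile(
    r"(\d{2}:\d{2}:\d{2}).*?Node (\d+): Forced node shutdown completed\. "
    r"Caused by error (\d+)"
)


def shutdown_sequence(lines):
    """Return (time, node_id, error_code) tuples in the order they appear."""
    events = []
    for line in lines:
        m = SHUTDOWN.search(line)
        if m:
            events.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return events


def group_by_error(events):
    """Map each error code to the list of node IDs that reported it."""
    groups = {}
    for _, node, error in events:
        groups.setdefault(error, []).append(node)
    return groups


# Sample lines shortened from the output above.
sample = [
    "14:53:25 -- Node 1: Forced node shutdown completed. Caused by error 2341: 'Internal program error",
    "14:53:27 -- Node 10: Forced node shutdown completed. Caused by error 2341: 'Internal program error",
    "14:53:34 -- Node 9: Forced node shutdown completed. Caused by error 2305: 'Node lost connection",
]

events = shutdown_sequence(sample)
groups = group_by_error(events)
for error, nodes in groups.items():
    print(f"error {error}: nodes {nodes}")
```

Applied to the full output above, the 2341 group would contain the nodes that hit the failed ndbrequire, and the 2305 group the nodes that shut down afterwards because they could no longer form an unpartitioned cluster.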

Cause
