MySQL Cluster Manager Client (MCM) Shows Incorrect Process Status After Network Interface Reconnects (Doc ID 2146317.1)

Last updated on JUNE 03, 2016

Applies to:

MySQL Cluster - Version 7.1 to 7.4 [Release 7.1 to 7.4]
Information in this document applies to any platform.

Symptoms

A failover test was performed with MySQL Cluster Manager (MCM) on a Cluster composed of 4 data nodes hosted on 2 VMs.

To simulate a failure, the network interfaces were shut down on one of the 2 VMs hosting ndb. When the network was activated again, the "show status" command reported that all processes were running, but the Linux "ps" command showed the contrary: the processes were down.

Steps to repeat:

  1. The Cluster is running.
  2. Disconnect the network interface of the host running one ndbmtd.
  3. As expected, the other nodes stop seeing the host of the disconnected ndbmtd, in both MCM and ndb_mgm.
  4. Connect the network interfaces again. After that:
    1. MCM shows the disconnected ndbmtd as "running".
    2. ndb_mgm shows the ndbmtd as not connected, and the 'ps' output does not show any ndbmtd process running.
  5. Stopping the ndbmtd and then starting it again does get the data node running properly.
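
The contradiction described in step 4 can be checked mechanically by comparing the two views of the cluster. The following is a minimal sketch, not part of MCM or ndb_mgm: it assumes the text of "show status -r" and of "ndb_mgm -e show" has already been captured as strings, and the helper names (parse_mcm_status, parse_ndb_mgm_show, find_mismatches) are hypothetical.

```python
import re


def parse_mcm_status(text):
    """Map NodeId -> (process, status) from an MCM 'show status -r' table."""
    result = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows start with a numeric NodeId; borders and the header do not.
        if len(cells) >= 4 and cells[0].isdigit():
            result[int(cells[0])] = (cells[1], cells[3])
    return result


def parse_ndb_mgm_show(text):
    """Map node id -> True if 'ndb_mgm -e show' lists the node as connected."""
    result = {}
    for line in text.splitlines():
        m = re.match(r"\s*id=(\d+)\s*(.*)", line)
        if m:
            result[int(m.group(1))] = "not connected" not in m.group(2)
    return result


def find_mismatches(mcm_text, mgm_text):
    """Return node ids that MCM reports as running but ndb_mgm reports as not connected."""
    mcm = parse_mcm_status(mcm_text)
    mgm = parse_ndb_mgm_show(mgm_text)
    return sorted(
        node_id
        for node_id, (_process, status) in mcm.items()
        if status == "running" and mgm.get(node_id) is False
    )


if __name__ == "__main__":
    # Rows copied from the transcripts in this note (state after reconnect).
    mcm_sample = """
    | 1      | ndbmtd   | host1 | running | 0         | version_B |
    | 2      | ndbmtd   | host2 | running | 0         | version_B |
    """
    mgm_sample = """
    id=1 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 0, Master)
    id=2 (not connected, accepting connect from host2)
    """
    print(find_mismatches(mcm_sample, mgm_sample))  # prints [2]
```

A non-empty result means the MCM view is stale for those nodes and should be confirmed with 'ps' on the affected host.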

Test:

mcm> show status -r mycluster;
+--------+----------+-------+---------+-----------+-----------+
| NodeId | Process  | Host  | Status  | Nodegroup | Package   |
+--------+----------+-------+---------+-----------+-----------+
| 49     | ndb_mgmd | host1 | running |           | version_B |
| 1      | ndbmtd   | host1 | running | 0         | version_B |
| 2      | ndbmtd   | host2 | running | 0         | version_B |
| 3      | ndbmtd   | host3 | running | 1         | version_B |
| 4      | ndbmtd   | host4 | running | 1         | version_B |
| 50     | mysqld   | host1 | running |           | version_B |
| 51     | mysqld   | host3 | running |           | version_B |
+--------+----------+-------+---------+-----------+-----------+
7 rows in set (0.06 sec)

mcm> list hosts mysite;
+-------+-----------+---------+
| Host  | Status    | Version |
+-------+-----------+---------+
| host1 | Available | 1.3.4   |
| host2 | Available | 1.3.4   |
| host3 | Available | 1.3.4   |
| host4 | Available | 1.3.4   |
+-------+-----------+---------+
4 rows in set (0.03 sec)

[root@host1 ~]# ndb_mgm -e "show"
Connected to Management Server at: host1:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=1 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 0, Master)
id=2 @192.168.100.202 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 0)
id=3 @192.168.100.203 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)
id=4 @192.168.100.204 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=49 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5)

[mysqld(API)] 2 node(s)
id=50 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5)
id=51 @192.168.100.203 (mysql-5.6.17 ndb-7.3.5)

DISCONNECT NETWORK INTERFACES OF NODE 2

mcm> list hosts mysite;
+-------+-------------+---------+
| Host  | Status      | Version |
+-------+-------------+---------+
| host1 | Available   | 1.3.4   |
| host2 | Unavailable | Unknown |
| host3 | Available   | 1.3.4   |
| host4 | Available   | 1.3.4   |
+-------+-------------+---------+
4 rows in set (2.08 sec)

mcm> show status -r mycluster;
+--------+----------+-------+---------+-----------+-----------+
| NodeId | Process  | Host  | Status  | Nodegroup | Package   |
+--------+----------+-------+---------+-----------+-----------+
| 49     | ndb_mgmd | host1 | running |           | version_B |
| 1      | ndbmtd   | host1 | running | 0         | version_B |
| 2      | ndbmtd   | host2 | failed  | 0         | version_B |
| 3      | ndbmtd   | host3 | running | 1         | version_B |
| 4      | ndbmtd   | host4 | running | 1         | version_B |
| 50     | mysqld   | host1 | running |           | version_B |
| 51     | mysqld   | host3 | running |           | version_B |
+--------+----------+-------+---------+-----------+-----------+
7 rows in set (0.06 sec)

[root@host1 ~]# ndb_mgm -e "show"
Connected to Management Server at: host1:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=1 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 0, Master)
id=2 (not connected, accepting connect from host2)
id=3 @192.168.100.203 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)
id=4 @192.168.100.204 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=49 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5)

[mysqld(API)] 2 node(s)
id=50 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5)
id=51 @192.168.100.203 (mysql-5.6.17 ndb-7.3.5)

ndb_49_cluster.log:

2016-05-03 15:40:56 [MgmtSrvr] INFO -- Node 4: Node 51: API mysql-5.6.17 ndb-7.3.5
2016-05-03 15:45:17 [MgmtSrvr] ALERT -- Node 49: Node 2 Disconnected
2016-05-03 15:45:20 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2
2016-05-03 15:45:22 [MgmtSrvr] WARNING -- Node 1: GCP Monitor: GCP_COMMIT lag 10 seconds (no max lag)
2016-05-03 15:45:25 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 3
2016-05-03 15:45:30 [MgmtSrvr] INFO -- Node 1: Communication to Node 2 closed
2016-05-03 15:45:30 [MgmtSrvr] ALERT -- Node 1: Arbitration check won - node group majority
2016-05-03 15:45:30 [MgmtSrvr] INFO -- Node 1: President restarts arbitration thread [state=6]
2016-05-03 15:45:30 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 4
2016-05-03 15:45:30 [MgmtSrvr] ALERT -- Node 3: Node 2 declared dead due to missed heartbeat
2016-05-03 15:45:30 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 closed
2016-05-03 15:45:30 [MgmtSrvr] INFO -- Node 4: Communication to Node 2 closed
2016-05-03 15:45:30 [MgmtSrvr] ALERT -- Node 3: Node 2 Disconnected
...
2016-05-03 15:45:33 [MgmtSrvr] INFO -- Node 4: Communication to Node 2 opened
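
To pull the key event out of a longer cluster log, the "declared dead" entries can be extracted with a small parser. This is a sketch only: it assumes log lines follow the layout shown above (timestamp, [MgmtSrvr], level, reporting node, message), and declared_dead is a hypothetical helper name.

```python
import re

# Layout assumed from the excerpt above:
# <date> <time> [MgmtSrvr] <LEVEL> -- Node <reporter>: <message>
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmtSrvr\] (\w+) -- Node (\d+): (.*)$"
)


def declared_dead(log_text):
    """Return (timestamp, reporting_node, dead_node) for each 'declared dead' entry."""
    events = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skips blank lines, '...' elisions, and unrelated text
        timestamp, _level, reporter, message = m.groups()
        dead = re.match(r"Node (\d+) declared dead", message)
        if dead:
            events.append((timestamp, int(reporter), int(dead.group(1))))
    return events
```

Applied to the excerpt above, this returns a single event: node 3 declaring node 2 dead at 2016-05-03 15:45:30.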

RECONNECT NETWORK INTERFACES OF NODE 2

mcm> list hosts mysite;
+-------+-----------+---------+
| Host  | Status    | Version |
+-------+-----------+---------+
| host1 | Available | 1.3.4   |
| host2 | Available | 1.3.4   |
| host3 | Available | 1.3.4   |
| host4 | Available | 1.3.4   |
+-------+-----------+---------+
4 rows in set (0.06 sec)

mcm> show status -r mycluster;
+--------+----------+-------+---------+-----------+-----------+
| NodeId | Process  | Host  | Status  | Nodegroup | Package   |
+--------+----------+-------+---------+-----------+-----------+
| 49     | ndb_mgmd | host1 | running |           | version_B |
| 1      | ndbmtd   | host1 | running | 0         | version_B |
| 2      | ndbmtd   | host2 | running | 0         | version_B |
| 3      | ndbmtd   | host3 | running | 1         | version_B |
| 4      | ndbmtd   | host4 | running | 1         | version_B |
| 50     | mysqld   | host1 | running |           | version_B |
| 51     | mysqld   | host3 | running |           | version_B |
+--------+----------+-------+---------+-----------+-----------+
7 rows in set (0.07 sec)

[root@host1 ~]# ndb_mgm -e "show"
Connected to Management Server at: host1:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=1 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 0, Master)
id=2 (not connected, accepting connect from host2)
id=3 @192.168.100.203 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)
id=4 @192.168.100.204 (mysql-5.6.17 ndb-7.3.5, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=49 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5)

[mysqld(API)] 2 node(s)
id=50 @192.168.100.201 (mysql-5.6.17 ndb-7.3.5)
id=51 @192.168.100.203 (mysql-5.6.17 ndb-7.3.5)

[root@host1 ~]# tail -f /opt/mcm/mcm_data/clusters/mycluster/49/data/ndb_49_cluster.log
2016-05-03 15:45:30 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 4
2016-05-03 15:45:30 [MgmtSrvr] ALERT -- Node 3: Node 2 declared dead due to missed heartbeat
2016-05-03 15:45:30 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 closed
2016-05-03 15:45:30 [MgmtSrvr] INFO -- Node 4: Communication to Node 2 closed
2016-05-03 15:45:30 [MgmtSrvr] ALERT -- Node 3: Node 2 Disconnected
2016-05-03 15:45:30 [MgmtSrvr] ALERT -- Node 4: Node 2 Disconnected
2016-05-03 15:45:30 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2016-05-03 15:45:33 [MgmtSrvr] INFO -- Node 1: Communication to Node 2 opened
2016-05-03 15:45:33 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 opened
2016-05-03 15:45:33 [MgmtSrvr] INFO -- Node 4: Communication to Node 2 opened
[root@host2 ~]# ps aux | grep -i mcm
mysql 2069 0.8 1.0 465416 22480 ? Ssl 15:34 0:15 /opt/mcm/mcm1.3.4/libexec/mcmd --plugin-dir=/opt/mcm/mcm1.3.4/lib/mcmd --defaults-file=/etc/mcmd.ini --daemon
root 2400 0.0 0.0 103304 816 pts/1 S+ 16:05 0:00 grep -i mcm
[root@host2 ~]# ps aux | grep -i ndb
root 2402 0.0 0.0 103304 820 pts/1 S+ 16:05 0:00 grep -i ndb
[root@host2 ~]#
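
The 'ps' check above relies on noticing that the only match is the grep command itself. A small sketch of the same check that ignores the grep line is shown below. It assumes the standard 11-column 'ps aux' layout with each process on a single line, and running_processes is a hypothetical helper name.

```python
def running_processes(ps_output, name):
    """Return PIDs from 'ps aux' text whose executable is exactly `name`."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        # Compare only the executable's base name, so 'grep -i ndb' never matches.
        executable = command.split()[0].rsplit("/", 1)[-1]
        if executable == name:
            pids.append(int(fields[1]))
    return pids
```

With the host2 output above, looking for "ndbmtd" yields an empty list while "mcmd" yields PID 2069, which matches the observation that the MCM agent is up but no data node process exists.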

Cause
