Messaging Server Does Not Start Under Sun Cluster; Starts Up Fine Manually

(Doc ID 1402595.1)

Last updated on AUGUST 25, 2017

Applies to:

Oracle Communications Messaging Server - Version 6.3-6 and later [Release 6.3 and later]
Information in this document applies to any platform.

Symptoms


During a scheduled failover test, the messaging service failed to start under Sun Cluster control in zone IMS1 on node backend1. It starts without problems in zone IMS1 on node backend2 under Sun Cluster control. It also starts without problems when started manually with the command start-msg ha.

Other resources show the same behavior.

### imsimta version ###
Sun Java(tm) System Messaging Server 6.3-6.03 (built Mar 14 2008; 32bit)
libimta.so 6.3-6.03 (built 17:12:37, Mar 14 2008; 32bit)
SunOS serverx 5.10 Generic_127111-11 sun4v sparc SUNW,SPARC-Enterprise-T5220

There have been no recent changes to the platform.

### Workaround ###
The messaging resource was disabled at the Sun Cluster level, and the service is started manually.

When the problem occurred, the Sun Cluster log on backend1 showed the following error:

Jan 13 11:29:04 backend1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <ims_svc_start> for resource <ims1-mail-rs>, resource group <lg1-rg>, node <backend1:IMS1>, timeout <300> seconds
Jan 13 11:29:04 backend1 Cluster.RGM.rgmd: [ID 333393 daemon.notice] 49 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscims/bin/ims_svc_start>:tag=<IMS1.lg1-rg.ims1-mail-rs.0>: Calling security_clnt_connect(..., host=<backend1>, sec_type {0:WEAK, 1:STRONG, 2:DES} =<1>, ...)

Jan 13 11:34:13 backend1 Cluster.RGM.rgmd: [ID 764140 daemon.error] Method <ims_svc_start> on resource <ims1-mail-rs>, resource group <lg1-rg>, node <backend1:IMS1>: Timeout.
Jan 13 11:34:13 backend1 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hafoip_monitor_stop> for resource <lg1-ip-res>, resource group <lg1-rg>, node <backend1:IMS1>, timeout <300> seconds

From the above, note the gap between 11:29:04, when the ims_svc_start method was launched, and 11:34:13, when the Timeout error appeared (309 seconds, just over the 300-second method timeout). The ps -ef output provided by the customer gives a start time of 11:29:26, so after 11:29:04 the cluster log shows nothing until the timeout error at 11:34:13.
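
The gap between the two rgmd entries can be checked with a short script. This is a minimal sketch, not part of the original analysis: the helper name elapsed_seconds is hypothetical, and because syslog timestamps omit the year, 2012 is assumed from the other logs in this document.

```python
from datetime import datetime

# Hypothetical helper: seconds between two syslog-style timestamps
# such as "Jan 13 11:29:04". The year is an assumption (syslog omits it).
def elapsed_seconds(start, end, year=2012):
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return (t1 - t0).total_seconds()

# Timestamps copied from the rgmd log lines: method launch and Timeout.
gap = elapsed_seconds("Jan 13 11:29:04", "Jan 13 11:34:13")
print(gap)          # 309.0
print(gap >= 300)   # True: the gap covers the 300-second method timeout
```

The result of 309 seconds is consistent with the start method running for its full 300-second timeout before rgmd reported the error.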

The http log around that time frame shows the following (these are the last lines in that log):

[13/Jan/2012:11:27:43 -0300] serverx httpd[23644]: General Warning: Sun Java(tm) System Messaging Server mshttpd 6.3-6.03 (built Mar 14 2008; 32bit) shutting down
[13/Jan/2012:11:38:36 -0300] serverx httpd[23528]: General Warning: mscertd_initialize: configuration has SMIME disabled
[13/Jan/2012:11:38:37 -0300] serverx httpd[23528]: General Warning: Sun Java(tm) System Messaging Server mshttpd 6.3-6.03 (built Mar 14 2008; 32bit) starting up
[13/Jan/2012:11:38:47 -0300] serverx httpd[23528]: Network Error: SMTP connect to failed: Network is unreachable
[13/Jan/2012:11:41:25 -0300] serverx httpd[23528]: General Warning: Sun Java(tm) System Messaging Server mshttpd 6.3-6.03 (built Mar 14 2008; 32bit) shutting down

The imta log shows the following:

[13/Jan/2012:11:38:39 -0300] serverx ims_master[23545]: General Notice: Sun Java(tm) System Messaging Server ims_master 6.3-6.03 (built Mar 14 2008; 32bit) starting up
[13/Jan/2012:11:38:39 -0300] serverx ims_master[23545]: General Error: ldappool: new connection failed: Can't connect to the LDAP server (No route to host)
[13/Jan/2012:11:39:44 -0300] serverx ims_master[23545]: General Notice: Sun Java(tm) System Messaging Server ims_master 6.3-6.03 (built Mar 14 2008; 32bit) shutting down
[13/Jan/2012:11:45:40 -0300] serverx ims_master[9792]: General Notice: Sun Java(tm) System Messaging Server ims_master 6.3-6.03 (built Mar 14 2008; 32bit) starting up

The job_controller.log shows the same for that time period:

[13/Jan/2012:11:38:39 -0300] serverx [23544]: General Error: ldappool: new connection failed: Can't connect to the LDAP server (No route to host)
[13/Jan/2012:11:38:39 -0300] serverx [23544]: General Error: ldappool: new connection failed: Can't connect to the LDAP server (No route to host)
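
When comparing several Messaging Server logs, it can help to pull out the LDAP connection failures with their timestamps. The following is a rough sketch under stated assumptions: the line layout is inferred only from the excerpts above, and the names LINE_RE, parse_line and ldap_failures are hypothetical, not part of any Messaging Server tool.

```python
import re
from datetime import datetime

# Layout inferred from the excerpts above (an assumption):
# [dd/Mon/yyyy:HH:MM:SS +zzzz] host process[pid]: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<host>\S+)\s+(?P<proc>[^:]*):\s+(?P<msg>.*)$"
)

def parse_line(line):
    """Return (timestamp, process, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return ts, m.group("proc"), m.group("msg")

def ldap_failures(lines):
    """Keep only the entries that report a failed ldappool connection."""
    found = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and "ldappool: new connection failed" in parsed[2]:
            found.append(parsed)
    return found

# Sample lines copied from the imta log excerpt above.
sample = [
    "[13/Jan/2012:11:38:39 -0300] serverx ims_master[23545]: General Notice: Sun Java(tm) System Messaging Server ims_master 6.3-6.03 (built Mar 14 2008; 32bit) starting up",
    "[13/Jan/2012:11:38:39 -0300] serverx ims_master[23545]: General Error: ldappool: new connection failed: Can't connect to the LDAP server (No route to host)",
    "[13/Jan/2012:11:39:44 -0300] serverx ims_master[23545]: General Notice: Sun Java(tm) System Messaging Server ims_master 6.3-6.03 (built Mar 14 2008; 32bit) shutting down",
]

for ts, proc, msg in ldap_failures(sample):
    print(ts.isoformat(), proc, msg)
```

Running the same filter over the imta log and job_controller.log makes it easier to line up the "No route to host" errors against the cluster log timeline.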

Cause
