Application Resource Group Always Fails Over to Secondary Real Application Cluster (RAC) Node with "smsNamingServer:Failed to start service" Error Message

(Doc ID 1374840.1)

Last updated on MARCH 20, 2018

Applies to:

Oracle Communications Network Charging and Control - Version 4.3.0 and later
Information in this document applies to any platform.

Symptoms

On the Service Management System (SMS) node, the application resource group always fails over to the secondary Real Application Cluster (RAC) node.

The following example describes the problem in full:

  1. New installation of the smsCluster package on top of an Oracle RAC environment.

    Example:

        1. Primary node (ncc-sms01)
        2. Secondary node (ncc-sms02)

  2. A switchover from the secondary node to the primary node fails, as the following log entries show.

    System log on ncc-sms02 node:
    -------------------------------

- All resources on ncc-sms02 go offline:

Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms02 change to RG_PENDING_OFFLINE
Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource SmsNamingServer-hars status msg on node ncc-sms02 change to <Stopping>
Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms02 change to RG_OFFLINE

- All resources on ncc-sms01 start to come online:

Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms01 change to RG_PENDING_ONLINE
Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource SmsNamingServer-hars state on node ncc-sms01 change to R_STARTING
Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms01 change to RG_ONLINE

- While the resources are starting on ncc-sms01, smsNamingServer faults:

Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource SmsNamingServer-hars status on node ncc-sms01 change to R_FM_FAULTED
Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource SmsNamingServer-hars status msg on node ncc-sms01 change to <Service daemon not running.>
Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms01 change to RG_ON_PENDING_R_RESTART
Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource SmsNamingServer-hars state on node ncc-sms01 change to R_ONLINE_UNMON
Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource SmsNamingServer-hars status on node ncc-sms01 change to R_FM_UNKNOWN
Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource SmsNamingServer-hars status msg on node ncc-sms01 change to <Stopping>

- The resources fail to start on ncc-sms01 and fail over to ncc-sms02:

Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group SmsScreens-harg state on node ncc-sms01 change to RG_OFFLINE_START_FAILED
Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms01 change to RG_OFFLINE
Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms02 change to RG_PENDING_ONLINE
Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hafoip_prenet_start> for resource <ncc-sms-screen>, resource group <SmsScreens-harg>, node <ncc-sms02>, timeout <300> seconds
Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource ncc-cbt-sms-screen status on node ncc-sms02 change to R_FM_UNKNOWN
Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource ncc-cbt-sms-screen status msg on node ncc-sms02 change to <Starting>
Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <SmsNamingServer_monitor_start> completed successfully for resource <SmsNamingServer-hars>, resource group <SmsScreens-harg>, node <ncc-sms02>, time used: 0% of timeout <300 seconds>
Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource SmsNamingServer-hars state on node ncc-sms02 change to R_ONLINE
Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource SmsTaskAgent-hars status msg on node ncc-sms02 change to <Service is online.>

  3. At the same time, the following error is logged in the system log file on ncc-sms01:

root@ncc-sms01$ tail /var/adm/messages
Cluster.PMF.pmfd: [ID 887656 daemon.notice] Process: tag="SmsScreens-harg,SmsNamingServer-hars,0.svc", cmd="/bin/ksh -c /usr/bin/su - smf_oper -c 'exec /IN/service_packages/SMS/bin/smsNamingServerStartup.sh >> /IN/service_packages/SMS/tmp/smsNamingServer.log 2>/IN/service_packages/SMS/tmp/smsNamingServer.log'", Failed to stay up.
SC[.SmsNamingServer:4,SmsScreens-harg,SmsNamingServer-hars,SmsNamingServer_svc_start]: [ID 499150 daemon.error] Failed to start service.

  4. The following error is logged in smsNamingServer.log on ncc-sms01:

root@ncc-sms01$ cat smsNamingServer.log
/u01/app/oracle/product/9.2/lib32/libclntsh.so.9.0: Permission denied
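
The "Permission denied" message indicates that the account starting smsNamingServer (smf_oper, according to the PMF command line above) could not open the Oracle client library. One way to narrow this down is to check, as that user, which component of the library path is not accessible. The following is a minimal Python sketch of such a check and is not part of the product. The function name first_inaccessible is hypothetical, and the default path is simply the one from the log above. The sketch assumes that traversing a directory needs search (x) permission and that loading the library needs read (r) permission.

```python
import os

# Path taken from the error in smsNamingServer.log; adjust for your installation.
LIB_PATH = "/u01/app/oracle/product/9.2/lib32/libclntsh.so.9.0"


def first_inaccessible(path):
    """Return the first path component the current user cannot use, or None.

    Directories are checked for search (x) permission and the final
    component for read (r) permission. os.access() evaluates the real
    user ID, so run this as the account that starts the service.
    A component that does not exist is also reported as inaccessible.
    """
    path = os.path.abspath(path)
    components = []
    current = path
    while True:
        components.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    components.reverse()  # root first, file last

    for directory in components[:-1]:
        if not os.access(directory, os.X_OK):
            return directory
    if not os.access(path, os.R_OK):
        return path
    return None


if __name__ == "__main__":
    blocked = first_inaccessible(LIB_PATH)
    if blocked is None:
        print("All components of %s are accessible" % LIB_PATH)
    else:
        print("Not accessible (missing or permission denied): %s" % blocked)
```

Run as root, the check is of little use because root bypasses most permission checks. Run it from a session of the service account instead (for example, after "su - smf_oper").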

 

Notes:
  • The system log is written to /var/adm/messages.
  • The smsNamingServer log files are located in /IN/service_packages/SMS/tmp.
  • To perform a switchover:
      - Log in to the Service Management System (SMS) node as the root user.
      - Run "clrg switch -M -n <node_name> <resource_group>" or "scswitch -z -g <resource_group> -h <node_name>".
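
The Cluster.RGM.global.rgmd entries in /var/adm/messages share a regular "... on node <node> change to <value>" shape, so a failed switchover can be reduced to a short timeline. The sketch below shows one way to do that in Python. It is an illustration, not an Oracle tool. The regular expression is inferred only from the log lines quoted in this document, and the names Transition, parse_transitions and failed_starts are hypothetical.

```python
import re
from collections import namedtuple

Transition = namedtuple("Transition", "level kind name field node value")

# Inferred from the rgmd lines quoted in this document; other message
# shapes (for example "launching method ...") are deliberately ignored.
RGMD_CHANGE = re.compile(
    r"\[ID \d+ daemon\.(?P<level>\w+)\] "
    r"(?P<kind>resource group|resource) (?P<name>\S+) "
    r"(?P<field>state|status msg|status) on node (?P<node>\S+) "
    r"change to (?P<value>.+?)\s*$"
)

# A few lines copied from the system log excerpts in this document.
SAMPLE = [
    "Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group SmsScreens-harg state on node ncc-sms01 change to RG_PENDING_ONLINE",
    "Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource SmsNamingServer-hars status on node ncc-sms01 change to R_FM_FAULTED",
    "Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource SmsNamingServer-hars status msg on node ncc-sms01 change to <Service daemon not running.>",
    "Cluster.RGM.global.rgmd: [ID 529407 daemon.error] resource group SmsScreens-harg state on node ncc-sms01 change to RG_OFFLINE_START_FAILED",
    "Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hafoip_prenet_start> for resource <ncc-sms-screen>, resource group <SmsScreens-harg>, node <ncc-sms02>, timeout <300> seconds",
]


def parse_transitions(lines):
    """Extract state/status changes from an iterable of syslog lines."""
    transitions = []
    for line in lines:
        match = RGMD_CHANGE.search(line)
        if match:
            transitions.append(Transition(**match.groupdict()))
    return transitions


def failed_starts(transitions):
    """Return (resource group, node) pairs that reported a failed start."""
    return [
        (t.name, t.node)
        for t in transitions
        if t.kind == "resource group" and t.value == "RG_OFFLINE_START_FAILED"
    ]


if __name__ == "__main__":
    for t in parse_transitions(SAMPLE):
        print("%-8s %-15s %-22s %-10s -> %s" % (t.node, t.kind, t.name, t.field, t.value))
    print("Failed starts:", failed_starts(parse_transitions(SAMPLE)))
```

To run the sketch against a real log, pass an open file object for /var/adm/messages to parse_transitions in place of SAMPLE.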

Changes

Recent installation of the smsCluster package.

Cause
