saAmfSURestartCount and saAmfCompRestartCount Not Reset after Probation Period Expiration (Doc ID 1663005.1)

Last updated on APRIL 25, 2014

Applies to:

Oracle Communications OpenSAFfire - Version 6.3.0 and later
Information in this document applies to any platform.

Symptoms

In OpenSAFfire 6.3.0, when the saAmfSURestartCount and saAmfCompRestartCount runtime attributes are viewed via immlist, neither appears to reset. This holds even after the relevant restart probation period has expired, or after the recovery escalation algorithm has completed and the system has returned to a steady state.

These values track the number of times the Service Unit (SU) or Component has been restarted since the first stage of recovery was undertaken by the Application Management Framework (AMF).

The component/SU recovery should be escalated to the next level when:

saAmfCompRestartCount exceeds SaAmfSG.saAmfSGCompRestartMax within SaAmfSG.saAmfSGCompRestartProb

saAmfSURestartCount exceeds SaAmfSG.saAmfSGSuRestartMax within SaAmfSG.saAmfSGSuRestartProb
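
The escalation rule above can be sketched as a small shell function. This is a simplified model, not the AMF implementation: the function name recovery_for_failure and its arguments are hypothetical, the probation timers are not modelled (every failure is assumed to fall inside the probation period), and both maximums are set to 2 to match the example that follows.

```shell
#!/bin/sh
# Simplified model of AMF restart escalation (an illustration only).
# ASSUMPTION: all failures occur within the probation periods, so the
# counters are never reset between calls.
#
# Arguments (all hypothetical names):
#   $1  component restarts already counted in the probation period
#   $2  maximum component restarts (saAmfSGCompRestartMax)
#   $3  SU restarts already counted in the probation period
#   $4  maximum SU restarts (saAmfSGSuRestartMax)
recovery_for_failure() {
    comp_restarts=$1 comp_max=$2 su_restarts=$3 su_max=$4
    if [ "$comp_restarts" -lt "$comp_max" ]; then
        echo componentRestart      # one more restart stays within the maximum
    elif [ "$su_restarts" -lt "$su_max" ]; then
        echo suRestart             # component maximum would be exceeded
    else
        echo suFailover            # SU maximum would be exceeded as well
    fi
}

# Replay five consecutive failures with both maximums set to 2.
comp=0
su=0
for failure in 1 2 3 4 5; do
    action=$(recovery_for_failure "$comp" 2 "$su" 2)
    echo "failure $failure: $action"
    case $action in
        componentRestart) comp=$((comp + 1)) ;;
        suRestart)        su=$((su + 1)) ;;
    esac
done
```

Under these assumptions the loop reports componentRestart for failures 1 and 2, suRestart for failures 3 and 4, and suFailover for failure 5.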

For example (assuming that two component restarts escalate to an SU restart and that two SU restarts escalate to an SU failover):

1. Initial saAmfSURestartCount before any system failures

[root@SC1 ~]# immlist -a saAmfSURestartCount safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfSURestartCount=0

2. Component "fails" four times in quick succession

[root@SC1 ~]# pkill amf_demo && immlist -a saAmfSURestartCount safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfSURestartCount=0

Apr  9 14:21:41 SC1 osafamfnd[10434]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'componentRestart'
[root@SC1 ~]# pkill amf_demo && immlist -a saAmfSURestartCount safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfSURestartCount=0

Apr  9 14:21:42 SC1 osafamfnd[10434]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'componentRestart'
[root@SC1 ~]# pkill amf_demo && immlist -a saAmfSURestartCount safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfSURestartCount=1

Apr  9 14:21:43 SC1 osafamfnd[10434]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'suRestart'
[root@SC1 ~]# pkill amf_demo && immlist -a saAmfSURestartCount safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfSURestartCount=2

Apr  9 14:21:44 SC1 osafamfnd[10434]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'suRestart'

3. After waiting for the probation period to expire, further component failures begin again at the first level of recovery (componentRestart); however, saAmfSURestartCount remains at 2

[root@SC1 ~]# pkill amf_demo && immlist -a saAmfSURestartCount safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1
saAmfSURestartCount=2

Apr  9 14:21:55 SC1 osafamfnd[10434]: NO 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' faulted due to 'avaDown' : Recovery is 'componentRestart'
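
To check both counters in one pass after the probation period has elapsed, something like the following sketch could be used. The helper names attr_value and read_counter are hypothetical; the DNs are taken from the example above, and the immlist output is assumed to be in the attribute=value form shown in the transcript.

```shell
#!/bin/sh
# Print the value from an "attribute=value" line read on stdin.
# $1 = attribute name (hypothetical helper, not part of OpenSAF).
attr_value() {
    sed -n "s/^$1=//p"
}

# Read one runtime attribute of one object via immlist.
# $1 = attribute name, $2 = object DN (hypothetical helper).
read_counter() {
    immlist -a "$1" "$2" | attr_value "$1"
}

# DNs from the AmfDemo example in this document.
SU_DN='safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1'
COMP_DN="safComp=AmfDemo,$SU_DN"

# Only query IMM when immlist is actually available on this node.
if command -v immlist >/dev/null 2>&1; then
    su_count=$(read_counter saAmfSURestartCount "$SU_DN")
    comp_count=$(read_counter saAmfCompRestartCount "$COMP_DN")
    echo "saAmfSURestartCount=$su_count saAmfCompRestartCount=$comp_count"
    if [ "${su_count:-0}" -ne 0 ] || [ "${comp_count:-0}" -ne 0 ]; then
        echo "At least one counter is still non-zero"
    fi
fi
```

Run on a controller node once the probation period has passed with no further failures; a non-zero value at that point matches the symptom described here.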

Changes

Component/SU failure resulting in restart actions being taken by AMF.

Cause
