
Diameter Signaling Router (DSR): Server Does Not Recover After Restart; procmgr#31200 Log Shows Bind Error (Doc ID 2463356.1)

Last updated on AUGUST 02, 2019

Applies to:

Oracle Communications Diameter Signaling Router (DSR) - Version DSR 7.0 and later
Information in this document applies to any platform.

Symptoms

After a server or process restart (via the GUI), or after a reboot, the server fails to come back up. It shows 'OOS' under NOAM GUI / Main Menu / Status and Manage / HA and may not appear among the servers listed in the GUI.

Executing 'prod.state' shows DbUp but provides no DB state message (A/O/X/etc.):

----------------------
[admusr@server ~]$ prod.state
                ...prod.state  (RUNID=00)...
                ...getting current state...
Current state:  DbUp  (database is loaded)
----------------------


Executing 'prod.start' proceeds through DbUp but then fails at 'waiting for procmgr':
(NOTE: the same output can be found in /var/TKLC/rundb/run/log/prod.log, or in prod.log within the savelogs tarball)

----------------------
[admusr@server ~]$ sudo prod.start
                ...prod.start  (RUNID=00)...
                ...getting current state...
Current state:  DbUp  (database is loaded)
                ...starting procmgr ...
                ...waiting for state [XA]...
waiting for procmgr
<>
waiting for procmgr

************** !!!!!!!!!!!!!!!!!!! *******************
***
*** prod.start ABORTING: timed out waiting for state [XA] [no procmgr]
***    NOTE: manual recovery may be required
***   + procmgr may not be running or just responding very slowly.
***   + check whether procmgr is active by running "pm.getstate"
***     and/or by running "iqt PmState" and verifing the 'pid'
***     is valid using "ps" and 'sanity' is being updated,
***     and/or by looking at error message for clues.
***   + if not running, look at its log file for clues by
***     running a command similar to: 'log.tail -30 procmgr'.
***   + the usual reasons for this are related to (1) sluggish
***     system, (2) permissions, or (3) unavailable resources.
***
************** !!!!!!!!!!!!!!!!!!! *******************
----------------------


Executing 'pm.getprocs' indicates that procmgr is not running:

----------------------
[admusr@server ~]$ pm.getprocs

   10/22/2018 14:04:20  pm.wakeup#31000{S/W Fault}
** GN_INVAL/FTL procmgr not running [PmCtl.cxx:328]
  ^^ PmCtl::wakeUpPm() [PmCtl.cxx:352]
  ^^ [8215:talkToPM.cxx:123]

ERROR: Could not wake up procmgr!  Is procmgr running?
----------------------


The procmgr log data reveals a bind error:
(Note: available in savelogs or via 'tr.tail -100 procmgr')

----------------------
  10/22/2018 06:59:43 procmgr#31200{Process Management Fault}
** E_ADDRNOTAVAIL/FTL cannot bind [SockD.cxx:48]            <====== BIND ERROR ======
  ^^ bind(s=3, name=0x24cc8e0, namelen=128) [SockD.cxx:50]
  ^^ address: AF_INET6:[::1]:17406 [SocketPeerLink.cxx:432 ] <====== IPV6 LOOPBACK ======
  ^^ port: 17406 [SocketPeerLink.cxx:555]
  ^^ SocketPeerLink::init() [SocketPeerLink.cxx:603]
  ^^ PhysPollable::reset() [PhysPollable.cxx:244]
  ^^ PmServer::init Failed listen() [procmgr.cxx:1640]
  ^^ [5377:procmgr.cxx:420]
1022:065943.932 TR-V PROGRAM TERMINATED -- procmgr code=1 [5377/ProcUtil.cxx:484]
----------------------
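
When the log is long, the bind failure can be picked out programmatically. The following is a minimal Python sketch, not an Oracle tool: the function name find_bind_failures is hypothetical, and the pattern is based only on the line layout seen in the excerpt above (a 'cannot bind' line followed by '^^' continuation lines, one of which carries 'address: FAMILY:[HOST]:PORT'). Other address layouts are not covered.

```python
import re

# Matches the address continuation line as it appears in the excerpt, e.g.
#   ^^ address: AF_INET6:[::1]:17406 [SocketPeerLink.cxx:432 ]
# Only the bracketed form seen in the excerpt is handled (assumption).
ADDRESS_RE = re.compile(r"address:\s*(\w+):\[([^\]]+)\]:(\d+)")


def find_bind_failures(log_text):
    """Return a list of (family, host, port) for each 'cannot bind' entry."""
    failures = []
    in_bind_error = False
    for line in log_text.splitlines():
        if "cannot bind" in line:
            # Start of a bind failure; details follow on '^^' lines.
            in_bind_error = True
            continue
        if in_bind_error:
            match = ADDRESS_RE.search(line)
            if match:
                family, host, port = match.groups()
                failures.append((family, host, int(port)))
                in_bind_error = False
            elif not line.lstrip().startswith("^^"):
                # The traceback ended without an address line.
                in_bind_error = False
    return failures


if __name__ == "__main__":
    import sys

    # Read saved log text from standard input and list the failed binds.
    for family, host, port in find_bind_failures(sys.stdin.read()):
        print(f"cannot bind: {family} [{host}]:{port}")
```

The script reads from standard input, so it could be fed text saved from 'tr.tail -100 procmgr' or from the procmgr log in a savelogs tarball.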

 

Changes

The operator recently disabled IPv6 on the Linux servers and elsewhere in the network.
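
Because the procmgr log shows a bind to the IPv6 loopback address [::1] failing with E_ADDRNOTAVAIL, it can be useful to check whether that address is still bindable on the affected server. The following is a minimal, generic Python sketch, not a DSR utility. It assumes Python 3 is available on the server, uses a TCP socket (the socket type procmgr uses is not stated in the log), and binds to port 0 so that the kernel picks a free port instead of contending for 17406. The function name check_ipv6_loopback_bind is hypothetical.

```python
import errno
import socket


def check_ipv6_loopback_bind(port=0):
    """Try to bind a TCP socket to [::1]:port.

    Returns (True, "ok") on success, or (False, <errno name>) on failure,
    for example (False, "EADDRNOTAVAIL") when ::1 is not configured.
    """
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError as exc:
        # Socket creation itself can fail if the kernel has no IPv6 support.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        sock.bind(("::1", port))
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
    return True, "ok"


if __name__ == "__main__":
    ok, reason = check_ipv6_loopback_bind()
    print("IPv6 loopback bind:", "succeeded" if ok else f"failed ({reason})")
```

A failure reported as EADDRNOTAVAIL would be consistent with the error in the procmgr log, while a successful bind would suggest looking elsewhere.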

Cause


