
Service Management Framework (SMF) Definition Will Not Go Offline Until All tcp_smtp_server Processes Finish (Doc ID 2204468.1)

Last updated on JULY 09, 2024

Applies to:

Oracle Communications Messaging Server - Version 8.0.1 and later
Information in this document applies to any platform.

Symptoms

The Oracle-provided example SMF definition causes the Service Management Framework (SMF) to manage and monitor every descendant of the processes it starts.
This is contrary to the Messaging Server design, in which the watcher process monitors components and restarts them as appropriate.
As a result, the SMF service refuses to go completely "offline" until all tcp_smtp_server processes have exited.
This means "svcadm restart messaging_server" is likely to fail, because SMF times out waiting for the tcp_smtp_server children of the old dispatcher process to exit.

The same problem applies to the children of imsched, which SMF likewise watches.

There should be a way to write the SMF definition so that its behavior matches the Messaging Server watcher design.
SMF should watch the watcher process and essentially nothing else.

We expect "svcadm restart messaging_server" to restart the service nearly instantly.

Instead of running "svcadm restart messaging_server", we have to perform the following steps:

 - svcadm disable messaging_server
 - wait, repeatedly checking "ps -ef | grep tcp_smtp_server"
 - eventually give up waiting and run "pkill -9 tcp_smtp_server"
 - svcadm enable messaging_server
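The manual workaround above can be scripted while a proper SMF fix is pending. The following is a minimal Python sketch, not an Oracle-supplied tool: the function name `restart_with_workaround`, the polling interval and the number of polls are hypothetical, and it assumes svcadm, pgrep and pkill are on the PATH. It relies on pgrep exiting with status 0 while at least one matching process exists. It does not confirm that SMF has actually reached the disabled state before re-enabling the service.

```python
import subprocess
import time


def restart_with_workaround(run=subprocess.run, sleep=time.sleep,
                            max_polls=24, poll_seconds=5):
    """Disable, wait for tcp_smtp_server to exit, force-kill if needed, enable.

    max_polls and poll_seconds are hypothetical defaults (about 2 minutes).
    `run` and `sleep` are parameters so the sequence can be dry-run.
    Returns True if the pkill -9 fallback was needed.
    """
    run(["svcadm", "disable", "messaging_server"])
    killed = False
    for _ in range(max_polls):
        # pgrep exits 0 while at least one tcp_smtp_server process remains.
        if run(["pgrep", "tcp_smtp_server"], capture_output=True).returncode != 0:
            break
        sleep(poll_seconds)
    else:
        # The grace period ran out: kill the stragglers, as in the manual steps.
        run(["pkill", "-9", "tcp_smtp_server"])
        killed = True
    run(["svcadm", "enable", "messaging_server"])
    return killed


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them, and pretend
    # pgrep found no remaining tcp_smtp_server processes (exit status 1).
    def echo(cmd, **kwargs):
        print(" ".join(cmd))
        return subprocess.CompletedProcess(cmd, 1)

    restart_with_workaround(run=echo, sleep=lambda seconds: None)
```

Passing the real `subprocess.run` (the default) executes the commands; the injectable `run` keeps the sequence testable on a machine without SMF.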

This turns a restart of an MTA and/or MMP system into an outage lasting several minutes, when it should have almost no impact. The server should restart and begin accepting connections again almost instantly, and there should be no need to disable monitoring, because the SMTP and IMAP ports should stop listening and start listening again before anything could notice.

Additional information: how to tell why SMF thinks a service has not shut down.

Refer to:  https://blogs.oracle.com/observatory/post/associating-a-pid-with-a-service

Note the following from the above link:

 * the -o ctid switch on the ps command
 * the ctstat command
 * the -l and -p switches on the svcs command

To find which "contract" a process is associated with:

$ ps -eo 'user pid ppid ctid args' | grep smtp_server
mailsrv  7124  6167  3632 /opt/sun/comms/messaging64/lib/tcp_smtp_server
mailsrv  7123  6167  3632 /opt/sun/comms/messaging64/lib/tcp_smtp_server
mailsrv  8789  6547  3612 grep smtp_server
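The same grouping can be done programmatically when many processes are involved. This is a minimal Python sketch, assuming input in the `user pid ppid ctid args` column order used above (the ctid column is specific to Solaris ps); the function name `pids_by_contract` is hypothetical. It skips the grep line, which runs under a different contract (3612 here) than the service.

```python
from collections import defaultdict

# The ps output shown above, copied verbatim.
SAMPLE_PS = """\
mailsrv  7124  6167  3632 /opt/sun/comms/messaging64/lib/tcp_smtp_server
mailsrv  7123  6167  3632 /opt/sun/comms/messaging64/lib/tcp_smtp_server
mailsrv  8789  6547  3612 grep smtp_server
"""


def pids_by_contract(ps_output, pattern):
    """Group PIDs by contract ID (ctid) for processes whose args contain pattern.

    Expects the column order of `ps -eo 'user pid ppid ctid args'`.
    Header lines are skipped, as is the grep process itself.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # user, pid, ppid, ctid, args
        if len(fields) < 5 or not (fields[1].isdigit() and fields[3].isdigit()):
            continue  # header or malformed line
        _user, pid, _ppid, ctid, args = fields
        if pattern in args and not args.startswith("grep"):
            groups[int(ctid)].append(int(pid))
    return dict(groups)


if __name__ == "__main__":
    print(pids_by_contract(SAMPLE_PS, "smtp_server"))  # {3632: [7124, 7123]}
```

On a live system, the input would come from running the ps command shown above, for example through `subprocess.run(..., capture_output=True, text=True)`.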

To see which SMF service owns that contract ID (ctid):

$ ctstat -vi 3632
CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME
3632    5       process owned   8947    0       -       -
       cookie:                0x20
       informative event set: none
       critical event set:    hwerr empty
       fatal event set:       none
       parameter set:         inherit regent
       member processes:      7115 7116 7117 7119 7123 7124 7125 7126 7127 7128 7129
       inherited contracts:   none
       service fmri:          svc:/network/messaging_server:default
       service fmri ctid:     3632
       creator:               svc.startd
       aux:                   start

Note that the output also lists the IDs of the processes running under that contract (member processes), including the tcp_smtp_server processes 7123 and 7124.
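The contract lookup can also be scripted by reading the detail lines of the ctstat -v output. This is a minimal Python sketch, assuming every detail line has the `<name>: <value>` form seen in the sample above; the helper names `parse_ctstat_verbose` and `contract_members` are hypothetical. Splitting only on the first colon keeps FMRIs such as svc:/network/messaging_server:default intact. Checking whether lingering tcp_smtp_server PIDs are still contract members shows why SMF does not consider the service offline.

```python
# The `ctstat -vi 3632` output shown above, copied verbatim.
SAMPLE_CTSTAT = """\
CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME
3632    5       process owned   8947    0       -       -
       cookie:                0x20
       informative event set: none
       critical event set:    hwerr empty
       fatal event set:       none
       parameter set:         inherit regent
       member processes:      7115 7116 7117 7119 7123 7124 7125 7126 7127 7128 7129
       inherited contracts:   none
       service fmri:          svc:/network/messaging_server:default
       service fmri ctid:     3632
       creator:               svc.startd
       aux:                   start
"""


def parse_ctstat_verbose(text):
    """Collect the '<name>: <value>' detail lines of `ctstat -v` output.

    The summary header and row (CTID, ZONEID, ...) contain no colon and
    are skipped. Only the first colon splits a line.
    """
    details = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep:
            details[name.strip()] = value.strip()
    return details


def contract_members(text):
    """Return (service FMRI, member PIDs) for one contract's ctstat -v output."""
    details = parse_ctstat_verbose(text)
    members = [int(tok) for tok in details.get("member processes", "").split()
               if tok.isdigit()]  # "none" yields an empty list
    return details.get("service fmri"), members


if __name__ == "__main__":
    fmri, members = contract_members(SAMPLE_CTSTAT)
    print(fmri)                          # svc:/network/messaging_server:default
    print({7123, 7124} <= set(members))  # True
```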

Or, to see the processes via the svcs command:

$ svcs -lp messaging_server

Cause
