Service Management Framework (SMF) Definition Will Not Go Offline Until All tcp_smtp_server Processes Finish
(Doc ID 2204468.1)
Last updated on JULY 09, 2024
Applies to:
Oracle Communications Messaging Server - Version 8.0.1 and later
Information in this document applies to any platform.
Symptoms
The Oracle-provided example SMF definition causes Service Management Framework (SMF) to manage and monitor all descendants of the processes it starts.
This is contrary to the Messaging Server design, in which the watcher process is responsible for monitoring the other server processes and restarting them as appropriate.
It results in the SMF service refusing to go completely "offline" until all the tcp_smtp_server processes have exited.
That means "svcadm restart messaging_server" is likely to fail because SMF times out waiting for the tcp_smtp_server children of the old dispatcher process.
The same applies to SMF watching imsched's children.
There must be some way for the SMF definition to result in behavior more in line with the Messaging Server watcher design.
SMF should watch watcher and pretty much nothing else.
We would expect "svcadm restart messaging_server" to restart the service nearly instantly.
Instead of "svcadm restart messaging_server" we have to perform the following steps:
- svcadm disable messaging_server
- wait for a while, doing "ps -ef|grep tcp_smtp_server"
- eventually get tired of waiting and pkill -9 tcp_smtp_server
- svcadm enable messaging_server
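The manual steps above can be sketched as a small shell function. This is only an illustration of the workaround, not an Oracle-provided script: the function name restart_messaging_server, the 120-second default timeout, and the 5-second poll interval are assumptions chosen for the example.

```shell
# Sketch of the manual workaround: disable the service, wait for the
# tcp_smtp_server processes to exit, force-kill any that remain after a
# timeout, then enable the service again.
# Hypothetical helper; name, timeout, and interval are assumptions.
restart_messaging_server() {
    timeout="${1:-120}"   # seconds to wait before force-killing (assumed default)
    interval="${2:-5}"    # seconds between checks (assumed default)
    waited=0

    svcadm disable messaging_server || return 1

    # Poll until no tcp_smtp_server processes remain or the timeout expires.
    while pgrep tcp_smtp_server >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "waited ${waited}s; killing remaining tcp_smtp_server processes" >&2
            pkill -9 tcp_smtp_server
            break
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    svcadm enable messaging_server
}
```

It would be called as, for example, "restart_messaging_server 300 10". The script only automates the waiting; the outage described here still lasts as long as the old tcp_smtp_server processes take to exit or the timeout chosen.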
This manual procedure turns a restart on an MTA and/or MMP system from something that should have almost zero impact into an outage lasting several minutes. The server should restart and begin taking connections again almost instantly, and there should be no need to disable monitoring, because the SMTP and IMAP ports should stop listening and begin listening again before anything could notice.
Some additional information related to this: How to tell why SMF thinks a service has not shut down?
Refer to: https://blogs.oracle.com/observatory/post/associating-a-pid-with-a-service
Noting the following from the above link:
* the -o ctid switch on the ps command
* the ctstat command
* the -l and -p switches on the svcs command
To find which "contract" a process is associated with:
$ ps -eo 'user pid ppid ctid args' | grep smtp_server
mailsrv 7124 6167 3632 /opt/sun/comms/messaging64/lib/tcp_smtp_server
mailsrv 7123 6167 3632 /opt/sun/comms/messaging64/lib/tcp_smtp_server
mailsrv 8789 6547 3612 grep smtp_server
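To avoid matching the grep process itself and to reduce the listing to the contract IDs, the ps output can be filtered with awk. The following is a minimal sketch; the function name smtp_ctids is hypothetical, and it assumes the five-column layout (user, pid, ppid, ctid, args) shown above.

```shell
# Print the distinct contract IDs (ctid) of tcp_smtp_server processes.
# Reads "user pid ppid ctid args" lines on stdin; smtp_ctids is a
# hypothetical helper name used only for this example.
smtp_ctids() {
    # Field 5 is the first word of args (the executable path); field 4 is the ctid.
    awk '$5 ~ /\/tcp_smtp_server$/ { print $4 }' | sort -u
}
```

Usage would be "ps -eo 'user pid ppid ctid args' | smtp_ctids". With the listing above it prints 3632; the grep line is skipped because its command is grep, not tcp_smtp_server.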
To see which SMF service owns that ctid:
$ ctstat -vi 3632
CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME
3632    5       process owned   8947    0       -       -
        cookie:                0x20
        informative event set: none
        critical event set:    hwerr empty
        fatal event set:       none
        parameter set:         inherit regent
        member processes:      7115 7116 7117 7119 7123 7124 7125 7126 7127 7128 7129
        inherited contracts:   none
        service fmri:          svc:/network/messaging_server:default
        service fmri ctid:     3632
        creator:               svc.startd
        aux:                   start
Note it also shows the process ids running under that contract.
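When checking why a service has not gone offline, the two interesting fields are the owning service and the member processes still holding the contract open. A small sketch that extracts just those from the verbose ctstat output follows; the function name contract_summary is hypothetical, and it assumes the field labels "service fmri:" and "member processes:" exactly as they appear in the output above.

```shell
# Summarize "ctstat -v" output read on stdin: print the owning service FMRI
# and the processes still in the contract.
# contract_summary is a hypothetical helper name used only for this example.
contract_summary() {
    awk '
        # "service fmri ctid:" does not match here because the colon must
        # directly follow "fmri".
        /service fmri:/     { sub(/^.*service fmri:[ \t]*/, "");     fmri = $0 }
        /member processes:/ { sub(/^.*member processes:[ \t]*/, ""); pids = $0 }
        END {
            n = split(pids, p, " ")
            print fmri
            print n " member processes: " pids
        }
    '
}
```

Usage would be "ctstat -vi 3632 | contract_summary". As long as the member process count is not zero, SMF still considers the contract occupied.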
Or, to see the processes via the svcs command:
$ svcs -lp messaging_server
Cause