imapd and ims_master Processes Hung Waiting for JMQ
(Doc ID 1538492.1)
Last updated on FEBRUARY 06, 2024
Applies to:
Oracle Communications Messaging Server - Version 7.0.0 and later
Information in this document applies to any platform.
Symptoms
IMAP response time may degrade, or IMAP may hang completely.
The ims-ms channel queue may begin to form a backlog. If LMTP is being used, the backlog would be in the LMTP channels on the MTA systems.
The mail.log_current log file may show message delivery being postponed with "Q" status and the reason "Mailbox is busy".
In one reported case, the ims-ms channel stopped delivering messages twice within 12 hours. On the primary node in the cluster <HOSTNAME>, there were numerous entries like this in mail.log_current:
14-Mar-2013 17:09:03.70 ims-ms Q 5 user@domain.info rfc822;userid@example.com user@ims-ms-daemon <Messaging Server DIR>/data/queue/ims-ms/010/ZZi0i3908ruX1.00 Mailbox is busy Mailbox is busy
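To gauge how widespread the postponements are, the log can be scanned for such entries. The following is a minimal sketch, not an Oracle-supplied tool. It assumes every entry follows the field order of the sample above (date, time, channel, status, ...); the function name count_busy_deferrals is hypothetical.

```python
from collections import Counter

def count_busy_deferrals(lines):
    """Count postponed ('Q') entries mentioning 'Mailbox is busy', per channel."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout, taken from the sample entry: date, time, channel, status, ...
        if len(fields) < 4:
            continue
        channel, status = fields[2], fields[3]
        if status == "Q" and "Mailbox is busy" in line:
            counts[channel] += 1
    return counts

# The sample entry quoted in this note.
sample = [
    "14-Mar-2013 17:09:03.70 ims-ms Q 5 user@domain.info "
    "rfc822;userid@example.com user@ims-ms-daemon "
    "<Messaging Server DIR>/data/queue/ims-ms/010/ZZi0i3908ruX1.00 "
    "Mailbox is busy Mailbox is busy",
]

for channel, n in count_busy_deferrals(sample).items():
    print(n, channel)
```

To use it on a real system, pass the opened mail.log_current file in place of the sample list.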
The job controller log may show:
Could not open TCP connection to broker 127.0.0.1:0 because 'Portmapper returned invalid input' (1700) Transport protocol could not connect to the broker because 'Portmapper returned invalid input' (1700)
Running dbhang while the problem is happening, before msprobe calls for a restart because IMAP is not responding, shows that the ims_master and imapd processes have many threads stuck like this:
% pstack-summary.sh
1 lwp_park notify_mailbox_event nsplugin_notify_rename
1 lwp_park nsplugin_notify_expunge_notify storenotify_expunge_notify
6 lwp_park nsplugin_notify_copy storenotify_copy
10 pollsys poll GDispCx_DispatchLoop
18 lwp_park nsplugin_notify_append3 storenotify_append
20 accept _pt_root _lwp_start
20 pollsys poll __1cJensconn_tMdispatchloop6M_v_
20 pollsys poll _pr_poll_with_poll
21 lwp_park cond_wait_queue cond_wait_common
80 lwp_park PR_EnterMonitor ????????
340 lwp_cond_wait cond_wait_kernel cond_wait_common
1134 lwp_park nsplugin_notify_changeflags storenotify_changeflags
The 1134 threads are waiting for a lock in the notify plugin code. From that brief summary of the thread stacks, we do not know which one. A similar summary of all mailsrv process pstack outputs shows the same.
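The pstack-summary.sh script is not reproduced in this note. A summary of this kind can be approximated by counting threads that share the same top three stack frames. The sketch below assumes Solaris pstack output in which each thread starts with a dashed "lwp#" header and each frame line starts with a hex address followed by the function name. The function name summarize_pstack and the second thread in the sample (lwp# 1930) are made up for illustration.

```python
import re
from collections import Counter

# Thread header, e.g. "----------------- lwp# 1929 / thread# 1929 ----------"
HEADER = re.compile(r"^-+\s+lwp#\s+\d+")
# Frame line, e.g. " ffffffff7b1d8818 lwp_park (0, 0, 0)"
FRAME = re.compile(r"^\s*[0-9a-fA-F]{8,16}\s+(\S+)\s*\(")

def summarize_pstack(text, depth=3):
    """Count threads by the names of their top `depth` stack frames."""
    threads = []  # one list of function names per thread
    for line in text.splitlines():
        if HEADER.match(line):
            threads.append([])
            continue
        m = FRAME.match(line)
        if m and threads:
            threads[-1].append(m.group(1))
    return Counter(" ".join(t[:depth]) for t in threads if t)

sample = """\
----------------- lwp# 1929 / thread# 1929 --------------------
 ffffffff7b1d8818 lwp_park (0, 0, 0)
 ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc
 ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
 ffffffff7b1d8778 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 1930 / thread# 1930 --------------------
 ffffffff7b1d8818 lwp_park (0, 0, 0)
 ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc
 ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
"""

# Print in ascending order of count, like the summary above.
for stack, n in sorted(summarize_pstack(sample).items(), key=lambda kv: kv[1]):
    print(n, stack)
```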
The 80 threads in PR_EnterMonitor are also in the JMQ notify library, waiting for a lock, like this:
ffffffff7b1d8818 lwp_park (0, 0, 0)
ffffffff7ba2f0d0 PR_EnterMonitor (100361db0, ffffffff64b12cfe, ...) + 28
ffffffff6e128148 ???????? (100344aa8, 84f9, 1ea0c, ...)
ffffffff6e14fa2c ???????? (100344a70, 100369b40, ...)
ffffffff6e1524a0 ???????? (100344a70, 100368ec0, ...)
ffffffff6e144a14 ???????? (1003607c0, 100368ec0, ...)
ffffffff6e14a938 ???????? (100369260, 100369a40, ...)
ffffffff6e16d870 MQSendMessageExt (100369260, bba, ...) + 140
ffffffff6e30d120 nsplugin_notify_changeflags (ffffffff64b131e8, ...) + 1378
ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
0000000100039750 __1cUcmd_store_send_reply6FpnLstoreargs_t_pnIimap_ctx_i_v_ (...) + 570
00000001000391b4 __1cJcmd_store6FpnIimap_ctx_pc2i_v_ (...) + f14
0000000100026504 firstcmdline (1004b27e0, 0, ...) + ca4
ffffffff7d810318 GDispCx_Dispatch (10036b268, ...) + 208
ffffffff7d810c90 GDispCx_InternalWork (100eafe68, ...) + 388
ffffffff7b1d8778 _lwp_start (0, 0, 0, 0, 0, 0)
The 1134 threads look like this:
----------------- lwp# 1929 / thread# 1929 --------------------
ffffffff7b1d8818 lwp_park (0, 0, 0)
ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc
ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
0000000100039750 __1cUcmd_store_send_reply6FpnLstoreargs_t_pnIimap_ctx_i_v_ (...) + 570
00000001000391b4 __1cJcmd_store6FpnIimap_ctx_pc2i_v_ (...) + f14
0000000100026504 firstcmdline (10132c2c0, 0, ...) + ca4
ffffffff7d810318 GDispCx_Dispatch (10036b268, ...) + 208
ffffffff7d810c90 GDispCx_InternalWork (100ca20f8, ...) + 388
ffffffff7b1d8778 _lwp_start (0, 0, 0, 0, 0, 0)
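To determine whether this is the JMQ or the ENS notify plugin, compare the address of nsplugin_notify_changeflags in the first stack (the one that calls MQSendMessageExt) with its address in the second:
ffffffff6e30d120 nsplugin_notify_changeflags (ffffffff64b131e8, ...) + 1378
ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc
The address minus the offset is the same in both: FFFFFFFF6E30BDA8. Both frames are therefore in the same routine, and because the first stack goes on to call MQSendMessageExt, all of these threads are in the JMQ notify plugin.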
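The subtraction can be checked with a few lines of Python. This is only a worked example of the arithmetic, using the two frame addresses and offsets quoted in the stacks; the helper name routine_base is hypothetical.

```python
def routine_base(frame_address, offset):
    """Start address of a routine: frame address minus the '+ offset' (both hex)."""
    return int(frame_address, 16) - int(offset, 16)

# Frame from the stack that calls MQSendMessageExt (JMQ).
jmq_base = routine_base("ffffffff6e30d120", "1378")
# Frame from one of the 1134 waiting threads.
waiting_base = routine_base("ffffffff6e30bf64", "1bc")

print(f"{jmq_base:X}")      # FFFFFFFF6E30BDA8
print(f"{waiting_base:X}")  # FFFFFFFF6E30BDA8
print(jmq_base == waiting_base)  # True: same routine in the same library
```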
Changes
The system is being upgraded to current patch rev of Comms Suite and ISS is being implemented.
Cause