imapd and ims_master Processes Hung Waiting for JMQ (Doc ID 1538492.1)

Last updated on SEPTEMBER 21, 2016

Applies to:

Oracle Communications Messaging Server - Version 7.0.0 and later
Information in this document applies to any platform.

Symptoms

IMAP response time may degrade, or IMAP may hang completely.

The ims-ms channel queue may begin to form a backlog. If LMTP is in use, the backlog forms in the LMTP channels on the MTA systems.

The mail.log_current log file may show message delivery being postponed with "Q" status and the reason "Mailbox is busy".


Twice in the last 12 hours the ims-ms channel stopped delivering messages. On the primary node of the cluster, mail3.abc.com, there were numerous entries like this in mail.log_current:

14-Mar-2013 17:09:03.70 ims-ms Q 5 logan@trading.info rfc822;j_koff@fe.abc.com koff@ims-ms-daemon /opt/sun/comms/messaging64/data/queue/ims-ms/010/ZZi0i3908ruX1.00 Mailbox is busy Mailbox is busy
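
To see when the deferrals started and how quickly they accumulate, the "Q" entries can be counted per minute. The following is a minimal sketch, not an Oracle-supplied tool: the function name busy_deferrals is hypothetical, and it assumes the whitespace-separated field order seen in the entry above (date, time, channel, status, ...), which can differ depending on the MTA logging options in effect.

```python
from collections import Counter

# Entry copied from the log excerpt above.
SAMPLE_LINE = (
    "14-Mar-2013 17:09:03.70 ims-ms Q 5 logan@trading.info "
    "rfc822;j_koff@fe.abc.com koff@ims-ms-daemon "
    "/opt/sun/comms/messaging64/data/queue/ims-ms/010/ZZi0i3908ruX1.00 "
    "Mailbox is busy Mailbox is busy"
)


def busy_deferrals(log_lines, channel="ims-ms"):
    """Count 'Mailbox is busy' deferrals per minute for one channel.

    Assumes the first four fields are date, time, channel and status,
    as in the sample entry; other log layouts need a different parser.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a log entry in the assumed layout
        date, time, chan, status = fields[:4]
        if chan == channel and status == "Q" and "Mailbox is busy" in line:
            # Keep HH:MM only, so entries are grouped by minute.
            counts[f"{date} {time[:5]}"] += 1
    return counts


if __name__ == "__main__":
    for minute, n in sorted(busy_deferrals([SAMPLE_LINE]).items()):
        print(minute, n)
```

In practice the lines would come from reading mail.log_current rather than from a single sample string.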

The job controller log may show:

Could not open TCP connection to broker 127.0.0.1:0 because 'Portmapper returned invalid input' (1700)
Transport protocol could not connect to the broker because 'Portmapper returned invalid input' (1700)

Running dbhang while the problem is occurring, before msprobe calls for a restart because IMAP is not responding, shows that the ims_master and imapd processes have many threads stuck like this:

% pstack-summary.sh
  1 lwp_park notify_mailbox_event nsplugin_notify_rename
  1 lwp_park nsplugin_notify_expunge_notify storenotify_expunge_notify
  6 lwp_park nsplugin_notify_copy storenotify_copy
 10 pollsys poll GDispCx_DispatchLoop
 18 lwp_park nsplugin_notify_append3 storenotify_append
 20 accept _pt_root _lwp_start
 20 pollsys poll __1cJensconn_tMdispatchloop6M_v_
 20 pollsys poll _pr_poll_with_poll
 21 lwp_park cond_wait_queue cond_wait_common
 80 lwp_park PR_EnterMonitor ????????
340 lwp_cond_wait cond_wait_kernel cond_wait_common
1134 lwp_park nsplugin_notify_changeflags storenotify_changeflags

The 1134 threads are waiting for a lock in the notify plugin code. From that brief summary of the thread stacks, we do not know which one. A similar summary of all mailsrv process pstack outputs shows the same.
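
The source of pstack-summary.sh is not shown here. As a rough illustration of the kind of summary it produces, the sketch below groups the threads in pstack output by their top three frames and counts each group. The function name summarize_stacks is hypothetical, the parsing assumes the Solaris pstack layout shown in this document, and the sample input is abridged; the thread numbers 1930 and 1931 are made up for the example.

```python
import re
from collections import Counter

# A thread header looks like: "-----  lwp# 1929 / thread# 1929  -----"
THREAD_HEADER = re.compile(r"^-+\s+lwp#\s*\d+")
# A frame looks like: "ffffffff7b1d8818 lwp_park (0, 0, 0)"
FRAME = re.compile(r"^\s*[0-9a-fA-F]{8,16}\s+(\S+)")

# Abridged sample; lwp 1930 and 1931 are illustrative, not real output.
SAMPLE = """\
-----------------  lwp# 1929 / thread# 1929  --------------------
 ffffffff7b1d8818 lwp_park (0, 0, 0)
 ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc
 ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
-----------------  lwp# 1930 / thread# 1930  --------------------
 ffffffff7b1d8818 lwp_park (0, 0, 0)
 ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc
 ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
-----------------  lwp# 1931 / thread# 1931  --------------------
 ffffffff7b1d8818 lwp_park (0, 0, 0)
 ffffffff7ba2f0d0 PR_EnterMonitor (100361db0, ffffffff64b12cfe, ...) + 28
 ffffffff6e128148 ???????? (100344aa8, 84f9, 1ea0c, ...)
"""


def summarize_stacks(pstack_text, depth=3):
    """Count threads grouped by the symbols of their top `depth` frames."""
    stacks = []
    for line in pstack_text.splitlines():
        if THREAD_HEADER.match(line):
            stacks.append([])  # start collecting frames for a new thread
            continue
        m = FRAME.match(line)
        if m and stacks:
            stacks[-1].append(m.group(1))
    return Counter(tuple(s[:depth]) for s in stacks if s)


if __name__ == "__main__":
    summary = summarize_stacks(SAMPLE)
    # Print smallest groups first, as in the summary above.
    for frames, count in sorted(summary.items(), key=lambda kv: kv[1]):
        print(f"{count:4d} {' '.join(frames)}")
```

A three-frame summary is enough to spot where threads pile up, but, as noted above, it cannot show which lock they are waiting on; that requires the full stacks.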

The 80 threads in PR_EnterMonitor are also in the JMQ notify library, waiting for a lock, like this:

ffffffff7b1d8818 lwp_park (0, 0, 0)
ffffffff7ba2f0d0 PR_EnterMonitor (100361db0, ffffffff64b12cfe, ...) + 28
ffffffff6e128148 ???????? (100344aa8, 84f9, 1ea0c, ...)
ffffffff6e14fa2c ???????? (100344a70, 100369b40, ...)
ffffffff6e1524a0 ???????? (100344a70, 100368ec0, ...)
ffffffff6e144a14 ???????? (1003607c0, 100368ec0, ...)
ffffffff6e14a938 ???????? (100369260, 100369a40, ...)
ffffffff6e16d870 MQSendMessageExt (100369260, bba, ...) + 140
ffffffff6e30d120 nsplugin_notify_changeflags (ffffffff64b131e8, ...) + 1378
ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
0000000100039750 __1cUcmd_store_send_reply6FpnLstoreargs_t_pnIimap_ctx_i_v_ (...) + 570
00000001000391b4 __1cJcmd_store6FpnIimap_ctx_pc2i_v_ (...) + f14
0000000100026504 firstcmdline (1004b27e0, 0, ...) + ca4
ffffffff7d810318 GDispCx_Dispatch (10036b268, ...) + 208
ffffffff7d810c90 GDispCx_InternalWork (100eafe68, ...) + 388
ffffffff7b1d8778 _lwp_start (0, 0, 0, 0, 0, 0)

The 1134 threads look like this:

-----------------  lwp# 1929 / thread# 1929  --------------------
ffffffff7b1d8818 lwp_park (0, 0, 0)
ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc
ffffffff7ef6cbb0 storenotify_changeflags (1, 1, 0, ...) + 250
0000000100039750 __1cUcmd_store_send_reply6FpnLstoreargs_t_pnIimap_ctx_i_v_ (...) + 570
00000001000391b4 __1cJcmd_store6FpnIimap_ctx_pc2i_v_ (...) + f14
0000000100026504 firstcmdline (10132c2c0, 0, ...) + ca4
ffffffff7d810318 GDispCx_Dispatch (10036b268, ...) + 208
ffffffff7d810c90 GDispCx_InternalWork (100ca20f8, ...) + 388
ffffffff7b1d8778 _lwp_start (0, 0, 0, 0, 0, 0)

To determine whether this is the JMQ or the ENS notify plugin, compare the address of the following routine in the former stack with its address in the latter:

ffffffff6e30d120 nsplugin_notify_changeflags (ffffffff64b131e8, ...) + 1378
ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc

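The address minus the offset is the same in both: FFFFFFFF6E30BDA8. Both stacks are therefore executing the same copy of nsplugin_notify_changeflags, and because the first stack goes on to call MQSendMessageExt, all of these threads are in the JMQ notify plugin.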
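
The subtraction can be checked mechanically. The sketch below is illustrative only: the helper name function_base is hypothetical, and it assumes pstack frame lines of the form "address symbol (args) + offset" with both numbers in hexadecimal, as in the two lines quoted above.

```python
import re

# Matches: "<hex address> <symbol> (<args>) + <hex offset>"
FRAME_RE = re.compile(
    r"^\s*([0-9a-fA-F]+)\s+(\S+)\s*\(.*\)\s*\+\s*([0-9a-fA-F]+)\s*$"
)

# The two frames quoted above, one from each kind of stuck thread.
FRAMES = [
    "ffffffff6e30d120 nsplugin_notify_changeflags (ffffffff64b131e8, ...) + 1378",
    "ffffffff6e30bf64 nsplugin_notify_changeflags (100342d00, ...) + 1bc",
]


def function_base(frame_line):
    """Return (symbol, start address) for a frame line with a '+ offset'."""
    m = FRAME_RE.match(frame_line)
    if m is None:
        raise ValueError(f"not a frame line with an offset: {frame_line!r}")
    address = int(m.group(1), 16)
    offset = int(m.group(3), 16)
    return m.group(2), address - offset


if __name__ == "__main__":
    bases = {function_base(line) for line in FRAMES}
    for symbol, base in bases:
        print(f"{symbol} starts at {base:X}")
    # prints: nsplugin_notify_changeflags starts at FFFFFFFF6E30BDA8
    print("same routine in both stacks:", len(bases) == 1)
```

If the ENS and JMQ plugins were both loaded, each would have its own nsplugin_notify_changeflags at a different start address, so two distinct bases would appear.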

 

Changes

The system is being upgraded to the current patch revision of Comms Suite, and ISS is being implemented.

Cause
