
High Number of LWPs in AService or Imapd Processes (Doc ID 1346043.1)

Last updated on JANUARY 03, 2023

Applies to:

Oracle Communications Messaging Server - Version 5.2.0 and later
Information in this document applies to any platform.

Symptoms

In one reported case, the AService process ran at its maxthreads limit (250) all day, whereas it normally runs with only around 40 threads.
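To spot the thread count climbing toward maxthreads before response times suffer, it can be sampled periodically. Below is a minimal sketch for Linux. It assumes the `Threads:` field of `/proc/<pid>/status`, and the function names and the 90% threshold are hypothetical choices for illustration. On Solaris that file is binary, so `ps -o nlwp -p <pid>` is the usual way to get the same number.

```python
def parse_thread_count(status_text: str) -> int:
    """Return the 'Threads:' value from the text of /proc/<pid>/status (Linux)."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid: str = "self") -> int:
    """Read the current thread (LWP) count of a process from Linux procfs."""
    with open(f"/proc/{pid}/status") as status:
        return parse_thread_count(status.read())


def near_maxthreads(count: int, maxthreads: int = 250, fraction: float = 0.9) -> bool:
    """True when the thread count is within `fraction` of the configured limit.

    The default of 250 mirrors the maxthreads value in this example; the
    0.9 threshold is an arbitrary illustrative choice.
    """
    return count >= fraction * maxthreads
```

For example, `near_maxthreads(thread_count("12345"))`, with 12345 standing in for the real AService PID, would flag a process pinned at 250 threads but not one at the normal level of around 40.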

Response time for all traffic through the MMP may begin to increase.

The CPU used by the AService process (as shown by prstat or top) increases, and vmstat shows that it is mostly system (kernel) mode CPU rather than user mode.
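When checking the user/system split in vmstat output, note that Solaris vmstat has two columns named `sy`. The one under "faults" counts system calls per second, and the one under "cpu" is the system-mode CPU percentage. The sketch below is an illustration only, with hypothetical function names. It assumes you pass the column-name header line (not the "kthr memory page ..." group line) and one sample row. It selects the CPU columns by taking the last occurrence of each name, which works for both the Solaris and Linux layouts. The first vmstat sample reports averages since boot, so use later rows.

```python
def cpu_columns(header: str, row: str) -> dict:
    """Pick vmstat's us/sy/id CPU columns out of one sample row.

    Solaris vmstat has two columns named 'sy': faults/sy (system calls per
    second) and cpu/sy (system-mode CPU %). Keeping the last index seen for
    each name selects the cpu columns on both Solaris and Linux layouts.
    """
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    last_index = {name: i for i, name in enumerate(names)}  # later duplicates win
    return {name: int(values[last_index[name]]) for name in ("us", "sy", "id")}


def kernel_share(cpu: dict) -> float:
    """Fraction of user+system CPU time that is spent in system mode."""
    busy = cpu["us"] + cpu["sy"]
    return cpu["sy"] / busy if busy else 0.0


# Illustrative Solaris-style header and row; the numbers are made up and the
# disk columns (s0..s3) vary from system to system.
SOLARIS_HEADER = "r b w swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id"
SAMPLE_ROW = "0 0 0 1000 2000 0 0 0 0 0 0 0 0 0 0 0 500 90000 800 5 40 55"
```

With the sample row, `cpu_columns` returns `{"us": 5, "sy": 40, "id": 55}`. A naive lookup of the first `sy` would have returned the system-call rate (90000) instead of the CPU percentage.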

prstat -L shows that LWP 1 of the AService process is consuming CPU equal to 100% of one processor.
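prstat expresses CPU usage as a share of all the processors it can see. As a result, a single saturated LWP shows up as a small percentage on a large machine and is easy to overlook. The sketch below shows the conversion. It assumes the divisor is the number of virtual processors reported by psrinfo, and the function names are hypothetical. On the T5220 in the example below, with 1 × 8 × 8 = 64 virtual processors, one fully busy LWP appears as about 1.56%.

```python
def one_processor_pct(virtual_processors: int) -> float:
    """prstat CPU% that corresponds to one fully busy virtual processor."""
    return 100.0 / virtual_processors


def busy_processors(prstat_pct: float, virtual_processors: int) -> float:
    """Convert a prstat CPU% figure into an equivalent count of busy processors."""
    return prstat_pct * virtual_processors / 100.0


# T5220 from the example: 1 chip x 8 cores x 8 hardware threads per core.
T5220_VIRTUAL_PROCESSORS = 1 * 8 * 8
```

top on Linux, by contrast, reports per-process CPU relative to a single processor by default (Irix mode). The same saturated thread would therefore show close to 100% there.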

For example

On a T5220, which has 1 CPU with 8 cores and 8 threads per core, there are 64 virtual processors (i.e. psrinfo | wc -l reports 64). Each zone runs an MMP handling about 10,000 simultaneous connections. On one of those zones:

prstat

Note that other system calls may happen more often, but it is poll() in the main/dispatch thread that is using all the CPU, and top shows that this time is all in system mode.
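A simple cost model can illustrate why poll() in a single dispatch thread becomes so expensive as connections grow. This is an assumption-laden sketch, not the analysis from this document's Cause section. poll() examines every descriptor it is passed, so each call costs roughly in proportion to the number of connections N. Suppose the dispatcher wakes up once per ready event and each connection generates events at rate r per second. The dispatcher then makes about N·r calls per second, and the descriptors scanned per second grow as N²·r. The function name and the event rate are hypothetical placeholders.

```python
def poll_work_per_second(connections: int, events_per_conn_per_sec: float,
                         ready_per_wakeup: float = 1.0) -> float:
    """Descriptors examined per second by a single poll()-based dispatch thread.

    Hypothetical model: every poll() call is passed all `connections`
    descriptors, and each wakeup returns `ready_per_wakeup` ready descriptors.
    """
    wakeups_per_sec = connections * events_per_conn_per_sec / ready_per_wakeup
    return wakeups_per_sec * connections
```

With a hypothetical 0.5 events per connection per second, 10,000 connections give 5,000 wakeups and 50 million descriptor checks per second. Under this model, 20,000 connections give four times that, and handling more ready descriptors per wakeup reduces the cost proportionally.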

 

Difference between imapd and AService

This problem can occur in the imapd process in releases going back to 5.0 (and probably in Netscape Messaging Server 4.x).

It would not occur in the AService process (or at least would not look exactly as described above) until Messaging Server 7 update 2. In that release, the MMP was changed to use synchronous processing for some functions, such as DNS lookups and initiating connections to the backend store systems. That change did not cause the problem. The same scalability issue has existed in both imapd and AService all along, but its symptoms in the MMP did not look the same as in imapd until after 7u2.

Changes

The problem may appear following an outage. For example, if all the backend message store systems are restarted at the same time, all of the clients will try to reconnect. The burst of roughly simultaneous connection attempts puts much more load on the MMP, which may cause it to exhibit these symptoms. The same scenario can happen after a network outage is corrected. (Also see Doc ID 1277288.1.)

Cause



In this Document
Symptoms
 For example
 prstat
 prstat -L
 vmstat 5 10
 top on Linux
 strace -c on Linux
 Difference between imapd and AService
Changes
Cause
 pstack
 DTrace - GDispThreadStats.d
 Total sleep and run times for each thread
 Totals of sleep time by call stack per type of thread
 Information specifically about poll() calls
 DTrace - lwpstats.d (new)
Solution
 On Solaris
 On Linux
 Tuning and workaround
 How many processes
 Trade offs
 For the MMP, also see new options
References

