Multithreaded Server Freeze In Case Of Blocking tpacall() Under High Load (Doc ID 1332540.1)

Last updated on JUNE 08, 2023

Applies to:

Oracle Tuxedo - Version 8.1 and later
Information in this document applies to any platform.

Symptoms


The Tuxedo application runs a multi-threaded server that advertises several services and receives a very large number of incoming requests.

The requests are asynchronous and no reply is expected:

tpacall(Service_Name, buffer, ..., TPNOREPLY)


When the incoming message queue of the server is full, the temporary-file mechanism is used on its behalf.
If a "handling" thread fails to send a message, it retries several times, with different delays between the attempts.

The thread that wants to send the message sleeps 1 second between attempts, up to 8 times, and then sleeps 4 seconds between attempts 2 more times before it manages to send the message.
This delay mechanism for the "handling" thread is not in itself a problem.
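
The retry schedule described above can be summed to see where the 16-second figure comes from. The following is a minimal sketch, not Tuxedo code: the struct, function and array names are hypothetical, and only the values (8 sleeps of 1 second, then 2 sleeps of 4 seconds) are taken from this note.

```c
#include <assert.h>
#include <stddef.h>

/* One phase of the retry schedule: `attempts` retries, each preceded by
 * a sleep of `sleep_seconds`. Names are hypothetical; only the values in
 * described_schedule come from this note. */
struct retry_phase {
    unsigned attempts;
    unsigned sleep_seconds;
};

/* Worst-case time a sending thread spends sleeping across all phases. */
unsigned total_retry_sleep(const struct retry_phase *phases, size_t count)
{
    unsigned total = 0;

    assert(phases != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += phases[i].attempts * phases[i].sleep_seconds;
    return total;
}

/* Schedule described in this note: 8 sleeps of 1 s, then 2 sleeps of 4 s. */
const struct retry_phase described_schedule[] = {
    { 8, 1 },
    { 2, 4 },
};
```

Calling total_retry_sleep(described_schedule, 2) evaluates 8 x 1 + 2 x 4 and returns 16, the worst-case number of seconds the sending thread spends sleeping.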

The problem is that the "dedicated" thread, which dequeues the pending requests from the incoming message queue, remains blocked for MORE THAN 16 seconds!

This "dedicated" thread is blocked on a mutex, waiting for the "handling" thread to release it. As a consequence, all of the other threads remain idle instead of handling the pending requests.
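
The blocking pattern can be modelled outside Tuxedo with plain POSIX threads. The sketch below is an assumption-laden illustration, not the server's actual implementation: all names are hypothetical, the mutex stands in for whichever lock the two threads share, and one "second" is scaled down to a caller-chosen number of milliseconds so the demonstration stays short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical model: queue_lock stands in for the shared mutex. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  state_cond = PTHREAD_COND_INITIALIZER;
static int  handler_holds_lock;   /* set once the handler owns queue_lock */
static long unit_ms;              /* one simulated "second", in ms */

static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* "Handling" thread: sleeps through the retry schedule with the lock held. */
static void *handling_thread(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&queue_lock);

    pthread_mutex_lock(&state_lock);
    handler_holds_lock = 1;
    pthread_cond_signal(&state_cond);
    pthread_mutex_unlock(&state_lock);

    for (int i = 0; i < 8; i++)   /* 8 sleeps of 1 unit */
        sleep_ms(unit_ms);
    for (int i = 0; i < 2; i++)   /* 2 sleeps of 4 units */
        sleep_ms(4 * unit_ms);

    pthread_mutex_unlock(&queue_lock);
    return NULL;
}

/* The caller plays the "dedicated" thread. Returns how many milliseconds
 * it waited for queue_lock, or -1 if the handler could not be started. */
long dedicated_thread_wait_ms(long simulated_second_ms)
{
    pthread_t handler;
    struct timespec start, end;

    assert(simulated_second_ms > 0);
    unit_ms = simulated_second_ms;
    handler_holds_lock = 0;
    if (pthread_create(&handler, NULL, handling_thread, NULL) != 0)
        return -1;

    /* Wait until the handler really owns queue_lock. */
    pthread_mutex_lock(&state_lock);
    while (!handler_holds_lock)
        pthread_cond_wait(&state_cond, &state_lock);
    pthread_mutex_unlock(&state_lock);

    clock_gettime(CLOCK_MONOTONIC, &start);
    pthread_mutex_lock(&queue_lock);   /* blocks until the handler is done */
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_mutex_unlock(&queue_lock);

    pthread_join(handler, NULL);
    return (long)(end.tv_sec - start.tv_sec) * 1000L
         + (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```

With a unit of 1000 ms the caller would stay blocked for roughly the 16 seconds seen in the trace. A smaller unit such as 10 ms shows the same effect in a fraction of a second. While the caller is blocked, it cannot hand work to any other thread, which is why the other threads stay idle.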

The following test case reproduces the problem:

There is a multi-threaded server advertising 3 services (SVC1, SVC2 and SVC3):

  • SVC1 service: calls the SVC2 service asynchronously within a loop (the number of iterations is adjustable).
  • SVC2 service: simply returns the incoming buffer.
  • SVC3 service: invoked by the ud32 client utility; calls the SVC1 service asynchronously an adjustable number of times.

The ud32 utility is used to call the SVC3 service and to pass the following three adjustable parameters:
- number of asynchronous calls to SVC1 (default: 50)
- number of asynchronous calls to SVC2 made within the SVC1 service (default: 20000)
- size of the buffer returned by the SVC2 service (default: 256 bytes)

The default values may be changed to reproduce the problem more easily. With the defaults, the test issues 50 x 20000 = 1,000,000 asynchronous calls to SVC2.

All system calls made by the server advertising the SVC1, SVC2 and SVC3 services must be traced:
strace -f -tt -T -o strace.txt -p <PID of SRV>
Launch the ud32 utility, then wait several minutes and/or check the pending requests with the tmadmin utility:

> pq
Prog Name      Queue Name # Serve  Wk Queued  # Queued  Ave. Len    Machine
---------      -------------------  ---------  -------- --------    -------
BBL            55430             1         0         0       0.0      SITE1
SRV            SRV333            1       650        13      15.3      SITE1

Then check whether a long delay before acquiring a mutex is reported for the "main" PID:

grep "futex resumed> ) = 0 <16" strace.txt

For example, the result is: <pid#2> 11:21:54.131833 <... futex resumed> ) = 0 <16.020905>
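
The grep pattern above only matches durations whose printed value begins with 16, so a longer wait (17 seconds, for instance) would be missed. A threshold-based check is one possible alternative. The sketch below is an illustration only: the function names are hypothetical, and it assumes each line ends with the `<seconds>` duration that strace prints when run with -T.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the duration in seconds printed at the end of a
 * "futex resumed" line, or -1.0 if the line is not such a line. */
double futex_resume_seconds(const char *line)
{
    const char *open;
    char *end = NULL;
    double secs;

    assert(line != NULL);
    if (strstr(line, "futex resumed>") == NULL)
        return -1.0;

    /* The -T duration is the last <...> group on the line. */
    open = strrchr(line, '<');
    if (open == NULL)
        return -1.0;

    secs = strtod(open + 1, &end);
    if (end == open + 1 || *end != '>')
        return -1.0;   /* no number, or not closed by '>' */
    return secs;
}

/* Non-zero when a resumed futex call took at least `threshold` seconds. */
int is_long_futex_wait(const char *line, double threshold)
{
    return futex_resume_seconds(line) >= threshold;
}
```

Feeding each line of strace.txt to is_long_futex_wait(line, 16.0) would flag the sample line shown above as well as any longer wait.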


Excerpt of the strace.txt file obtained by running the test case:

Changes

 

Cause




