
Exadata system is very slow - connect errors : ossnet: connection failed to server : a Case Study (Doc ID 1388821.1)

Last updated on SEPTEMBER 28, 2012

Applies to:

Oracle Exadata Hardware - Version 11.2.0.1 to 11.2.1.2.1 [Release 11.2]
Exadata Database Machine V2 - Version All Versions and later
Exadata Database Machine X2-2 Full Rack - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Information in this document applies to any platform.
Exadata
ossnet
poor performance over network


Purpose

Note: The network or InfiniBand fabric does not have to be the underlying source of the "ossnet: connection failed to server" messages.
The errors can be a symptom of a more basic problem that impacts the network.
Perhaps the most important lesson of this note is that there are several potential sources for the "ossnet: connection failed to server" message,
and the solution can range from bug patches to simple configuration tuning.

Here are a few known bugs associated with the "ossnet: connection failed to server" message in an Exadata configuration:

  1. Unpublished Bug 9338087
    - The fix was included in Exadata patch 11.2.1.3.1.
    Symptom: Cellsrv would not accept new connections.
    Cause: Cellsrv was unable to keep up with a high rate of connection requests.
    Fix: Improved efficiency in handling incoming connection requests.
  2. Unpublished Bug 9176360 - REMOTESENDPORTS IN THE IMPLICIT FENCING FOR NO DISKMON
    - The fix was included in Exadata patch 11.2.1.3.

    Cellsrv leaks internal resource memory when a client tries to access Cellsrv before a disk monitor instance (DSKM) has been initialized.
    
    REDISCOVERY INFORMATION:
    
    * On the storage cell, CELLSRV will fail with 7445.
    * Also the alert.log will be full of the following message:
    
    "...Information: implicit fencing: AntMaster reid is not presented, diskmon has not yet registered with Cellsrv
    Information: Cellsrv dropping OpenDisk request for implicit fencing, 
    host nhedwhhpxdb03pd.nhg.local[pid:23780], disk MDATA_CD_1_nhedwhhpxss07pd, 
    reid cid=49fd62900e784f4dbf82d69b0f2247d7,icin=148859737,nmn=3,lnid=148859737,
    gid=11,gin=1,gmn=1,umemid=1,opid=61,opsn=177,lvl=process..."
    
    * Storage cell may run out of memory or swap.
    
  3. Unpublished Bug 8867420 / Unpublished Bug 8801965 / Unpublished Bug 8536204 - DISKMON CRASH AND RESTART TO CELL RESULTS IN NODE-LEVEL IMPLICIT FENCE
    - Fixed in 11.2
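
For the alert.log symptom listed under Bug 9176360 above, a short script can show how often the implicit fencing messages occur and which client hosts they name. The following is a minimal sketch, not an Oracle-supplied tool: the function name summarize_fencing is hypothetical, and the patterns are taken only from the excerpt quoted above, so they may need adjusting if the wording in a given alert.log differs.

```python
import re
from collections import Counter

# Message fragments copied from the alert.log excerpt quoted in this note.
NOT_REGISTERED = "diskmon has not yet registered with Cellsrv"

# In the excerpt the client is written as "host <name>[pid:<n>]" right after
# the "dropping OpenDisk request" text. \s* tolerates a line wrap in between.
DROP_RE = re.compile(
    r"Cellsrv dropping OpenDisk request for implicit fencing,\s*"
    r"host\s+(\S+?)\[pid:\d+\]"
)


def summarize_fencing(alert_text):
    """Summarize implicit fencing messages found in alert.log text.

    Returns a tuple:
      (number of "diskmon has not yet registered" messages,
       Counter mapping client host -> dropped OpenDisk requests)
    """
    not_registered = alert_text.count(NOT_REGISTERED)
    dropped_by_host = Counter(m.group(1) for m in DROP_RE.finditer(alert_text))
    return not_registered, dropped_by_host
```

To use it, read the cell's alert.log into a string and pass it to summarize_fencing. A large and growing count would be consistent with the "alert.log will be full of the following message" symptom described above, but it does not by itself confirm the bug.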


This note describes another source of the ossnet error messages and provides an analysis that may assist in researching this type of error in the future.

After other potential sources of this error were ruled out, a review of the OSWatcher logs pointed to I/O saturation as a possible underlying source of the ossnet messages.
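
As an illustration of that kind of log review, the sketch below scans iostat extended-statistics text for devices whose %util reaches a threshold. It rests on several assumptions that are not stated in this note: that the logs reviewed contain "iostat -x" style tables with a header row starting with "Device" and a "%util" column, that each table ends at a blank line, and that 90 percent is a reasonable cutoff. The function name saturated_devices and the threshold are hypothetical.

```python
def saturated_devices(iostat_text, threshold=90.0):
    """Return {device: peak %util} for devices that reached the threshold.

    Assumes iostat -x style tables: a header row whose first field starts
    with "Device" and which names a "%util" column, followed by one row
    per device, with a blank line ending each table.
    """
    util_col = None   # index of the %util column in the current table
    peaks = {}        # device name -> highest %util seen so far
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None          # blank line ends the device table
            continue
        if fields[0].startswith("Device"):
            # Locate %util by name, since column order varies by version.
            util_col = fields.index("%util") if "%util" in fields else None
            continue
        if util_col is None or len(fields) <= util_col:
            continue                 # not a device row (timestamps, CPU stats)
        try:
            util = float(fields[util_col])
        except ValueError:
            continue                 # unexpected non-numeric value, skip row
        device = fields[0]
        peaks[device] = max(util, peaks.get(device, 0.0))
    return {dev: util for dev, util in peaks.items() if util >= threshold}
```

Devices that stay near 100 percent utilization across many samples are candidates for the I/O saturation described above. A single high sample is weaker evidence, so the result is best read alongside the rest of the OSWatcher data.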

Troubleshooting Steps
