Exadata system is very slow - connect errors : ossnet: connection failed to server : a Case Study
(Doc ID 1388821.1)
Last updated on SEPTEMBER 28, 2012
Applies to:Oracle Exadata Hardware - Version 220.127.116.11 to 18.104.22.168.1 [Release 11.2]
Exadata Database Machine V2 - Version All Versions and later
Exadata Database Machine X2-2 Full Rack - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Information in this document applies to any platform.
Poor performance over the network.
Note: The network or InfiniBand does not have to be the underlying source of the ossnet: connection failed to server messages.
The errors can be a symptom of a more basic problem that impacts the network.
Perhaps the most important lesson of this note is that the ossnet: connection failed to server message has several potential sources,
and the solution can range from bug patches to simple tuning of the configuration.
Here are a few known bugs associated with the ossnet: connection failed to server message in an Exadata configuration:
- Unpublished bug 9338087
- The fix was included in Exadata patch 22.214.171.124.1.
Symptom: Cellsrv would not accept new connections.
Cause: Cellsrv was unable to keep up with the high rate of connection requests.
Fix: Improved efficiency in handling incoming connection requests.
- Unpublished Bug 9176360 - REMOTESENDPORTS IN THE IMPLICIT FENCING FOR NO DISKMON
- The fix was included in Exadata patch 126.96.36.199
Cellsrv internal resource memory leak when a client tries to access Cellsrv before a disk monitor instance (DSKM) has been initialized.
REDISCOVERY INFORMATION:
* On the storage cell, CELLSRV will fail with 7445.
* The alert.log will also be full of the following message:
"...Information: implicit fencing: AntMaster reid is not presented, diskmon has not yet registered with Cellsrv
Information: Cellsrv dropping OpenDisk request for implicit fencing, host nhedwhhpxdb03pd.nhg.local[pid:23780], disk MDATA_CD_1_nhedwhhpxss07pd, reid cid=49fd62900e784f4dbf82d69b0f2247d7,icin=148859737,nmn=3,lnid=148859737, gid=11,gin=1,gmn=1,umemid=1,opid=61,opsn=177,lvl=process..."
* Storage cell may run out of memory or swap.
- Unpublished Bug 8867420/ Unpublished Bug 8801965/ Unpublished Bug 8536204 - DISKMON CRASH AND RESTART TO CELL RESULTS IN NODE-LEVEL IMPLICIT FENCE
- Fixed in 11.2
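The alert.log symptom quoted for unpublished bug 9176360 can be checked for with a small script. The following is only a minimal sketch, not an Oracle-supplied tool: the regular expression is derived from the single sample message shown in this note, and the names DROP_RE and count_dropped_opendisk are hypothetical. The exact message layout may differ between cell software versions.

```python
import re
from collections import Counter

# Pattern derived from the sample alert.log message quoted in this note.
# Assumption: host, pid and disk always appear in this order and format.
DROP_RE = re.compile(
    r"Cellsrv dropping OpenDisk request for implicit fencing, "
    r"host (?P<host>[^\[\s]+)\[pid:(?P<pid>\d+)\], "
    r"disk (?P<disk>[^,\s]+)"
)


def count_dropped_opendisk(lines):
    """Count dropped OpenDisk requests per (host, disk) pair.

    `lines` is any iterable of strings, for example an open alert.log file.
    """
    counts = Counter()
    for line in lines:
        for match in DROP_RE.finditer(line):
            counts[(match.group("host"), match.group("disk"))] += 1
    return counts
```

Passing an open file object for the cell's alert.log (its location depends on the installation) returns a Counter. A large count for many host and disk pairs would be consistent with the symptom described above, although it does not by itself confirm the bug.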
This note describes another source for ossnet error messages and provides an analysis that may assist in researching this type of error message in the future.