RP/MSQ 4.0A(VMS) -RP35 - SBS server accvios in AVLtree_find (Doc ID 776241.1)

Last updated on NOVEMBER 03, 2016

Applies to:

Oracle MessageQ / MessageQ / 4.0,5.0
Information in this document applies to any platform

Goal

Product:  MessageQ, V4.0A-RP35
Component:  SBS server
Operating system:  OpenVMS Alpha V7.3-1

PROBLEM DESCRIPTION
The SBS server is failing with access violations (ACCVIO) on the customer's 
production system.  To date, three crashes have been reported on different nodes.

SBS Server Access Violation:

$ SET NOVERIFY
%DMQ-S-SETLNM, Set to MessageQ LNM table DMQ$LNM_5001_14011
(LNM$PROCESS_TABLE)
  "DMQ$ENTRYRTL" = "DMQ$EXE:DMQ$ENTRYRTLV40.EXE"
  "DMQ$EXECRTL" = "DMQ$EXE:DMQ$EXECRTLV40.EXE"
  "DMQ$GROUP_OUTPUT" = "EVL_LOG"
                    = "CONSOLE"
  "DMQ$PROCESS_OUTPUT" = "SYSOUT"
  "DMQ$PSSRTL" = "DMQ$EXE:DMQ$ENTRYRTLV40.EXE"
  "DMQ$TRACE_OUTPUT" = "SYSOUT"

(DMQ$LNM_5001_14011)
  "DMQ$ACCESS" = "$1$DGA0:[DMQ$V40.USER.5001_14011]"
  "DMQ$BUS_GROUP" = "5001_14011"
  "DMQ$CHKPT_FILE" = "DMQ$USER:DMQ$CHKPT_5001_14011.DAT"
  "DMQ$COM_SERVER_UP" = "YES"
  "DMQ$DISK" = "DISK$AXPVMS73:"
  "DMQ$DOC" = "DMQ$DISK:[DMQ$V40.DOC]"
  "DMQ$ENTRYRTL" = "DMQ$EXE:DMQ$ENTRYRTLV40.EXE"
  "DMQ$EVENT_LOGGER_MBX" = "MBA91:"
  "DMQ$EXAMPLES" = "DMQ$DISK:[DMQ$V40.EXAMPLES]"
  "DMQ$EXE" = "DMQ$DISK:[DMQ$V40.EXE]"
  "DMQ$EXECRTL" = "DMQ$EXE:DMQ$EXECRTLV40.EXE"
  "DMQ$INIT_FILE" = "DMQ$USER:DMQ$INIT.TXT"
  "DMQ$LIB" = "DMQ$DISK:[DMQ$V40.LIB]"
  "DMQ$LOG" = "DMQ$DISK:[DMQ$V40.LOG.5001_14011]"
  "DMQ$MRS" = "DMQ$MRS_DISK:[DMQ$V40.MRS.5001_14011]"
  "DMQ$MSGSHR" = "DMQ$EXE:DMQ$MSGSHRV40.EXE"
  "DMQ$PSSRTL" = "DMQ$EXE:DMQ$ENTRYRTLV40.EXE"
  "DMQ$ROOT" = "DMQ$V40"
  "DMQ$SET_LNM" = "DISK$AXPVMS73:[DMQ$V40.EXE]DMQ$SET_LNM_TABLEV40.EXE"
  "DMQ$TCPIP_LD" = "DEC"
  "DMQ$TERMINATION_MBX" = "MBA92:"
  "DMQ$USER" = "DMQ$DISK:[DMQ$V40.USER.5001_14011]"
  "DMQ$VERSION" = "V4.0A-111(RP35)"
 
(LNM$JOB_81AF9F00)
(LNM$GROUP_000001)
(LNM$SYSTEM_TABLE)
  "DMQ$DISK" = "SYS$SYSDEVICE"
  "DMQ$EXE" = "DMQ$DISK:[DMQ$V40.EXE]"
  "DMQ$MRS_DISK" = "DISK11"
 
(LNM$SYSCLUSTER_TABLE)
(DECW$LOGICAL_NAMES)

15-NOV-2004 18:47:09.64   User: SYSTEM           Process ID:   20800448
                          Node: CDTNH2           Process name: "DMQ_S_500114011"

Terminal:           

User Identifier:    [SYSTEM]
Base priority:      9
Default file spec:  SYS$SYSROOT:[SYSMGR]
Number of Kthreads: 1
Process Quotas:

 Account name: SYSTEM  
 CPU limit:                      Infinite  Direct I/O limit:       100
 Buffered I/O byte count quota:    698464  Buffered I/O limit:     100
 Timer queue entry quota:             499  Open file quota:        497
 Paging file quota:                394688  Subprocess quota:        10
 Default page fault cluster:           64  AST quota:              499
 Enqueue quota:                       375  Shared file limit:        0
 Max detached processes:                0  Max active jobs:          0
 
Accounting information:

 Buffered I/O count:        57  Peak working set size:       2512
 Direct I/O count:          19  Peak virtual size:         185984
 Page faults:              259  Mounted volumes:                0
 Images activated:           3
 Elapsed CPU time:          0 00:00:00.07
 Connect time:              0 00:00:00.13
 
Authorized privileges:

 CMKRNL       EXQUOTA      NETMBX       OPER         SYSGBL       SYSLCK
 SYSNAM       SYSPRV       TMPMBX       WORLD

Process privileges:

 CMKRNL               may change mode to kernel
 EXQUOTA              may exceed disk quota
 NETMBX               may create network device
 OPER                 may perform operator functions
 SYSGBL               may create system wide global sections
 SYSLCK               may lock system wide resources
 SYSNAM               may insert in system logical name table
 SYSPRV               may access objects via system protection
 TMPMBX               may create temporary mailbox
 WORLD                may affect other processes in the world
 

Process rights:

 SYSTEM                            resource
 BATCH                             
 NET$MANAGE                        
 

System rights:

 SYS$NODE_CDTNH2                   

Auto-unshelve: on
Image Dump: off

Soft CPU Affinity: off
Parse Style: Traditional
Case Lookup: Blind
Home RAD: 0
Scheduling class name: none
Process Dynamic Memory Area  

  Current Size (KB)               128.00   Current Size (Pagelets)       256
  Free Space (KB)                 113.15   Space in Use (KB)           14.84
  Largest Var Block (KB)          112.57   Smallest Var Block (By)     16.00
  Number of Free Blocks                9   Free Blocks LEQU 64 bytes       5
 
There is 1 process in this job: 

  DMQ_S_500114011 (*) 

                    ------------------------------------------------
                    ~~~~~~~~~~~~ System Description ~~~~~~~~~~~~~~~~
                    ------------------------------------------------
                             System name: CDTNH2
                            Harware type: AlphaServer 4100 5/533 4MB
                           Software type: OpenVMS Alpha V7.3-1
                         Physical memory: 6291456 pagelets (3072Mb)
                     CPUs (total/active): 4/4
                                 Cluster: Yes, 2 nodes
                       Global pages free: 4623520
                    Global sections free: 1827
                           Pagefile free: 249984
                        Global page file: 300000
                         Bug check fatal: FALSE
                      Virtual page count: 2147483647
                    ------------------------------------------------

$                  fcmd := $DMQ$EXE:DMQ$SBS_SERVER.EXE 
$                  fcmd 
Copyright (c) BEA Systems, Inc. 1998. All rights reserved.

 DMQ Server starting...
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=000000000023020C, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
  image          module          routine                      line  rel PC            abs PC
 DMQ$SBS_SERVER  SBS_CPI_SUBS    sbs_cpi_avail_compare       28880  000000000000926C  000000000023020C
 DMQ$SBS_SERVER  AVL_TREE        AVLtree_find                 8823  00000000000009B0  000000000026E5C0
 DMQ$SBS_SERVER  SBS_CPI_AVAIL   sbs_cpi_add_avail_list      25910  000000000000034C  000000000024C42C
 DMQ$SBS_SERVER  SBS_CPI_MSGS    handle_msg_event_reg        34077  0000000000015850  0000000000246300
 DMQ$SBS_SERVER  SBS_CPI_MSGS    sbs_cpi_handle_sbs_msg      27986  0000000000000B7C  000000000023162C
 DMQ$SBS_SERVER  DMQ$SBS_SERVER  main                        26282  00000000000008FC  00000000002208FC
 DMQ$SBS_SERVER  DMQ$SBS_SERVER  __main                          0  000000000000006C  000000000022006C
                                                                 0  FFFFFFFF8028B63C  FFFFFFFF8028B63C

DmQ I 19:57.7 Time Stamp - 18-JAN-2005 07:19:57.72
DmQ I 19:57.7 MSGPURGED, The MessageQ exit handler has purged 2 incoming messages

  SYSTEM       job terminated at 18-JAN-2005 07:19:57.76

  Accounting information:
  Buffered I/O count:                114      Peak working set size:       8064
  Direct I/O count:                   54      Peak virtual size:         542512
  Page faults:                       790      Mounted volumes:                0
 
---------------

Other stack traces (all around the AVLtree_find area) are:

 DMQ Server starting...
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=000000005336C601, PC=000000000023020C, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
  image          module          routine                      line  rel PC            abs PC
 DMQ$SBS_SERVER  SBS_CPI_SUBS    sbs_cpi_avail_compare       28880  000000000000926C  000000000023020C
 DMQ$SBS_SERVER  AVL_TREE        AVLtree_find                 8823  00000000000009B0  000000000026E5C0
 DMQ$SBS_SERVER  SBS_CPI_AVAIL   sbs_cpi_add_avail_list      25910  000000000000034C  000000000024C42C
 DMQ$SBS_SERVER  SBS_CPI_MSGS    handle_msg_avail_reg        33403  0000000000013A84  0000000000244534
 DMQ$SBS_SERVER  SBS_CPI_MSGS    sbs_cpi_handle_sbs_msg      27853  00000000000006FC  00000000002311AC
 DMQ$SBS_SERVER  DMQ$SBS_SERVER  main                        26282  00000000000008FC  00000000002208FC
 DMQ$SBS_SERVER  DMQ$SBS_SERVER  __main                          0  000000000000006C  000000000022006C
                                                                 0  FFFFFFFF8028B63C  FFFFFFFF8028B63C
DmQ I 03:47.9 Time Stamp - 18-JAN-2005 08:03:47.94
DmQ I 03:47.9 MSGPURGED, The MessageQ exit handler has purged 3 incoming messages
  SYSTEM       job terminated at 18-JAN-2005 08:03:47.98

  Accounting information:
  Buffered I/O count:                114      Peak working set size:       6608
  Direct I/O count:                   54      Peak virtual size:         917184
  Page faults:                      1000      Mounted volumes:                0

-----------------------

Copyright (c) BEA Systems, Inc. 1998. All rights reserved.

 DMQ Server starting...
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=000000000023013C, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
  image          module          routine                      line  rel PC            abs PC
 DMQ$SBS_SERVER  SBS_CPI_SUBS    sbs_cpi_compare_by_process  28789  000000000000919C  000000000023013C
 DMQ$SBS_SERVER  AVL_TREE        AVLtree_find                 8823  00000000000009B0  000000000026E5C0
 DMQ$SBS_SERVER  SBS_CPI_SUBS    sbs_cpi_fnd_process_entry   26084  000000000000094C  00000000002278EC
 DMQ$SBS_SERVER  SBS_PPI_SUBS    sbs_ppi_msg_gen             25789  00000000000003F8  00000000002591E8
 DMQ$SBS_SERVER  SBS_CPI_MSGS    msg_gen_go                  32348  000000000000F14C  000000000023FBFC
 DMQ$SBS_SERVER  SBS_CPI_MSGS    sbs_cpi_handle_mot_msg      31732  000000000000D7E0  000000000023E290
 DMQ$SBS_SERVER  DMQ$SBS_SERVER  main                        26330  0000000000000B2C  0000000000220B2C
 DMQ$SBS_SERVER  DMQ$SBS_SERVER  __main                          0  000000000000006C  000000000022006C
                                                                 0  FFFFFFFF8028B63C  FFFFFFFF8028B63C
DmQ I 22:59.0 Time Stamp - 19-JAN-2005 07:22:59.07
DmQ I 22:59.0 MSGPURGED, The MessageQ exit handler has purged 1 incoming message
  JUMUNOZ      job terminated at 19-JAN-2005 07:22:59.12

  Accounting information:
  Buffered I/O count:                125      Peak working set size:      34816
  Direct I/O count:                  201      Peak virtual size:         542384
  Page faults:                     23449      Mounted volumes:                0
  Charged CPU time:        0 00:00:08.84      Elapsed time:       0 22:40:55.01


----------------------
 
Notes:
------

1) The latest kit is RP68; the customer has been made aware of this.
2) An SBS accvio was fixed in RP49, but it is not related to this problem.
3) I have asked the customer what changed in the system/environment 
leading up to these crashes, but they have not yet been able to provide any 
information.
4) Customer indicates that "Tracing was turned on for the SBS server on the 
SPRNH2 node only (the application runs only on SPRNH2).  The application was 
then failed over from the CDTNH12 cluster to the SPRNH12 cluster.  We will 
run the application on SPRNH2 until the SBS server accvio's again".
5) I have spoken to Lisa, who has "hardened" the code around the AVLtree_find 
area so that even if the tree becomes corrupted, the SBS server doesn't accvio.
6) Lisa is also considering adding some debug statements to give more 
information when the SBS server fails.
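
As an illustration of notes 5 and 6 only, the sketch below shows one way a tree 
lookup could be made defensive. It is not the MessageQ source: the node layout, 
the names avl_node, avl_find_hardened, compare_int and AVL_MAX_DEPTH, and the 
use of stderr for debug output are all assumptions made for this example. It 
guards against the NULL-pointer case seen in two of the tracebacks (virtual 
address 0 inside the compare routine) and against a cyclic tree. It cannot 
detect a non-NULL wild pointer such as the 000000005336C601 address in the 
second traceback.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node layout; the real AVL_TREE structures are not shown in this note. */
typedef struct avl_node {
    struct avl_node *left;
    struct avl_node *right;
    void *data;
} avl_node;

/* Comparator contract assumed here: <0, 0, >0 as key is below, equal to, above data. */
typedef int (*avl_compare_fn)(const void *key, const void *data);

/* Assumed depth cap, far deeper than any balanced tree that fits in memory;
   exceeding it suggests a cycle or other corruption. */
#define AVL_MAX_DEPTH 128

/* Example comparator for int keys, used only to exercise the sketch. */
int compare_int(const void *key, const void *data)
{
    int k = *(const int *)key;
    int d = *(const int *)data;
    return (k > d) - (k < d);
}

/* Returns the matching data pointer, or NULL if the key is absent
   or the tree looks damaged. Never calls the comparator with NULL. */
void *avl_find_hardened(const avl_node *root, const void *key, avl_compare_fn compare)
{
    const avl_node *node = root;
    int depth = 0;

    if (key == NULL || compare == NULL) {
        fprintf(stderr, "avl_find_hardened: NULL key or comparator\n");
        return NULL;
    }
    while (node != NULL) {
        if (++depth > AVL_MAX_DEPTH) {
            /* Debug statement: report suspected corruption instead of looping forever. */
            fprintf(stderr, "avl_find_hardened: depth limit exceeded, tree may be corrupted\n");
            return NULL;
        }
        if (node->data == NULL) {
            /* Debug statement: a node without data would crash the comparator. */
            fprintf(stderr, "avl_find_hardened: node %p has NULL data at depth %d\n",
                    (const void *)node, depth);
            return NULL;
        }
        int c = compare(key, node->data);
        if (c == 0)
            return node->data;
        node = (c < 0) ? node->left : node->right;
    }
    return NULL; /* key not present */
}
```

With checks of this kind, a damaged entry is reported and the lookup returns 
"not found" rather than the process terminating with an ACCVIO; the caller 
then has to treat that result as a possible corruption indicator.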

Solution
