OAM Stops Responding To Requests After Some Time

(Doc ID 2349605.1)

Last updated on FEBRUARY 05, 2018

Applies to:

Oracle Access Manager - Version 11.1.2.0.0 to 11.1.2.3.171017 [Release 11g]
Information in this document applies to any platform.

Symptoms

Oracle Access Manager and an Oracle OHS Server in the same server, OHS Webgate is installed on the OHS Server protecting some resources.

In front of this a proxy server. After some time Oracle Access Manager seems to stop processing requests

Querying server health (i.e. through wlst) shows the server as down or unreachable

When the problem occurs then the OAM managed servers has to be restarted in order to restore the service.

If both nodes have the problem at the same time then we face an outage.

The frequence of occurence is quite different sometimes several times a day.

 

In order troubleshoot the problem made a setup that use a single node and issue reproduce in single node also

telnet <oamserverhost> <port>

when issue happen sometime did not connect anymore, sometime did connect, but no requests were processing.

 

In weblogic servers thread dump we find many threads like

"DynamicListenThread[Default]" daemon prio=3 tid=0x000000010866a800 nid=0xfd runnable [0xfffffffcf5cff000]

java.lang.Thread.State: RUNNABLE

at weblogic.socket.DevPollSocketMuxer.editPollSet(Native Method)

at weblogic.socket.DevPollSocketMuxer.read(DevPollSocketMuxer.java:79)

- locked

<0xfffffffd2715fbf8>

(a java.lang.String)

- locked

<0xfffffffd4e83caa0>

(a weblogic.socket.DevPollSocketInfo)

at weblogic.server.channels.DynamicListenThread.registerSocket(DynamicListenThread.java:393)

at weblogic.server.channels.DynamicListenThread.run(DynamicListenThread.java:188)

at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:

- None

In logs (below are some excerpt from errors find, those are many more)  we see lines like

[2017-09-04T09:12:30.1969+02:00] [OHS] [ERROR:32] [OHS-9999] [mod_weblogic.c] [client_id: 10.10.10.3] [host_id: servername] [host_addr: 10.10.10.193] [tid: 25] [user: sso] [ecid: 0005QqyLAKF6QPiLH6w0yf00051s0001Rf] [rid: 0] [VirtualHost: ohsHost.domain.com:8081] <0005QqyLAKF6QPiLH6w0yf00051s0001Rf> weblogic: *******Exception type [NO_RESOURCES] (apr_socket_connect call failed with error=70007, host=10.10.10.193, port=14000 ) raised at line 1682 of URL.cpp

[2017-09-04T09:12:30.1970+02:00] [OHS] [ERROR:32] [OHS-9999] [mod_weblogic.c] [client_id: 10.10.10.3] [host_id: servername] [host_addr: 10.10.10.193] [tid: 25] [user: sso] [ecid: 0005QqyLAKF6QPiLH6w0yf00051s0001Rf] [rid: 0] [VirtualHost: servername.domain.com:8081] weblogic: Trying HEAD /shoplogin/server.jsp at backend host '10.10.10.193/14000; got exception 'NO_RESOURCES: [os error=150, line 1682 of URL.cpp]: apr_socket_connect call failed with error=70007, host=10.10.10.193, port=14000 '

oam_server1.out

<Sep 2, 2017 10:33:15 PM MEST> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '88' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "704" seconds working on the request "Workmanager: weblogic.kernel.Default, Version: 0, Scheduled=false, Started=true, Started time: 704306 ms

", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:

weblogic.socket.PosixSocketMuxer.wakeupPoll(Native Method)

weblogic.socket.PosixSocketMuxer.read(PosixSocketMuxer.java:82)

weblogic.rjvm.t3.ConnectionFactoryT3.createConnection(ConnectionFactoryT3.java:36)

weblogic.rjvm.ConnectionManager.createConnection(ConnectionManager.java:1778)

weblogic.rjvm.ConnectionManager.findOrCreateConnection(ConnectionManager.java:1410)

weblogic.rjvm.ConnectionManager.bootstrap(ConnectionManager.java:443)

weblogic.rjvm.ConnectionManager.bootstrap(ConnectionManager.java:321)

weblogic.rjvm.RJVMManager.findOrCreateRemoteInternal(RJVMManager.java:262)

weblogic.rjvm.RJVMManager.findOrCreate(RJVMManager.java:199)

weblogic.rjvm.RJVMFinder.findOrCreateRemoteServer(RJVMFinder.java:238)

weblogic.rjvm.RJVMFinder.findOrCreateInternal(RJVMFinder.java:200)

weblogic.rjvm.RJVMFinder.findOrCreate(RJVMFinder.java:170)

weblogic.rjvm.ServerURL.findOrCreateRJVM(ServerURL.java:165)

weblogic.jndi.WLInitialContextFactoryDelegate.getInitialReference(WLInitialContextFactoryDelegate.java:401)

weblogic.jndi.Environment.getInitialReference(Environment.java:245)

weblogic.server.channels.RemoteChannelServiceImpl.registerInternal(RemoteChannelServiceImpl.java:174)

weblogic.server.channels.RemoteChannelServiceImpl.access$300(RemoteChannelServiceImpl.java:49)

weblogic.server.channels.RemoteChannelServiceImpl$TimerListenerImpl.timerExpired(RemoteChannelServiceImpl.java:110)

weblogic.timers.internal.TimerImpl.run(TimerImpl.java:284)

weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:550)

weblogic.work.ExecuteThread.execute(ExecuteThread.java:263)

weblogic.work.ExecuteThread.run(ExecuteThread.java:221)

AdminServer logs

####<Oct 24, 2017 8:55:27 AM MEST> <Info> <Health> <servername.domain.com> <AdminServer> <weblogic.GCMonitor> <> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-0000000000000055> <1508828127288> <BEA-310002> <69% of the total memory in the server is free>

####<Oct 24, 2017 9:22:06 AM MEST> <Info> <JMX> <servername.domain.com> <AdminServer> <[STANDBY] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-0000000000010f5f> <1508829726363> <BEA-149507> <JMX Connectivity has been discontinued with the managed server oam_server1.>

####<Oct 24, 2017 9:22:06 AM MEST> <Info> <Server> <servername.domain.com> <AdminServer> <[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-0000000000010f60> <1508829726362> <BEA-002634> <The server "oam_server1" disconnected from this server.>

####<Oct 24, 2017 9:30:58 AM MEST> <Info> <Server> <servername.domain.com> <AdminServer> <[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'> <weblogic> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-0000000000011115> <1508830258659> <BEA-002638> <Force shutdown of oam_server1 was issued remotely from 10.10.10.193>

####<Oct 24, 2017 9:33:27 AM MEST> <Info> <Health> <servername.domain.com> <AdminServer> <weblogic.GCMonitor> <> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-0000000000000055> <1508830407466> <BEA-310002> <57% of the total memory in the server is free>

####<Oct 24, 2017 9:34:52 AM MEST> <Info> <Server> <servername.domain.com> <AdminServer> <[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-00000000000111f1> <1508830492726> <BEA-002635> <The server "oam_server1" connected to this server.>

####<Oct 24, 2017 9:34:52 AM MEST> <Info> <JMX> <servername.domain.com> <AdminServer> <[STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-00000000000111f2> <1508830492881> <BEA-149506> <Established JMX Connectivity with oam_server1 at the JMX Service URL of service:jmx:t3://10.10.10.193:14000/jndi/weblogic.management.mbeanservers.runtime.>

####<Oct 24, 2017 10:08:27 AM MEST> <Info> <Health> <servername.domain.com> <AdminServer> <weblogic.GCMonitor> <> <> <3ad55e03d08a566c:-27747c2d:15f4848a898:-8000-0000000000000055> <1508832507623> <BEA-310002> <78% of the total memory in the server is free>

[2017-10-24T09:18:03.483+02:00] [AdminServer] [WARNING] [] [Coherence] [tid: Logger@9253293 3.7.1.15] [userId: <anonymous>] [ecid: 0000Lx8IOR16QPiLH6w0yf1PvQG200000A,1:20686] 2017-10-24 09:18:03.482/83263.653 Oracle Coherence GE 3.7.1.15 <Warning> (thread=PacketPublisher, member=2): Experienced a 5901 ms communication delay (probable remote GC) with Member(Id=3, Timestamp=2017-10-23 10:17:56.082, Address=10.10.10.10:9099, MachineId=53041, Location=site:,machine:servername,process:13854, Role=WeblogicServer); 45 packets rescheduled, PauseRate=0.0, Threshold=1878

[2017-10-24T09:30:58.812+02:00] [AdminServer] [NOTIFICATION] [OAMSSA-04050] [oracle.oam.foundation.access] [tid: DistributedCache:EventDispatcher] [userId: <anonymous>] [ecid: 0000Lx8ILoL6QPiLH6w0yf1PvQG2000004,1:20539] [APP: oam_admin#11.1.2.0.0] Cache member left: Member(Id=3, Timestamp=2017-10-24 09:30:58.811, Address=10.10.10.10:9099, MachineId=53041, Location=site:,machine:servername,process:13854, Role=WeblogicServer), service: PartitionedCache{Name=DistributedCache, State=(SERVICE_STARTED), LocalStorage=disabled}.

[2017-10-24T09:30:58.814+02:00] [AdminServer] [NOTIFICATION] [OAMSSA-04050] [oracle.oam.foundation.access] [tid: DistributedCache:DistributionCache:EventDispatcher] [userId: <anonymous>] [ecid: 0000Lx8ILoL6QPiLH6w0yf1PvQG2000004,1:20537] [APP: oam_admin#11.1.2.0.0] Cache member left: Member(Id=3, Timestamp=2017-10-24 09:30:58.811, Address=10.10.10.193:9099, MachineId=53041, Location=site:,machine:servername,process:13854, Role=WeblogicServer), service: PartitionedCache{Name=DistributionCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=86, BackupPartitions=86}.

[2017-10-24T09:30:58.831+02:00] [AdminServer] [NOTIFICATION] [OAMSSA-04050] [oracle.oam.foundation.access] [tid: DistributedCache:DistributionCache:EventDispatcher] [userId: <anonymous>] [ecid: 0000Lx8IOR16QPiLH6w0yf1PvQG200000A,1:20541] Cache member left: Member(Id=3, Timestamp=2017-10-24 09:30:58.829, Address=10.10.10.193:9099, MachineId=53041, Location=site:,machine:servername,process:13854, Role=WeblogicServer), service: PartitionedCache{Name=DistributionCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=257, BackupCount=1, AssignedPartitions=86, BackupPartitions=86}.

 

Changes

 

- No Pre-authentication Rules used

- using custom login pages, login page is not deployed on the admin server.

It is deployed on the managed server (and declared as excludedpage)

The issue also occurs with only the oracle default login page.

 

Customer use custom J2EE applications that is deployed on OAM

managed servers together with login page. Custom J2EE applications have on the other features

one that get user attributes from ldap via JSP pages using java JNDI, this involve a little bit processing.

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms