Configuring High Availability in WebCenter Interaction (Doc ID 1124199.1)

Last updated on APRIL 29, 2021

Applies to:

Oracle WebCenter Interaction - Version 6.0.0 and later
Information in this document applies to any platform.

Goal

This article describes the process to configure the High Availability feature in WebCenter Interaction. In addition, we will discuss related concepts and concerns surrounding the deployment of the High Availability feature.

What is High Availability?

High Availability is a feature of WCI which allows user session information to be shared across portal web servers in a portal deployment.

When a web server fails in a portal deployment that is not using High Availability, the users being served by that web server lose some of their session state. In a typical configuration, a load balancer connects those users to another portal web server, but some portlet-related session state is not available on the second web server, which leads to inconsistent and unpredictable behavior.

When a web server in a portal deployment that is using High Availability fails and the load balancer connects the user to a second web server, the second web server is able to recreate an identical session without any intervention by the user. Portlet state is preserved; from the perspective of the user, it will be difficult to tell that a failure has occurred. (A perceptive user may be able to detect a pause while the load balancer discovers the failure of the first web server, but the user should not be able to detect any functional differences.)

In the above discussion, the term "portlet state" refers to Portal Session Preferences and Portlet Cookies. It is possible to write a portlet that depends on other state that would not be recreated on the second server (such as the name or IP address of the machine calling the portlet), but these cases are very unusual.

When Might It Be Useful?

High Availability is useful when users should be insulated from portal web server outages. Three scenarios suggest themselves:

1. The portal deployment is unstable with frequent catastrophic failures of individual web servers.
2. The portal deployment is hosting services where high up-time and consistency are crucial.
3. Administration of the portal deployment involves frequent and unpredictable configuration changes (that is, it is desirable to be able to pull individual web servers out of rotation on short notice).

When a portal deployment is already stable and reliable, the added complexity and resources associated with High Availability may not be justified.

Also, when a portal deployment is showing instability, but the instability does NOT manifest itself in catastrophic failures of individual portal web servers, then the High Availability feature will be of little benefit. High Availability is triggered when the load balancer detects that web server A is unresponsive, and redirects traffic that would normally go to A to web server B. If web server A continues to be responsive, but displays portal errors to users, or runs slowly, the load balancer will not transfer requests to web server B, and High Availability will not be triggered.

How Does It Work?

To use High Availability, the portal administrator groups portal web servers into clusters. Session information is shared among all web servers in a cluster. A load balancer is configured to detect the failure of any portal web server in a cluster and redirect requests destined for that portal web server to another web server in the cluster. (Note: this does NOT remove the need to use "sticky IP" or a similar load balancer feature to associate each client with a particular portal web server. Using High Availability without sticky IP will result in very poor performance.)
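The routing behavior described above can be pictured with a small sketch. This is only an illustration, not how any particular load balancer is implemented: the function name `pick_server`, the hash-based stickiness, and the `responsive` set are all assumptions made for the example.

```python
import hashlib


def pick_server(client_ip, cluster, responsive):
    """Choose the web server for a request (hypothetical helper).

    cluster:    ordered list of web server names in one cluster
    responsive: set of names that currently answer the load balancer
    """
    # Sticky step: the same client always hashes to the same preferred server.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(cluster)

    # Failover step: move on only if the preferred server is unresponsive.
    for offset in range(len(cluster)):
        candidate = cluster[(start + offset) % len(cluster)]
        if candidate in responsive:
            return candidate
    raise RuntimeError("no responsive web server left in the cluster")


cluster = ["web-a", "web-b", "web-c"]
preferred = pick_server("10.0.0.1", cluster, set(cluster))
fallback = pick_server("10.0.0.1", cluster, set(cluster) - {preferred})
print(preferred, fallback)
```

Note that a web server that is slow or returning portal errors still counts as responsive in this model, so the client stays on it and High Availability never comes into play.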

When High Availability is enabled, the portal web servers record session information (session preferences and portlet cookies) to a shared cache. The shared cache copies the data to all portal web servers in the cluster, so each portal web server in a cluster holds a copy of the session information for every web server in that cluster. The shared cache is responsible for keeping these copies in sync in real time. The shared cache is based on the Plumtree Message Bus, and uses UDP multicast network traffic to keep the caches in sync.

It is worth noting that the shared cache data is a copy of the session data. The "standard" session data for each web server is recorded in the normal way and is unaffected by High Availability. When High Availability is enabled, any changes to the session state are also copied to the shared cache. For example, even if a cluster contained only one web server, that web server would have two copies of the session data: one "normal" copy that the web server maintains, and one "shared" copy in the shared cache.
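As a rough mental model of the two copies, consider the toy classes below. They are purely illustrative: the class and method names are invented for this sketch, replication is modeled as a plain in-process loop rather than the message bus, and nothing here reflects the portal's actual internals.

```python
class WebServer:
    """Toy portal web server (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}      # "normal" session data, kept as usual
        self.shared_cache = {}  # this server's copy of the shared cache


class Cluster:
    """Toy cluster that replicates session changes to every member."""

    def __init__(self, names):
        self.servers = {name: WebServer(name) for name in names}

    def write(self, server_name, session_id, key, value):
        # The serving web server records the change in its normal session.
        owner = self.servers[server_name]
        owner.sessions.setdefault(session_id, {})[key] = value
        # The change is also copied to every shared cache, the owner's included.
        for peer in self.servers.values():
            peer.shared_cache.setdefault(session_id, {})[key] = value

    def fail_over(self, session_id, to_server):
        # The second server rebuilds the session from its shared-cache copy.
        target = self.servers[to_server]
        target.sessions[session_id] = dict(target.shared_cache.get(session_id, {}))
        return target.sessions[session_id]


cluster = Cluster(["web-a", "web-b"])
cluster.write("web-a", "session-1", "portlet-cookie", "tab=3")
print(cluster.fail_over("session-1", "web-b"))
```

In a one-server cluster the same `write` call still stores the value twice, once in `sessions` and once in `shared_cache`, which mirrors the point made above.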

What Information Is Shared?

The High Availability shared cache stores the ephemeral information needed to reconstitute a user session, that is, information that is NOT already stored in the database. (Data that is stored in the database can, of course, be retrieved from the database whenever needed.) This information includes:

1. Portlet cookies held by the Portal on behalf of the user
2. Session preferences

High Availability does not replicate all of the session data that might be of interest. It does not replicate Activity Spaces and other "large" objects related to portal operations. Thus, state related to portal operations (as opposed to portlet operations) is not replicated. For example, if a user is progressing through a complicated wizard in the Administrative UI, and the portal web server fails during one of the steps, the user will NOT be able to complete the wizard when redirected to a second portal web server, because the state of the wizard will have been lost. There are very few places in the portal where end users encounter wizards, so this concern applies almost entirely to portal administrators. This is in line with the main goal of the High Availability feature, which is to shield end users from failures.

What Are The Tradeoffs?

There are two downsides to using High Availability.

First, there is some extra administrative overhead involved in planning the clusters and setting up High Availability. This is generally minor, because High Availability is easy to configure.

Second, High Availability increases the resources used by each portal web server. You should expect each portal web server to use more memory, CPU, and network bandwidth when High Availability is enabled.

The exact effects of High Availability depend on the details of your portal usage. In particular, if your portal uses few portlet cookies and does not use session preferences, you can expect the effect to be very small. If you make heavy use of portlet cookies and session preferences, your resource utilization will, of course, be higher.

Memory and bandwidth requirements will increase with the amount of information stored in the shared cache, which is equivalent to the amount of data stored in portlet cookies and session preferences. Each byte of portlet cookie or session preference data in a cluster must be replicated to each portal web server in the cluster, and stored in each portal web server. A safety factor of 50% can be added in to account for formatting and overhead.

For instance, suppose you have a cluster of two portal web servers, each of which supports 100 concurrent sessions. Each user views 10 portlets, and each portlet uses 1000 bytes of cookie information. User actions cause one of the portlet cookies to be modified every 10 minutes. There are no session preferences.

Under these assumptions, the additional memory used by each portal web server would be on the order of:
2 web servers * 100 sessions each * 10 portlets * 1000 bytes * 1.5 safety factor ~= 3 MB of memory.

Likewise, the additional network bandwidth used would be on the order of:
2 web servers * 100 sessions each * 1000 bytes * (1 change / 10 minutes) * 1.5 safety factor = 300,000 bytes per 10 minutes ~= 0.5 KB/sec
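The two estimates above can be reproduced for other inputs with a short calculation. The sketch below simply encodes the same multiplication; the function and parameter names are made up for this example, and the 1.5 safety factor is the one suggested earlier in this section.

```python
def estimate_ha_overhead(servers, sessions_per_server, portlets_per_user,
                         bytes_per_cookie, seconds_between_changes,
                         safety_factor=1.5):
    """Return (memory in bytes per web server, bandwidth in bytes/second).

    Hypothetical helper: assumes cookies are the only shared data and that
    each session changes one cookie every `seconds_between_changes`.
    """
    total_sessions = servers * sessions_per_server
    # Every web server stores the cookies of every session in the cluster.
    memory_bytes = (total_sessions * portlets_per_user
                    * bytes_per_cookie * safety_factor)
    # Each change replicates one cookie's worth of data.
    bandwidth_bytes_per_sec = (total_sessions * bytes_per_cookie
                               * safety_factor / seconds_between_changes)
    return memory_bytes, bandwidth_bytes_per_sec


# Inputs from the example: 2 servers, 100 sessions each, 10 portlets,
# 1000-byte cookies, one cookie change per session every 10 minutes.
memory_bytes, bytes_per_sec = estimate_ha_overhead(2, 100, 10, 1000, 600)
print(memory_bytes, bytes_per_sec)  # 3000000.0 500.0
```

The result is about 3 MB of memory and about 500 bytes per second of replication traffic. Session preferences, which this example leaves out, would add to both figures in the same way.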

The size of the cluster has a direct effect on the amount of memory used by High Availability. Larger clusters (that is, clusters that handle a higher fraction of the total portal traffic) use more memory on each web server. On the other hand, smaller clusters offer less redundancy.

For example, suppose you are deploying 6 portal web servers. You could arrange them into 3 clusters of 2 web servers, 2 clusters of 3 web servers, or even 1 cluster of 6 web servers.

If you choose 3 clusters of 2 web servers, each cluster offers one fallback web server: if one of the web servers fails, the other web server can take over. But if both web servers fail, the cluster fails, and users will experience errors. The effect of the failure of one web server can be insidious. If a cluster contains 2 web servers that are both close to being overloaded, and one of them fails, the other web server will experience a doubled load and is likely to fail soon thereafter.

If you choose 2 clusters of 3 web servers, each web server will have to absorb the memory hit of holding session data for all three web servers. However, the cluster can suffer the failure of 2 web servers without noticeably affecting the user experience. Further, if only one of the web servers fails, the load that normally would have gone to that web server will be split between the two remaining web servers, so the chance that you will overwhelm the remaining web servers in the cluster is reduced.
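To compare the possible layouts side by side, the reasoning above can be put into a small calculation. This is a simplified sketch: the function name and its parameters are invented for illustration, load is assumed to be spread evenly over the servers in a cluster, and the 15,000 bytes per session reuses the earlier example (10 portlets * 1000 bytes * 1.5 safety factor).

```python
def compare_layout(total_servers, cluster_size, sessions_per_server,
                   bytes_per_session):
    """Summarize one way of grouping web servers into equal clusters.

    Hypothetical helper for illustration only.
    """
    if cluster_size < 2 or total_servers % cluster_size != 0:
        raise ValueError("need equal clusters of at least 2 web servers")
    return {
        "clusters": total_servers // cluster_size,
        # Each server caches the sessions of every server in its cluster.
        "cache_bytes_per_server":
            cluster_size * sessions_per_server * bytes_per_session,
        # Failures a cluster can take while one server is still standing.
        "tolerated_failures": cluster_size - 1,
        # Load on each survivor after one failure, relative to normal load.
        "load_after_one_failure": cluster_size / (cluster_size - 1),
    }


for size in (2, 3, 6):
    print(size, compare_layout(6, size, 100, 15000))
```

For 6 web servers, the per-server cache grows from 3 MB (clusters of 2) to 4.5 MB (clusters of 3) to 9 MB (one cluster of 6), while the load on each survivor after a single failure drops from 2.0 times normal to 1.5 and then 1.2.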

The choice is up to you: larger clusters consume more resources but offer more fault tolerance, whereas smaller clusters consume fewer resources but offer less fault tolerance.