Enhancement Request: How To Keep Systems At An "identical" Known State (Doc ID 2216775.1)

Last updated on JULY 14, 2017

Applies to:

Oracle Communications Messaging Server - Version 8.0.1 and later
Information in this document applies to any platform.

Goal

We want to keep all systems of a given type/role at an "identical" known state, using Ansible (www.ansible.com).

BACKGROUND

If we are installing a new system of that type, we run an Ansible playbook that does many things:


On backend systems, the app is managed by VCS, so the team that manages VCS handles its part, including starting the app for us.
On frontend systems, the above steps also include deploying SMF config, which starts the app.

If we make a config change for a type of system, we run an Ansible play that runs the appropriate recipe on all of those systems, compiles the config, and performs the appropriate refresh/reload/restart.

Of course, the systems are not literally "identical". The differences between them are managed by Ansible variables, which are substituted into the recipes before they are run.
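As a rough illustration of that substitution step, here is a minimal Python sketch. It is not our actual Ansible templating or real recipe syntax: the template text, the option names, and the host names (fe01, fe02) are all hypothetical, chosen only to show one shared recipe rendered with per-host values.

```python
from string import Template

# Hypothetical shared recipe template. The "set name = value" lines are
# placeholders for illustration, not real Messaging Server recipe syntax.
RECIPE_TEMPLATE = Template(
    "set listen_address = $listen_address\n"
    "set max_connections = $max_connections\n"
)

# Hypothetical per-host variables, as an inventory might supply them.
HOST_VARS = {
    "fe01": {"listen_address": "10.0.0.11", "max_connections": "500"},
    "fe02": {"listen_address": "10.0.0.12", "max_connections": "500"},
}


def render_recipe(host: str) -> str:
    """Substitute one host's variables into the shared template."""
    # substitute() raises KeyError when a variable is missing, so an
    # incomplete variable set fails before anything runs on the host.
    return RECIPE_TEMPLATE.substitute(HOST_VARS[host])


if __name__ == "__main__":
    for host in HOST_VARS:
        print(f"--- {host} ---")
        print(render_recipe(host), end="")
```

The recipe text stays the same for every system of a type; only the variable values differ, which is what keeps the systems at a known state without being literally identical.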

These config changes can be staggered. They are always tested in-house first. Then we might deploy a change on only a few systems of that type, monitor them to make sure the change had the desired effect and no unforeseen consequences, and then run it on all of them.
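The staggering described above could be sketched roughly as follows. This is a simplified Python model, not our playbook: `apply_change` and `is_healthy` are hypothetical callbacks standing in for running the play and for whatever monitoring is used, and in practice the monitoring period between the two phases could be hours or days.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    hosts: Iterable[str],
    canary_count: int,
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Apply a change to a few hosts first, then to the rest if all is well.

    Returns the hosts that were changed, in order.
    """
    ordered = list(hosts)
    canaries, rest = ordered[:canary_count], ordered[canary_count:]
    changed: List[str] = []

    # Phase 1: only a few systems of this type get the change.
    for host in canaries:
        apply_change(host)
        changed.append(host)

    # Stop here if any canary shows a problem, leaving the rest untouched.
    if not all(is_healthy(host) for host in canaries):
        return changed

    # Phase 2: roll the change out to every remaining system.
    for host in rest:
        apply_change(host)
        changed.append(host)
    return changed


if __name__ == "__main__":
    done = staged_rollout(
        ["fe01", "fe02", "fe03"],  # hypothetical host names
        canary_count=1,
        apply_change=lambda h: print(f"applying change on {h}"),
        is_healthy=lambda h: True,
    )
    print("changed:", done)
```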

Two problems exist:

  1.  Things that get set and then we change our mind. Solution: don't just remove the setting from the recipe; change the set to an unset.
  2.  Humans change things, especially during live debugging, but possibly also through other manual changes made while solving a problem.


So we want the automation to wipe out any changes that are not included in the recipe.

And, of course, we want minimum down time.

For things where we use replace_* functions in the recipe, we can be sure anything that was set manually will be wiped out.
But that also leaves us owning the responsibility to make sure the default initial config has not changed.
So wherever we decide to use replace_* functions, with each new patch we need to run an initial configure on a dev system, compare the result to the previous one to see whether any "defaults" (initial config) have changed, and decide whether to incorporate those changes into our recipes.
A better way to notice those changes would be great.
Even if "commpkg upgrade" were able to notice and fix things like that, that would be worse for us.
We don't plan to use commpkg upgrade. We plan to use the Ansible equivalent of pkgrm and pkgadd.
And even if we did use commpkg upgrade, any config changes it made on the live systems would likely get clobbered by our next config update.
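The per-patch comparison of initial configs mentioned above is currently a manual step. A minimal sketch of automating it with Python's standard difflib follows; it assumes the initial config from each patch level has been dumped to text, and the sample option lines are hypothetical, not real product defaults.

```python
import difflib
from typing import List


def diff_initial_configs(
    old_text: str,
    new_text: str,
    old_label: str = "previous patch",
    new_label: str = "new patch",
) -> List[str]:
    """Return a unified diff of two initial-config dumps (empty if equal)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical dumps taken right after an initial configure on a dev
    # system, one per patch level.
    old = "log_level = info\ntimeout = 30\n"
    new = "log_level = info\ntimeout = 60\nnew_feature = off\n"
    for line in diff_initial_configs(old, new):
        print(line)
```

An empty result would mean the defaults did not change between patches; any output is a list of candidates to review for inclusion in the recipes.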

For things where there is no replace_* function (or where we are not using it because we want to avoid owning the responsibility described above), if someone changes a setting manually on the system, running the recipe again doesn't undo that. The recipe doesn't know which things are set that it doesn't itself set, so it can't know to unset them.
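To make the gap concrete: detecting such manual changes requires knowing the initial config as well as the recipe. The sketch below models all three as plain Python dictionaries, which is an assumption for illustration only; the option names are hypothetical and real config is not a flat key/value map.

```python
from typing import Dict


def find_drift(
    initial: Dict[str, str],
    recipe: Dict[str, str],
    live: Dict[str, str],
) -> Dict[str, dict]:
    """Compare a live config against initial config plus recipe settings."""
    # What the system should look like: defaults overlaid by the recipe.
    expected = {**initial, **recipe}
    return {
        # Set on the live system, but in neither the defaults nor the recipe.
        "unexpected": {k: live[k] for k in live.keys() - expected.keys()},
        # Present in both, but with a different value: (expected, live).
        "changed": {
            k: (expected[k], live[k])
            for k in expected.keys() & live.keys()
            if expected[k] != live[k]
        },
        # Expected, but removed from the live system.
        "missing": {k: expected[k] for k in expected.keys() - live.keys()},
    }


if __name__ == "__main__":
    initial = {"log_level": "info", "timeout": "30"}
    recipe = {"timeout": "60"}
    # Someone turned on debugging by hand and left it on.
    live = {"log_level": "debug", "timeout": "60", "debug_trace": "on"}
    print(find_drift(initial, recipe, live))
```

Running only the recipe again would touch timeout and nothing else, so the manual log_level and debug_trace changes would survive. That is the behavior we want the automation to eliminate.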

So I think I would like to have a recipe statement that puts everything back to the initial config. And then I would run the rest of my recipe. Then compile and refresh/reload/restart as needed, with minimal downtime.

If I have to run the initial configure again, that leaves the system (perhaps only briefly) in an invalid state. If anything that doesn't use compiled config were to start between when configure was run and when the recipe was run, it would have invalid config.
So to use configure again, we would have to stop the server, run configure, run the recipe, compile, and start.
That could be done pretty quickly, but it can't be done on all the systems at the same time.
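The stop/configure/recipe/compile/start sequence, run a few systems at a time, can be sketched as below. This Python model only illustrates the ordering; `run_step` is a hypothetical callback standing in for the real commands, and hosts within a batch are processed one after another here for simplicity. Ansible's `serial` play keyword provides a similar kind of batching.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# The sequence from the text; every step must finish before the next starts.
STEPS = ("stop", "configure", "recipe", "compile", "start")


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield hosts in consecutive groups of at most `size`."""
    for i in range(0, len(hosts), size):
        yield list(hosts[i:i + size])


def rolling_reset(
    hosts: Sequence[str],
    batch_size: int,
    run_step: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Run the full step sequence batch by batch; return what ran, in order."""
    log: List[Tuple[str, str]] = []
    for batch in batches(hosts, batch_size):
        # Only the hosts in this batch are down; the others keep serving.
        for host in batch:
            for step in STEPS:
                run_step(host, step)
                log.append((host, step))
    return log


if __name__ == "__main__":
    rolling_reset(
        ["be01", "be02", "be03"],  # hypothetical host names
        batch_size=1,
        run_step=lambda host, step: print(f"{host}: {step}"),
    )
```

A smaller batch size keeps more systems in service at any moment, at the cost of a longer total rollout.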

Solution
