Sun Storage Availability Suite (AS): How to Suspend/Restart Data Replication Between Two Oracle Solaris Clusters (Doc ID 1670774.1)

Last updated on APRIL 10, 2017

Applies to:

Solaris Cluster - Version 3.2 12/06 to 3.3 [Release 3.2 to 3.3]
Sun Storage Availability Suite - Version 4.0 and later
Information in this document applies to any platform.

Goal

How to Suspend/Restart Data Replication of one or more Remote Mirror Network Data Replication Volume Sets between Two Oracle Solaris Clusters

Availability Suite (AS) Introduction

Sun Storage Availability Suite (AS) incorporates two distinct modules:

  StorEdge Network Data Replication (SNDR), also known as Remote Mirror (RM) or Remote Data Copy (RDC)

  Instant Image (II), also known as Point-in-Time Copy (PiTC)

    StorEdge Network Data Replication (SNDR) is used to replicate disks, disk slices and logical volumes across a network to a remote server.

    Instant Image (II) is used to take local shadow copies (instant images/snapshots) of disks, disk slices and logical volumes.

    A logical volume can be under the control of either Solaris Volume Manager (SVM) or Symantec's Veritas Volume Manager.

Outline of this StorEdge Network Data Replication (SNDR) Example: Suspend/Restart data replication of one or more volume sets

A systems administrator can use one or more variants of the AS command "sndradm -l" to suspend SNDR data replication at any time. However, planning, consideration and scheduling may be required if the administrator wishes to ensure that the data copy at the secondary cluster (the remote site) is a 100% exact and complete copy of the primary cluster volumes.
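As a rough sketch only (the options shown are standard "sndradm" options, but the set, group and host names are taken from or modeled on this document's example configuration and are not prescriptive), replication for a set might be suspended as follows:

    Suspend a single volume set, skipping the confirmation prompt with -n; the set is named after its secondary host and secondary volume:

        phys-boston-1# sndradm -n -l lh-newton-sndr-sec:/dev/md/proddgr/rdsk/d1

    Suspend every set belonging to a configured I/O group (the group name "proddg-group" is a hypothetical example):

        phys-boston-1# sndradm -n -g proddg-group -l

    In an Oracle Solaris Cluster configuration the cluster/device group tag can also be supplied explicitly with -C:

        phys-boston-1# sndradm -C proddg -n -l lh-newton-sndr-sec:/dev/md/proddgr/rdsk/d1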


            This example takes an active mounted filesystem that resides on Cluster Boston:

             Master Volume (MV)    Mount Point: /boston/MASTER_VOLUME      SVM Volume: /dev/md/proddg/dsk/d1

            which is currently being replicated to Cluster Newton using host-based data replication (SNDR):

             Remote Mirror (RM)    Mount Point: /newton/REMOTE_MIRROR      SVM Volume: /dev/md/proddgr/dsk/d1

 

The goal in this example is to set the state of the volume set proddg/rdsk/d1 to "logging" at a time when both sites are 100% in sync, i.e. when both copies of the data are exactly the same.

When replication is suspended, that is, when the copying of data blocks between the primary and secondary servers stops, each server monitors data block changes at its own site while in logging mode. If data in a volume set/pair is modified at either site, the volume bitmaps are updated.

When switching out of the "logging" state the system compares the bitmaps at each site to determine which data blocks need to be copied, depending on the direction in which the administrator wants the changed data blocks to go. These changed data blocks are copied as part of a data block copy synchronization, and the volume set then finally switches back to the "replicating" state.
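As a hedged illustration (again using the set name from this document's example; adjust to the real configuration), the resynchronization that follows logging might be driven as shown below. An update sync (-u) copies only the blocks flagged in the bitmaps, in the direction chosen by the administrator:

    Forward update sync, copying changed blocks from the primary (Boston) to the secondary (Newton):

        phys-boston-1# sndradm -n -u lh-newton-sndr-sec:/dev/md/proddgr/rdsk/d1

    Reverse update sync, copying changed blocks from the secondary back to the primary:

        phys-boston-1# sndradm -n -u -r lh-newton-sndr-sec:/dev/md/proddgr/rdsk/d1

    Optionally wait until the synchronization completes and the set has returned to "replicating":

        phys-boston-1# sndradm -n -w lh-newton-sndr-sec:/dev/md/proddgr/rdsk/d1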

This document discusses how and when best to suspend data replication manually. It shows how to determine the current replication status of a volume set within the cluster and how a systems administrator might then manually suspend the replication.

Below is an example in which the output of the command "sndradm -P" shows two volume sets. One volume set is already in the "logging" state and the other is in the "replicating" state.
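The listing below is an illustrative reconstruction of typical "sndradm -P" output for such a configuration; the exact fields and values vary with the Availability Suite release and the options configured for each set, and the volume names are placeholders based on this document's example:

        /dev/md/devgrp/rdsk/d1  ->  lh-newton-sndr-sec:/dev/md/devgrpr/rdsk/d1
        autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: async, state: logging

        /dev/md/proddg/rdsk/d1  ->  lh-newton-sndr-sec:/dev/md/proddgr/rdsk/d1
        autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, state: replicating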



You can see that the volume set "devgrp/rdsk/d1" is in the "logging" state; no data is being copied across the network and replication is suspended. For the volume set "proddg/rdsk/d1", data blocks are being replicated and replication traffic should be traveling across the network.

 

Typically an administrator would want to put a replication volume set into the "logging" state in order to mount and use the remote volume set, or as part of a process to use, process, archive or copy the data. Normally, manual suspension of replication should be done once the administrator has confirmed that the data at the remote site is complete and up to date. Usually such an operation would be scheduled after any databases or applications have been suspended or shut down at the primary site/server, or at a quiet time when little or no write I/O is being generated by the database or applications.


If data replication is being carried out in "synchronous" mode then it is likely the remote data copy is 100% complete at any given time.  This is because in "sync" mode data is not commited as being written (to the application) until the copy operation is acknowledged from the remote server and all volume bitmaps have been updated. If there are pending I/O write operations in progress these should only take a few seconds to complete. The above is only true when the replication state is "replicating".


If data replication is being carried out in "asychronous" then it is very likely the remote data is not 100% complete at any given time. In order to improve application performance "async" mode uses memory/cache buffers and optionally a special disk storage to buffer data locally.  This could mean the remote cluster/server/site is not synchronized with the primary by KB's, MB's or GB's. Depending on the network performance and many other factors it could take the primary system seconds, minutes, hours or days to synchronize the two sites up and for the data to be complete at the remote server/site.

 

It is therefore important for an administrator to determine whether it is a requirement to stop the replication at a point where the remote data is complete, intact and therefore usable. Whilst an administrator can manually put a volume set into the "logging" state, a volume set will often be put into this state automatically as a result of unscheduled problems, including a network outage, a server failure or reboot, a cluster node failover, or the stopping/suspension of the replication software. When this happens it is not always possible to determine with 100% certainty, without analysis, that the remote data is completely synchronized with the primary data copy down to the last data block. Analysis requires understanding the configuration by checking the modes of data replication in use and the percentage statistics from the command "dsstat", which show how much data differs between the two sites. The integrity of the data on the remote server is also best confirmed by testing. However, if the AS integration is well designed/implemented and also implements point-in-time copies, the data can be protected.
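As a hedged sketch of how the difference might be checked, the remote-mirror statistics can be listed with "dsstat -m sndr"; the sample output below is illustrative only and the column layout varies between releases. The percentage column reports how much of the volume still requires synchronization, with 0.00 meaning the bitmaps record no outstanding differences:

        phys-boston-1# dsstat -m sndr
          name                    t  s  pct    role   ...
          md/proddg/rdsk/d1       P  R  0.00   net    ...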

 

Description of the underlying cluster configuration in this example

   Built upon Solaris 10, Oracle Solaris Cluster 3.3 with latest patches

   Two Oracle Solaris Clusters each comprised of two nodes.

     Cluster Boston      First Node phys-boston-1   Second Node phys-boston-2  
     Cluster Newton     First Node phys-newton-1  Second Node phys-newton-2  

   Replicating from Cluster Boston to Cluster Newton via a private network using Availability Suite Software Version 4.0 with latest patches

            Cluster Boston Logical Host for replication:  lh-boston-sndr-pri

            Cluster Newton Logical Host for replication: lh-newton-sndr-sec

        Solaris Volume Manager (SVM) volumes and filesystems built upon Shared Storage.

        The Master Volume (MV) and its associated filesystem on the primary cluster is configured as a failover cluster resource group.

        The secondary cluster volume remote mirror (RM) is configured as a failover resource; however, manual intervention is required to mount the RM filesystem on Cluster Newton.
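Before suspending replication, the administrator may also wish to confirm which node currently hosts the replication logical hostname and its resource group. A minimal sketch using the standard Oracle Solaris Cluster status commands follows; the resource group name "boston-sndr-rg" is a hypothetical placeholder and the resource names should be adjusted to the actual configuration:

        phys-boston-1# clresourcegroup status boston-sndr-rg
        phys-boston-1# clresource status lh-boston-sndr-pri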


Diagram showing an AS configuration including Cluster Nodes, Filesystems, Master and Remote Mirror Volumes and Shadow copies


             Note: The data path this document focuses on in the diagram is "A - Remote Mirror Replication MV -> RM"

 

Solution
