Solaris Cluster 4.x - Zpool SUSPENDED, device possibly UNAVAIL when rebooting another node in the cluster. (Doc ID 2302091.1)

Last updated on AUGUST 31, 2017

Applies to:

Solaris Cluster - Version 4.0 to 4.3 [Release 4.0 to 4.3]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on x86-64 (64-bit)

Symptoms

/var/adm/messages:

Aug 28 14:33:08 Node3 cl_runtime: [ID 273354 kern.notice] NOTICE: CMM: Node Node2 (nodeid = 2) is dead.
Aug 28 14:33:11 Node3 cl_runtime: [ID 446068 kern.notice] NOTICE: CMM: Node Node2 (nodeid = 2) is down.

Aug 28 14:33:12 Node3 Cluster.Framework: [ID 801593 daemon.notice] stdout: fencing node Node2 from shared devices

Aug 28 14:33:13 Node3 Cluster.CCR: [ID 651093 daemon.warning] reservation message(fence_node) - Fencing node 2 from disk /dev/did/rdsk/d50s2
Aug 28 14:33:13 Node3 Cluster.CCR: [ID 551094 daemon.warning] reservation warning(fence_node) - Unable to open device /dev/did/rdsk/d50s2, will retry in 2 seconds

Aug 28 14:33:17 Node3 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-NX, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 28 14:33:17 Node3 EVENT-TIME: Mon Aug 28 14:33:16 PDT 2017
Aug 28 14:33:17 Node3 PLATFORM: Sun-Fire-T200, CSN: unknown, HOSTNAME: Node3
Aug 28 14:33:17 Node3 SOURCE: zfs-diagnosis, REV: 1.0
Aug 28 14:33:17 Node3 EVENT-ID: 3951d5a3-3b72-417f-8b19-fb0f4242dccf
Aug 28 14:33:17 Node3 DESC: Probe of ZFS device 'id1,vdc@n600a0b800011a4be000035465550d0ef/a' in pool 'test-pool3' has failed.
Aug 28 14:33:17 Node3 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available.
Aug 28 14:33:17 Node3 IMPACT: Fault tolerance of the pool may be compromised.
Aug 28 14:33:17 Node3 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-NX for the latest service procedures and policies regarding this diagnosis.
Aug 28 14:33:18 Node3 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Aug 28 14:33:18 Node3 EVENT-TIME: Mon Aug 28 14:33:18 PDT 2017
Aug 28 14:33:18 Node3 PLATFORM: Sun-Fire-T200, CSN: unknown, HOSTNAME: Node3
Aug 28 14:33:18 Node3 SOURCE: zfs-diagnosis, REV: 1.0
Aug 28 14:33:18 Node3 EVENT-ID: 0f49c2ae-457c-4ea4-98e9-b27f877e117c
Aug 28 14:33:18 Node3 DESC: The number of I/O errors associated with ZFS device 'id1,vdc@n600a0b800011a4be000035465550d0ef/a' in pool 'test-pool3' exceeded acceptable levels.
Aug 28 14:33:18 Node3 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt will be made to activate a hot spare if available.
Aug 28 14:33:18 Node3 IMPACT: Fault tolerance of the pool may be compromised.
Aug 28 14:33:18 Node3 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-FD for the latest service procedures and policies regarding this diagnosis.

Aug 28 14:33:22 Node3 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-8A, TYPE: Fault, VER: 1, SEVERITY: Critical
Aug 28 14:33:22 Node3 EVENT-TIME: Mon Aug 28 14:33:22 PDT 2017
Aug 28 14:33:22 Node3 PLATFORM: Sun-Fire-T200, CSN: unknown, HOSTNAME: Node3
Aug 28 14:33:22 Node3 SOURCE: zfs-diagnosis, REV: 1.0
Aug 28 14:33:22 Node3 EVENT-ID: e86bc0b4-49e8-43fa-9fb2-c9b0290fcddd
Aug 28 14:33:22 Node3 DESC: A file or directory in pool 'test-pool3' could not be read due to corrupt data.
Aug 28 14:33:22 Node3 AUTO-RESPONSE: No automated response will occur.
Aug 28 14:33:22 Node3 IMPACT: The file or directory is unavailable.
Aug 28 14:33:22 Node3 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -xv' and examine the list of damaged files to determine what has been affected. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-8A for the latest service procedures and policies regarding this diagnosis.
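The REC-ACTION steps from the FMA events above can be combined into a short diagnostic pass on the surviving node. A minimal sketch, run as root; the pool name 'test-pool3' and DID device 'd50' are taken from the log, everything else is illustrative:

```shell
# Summarize active faults diagnosed by FMA
fmadm faulty

# Show only unhealthy pools, with per-vdev latency/error detail
zpool status -lx

# List any damaged files in the affected pool
zpool status -xv test-pool3

# Check whether the DID device the fencing code failed to open
# (d50 in the reservation warnings) still maps to a real path
cldevice show d50
```

If 'cldevice show' reports a path that no longer exists on the storage array, the cluster's DID namespace is stale, which is consistent with the "Unable to open device /dev/did/rdsk/d50s2" fencing warnings above.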

Changes

The zpool had previously been destroyed, the original LUN removed, a new LUN provisioned, and the zpool re-created on the new LUN.
The data was then restored and the zpool brought back online. All nodes in the cluster remained up during this change control.
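A LUN replacement of this kind is normally followed by a device-ID (DID) namespace update on every cluster node, so that the cluster's device configuration matches the new LUN. A hedged sketch of the sequence described above; the pool name comes from the log, the LUN path is hypothetical, and the exact steps should be confirmed against your cluster's procedures:

```shell
# On the node owning the pool: destroy the old pool before the LUN is unmapped
zpool destroy test-pool3

# ... storage administrator removes the old LUN and presents the new one ...

# On EVERY cluster node: pick up the new LUN in the DID namespace
cldevice populate

# Remove DID references to devices that no longer exist
# (stale entries for the removed LUN can otherwise linger)
cldevice clear

# Re-create the pool on the new device (hypothetical path) and restore data
zpool create test-pool3 c0t600A0B800011A4BE0000354655ABCDEFd0
# ... restore data from backup ...
zpool status -v test-pool3
```

Skipping the per-node 'cldevice populate' / 'cldevice clear' steps can leave other nodes holding DID mappings to the removed LUN, which would surface on their next reboot or fencing attempt.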

Cause
