Oracle ZFS Storage Appliance: Disabling Write Cache Enable (WCE) for ZFSSA exported LUNs on Solaris Clients (Doc ID 2063355.1)

Last updated on AUGUST 01, 2017

Applies to:

Sun ZFS Storage 7120 - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)

Due to Solaris Bug 15662604 ("ZFS should retry or abort an entire transaction when a WCE LUN gets a power-on-reset condition") and ZFS-SA Bug 15753332 ("appliance should ignore over the wire dkiocsetwce"), Solaris clients may experience data integrity issues when the write cache has been enabled on LUNs exported from an Oracle ZFS Storage Appliance.

NOTE: This also applies to Solaris LDoms using vdisks created on top of ZFS-SA LUNs. If the client LDom or zone places a zpool on the vdisk, the fix below must also be applied there.

Goal

To permanently disable the write cache on LUNs exported from an Oracle ZFS Storage Appliance when they are used in a zpool on a Solaris client.

Background:

ZFS enables the write cache on pool devices upon zpool import, and safely handles cache flushing in the event of a system power loss. However, a power-on-reset condition can potentially occur while data has not yet been committed to stable storage.

In an environment with no single point of failure, this situation is automatically detected and corrected by ZFS the next time the data is read. Routine scrubs of the pool can improve the detection and repair of any lost writes.
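
As an illustration of the routine scrub mentioned above, the following is a minimal sketch for a Solaris client. The function name scrub_and_report and the pool name tank are hypothetical; zpool scrub and zpool status -v are the standard commands. Note that zpool scrub returns as soon as the scrub has started, so the status output shows a scrub in progress rather than a final result.

```shell
# Start a scrub of the given pool, then print its status.
# "zpool scrub" returns once the scrub has been started; "zpool status -v"
# shows scrub progress and the per-device READ/WRITE/CKSUM error counters.
scrub_and_report() {
    pool="$1"
    zpool scrub "$pool" && zpool status -v "$pool"
}

# Usage (pool name "tank" is a placeholder):
#   scrub_and_report tank
```

Re-running zpool status -v after the scrub completes shows whether any checksum errors were found and repaired.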

In an environment with a single point of failure, this problem could lead to data loss.

This problem might also occur more frequently when accessing LUNs that are exported from a clustered configuration. During cluster failover, data cached by the failing head may be lost due to a power-on-reset event that is explicitly sent by the SCSI target on the surviving head. In this situation, even pools with no single point of failure might be affected.

A symptom of this issue is clusters of persistent checksum errors. You can use the output from fmdump -eV to determine whether the checksum errors have been diagnosed as persistent. The zio_txg entry in the fmdump -eV output identifies the transaction group in which a block of data was written, and so indicates when the write occurred. Note that a pattern of persistent checksum errors could also be a symptom of failing devices, software, or hardware.
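
One way to look for such clusters is to tally checksum error reports by their zio_txg value. The sketch below is not part of the original note: the helper name count_by_txg is hypothetical, and it assumes that fmdump -eV prints the field on its own line in the form "zio_txg = <value>". Verify this against the actual output on your system before relying on it.

```shell
# Tally error reports per zio_txg value from "fmdump -eV" text read on stdin.
# Assumption: each report prints the field as "zio_txg = <value>".
# Output: one line per value, "<count> <zio_txg>", highest count first.
count_by_txg() {
    awk '$1 == "zio_txg" && $2 == "=" { n[$3]++ }
         END { for (t in n) print n[t], t }' | sort -rn
}

# Usage on a Solaris client, restricted to ZFS checksum error reports:
#   fmdump -eV -c ereport.fs.zfs.checksum | count_by_txg
```

Many error reports that share the same or neighbouring zio_txg values would be consistent with writes lost at a single point in time, although, as noted above, failing devices, software, or hardware can produce similar patterns.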

 

Solution
