OLVM: DataCenter Non-Responsive with Error "Cannot find master domain" (Doc ID 2772938.1)

Last updated on OCTOBER 01, 2021

Applies to:

Linux OS - Version Oracle Linux 7.9 with Unbreakable Enterprise Kernel [5.4.17] and later
Linux x86-64

Symptoms

All hosts in the DC become non-operational:

2021-04-27 22:19:47,441-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-234037) [7c0c9795] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), 
VDSM xxx1 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
2021-04-27 22:22:39,857-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-234149) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), 
VDSM xxx2 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
2021-04-27 22:22:40,406-04 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engine-Thread-234114) [] Command 'ConnectStoragePoolVDSCommand(HostName = xxx, 
ConnectStoragePoolVDSCommandParameters:{hostId='e3768de0-0baa-4576-8ff7-afbcee94f605', vdsId='e3768de0-0baa-4576-8ff7-afbcee94f605', storagePoolId='7c7903b6-c199-4ef1-97fb-6e11bf8de5e2', masterVersion='617'})'
 execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
2021-04-27 22:22:41,057-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-234113) [6d8f7f8] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), 
VDSM xxx3 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
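When many hosts log this failure, it can help to confirm that they all refer to the same storage pool (spUUID) and master storage domain (msdUUID). The following is a minimal sketch, not an Oracle-provided tool: it assumes the engine log text has been read into a string, and the names `MASTER_DOMAIN_RE`, `find_master_domain_failures` and `SAMPLE` are illustrative. The sample message is copied from the excerpt above.

```python
import re

# Matches the VDSM failure message as it appears in the engine log excerpt.
# The pattern starts at "VDSM", so it does not depend on how the preceding
# timestamp and EVENT_ID fields are wrapped.
MASTER_DOMAIN_RE = re.compile(
    r"VDSM (?P<host>\S+) command ConnectStoragePoolVDS failed: "
    r"Cannot find master domain: "
    r"u?'spUUID=(?P<sp_uuid>[0-9a-f-]+), msdUUID=(?P<msd_uuid>[0-9a-f-]+)'"
)


def find_master_domain_failures(text):
    """Return a list of (host, spUUID, msdUUID) tuples, one per match."""
    return [
        (m.group("host"), m.group("sp_uuid"), m.group("msd_uuid"))
        for m in MASTER_DOMAIN_RE.finditer(text)
    ]


# Sample message taken from the log excerpt in this document.
SAMPLE = (
    "VDSM xxx1 command ConnectStoragePoolVDS failed: Cannot find master domain: "
    "u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, "
    "msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'"
)

for host, sp_uuid, msd_uuid in find_master_domain_failures(SAMPLE):
    print(host, sp_uuid, msd_uuid)
```

If every host reports the same msdUUID, the problem is more likely to lie with access to that one storage domain than with an individual host.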

DataCenter becomes non-responsive:

2021-04-27 16:17:35,146-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [4eed2096] EVENT_ID: 
SYSTEM_CHANGE_STORAGE_POOL_STATUS_PROBLEMATIC_WITH_ERROR(987), Invalid status on Data Center xxx. Setting Data Center status to Non Responsive (On host xxx, Error: General Exception).

The system logs on all hosts report offline paths and I/O errors on the master domain:

Apr 27 15:20:04 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 1124478346 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 104762199904 op 0x1:(WRITE) flags 0x8800 phys_seg 5 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 12758732825 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 26579493016 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 11248957761 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 9314783624 op 0x0:(READ) flags 0x0 phys_seg 15 prio class 0
Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: Disable queueing
...
Apr 27 22:58:48 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 27 22:59:05 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 22:59:07 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 264192 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 27 22:59:08 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 27 22:59:10 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 22:59:15 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 22:59:17 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 264192 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 27 22:59:20 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 23:02:48 host3 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 27 23:02:48 host3 vdsm[5068]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=464b1bd0-e869-4085-b8db-213c5c3618c7 at 0x7f0fac11a5d0> timeout=30.0, duration=0.19 at 0x7f0fac11a3d0>#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task#012 task()#012 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__#012 self._callable()#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 315, in __call__#012 self._execute()#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 357, in _execute#012 self._vm.updateDriveVolume(drive)#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4189, in updateDriveVolume#012 vmDrive.volumeID)#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6101, in _getVolumeSize#012 (domainID, volumeID))#012StorageUnavailableError: Unable to get volume size for domain d19f6e7d-e9ff-4743-a0a5-d25fb377f81d volume 0281c278-7834-48a0-90b7-110a384b1561
....
Apr 27 23:02:48 host3 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
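To compare hosts quickly, the multipathd and kernel messages above can be tallied per host and device. The sketch below is illustrative only: it assumes syslog lines in the format shown in the excerpt, and the names `PATH_OFFLINE_RE`, `IO_ERROR_RE`, `summarize` and `SAMPLE_LINES` are hypothetical helpers, not part of any Oracle or oVirt tooling. The sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# "<Mon> <day> <HH:MM:SS> <host> multipathd: <wwid>: <path> - path offline"
PATH_OFFLINE_RE = re.compile(
    r"^\w{3}\s+\d+ [\d:]{8} (?P<host>\S+) multipathd: "
    r"(?P<wwid>[^:\s]+): (?P<path>\S+) - path offline"
)
# "<Mon> <day> <HH:MM:SS> <host> kernel: blk_update_request: I/O error, dev <dev>, ..."
IO_ERROR_RE = re.compile(
    r"^\w{3}\s+\d+ [\d:]{8} (?P<host>\S+) kernel: blk_update_request: "
    r"I/O error, dev (?P<dev>[^,\s]+), sector \d+ "
    r"op 0x[0-9a-fA-F]+:\((?P<op>\w+)\)"
)


def summarize(lines):
    """Count path-offline events and I/O errors, keyed by host and device."""
    offline = Counter()    # (host, wwid, path) -> count
    io_errors = Counter()  # (host, dev, op) -> count
    for line in lines:
        m = PATH_OFFLINE_RE.match(line)
        if m:
            offline[(m["host"], m["wwid"], m["path"])] += 1
            continue
        m = IO_ERROR_RE.match(line)
        if m:
            io_errors[(m["host"], m["dev"], m["op"])] += 1
    return offline, io_errors


# Sample lines taken from the log excerpt in this document.
SAMPLE_LINES = [
    "Apr 27 15:20:04 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline",
    "Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline",
    "Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 1124478346 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0",
    "Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 9314783624 op 0x0:(READ) flags 0x0 phys_seg 15 prio class 0",
    "Apr 27 22:58:48 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0",
]

offline, io_errors = summarize(SAMPLE_LINES)
for key, count in sorted(offline.items()):
    print("path offline", key, count)
for key, count in sorted(io_errors.items()):
    print("I/O error", key, count)
```

In this excerpt every host reports the same multipath WWID and the same path name, which a per-host tally of this kind makes easy to see.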

Cause
