OLVM: DataCenter Non-Responsive with Error "Cannot find master domain"
(Doc ID 2772938.1)
Last updated on MAY 01, 2024
Applies to:
Linux OS - Version Oracle Linux 7.9 with Unbreakable Enterprise Kernel [5.4.17] and later
Linux x86-64
Symptoms
All hosts in the DC become non-operational:
2021-04-27 22:19:47,441-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-234037) [7c0c9795] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM xxx1 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
2021-04-27 22:22:39,857-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-234149) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM xxx2 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
2021-04-27 22:22:40,406-04 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (EE-ManagedThreadFactory-engine-Thread-234114) [] Command 'ConnectStoragePoolVDSCommand(HostName = xxx, ConnectStoragePoolVDSCommandParameters:{hostId='e3768de0-0baa-4576-8ff7-afbcee94f605', vdsId='e3768de0-0baa-4576-8ff7-afbcee94f605', storagePoolId='7c7903b6-c199-4ef1-97fb-6e11bf8de5e2', masterVersion='617'})' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
2021-04-27 22:22:41,057-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-234113) [6d8f7f8] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM xxx3 command ConnectStoragePoolVDS failed: Cannot find master domain: u'spUUID=7c7903b6-c199-4ef1-97fb-6e11bf8de5e2, msdUUID=d19f6e7d-e9ff-4743-a0a5-d25fb377f81d'
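To confirm that every host is failing against the same storage pool and master domain, the engine log entries above can be reduced to (host, spUUID, msdUUID) tuples. The following is a minimal sketch and not an Oracle-provided tool: the function name find_master_domain_failures is hypothetical, and the pattern is derived only from the message format shown in this document, so it may need adjusting for other engine versions.

```python
import re

# Pattern built from the VDS_BROKER_COMMAND_FAILURE messages shown in this
# document; the exact message layout on other releases is an assumption.
MASTER_DOMAIN_FAILURE = re.compile(
    r"VDSM (?P<host>\S+) command ConnectStoragePoolVDS failed: "
    r"Cannot find master domain: u?'spUUID=(?P<sp>[0-9a-f-]{36}), "
    r"msdUUID=(?P<msd>[0-9a-f-]{36})'"
)


def find_master_domain_failures(lines):
    """Return a list of (host, spUUID, msdUUID) tuples found in engine log lines."""
    failures = []
    for line in lines:
        # finditer also handles several log entries fused onto one line.
        for match in MASTER_DOMAIN_FAILURE.finditer(line):
            failures.append(
                (match.group("host"), match.group("sp"), match.group("msd"))
            )
    return failures
```

One possible use is to pass the open engine log file object to the function (on many installations the log is /var/log/ovirt-engine/engine.log, but treat that path as an assumption) and check that all returned tuples share one msdUUID.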
DataCenter becomes non-responsive:
2021-04-27 16:17:35,146-04 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-90) [4eed2096] EVENT_ID: SYSTEM_CHANGE_STORAGE_POOL_STATUS_PROBLEMATIC_WITH_ERROR(987), Invalid status on Data Center xxx. Setting Data Center status to Non Responsive (On host xxx, Error: General Exception).
The system logs on all hosts report offline paths and I/O errors on the master domain device:
Apr 27 15:20:04 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 1124478346 op 0x1:(WRITE) flags 0x8800 phys_seg 3 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 104762199904 op 0x1:(WRITE) flags 0x8800 phys_seg 5 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 12758732825 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 26579493016 op 0x1:(WRITE) flags 0x8800 phys_seg 16 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 11248957761 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
Apr 27 15:20:09 host1 kernel: blk_update_request: I/O error, dev dm-8, sector 9314783624 op 0x0:(READ) flags 0x0 phys_seg 15 prio class 0
Apr 27 15:20:09 host1 multipathd: 3600144f0d329c44b00005ee2686e0003: Disable queueing ...

Apr 27 22:58:48 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 27 22:59:05 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 22:59:07 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 264192 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 27 22:59:08 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 27 22:59:10 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 22:59:15 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline
Apr 27 22:59:17 host2 kernel: blk_update_request: I/O error, dev dm-8, sector 264192 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 27 22:59:20 host2 multipathd: 3600144f0d329c44b00005ee2686e0003: sdj - path offline

Apr 27 23:02:48 host3 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Apr 27 23:02:48 host3 vdsm[5068]: ERROR Unhandled exception in <Task discardable <UpdateVolumes vm=464b1bd0-e869-4085-b8db-213c5c3618c7 at 0x7f0fac11a5d0> timeout=30.0, duration=0.19 at 0x7f0fac11a3d0>#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 315, in _execute_task#012 task()#012 File "/usr/lib/python2.7/site-packages/vdsm/executor.py", line 391, in __call__#012 self._callable()#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 315, in __call__#012 self._execute()#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/periodic.py", line 357, in _execute#012 self._vm.updateDriveVolume(drive)#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 4189, in updateDriveVolume#012 vmDrive.volumeID)#012 File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 6101, in _getVolumeSize#012 (domainID, volumeID))#012StorageUnavailableError: Unable to get volume size for domain d19f6e7d-e9ff-4743-a0a5-d25fb377f81d volume 0281c278-7834-48a0-90b7-110a384b1561 ....
Apr 27 23:02:48 host3 kernel: blk_update_request: I/O error, dev dm-8, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
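
When several hosts are involved, it can help to count the multipath and block-layer errors per host and device rather than reading the logs by eye. The sketch below is illustrative only: summarize_storage_errors is a hypothetical helper, and both patterns are inferred from the syslog lines quoted in this document (the optional "[pid]" after multipathd is an assumption about other syslog configurations).

```python
import re
from collections import Counter

# Patterns inferred from the syslog excerpts in this document.
PATH_OFFLINE = re.compile(
    r"(?P<host>\S+) multipathd(?:\[\d+\])?: (?P<wwid>\w+): "
    r"(?P<path>\w+) - path offline"
)
IO_ERROR = re.compile(
    r"(?P<host>\S+) kernel: blk_update_request: I/O error, "
    r"dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def summarize_storage_errors(lines):
    """Count path-offline events and block I/O errors in syslog lines.

    Returns two Counters:
      offline   keyed by (host, multipath WWID, path device)
      io_errors keyed by (host, block device)
    """
    offline = Counter()
    io_errors = Counter()
    for line in lines:
        match = PATH_OFFLINE.search(line)
        if match:
            offline[(match["host"], match["wwid"], match["path"])] += 1
            continue
        match = IO_ERROR.search(line)
        if match:
            io_errors[(match["host"], match["dev"])] += 1
    return offline, io_errors
```

Feeding the function the lines of each host's system log (commonly /var/log/messages on Oracle Linux 7, which is an assumption about the local syslog setup) gives a quick view of whether all hosts lose the same path to the same multipath device.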
Cause