
[OVM 3.3+ ] Virtual Machine Takes a Very Long Time to Complete Live Migration (Doc ID 2679653.1)

Last updated on JUNE 25, 2023

Applies to:

Oracle VM - Version 3.3.1 to 3.4.6 [Release OVM33 to OVM34]
Linux x86-64

Symptoms

Sometimes it can be observed that Virtual Machine (VM) live migration takes a very long time, causing various application outages or failures in the guest OS, such as RAC cluster node fencing and network application errors. The time to complete a live migration may span 5 to 11 minutes or even longer, which correlates with the large amount of memory configured for the VM.
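As a rough sanity check (an illustrative back-of-envelope model, not an Oracle formula), the bulk pre-copy phase takes approximately the guest memory size divided by the effective throughput, i.e. the link rate minus the rate at which the guest keeps dirtying pages. Using the `maxmem` of 94208 MiB and the ~5000 Mb/s send rate seen in the logs below:

```python
# Rough pre-copy transfer-time estimate; actual behaviour also depends
# on xc_save's iteration logic and the final stop-and-copy phase.
def estimate_transfer_seconds(mem_mib, link_mbps, dirty_mbps=0):
    effective = link_mbps - dirty_mbps
    if effective <= 0:
        # pre-copy cannot converge if pages dirty faster than they are sent
        raise ValueError('dirty rate exceeds link throughput')
    return mem_mib * 8 / effective  # MiB -> Mib, divided by Mb/s

# 92 GiB guest (maxmem 94208 MiB in the log) over ~5000 Mb/s
print(round(estimate_transfer_seconds(94208, 5000)))  # -> 151
```

This matches the ~146-second bulk copy (`delta 145999ms`) in the xend.log excerpt below; the remaining minutes of the outage come from the teardown stall, not the page transfer itself.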

By looking into the log files, the following log patterns can be identified:

  1. xend.log

On the source server, it took more than 11 minutes to destroy the VM:

[2019-12-09 14:15:25 2500] DEBUG (XendDomainInfo:5242) Release running lock: status=0, output=
[2019-12-09 14:15:25 2500] DEBUG (XendCheckpoint:138) [xc_save]: /usr/lib64/xen/bin/xc_save 25 5 0 0 5
[2019-12-09 14:15:26 2500] INFO (XendCheckpoint:430) xc_save: failed to get the suspend evtchn port
[2019-12-09 14:15:26 2500] INFO (XendCheckpoint:430)
[2019-12-09 14:15:26 2500] INFO (XendCheckpoint:430) [3613] xc: detail: xc_domain_save: starting save of domid 5
[2019-12-09 14:15:27 2500] INFO (XendCheckpoint:430) [3613] xc: detail: type fail: page 527 mfn 0000220f ... ...
[2019-12-09 14:15:33 2500] INFO (XendCheckpoint:430) [3613] xc: detail: type fail: page 71 mfn 000f2047
[2019-12-09 14:17:52 2500] INFO (XendCheckpoint:430) [3613] xc: detail: delta 145999ms, dom0 13%, target 36%, sent 5413Mb/s, dirtied 63Mb/s 283474 pages
[2019-12-09 14:17:54 2500] INFO (XendCheckpoint:430) [3613] xc: detail: delta 1643ms, dom0 6%, target 44%, sent 5092Mb/s, dirtied 963Mb/s 48308 pages ... ...
[2019-12-09 14:18:00 2500] INFO (XendCheckpoint:430) [3613] xc: detail: Start last iteration
[2019-12-09 14:18:00 2500] DEBUG (XendCheckpoint:401) suspend
[2019-12-09 14:18:00 2500] DEBUG (XendCheckpoint:141) In saveInputHandler suspend
[2019-12-09 14:18:00 2500] DEBUG (XendCheckpoint:143) Suspending 5 ... ... ...
[2019-12-09 14:18:03 2500] INFO (XendCheckpoint:430) [3613] xc: detail: Save exit of domid 5 with rc=0
[2019-12-09 14:18:03 2500] DEBUG (XendDomainInfo:3842) XendDomainInfo.destroy: domid=5
[2019-12-09 14:29:25 2500] DEBUG (XendDomainInfo:2584) Destroying device model
[2019-12-09 14:29:25 2500] INFO (image:741) migrating-0004fb0000060000e7cd3059fd9d9bb4 device model terminated
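The stall is the gap between the `XendDomainInfo.destroy` call at 14:18:03 and the device-model teardown at 14:29:25. A small script can measure such gaps from the timestamps (the two sample lines below are copied from the excerpt above; in practice you would read the matching lines from /var/log/xen/xend.log on the source server):

```python
from datetime import datetime

# Sample lines taken from this article's xend.log excerpt.
sample = """\
[2019-12-09 14:18:03 2500] DEBUG (XendDomainInfo:3842) XendDomainInfo.destroy: domid=5
[2019-12-09 14:29:25 2500] DEBUG (XendDomainInfo:2584) Destroying device model
"""

def timestamp(line):
    # xend.log lines start with "[YYYY-MM-DD HH:MM:SS pid]"
    return datetime.strptime(line[1:20], '%Y-%m-%d %H:%M:%S')

lines = sample.splitlines()
gap = timestamp(lines[1]) - timestamp(lines[0])
print('teardown took %s' % gap)  # -> teardown took 0:11:22
```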

 

On the destination server, the restore shows the same gap before completion:

[2019-12-09 14:15:26 2424] DEBUG (XendDomainInfo:246) XendDomainInfo.restore(['domain', ['domid', '5']...['maxmem', '94208'], ['memory', '94208'] ... ...
[2019-12-09 14:15:26 2424] DEBUG (XendDomain:476) Adding Domain: 5 ... ...
[2019-12-09 14:15:27 2424] DEBUG (XendCheckpoint:312) [xc_restore]: /usr/lib64/xen/bin/xc_restore 4 5 6 8 1 1 1 0 1
[2019-12-09 14:15:27 2424] INFO (XendCheckpoint:430) [20440] xc: detail: xc_domain_restore: starting restore of new domid 5
[2019-12-09 14:15:27 2424] INFO (XendCheckpoint:430) [20440] xc: detail: xc_domain_restore: p2m_size = 1710001 ... ...
[2019-12-09 14:16:52 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Mapping order 0,  1024; first pfn e47000 ... ...
[2019-12-09 14:17:52 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Mapping order 0,  1024; first pfn 170f000
[2019-12-09 14:17:52 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Mapping order 0,  1024; first pfn 170f400
[2019-12-09 14:17:52 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Mapping order 0,  1024; first pfn 170f800
[2019-12-09 14:17:52 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Mapping order 0,  1024; first pfn 170fc00
[2019-12-09 14:18:01 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Received all pages (0 races)
[2019-12-09 14:18:03 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Read 2560 bytes of QEMU data
[2019-12-09 14:18:03 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Read 5632 bytes of QEMU data
[2019-12-09 14:18:03 2424] INFO (XendCheckpoint:430) [20440] xc: detail: 8192-byte QEMU buffer full, reallocating...
[2019-12-09 14:18:03 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Read 1362 bytes of QEMU data
[2019-12-09 14:29:25 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Writing 9554 bytes of QEMU data
[2019-12-09 14:29:25 2424] INFO (XendCheckpoint:430) [20440] xc: detail: Restore exit of domid 5 with rc=0

 

  2. ovs-agent.log

On both the source and destination servers, there are numerous vm-list lock timeout errors:

[2019-11-22 11:47:18 21416] DEBUG (linuxstats:58) Error getting VM stats:
Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occurred.
[2019-11-22 11:49:21 6387] ERROR (service:97) catch_error: Lock file
/var/run/ovs-agent/vm-list.lock failed: timeout occurred.
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/agent/lib/service.py", line 95, in wrapper
    return func(*args)
  File "/usr/lib64/python2.6/site-packages/agent/api/hypervisor/xenxm.py", line 302, in list_vms
    return get_vms()
  File "/usr/lib64/python2.6/site-packages/agent/lib/xenxm.py", line 118, in get_vms
    lock.acquire(wait=120)
  File "/usr/lib64/python2.6/site-packages/agent/lib/filelock.py", line 96, in acquire
    self.filename)
LockError: Lock file /var/run/ovs-agent/vm-list.lock failed: timeout occurred.
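These errors are a symptom, not a cause: while the slow migration holds /var/run/ovs-agent/vm-list.lock, every other caller polls for the lock and gives up after 120 seconds (`lock.acquire(wait=120)` in the traceback). The sketch below illustrates that flock-with-timeout pattern; it is an illustrative reimplementation, not the actual ovs-agent `filelock.py` code:

```python
import fcntl
import os
import time

class LockError(Exception):
    pass

def acquire(filename, wait=120, poll=0.5):
    """Poll for an exclusive lock on filename, failing after `wait` seconds."""
    fd = os.open(filename, os.O_CREAT | os.O_RDWR, 0o644)
    deadline = time.time() + wait
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd  # caller closes fd to release the lock
        except BlockingIOError:
            if time.time() >= deadline:
                os.close(fd)
                raise LockError('Lock file %s failed: timeout occurred.'
                                % filename)
            time.sleep(poll)
```

Any agent operation that holds the lock longer than the 120-second wait (such as the 11-minute teardown above) therefore produces the `LockError` tracebacks shown.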

 

Changes

 Performing a VM live migration.

Cause


