The Importance of Live Migration Capabilities in Modern Datacenters

Recently, the popular Xen hypervisor showed its limitations when leading cloud service providers were forced to reboot large portions of their compute instances because of a flaw in the software. In the past, this may have been an acceptable solution. Now that datacenters power mission-critical applications, however, system administrators need to know all their options for protecting their servers from both planned and unplanned downtime.

For any datacenter using cloud technologies, live migration is an effective way to minimize downtime: the hypervisor moves a running guest virtual machine (VM) to another physical host, so the original host can be rebooted or serviced while the guest keeps running. The VM's memory is typically migrated in one of two ways:

Pre-copy memory migration

  1. Warm-up phase: The hypervisor copies the VM's memory pages from the source to the destination while the VM keeps running on the source. Pages that the VM modifies during the copy (known as dirtying) are re-copied in further rounds until the amount of memory still dirty is acceptably small.
  2. Stop-and-copy phase: The original VM is stopped, the remaining dirty pages are copied to the destination, and the VM resumes there. Downtime typically ranges from a few milliseconds to a few seconds, depending on the size of memory and the applications running on the VM.
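
The two phases can be illustrated with a small simulation. This is only a toy model, not how any particular hypervisor implements pre-copy: the function name, the parameters, and the assumption of a constant dirtying rate and constant bandwidth are all made up for illustration.

```python
def simulate_precopy(total_pages, dirty_rate, bandwidth,
                     stop_threshold=50, max_rounds=30):
    """Toy model of pre-copy migration (all parameters are illustrative).

    total_pages    -- number of guest memory pages
    dirty_rate     -- pages the running guest modifies per second
    bandwidth      -- pages the network link can send per second
    stop_threshold -- switch to stop-and-copy once this few pages remain
    max_rounds     -- stop iterating if the copy never converges
    """
    to_send = total_pages      # the first round sends all of memory
    pages_sent = 0
    rounds = 0

    # Warm-up phase: the guest keeps running, so it dirties pages
    # while each round is being copied.
    while to_send > stop_threshold and rounds < max_rounds:
        pages_sent += to_send
        rounds += 1
        # A round lasts to_send / bandwidth seconds; pages dirtied in
        # that time have to be re-sent in the next round.
        to_send = min(total_pages, dirty_rate * to_send // bandwidth)

    # Stop-and-copy phase: the guest is paused while the rest is sent,
    # so this transfer time is the downtime.
    pages_sent += to_send
    downtime_seconds = to_send / bandwidth
    return rounds, pages_sent, downtime_seconds
```

Calling `simulate_precopy(total_pages=1000, dirty_rate=100, bandwidth=1000)` takes two warm-up rounds and sends 1110 pages in total for 1000 pages of memory, which shows why pre-copy transfers more data than the VM actually holds. If the dirtying rate is as high as the bandwidth, the warm-up never converges and the model falls back to a long stop-and-copy after `max_rounds`.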

Post-copy memory migration

This type of migration suspends the VM at the source and transfers a minimal subset of its execution state (CPU state, registers and, optionally, non-pageable memory) to the destination. The VM is then resumed on the destination. While it runs there, the source pushes the remaining memory pages to the destination in the background; if the VM touches a page that has not arrived yet, the access faults and the page is fetched from the source over the network. Unlike pre-copy migration, post-copy migration sends each page across the network only once. The drawback is that during migration the VM state is split across the source and destination systems, so if the destination fails before the transfer completes, the VM cannot be recovered.
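
A minimal sketch of the post-copy transfer order, under simplifying assumptions: the function name and the `access_order` input are hypothetical, and the source is assumed to push exactly one background page per guest memory access. It is meant only to show that each page crosses the network once, either on demand or through the background push.

```python
def simulate_postcopy(total_pages, access_order):
    """Toy model of post-copy page transfer (illustrative only).

    total_pages  -- number of guest memory pages still on the source
    access_order -- page numbers the resumed guest touches, in order
    Returns a list of (page, reason) tuples in transfer order, where
    reason is "fault" (fetched on demand) or "push" (sent in background).
    """
    received = set()
    transfers = []
    push_cursor = 0

    for page in access_order:
        if page not in received:
            # The guest touched a page that has not arrived yet:
            # fetch it from the source on demand.
            received.add(page)
            transfers.append((page, "fault"))

        # Background push: the source sends the next page that the
        # destination does not have yet.
        while push_cursor < total_pages and push_cursor in received:
            push_cursor += 1
        if push_cursor < total_pages:
            received.add(push_cursor)
            transfers.append((push_cursor, "push"))

    # Once the guest goes quiet, the source pushes whatever is left.
    for page in range(total_pages):
        if page not in received:
            received.add(page)
            transfers.append((page, "push"))

    return transfers
```

For example, `simulate_postcopy(6, [4, 0, 4, 5])` transfers all six pages exactly once, two of them through faults (pages 4 and 5) and the rest through the background push. Until the last transfer completes, the only full copy of the VM's memory is spread across both hosts, which is the reliability risk described above.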

If you are evaluating a live migration system for your infrastructure, keep the following rule of thumb in mind: pre-copy migration can consume more resources, because pages dirtied during the copy are sent more than once, which makes it less suitable for workloads where high performance is a requirement. On the other hand, if reliability is the primary focus, pre-copy is the better fit, since the source keeps a complete copy of the VM until the switchover.

If you are currently using or considering a cloud provider for your hosting needs, Site24x7 has a checklist to help you address other considerations beyond live migration capabilities.

Also read more about ways to handle passwords and firewall issues in a datacenter.
