Tag: vSphere replication

vSphere Replication 6.0 part 3: enhanced linked mode, backup and recovery scenarios

In the third part of my series I am going to talk about using vCenter enhanced linked mode together with vSphere Replication 6.0, and how the combination can be used to protect the vSphere Replication infrastructure itself.

In the newest version, vSphere Replication makes use of the Lookup Service provided by SSO on the new Platform Services Controller (PSC). With multiple vCenter instances sharing the same PSC or SSO domain, the so-called vCenter enhanced linked mode, we are not only able to manage all vCenter instances within a single vSphere Web Client. We can also use vSphere Replication to replicate, and therefore protect, VMs from one site to another, and simply migrate a VM back after the protected site has been restored, all within one integrated view.

The following diagram shows a logical view of a recommended vCenter enhanced linked mode setup.


This architecture has a lot of benefits. You have a vCenter in both sites, which is required when you are forced to recover your VMs (in a supported way). As soon as our vCenter instances are in enhanced linked mode, we are able to select any joined vCenter as a target for our vSphere Replication protection.

vSphere Replication linked mode target site

I very often see backup strategies that do not take into consideration that you usually MUST have a vCenter to recover a VM with your backup solution (unless an ‘emergency-direct-to-ESXi-recovery’ feature is included). For sure there are ways to register the replicated VM manually on the recovery site, but hey … (officially) we need to make sure that our recovery procedures are supported by the vendor.

In the current situation there is one thing I am uncomfortable with. The way currently recommended by VMware is to create a dedicated PSC cluster with a network load balancer in front of it. Since only NSX, F5 and NetScaler are supported, this adds extra cost for licensing, implementing and operating the solution. To be honest, I don’t expect to see such a setup very often in non-enterprise environments (at that level people are waiting for vVol replication support ;-)).

The only ‘easier’ suitable option would be to have a solution like the following in place:


According to the VMware blog post on the new PSC architecture options, the only recommended option is the one mentioned in the beginning. I am currently evaluating the alternative configuration and looking for discussions about its pros and cons. I will write about the findings in a different post.

Protect and Recover vSphere Replication Appliances and Server (Demo)

It’s worth remembering to protect the vSphere Replication appliances as well, so that in case of an outage you are able to bring back the replication infrastructure fairly painlessly. I am going to show you how to recover from a data loss on a vSphere Replication appliance.

In my lab environment I have two sites, and I am going to protect the vSphere Replication appliance from LE02 (managed by LE02VCE03) to LE01 (managed by vCSA01). The PSCs of both vCenter instances have joined the same SSO domain.

On my protected site I have 6 machines protected.

In the first scenario I have lost my vSphere Replication appliance data on the protected site, so I recover the appliance (vSRA) with the help of vSphere Replication, and once the original site has been restored, I fail back to it via cross vCenter vMotion.


One thing you need to take care of is that the vSphere Replication appliance and server are registered against a vCenter. If you restore this machine in the way I described above, or with any other backup solution that restores the VM, you need to make sure to re-register the VM with the vCenter; otherwise you will see an error within the vSphere Replication menu.



So what to do? Register the recovered VM as a vSphere Replication server and verify that all of your vSphere Replication jobs are still in place and running.


Voila… we recovered the vSphere Replication Appliance and can go on with our next test.

Recover protected virtual machines and fail back with cross vCenter vMotion (Demo)

My protected site has failed and the data has been lost. Luckily, I was able to recover all protected VMs on my recovery site. Depending on the network characteristics you might be forced to change the IPs of your VMs (PowerCLI can be your friend 😉).
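
To give an idea of what the re-addressing step involves, here is a minimal Python sketch (deliberately not PowerCLI, and it does not talk to vCenter at all). It only computes the new address of each VM when the recovery site uses a different subnet of the same size. The subnets, the VM names and the `remap_ip` helper are made up for this example; actually applying the addresses inside the guests is still a job for PowerCLI or the tool of your choice.

```python
import ipaddress


def remap_ip(ip: str, source_net: str, target_net: str) -> str:
    """Keep the host part of `ip` and move it into `target_net`.

    Both networks must have the same prefix length, otherwise the host
    offset cannot be carried over one-to-one.
    """
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    addr = ipaddress.ip_address(ip)

    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target networks differ in size")
    if addr not in src:
        raise ValueError(f"{ip} is not part of {source_net}")

    # Offset of the address inside the source network, e.g. .25 -> 25
    offset = int(addr) - int(src.network_address)
    return str(dst.network_address + offset)


# Hypothetical inventory: VM name -> IP address on the protected site
protected_vms = {
    "app01": "192.168.10.25",
    "db01": "192.168.10.26",
}

# Assumed subnets: 192.168.10.0/24 (protected) and 192.168.20.0/24 (recovery)
recovery_plan = {
    name: remap_ip(ip, "192.168.10.0/24", "192.168.20.0/24")
    for name, ip in protected_vms.items()
}

for name, new_ip in recovery_plan.items():
    print(f"{name}: {new_ip}")
```

Keeping the host part identical on both sites makes the mapping easy to reason about during a recovery, but that is a design choice of this sketch and not something vSphere Replication requires.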


After the rebuild of my primary site, I was able to fail back/migrate all VMs to the original site with cross vCenter vMotion.



Finalize the steps and voila. You have successfully failed back the VMs.

Make sure to configure a re-protection of the virtual machines.

Final words

The thing I am still missing is a smooth and simple way of setting up vCenter in enhanced linked mode. Once I lost my protected site, the Web Client became really slow and sluggish. Even after the site recovery I needed to reboot my primary vCenter to get it fully functional again. At this time I am still not sure what the best way is to establish vCenter in enhanced linked mode in a ‘stretched’ environment. Any input, discussions or opinions are very much appreciated.



vSphere Replication 6.0 Part 1: Architecture and features at a glance. vSphere Replication standalone

vSphere Replication is a really cool application that helps us replicate our virtual machines on a VM level without the need for dedicated, replicating storage. Besides the conventional replication methods, it can also be used to replicate to a public cloud provider such as VMware vCloud Air (call it DaaS or whatever 😉). I am still lacking deep technical knowledge of VMware’s hybrid cloud approach; it will be a separate blog post once I know more ;-).

In the following I want to give an architectural overview of the new version 6.0 of vSphere Replication. During some of my Site Recovery Manager classes I realized that people can get confused by some of the terminology and maximums mentioned in the KB articles, so I wanted to create something that clarifies all of those things.

This article is the first part of the vSphere replication 6.0 series (not sure if series is the right word if only 3 episodes are planned 😉 )

General features and what’s new in vSphere Replication 6.0 (NEW)

  • Replication on a virtual machine level
  • Replication functionality embedded in the vmkernel as vSphere replication agent
  • RPO (Recovery Point Objective = data that can be lost): 15min – 24hours
  • Quiescing of the guest OS to ensure file-system or application consistency (rather than mere crash consistency) for Windows (via VSS) and Linux (NEW)
  • Replication independent of the underlying storage technology (vSAN, NAS, SAN, local)
  • Support for enhanced vSphere functionalities (Storage DRS, svMotion, vSAN)
  • Initial replication can be optimized by manually transferring the virtual machine to the recovery location (replication seeds)
  • Transfer of the changed blocks (not CBT) and the initial synchronization can be compressed, which reduces the required network bandwidth (NEW)
  • Network traffic can be configured more granularly and isolated better (NEW) with dedicated VMkernel functions for NFC and vSphere Replication traffic (vSphere 6.0 required)
  • Up to 2000 Replications (NEW)
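
The RPO range from the list above is easy to check before you start configuring replications. The following is just a small Python sketch under that assumption; the VM names and RPO values are invented, and `check_rpo` is a helper written for this example, not part of any VMware tool.

```python
# RPO limits as listed above: 15 minutes up to 24 hours
MIN_RPO_MINUTES = 15
MAX_RPO_MINUTES = 24 * 60


def check_rpo(minutes: int) -> bool:
    """Return True if the RPO lies within the range listed above."""
    return MIN_RPO_MINUTES <= minutes <= MAX_RPO_MINUTES


# Hypothetical plan: VM name -> desired RPO in minutes
planned = {"web01": 15, "db01": 5, "archive01": 1440, "test01": 2880}

unsupported = sorted(name for name, rpo in planned.items() if not check_rpo(rpo))
print("RPO outside the supported range:", unsupported)
```

In this made-up plan, `db01` (5 minutes) and `test01` (48 hours) would need a different RPO or a different protection method.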

Components required for vSphere replication

For vSphere Replication we need, besides the mandatory components (vCenter, SSO, Inventory Service, Web Client and ESXi), to download the vSphere Replication appliance from VMware.com.

The general task of the vSphere Replication appliance is to get data (VM files and changes) from the vSphere Replication agent of a protected ESXi host and transfer it to the configured recovery ESXi host (via a mechanism called Network File Copy, NFC).

Now it might get a little bit confusing. The appliance we are downloading is in fact two different appliances, with two different OVF files pointing to the same disk file.

  1. 1x OVF (vSphere_Replication_OVF10.ovf) which is the virtual machine descriptor for the vSphere replication (manager) appliance – used for SRM OR single vSphere Replication – 2 or 4 vCPU / 4GB RAM
  2. 1x OVF (vSphere_Replication_AddOn_OVF10.ovf) which is the virtual machine descriptor for the vSphere replication server – can be used to balance the load and increase the maximum number of replicated VMs – 2 vCPU / 512 MB RAM

vSphere replication (manager) appliance

The vSphere Replication (manager) appliance is the ‘brain’ of the vSphere Replication process and is registered with the vCenter so that the vCenter Web Client is aware of the new functionality. It stores the configuration data in the embedded PostgreSQL database or in an externally added SQL database. The VMware documentation typically talks about the vSphere Replication appliance; to make sure it is not mixed up with the replication server, I put ‘(manager)’ into the term. The vSphere Replication (manager) appliance also includes the first vSphere Replication server. Only one vSphere Replication appliance can be registered with a vCenter, and it theoretically supports up to 2000 replications if we have 10 vSphere Replication servers in total. Please be aware of the corresponding VMware KB article if you want to replicate more than 500 VMs, since minor changes to the appliance are mandatory.

vSphere replication server

The vSphere Replication server in general is responsible for the replication job (gathering data from the source ESXi and transferring it to the target ESXi). It is included within the vSphere Replication appliance and can effectively handle 200 replication jobs. Even though I have read in some blogs that spreading the replication load over several vSphere Replication servers is only possible in conjunction with Site Recovery Manager, it works out of the box without Site Recovery Manager.
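
The numbers mentioned above (200 replication jobs per server, 10 servers in total, 2000 replications) translate into a quick sizing rule. Here is a rough Python sketch of it, assuming those figures and an even distribution of jobs; `additional_servers_needed` is a helper invented for this example, so please check the current VMware maximums before relying on it.

```python
import math

# Figures taken from the text above, not queried from any VMware API
REPLICATIONS_PER_SERVER = 200
MAX_SERVERS = 10  # the appliance (first server) plus add-on servers
MAX_REPLICATIONS = REPLICATIONS_PER_SERVER * MAX_SERVERS  # 2000


def additional_servers_needed(replicated_vms: int) -> int:
    """Number of add-on replication servers besides the appliance itself."""
    if replicated_vms < 0:
        raise ValueError("number of replicated VMs cannot be negative")
    if replicated_vms > MAX_REPLICATIONS:
        raise ValueError(
            f"more than {MAX_REPLICATIONS} replications are not covered here"
        )
    # The appliance always brings the first replication server with it
    total_servers = max(1, math.ceil(replicated_vms / REPLICATIONS_PER_SERVER))
    return total_servers - 1


for vms in (150, 201, 2000):
    print(f"{vms} replicated VMs -> {additional_servers_needed(vms)} add-on server(s)")
```

With these assumptions, 150 VMs fit on the appliance alone, 201 VMs need one add-on server and 2000 VMs need nine.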

Sample Architecture

The following picture should illustrate the components and traffic flow during the vSphere replication process.



The following diagrams show two sample architectures regarding the network.

In the first diagram the vSphere Replication network MUST be routed/switched on layer 3, while in the second example we are able to stay within a single network segment with our replication traffic (thanks to the new VMkernel functionalities for NFC/replication traffic in vSphere 6).

Option 1: External routing/switch mandatory (would be a good use case for the internal routing of NSX ;-)):


Option 2: No routing mandatory & switching occurs within the ESXi


Of course those are only two simple configuration samples, but I want to make you aware that the (virtual) network design has an impact on the replication performance in the end.

I will focus on the performance difference between those two options (and the usage of the compression mode within a LAN) in part 2. Stay tuned 😉

© 2020 v(e)Xpertise
