Manage your VMware Metro Storage Cluster with Tags

Running a VMware vSphere Metro Storage Cluster (VMware vMSC) makes it possible to move VMs between two geographical locations without downtime. In a Metro Storage Cluster a vSphere Cluster consists of ESXi hosts from both locations with shared storage that is stretched over these locations. Using a VMware Metro Storage Cluster makes your environment even more resilliant against failures and can help you get up and running again much faster should a disaster occur.

To get more information on VMware Metro Storage Cluster read the following articles:
Implementing vSphere Metro Storage Cluster (vMSC) using HP 3PAR Peer Persistence (2055904)
Implementing vSphere Metro Storage Cluster (vMSC) using EMC VPLEX (2007545)
VMware vSphere Metro Storage Cluster Case Study

(In this post I used an example and image from  the VMware vSphere Metro Storage Cluster Case Study)

While a Metro Storage Cluster offers resilliance in many failure scenario’s, a datacenter pration can still cause downtime to your VMs. This is because a stretched VMFS volume is in reality a virtual VMFS volume based on a read-write volume in one datacenter and a read-only volume in the other datacenter. When a datacenter partition ocurs the virtual VMFS volume will only be accessible by the ESXi host on the Read-Write site. When this partition happens, it could potentially bring down VMs that are running “cross datacenter”.

VMware vSphere Metro Storage Cluster

(Image taken from the White Paper “VMware vSphere Metro Storage Cluster Case Study”).

For example, looking at the image above: A VM is running on ESXi-01 and is using datastore Bluefin01. Usualy this is fine as the IO’s from the VM are handled by storage A and the storage Metro Clustering software (for example EMC VPLEX) will make sure the IO will also be written on storage B. When a partition of the datacenter will happen, the datastore Bluefin01 will become invisible on ESXi-01 / Storage A and only visible and write-able on storage B, which will cause the VM to lose connection to its disks.

To prevent these scenario’s from happening, usually DRS host-to-VM affinity rules are used. VMs that reside on the Bluefin datastores are connected to ESXi-03 and ESXi-04 through a host-affinity rule. In theory this should be enough but in practise we noticed that now and then admins would forget to add new VMs to those DRS rules or storage VMotion from a Bluefin VMFS to a Frimley VMFS without adjusting the DRS rules.

To keep an eye on changes like these and prevent unwanted down time, my colleague Peter Lammers and I used vCenter Tags and PowerShell to make sure a VM is always on the correct datastore and host. Each morning a PowerShell script is run that checks for a matching VM location tag, datastore location tag and host location tag.

VM location tags, datastore location tags and host location tags

For the location tags to work, fire-up the good old vSphere Web Client (hehehe). Go to the “Tags” section, click on “Categories” and create a new categorie named: “Location” with a cardinality of “One tag per object” and select “Associable Object types” for Datastore, Host and Virtual Machine. Next click on “Tags” and create two tags connected to the Location category, in this example “Bluefin” and “Frimley”.

Next in the vSphere Web Client, select the datastores from “Frimley” and add the “Frimley” tag. For “Bluefin” add the “Bluefin” tag. Do the same for the hosts and the VMs.

When you’re done, you can use this script to make sure that VM, datastore and host are on the same location.

$Report = @()
# Get all VMs
$VMList = Get-VM

ForEach( $vm in $VMList )
# 1 get VM Location tage
$VMLocation = Get-TagAssignment -Entity $vm -Category "Location"

# 2 What datastores is the VM running on?
$datastores = Get-Datastore -RelatedObject $vm

ForEach( $datastore in $datastores )
	# Get the location of the datastore
	$DSLocation = Get-Datastore -Name $datastore | Get-TagAssignment -Category "Location"

	# Get the location of the host
	$HostLocation = $vm.Host | Get-TagAssignment -Category "Location"

	$row = "" | Select Name, Location, HostName, HostLocation, DSName, DSLocation, ResultDS, ResultHost
	$row.Name = $vm.Name
	$row.Location = $VMLocation.Tag
	$row.HostName = $vm.Host
	$row.HostLocation = $HostLocation.Tag
	$row.DSName = $datastore.Name
	$row.DSLocation = $DSLocation

	if( $VMLocation.Tag -ne $DSLocation.Tag )
		Write-Host $vm.Name $VMLocation.Tag $datastore.Name $DSLocation.Tag "Incorrect storage assignment"
		$row.ResultDS = "Incorrect storage assignment"
		$row.ResultDS = "OK"

	if( $VMLocation.Tag -ne $HostLocation.Tag   )
		Write-Host $vm.Name $VMLocation.Tag $vm.Host $HostLocation.Tag "Incorrect host assignment"
		$row.ResultHost = "Incorrect host assignment"
		$row.ResultHost = "OK"
	$Report += $row