Stretched storage using SRM or vMSC?

I recently had a discussion regarding the setup of a secondary datacenter to provide business continuity to an existing infrastructure using stretched storage. This in itself is an interesting topic with many architectural outcomes which I will not get into in this post, but let's, for the sake of argument, say we decide to create active-active data centers with EMC VPLEX Metro.

The other side of the coin is how you manage these active-active data centers while balancing management overhead and resiliency.

vSphere Metro Storage Cluster (vMSC) and SRM 6.1, which supports stretched storage, are the two solutions I am going to review. There are already a whole bunch of articles out there on this, but many of them focus on pre-6.1 releases. These are just my notes and views; if you have a different viewpoint or something needs tweaking, please let me know, I cherish all feedback.

Why use stretched clusters:

  • Disaster avoidance or site maintenance without downtime is important.
    • Non-disruptive migration of workloads between the active-active datacenters.
  • When availability is one of your top priorities. Depending on the failure scenario, there are more outcomes in which your VMs are not impacted by a network, storage or host chassis failure at a site.
  • When your datacenters have network links which do not exceed 5 milliseconds round-trip response time.
    • Redundant network links are highly recommended.
  • When you require multi-site load balancing of your workloads.

 

VPLEX requirements:

  • The round-trip latency on both the IP network and the inter-cluster network between the two VPLEX clusters must not exceed 5 milliseconds (a quick latency sanity check is sketched after this list).
  • For management and vMotion traffic, the ESXi hosts in both data centers must have a private network on the same IP subnet and broadcast domain. Preferably management and vMotion traffic are on separate networks.
  • Stretched layer 2 network, meaning the networks the VMs reside on need to be available/accessible from both sites.
  • The data storage locations, including the boot device used by the virtual machines, must be active and accessible from ESXi hosts in both data centers.
  • vCenter Server must be able to connect to ESXi hosts in both data centers.
  • The VMware datastores for the virtual machines running in the ESXi cluster must be provisioned on VPLEX distributed virtual volumes.
  • The number of hosts in the HA cluster must not exceed 32 hosts on vSphere 5.x or 64 hosts on vSphere 6.0.
  • The configuration option auto-resume for VPLEX Cross-Connect consistency groups must be set to true.
  • Enabling FT on the virtual machines is supported except for Cluster Witness Servers.
  • This configuration is supported on both VS2 and VS6 hardware for VPLEX 6.0 and later releases.
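The 5 millisecond round-trip budget is worth validating before committing to a stretched design. Below is a minimal sketch that times TCP connections from one site to a reachable address at the other site; the hostname and port are placeholders, and this only gives a rough indication, so validate the actual inter-cluster links with your network team's tooling.

```python
#!/usr/bin/env python3
"""Rough inter-site RTT check (illustrative only; hostname/port are placeholders)."""
import socket
import time

REMOTE = "vplex-siteb.example.com"   # hypothetical address reachable at the other site
PORT = 443                           # any TCP port open across the inter-site link
SAMPLES = 20
LIMIT_MS = 5.0                       # VPLEX Metro round-trip limit

rtts = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    with socket.create_connection((REMOTE, PORT), timeout=2):
        pass                         # connect/close only; the handshake approximates one RTT
    rtts.append((time.perf_counter() - start) * 1000)
    time.sleep(0.5)

avg, worst = sum(rtts) / len(rtts), max(rtts)
print(f"avg {avg:.2f} ms, worst {worst:.2f} ms")
print("within 5 ms budget" if worst <= LIMIT_MS else "exceeds 5 ms budget")
```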

 

vMSC:

A vMSC infrastructure is a stretched cluster that enables continuous availability across sites, including support for:

  • vSphere vMotion
  • HA
  • DRS
  • FT over distance
  • Storage failure protection

vMSC requirements:

  • Single vCenter server
  • Cluster with DRS and HA enabled (a quick verification sketch follows this list)
  • Regular vCenter server requirements apply here
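If you want to confirm these requirements programmatically rather than clicking through the Web Client, a quick pyVmomi check looks roughly like this. This is a sketch only: the vCenter address, credentials and cluster name are placeholders, and certificate verification is disabled for brevity.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()            # lab only; verify certificates in production
si = SmartConnect(host="vcenter.example.com",     # hypothetical vCenter
                  user="administrator@vsphere.local", pwd="***", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Metro-Cluster")  # placeholder name
    cfg = cluster.configurationEx
    print("HA enabled: ", cfg.dasConfig.enabled)
    print("DRS enabled:", cfg.drsConfig.enabled)
    print("Hosts in cluster:", len(cluster.host))
finally:
    Disconnect(si)
```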

vMSC positives:

  • Continuous Availability
  • Fully Automatic Recovery
    • VMware HA (near zero RTO)
  • Automated Load Balancing
    • DRS and Instant vMotion
  • vMSC using VPLEX Metro
    • Certified Since vSphere 5.0
  • Behaves just like a single vSphere cluster

vMSC negatives:

  • Major architectural and operational considerations for HA and DRS configurations. This is especially true for highly customized environments with rapid configuration changes. Some configuration examples:
    • Admission control
    • Host affinity rules to make sure that VMs talk to local storage
    • Datastore heartbeats
    • Management address heartbeat and two additional isolation addresses
    • Change control: when workloads are migrated to a different site, the affinity rules need to be updated.
  • Double the amount of resources required. When you buy one, well, you need to buy a second! This is important since you have to keep enough resources available at each site to satisfy the resource requirements for HA failover, because all VMs are restarted within the cluster.
    • Recommended to set admission control to 50% (see the configuration sketch after this list)
  • No orchestration of powering on VMs after an HA restart.
    • HA will attempt to start virtual machines categorized as High, Medium or Low. The difficulty here is that if critical systems must start before other systems that depend on them, there is no means by which VMware HA can control this start order more effectively, or handle alternate workflows or run books for different failure scenarios.
  • Single vCenter server
    • Failure of the site where vCenter resides disrupts management of both sites. Look out for development on this shortcoming in vSphere 6.5.
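To give a feel for the configuration churn this implies, here is a hedged pyVmomi sketch that sets percentage-based admission control to 50% and adds a "should run" VM-to-host affinity rule for site A. The cluster, group and host names are placeholders; in a real vMSC you would maintain one such rule set per site and update the VM groups as workloads move.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Metro-Cluster")      # placeholder

    # Hosts and VMs that should stay in site A (placeholder selection logic).
    site_a_hosts = [h for h in cluster.host if h.name.startswith("esx-a")]
    site_a_vms = [v for v in cluster.resourcePool.vm if v.name.startswith("sitea-")]

    spec = vim.cluster.ConfigSpecEx()

    # 50% CPU/memory failover capacity, per the recommendation above.
    das = vim.cluster.DasConfigInfo()
    das.admissionControlEnabled = True
    das.admissionControlPolicy = vim.cluster.FailoverResourcesAdmissionControlPolicy(
        cpuFailoverResourcesPercent=50, memoryFailoverResourcesPercent=50)
    spec.dasConfig = das

    # "Should" (non-mandatory) rule keeping site A VMs on site A hosts.
    spec.groupSpec = [
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.HostGroup(name="SiteA-Hosts", host=site_a_hosts)),
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.VmGroup(name="SiteA-VMs", vm=site_a_vms)),
    ]
    spec.rulesSpec = [
        vim.cluster.RuleSpec(operation="add",
                             info=vim.cluster.VmHostRuleInfo(
                                 name="SiteA-VMs-on-SiteA-Hosts", enabled=True,
                                 mandatory=False,            # "should", not "must"
                                 vmGroupName="SiteA-VMs",
                                 affineHostGroupName="SiteA-Hosts")),
    ]

    task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
    print("Reconfigure task:", task)
finally:
    Disconnect(si)
```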

 

SRM 6.1 with stretched storage:

Site Recovery Manager 6.1 adds support for stretched storage solutions over metro distance from several major storage partners, and integration with cross-vCenter vMotion when using these solutions for replication. This allows companies to achieve application mobility without incurring downtime, while taking advantage of all the benefits that Site Recovery Manager delivers, including centralized recovery plans, non-disruptive testing and automated orchestration.

Adding stretched storage to a Site Recovery Manager deployment fundamentally reduces recovery times.

  • In the case of a disaster, recovery is much faster due to the nature of the stretched storage architecture that enables synchronous data writes and reads on both sites.
  • In the case of a planned migration, such as for disaster avoidance, data center consolidation or site maintenance, stretched storage enables zero-downtime application mobility. When using stretched storage, Site Recovery Manager can orchestrate cross-vCenter vMotion operations at scale, using recovery plans. This is what enables application mobility without incurring any downtime (an illustrative sketch of the underlying cross-vCenter relocate call follows this list).
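SRM drives the cross-vCenter vMotion for you as part of the recovery plan; for context, the underlying vSphere API primitive is a RelocateVM_Task with a ServiceLocator pointing at the remote vCenter. The pyVmomi sketch below only illustrates that primitive for a single VM, with placeholder names, thumbprint and credentials; it is not SRM's implementation.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
src = SmartConnect(host="vcenter-a.example.com", user="administrator@vsphere.local",
                   pwd="***", sslContext=ctx)            # source vCenter (placeholder)
dst = SmartConnect(host="vcenter-b.example.com", user="administrator@vsphere.local",
                   pwd="***", sslContext=ctx)            # destination vCenter (placeholder)
try:
    def find(si, vimtype, name):
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
        return next(o for o in view.view if o.name == name)

    vm = find(src, vim.VirtualMachine, "app01")               # VM to move (placeholder)
    target_host = find(dst, vim.HostSystem, "esx-b01.example.com")
    target_ds = find(dst, vim.Datastore, "vplex-dvv-01")      # the stretched datastore
    target_dc = find(dst, vim.Datacenter, "DC-B")

    spec = vim.vm.RelocateSpec()
    spec.host = target_host
    spec.pool = target_host.parent.resourcePool
    spec.datastore = target_ds
    spec.folder = target_dc.vmFolder
    spec.service = vim.ServiceLocator(
        url="https://vcenter-b.example.com",
        instanceUuid=dst.content.about.instanceUuid,
        sslThumbprint="AA:BB:...",                            # destination vCenter SHA-1 thumbprint
        credential=vim.ServiceLocatorNamePassword(
            username="administrator@vsphere.local", password="***"))

    task = vm.RelocateVM_Task(spec=spec)
    print("Relocate task started:", task)
finally:
    Disconnect(src)
    Disconnect(dst)
```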

SRM requirements:

  • Storage policy protection groups in enhanced linked mode
  • External PSCs, required for enhanced linked mode
  • Supported, compatible storage arrays and SRAs
  • A vCenter Server at each site
  • A Windows server at each site for the SRM application and the SRA

SRM positives:

  • Provides an orchestrated recovery solution for complex, reactive scenarios
    • For instance, a 3-tier application which has dependencies on specific services/servers powering on first.
  • Provides consistent, repeatable and testable RTOs
  • DR compliance shown through audit trails and repeatable processes.
  • Disaster Avoidance (Planned)
    • Manually initiated with SRM
    • Uses vMotion across vCenters for VMs
  • Disaster Recovery (Unplanned)
    • Manually initiated recovery plan orchestration
    • SRM Management Resiliency
  • VMware SRM 6.1 + VPLEX Metro 5.5
    • Stretched Storage with new VPLEX SRA
    • Separate failure domains, different vSphere Clusters

SRM negatives:

  • No Continuous Availability
  • No HA, DRS or FT across sites
  • No SRM “Test” Recovery plan due to stretched storage
    • You have to use a planned migration to “test”, but be aware that the VMs associated with the protection group will migrate live to the second site.

 

Questions to ask:

In the end, I really think it all comes down to a couple of questions you can ask to make the decision easier. SRM has narrowed the gap on some of the features that vMSC provides, so these questions are based on the remaining differences between the two solutions.

  1. Do you have complex tiered applications with dependencies on other applications, such as databases?
  2. Do you have a highly customized environment which incurs rapid changes?
  3. Do you require DR compliance with audit trails and repeatable processes?

Pick SRM!

  1. Do you require “hands-off”, fast, automated failover?
  2. Do you have non-complex applications without dependencies, where you do not care how they power on during failover?
  3. Do you want to have your workloads automatically balanced across different sites?

Pick vMSC!

 

Links:

http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vsphere-metro-storage-cluster-recommended-practices-white-paper.pdf

https://kb.vmware.com/kb/2007545

http://pubs.vmware.com/Release_Notes/en/srm/61/srm-releasenotes-6-1-0.html

EMC UnityVSA with SRM configuration

I am not going to get into the details of setting up SRM and EMC Unity as this is very well documented, so the information I provide here starts after SRM is installed and configured in vCenter and EMC Unity is installed and configured.

Previous blog post shows UnityVSA setup:
https://virtualrealization.blogspot.com/2016/05/how-to-emc-unityvsa-installation-and.html

EMC UnityVSA:

I already have my pools and LUNs configured on both Unity virtual storage appliances.
Firstly, we want to set up an interface for replication on both UnityVSAs.
In Unisphere select Data protection -> Replication
Select Interfaces
Click + sign

Select Ethernet Port and provide IP address information.

Click OK

Now let's configure the remote connections between the Unity arrays.
In Unisphere select Data protection -> Replication
Select Connections
Click + sign

Enter Replication connection information for your remote Unity VSA.
Asynchronous is the only supported method for the Unity VSA.

Click OK.
Select the remote system and click “Verify and Update” to make sure everything is working correctly.
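If you prefer to verify things outside Unisphere, the UnityVSA also exposes a REST API. The sketch below is a rough illustration only: it lists the storage pools and the configured remote systems over HTTPS using the X-EMC-REST-CLIENT header. The instance types and field names here are my assumptions from the Unisphere Management REST API reference, so check them against your code level; the address and credentials are placeholders.

```python
import requests
from requests.auth import HTTPBasicAuth

UNITY = "https://unity-a.example.com"        # hypothetical local UnityVSA
AUTH = HTTPBasicAuth("admin", "***")
HEADERS = {"X-EMC-REST-CLIENT": "true", "Accept": "application/json"}

session = requests.Session()
session.verify = False                       # lab/self-signed certificate only

# List storage pools (sanity check that the API responds).
pools = session.get(f"{UNITY}/api/types/pool/instances",
                    params={"fields": "name,sizeFree,sizeTotal"},
                    headers=HEADERS, auth=AUTH).json()
for entry in pools.get("entries", []):
    print("pool:", entry["content"])

# List remote systems configured for replication ("Connections" in Unisphere).
remotes = session.get(f"{UNITY}/api/types/remoteSystem/instances",
                      params={"fields": "name,managementAddress"},
                      headers=HEADERS, auth=AUTH).json()
for entry in remotes.get("entries", []):
    print("remote system:", entry["content"])
```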

Now let's go ahead and set up the consistency groups.
In Unisphere select Storage -> Block
Select Consistency Groups
Click + sign

Provide name

Configure your LUNs.  You have to create a minimum of 1 LUN, but you can later add your existing LUNs to this consistency group if required.

Click + to Configure access

Add initiators

Create Snapshot schedule

Specify replication mode and RPO

Specify destination

Click Finish

Now that we have replication configured we can go to vCenter and configure SRM.

SRM:
I already have my EMC Unity Block SRA installed on my SRM server. My mappings are also configured within each site, so we will skip this.

Open vCenter server and select Site recovery.
Select each site -> Monitor -> SRAs
Select Rescan all SRAs
Verify that the EMC Unity Block SRA is available.

Let's configure array-based replication.
Select Site recovery
Select Inventories -> Double-click Array Based Replication
Select “Add array manager”
On the popup wizard select “Add a pair of array managers”

Select location

Select Storage replication adapter, EMC Unity Block SRA

Configure Array manager

Configure array manager pair for secondary site.

Enable the pairs

Click Finish

Verify Status is OK

Click on each storage array and verify no errors and that you can see the local devices being replicated.

Now we can setup the protection group
Select Site recovery
Select Inventories -> Protection Groups
Select “Create Protection group”
Enter name

Select the protection group direction and type. For this we will select array-based replication with datastore groups.

Select datastore groups

This will provide information on the VMs which will be protected.

Click Finish
Verify protection status is OK

Finally, you can configure your recovery plan:
Select Site recovery
Select Inventories -> Recovery Plans
Select “Create Recovery plan”
Enter name


Select recovery site
Select protection group

Select network to be used for running tests of the plan.
Click Finish

You can now test your recovery plan.

EMC UnityVSA installation and configuration

I am currently testing SRM and installed Nimble as my virtual storage array with Nimble SRA 3.0, but I was having too many problems getting the array pairs working correctly, so I decided to set up the UnityVSA Community Edition, which is available for free with up to 4TB of data.  At the bottom of the page I provided some useful links:

Installation:
First off, let's review the requirements:

  • vCenter 5.5 update and later.
  • ESXi 5.x and later
  • 12GB Memory
  • 2 vCPU
Deploy the OVA downloaded “UnityVSA-4.0.0.7329527.ova”.
I am not going to provide the steps to deploy an OVA since this is pretty straightforward, and there is nothing really to configure except for the management and data ports and the management IP address.
After deployment is completed and VM powered on, open a browser and point to IP address specified during OVF deployment.
You will be presented with a login screen.
Type admin / Password123#
A wizard will appear for initial configuration.
Specify a password to replace the default.
You need to request the license file by providing the System UUID to the following website: 
Download the license file and install it.
Enter DNS information
Enter NTP information
Pools can be configured here, but you do require a manually created VM disk.  If you have not yet added the new disk to the VM within vCenter, then I would recommend just skipping this step for now.
Enter SMTP information
Create the iSCSI network interface.
This can also be performed later, but I created it on the data network ports I specified during the OVF deployment.
Create the NAS server; this can also be done at a later time.
Initial setup is now completed, yay!

Setup Pool:
The next step is to set up the storage to be used by the UnityVSA.
This is very easily accomplished through vCenter server. 
Edit settings on the UnityVSA VM
Select new device “New Hard disk” and click Add
Create the hard disk with the following recommended settings (a scripted alternative is sketched after the list):
  • SCSI controller 0 which is VMware Paravirtual
  • Thick provision eager zero
  • Max size of 4TB
  • Min size of 10GB
  • Connect up to a maximum of 12 disks for user data
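If you would rather script the disk add than click through the Web Client, a hedged pyVmomi sketch looks roughly like this. The vCenter address, credentials and VM name are placeholders, and it assumes the UnityVSA VM already has its ParaVirtual SCSI controller from the OVA deployment.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="***", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "UnityVSA-01")       # placeholder VM name

    # Find the existing ParaVirtual SCSI controller and a free unit number.
    ctrl = next(d for d in vm.config.hardware.device
                if isinstance(d, vim.vm.device.ParaVirtualSCSIController))
    used = {d.unitNumber for d in vm.config.hardware.device
            if getattr(d, "controllerKey", None) == ctrl.key}
    unit = next(u for u in range(16) if u != 7 and u not in used)    # 7 is reserved

    disk = vim.vm.device.VirtualDisk()
    disk.controllerKey = ctrl.key
    disk.unitNumber = unit
    disk.capacityInKB = 100 * 1024 * 1024                            # 100 GB user-data disk (10 GB-4 TB)
    disk.backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo(
        diskMode="persistent", thinProvisioned=False, eagerlyScrub=True)  # thick eager zeroed

    change = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.add,
        fileOperation=vim.vm.device.VirtualDeviceSpec.FileOperation.create,
        device=disk)
    task = vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[change]))
    print("Add-disk task:", task)
finally:
    Disconnect(si)
```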
Wait 60 seconds for UnityVSA to recognize the new storage.
Now we can setup our Pool
Select Storage -> Pools
Select + to create new pool
Select the newly created disk, and make sure to select the storage tier.  After selecting the storage tier, press Enter or click anywhere else on the screen to make the Next button available.
check box for storage tier
Select virtual disks
Create a capability profile.  This is a set of storage capabilities for a VVol datastore.  The capabilities are derived from the underlying pools, so best practice is to configure it during pool creation.  Capability profiles need to be created before you can create a VVol datastore.
Specify a tag.  Usage tags can be applied to capability profiles to designate them and their associated VVol datastores for a particular use.
Setup initiators:
Select Access -> VMware
You have some options here: either directly connect to and configure the ESXi hosts, or connect to the vCenter server and select whichever ESXi hosts within the environment you want to set up initiator access for.  I selected the latter since it is easier to connect to vCenter.
Enter vCenter information
Select the ESXi hosts.
Click Finish
To verify the added ESXi hosts you can select Access -> Initiators.  Here you can review your initiators, which will include both FC and iSCSI protocols if configured on the hosts.
Setup LUN:
Enter name
Select the pool previously created and the size of the LUN
Click + to Select initiators for access
Create a snapshot schedule (this is a very welcome addition since it was lacking in VNX)
Set up replication.  I will be adding another blog post shortly on setting up replication between two UnityVSAs using VMware SRM.
Finish

Hopefully I will get some time shortly to work on setting up SRM with Unity, so stay tuned.

Links:

SRM 5.8: Synchronize storage freezes at 90%

SRM 5.8 with storage array replication using VNX MirrorView.

Scenario:
Run a recovery and once completed run reprotect.
During the reprotect the storage synchronization gets stuck at 90%.

There was no real information from SRM on the status or errors, so I had to do some digging.

Solution:
On the storage array I reviewed the replicated LUN for the specific recovery plan and found that the secondary image was showing “waiting for administrator to start synchronization”.

By default SRM queries an ongoing synchronization every 30 seconds to report its status, so after manually starting the synchronization on the array and waiting for it to complete, the SRM status also updated and completed.

This setting is adjustable in the SRM advanced settings per site:  storage.querySyncStatusPollingInterval.

VNX MnR: Not showing SAN data after upgrade

I recently ran the upgrade of VNX Monitoring and Reporting from version 1.2 to 2.0.

The upgrade completed successfully, but after logging in and viewing the data we were unable to view the file storage information.

Resolution:

  • Verify the NaviSecCLI path is correct in the VNX MnR configuration.
  • If the above is correctly configured, try updating NaviSecCLI to the latest version (found on the EMC support site under Downloads if you search for NaviSecCLI).
  • Once installed, issue any command to accept the certificate from the VNX (this is only required if the NaviSecCLI version was updated).
    • Open a command prompt and issue any NaviSecCLI command; you will be prompted to accept the certificate.