I recently had a discussion regarding the setup of a secondary datacenter to provide business continuity to an existing infrastructure using stretched storage. This is in itself an interesting topic with many possible architectural outcomes, which I will not get into in this post, but let's for the sake of argument say we decide to create active-active data centers with EMC VPLEX Metro.
The other side of the coin is how you manage these active-active data centers while balancing management overhead and resiliency.
vSphere Metro Storage Cluster (vMSC) and SRM 6.1, which supports stretched storage, are the two solutions I am going to review. There are already a whole bunch of articles out there on this, but some of them focus mainly on pre-6.1 releases. These are just my notes and views, and if you have a different viewpoint or it needs some tweaking please let me know, I cherish all feedback.
Why use stretched clusters:
VPLEX requirements:
A vMSC infrastructure is a stretched cluster that enables continuous availability across sites, including support for:
vMSC requirements:
vMSC positives:
vMSC negatives:
Site Recovery Manager 6.1 adds support for stretched storage solutions over a metro distance from several major storage partners, and integration with Cross-vCenter vMotion when using these solutions as the replication technology. This allows companies to achieve application mobility without incurring downtime, while taking advantage of all the benefits that Site Recovery Manager delivers, including centralized recovery plans, non-disruptive testing and automated orchestration.
Adding stretched storage to a Site Recovery Manager deployment fundamentally reduces recovery times.
SRM requirements:
Storage policy protection groups in enhanced linked mode
Supported compatible storage arrays and SRAs
SRM positives:
SRM negatives:
Questions to ask:
In the end I really think it all comes down to a couple of questions you can ask to make the decision easier. SRM has narrowed the gap on some of the features that vMSC provides, so these questions are based on the remaining differences between the two solutions.
Pick SRM!
Pick vMSC!
Links:
https://kb.vmware.com/kb/2007545
http://pubs.vmware.com/Release_Notes/en/srm/61/srm-releasenotes-6-1-0.html
I upgraded the ESXi hosts from 6.0 GA to 6.0U2 and selected upgrade for the VSAN on-disk format version; however, this failed with the following error message:
“Failed to realign following Virtual SAN objects: XXXXX, due to object locked or lack of vmdk descriptor file, which requires manual fix”
I reviewed the VSAN health log files at the following location:
/storage/log/vmware/vsan-health/vmware-vsan-health-service.log
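To narrow things down, a quick grep of that log should surface the relevant entries; a minimal sketch, with the search term just a suggestion:
# search the health log for realign failures
grep -i realign /storage/log/vmware/vsan-health/vmware-vsan-health-service.log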
I was aware of this issue from previous blog posts on the same problem, and knew of KB 2144881, which makes the task of cleaning up objects with missing descriptor files much easier.
I ran the precheck action of the script:
python VsanRealign.py -l /tmp/vsanrealign.log precheck
However, I received another alert, and the python script did not behave as it should, indicating that a swap file either had multiple references or was not found.
I then used RVC to review the object info for the UUID in question.
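For reference, the lookup looks something like this from an RVC session (~cluster is the usual RVC path mark, and <object-uuid> is a placeholder for the UUID from the error):
vsan.object_info ~cluster <object-uuid>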
I used RVC again to try and purge any inaccessible swap files:
vsan.purge_inaccessible_vswp_objects ~cluster
No objects were found.
I then reviewed the vmx file for the problem VM in question and found a reference only to the original *.vswp file, and nothing with the additional extension of *.vswp.41796.
Every VM on VSAN has 3 swap files:
vmx-servername*.vswp
servername*.vswp
servername*.vswp.lck
I figured this servername*.vswp.41796 was just a leftover file that bears no relation to the VM, and that this was causing the on-disk upgrade to fail.
I proceeded to move the file to my /tmp directory. (Please be very careful with deleting or moving any files within a VM folder; this is done at your own risk, and if you are not sure I highly recommend you contact VMware support for assistance.)
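For illustration, the move amounted to something like this from the ESXi shell (the datastore and folder paths here are hypothetical; substitute your own):
# move the orphaned swap file out of the VM folder
cd /vmfs/volumes/vsanDatastore/servername
mv servername.vswp.41796 /tmp/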
I ran the python realign script again. This time I received a prompt to perform the autofix actions to remove this same object, for which I selected yes.
I ran the on-disk upgrade again and it succeeded.
Even though VMware provides a great python script that will in most instances help you clean up the VSAN disk groups, there are times when it will not work as planned, and then you just have to do a bit more troubleshooting and perhaps make a phone call to GSS.
I ran into an issue at a customer where an SSD that was to be used as the cache disk in a VSAN disk group was showing up as a regular HDD. However, when I reviewed the storage devices the disk was visible and marked as flash…weird. So what is going on here?
As I found out, this is due to a flash device being used with a controller that does not support JBOD.
To fix this I had to create a RAID 0 virtual disk for the SSD. If you have a Dell controller this means you have to set the mode to RAID, but make sure that all the regular HDDs to be used in the disk group are set to non-RAID! Once the host is back online you have to go and mark the SSD drive as flash. This is the little “F” icon in the disk devices view, or you can do it from the CLI as shown below.
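If you would rather do the tagging from the CLI, adding a SATP claim rule with the enable_ssd option achieves the same thing; a sketch, with naa.xxx standing in for the device ID of your RAID 0 virtual disk:
# tag the device as flash via a SATP claim rule, then reclaim it
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d naa.xxx -o enable_ssd
esxcli storage core claiming reclaim -d naa.xxx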
This environment was configured with all the necessary VSAN prerequisites for Dell; you can review these in the following blog post:
http://virtualrealization.blogspot.com/2016/07/vsan-and-dell-poweredge-servers.html
Steps to set up RAID 0 on the SSD through the Lifecycle Controller:
So I recently had to make some changes for a customer to set the PERC controller to HBA (non-RAID) mode, since it was previously configured in RAID mode with all disks in RAID 0 virtual disks. Each disk group consists of 5 disks: 1 x SSD and 4 x HDD.
I cannot overstate this: make sure all the firmware and drivers are up to date as specified in the HCL.
Here are some prerequisites for moving from RAID to HBA mode; I am not going to get into the details of performing these tasks.
I followed these steps:
I recently had to migrate a VSAN cluster from one vCenter Server to another. This sounds like a daunting task but ended up being very simple and straightforward, because VSAN's architecture does not rely on vCenter Server for its normal operation (nice one VMware!). As a bonus, the VMs do not need to be powered off and do not lose any connectivity (bonus!).
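You can see this independence for yourself: each host keeps its VSAN cluster membership locally, which you can confirm from the ESXi shell without vCenter being present at all:
# show the host's local view of the VSAN cluster
esxcli vsan cluster get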
I have been meaning to write up a VSAN upgrade on Dell R730xd servers with the PERC H730, which I recently completed at a customer. This is not going to be a lengthy discussion of the topic; I primarily want to provide some information on the tasks I had to perform for the upgrade to VSAN 6.2.
I ran into some serious trouble with a resync task that ran for over a week, due to a VSAN 6.0 issue (KB 2141386) that appears under heavy storage utilization. The only way to fix this was to put the host into maintenance mode with full data migration, then destroy and recreate the disk group, as sketched below.
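For reference, the destroy step can be done from the ESXi shell once the host is in maintenance mode; note that removing the cache SSD removes the entire disk group. A sketch, with naa.xxx standing in for your cache device:
# list the disk group members, then remove the group by its cache SSD
esxcli vsan storage list
esxcli vsan storage remove -s naa.xxx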
Also, ALWAYS check the VMware HCL to make sure your firmware is compatible. I can never say this enough, since it is super important.
This particular VSAN 6.0 environment was running with outdated firmware for both the backplane and the PERC H730. I also found that the controller was set to RAID for the disks instead of non-RAID (passthrough or HBA mode).
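A quick way to check what you are actually running before comparing against the HCL; the module name below assumes the H730 is using the lsi_mr3 driver:
# show which driver each storage adapter is using
esxcli storage core adapter list
# show the loaded driver's version
vmkload_mod -s lsi_mr3 | grep -i version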
Links:
VMware has a kick@ass KB on best practices for the Dell PERC H730 in VSAN implementations. Link provided below.
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2109665
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144614
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2141386
This is just my list of prerequisites for a VSAN upgrade. Most of these items are applicable to a new install as well, but the focus here is on an upgrade.
Please feel free to provide feedback; I would love to add your experiences to the list.
Prerequisites:
I am not going to get into the details of setting up SRM and EMC Unity, as this is very well documented; the information I provide here picks up after SRM is installed and configured on vCenter and EMC Unity is installed and configured.
A previous blog post shows the UnityVSA setup:
https://virtualrealization.blogspot.com/2016/05/how-to-emc-unityvsa-installation-and.html
EMC UnityVSA:
I already have my pools and LUNs configured on both Unity virtual storage appliances.
Firstly, we want to set up an interface for replication on both UnityVSAs.
In Unisphere select Data protection -> Replication
Select Interfaces
Click + sign
Select Ethernet Port and provide IP address information.
Click OK.
Now let's configure the remote connections between the Unity arrays.
In Unisphere select Data protection -> Replication
Select Connections
Click + sign
Enter Replication connection information for your remote Unity VSA.
Asynchronous is the only replication mode supported by the UnityVSA.
Click OK.
Select the remote system and click “Verify and Update” to make sure everything is working correctly.
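If you want to double-check from the command line as well, Unity exposes this through uemcli; the object path below is an assumption on my part based on the Unisphere CLI guide, so verify it there first (replace the IP and credentials with your own):
# show configured remote systems and their connection status
uemcli -d <unity-ip> -u admin -p <password> /remote/sys show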
Now let's go ahead and set up the consistency groups.
In Unisphere select Storage -> Block
Select Consistency Groups
Click + sign
Provide name
Configure your LUNs. You have to create a minimum of 1 LUN, but you can later add your existing LUNs to this consistency group if required.
Click + to Configure access
Add initiators
Create Snapshot schedule
Specify replication mode and RPO
Specify destination
Click Finish
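Again assuming the uemcli object paths from the Unisphere CLI guide, you can confirm the resulting replication session from the command line:
# show replication sessions and their state
uemcli -d <unity-ip> -u admin -p <password> /prot/rep/session show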
Now that we have replication configured we can go to vCenter and configure SRM.
SRM:
I already have the EMC Unity Block SRA installed on my SRM server. My mappings are also configured within each site, so we will skip this.
Open vCenter server and select Site recovery.
Select each site -> Monitor -> SRAs
Select Rescan all SRAs
Verify that EMC Unity Block SRA is available.
Let's configure Array Based Replication.
Select Site recovery
Select Inventories -> Double click Array Based Replication
Select “Add array manager”
On the popup wizard select “Add a pair of array managers”
Select location
Select Storage replication adapter, EMC Unity Block SRA
Configure Array manager
Configure array manager pair for secondary site.
Click on each storage array and verify no errors and that you can see the local devices being replicated.
Now we can set up the protection group.
Select Site recovery
Select Inventories -> Protection Groups
Select “Create Protection group”
Enter name
Select the protection group direction and type. For this we will select array based replication with datastore groups.
Select datastore groups
This will provide information on the VMs which will be protected.
Click Finish
Verify protection status is OK
Finally, you can configure your recovery plan:
Select Site recovery
Select Inventories -> Recovery Plans
Select “Create Recovery plan”
Enter name
Select recovery site
Select protection group
Select network to be used for running tests of the plan.
Click Finish
You can now test your recovery plan.