Blog post from VMware on vSAN performance with Optane.
I recently had a discussion regarding the setup of a secondary datacenter to provide business continuity to an existing infrastructure using stretched storage. This in itself is an interesting topic with many architectural outcomes which I will not get into in this post, but let's, for the sake of argument, say we decide to create active-active data centers with EMC VPLEX Metro.
The other side of the coin is how you manage these active-active data centers while balancing management overhead and resiliency.
vSphere Metro Storage Cluster (vMSC) and SRM 6.1, which supports stretched storage, are the two solutions I am going to review. There are already a whole bunch of articles out there on this, but some of them focus mainly on pre-6.1 releases. These are just my notes and views; if you have a different viewpoint or it needs some tweaking please let me know, I cherish all feedback.
Why use stretched clusters:
A vMSC infrastructure is a stretched cluster that enables continuous availability across sites, including support for:
Site Recovery Manager 6.1 adds support for stretched storage solutions over a metro distance from several major storage partners, and integration with cross-vCenter vMotion when using these solutions for replication. This allows companies to achieve application mobility without incurring downtime, while taking advantage of all the benefits that Site Recovery Manager delivers, including centralized recovery plans, non-disruptive testing and automated orchestration.
Adding stretched storage to a Site Recovery Manager deployment fundamentally reduces recovery times.
Storage policy protection groups in enhanced linked mode
Supported compatible storage arrays and SRAs
Questions to ask:
In the end I really think it all comes down to a couple of questions you can ask to make the decision easier. SRM has narrowed the gap on some of the features that vMSC provides, so these questions are based on the remaining differences between the two solutions.
I upgraded the ESXi hosts from 6.0 GA to 6.0 U2 and selected the upgrade for the VSAN on-disk format version; however, this failed with the following error message:
I reviewed the VSAN health log files at the following location:
I was aware of this issue from previous blog posts on the same problem and knew of KB 2144881, which made the task of cleaning objects with missing descriptor files much easier.
I ran the script: python VsanRealign.py -l /tmp/vsanrealign.log precheck.
However, I received another alert and the python script did not behave as it should, indicating that a swap file either had multiple references or was not found.
I used RVC again to try and purge any inaccessible swap files:
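For reference, the RVC command for this is vsan.purge_inaccessible_vswp_objects; the cluster path shown below is a placeholder and will differ in your environment:

    # from an RVC session against vCenter; adjust the path to your datacenter/cluster
    vsan.purge_inaccessible_vswp_objects /localhost/<datacenter>/computers/<cluster>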
No objects were found.
I then proceeded to review the .vmx file for the problem VM in question and found a reference only to the original *.vswp file, not to the one with the additional extension of *.vswp.41796.
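A quick way to check this from the ESXi shell is to grep the .vmx file for swap references; the paths below are placeholders for the actual VM folder and file name:

    # placeholder paths; substitute your VM folder and .vmx name
    grep -i vswp /vmfs/volumes/vsanDatastore/servername/servername.vmx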
Every VM on VSAN has 3 swap files:
I figured this servername*.vswp.41796 is just a leftover file that bears no reference to the VM, and that this is what was causing the on-disk upgrade to fail.
I proceeded to move the file to my /tmp directory. (Please be very careful with deleting/moving any files within a VM folder; this is done at your own risk, and if you are not sure I highly recommend you contact VMware support for assistance.)
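The move itself is a simple mv from the ESXi shell; the path and file name below are placeholders for the actual leftover file:

    # placeholder paths; double-check the file name before moving anything
    mv /vmfs/volumes/vsanDatastore/servername/servername.vswp.41796 /tmp/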
I ran the python realign script again. This time I received a prompt to perform the autofix actions to remove this same object in question, for which I selected yes.
Even though VMware provides a great python script that will in most instances help you clean up the VSAN disk groups, there are times when this will not work as planned, and then you just have to do a bit more troubleshooting and perhaps make a phone call to GSS.
I ran into an issue at a customer where the SSD to be used as the cache disk in the VSAN disk group was showing up as a regular HDD. However, when I reviewed the storage devices, the disk was visible and marked as flash…weird. So what is going on here?
As I found out, this is due to a flash device being used with a controller that does not support JBOD.
To fix this I had to create a RAID-0 virtual disk for the SSD. If you have a Dell controller this means you have to set the mode to RAID, but make sure that all the regular HDDs to be used in the disk group are set to non-RAID! Once the host is back online you have to go and mark the SSD drive as flash. This is the little “F” icon in the disk devices view.
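If you prefer the command line over the “F” icon, the same tagging can be done with an SATP claim rule; the device identifier naa.xxxx below is a placeholder for your actual SSD:

    # tag the device as SSD, then reclaim it so the change takes effect (placeholder device id)
    esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device=naa.xxxx --option=enable_ssd
    esxcli storage core claiming reclaim -d naa.xxxx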
This environment was configured with all the necessary VSAN prerequisites for Dell in place; you can review these in the following blog post:
Steps to set up RAID-0 on the SSD through the Lifecycle Controller:
So I recently had to make some changes for a customer to set the PERC controller to HBA (non-RAID) mode, since it was previously configured in RAID mode with all disks in RAID-0 virtual disks. Each disk group consists of 5 disks: 1 x SSD and 4 x HDD.
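A quick way to see how ESXi views each disk before and after the mode change is the vdq utility on the host, which reports each device's VSAN eligibility and whether it is seen as SSD:

    # run from the ESXi shell; lists every disk with its eligibility state and IsSSD flag
    vdq -q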
I cannot overstate this: make sure you have all the firmware and drivers up to date, as listed in the HCL.
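To confirm which driver the controller is actually using before checking it against the HCL, something along these lines works; lsi_mr3 is just an example module name for a PERC-family controller:

    # list storage adapters and the driver each one is using
    esxcli storage core adapter list
    # show the loaded module's version (lsi_mr3 is an example; use the driver shown above)
    vmkload_mod -s lsi_mr3 | grep -i version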
Here are some prerequisites for moving from RAID to HBA mode; I am not going to get into the details of performing these tasks.
I followed these steps:
I recently had to perform a VSAN cluster migration from one vCenter Server to another. This sounds like a daunting task but ended up being very simple and straightforward, thanks to VSAN's architecture not having a reliance on vCenter Server for its normal operation (nice one VMware!). As a bonus, the VMs do not need to be powered off or lose any connectivity.
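Once the hosts are attached to the new vCenter Server, a quick sanity check from any host confirms the cluster stayed intact; the cluster UUID and member count should be unchanged:

    # shows the VSAN cluster UUID, the local node state and the member list
    esxcli vsan cluster get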