VSAN – cache disk unavailable when creating disk group on Dell

I ran into an issue at a customer where the SSD intended as the cache disk for the VSAN disk group was showing up as a regular HDD.  However, when I reviewed the storage devices the disk was visible and marked as flash…weird.  So what is going on here?

As I found out, this is due to a flash device being used with a controller that does not support JBOD.

To fix this I had to create a RAID 0 virtual disk for the SSD.  If you have a Dell controller this means you have to set the controller mode to RAID, but make sure that all the regular HDDs to be used in the disk group are set to non-RAID!  Once the host is back online you have to mark the SSD drive as flash.  This is the little “F” icon in the disk devices view.

This environment was configured with all the necessary VSAN prerequisites for Dell in place; you can review these in the following blog post:
http://virtualrealization.blogspot.com/2016/07/vsan-and-dell-poweredge-servers.html

Steps to set up RAID 0 on the SSD through the Lifecycle Controller:

  1. Lifecycle Controller
  2. System Setup
  3. Advanced Hardware Configuration
  4. Device Settings
  5. Select the controller (PERC)
  6. Physical Disk Management
  7. Select the SSD
  8. From the drop-down select “Convert to RAID capable”
  9. Go back to the home screen
  10. Select Hardware Configuration
  11. Configuration Wizard
  12. Select RAID Configuration
  13. Select the controller
  14. Select the disk to convert from HBA to RAID (if required)
  15. Select RAID-0
  16. Select the physical disks (the SSD in this case)
  17. Set the disk attributes and name the virtual disk
  18. Finish
  19. Reboot
After the ESXi host is online again you have to mark the disk as flash. This is because RAID abstracts away most of the physical device characteristics, including the media type (a CLI alternative is sketched after the steps below).

  • Select ESXi host 
  • Manage -> Storage -> Storage adapters
  • Select vmhba0 from PERC controller
  • Select the SSD disk
  • Click on the “F” icon above.
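
As mentioned, the same tagging can also be done with esxcli instead of the web client. This is only a sketch for ESXi 6.x, and naa.xxxxxxxxxxxxxxxx below is a placeholder for your SSD’s device identifier, so look it up first:

  # Check how ESXi currently sees the device (look at the "Is SSD" field)
  esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx
  # Tag the device as flash via a SATP claim rule, then reclaim it so the rule takes effect
  esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device=naa.xxxxxxxxxxxxxxxx --option="enable_ssd"
  esxcli storage core claiming reclaim --device=naa.xxxxxxxxxxxxxxxx

The “F” icon in the web client remains the simpler route; the CLI is mainly useful if you have to tag the same SSD on several hosts.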

VSAN – Changing Dell Controller from RAID to HBA mode

So I recently had to make some changes for a customer to set the PERC controller to HBA (non-RAID) mode, since it was previously configured in RAID mode with all disks in RAID 0 virtual disks.  Each disk group consists of 5 disks: 1 x SSD and 4 x HDD.

I cannot overstate this: make sure all your firmware and drivers are up to date with the versions listed on the VMware HCL.

Here are some prerequisites for moving from RAID to HBA mode.  I am not going to go into detail on performing these tasks.

  • All virtual disks must be removed or deleted.
  • Hot spare disks must be removed or re-purposed.
  • All foreign configurations must be cleared or removed.
  • All physical disks in a failed state must be removed.
  • Any local security key associated with SEDs must be deleted.

I followed these steps:

  1. Put the host into maintenance mode with full data migration. You have to select full data migration since we will be deleting the disk group (see the CLI sketch after this list).
    1. This process can be monitored in RVC using the command vsan.resync_dashboard ~cluster
  2. Delete the VSAN disk group on the host that is in maintenance mode.
  3. Use the virtual console in iDRAC and select to boot into the Lifecycle Controller on the next boot.
  4. Reboot the host.
  5. From the Lifecycle Controller main menu
  6. System Setup
  7. Advanced hardware configuration
  8. Device Settings
  9. Select controller card
  10. Select Controller management
  11. Scroll down and select Advanced controller management
  12. Set Disk Cache for Non-RAID to Disable
  13. Set Non RAID Disk Mode to Enabled
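
For reference, the first two steps can also be driven from the CLI. A rough sketch for an ESXi 6.x host, where naa.xxxxxxxxxxxxxxxx is a placeholder for the cache SSD of the disk group:

  # List the vSAN-claimed disks and note which SSD fronts the disk group
  esxcli vsan storage list
  # Enter maintenance mode and evacuate all vSAN data off this host
  esxcli system maintenanceMode set -e true -m evacuateAllData
  # Monitor the evacuation/resync from RVC on the vCenter server
  vsan.resync_dashboard ~cluster
  # Remove the disk group by referencing its cache SSD (this destroys the disk group)
  esxcli vsan storage remove -s naa.xxxxxxxxxxxxxxxx

The evacuation can run for a long time, so kicking it off from the web client and watching the resync dashboard is usually more convenient.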

VSAN upgrade – Dell PowerEdge servers

I have been meaning to write up a VSAN upgrade on Dell R730xd’s with the PERC H730 which I recently completed at a customer.  This is not going to be a lengthy discussion on the topic; I primarily want to provide some information on the tasks I had to perform for the upgrade to VSAN 6.2.

  1. The VSAN on-disk metadata upgrade is equivalent to doing a SAN array firmware upgrade and therefore requires a good backup and recovery strategy to be in place before you proceed.
  2. Migrate VM’s off of host.
  3. Place host into maintenance mode.
    1. For VSAN’s sake, you want to use whatever the quickest method is to update the firmware. Normally that is the Dell FTP update, if the network allows it to be configured.
    2. When you put a host into maintenance mode and choose the option to “ensure accessibility”, it doesn’t migrate all the components off, only enough to keep the objects accessible, so some storage policies will be in violation.  A timer starts when you power it off, and if the host isn’t back in the VSAN cluster after 60 minutes, it begins to rebuild that host’s data elsewhere in the cluster.  If you know it will take longer than 60 minutes, or where possible, select full data migration.
    3. You can view the resync using the RVC command “vsan.resync_dashboard ~cluster”
  4. Change the advanced settings required for the PERC H730:
    1. https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144936
    2. esxcfg-advcfg -s 100000 /LSOM/diskIoTimeout
    3. esxcfg-advcfg -s 4 /LSOM/diskIoRetryFactor
  5. Upgrade the lsi_mr3 driver. VUM is easy!
  6. Log in to the iDRAC and perform the firmware upgrades:
  7. Upgrade the backplane expander (BP13G+EXP 0:1)
    1. Firmware version 1.09 -> 3.03
  8. Upgrade the PERC H730 firmware
    1. 25.3.0.0016 -> 25.4.0.0017
  9. Log in to the Lifecycle Controller and set/verify the BIOS configuration settings for the controller:
    1. https://elgwhoppo.com/2015/08/27/how-to-configure-perc-h730-raid-cards-for-vmware-vsan/
    2. Disk cache for non-RAID = disabled
    3. BIOS mode = pause on errors
    4. Controller mode = HBA (non-RAID)
  10. After all hosts are upgraded, verify VSAN cluster functionality and the remaining prerequisites:
    1. Verify there are no stranded objects on the VSAN datastore by running the Python script on each host.
    2. Verify persistent log storage for the VSAN trace files.
    3. Verify the PERC H730 advanced settings are still set (a quick check follows this list)!
  11. Place each host into maintenance mode again.
  12. Upgrade the ESXi host to 6.0 U2.
  13. Upgrade the on-disk format to V3.
    1. This task runs for a very long time and has a lot of sub-steps which take place in the background.  It also migrates the data off each disk group so the disk group can be recreated as V3.  This has no impact on the VMs.
    2. This process is repeated for all disk groups.
  14. Verify all disk groups are upgraded to V3.
  15. Completed.
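
A quick way to confirm the PERC H730 advanced settings from KB 2144936 are still in place on each host after the upgrade is the get form of the same esxcfg-advcfg commands:

  # Should report 100000
  esxcfg-advcfg -g /LSOM/diskIoTimeout
  # Should report 4
  esxcfg-advcfg -g /LSOM/diskIoRetryFactor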

I ran into some serious trouble and had a resync task that ran for over a week due to a VSAN 6.0 issue (KB 2141386) which appears under heavy storage utilization.  The only way to fix this was to put the host into maintenance mode with full data migration, then destroy and recreate the disk group.

Also ALWAYS check the VMware HCL to make sure your firmware is compatible. I can never say this enough since it is super important.

This particular VSAN 6.0 cluster was running with outdated firmware for both the backplane and the PERC H730. I also found that the controller was set to RAID mode for the disks instead of non-RAID (pass-through or HBA mode).

Links:

VMware has a kick@ass KB on best practices for the Dell PERC H730 in VSAN implementations. The links are provided below.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2109665

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144614

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2144936


https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2141386

Dell rack servers – Upgrade firmware using Dell repository manager

I recently had to perform some firmware upgrades for a customer on their Dell R710 and R730xd servers.  As you all know there are multiple ways to successfully upgrade the firmware, and I want to touch on the upgrade through a bootable virtual CD, optical CD or USB, since this was the only method available to me at the time.

Firmware upgrade methods available:
  • Upgrade using a bootable Linux ISO
  • Upgrade using the Server Update Utility (SUU) ISO/folder with the Dell Lifecycle Controller
  • Upgrade using the Dell FTP site with the Lifecycle Controller
All of these methods are well documented on the Dell website as well as on various blogs, but I want to go through my steps using the bootable Linux ISO and primarily how to create that ISO.
My preferred method is using the Dell FTP site with the Lifecycle Controller, but this is not always possible, especially if you have trunked ports and have to specify a VLAN (in later iDRAC firmware it is now possible to specify a VLAN!).
The reason the FTP site method is better, in my opinion, is that the firmware comparison is done up front and only the firmware for components that are outdated is downloaded. This speeds up the firmware upgrade process considerably compared to the bootable ISO, which compares every single component. (This only applies when you use the bundle, which I do in most instances, since who wants to manually go through every single component and check which is required for your server? :)
Steps:
Firstly we need to create an ISO, and this is done using Dell Repository Manager.
Open Dell Repository Manager (the Data Center version; the Business Client version is for desktops).

View the job queue for the plugin installs, select each one to perform the confirmation needed and click Accept! (Only required after the first install.)
Create new repository

Select name

Select Dell online catalog

Select brand – PowerEdge rack
Select Linux

Select your PowerEdge server

Click Next
Click Finish
Click Close

Check the box for the bundle and select Create Deployment Tools. (The other option is to select the Components tab and select each individual component manually, but this requires that you know exactly which components are installed on all your Dell servers.)

Option 1:
Select Create Server Update Utility (SUU) -> SUU to ISO. Remember that to use this ISO you have to mount it through the iDRAC virtual console as a virtual CD, boot into the Lifecycle Controller and select the firmware update, specifying the CD (an alternative way of mounting the ISO with racadm is sketched at the end of this post).
Option 2:
Select Create bootable ISO

Make your necessary selection and click Next

Select folders

Click Next
Click Ok
Review job queue for progress on file being created
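
As a side note: instead of attaching the resulting ISO by hand through the virtual console, iDRAC can also mount it straight from a network share with racadm. This is only a sketch from memory with a hypothetical CIFS share path and credentials, so verify the remoteimage syntax against your iDRAC version:

  # Attach the ISO from a network share as virtual media (share path and credentials are placeholders)
  racadm remoteimage -c -l //192.168.1.10/share/bootable_firmware.iso -u shareuser -p sharepassword
  # Check the attach status, and detach when finished
  racadm remoteimage -s
  racadm remoteimage -d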

"Boot from SAN" step by step with Windows 2012 R2 and Cisco UCS using Brocade and EMC VNX.

“Boot from SAN” step by step with Windows 2012 R2 and Cisco UCS using Brocade and EMC VNX.
UCS:
  • Create service profile for windows server.
  • Create a “Boot from SAN” boot policy
    • Set up the SAN primary and secondary targets.
    • The WWNs required are those of your VNX array ports.

Brocade:
  • Log in and create an initial zone for one of the ports (a CLI sketch follows below this list).
  • Create a new alias
    • Type in the alias name and select the WWN from the blade
  • Create the zone
    • Select the blade alias and the VNX storage processor
  • Add the zone to the zone configuration
  • Activate
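
The same zoning can also be scripted from the Brocade Fabric OS CLI instead of the web tools. A minimal sketch with hypothetical alias, zone and config names and placeholder WWPNs (use cfgcreate instead of cfgadd if the zone configuration does not exist yet):

  alicreate "UCS_Blade1_HBA0", "20:00:00:25:b5:0a:00:01"
  alicreate "VNX_SPA0", "50:06:01:60:00:00:00:01"
  zonecreate "UCS_Blade1_VNX_SPA0", "UCS_Blade1_HBA0; VNX_SPA0"
  cfgadd "Fabric_A_cfg", "UCS_Blade1_VNX_SPA0"
  cfgsave
  cfgenable "Fabric_A_cfg"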

VNX:
  • Start EMC Unisphere
  • Create Initiator
    • The WWN/IQN can be obtained from UCS Manager
      • Open the properties window for the server’s service profile
      • Select the Storage tab
      • At the top, copy the World Wide Node Name (this is the first part of the WWN/IQN)
      • Under vHBAs, copy the WWPN
    • Now combine the WWNN and WWPN with “:” as the separator and paste the result into WWN/IQN (a worked example follows this list)
  • Select “New Host” radio button
    • Type in the server name and IP address
  • Create LUNs
  • Create Storage Group per server
    • Associate the hosts
    • Associate the LUNs
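
To make that WWN/IQN format concrete: with a hypothetical WWNN of 20:00:00:25:b5:00:00:0f and a WWPN of 20:00:00:25:b5:0a:00:0f, the value registered in Unisphere would be 20:00:00:25:b5:00:00:0f:20:00:00:25:b5:0a:00:0f.
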
Server:
  • Start the server and boot from Windows disk
  • Load the UCS disk drivers when prompted during installation, at the point where you select the installation disk.
  • Verify disks show up and select where it will be installed.
  • After installation is completed and Windows is up and running, go ahead and install EMC Powerpath!

Cisco UCS – step by step configuration

As mentioned, I don’t go into too much detail in my posts since I think there are a lot of other great blogs and vendor documentation out there.  Here is my short bullet-point task list.  If I am missing anything please let me know.
Set equipment policies:
  • Equipment tab -> Equipment -> Policies tab
    • Chassis/fex discovery policy
      • Action = 4 ports
      • Link grouping preference = port channel
    • Power policy = grid
Configure server/uplink port:
  • Equipment tab -> select FI-A/B -> expand -> fixed modules
    • Configure the appropriate unconfigured ports as “Server” (connections between IOM and Fabric Interconnect) and “Uplink” (connection to network)
Configure FC storage ports
  • Equipment tab
  • All the way at the bottom, select FI A
    • On the right-hand side select Configure Unified Ports
    • Move the slider to cover the Fibre Channel storage ports you need
    • This will reboot FI A; after the reboot, log back in.
  • Select FI B
    • Perform same steps
Create Port Channels:
  • Setup ports as uplink ports
  • LAN TAB
    • Fabric – Port Channels
    • Set up the port channel and use the same port channel ID on both FIs
  • SAN TAB (we will not be creating a port channel here due to the connection to the Brocade)
    • SAN Cloud -> Fabric A -> under the General tab select “Create Port Channel”
Create VSANs (Brocade):
  • SAN > SAN Cloud > Fabric A > VSANs (both Fabric A & B)
    • Create VSAN
    • Select the specific Fabric A or B (not common)!
  • Assign VSAN to FC uplinks
    • Equipment tab -> Fabric interconnect A & B -> Fixed modules -> FC ports
      • Select FC port
      • Under general tab click drop down for VSAN.
        • Select VSAN which is associated to FI.
Upgrade firmware
  • The firmware comes as an “*.A.bin” file and a “*.B.bin” file. The “*.B.bin” file contains all of the firmware for the B-Series blades. The “*.A.bin” file contains all the firmware for the Fabric Interconnects, I/O Modules and UCS Manager.
  • Equipment tab -> Equipment -> Firmware Management
  • Download firmware
  • Update firmware (view progress under Firmware Auto Install -> General tab, or press Apply to view the status in the same window)
    • Adapters
    • CIMC
    • IOMs
  • Activate firmware in the following order:  Choose “Ignore Compatibility Check” anywhere applicable.
    • Adapters
    • UCS manager
    • I/O Modules
    •  Choose “skip validation” anywhere applicable. Make sure to uncheck “Set startup version only”, since this is an initial setup and we aren’t concerned with rebooting running hosts
  • Activate subordinate FI and then primary FI
Create sub-organization
This is optional, to create organization-specific servers/pools/policies, for instance for ESXi, SQL, Windows, etc.
  • Right-click the root directory and select Create Organization
  • Specify name
Create KVM IP pool:
  • LAN tab -> Pools -> root -> IP Pools -> IP Pool ext-mgmt
  • Create block of IPv4 Addresses
    • Specify IP range
Create Server pool
  • Servers tab -> Pools -> Sub-Organizations -> <sub-organization> -> Server Pools
  • Create server pool
Create UUID suffix pool
  • Servers tab -> Pools -> Sub-Organizations -> <sub-organization> -> UUID Suffix Pools
  • Create UUID suffix pool
  • Create Suffixes
Create MAC pool
  • For each sub-organization create two MAC pools: one for FI-A and one for FI-B
  • LAN TAB: -> Pools -> Root -> MAC Pools
    • Create new pool for A
    • Create block
    • Create new pool for B
Create HBA pools:
  • SAN TAB:
    • Pools -> root -> sub-organization -> WWNN Pools
      • Create WWNN pool
        • Add double the amount since each server will have two HBAs
    • For WWPN we will again create separate pools for FI-A and FI-B:
      • Pools -> root -> sub-organization -> WWPN Pools
        • Create a WWPN pool for FI-A
        • Create a WWPN pool for FI-B

Create VLANS:
  • LAN TAB -> Lan -> Lan Cloud -> VLANs
    • Create new VLANs
    • Provide name and ID
Create vNICs templates:
  • LAN tab -> LAN -> Policies -> root -> Sub-organizations -> vNIC Templates
    • Create a vNIC template (this is again done for each of FI-A and FI-B)
Create vHBA templates:

  • SAN TAB -> Policies -> root -> sub-organizations -> vHBA templates
    • Create vHBA Templates for both FI-A & FI-B

Create a Service Profile Template:
Servers tab -> Servers -> Service Profiles -> root -> Sub-organizations
  • Create service profile template
Under networking select expert.
Click Add
Select Use vNIC template
Storage, select Local storage SD card policy
Select WWNN assignment policy
Select Expert connectivity
Create vHBA
Next is zoning; leave the defaults since we are using Brocades
Set PCI ORDER
Select vMedia to use, default
Server boot order: select the boot policy created for the SD card

Select the maintenance policy created earlier
Select server assignment
Operational Policies
Set Bios policy
Deploy service profile from template
Servers tab -> Service profile template -> root -> sub-organizations
Right-click the service profile template and select “Create Service Profiles From Template”
Select naming prefix
Configure call home:
Admin tab -> Communication Management -> call home
Turn on and fill in the requirements
In profiles tab add “callhome@cisco.com” to Profile CiscoTAC-1
Under call home policies add the following to provide a good baseline
Configure NTP:
Admin tab -> Time zone management
Add NTP servers
Backup configuration:
Admin tab -> ALL -> Backup configuration on right hand side pane
Select “Create Backup Operation”
Admin state = enabled

Select location = local file system



For setting policies I created another blog post:

Cisco UCS – configure policies

Set Policies:
Network control policies (enable CDP)
  • LAN tab -> Policies -> root -> sub-organizations -> network control policies
    • Create network control policy
    • Enable CDP
Bios Policy:
  • Servers tab -> Policies -> root -> sub-organizations -> Bios Policies
  • Create bios policies
    • Mostly setting cpu settings
Host Firmware:
  • Servers tab -> Policies -> root -> sub-organizations -> Host Firmware Packages
  • Create host firmware package
    • Set it to simple and set only the blade package version.
Local disk configuration:
  • Servers tab -> Policies -> root -> sub-organizations -> Local disk config policies
    • Create local disk configuration policy
      • This is to setup SD card
        • Disable protect configuration
        • Enable flexflash state
        • Enable flexflash RAID reporting state
      • For SAN boot
        • Set mode to No local storage
Maintenance policy:
  • Servers tab -> Policies -> root -> sub-organizations -> maintenance policies
    • Create Maintenance Policy
Boot policy:

  • Servers tab -> Policies -> root -> sub-organizations -> boot policies
    • Create boot policy
      • Expand local devices and add to boot order
        • Start with the local CD, then the remote virtual drive, then the SD card