VMware Cloud on AWS: New features

VMware delivered its first ever Cloud Briefing today. It was kind of weird that the event was pre-recorded and not live since their marketing emails made it sound like it would be live.  Nothing to fuss about but just worth mentioning.

This is already the 4th update to VMC on AWS within the last 6 months, so what is new?

  • New UK-based service offering in London.
  • Introduction to the upcoming Germany-based service offering in Frankfurt.
  • Preview support for stretched clusters across AWS Availability Zones within AWS regions.
    • Synchronous replication
    • Zero RPO high availability across AZs
    • During a failure, vSphere HA will restart the VMs in the surviving AZ
  • Multi-Cluster support
    • Add additional clusters to your SDDC
    • Maximum of 10/SDDC
  • vSAN Compression and deduplication
  • RESTful API now available.
  • vROPS support for VMC on AWS
    • Predictive DRS not yet supported
    • Service delivery management not yet supported
  • VMware Cloud services updates:
    • HCX Enterprise
      • On-premises offering that supports any-to-any vSphere migration.
    • HCX for Clouds
      • Migrates workloads between different clouds.
    • Cost Insight
      • Now supports VMC on AWS
      • Calculates VMC on AWS capacity required to migrate from on-premises to VMC
      • Integration with Network insight to calculate networking costs during migration as well.
    • Wavefront
      • Now supports VMC on AWS
      • Monitoring and analytics to optimize cloud-native and enterprise applications
      • Wavefront for vRealize Operations
        • Complete visibility into your enterprise applications as well as custom apps.
    • Log intelligence
      • Think of it as vRealize Log insight delivered as a SaaS offering
      • IT troubleshooting and log management across multiple clouds including VMC on AWS

 

Links:

https://docs.vmware.com/en/VMware-Cloud-on-AWS/0/rn/vmc-on-aws-relnotes.html

https://cloud.vmware.com/cloudbriefing

 

VMware Skyline:  Making support easy!

You barely take a sip of your first cup of coffee after just arriving at the office.  Your mail client pops up with an email notification, your phone rings, you look up and see people walking towards your desk.  Augh, you know this is not going to be a good day.

We have all been in this situation, and it’s never fun but also inevitable. We work in IT. Things don’t always go as planned and stuff breaks.  How successful you are at fixing the issues depends on how you deal with the situation. Some of us open our favorite tools (vROPS, vRLI, PowerCLI) and go straight into troubleshooting mode, while others pick up the phone and call support. Regardless of how you troubleshoot, it can be time-consuming and at times frustrating.

Well, VMware understands our pain and is trying to limit the frequency of these bad days by introducing automated proactive actions to prevent problems and resource contention before they even occur. Minority Report sci-fi stuff, right? Well not really, but the new features mentioned below are a game changer.

  • Proactive HA provides an additional layer of availability that acts before vSphere HA has to step in: when your hardware starts showing signs of failure, DRS will kick in, evacuate your running VMs onto another host, and place the faulty host into quarantine.
  • Predictive DRS provides placement and balance decisions using DRS and vROPS by predicting future demand and determining when and where hot spots will occur.

With all these new proactive features, we can certainly sleep better at night, but what if you still run into a problem after exhausting your troubleshooting skills and need to call support?

VMware did not stop with just proactive features in their product but took this one step further by introducing proactive support, as well as making support a more user-friendly experience with VMware Skyline.

What exactly is VMware Skyline?

VMware describes it the following way: “It is a proactive support technology which uses automation to securely collect data and perform environment-specific analytics on configuration, feature, and performance data.  The resulting information may radically improve visibility into a customer’s environment, enabling richer, more informed interactions between customers and VMware without extensive time and investment by support administrators.”

In layman’s terms, what does this mean to you when you open a case with GSS?

  • Well first off, you might not even need to place a call. Skyline might proactively identify an abnormality from your system logs and contact YOU with the recommended fix and avert an outage. (Go ahead and read that again…customer support will contact YOU. Like I said, game changer!)
  • You do not have to upload any log files; GSS already has this information available. The manual process of collecting and uploading logs, and then waiting for VMware to analyze them, wastes precious time.
  • No need to provide information on your VMware products and versions, or on changes that were introduced to the environment before the problem occurred; GSS already has this information available. Again, this can otherwise be a very time-consuming exercise.

You see a pattern here?

  • Reduced time-to-resolution for service requests
  • Identification of potential product bugs and guided resolution before problems occur

How does VMware Skyline work?

The solution includes customer site and VMware cloud components.

Customer site:

A VMware Skyline Collector standalone appliance is required, which will automatically and securely collect usage data and then listen for changes, events, and patterns which are then streamed back to VMware.

VMware cloud component:

The platform receives the data from the onsite collector and performs analyses to determine the following:

  • Alignment with VMware best practices
  • Comparing deployed products with licensing history
  • Determining if a problem is a known issue that can be addressed with an automatic remediation solution.

VMware Skyline advisor engine then delivers rich insight and recommendations, and uses rules to perform the following checks:

  • Checking of data such as configurations and patch levels
  • Cross-product, cross-cloud checks

VMware Technical Support Engineers:

They take the analyzed data and proactively review your environment, perform research analysis for service requests, and provide prescriptive recommendations back to you to improve your environment’s health and performance.

VMware will also provide regularly scheduled reports, which include information on observations and insight from ongoing analytics as well as provide prioritized recommendations.

Which products does VMware Skyline support?

VMware Skyline is initially available for Premier Support customers in North America, with future plans to support all core products. For now it supports the following products:

  • VMware vSphere 5.5 and up
  • VMware NSX 6.1 and up
  • vSAN 6.6
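
As a rough illustration of the list above, the check below compares a product version against those minimums. It is only a sketch: the `SKYLINE_SUPPORT` table and the `is_supported` helper are hypothetical names of mine and not part of any VMware tool, and vSAN is treated as "exactly 6.6" because the list names only that release.

```python
# Hypothetical helper that encodes the support list from this post, nothing more.
# Each entry: product -> ((major, minor), and_up)
SKYLINE_SUPPORT = {
    "vsphere": ((5, 5), True),   # vSphere 5.5 and up
    "nsx": ((6, 1), True),       # NSX 6.1 and up
    "vsan": ((6, 6), False),     # only vSAN 6.6 is listed
}


def parse_version(text):
    """Turn '6.5' or '6.5.0' into a (major, minor) tuple."""
    parts = text.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def is_supported(product, version):
    """Return True if the product/version pair appears in the list above."""
    entry = SKYLINE_SUPPORT.get(product.lower())
    if entry is None:
        return False  # product not in the initial support list
    minimum, and_up = entry
    found = parse_version(version)
    return found >= minimum if and_up else found == minimum


if __name__ == "__main__":
    for product, version in [("vSphere", "6.5"), ("NSX", "6.0"), ("vSAN", "6.6")]:
        print(product, version, is_supported(product, version))
```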

Spectre and Meltdown – How to check your VMware environment for vulnerabilities

Updates added to the blog

Unless you have been on a very long vacation without internet access (The BEST type of vacation!) you should know of the Spectre (CVE-2017-5753 & CVE-2017-5715) and Meltdown (CVE-2017-5754) vulnerabilities that affect nearly every computer chip manufactured in the last 20 years.

I am not going to provide any specific details on these vulnerabilities since there is more than enough material already available, which you can read here:

I do however want to provide more detailed information related to VMware specifically, as well as different ways to verify what in your VMware environment is vulnerable to these exploits:

VMware responded to the Speculative Execution security issues with KB 52245, which I highly recommend you read and subscribe to.

Intel and AMD released microcode updates that provide hardware support for branch target injection mitigation, for which VMware released KB 52085. The KB provides instructions on how to enable Hypervisor-Assisted Guest Mitigation, which is required in order to use the new hardware feature within VMs.  The KB also provides manual verification instructions for the following:

  • ESXi – Verify that the microcode included in ESXi patch has been applied
  • VM – Verify that the VM is seeing the new microcode (VM needs to be on HWv9 or newer)

ALERT: VMware also released KB 52345, which rolls back the recently issued security patch recommendation (ESXi650-201801402-BG, ESXi600-201801402-BG, and ESXi550-201801401-BG). The rollback is due to customers reporting unexpected reboots after applying Intel’s initial microcode patch on Intel Haswell and Broadwell processors.

UPDATE 01.24.18: VMware updated KB 52345 to include an updated list of all Intel CPUs affected by Intel Sightings

  • VMware provides some manual workarounds for these specific processors that have already been patched.
  • For ESXi hosts that have not yet applied one of the patches, VMware recommends not doing so at this time and using the patches listed in VMSA-2018-0002 instead.

That is a lot of information to take in, and the rollbacks just add complexity for IT teams who are trying to secure their customers’ data.

UPDATE 02.15.18: VMware security advisory for VMware Virtual appliance mitigation available here

UPDATE 03.20.18: VMware provided an update to KB 52085 for patching the vSphere vCenter server to latest 6.5U1g, 6.0U3e, 5.5U3h and Hypervisor to ESXi 6.5: ESXi650-201803401-BG* and ESXi650-201803402-BG**, ESXi 6.0: ESXi600-201803401-BG* and ESXi600-201803402-BG**, ESXi 5.5: ESXi550-201803401-BG* and ESXi550-201803402-BG**.

* = Framework to allow guest OSes to utilize the new speculative-execution control mechanisms

** = Applies the microcode updates

 

Option 1: (The best of the best)

However, to make things a bit easier, William Lam comes to the rescue with an excellent script that automates the verification for both ESXi hosts and virtual machines, and also reports ESXi microcode versions.

The PowerCLI script is called VerifyESXiMicrocodePatch.ps1 and performs the following validations:

  • Verify that VMs are running at least HWv9
  • Verify that the VM completed a power cycle to see the new CPU features
  • Verify ESXi microcode has been applied
  • Verify that one of the three new CPU features is exposed to the ESXi host.
  • Verify if the CPU is affected by Intel Sighting
  • Show the current microcode version for each ESXi host (requires SSH to be enabled)
  • UPDATE 01.24.18: Script was updated to validate the affected CPUs

All the details regarding the script can be read on virtuallyGhetto here.
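
To make the VM-side checks above a bit more concrete, here is a minimal Python sketch of the logic only. It is not William's script and it calls no VMware API: the `VmInfo` record, its fields, and the idea of comparing the last power-on time with the time the host was patched are my own assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

MIN_HW_VERSION = 9  # per the KB: the VM needs to be on HWv9 or newer


@dataclass
class VmInfo:
    """Hypothetical record; in practice these values would come from your inventory."""
    name: str
    hw_version: int
    last_power_on: datetime


def vm_findings(vm, host_patched_at):
    """Return a list of reasons why a VM cannot yet see the new CPU features."""
    findings = []
    if vm.hw_version < MIN_HW_VERSION:
        findings.append("hardware version below 9")
    if vm.last_power_on < host_patched_at:
        # Assumption: a power cycle after the host patch is what exposes the features.
        findings.append("needs a power cycle")
    return findings


if __name__ == "__main__":
    patched = datetime(2018, 1, 20, 12, 0)
    vm = VmInfo("app01", hw_version=8, last_power_on=datetime(2018, 1, 10, 9, 0))
    print(vm.name, vm_findings(vm, patched))
```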

Option 2:  (Acceptable, but limited)

Although not nearly as thorough as William’s script, the latest Runecast Analyzer 1.6.7 can detect ESXi hosts that are not protected and patched against these vulnerabilities.

Runecast Analyzer enables you to scan and detect the CPU chip vulnerabilities on your VMware infrastructure.  It detects which ESXi hosts are not protected and advises on how to patch them against these security vulnerabilities.  This solution is continuously updated as new guidance from VMware is released.

Currently only supports VMSA-2018-0002.2

Update 01.26.18: New 1.6.8 release updated to support VMSA-2018-0002.3

Update 01.21.18: Option 3: (Coolest of them all)

This option not only shows what in your VMware environment is impacted, it also assesses the performance impact of both Spectre and Meltdown patches using vRealize Operations Manager (vROPS). We already know the patches will impact the speculative execution capabilities of the processor, which will lead to higher CPU utilization in your cluster because each OS will process work more slowly.

The questions that come up then before patching:

  • Will I have enough resources available in my cluster to support these patches?
  • How will my ESXi host resources be impacted?
  • Should I roll out the patches in stages or all at once?

These are hard questions that are not easy to answer, or are they?

If you are using vROPS 6.6.x Advanced or Enterprise, which allows the creation of custom dashboards, then you can download and install the Spectre Meltdown Specific Dashboard kit created by Sunny Dua.  The download is available here.

The Dashboard kit consists of 3 Dashboards:

  • Performance monitoring dashboard
    • Tracks resource utilization of your environment and provides valuable information on the impact of patching as it relates to your clusters, ESXi hosts, and VMs.
  • VM Patching dashboard
    • Provides views showing which VMs are running idle and can potentially be patched first, since they should not have a large overall impact on performance.  Evaluate the resource utilization with the performance monitoring dashboard after the idle VMs are upgraded, and then decide whether to continue patching or first add resources to the cluster.
  • vSphere Patching dashboard
    • Shows the ESXi hosts that have been patched and are also affected by Intel Sighting.
    • Shows the ESXi hosts that still need to be patched.
    • Shows the virtual machines that require a hardware version upgrade, since the recommended version is 9 or higher.
    • I recommend keeping an eye on VMware’s advisory site since this problem is still ongoing and the build numbers will change as new patches are released.  This will then require that you make a manual update to the filters of this dashboard.

The Performance monitoring dashboard can also be replicated using just the default dashboards available in vROPS Standard, which means you can download the evaluation version and have peace of mind that you can track the performance impact while going through these tough times.
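
The staged approach described for the VM Patching dashboard (idle VMs first, then re-check utilization) can be sketched as a simple sort. This is an illustration only: the 5% idle threshold and the name-to-CPU% input are assumptions of mine and not values taken from the dashboard kit.

```python
def patch_waves(vm_cpu_percent, idle_threshold=5.0):
    """Split VMs into an 'idle first' wave and a 'busy later' wave.

    vm_cpu_percent: dict of VM name -> average CPU utilization in percent.
    idle_threshold: assumed cut-off below which a VM counts as idle.
    """
    # Least busy VMs first, so early waves have the smallest performance impact.
    ordered = sorted(vm_cpu_percent.items(), key=lambda item: item[1])
    idle = [name for name, cpu in ordered if cpu <= idle_threshold]
    busy = [name for name, cpu in ordered if cpu > idle_threshold]
    return idle, busy


if __name__ == "__main__":
    usage = {"db01": 62.0, "test03": 1.5, "web02": 18.0, "jump01": 0.4}
    first_wave, later_wave = patch_waves(usage)
    print("Patch first:", first_wave)
    print("Patch after re-checking cluster utilization:", later_wave)
```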

Links:

https://communities.vmware.com/message/2738226#2738226

https://blogs.vmware.com/management/2018/01/assess-performance-impact-spectre-meltdown-patches-using-vrealize-operations-manager.html

https://kb.vmware.com/s/article/2143832?r=2&Quarterback.validateRoute=1&KM_Utility.getArticleData=1&KM_Utility.getGUser=1&KM_Utility.getArticleLanguage=1&KM_Utility.getArticle=1

Free VMware related eBooks available!

Lately, we have seen a lot of excellent guys releasing their valuable books for free! The time and effort they put into these must be crazy, and we thank them for it.  I highly recommend you pick them up and give them a read!

vSphere HA Deep Dive 6.x:

This book was made available some time ago but still relevant, and all the thanks must go to Duncan Epping. You can download it here

vSAN Essentials:

Lots of thanks and praises must go to Cormac Hogan and Duncan Epping for making their Essential Virtual SAN (vSAN) book available for free. You can download it here

vSphere 6.5 Host Resources Deep Dive:

Lots of thanks must go to Frank Denneman and Niels Hagoort for writing this book, which was a big hit at VMworld 2017. The book has been made available for free by Rubrik and VMUG, and you can download it here

NSX:

If you ever want to learn more about use cases for NSX then these PDF documents are a must read.

VMware NSX Micro-segmentation Day 1 by Wade Holmes

VMware NSX Micro-segmentation Day 2 by Geoff Wilmington

Operationalizing VMware NSX by Kevin Lees

Automating NSX for vSphere with PowerNSX by Anthony Burke

VCP6.5-DCV:

Vladan Seget wrote a study guide for VCP6.5-DCV which Nakivo sponsored and made available here

VMware on AWS:

AWS made available an eBook for VMware Cloud on AWS which covers the challenges of hybrid cloud adoption, how VMC on AWS is optimized for common use cases, and also the available AWS services. Available to download here

vSphere Upgrade

Emad Younis has made available his Upgrading to VMware vSphere 6.5 book which you can download here.

VMware Cloud on AWS

At the recent AWS re:invent conference in Las Vegas, VMware announced a bunch of new features for VMC on AWS.   Here is a complete list of the new features with some already being available and others in preview, which means they might not apply to all customers or AWS regions:

  • VMware site recovery service
    • This new service provides a great use case for an end-to-end DR solution, which simplifies DR operations, shortens time-to-protect, and removes the requirement for a second data center.
    • Built on top of VMware Site Recovery Manager with vSphere Replication, the service protects workloads between an on-premises data center and VMC on AWS, as well as between different instances of VMC on AWS.
  • 1 and 3-year subscriptions
    • Provide significant cost savings
    • Additional cost savings available based on the number of eligible on-premises product licenses you own (Hybrid Loyalty Program)
  • VMware Hybrid Cloud Extension (Preview)
    • In short, this is an add-on SaaS offering which will provide large-scale migration between your on-premise environment running vSphere 5.0+ and VMC on AWS.
    • Provides built-in high-performance layer 2 extensions so you will be able to keep the same networks, IP addresses, and routing policies in place during migration.
  • Layer 2 VPN (Preview)
    • Extending Layer 2 networks from an on-premise data center to VMC on AWS, which allows you to migrate VMs to your cloud SDDC without having to change their IP addresses.
    • Only one Layer 2 VPN is supported per cloud SDDC
    • Hybrid Linked Mode is optional for configuring Layer 2 VPN but is required for cold migration and migration with vMotion between your on-premises data center and cloud SDDC.
    • In your on-premise data center, you can use NSX or configure a Standalone Edge.
  • L3 VPN Generic Download (Preview)
    • This will reduce configuration issues with IPsec deployments since you will be able to download a generic configuration after the VPN is configured, which provides all the parameters that need to be set on the remote VPN device.
  • AWS Direct Connect
    • High speed, reliable and private network connectivity to an on-premise data center.
    • Single or Multiple DX links option available.
    • While connecting to an SDDC, customers can choose a Private VIF, Public VIF, or both VIF options.
      • Private VIF – carry vMotion and ESXi management traffic
      • Public VIF – optional, and used to establish VPN tunnel and carry management appliance and workload VM traffic.
  • VMC on AWS scale
    • Supports 32-host clusters
    • Multiple SDDCs per organization
    • 10 clusters per SDDC (future)
  • VMC on AWS regions
    • New region US East (N. Virginia)
  • Support for Wavefront by VMware
    • Collects data from application metrics collectors (Java, Ruby, Python, and more) as well as service metrics collectors (MySQL, Pivotal, Kubernetes, AWS, and more)
    • Allows customers to visualize and troubleshoot applications as well as receive alerts.
  • Scripting support
    • API
      • You can use NSX APIs and PowerCLI for the Day 0 and Day 2 automation activities.
    • PowerCLI (preview)
      • A new module, available since PowerCLI 6.5.4, enables the automation and scripting of VMware Cloud on AWS features
    • AWS SDKs (preview)
      • Existing vSphere Automation SDKs for both Python and Java will include functionality for access to VMC on AWS
    • Datacenter CLI (preview)
      • VMC on AWS API is available via a simple multi-platform command line interface
  • AWS service access enhancements
    • You have the choice to access S3 buckets over the internet or over the AWS Connected VPC.
  • VM template support in MVP
    • You can now add VM templates to Content Library, as well as delete and deploy them
  • Live migrations!! (This is a biggy, but still in preview)
    • Live vSphere vMotion will be supported over L2VPN and Direct Connect
    • Need to set up Hybrid Linked Mode (HLM) and L2VPN for this to work
  • vCenter HLM
    • Hybrid Linked Mode sounds similar to Enhanced Linked Mode but differs in requirements, how they work, and what problem each solves. William wrote a great blog describing the differences.
    • Supports vCenter Servers with an embedded or external PSC.
    • Supports a single on-premise vCenter Server or multiple on-premise vCenter Servers that are joined to the same SSO domain.
  • External Storage access from inside Guest VM
    • NFS, SMB and iSCSI storage protocols are validated over the following networks:
      • AWS Elastic Network Interface (ENI)
      • VMware Cloud on AWS Compute Gateway (CGW)
      • VMware Cloud on AWS Internet Gateway (IGW)

vRealize Network Insight (vRNI) 3.5 upgrade process that works

It has been almost a year since my initial post on upgrading vRealize Network Insight to 3.2, and since then a couple of new versions have been released. So time for me to upgrade!

The bad part I found out about the upgrade process is that you have to upgrade through each version consecutively, meaning I had to upgrade my 3.2 environment to 3.3 (which I am currently on), the next step is to upgrade to 3.4, and after that comes another upgrade to 3.5.  You cannot skip version upgrades at all!  Anyway, I am not going to comment on that, but you can see how this can be very time consuming, so plan accordingly.
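
Because no release can be skipped, the number of upgrade runs is easy to work out ahead of time. The snippet below is a small sketch of that planning step; the `RELEASES` list contains only the versions named in this post, and `upgrade_path` is a hypothetical helper, not a vRNI command.

```python
# Versions named in this post, in release order. Extend the list for newer releases.
RELEASES = ["3.2", "3.3", "3.4", "3.5"]


def upgrade_path(current, target):
    """Return the consecutive (from, to) hops needed to reach the target version."""
    start = RELEASES.index(current)
    end = RELEASES.index(target)
    if end < start:
        raise ValueError("target version is older than the current version")
    # Pair each release with its direct successor: no hop may skip a version.
    return list(zip(RELEASES[start:end], RELEASES[start + 1:end + 1]))


if __name__ == "__main__":
    for source, destination in upgrade_path("3.3", "3.5"):
        print(f"upgrade {source} -> {destination}")
```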

As before, there are still two upgrade options available: online, which is handled through the GUI, and offline, which is handled through the CLI.  I am currently running 3.3, and in the GUI under Settings -> Install it states that my application is up to date. I did verify through the CLI command “show-connectivity-status” that my upgrade connectivity status shows passed, and I have no proxy.  Not wanting to open a support ticket, I am going the manual route. And oh yes, if you have a cluster configured, your only option is the manual upgrade as well. Sorry!

Firstly we must upgrade the vRNI Platform appliances before we upgrade the Proxy appliances. If you have a cluster then you have to start with platform1.  VMware’s KB on the manual upgrade process to 3.5 does not do a good job of showing the exact steps to upgrade, so here are mine:

  1. Download the upgrade bundle
  2. Extract the bundle from the downloaded zip file.
  3. Snapshot your vRNI Platform and proxy appliances before upgrade. (always have a backup)
  4. Login to Platform CLI with consoleuser
  5. Change password for the support user
    1. (cli) modify-password support
    2. Enter the password
  6. Use a popular tool like WinSCP to copy the bundle file to all vRNI appliances
    1. Login with the support user
    2. Copy the bundle file to the directory /home/support/
  7. Now we need to use the package-installer command to copy the bundle file to the vRNI VM
    1. package-installer copy --host localhost --user support --path /home/support/VMWare-vRNI.3.4.0.1495004044.upgrade.bundle
    2. Enter password
    3. Verify the copy completed
    4. Remember, one version at a time, so first we have to upgrade from 3.3 to 3.4.
  8. Stop the service
    1. (cli) services stop
  9. Run the upgrade
    1. (cli) package-installer upgrade (3.3 -> 3.4)
    2. (cli) package-installer upgrade --name VMware-vRealize-Network-Insight.3.5.0.1502978926.upgrade.bundle (3.4 -> 3.5)
    3. This could take up to 30 minutes to complete so go have a cup of tea or coffee.
    4. Verify upgrade completed by checking the version
      • (cli) show-version
    5. If the service does not start..
      • (cli) services start
  10. Run steps 4 through 9 on all appliances
    1. vRNI Platform appliances first
    2. vRNI Proxy appliances last
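
The ordering rule in step 10 (platform appliances first, starting with platform1, then proxies) can be expressed as a sort key. This is a sketch under stated assumptions: the appliance names and the `(name, role)` tuples are made up for illustration, and I assume the first platform node is literally named "platform1".

```python
def upgrade_order(appliances):
    """Order appliances for upgrade: platform1, other platforms, then proxies.

    appliances: list of (name, role) tuples, role being 'platform' or 'proxy'.
    """
    def sort_key(item):
        name, role = item
        # False sorts before True, so platforms come first and platform1 leads them.
        return (role != "platform", name != "platform1", name)

    return [name for name, _ in sorted(appliances, key=sort_key)]


if __name__ == "__main__":
    inventory = [
        ("proxy1", "proxy"),
        ("platform2", "platform"),
        ("platform1", "platform"),
    ]
    print(upgrade_order(inventory))
```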

After the upgrade from 3.3 to 3.4, the upgrade KB states that a reboot is not necessary, but I found that if you do not perform a reboot you are not able to run the upgrade command “package-installer upgrade --name VMware-vRealize-Network-Insight.3.5.0.1502978926.upgrade.bundle”.  The --name parameter is not recognized.

Note:

Do not copy/paste the commands from the KB, since the filename there differs from the one you actually download (watch the “VMWare” capitalization), and this will make your upgrade fail.
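
A quick way to catch this kind of mismatch before running the upgrade is to compare the name you are about to type with the files you actually uploaded. The helper below is a hypothetical sketch and not part of the vRNI CLI; the second filename in the demo is an invented variant used only to show the mismatch.

```python
def find_bundle(name_in_command, uploaded_files):
    """Check a bundle name against uploaded files.

    Returns (exact_match, near_miss): near_miss is a file whose name differs
    from the requested one only by letter case, or None.
    """
    if name_in_command in uploaded_files:
        return True, None
    for candidate in uploaded_files:
        if candidate.lower() == name_in_command.lower():
            return False, candidate
    return False, None


if __name__ == "__main__":
    uploaded = ["VMWare-vRNI.3.4.0.1495004044.upgrade.bundle"]
    typed = "VMware-vRNI.3.4.0.1495004044.upgrade.bundle"  # invented variant
    exact, near = find_bundle(typed, uploaded)
    if not exact and near:
        print(f"Name mismatch: did you mean {near}?")
```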

Upgrade your vCenter 6.5 HA environment

As discussed in my previous post here, you can easily set up vCenter HA to provide a decent (not the best, and hopefully this will improve) RTO of around 4 minutes for a failover of your vCenter server.

So now that you have vCenter HA configured, how do you patch or upgrade this environment?  In a single vCenter Server instance the upgrade is really straightforward:

  • Login to the VAMI
  • Before starting the upgrade, take a File based Backup of the vCSA, using the backup utility in the VAMI.
  • Select Update
  • Select Check Updates -> Check Repository (if you have internet access)
    • Otherwise download the software and mount the ISO to the CD/DVD drive.
  • View Available Updates
  • Select Install Updates -> Install All Updates

 

For vCenter HA the steps are a bit more complicated, since we use the software-packages utility from the appliance shell, which requires us to SSH into the three nodes in sequence and use a manual failover so that we always patch a non-active node.  Below are my quick step-by-step notes for the upgrade process:

  • There are multiple ways to use the software-packages utility:
    • Use the default repository
    • Use a local repository by attaching the ISO to the vCenter Server appliance.
    • Use a remote repository by using a custom repository URL that points to a local webserver in your environment to retrieve the file.
  • In my case I downloaded the vCenter Server Appliance patch ISO from “https://my.vmware.com/group/vmware/patch” and attached the file to the CD/DVD drive of the vCSA.
  • Before I start the upgrade I perform the following tasks:
    • Put the vCenter HA cluster in maintenance mode
    • Make sure SSH is enabled in the vCSA VAMI
    • For each node, I open the console and mount the patch ISO to the CD/DVD drive.
    • Take a File based Backup of the vCSA, using the backup utility in the VAMI.
  • Run the upgrade first on the Witness Node
    • First off SSH into the active vCSA node
      • From the active vCSA node, SSH into the witness node and make sure you are in the appliance shell by running:
        • “appliancesh”
        • Run: “software-packages install --iso”
        • Press Enter way too many times
          • Type yes and press Enter
        • When upgrade is completed, reboot the server
          • “shutdown reboot -r patching”
        • Exit the SSH session
  • Now run the upgrade on the Passive Node
    • First off SSH into the active vCSA node
      • From the active vCSA node, SSH into the passive node and make sure you are in the appliance shell by running:
        • “appliancesh”
        • Run: “software-packages install --iso”
        • Press Enter way too many times
          • Type yes and press Enter
        • When upgrade is completed, reboot the server
          • “shutdown reboot -r patching”
        • Exit the SSH session
  • Log out of the active vCSA node
  • Wait for the nodes to show a status of Up after the reboot.
  • Initiate a vCenter HA failover manually
    • Login to Web client
    • Select the vCenter server -> Configure -> Settings -> vCenter HA
    • Click Initiate failover
    • Click Yes to start the failover
      • Make sure to select the option to perform synchronization first
  • Now lastly run the upgrade on the new Passive Node
    • First off SSH into the new active vCSA node
      • From the active vCSA node, SSH into the passive node and make sure you are in the appliance shell by running:
        • “appliancesh”
        • Run: “software-packages install --iso”
        • Press Enter way too many times
          • Type yes and press Enter
        • When upgrade is completed, reboot the server
          • “shutdown reboot -r patching”
        • Exit the SSH session
  • Optional: Perform another vCenter HA failover manually back to the original vCSA node.
  • Exit vCenter HA maintenance mode
    • Login to Web client
    • Select the vCenter server -> Configure -> Settings -> vCenter HA
    • Click Edit
    • Select “Enable vCenter HA”
    • Click OK

Patching of all the vCenter HA nodes should now be completed.
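
The order of operations above is the important part: witness first, then the passive node, then a failover so the original active node can be patched while it is passive. The sketch below only generates that sequence as a checklist. It does not talk to vCenter, and the node names are placeholders I made up.

```python
def patch_plan(active, passive, witness):
    """Return the vCenter HA patch sequence as an ordered checklist."""
    plan = ["put the vCenter HA cluster in maintenance mode"]
    plan.append(f"patch {witness} (witness)")
    plan.append(f"patch {passive} (passive)")
    plan.append(f"manual failover: {passive} becomes active")
    # After the failover the roles are swapped, so the old active node is passive.
    active, passive = passive, active
    plan.append(f"patch {passive} (new passive)")
    plan.append("enable vCenter HA again")
    return plan


if __name__ == "__main__":
    for number, step in enumerate(patch_plan("vcsa-a", "vcsa-b", "vcsa-w"), start=1):
        print(f"{number}. {step}")
```

Every node is patched exactly once, and never while it holds the active role.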

Food for thought: This process is quite involved, and I wonder, depending on company policy, would it not be easier to just remove vCenter HA, upgrade the single vCSA node through the VAMI, and then configure vCenter HA again? It takes way less time and is a much simpler process. Let me know what you think.