Step by Step upgrade of distributed vRealize Automation 7.2 with external vRO to 7.4

As with most of my other blog posts, I am just providing a step-by-step guide for quick reference.  Please refer to the documentation here for detailed information, and please read the known issues section of the vRealize Automation 7.4 Release Notes, which is updated regularly and helps you be better prepared for the upgrade.

My environment consists of a distributed vRealize Automation deployment running version 7.2 with an external clustered vRealize Orchestrator, which I am upgrading in place (not migrating) to 7.4 Build 8182598.  The process is similar if you are running vRA 7.1 or later.  If you have an older version, refer to VMware’s documentation here.

The in-place upgrade process for the distributed vRA environment happens in 3 stages in the following order:

  1. vRealize Automation appliances
  2. IaaS Web server
  3. vRealize Orchestrator

Pre-requisites before we start:

  1. Make sure all VMware products are compatible with vRA’s current and new release by consulting the Product Interoperability Matrix.
  2. Verify that the servers have enough storage space
    • At least 5 GB on IaaS, SQL and Model Manager
    • At least 5 GB on the root partition of vRA appliance
    • 5 GB on the /storage/db partition for the master vRA appliance
    • 5 GB on the root partition for each replica virtual appliance
  3. Verify that MSDTC is enabled on all vRA and associated SQL servers.
    • Check that the service “Distributed Transaction Coordinator” is running.
  4. The primary IaaS Website node (where the Model Manager data is installed) must have Java SE Runtime Environment 8, 64-bit, update 161 or later installed. Also verify that the JAVA_HOME environment variable is set correctly after the upgrade.
  5. If using embedded Postgres DB in a distributed vRA environment
    • On master vRA node, navigate to /var/vmware/vpostgres/current/pgdata/
    • Close any opened files in the pgdata directory and remove any files with a .swp suffix
    • Verify the correct ownership of all files in this directory: postgres:users
  6. In a distributed vRA environment, change Postgres synchronous replication to async.
    • Click vRA Settings > Database.
    • Click Async Mode and wait until the action completes.
    • Verify that all nodes in the Sync State column display Async status
    • I only have a master and one replica, so I am already in async mode, but just FYI
  7. In vRA tenants verify the following
    • Make sure that no custom properties have spaces in the names.
    • All saved and in-progress requests have finished successfully
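
The free-space check in item 2 and the .swp cleanup check in item 5 could be scripted roughly as follows. This is only a sketch under a few assumptions: the 5 GB threshold and the paths come from the list above, but the function names are my own invention, and it needs a Python 3 interpreter, which may not be available on the appliance itself. It only reports; it does not delete anything or check file ownership.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3


def has_free_space(path, required_gib=5, usage=shutil.disk_usage):
    """Return True if the filesystem holding `path` has at least required_gib GiB free.

    `usage` is injectable only so the check can be exercised without a real disk.
    """
    return usage(path).free >= required_gib * GIB


def find_swap_files(pgdata_dir):
    """Return all files ending in .swp under pgdata_dir (hypothetical helper)."""
    root = Path(pgdata_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.swp") if p.is_file())


if __name__ == "__main__":
    # Partitions named in the pre-requisites; skipped if they do not exist here.
    for mount in ("/", "/storage/db"):
        if Path(mount).exists():
            status = "OK" if has_free_space(mount) else "LOW"
            print(f"{mount}: {status}")
    for swp in find_swap_files("/var/vmware/vpostgres/current/pgdata"):
        print(f"leftover swap file: {swp}")
```

Anything reported as LOW or as a leftover swap file still has to be fixed by hand before starting the upgrade.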

Additional requirements before we start:

  1. If you are using NSX in vRA, run the NSX Network and Security Inventory Data Collection for the Endpoint.
  2. Back up specific config files from both vRA appliances using a tool like WinSCP.
    • /etc/vcac/
    • /etc/vco/
    • /etc/apache2/
    • /etc/rabbitmq/
  3. Back up DataCenterLocations.xml
    • RDP to IaaS Web Server
    • Back up the file in folder C:\Program Files (x86)\VMware\vCAC\Server\Website\XmlData
  4. Back up app.config if modified
    • RDP to your IaaS servers
    • C:\users\<profile>\appdata\local\temp\vCAC\Bin
  5. Back up the external workflow configuration (xmldb) files
    • RDP to IaaS Manager server
    • Back up folder \VMware\vCAC\Server\ExternalWorkflows\xmldb\
    • Do not place the XML backup files in the same directory; otherwise the system will pick them up and run duplicates, causing the system to time out.
  6. Back up the IaaS Microsoft SQL Server database
  7. Back up and export the vRA Postgres DB
    • service vcac-server stop
    • cd /tmp
    • su -m -c "/opt/vmware/vpostgres/current/bin/pg_dumpall -c -f /tmp/vcac.sql" postgres
    • bzip2 -z /tmp/vcac.sql
    • service vcac-server start
    • Use a tool like WinSCP to export the database file
  8. Create snapshots of all vRA servers
    1. Shut down the servers in the following order
      1. IaaS Windows servers
      2. vRA Appliances
    2. In vCenter Server, take a snapshot of each vRA server
    3. Also, if you have a backup solution available I would recommend taking a full backup of each server.
    4. Power on the servers in the following order
      1. vRA appliances (need quorum to start)
      2. Wait for services to start
      3. Remaining vRA appliance at the same time
      4. Primary Web node (wait until services are up)
      5. Primary Manager server (wait 5 min)
      6. Secondary Web and Manager
      7. DEM Orchestrator and Workers and all vRA proxy agents
      8. Log into vRA VAMI to verify all services are started.
  9. Download the ISO files for the upgrade and attach to the primary vRA appliance.
    • Remember to set the update settings in vRA VAMI to CD-ROM.
  10. Change load balancer timeout setting from the default to at least 10 minutes.
    • If using NSX, select the EdgeGW -> Load Balancer
    • Under Service Monitoring change the timeout to 600 seconds for each.
  11. Disable the load balancer
    • The release notes for vRA 7.4 state the following: “No longer necessary to disable load balancer health checks during upgrade”. So should I just leave them enabled and set to 600 seconds? It is a bit confusing, since the documentation still says to disable both the health monitors and the secondary nodes, so that is what I am going to do.
    • If using NSX, select EdgeGW -> Load Balancer
    • Under Pools set the Monitors to none, and under members select the secondary and set the state to disable.
  12. Verify that the IaaS service hosted in Microsoft Internet Information Services (IIS) is running by performing the following steps:
  13. For this distributed IaaS Website, make sure traffic is only flowing through the primary.
  14. Verify Cluster last connected
    • On vRA VAMI select vRA Settings -> Cluster
    • The IaaS nodes in the table have a last connected time of less than 30 seconds.
    • The virtual appliance nodes have a last connected time of less than 10 minutes.
    • If orphaned nodes are listed in the table, they must be deleted. An orphaned node is a duplicate node that is reported on the host but does not exist on the host.
  15. Delete any vRA replica appliances that are not part of the cluster anymore
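
Instead of copying the four config directories from item 2 one at a time, something like the sketch below could bundle them into a single archive on each appliance, which you can then pull off with WinSCP. The directory list comes from item 2; the function name and the archive path in the usage comment are hypothetical, and I am assuming a Python 3 interpreter is available where you run it, which may not be true on the appliance.

```python
import tarfile
from pathlib import Path

# Config directories listed in item 2 above.
CONFIG_DIRS = ["/etc/vcac", "/etc/vco", "/etc/apache2", "/etc/rabbitmq"]


def archive_dirs(dirs, archive_path):
    """Add every existing directory in `dirs` to a gzip-compressed tar archive.

    Returns the list of entries that were skipped because they are not directories.
    """
    skipped = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for d in dirs:
            path = Path(d)
            if path.is_dir():
                # Store each tree under its own directory name, e.g. "vcac/...".
                tar.add(str(path), arcname=path.name)
            else:
                skipped.append(d)
    return skipped


# Example call (hypothetical archive name, run as root on the appliance):
#   missing = archive_dirs(CONFIG_DIRS, "/tmp/vra-config-backup.tar.gz")
```

Write the archive somewhere outside the directories being backed up, and check the returned list so a missing directory does not go unnoticed.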

 

vRA Upgrade process:

  1. On master vRA appliance, login to the VAMI
    • Verify all services are registered under Services tab
    • Verify this appliance is the master under vRA Settings -> Database
    • Verify update is accessible under Update tab, click Check updates.
    • On Update Tab, click Install Updates
  2. Upgrade has started
  3. To monitor the upgrade in detail, open a console within vCenter Server to the master vRA appliance.
    • Run tail -f /opt/vmware/var/log/vami/updatecli.log
    • Additional upgrade info is available here:
      • /opt/vmware/var/log/vami/vami.log
      • /var/log/vmware/horizon/horizon.log
      • /var/log/bootstrap/*.log
  4. After some time, the Update > Status window shows the following message:
    • [Screenshot: update status message]
    • Go to the System tab and click Reboot.
    • In a distributed vRA, all successfully upgraded replica appliance nodes reboot when you reboot the master appliance.
  5. When the system is initialized and all services are up and running the IaaS update starts.
    • Click Update > Status to observe the IaaS upgrade progress
    • [Screenshot: IaaS upgrade progress]
  6. When the IaaS update finishes
    • [Screenshot: IaaS update finished]
    • Click Cluster in the VAMI and verify the version number is the current version for all IaaS nodes and components.

 

vRealize Orchestrator (vRO) upgrade process:

  1. Before any steps are taken perform the following tasks
    • Take a snapshot of all the vRO server nodes
    • Back up the vRO shared database
    • In vCenter Server, verify that each vRO appliance has at least 6 GB of memory
    • In vCenter Server, verify the disk sizes: Disk1 = 7 GB and Disk2 = 10 GB
      • If not, increase the size
    • Verify that the root partition of each vRO appliance has at least 3 GB of free space
  2. With the options to either migrate or upgrade, I decided to keep my clustered external vRO instances and just perform an upgrade.
  3. Stop the vco-server and vco-configurator services on all vRO servers
    • service vco-server stop
    • service vco-configurator stop
  4. Upgrade only one of the vRO instances in the cluster
    1. Connect ISO update image to the vRO appliance
    2. Login to VAMI
    3. Select Update tab -> Settings
    4. Set Update repository to Use CDROM
    5. Select Settings
    6. Click Check Updates
    7. Verify update is available.
    8. Click Install
    9. Accept EULA
    10. To complete the update, restart the vRO appliance
    11. Log in to VAMI
      1. IMPORTANT: Because I have a clustered vRO, I need to reconfigure the host settings in Control Center.
        • On the Host Settings page in Control Center, click Change.
        • Enter the VIP hostname instead of the vRO appliance hostname.
      2. IMPORTANT: Unregister vRA authentication
        • Since I am using vRA as my authentication provider I need to unregister and register again.
        • Select Configure Authentication provider
        • Click Unregister
        • Type in the password for identity service again
        • Click Register
        • Add the Admin group
        • Click Save Changes
        • Test login to make sure it works!
      3. I rebooted my vRO but a restart of services should work as well.
      4. Run Validate Configuration.
  5. In vCenter Server deploy a new vRO appliance
    • Before I deploy the new vRO appliance, I power off the existing 2nd vRO node.
    • For the new vRO appliance, set the VM name in vCenter Server to something other than the name of the existing 2nd node.
    • Configure the exact same hostname and IP address (network) as the existing vRO 7.2 appliance, which you have not yet upgraded.
    • Power on
    • Log in to the VAMI and configure the time server settings.
  6. Log in to Control Center on the newly deployed vRO server
    • For Deployment Type select Clustered Orchestrator
    • Enter the hostname or local IP address of the first orchestrator server that was just upgraded (do not enter the VIP address or IP!)
    • Enter the username (root) and password of first orchestrator server
    • Click Join
  7. The services on both vRO servers should now restart
  8. Log in to Control Center on either server and verify that the cluster is configured successfully.

 

vRA Post-upgrade process:

  1. On the Load Balancer, enable the health monitors and secondary nodes
  2. Upgrading Software Agents to TLS 1.2
    • Update existing templates so that the Software Agents use the TLS 1.2 protocol
      • Log in to vSphere.
      • Convert each template to a virtual machine.
      • Import and run the software installer
        • Start browser and open https://vra-va-hostname.domain.name/software/
        • Follow instructions to install for Linux and Windows.
      • Convert each VM back to a template
    • Update existing deployed Virtual machine
      1. Login to vRA
      2. Click Administration -> Health
      3. Click New Configuration
        1. Enter name “SW Agent verification”
        2. Enter description “Locate software agents for upgrade to TLS 1.2”
        3. Product select vRealize Automation 7.4.0
        4. Schedule select none
        5. Click Next
      4. On test Suites page, select System Tests for vRealize Automation and Tenant Tests for vRealize Automation.
      5. Click Next
      6. On Configure Parameter page
        1. Enter the public web server address
          1. With my distributed installation I enter the VIP
        2. Enter the SSH console address
          1. FQDN of the vRA appliance
        3. Enter the rest of the required usernames and passwords.
          1. One note: the Fabric Administrator user should have both the Tenant administrator and IaaS administrator roles
        4. Click Next
      7. Click Finish
      8. Click Run
      9. [Screenshot]
      10. When completed, click on the card
      11. Filter by failed
      12. Review and remediate the displayed results
    • If you found any VMs on vSphere with an outdated software agent, they need to be updated.
      1. Login to primary vRA VAMI
      2. Select vRA Settings -> SW Agents
      3. Click Toggle TLS 1.0, 1.1
        1. This might seem counter-intuitive, but you need to enable TLS 1.0 and 1.1 to be able to communicate with the outdated agents and upgrade them.
      4. Enter Tenant name
      5. Enter Tenant username and password
        1. Needs Software Architect role assigned
      6. Click Test connection
        1. Should see Success checkmark
      7. Click List Batches
      8. Click Show
      9. VMs in the state Upgradable need a software agent upgrade
      10. Enter the number of VMs selected for upgrade in the Batch Size field.
        1. Adjust parameters if you get failures
      11. Click Toggle TLS 1.0, 1.1
        1. This disables TLS 1.0, 1.1 again
  3. Set the PostgreSQL replication mode back to synchronous
    • I have only a master and replica so this is not possible but just FYI
  4. Run test connections and verify endpoints
    • Click Infrastructure -> Endpoints -> Endpoints.
    • Edit a vSphere endpoint and click Test Connection.
    • If a certificate prompt appears, click OK to accept the certificate.
    • Click Infrastructure -> Compute Resources -> Compute Resources
    • Hover over compute resource and select Data collection
    • Click Request now for all
    • Verify you get a succeeded status.
  5. Run the NSX Network and Security Inventory Data Collection for the Endpoint.
  6. Configure the load balancer to pass traffic on port 8444 to the vRealize Automation appliance to support remote console features.
  7. This is not applicable to my installation, but for a vRA high-availability deployment, you must manually rejoin each target replica vRealize Automation appliance to the cluster to enable high-availability support for the embedded vRealize Orchestrator.
  8. Reconfigure the vRealize Automation external workflow timeout files because the upgrade process overwrites xmldb files.
    • Replace the xmldb files with the files that you backed up before migration
  9. I do not have any blueprints deploying an appliance, but in 7.4 you can enable the Connect to Remote Console Action for consumers
    • Edit the blueprint after you have upgraded the release and select the Connect to Remote Console action on the Action tab.
  10. Restore any changes you made before the upgrade to the app.config file
  11. Enable automatic Manager Service failover after upgrade
    • SSH to vRA Appliance
    • Change directories to /usr/lib/vcac/tools/vami/commands
    • To enable automatic Manager Service failover, run the following command
      • # python ./manager-service-automatic-failover ENABLE
    • To disable automatic failover throughout an IaaS deployment, run the following command.
      • # python ./manager-service-automatic-failover DISABLE
  12. Finally, deploy some catalog items to verify that everything is good.

 

This is a long and at times confusing upgrade process, and it can feel daunting, but stick with it and you will get through it.

I ran into some interesting issues with my upgrade:

  • Received an error “Unable to connect to port” in vRA when trying to connect to the upgraded external vRO cluster VIP address.
    • [Screenshot: “Unable to connect to port” error]
    • Only when I shut down a node was I able to connect successfully.
    • This issue was weird because in Control Center the configuration validation on both nodes came back successful, and the orchestration cluster management page showed a status of running with no faults.
    • I manually removed the upgraded host from the vRO cluster and added it back. After a while this somehow got resolved.  Unfortunately I am not sure if this was the actual fix for the problem, and if you run into something similar I would recommend you open a case with VMware support.
  • I received a bunch of Lucene60 errors in my system log files
    • Have a look at this community post here for additional information and the KB here for resolution.

 

Here is also a VMware community post with a list of some known issues identified by users before and after upgrading to vRA 7.4.

 
