vRealize Automation 8.x - Troubleshooting
With the introduction of vRealize Automation (vRA) 8.0, the traditional appliance VAMI page is gone, replaced by the vRA CLI (vracli) and the Kubernetes command-line tools (kubectl). This post covers some of the more common CLI commands you may need. To run any of the commands below, connect to the appliance over SSH and log in with the root username and password.
- Check Pods / 'Services' Status
- Display vRA Cluster Status
- Verify the vRA Deployment Status
- Check Deployment Log File
- Generate a Log Bundle
- Stopping / Shut down vRA Cluster
- Starting vRA Cluster
- vRA 8.x Error - Bad Gateway
- vRA just not working...
- Remove VM from Inventory without deleting the VM
- Remove vRA integration with vRealize Log Insight
Check Pods / 'Services' Status
Although the traditional vRA services have been replaced with Kubernetes containers, you can still check whether they are running using the command below. It shows the running status, the age and the number of restarts for each pod or 'service'.
kubectl -n prelude get pods
Display vRA Cluster Status
Verify the vRA Deployment Status
The output of this command will be "Deployment not complete" if the appliance is still deploying / starting up, otherwise it will show as "Deployment complete".
vracli status deploy
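If you would rather not re-run the status command by hand, a polling loop along these lines may help. This is only a sketch: the function name wait_for_deploy and the STATUS_CMD override variable are hypothetical (the override exists purely so the loop can be tried without an appliance), and it assumes the output contains exactly the "Deployment complete" text described above.

```shell
#!/bin/sh
# Poll the deployment status until it reports "Deployment complete".
# STATUS_CMD is a hypothetical hook for testing; by default the real
# `vracli status deploy` command is used.
wait_for_deploy() {
    max_tries="${1:-60}"   # number of polls before giving up
    interval="${2:-30}"    # seconds to wait between polls
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        # "Deployment not complete" does not match this pattern
        if ${STATUS_CMD:-vracli status deploy} 2>/dev/null | grep -q "Deployment complete"; then
            echo "Deployment complete"
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "Timed out waiting for deployment" >&2
    return 1
}
```

Calling wait_for_deploy with no arguments polls every 30 seconds for up to 30 minutes; pass a different count and interval (for example wait_for_deploy 10 60) to suit your environment.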
Check Deployment Log File
The deployment log file is located at /var/log/deploy.log. Use the command below to follow it as new entries are written.
tail -f /var/log/deploy.log
Generate a Log Bundle
The command below generates a log bundle; the output file can be found at /root/log-bundle-xxxxxxxxx.tar.xz. In my environment the log bundle took around 20 minutes to complete and was 60 MB in size; HA environments are likely to take longer and produce significantly larger bundles. The --collector-timeout flag sets a timeout for each log collection (default 1000 seconds). GSS may request the --include-cold-storage flag if the issue you are troubleshooting was not recent, as it includes older log files in the log bundle; collection will be slower and the output file will be larger.
vracli log-bundle
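Once a bundle has been generated, a small helper like the one below can locate the newest file before you upload it. Treat it as a sketch: the function name latest_log_bundle is hypothetical, and it assumes bundles follow the log-bundle-*.tar.xz naming described above and are written to /root unless another directory is given.

```shell
#!/bin/sh
# Print the path of the most recently modified log bundle in a directory.
# Prints nothing if no bundle exists yet.
latest_log_bundle() {
    dir="${1:-/root}"
    # ls -t sorts newest first; errors are suppressed when nothing matches
    ls -t "$dir"/log-bundle-*.tar.xz 2>/dev/null | head -n 1
}
```

For example, ls -lh "$(latest_log_bundle)" shows the size of the newest bundle in /root.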
Stopping / Shut down vRA Cluster
These commands shut down vRealize Automation on all of the cluster nodes: they stop the services, wait for 2 minutes, clean the current deployment and then shut down the appliance. Check the official documentation for up-to-date procedures.
/opt/scripts/svc-stop.sh
sleep 120
/opt/scripts/deploy.sh --onlyClean
shutdown -h now
Starting vRA Cluster
Power on each of the appliances and wait for them to boot completely, until the appliance console shows the blue welcome page, before proceeding. Ensure that all prerequisite servers, such as vRealize Identity Manager (vIDM), are also started. The first command below runs the deploy.sh script to deploy all prelude services; the kubectl command then shows the status of all the running pods or 'services'. This process can take 20+ minutes. If the appliance has insufficient memory, the deployment will time out at 30 minutes. Check the official documentation for up-to-date procedures.
/opt/scripts/deploy.sh
kubectl -n prelude get pods
vRA 8.x Error - Bad Gateway
After starting up your vRA appliances, you may find that the UI loads but shows a Bad Gateway error. This is usually because the appliance is still starting up. Provided the appliance has enough resources assigned to it, the UI will eventually load. In the meantime, check the status of the pods using the command below. Look at the READY column and confirm that all pods are ready for use. A READY value of 0/1 means that the pod is not available yet. Once all pods are listed as 1/1 or 2/2, the UI will be available for use.
kubectl -n prelude get pods
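With many pods in the list, it can be easier to filter for the ones that are not ready yet. The sketch below assumes the standard kubectl column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name not_ready_pods is hypothetical, and skipping pods in the Completed state is my own assumption, since finished jobs would otherwise always show as 0/1.

```shell
#!/bin/sh
# Read `kubectl get pods` output on stdin and print the name and READY
# value of each pod whose containers are not all ready.
not_ready_pods() {
    awk 'NR > 1 && $3 != "Completed" {
        split($2, r, "/")                 # READY looks like "1/1" or "0/2"
        if (r[1] != r[2]) print $1, $2    # ready count differs from total
    }'
}
```

Use it as kubectl -n prelude get pods | not_ready_pods; no output means every listed pod is ready.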
vRA just not working...
After trying all of the above, sometimes vRA just won't come back online after a failure. If this is the case, run kubectl -n prelude get pods to check the status of the pods. If they are all online except the postgres database pod, try the command below to restart the kubelet service. Once this is run, leave it alone for the next 30 minutes while vRA restarts itself and tries to come back online cleanly.
systemctl restart kubelet
Remove VM from Inventory without deleting the VM
Whilst this is 100% not supported, vautomation.dev provides a very useful article on how to remove VMs from inventory without deleting the underlying VM by accessing the internal vRA database.
Remove vRA integration with vRealize Log Insight
Run the below command to remove the integration between vRA and vRLI in addition to removing the configuration from the vRLI interface.
vracli vrli unset
To add the integration back again, run the command below and substitute vrli8.homelab.local with the FQDN or URL of your vRLI instance.
vracli vrli set vrli8.homelab.local