So, I was full of energy completing one of my NSX 6.3.5 deployments. Everything was straightforward: deploying NSX Manager and the Controllers, preparing hosts, configuring VXLAN, etc., until I faced an issue deploying NSX Edges. This was supposed to be an easy task compared to the previous ones. However, from my experience in the field, I have learned not to underestimate any task, as there is a lot to learn from each project.
In short, the ESG was deployed to vCenter, but the “Deploying OVF Template” progress bar stayed stuck at 0%. After about 8 minutes the installation failed and the ESG VM was deleted. The following error message appeared after the wizard failed:
Operation failed on VC. For more details, refer to the rootCauseString or the VC logs
I checked the following:
- Forward and reverse DNS records for NSX Manager, the ESXi hosts, and vCenter.
- The ESXi host firewall, in case it was configured to block connectivity.
- Edge cluster resources, to confirm they were sufficient to accommodate the new VM.
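The DNS item in the list above can be scripted. The following is only a minimal sketch, assuming Python is available on a management workstation that uses the same DNS servers as the NSX components. The function name `dns_round_trip` and the injectable `forward`/`reverse` parameters are my own illustration, not part of any NSX tooling.

```python
import socket


def dns_round_trip(fqdn, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve fqdn to an IP (forward), then resolve that IP back (reverse).

    Returns a dict with the IP, the names returned by the reverse lookup,
    and whether one of those names matches the original FQDN.
    The forward/reverse parameters are hypothetical hooks so the lookups
    can be replaced in tests.
    """
    result = {"fqdn": fqdn, "ip": None, "names": [], "consistent": False, "error": None}
    try:
        result["ip"] = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["error"] = f"forward lookup failed: {exc}"
        return result
    try:
        primary, aliases, _addresses = reverse(result["ip"])
    except OSError as exc:  # socket.herror is a subclass of OSError
        result["error"] = f"reverse lookup failed: {exc}"
        return result
    result["names"] = [primary, *aliases]
    wanted = fqdn.rstrip(".").lower()
    result["consistent"] = any(n.rstrip(".").lower() == wanted for n in result["names"])
    return result
```

Calling `dns_round_trip` with the FQDN of NSX Manager, each ESXi host, and vCenter gives a quick view of which records, if any, do not resolve consistently in both directions.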
After about 2 hours of troubleshooting, I decided to check all the ports that the NSX components need in order to communicate with each other. These are listed in this VMware KB article: https://kb.vmware.com/s/article/2079386
From that list, I noticed that TCP port 902 is used for provisioning and must be open from NSX Manager to the ESXi hosts for the ESG VM to be provisioned and deployed. So I logged on to NSX Manager via SSH and ran this command to check whether the port was open:
debug connection <ESXi IP Address>
And yes, the port was closed 🙁
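For checking several ESXi hosts at once, a small script can attempt the same TCP connection. This is a rough sketch, not a replacement for the NSX Manager check: it only tells you whether port 902 is reachable from the machine that runs it, so the result matches the NSX Manager view only if that machine's traffic crosses the same firewall rules. The function names `tcp_port_open` and `check_hosts` are my own.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def check_hosts(hosts, port=902, timeout=3.0):
    """Map each host to True/False depending on whether the TCP port is reachable."""
    return {host: tcp_port_open(host, port, timeout) for host in hosts}
```

Passing the management addresses of the ESXi hosts in the Edge cluster to `check_hosts` returns a dictionary in which any host mapped to `False` is a candidate for a missing firewall rule.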
It turned out that the security team had cleaned up and refined some ACL rules on their physical firewall and, unfortunately, deleted the rule that allowed TCP port 902 from NSX Manager to the ESXi hosts in the Edge cluster.
After the port was opened again, I was able to deploy ESGs without any issue.
I hope this post is helpful for anyone facing a similar issue.