I ran into a very serious bug the other day that caused all our production virtual servers on a specific ESXi host to lose network connectivity, without any notifications or alarms from VMware vCenter Server. In fact, VMware was completely unaware of the problem and carried on as normal while all our servers were offline.
Debugging the problem:
I reviewed all the physical ports and found that one particular NIC had lost its VLANs. VMware did not recognize this, so all VMs were left in a disconnected state. No network errors were detected on the physical Cisco switches or by VMware, so no failover took place.
Quick fix:
I went to the Teaming and Failover settings of each port group with affected VMs and removed the problem NIC from the Active uplinks.
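With many port groups, the same change can be scripted instead of clicked through. The following is only a minimal sketch of that idea: the port group and vmnic names are hypothetical, and the esxcli options shown are an assumption based on the standard vSwitch failover policy namespace, so verify them against your ESXi version before running anything. The script only builds and prints the commands; it does not execute them.

```python
import shlex


def remove_uplink(active, bad_nic):
    """Return the active uplink list without bad_nic.

    Refuses to return an empty list, because a port group with no
    active uplinks would cut off every VM attached to it.
    """
    remaining = [nic for nic in active if nic != bad_nic]
    if not remaining:
        raise ValueError("removing %s would leave no active uplinks" % bad_nic)
    return remaining


def failover_command(portgroup, active):
    """Build (not run) an esxcli command that sets a port group's active uplinks.

    Assumption: standard vSwitch port groups and the
    'portgroup policy failover set' namespace; check your ESXi release.
    """
    return [
        "esxcli", "network", "vswitch", "standard", "portgroup",
        "policy", "failover", "set",
        "--portgroup-name", portgroup,
        "--active-uplinks", ",".join(active),
    ]


if __name__ == "__main__":
    # Hypothetical inventory: port group name -> current active uplinks.
    portgroups = {
        "VM Network": ["vmnic0", "vmnic1"],
        "Production": ["vmnic1", "vmnic2"],
    }
    problem_nic = "vmnic1"  # hypothetical name of the faulty NIC

    for name, active in portgroups.items():
        new_active = remove_uplink(active, problem_nic)
        # shlex.join quotes names that contain spaces, such as "VM Network".
        print(shlex.join(failover_command(name, new_active)))
```

Printing the commands first makes it possible to review each port group's new uplink list before applying anything on the host.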
VMware support case opened:
Dell PowerEdge R720
BIOS Version 1.3.6
Firmware Version 1.23.23 (Build 01)
Broadcom Gigabit Ethernet BCM5719
Family Firmware Version 7.2.20
tg3
NICs, then only disable NetQueue for the tg3 driver.