This handbook provides a complete, step-by-step health check procedure for ESXi 8.0 hosts deployed within a VCF 9.0 environment. The checks are grouped into the following categories:
| Category | Checks Performed |
|---|---|
| Hardware | Sensors, IPMI, memory errors, CPU, PCI, firmware |
| Storage | VMFS, SCSI, HBA, multipathing, NFS, latency, SMART |
| Networking | VMkernel, vSwitches, uplinks, drivers, CDP/LLDP, connectivity |
| Services | All ESXi services, hostd, vpxa, NTP, syslog |
| Security | SSH, lockdown mode, firewall, certificates, account lockout |
| Performance | CPU ready, co-stop, memory balloon/swap, overcommit |
| Boot/Patch | Boot device, boot banks, image profile, VIB compliance |
All commands in this handbook run as root on the ESXi host unless otherwise specified. Replace `<esxi-host>` with your actual hostname or IP.
| Access Type | Target | Credentials |
|---|---|---|
| SSH (22) | Each ESXi host | root / password |
| HTTPS (443) | Each ESXi host | root / password |
| vCenter API | vCenter Server | administrator@vsphere.local |
# Via DCUI: F2 → Troubleshooting Options → Enable SSH
# Via PowerCLI:
Get-VMHost -Name <esxi-host> | Get-VMHostService | Where {$_.Key -eq "TSM-SSH"} | Start-VMHostService
# Via vim-cmd (from the ESXi Shell or DCUI):
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
# Set per-host or iterate across all hosts
export ESXI_HOSTS="192.168.1.201 192.168.1.202 192.168.1.203 192.168.1.204"
export ESXI_USER="root"
export ESXI_PASS="YourPassword123!"
# Loop template for multi-host checks
for HOST in $ESXI_HOSTS; do
echo "======== $HOST ========"
ssh root@$HOST '<command>'
done
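The loop above only prints raw output. To roll results into a final report, a small helper can tally PASS/WARN/FAIL per host; this is a sketch with hypothetical names (`record_result`, `summarize`) and an assumed scratch path, not part of the stock tooling:

```shell
#!/bin/sh
# Append one check result per line: host,check-id,result
RESULTS=/tmp/healthcheck-results.csv   # assumed scratch path

record_result() {   # usage: record_result <host> <check-id> <PASS|WARN|FAIL>
    echo "$1,$2,$3" >> "$RESULTS"
}

summarize() {       # print per-host counts of each verdict
    awk -F, '{ c[$1","$3]++ }
             END { for (k in c) print k": "c[k] }' "$RESULTS" | sort
}

# Example:
#   record_result 192.168.1.201 4.1 PASS
#   record_result 192.168.1.201 5.5 WARN
#   summarize    # prints counts like "192.168.1.201,PASS: 1"
```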
| # | Check | Command | PASS | WARN | FAIL |
|---|---|---|---|---|---|
| 4.1 | Hardware Sensors | `esxcli hardware ipmi sdr list` | All 0x01 (Normal) | Any 0x02 (Warning) | Any 0x04 (Critical) |
| 4.2 | Memory Errors | `esxcli hardware memory get` | 0 uncorrectable | Correctable ECC errors | Uncorrectable ECC errors |
| 5.1 | VMFS Datastores | `esxcli storage vmfs extent list` | All mounted | Snapshot consolidation needed | Datastore unmounted |
| 5.3 | Multipathing | `esxcli storage nmp path list` | All paths active | Any path unexpectedly standby | Any path dead |
| 5.5 | Disk Latency | `esxtop` (d for disk) | < 20ms avg | 20-50ms avg | > 50ms avg |
| 6.1 | VMkernel NICs | `esxcli network ip interface list` | All enabled, IPs assigned | MTU mismatch | VMkernel missing/down |
| 6.3 | Uplinks | `esxcli network nic list` | All Up | Any NIC not at expected speed | Any NIC Down |
| 7.2 | Critical Services | `esxcli system process list` | hostd, vpxa running | Non-critical service stopped | hostd or vpxa not running |
| 8 | NTP | `esxcli system ntp get` | Synchronized, drift < 1s | Drift 1-5s | NTP not configured or drift > 5s |
| 9 | Syslog | `esxcli system syslog config get` | Remote target configured | Local-only logging | Syslog service not running |
| 10 | Scratch | `vim-cmd hostsvc/advopt/get ScratchConfig.ConfiguredScratchLocation` | Persistent storage | Ramdisk (non-persistent) | Not configured |
| 11 | Core Dump | `esxcli system coredump partition get` | Partition configured, active | Network dump only | No dump configured |
| 12.1 | SSH | `esxcli system process list \| grep SSH` | Disabled (production) | Enabled with timeout | Enabled, no timeout |
| 12.2 | Lockdown Mode | `vim-cmd hostsvc/hostsummary \| grep lockdownMode` | lockdownNormal or lockdownStrict | lockdownDisabled (lab OK) | lockdownDisabled (production) |
| 13.1 | CPU Ready | `esxtop` (c for CPU) | < 5% %RDY | 5-10% %RDY | > 10% %RDY |
| 13.2 | Balloon/Swap | `esxtop` (m for memory) | 0 balloon, 0 swap | Balloon active | Swap active |
| 14 | Boot Banks | `bootbank-util status` | Both banks healthy | Alt bank outdated | Primary bank corrupt |
| 15 | Patch Level | `esxcli software profile get` | Matches VCF BOM | 1 patch behind | 2+ patches behind |
What: Read IPMI sensor data to verify temperatures, voltages, fan speeds, and power supply status.
Why: Hardware sensor warnings precede failures. Catching them early prevents unplanned outages.
esxcli hardware ipmi sdr list
Expected Output (Healthy):
Name Entity ID Sensor Type Reading Units Status
---------------------------------------------------------------------------
Inlet Temp 0x01.0x01 Temperature 22 C 0x01
Exhaust Temp 0x01.0x02 Temperature 35 C 0x01
CPU1 Temp 0x03.0x01 Temperature 48 C 0x01
CPU2 Temp 0x03.0x02 Temperature 45 C 0x01
FAN1 0x1d.0x01 Fan 6200 RPM 0x01
FAN2 0x1d.0x02 Fan 6100 RPM 0x01
P1-DIMMA1 Temp 0x20.0x01 Temperature 32 C 0x01
PSU1 Status 0x0a.0x01 Power Supply — — 0x01
PSU2 Status 0x0a.0x02 Power Supply — — 0x01
| Code | Meaning | Action |
|---|---|---|
| 0x01 | Normal | No action |
| 0x02 | Warning | Monitor closely |
| 0x04 | Critical | Immediate attention |
| 0x08 | Non-recoverable | Hardware replacement |
| Result | Criteria | Indicator |
|---|---|---|
| PASS | All sensors 0x01 (Normal) | Hardware healthy |
| WARN | Any sensor 0x02 (Warning) | Monitor / schedule maintenance |
| FAIL | Any sensor 0x04 or 0x08 | Replace hardware component |
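The verdict in this table can be produced non-interactively by scanning the status column. A sketch assuming the status code is the last whitespace-separated field, as in the sample output above (`classify_sensors` is an illustrative name):

```shell
#!/bin/sh
# Classify IPMI sensor output: the worst status code wins.
# Reads `esxcli hardware ipmi sdr list`-style text on stdin.
classify_sensors() {
    awk '$NF == "0x02"                    { warn++ }
         $NF == "0x04" || $NF == "0x08"  { fail++ }
         END {
             if (fail)      print "FAIL"
             else if (warn) print "WARN"
             else           print "PASS"
         }'
}

# Example: esxcli hardware ipmi sdr list | classify_sensors
```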
# Detailed hardware platform info
esxcli hardware platform get
# BMC/IPMI firmware info
esxcli hardware ipmi bmc get
What: Check for correctable (CE) and uncorrectable (UE) memory errors.
# Memory overview
esxcli hardware memory get
# Check DIMM status
esxcli hardware memory dimm list
Expected Output:
Physical Memory: 549,755,813,888 Bytes (512 GB)
Reliable Memory: 549,755,813,888 Bytes
DIMM Locator Bank Type Speed Size Status
----------------------------------------------------------
P1-DIMMA1 Bank 0 DDR5 4800 32768 MB ok
P1-DIMMB1 Bank 1 DDR5 4800 32768 MB ok
P1-DIMMC1 Bank 2 DDR5 4800 32768 MB ok
...
grep -i "machine check\|memory error\|ECC\|CECC\|UECC" /var/log/vmkernel.log | tail -20
| Result | Criteria | Indicator |
|---|---|---|
| PASS | All DIMMs ok, no errors in vmkernel log | Healthy |
| WARN | Correctable ECC errors (CE) present | Schedule DIMM replacement |
| FAIL | Uncorrectable errors (UE) or DIMM status not ok | Immediate replacement |
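The vmkernel.log grep above can likewise be turned into a verdict by distinguishing correctable from uncorrectable matches; a sketch assuming the CECC/UECC keywords shown:

```shell
#!/bin/sh
# Classify memory health from vmkernel.log content on stdin:
# any uncorrectable (UECC) line -> FAIL, any correctable (CECC) -> WARN.
classify_ecc() {
    awk '/UECC|[Uu]ncorrectable/ { fail++; next }
         /CECC|[Cc]orrectable/   { warn++ }
         END {
             if (fail)      print "FAIL"
             else if (warn) print "WARN"
             else           print "PASS"
         }'
}

# Example: classify_ecc < /var/log/vmkernel.log
```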
# CPU info
esxcli hardware cpu list | head -30
# CPU global stats
esxcli hardware cpu global get
Expected Output:
CPU Packages: 2
CPU Cores: 32
CPU Threads: 64
Hyperthreading Active: true
Hyperthreading Supported: true
# List all PCI devices
esxcli hardware pci list | grep -E "Device Name|Vendor Name|Address"
# BIOS info
esxcli hardware platform get
# Storage controller firmware
esxcli storage core adapter list
What: Verify all VMFS datastores are mounted, accessible, and have adequate free space.
# List VMFS extents
esxcli storage vmfs extent list
Expected Output:
Volume Name VMFS UUID Extent # Device Name Partition
-----------------------------------------------------------------------------------------------------------------
datastore1 61234567-abcdef01-2345-001122334455 0 naa.600508b4001234567890abcdef012345 3
vsanDatastore 71234567-abcdef01-2345-001122334455 0 — —
# Datastore capacity (via df equivalent)
esxcli storage filesystem list
Expected Output:
Mount Point Volume Name UUID Mounted Type Size Free
------------------------------------------------------------------------------------------------------------------
/vmfs/volumes/ds1 datastore1 61234567-abcdef01... true VMFS-6 1099511627776 549755813888
/vmfs/volumes/vsanDs vsanDatastore 71234567-abcdef01... true vsan 4398046511104 2199023255552
| Result | Criteria | Indicator |
|---|---|---|
| PASS | All datastores mounted, > 20% free | Healthy |
| WARN | Any datastore 10-20% free | Plan capacity expansion |
| FAIL | Any datastore < 10% free or unmounted | Critical — VMs may not power on |
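Since `esxcli storage filesystem list` reports Size and Free in bytes, the 20%/10% thresholds can be applied with awk. A sketch assuming Size and Free are the last two columns, as in the sample output:

```shell
#!/bin/sh
# Flag datastores below free-space thresholds.
# Input: `esxcli storage filesystem list` text on stdin
# (last two fields = Size and Free, in bytes).
check_datastore_space() {
    awk '$NF ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ {
             pct = ($NF / $(NF-1)) * 100
             verdict = pct < 10 ? "FAIL" : (pct < 20 ? "WARN" : "PASS")
             printf "%s %s %.0f%% free\n", verdict, $2, pct
         }'
}

# Example: esxcli storage filesystem list | check_datastore_space
```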
# List all SCSI devices
esxcli storage core device list
# Check for devices in APD/PDL state
esxcli storage core device list | grep -E "Display Name|Status|Is Perennially Reserved"
Key Indicators:
| Status | Meaning |
|---|---|
| on | Device is online and accessible |
| off | Device is offline (APD or PDL) |
| APD | All Paths Down: temporary, may recover |
| PDL | Permanent Device Loss: device is gone |
APD handling behavior is controlled by the advanced settings Misc.APDHandlingEnable and Misc.APDTimeout.
What: Verify all storage paths are active and HBAs are functioning.
# List HBAs
esxcli storage core adapter list
Expected Output:
HBA Name Driver Link State UID Description
------------------------------------------------------------------------------
vmhba0 lsi_mr3 link-up sas.5001636001234567 LSI Logic SAS3108
vmhba1 lsi_mr3 link-up sas.5001636001234568 LSI Logic SAS3108
vmhba32 iscsi_vmk online iqn.1998-01.com.vmware:esxi01 VMware iSCSI
# List all paths and their state
esxcli storage nmp path list
Expected Output:
Runtime Name: vmhba0:C0:T0:L0
Device: naa.600508b400123456...
Adapter: vmhba0
LUN: 0
State: active
Transport: sas
Runtime Name: vmhba1:C0:T0:L0
Device: naa.600508b400123456...
Adapter: vmhba1
LUN: 0
State: active
Transport: sas
| Result | Criteria | Indicator |
|---|---|---|
| PASS | All paths active, multiple paths per device | Fully redundant |
| WARN | Any path standby when it should be active | Check path policy |
| FAIL | Any path dead or single path per device | Path failure, no redundancy |
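Because `esxcli storage nmp path list` emits one record per path, redundancy can be checked by counting active paths per device. A sketch keyed on the `Device:` and `State:` lines shown above (`check_path_redundancy` is an illustrative name):

```shell
#!/bin/sh
# Count active paths per device; flag devices with fewer than 2.
# Input: `esxcli storage nmp path list` text on stdin.
check_path_redundancy() {
    awk '/^ *Device:/        { dev = $2 }
         /^ *State: *active/ { active[dev]++ }
         END {
             for (d in active) {
                 if (active[d] < 2)
                     print "WARN " d " has " active[d] " active path(s)"
                 else
                     print "PASS " d " has " active[d] " active paths"
             }
         }'
}

# Example: esxcli storage nmp path list | check_path_redundancy
```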
# Rescan all storage adapters
esxcli storage core adapter rescan --all
# Reclaim a specific device
esxcli storage core claiming reclaim -d naa.xxx
# Show path selection policy per device
esxcli storage nmp device list
# List NFS datastores
esxcli storage nfs list
Expected Output:
Volume Name Host Share Accessible Mounted Read-Only Hardware Acceleration
----------------------------------------------------------------------------------------------
nfs-backup 192.168.1.100 /exports true true false Supported
| Result | Criteria | Indicator |
|---|---|---|
| PASS | All NFS shares Accessible: true, Mounted: true | Healthy |
| FAIL | Any NFS share Accessible: false | NFS server unreachable |
What: Check real-time disk latency and SMART health data.
# Interactive mode — press 'd' for disk view
esxtop
# Batch mode (3 samples, 5-second intervals)
esxtop -b -d 5 -n 3 | grep -E "DAVG|KAVG|GAVG"
Key Metrics:
| Metric | Description | PASS | WARN | FAIL |
|---|---|---|---|---|
| DAVG/cmd | Device latency (physical disk) | < 20ms | 20-50ms | > 50ms |
| KAVG/cmd | Kernel latency (VMkernel queue) | < 2ms | 2-5ms | > 5ms |
| GAVG/cmd | Guest OS observed latency | < 25ms | 25-50ms | > 50ms |
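These thresholds translate directly into a classification function; a sketch (name and interface are illustrative) that takes a metric name and a millisecond value:

```shell
#!/bin/sh
# Classify a latency sample against the DAVG/KAVG/GAVG thresholds above.
# usage: classify_latency DAVG|KAVG|GAVG <milliseconds>
classify_latency() {
    metric=$1; ms=$2
    case $metric in
        DAVG) warn=20; fail=50 ;;
        KAVG) warn=2;  fail=5  ;;
        GAVG) warn=25; fail=50 ;;
        *) echo "unknown metric" >&2; return 1 ;;
    esac
    # awk handles fractional values; sh arithmetic is integer-only
    awk -v v="$ms" -v w="$warn" -v f="$fail" \
        'BEGIN { print (v > f) ? "FAIL" : (v > w) ? "WARN" : "PASS" }'
}

# classify_latency DAVG 12   # PASS
```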
# Check SMART status for local disks
esxcli storage core device smart get -d <device-naa-id>
What: Verify all VMkernel adapters have correct IPs, MTU, and enabled services.
# List VMkernel interfaces
esxcli network ip interface list
Expected Output:
Name MAC Address Enabled MTU Portgroup/DVPort Stack IPv4 Address
---------------------------------------------------------------------------------------
vmk0 00:50:56:01:aa:01 true 1500 Management Network defaultTcpipStack 192.168.1.201
vmk1 00:50:56:01:aa:02 true 9000 vMotion vmotion 192.168.10.201
vmk2 00:50:56:01:aa:03 true 9000 vSAN defaultTcpipStack 192.168.12.74
vmk10 00:50:56:01:aa:04 true 1600 nsx-overlay defaultTcpipStack 192.168.14.201
esxcli network ip interface tag get -i vmk0
esxcli network ip interface tag get -i vmk1
esxcli network ip interface tag get -i vmk2
| VMkernel | Expected Service Tags | MTU |
|---|---|---|
| vmk0 | Management | 1500 |
| vmk1 | vMotion | 9000 |
| vmk2 | vSAN | 9000 |
| vmk10 | NSX TEP (no tag — NSX managed) | 1600+ |
| Result | Criteria | Indicator |
|---|---|---|
| PASS | All vmknics enabled, correct IPs, correct MTU | Healthy |
| WARN | MTU mismatch or wrong service tag | Reconfigure |
| FAIL | VMkernel adapter missing or no IP | Connectivity loss |
# List standard vSwitches
esxcli network vswitch standard list
# List Distributed vSwitches
esxcli network vswitch dvs vmware list
Expected Output (DVS):
Name: DSwitch-Management
Configured Ports: 2048
Max Ports: 2048
MTU: 9000
CDP Status: both
Uplinks: vmnic0, vmnic1
Name: DSwitch-Compute
Configured Ports: 2048
Max Ports: 2048
MTU: 9000
Uplinks: vmnic2, vmnic3
What: Verify all physical NICs are connected, at expected speed, and link is up.
esxcli network nic list
Expected Output:
Name PCI Device Driver Admin Status Link Status Speed Duplex MTU MAC Address
-------------------------------------------------------------------------------------------
vmnic0 0000:3b:00 i40en Up Up 25000 Full 9000 00:50:56:01:aa:10
vmnic1 0000:3b:01 i40en Up Up 25000 Full 9000 00:50:56:01:aa:11
vmnic2 0000:5e:00 i40en Up Up 25000 Full 9000 00:50:56:01:aa:12
vmnic3 0000:5e:01 i40en Up Up 25000 Full 9000 00:50:56:01:aa:13
| Result | Criteria | Indicator |
|---|---|---|
| PASS | All NICs Link Status: Up, expected speed | Healthy |
| WARN | NIC at lower speed than expected (e.g., 10G vs 25G) | Check cables/switch |
| FAIL | Any NIC Link Status: Down | Cable/switch/NIC failure |
# If a driver update is required, install the new VIB:
esxcli software vib install -v /path/to/driver.vib
# Driver details per NIC
esxcli network nic get -n vmnic0
Expected Output:
Advertised Auto Negotiation: true
Auto Negotiation: true
Driver Info:
Bus Info: 0000:3b:00.0
Driver: i40en
Firmware Version: 9.20
Version: 2.5.3.0
Link Detected: true
Link Status: Up
What: Discover the physical switch port each ESXi NIC is connected to.
# CDP/LLDP (neighbor info) via network hints:
vim-cmd hostsvc/net/query_networkhint --pnic-name=vmnic0
# LACP configuration on the DVS (if LACP is in use):
esxcli network vswitch dvs vmware lacp config get
# Python one-liner to get CDP info
python3 -c "
from pyVim.connect import SmartConnect
import ssl
# ... (connect to host and query PhysicalNicHintInfo)
"
# Management network
vmkping -I vmk0 <vcenter-ip>
# vMotion network (jumbo frame test)
vmkping -I vmk1 -d -s 8972 <other-host-vmotion-ip>
# vSAN network (jumbo frame test)
vmkping -I vmk2 -d -s 8972 <other-host-vsan-ip>
# NSX TEP (MTU 1600)
vmkping -I vmk10 -d -s 1572 <other-host-tep-ip>
| Test | PASS | FAIL |
|---|---|---|
| Management | 0% loss | Any loss |
| vMotion jumbo | 0% loss with -s 8972 | Loss or "packet too big" |
| vSAN jumbo | 0% loss with -s 8972 | Loss or MTU error |
| NSX TEP | 0% loss with -s 1572 | Loss or MTU error |
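The -s payload sizes are not arbitrary: with -d (don't fragment) set, the ICMP payload must equal the path MTU minus 28 bytes of headers (20-byte IPv4 + 8-byte ICMP). A one-line helper:

```shell
#!/bin/sh
# ICMP payload size that exactly fills a given MTU:
# MTU - 20 (IPv4 header) - 8 (ICMP header).
ping_payload() {
    echo $(( $1 - 28 ))
}

# ping_payload 9000  -> 8972  (vMotion/vSAN jumbo)
# ping_payload 1600  -> 1572  (NSX TEP)
# Example: vmkping -I vmk1 -d -s $(ping_payload 9000) <target>
```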
# List all services and their status
esxcli system process list | head -60
# Alternative — list via chkconfig
chkconfig --list | sort
| Service | Process | Function | Impact if Down |
|---|---|---|---|
| hostd | hostd | ESXi host agent | Host unreachable from vCenter |
| vpxa | vpxa | vCenter agent | Host disconnects from vCenter |
| vobd | vobd | VMkernel observation daemon | Events/alarms not generated |
| fdm | fdm | HA agent | HA not functional on host |
| ntpd | ntpd | Time sync | Time drift, cert issues |
| sfcbd | sfcbd | CIM broker | Hardware monitoring unavailable |
| lbtd | lbtd | Load-based teaming | NIC load balancing inactive |
| nsx-mpa | nsx-mpa | NSX management plane agent | NSX connectivity loss |
| nsx-proxy | nsx-proxy | NSX proxy | NSX data plane issues |
/etc/init.d/hostd status
/etc/init.d/vpxa status
/etc/init.d/fdm status
/etc/init.d/ntpd status
/etc/init.d/nsx-mpa status
/etc/init.d/nsx-proxy status
Expected Output for each:
hostd is running.
# If an agent is not running, restart it individually:
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
# Or restart all management agents:
/sbin/services.sh restart
# Check /var/log/hostd.log and /var/log/vpxa.log for errors
What: Verify NTP is configured, running, and the host clock is synchronized.
# Check NTP configuration
esxcli system ntp get
Expected Output:
Enabled: true
Loglevel: warning
Server:
- 192.168.1.1
- 192.168.1.2
# Check NTP service status
/etc/init.d/ntpd status
# Check time offset
esxcli system time get
ntpq -p
Expected ntpq Output:
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.1.1 .GPS. 1 u 64 128 377 0.543 0.125 0.043
+192.168.1.2 .GPS. 1 u 32 128 377 0.621 0.213 0.051
| Result | Criteria | Indicator |
|---|---|---|
| PASS | NTP enabled, offset < 1000ms, reach > 0 | Synchronized |
| WARN | Offset 1-5 seconds | Drifting |
| FAIL | NTP not configured, offset > 5s, or reach = 0 | Unsynchronized |
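The offset column in `ntpq -p` is in milliseconds, and the selected peer is marked with `*`. A sketch applying the drift thresholds above to ntpq output:

```shell
#!/bin/sh
# Classify clock drift from `ntpq -p` output on stdin.
# The selected peer line starts with '*'; offset (ms) is field 9.
classify_ntp() {
    awk '/^\*/ {
             off = $9 + 0
             if (off < 0) off = -off       # drift can be negative
             if (off > 5000)      print "FAIL"
             else if (off > 1000) print "WARN"
             else                 print "PASS"
             found = 1
             exit
         }
         END { if (!found) print "FAIL (no selected peer)" }'
}

# Example: ntpq -p | classify_ntp
```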
What: Verify syslog is forwarding to a remote collector.
# Get syslog config
esxcli system syslog config get
Expected Output:
Default Network Retry Timeout: 180
Dropped Log File Rotation Size: 100
Dropped Log File Rotations: 10
Log Output: /scratch/log
Log To Unique Subdirectory: false
Remote Host: udp://loginsight.lab.local:514
# Check syslog service
esxcli system syslog mark --message="Health Check Test $(date)"
# Then verify it appears on remote collector
| Result | Criteria | Indicator |
|---|---|---|
| PASS | Remote host configured, syslog service running | Healthy |
| WARN | Local logging only (no remote) | Logs may be lost on failure |
| FAIL | Syslog service not running | No logging |
What: Verify the scratch partition points to persistent storage (not ramdisk).
vim-cmd hostsvc/advopt/get ScratchConfig.ConfiguredScratchLocation
vim-cmd hostsvc/advopt/get ScratchConfig.CurrentScratchLocation
Expected Output:
ConfiguredScratchLocation: /vmfs/volumes/datastore1/.locker-<hostname>
CurrentScratchLocation: /vmfs/volumes/datastore1/.locker-<hostname>
If the current scratch location is /tmp/scratch or empty, the host is using ramdisk. Logs and coredumps will be lost on reboot. Set it to persistent storage.
# Check core dump partition
esxcli system coredump partition get
# Check network core dump
esxcli system coredump network get
Expected Output:
Active: true
Configured: true
Partition: naa.xxx:7
# Network dump:
Enabled: true
Host VmkNic: vmk0
Network Server IP: 192.168.1.50
Network Server Port: 6500
| Result | Criteria | Indicator |
|---|---|---|
| PASS | Partition configured and active, or network dump enabled | Can capture PSOD |
| WARN | Only network dump (no local partition) | Depends on network availability |
| FAIL | No dump target configured | PSOD data will be lost |
# Check if SSH is running
/etc/init.d/SSH status
# Check SSH timeout
esxcli system settings advanced list -o /UserVars/ESXiShellInteractiveTimeOut
esxcli system settings advanced list -o /UserVars/ESXiShellTimeOut
| Environment | Expected SSH State | Timeout |
|---|---|---|
| Production | Disabled (enable only for maintenance) | 300-900 seconds |
| Lab | Enabled acceptable | 900 seconds |
vim-cmd hostsvc/hostsummary | grep lockdownMode
Expected Output:
lockdownMode = "lockdownNormal"
| Mode | Description | Recommendation |
|---|---|---|
| lockdownDisabled | No lockdown | Lab only |
| lockdownNormal | Only vCenter can manage host | Production recommended |
| lockdownStrict | vCenter only, no DCUI | High-security environments |
# List all firewall rulesets
esxcli network firewall ruleset list
# Check specific rules
esxcli network firewall ruleset rule list --ruleset-id=sshServer
| Ruleset | Expected State | Purpose |
|---|---|---|
| sshServer | Enabled (maintenance) / Disabled (production) | SSH access |
| webAccess | Enabled | Host client UI |
| vSphereClient | Enabled | vCenter connectivity |
| nsx | Enabled | NSX communication |
| ntpClient | Enabled | NTP synchronization |
| syslog | Enabled | Log forwarding |
# Check the ESXi host certificate
openssl x509 -in /etc/vmware/ssl/rui.crt -noout -dates -subject
Expected Output:
notBefore=Jan 15 00:00:00 2026 GMT
notAfter=Jan 15 00:00:00 2028 GMT
subject=CN = esxi-01.lab.local, ...
| Result | Criteria | Indicator |
|---|---|---|
| PASS | Certificate > 30 days from expiry | Healthy |
| WARN | Certificate 7-30 days from expiry | Plan renewal |
| FAIL | Certificate expired or < 7 days | Renew immediately |
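Days-to-expiry can be computed from the notAfter date for scripted checks. This sketch relies on GNU `date -d`, which ESXi's BusyBox `date` does not support, so run it on an admin workstation; the optional second argument fixes "now" for reproducibility:

```shell
#!/bin/sh
# Days until a certificate expires, given its notAfter date string.
# usage: cert_days_left "<notAfter date>" [now-epoch]
cert_days_left() {
    exp=$(date -d "$1" +%s) || return 1
    now=${2:-$(date +%s)}
    echo $(( (exp - now) / 86400 ))
}

# PASS > 30 days, WARN 7-30 days, FAIL < 7 days (per the table above)
classify_cert() {
    d=$1
    if [ "$d" -lt 7 ]; then echo FAIL
    elif [ "$d" -le 30 ]; then echo WARN
    else echo PASS
    fi
}

# Example:
#   end=$(openssl x509 -in rui.crt -noout -enddate | cut -d= -f2)
#   classify_cert "$(cert_days_left "$end")"
```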
esxcli system account policy get
Expected Output:
Maximum Failed Login Attempts: 5
Unlock Time (seconds): 900
What: Check if VMs are experiencing CPU scheduling delays (CPU ready time) or NUMA/co-stop issues.
# Interactive — press 'c' for CPU view
esxtop
# Batch mode
esxtop -b -d 5 -n 3 > /tmp/esxtop-cpu.csv
Key Columns in CPU View:
| Column | Description | PASS | WARN | FAIL |
|---|---|---|---|---|
| %RDY | CPU Ready % | < 5% | 5-10% | > 10% |
| %CSTP | Co-Stop % | < 3% | 3-5% | > 5% |
| %USED | CPU Used % | < 80% | 80-90% | > 90% |
# Host memory summary
esxcli hardware memory get
# Check ballooning and swap via esxtop ('m' for memory view)
esxtop
Key Memory Metrics:
| Metric | Description | PASS | WARN | FAIL |
|---|---|---|---|---|
| MCTL (Balloon) | Memory reclaimed by balloon driver | 0 MB | > 0 but < 10% of VM memory | > 10% of VM memory |
| SWCUR (Swap used) | Memory swapped to disk | 0 MB | Any swap used | Swap actively in use (SWR/s > 0) |
| ZIP/s (Compression) | Memory compression rate | 0 | Low compression | High compression |
| CACHEUSD | Host cache used | — | — | > 80% of cache |
# Calculate overcommit ratio
# Total VM vCPUs / Physical threads
esxcli hardware cpu global get
vim-cmd vmsvc/getallvms | wc -l
# Total configured VM memory vs physical memory
esxcli hardware memory get
| Resource | PASS | WARN | FAIL |
|---|---|---|---|
| vCPU : pCPU | < 3:1 | 3:1 - 5:1 | > 5:1 |
| vRAM : pRAM | < 1.2:1 | 1.2:1 - 1.5:1 | > 1.5:1 |
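Given the thread count from `esxcli hardware cpu global get` and the summed vCPU count across VMs, the CPU ratio check can be scripted (awk does the division, since POSIX sh arithmetic is integer-only); the function name is illustrative:

```shell
#!/bin/sh
# Classify vCPU:pCPU overcommit per the thresholds above.
# usage: classify_cpu_overcommit <total-vCPUs> <physical-threads>
classify_cpu_overcommit() {
    awk -v v="$1" -v p="$2" 'BEGIN {
        r = v / p
        printf "%.1f:1 %s\n", r, (r > 5) ? "FAIL" : (r > 3) ? "WARN" : "PASS"
    }'
}

# classify_cpu_overcommit 128 64   # 2.0:1 PASS
```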
What: Verify the boot device, boot banks, and acceptance level.
# Boot device
esxcli system boot device get
# Boot bank status
/bin/bootbank-util status
# Acceptance level
esxcli software acceptance get
Expected Output:
Boot Device: mpx.vmhba0:C0:T0:L0
Boot Filesystem UUID: 61234567-abcdef01...
Boot bank: /bootbank (valid)
Alt boot bank: /altbootbank (valid)
Acceptance Level: VMwareCertified
| Result | Criteria | Indicator |
|---|---|---|
| PASS | Both boot banks valid, acceptance VMwareCertified or VMwareAccepted | Healthy |
| WARN | Alt boot bank outdated | Update after next patch |
| FAIL | Primary boot bank corrupt or CommunitySupported acceptance | Remediation needed |
What: Verify the installed ESXi image profile matches the VCF BOM.
# Installed image profile
esxcli software profile get
Expected Output:
Name: (Updated) ESXi-8.0U3-12345678-standard
Vendor: VMware, Inc.
Creation Time: 2026-01-15T00:00:00
Modification Time: 2026-01-15T00:00:00
Stateless Ready: True
# List all installed VIBs
esxcli software vib list | head -30
# Check for specific VIB
esxcli software vib list | grep -i nsx
| Component | Expected Version |
|---|---|
| ESXi | 8.0 U3 (build 12345678+) |
| NSX VIBs | 4.2.x |
| vSAN | Included in ESXi |
| Drivers | Per HCL |
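Comparing the installed profile against the BOM can be scripted by extracting the build number from the profile name. A sketch assuming names of the form `ESXi-8.0U3-<build>-standard`, as in the sample output; the expected BOM build is a placeholder you supply:

```shell
#!/bin/sh
# Extract the build number from an image profile name like
# "ESXi-8.0U3-12345678-standard".
profile_build() {
    echo "$1" | sed -n 's/.*-\([0-9]\{6,\}\)-standard$/\1/p'
}

# usage: check_build <profile-name> <expected-bom-build>
check_build() {
    b=$(profile_build "$1")
    [ -z "$b" ] && { echo "FAIL (cannot parse build)"; return; }
    [ "$b" = "$2" ] && echo "PASS build $b" || echo "WARN build $b != BOM $2"
}

# check_build "ESXi-8.0U3-12345678-standard" 12345678   # PASS build 12345678
```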
| Source | Port | Protocol | Purpose |
|---|---|---|---|
| vCenter | 443 | TCP | Host management |
| vCenter | 902 | TCP | VM console (MKS) |
| vCenter | 8080 | TCP | vSphere Update Manager |
| Admin | 22 | TCP | SSH (when enabled) |
| Admin | 443 | TCP | Host Client UI |
| Admin | 5989 | TCP | CIM/WBEM |
| NSX Manager | 443 | TCP | Host preparation |
| Syslog Collector | 514 | UDP/TCP | Syslog (if push) |
| SNMP Manager | 161 | UDP | SNMP queries |
| Destination | Port | Protocol | Purpose |
|---|---|---|---|
| vCenter | 443 | TCP | vpxa → VPXD |
| vCenter | 80 | TCP | Reverse proxy |
| NSX Manager | 1234 | TCP | MPA |
| NSX Manager | 1235 | TCP | Central CLI |
| NSX Manager | 5671 | TCP | Message bus |
| ESXi (other) | 8000 | TCP | vMotion |
| ESXi (other) | 902 | TCP | NFC (provisioning) |
| ESXi (other) | 2233 | TCP | vSAN transport |
| ESXi (other) | 4789 | UDP | Geneve overlay |
| NTP Server | 123 | UDP | Time sync |
| DNS Server | 53 | TCP/UDP | Name resolution |
| Syslog | 514/6514 | UDP/TCP | Log forwarding |
| Dump Collector | 6500 | TCP | Network core dump |
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Purple screen on console | Kernel panic — driver/firmware bug, hardware failure, memory corruption | Collect core dump; check /var/core/; contact VMware support with dump |
| Recurring PSOD | Driver bug | Update driver/firmware per HCL; check VMware KB |
# Confirm a dump target exists before an incident
esxcli system coredump partition get
# Collect a performance support bundle after a PSOD
vm-support --performance --output /vmfs/volumes/datastore1/

| Symptom | Type | Resolution |
|---|---|---|
| VMs frozen, "APD timeout" in logs | All Paths Down | Check SAN fabric, zoning, HBA; paths auto-recover |
| VMs terminated, "PDL" in logs | Permanent Device Loss | Storage LUN is permanently gone; restore from backup |
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Host shows disconnected in vCenter | Management network issue | Check vmk0 IP, switch port, VLAN; restart management agents |
| vSAN partition event | vSAN network failure | Check vmk2, switch MTU, VLAN trunking |
| vMotion failures | vMotion network | Check vmk1, MTU 9000, IP routing |
# On ESXi host — restart management agents
/sbin/services.sh restart
# If that doesn't work, restart only vpxa
/etc/init.d/vpxa restart
# Check hostd logs for errors
tail -100 /var/log/hostd.log | grep -i error
tail -100 /var/log/vpxa.log | grep -i error
| Symptom | Check | Resolution |
|---|---|---|
| High CPU Ready | `esxtop` %RDY > 10% | Right-size VMs (reduce vCPUs), DRS rebalance |
| Memory ballooning | `esxtop` MCTLSZ > 0 | Add RAM or migrate VMs |
| Storage latency | `esxtop` DAVG > 50ms | Check SAN, move VM to faster storage |
| Network drops | `esxcli network nic stats get -n vmnicX` | Check NIC errors, replace cable/NIC |
| Command | Purpose |
|---|---|
| `esxcli system version get` | ESXi version and build |
| `esxcli system hostname get` | Hostname and domain |
| `esxcli system uuid get` | System UUID |
| `esxcli system boot device get` | Boot device info |
| `esxcli system time get` | Current system time |
| `esxcli system stats uptime get` | Uptime (microseconds) |
| `esxcli hardware platform get` | Hardware platform info |
| `esxcli hardware cpu global get` | CPU count/threads |
| `esxcli hardware memory get` | Total physical memory |
| Command | Purpose |
|---|---|
| `esxcli storage filesystem list` | Datastores and capacity |
| `esxcli storage vmfs extent list` | VMFS extents |
| `esxcli storage core device list` | SCSI devices |
| `esxcli storage core adapter list` | HBA adapters |
| `esxcli storage nmp path list` | Multipath status |
| `esxcli storage nmp device list` | Path policy per device |
| `esxcli storage core adapter rescan --all` | Rescan all storage |
| `esxcli storage nfs list` | NFS datastores |
| Command | Purpose |
|---|---|
| `esxcli network nic list` | Physical NICs |
| `esxcli network nic get -n vmnicX` | NIC detail/driver |
| `esxcli network nic stats get -n vmnicX` | NIC statistics |
| `esxcli network ip interface list` | VMkernel adapters |
| `esxcli network ip interface ipv4 get` | VMkernel IPv4 |
| `esxcli network ip route ipv4 list` | Routing table |
| `esxcli network ip dns server list` | DNS servers |
| `esxcli network vswitch standard list` | Standard vSwitches |
| `esxcli network vswitch dvs vmware list` | Distributed vSwitches |
| `esxcli network firewall ruleset list` | Firewall rules |
| `vmkping -I vmkX <target>` | Ping from VMkernel |
| Command | Purpose |
|---|---|
| `/etc/init.d/hostd status` | Host agent status |
| `/etc/init.d/vpxa status` | vCenter agent status |
| `/etc/init.d/ntpd status` | NTP daemon status |
| `/etc/init.d/SSH status` | SSH service status |
| `/sbin/services.sh restart` | Restart all mgmt agents |
| `esxcli system process list` | Running processes |
| `chkconfig --list` | Service startup config |
| Command | Purpose |
|---|---|
| `esxcli software acceptance get` | Acceptance level |
| `esxcli system account policy get` | Lockout policy |
| `vim-cmd hostsvc/hostsummary \| grep lockdown` | Lockdown mode |
| `esxcli network firewall get` | Firewall status |
| `openssl x509 -in /etc/vmware/ssl/rui.crt -noout -dates` | Cert expiry |
| Command | Purpose |
|---|---|
| `esxcli software profile get` | Current image profile |
| `esxcli software vib list` | All installed VIBs |
| `esxcli software vib install -v <path>` | Install VIB |
| `esxcli software sources profile list -d <depot>` | Available profiles |
| Command | Purpose |
|---|---|
| `esxtop` | Real-time performance |
| `esxtop -b -d 5 -n 3 > /tmp/out.csv` | Batch performance capture |
| `vm-support --performance` | Generate support bundle |
| `vmkbacktrace` | Stack trace |
| `vobd -e` | Event logging |
| `tail -f /var/log/vmkernel.log` | Kernel log (live) |
| `tail -f /var/log/hostd.log` | Host agent log |
| `tail -f /var/log/vpxa.log` | vCenter agent log |
ESXi Hosts Health Check Handbook Version 1.0 | March 2026 © 2026 Virtual Control LLC — All Rights Reserved