Your VCF Environment Deserves Better Than "It Seems Fine"
The Problem You Already Know About
You are running VMware Cloud Foundation. Your environment has vCenter, SDDC Manager, NSX, ESXi hosts, certificates, DNS records, datastores, vSAN clusters, and dozens of interconnected services — all of which need to be healthy for your infrastructure to function.
Right now, how do you know everything is healthy?
You check vCenter manually. You SSH into SDDC Manager. You open NSX and look around. You hope nothing slipped through the cracks. When something does break, you spend hours figuring out what went wrong, tracing it back through logs, and explaining to stakeholders why it happened.
A certificate expired silently. DNS stopped resolving for a management component. A vSAN disk group went degraded and nobody noticed. An SDDC Manager backup has not run in 47 days. A vCenter service crashed and is not running. ESXi hosts drifted out of compliance. Orphaned VM objects are consuming storage nobody can account for. Unclaimed disks are sitting idle while the team debates whether to buy more capacity.
These are not hypothetical problems. They are the problems that wake you up at 2 AM, cost your company thousands in downtime, and erode trust with the clients you serve.
What If You Could Check Everything in Under 5 Minutes?
VCF Health Check connects to your entire VCF environment, runs over 130 individual checks across every major component, and gives you a single grade (A through F) with a detailed report — all in under five minutes.
One click. Under five minutes. Complete visibility.
No manual SSH sessions. No hopping between six management consoles. No spreadsheets of things to remember to check. VCF Health Check does it all automatically, every time, consistently, and tells you exactly what needs attention — with specific remediation steps for every issue found.
What Gets Checked
VCF Health Check is not a generic monitoring tool. It was built specifically for VMware Cloud Foundation environments and understands the relationships between components that matter. Every check was designed to catch a real problem that has caused a real outage somewhere.
Infrastructure and Network (14 Checks)
DNS forward resolution for every management component — vCenter, SDDC Manager, NSX VIP, and NSX Node. DNS reverse resolution (PTR record) consistency for the same four endpoints. SSL certificate expiry monitoring with configurable advance warnings (default: 30 days). Network latency tracking with TCP connect time measurement to vCenter, SDDC Manager, and NSX — reported in a color-coded latency table. ESXi host HTTPS reachability checks running in parallel for speed, plus per-host SSH port verification and coredump configuration validation. Clock drift detection compares time across all management endpoints using HTTP Date headers, warning at 5-second drift and failing at 30-second drift.
If a DNS record is wrong, you will know before it causes a deployment failure. If a certificate is expiring in 30 days, you will know before it causes an outage at 2 AM. If latency to a management endpoint is spiking, you will know before users start complaining.
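The clock drift check described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names are hypothetical, and treating the 5-second and 30-second thresholds as inclusive is an assumption.

```python
from email.utils import parsedate_to_datetime
from typing import Dict

WARN_SECONDS = 5   # warning threshold from the text
FAIL_SECONDS = 30  # failure threshold from the text


def max_drift_seconds(date_headers: Dict[str, str]) -> float:
    """Largest gap, in seconds, between any two endpoints' HTTP Date headers."""
    if len(date_headers) < 2:
        return 0.0  # nothing to compare against
    times = [parsedate_to_datetime(value) for value in date_headers.values()]
    return (max(times) - min(times)).total_seconds()


def classify_drift(seconds: float) -> str:
    """Map a drift value to PASS / WARN / FAIL (inclusive thresholds assumed)."""
    if seconds >= FAIL_SECONDS:
        return "FAIL"
    if seconds >= WARN_SECONDS:
        return "WARN"
    return "PASS"


if __name__ == "__main__":
    # Hypothetical Date headers collected from three management endpoints.
    headers = {
        "vcenter": "Wed, 15 Jan 2025 12:00:00 GMT",
        "sddc-manager": "Wed, 15 Jan 2025 12:00:07 GMT",
        "nsx": "Wed, 15 Jan 2025 12:00:02 GMT",
    }
    drift = max_drift_seconds(headers)
    print(f"max drift {drift:.0f}s -> {classify_drift(drift)}")
```

HTTP Date headers have one-second resolution and the requests are not issued at the same instant, so a sketch like this is only meaningful for drift of several seconds or more, which is what the thresholds target.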
vCenter Server (36 Checks)
The deepest coverage of any VCF health tool:
- API and Identity: API reachability and session creation, version and build detection, VCF BOM compliance validation
- System Health: Overall status (green/yellow/orange/red), CPU load, memory, storage, swap, database health — each as an independent check
- Inventory: VM count, ESXi host connection states (CONNECTED, DISCONNECTED, NOT_RESPONDING)
- Cluster Configuration: HA enabled/disabled per cluster, DRS configuration and mode per cluster
- Storage: Datastore capacity per-datastore with configurable warning (80%) and critical (90%) thresholds, vSAN cluster overall health, vSAN group-level test results, vSAN resync and rebuild status, vSAN disk group health per physical disk, storage policy compliance count, content library sync status
- New Checks: Orphaned VM object detection (orphaned, inaccessible, and invalid connection state VMs consuming storage), unclaimed vSAN-eligible disk identification (disks present but not claimed by any disk group), cluster CPU capacity threshold monitoring (configurable warn/critical percentages), cluster memory capacity threshold monitoring
- Time and Security: NTP synchronization mode and server reachability, root password expiry (days remaining), license expiry and evaluation license detection
- Services: Individual status checks for vpxd, STS, vpostgres, rhttpproxy, SCA, SPS, vsan-health, and vLCM
- Operational Health: Active alarm count and severity, VM snapshot aging (configurable, default 72 hours), VCSA disk partition utilization (warn 80%, critical 90%), 24-hour event log error scan, ESXi hardware sensor health (parallelized), ESXi scratch partition check, DRS activity and migration tracking
36 checks. Every one of them matters. Every one of them has caused a real outage somewhere.
New in v9.0: vCenter alarm summary queries all active alarms via PropertyCollector, reporting critical and warning counts. ESXi BOM compliance validates each host build number against a customizable `vcf-bom-reference.json` reference file. Password expiry monitoring covers SSO accounts and ESXi root credentials via the SDDC credential store. vSAN resync and rebuild detection catches active resync operations that indicate data movement. Clock drift detection via HTTP Date header comparison identifies time synchronization issues. SDDC upgrade readiness queries the `/v1/upgradables` API to detect available updates.
SDDC Manager (19 Checks)
- Connectivity: API reachability with retry and exponential backoff (automatic 5-second retry on first failure)
- Status: Version detection, NSX and vCenter status from SDDC Manager's perspective, management component health per-component deployment status
- Security: Certificate inventory with 60-day warning threshold, credential and password expiry monitoring
- Backup: Backup age verification (configurable, default 48 hours), backup success/failure status
- Lifecycle: System prechecks compliance, in-progress task count and stale task detection (stuck >24 hours), resource lock detection, host drift detection (ASSIGNED vs COMMISSIONED state), depot connectivity verification, lifecycle update availability
- Auto-Remediation: Optional automatic cancellation of stale tasks blocking the lifecycle queue
New in v9.0: upgrade readiness checking via the upgradables API, enhanced backup age and size tracking with configurable thresholds, and credential store password expiry monitoring.
SDDC Manager is the control plane of your entire VCF deployment. If it is unhealthy, everything downstream is at risk. A single stuck task can block every lifecycle operation in the environment.
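The retry-with-backoff behavior of the connectivity check can be sketched as follows. Only the 5-second first retry comes from the text; the retry count, the doubling factor, and the `call_with_backoff` name are assumptions for illustration.

```python
import time
from typing import Callable


def call_with_backoff(
    probe: Callable[[], bool],
    retries: int = 3,            # assumed retry count
    first_delay: float = 5.0,    # 5-second first retry, per the text
    factor: float = 2.0,         # assumed doubling between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it succeeds or the retries are exhausted."""
    delay = first_delay
    for attempt in range(retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(delay)      # wait before the next attempt
            delay *= factor   # exponential growth: 5s, 10s, 20s, ...
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in normal use the default `time.sleep` applies.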
NSX Manager (13 Checks)
API reachability with VIP-first and individual node fallback, version detection, control cluster stability, management cluster stability, transport node state (connected count vs total), critical and warning alarm counts, admin password expiry, edge cluster member health per edge cluster, distributed firewall rule count with configurable threshold (default 500), certificate validation, and transport zone membership count.
Network topology validation verifies transport zone and node state health. Transport zone membership counts confirm proper host bindings.
NSX problems cascade fast. A failing transport node or a degraded control cluster can take down your entire network overlay in minutes. A DFW rule count exceeding 500 signals a rule sprawl problem that will impact performance and make troubleshooting a nightmare.
VCF Operations and Aria Suite (4 Checks)
Suite API reachability and token acquisition, node online status per cluster member, collector health verification, and adapter data receiving verification (active adapters vs total).
Fleet and vRSLCM (2 Checks)
API reachability (through VCF Operations proxy or direct endpoint) and environment deployment status.
Custom Plugin Checks
Need to check something specific to your environment? Drop a shell script in the plugin directory. VCF Health Check auto-discovers plugins, runs them, parses their output, and integrates results directly into the grading, scoring, and reporting system. No code changes required.
The Report That Changes How You Work
After every health check, VCF Health Check generates comprehensive reports automatically. Not a wall of text. Not a raw log dump. Professionally designed, interactive reports that you can hand to your CTO, your client, or your operations team.
Interactive HTML Report
A single self-contained HTML file with no external dependencies. Open it in any browser. Everything is inline — CSS, JavaScript, SVG charts, and all data.
Visualization:
- Donut SVG chart showing pass/warn/fail proportions at a glance
- Per-component trend line charts with grade annotations and trend statistics card showing improvement trajectories
- Score sparkline trend chart showing your health trajectory over runs
- 90-day heatmap calendar — a color-coded grid showing your daily health grade for the past three months
- SVG dependency map showing component relationships with health-colored nodes and edges
- SLA uptime tracker with per-component uptime percentages and mini timeline bars
- Network latency table with color-coded performance badges (good/warn/slow)
- Component health cards grid with grade, score, and pass/warn/fail counts per component
Navigation and Filtering:
- Dark mode and light mode toggle (button or keyboard shortcut D)
- Executive View (high-level summary for leadership) and Technical View (full details for engineers) — toggle with keyboard shortcut E
- Pass, Warning, and Failure filter buttons to focus on what matters
- Sidebar navigation with anchor links to every section
- Collapsible cards for each section to reduce scrolling
Comparison:
- Diff view to compare any two historical runs side by side and see exactly what changed, what improved, and what regressed
Actionable Output:
- Remediation checklist with checkboxes per issue, estimated time-to-fix badges per item, grouped by component
- Playbook export button (keyboard shortcut P) — downloads the entire remediation checklist as a text file ready to hand to an engineer
- Copy executive summary button for pasting into emails or tickets
- CSV export button for loading into spreadsheets
- Print button with print-optimized CSS stylesheet
Usability:
- Keyboard shortcuts panel for power users
- Mobile-responsive CSS with breakpoints at 768px and 480px for checking reports on phones and tablets
- Auto-remediation log section (when `--fix` was used)
All 10 Report Formats
| Format | Generated | Description |
|---|---|---|
| Interactive HTML | Always | Self-contained file with all features above |
| JSON | Always | Machine-readable structured data for integrations |
| Plain Text | Always | Terminal-friendly with ASCII trend line |
| PDF | Auto (if Chrome/Edge available) | Print-ready from the HTML report |
| CSV | `--csv` flag | One row per check for spreadsheets |
| Markdown | `--markdown` flag | Tables grouped by component for wikis |
| Prometheus/OpenMetrics | `--prometheus` flag | Gauge metrics for Prometheus scraping |
| Ansible Inventory | `--ansible` flag | YAML groups with host variables |
| Multi-Environment Dashboard | `--merge-reports` flag | Consolidated grid of all environments |
| Config Backup | `--backup-config` flag | Topology and settings JSON (no credentials) |
Plus: a JSON Schema file (draft 2020-12) generated alongside every JSON report for downstream validation.
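To give a feel for the Prometheus/OpenMetrics format, here is a minimal sketch of gauge output in the Prometheus text exposition format. The metric names (`vcf_health_score`, `vcf_health_checks`) and the `to_prometheus_text` helper are hypothetical; the names the tool actually emits may differ.

```python
from typing import Dict


def to_prometheus_text(environment: str, score: float, counts: Dict[str, int]) -> str:
    """Render a score and pass/warn/fail counts as Prometheus gauge metrics.

    Metric names are illustrative assumptions. Label values are not escaped
    here, so the environment name must not contain quotes or backslashes.
    """
    lines = [
        "# HELP vcf_health_score Overall health score as a percentage.",
        "# TYPE vcf_health_score gauge",
        f'vcf_health_score{{environment="{environment}"}} {score}',
        "# HELP vcf_health_checks Number of checks by result.",
        "# TYPE vcf_health_checks gauge",
    ]
    for result in ("pass", "warn", "fail"):
        value = counts.get(result, 0)
        lines.append(
            f'vcf_health_checks{{environment="{environment}",result="{result}"}} {value}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_prometheus_text("prod", 92.5, {"pass": 80, "warn": 5, "fail": 3}), end="")
```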
Alerts Where You Already Work
VCF Health Check integrates with ten notification channels:
| Channel | How It Works |
|---|---|
| Email (SMTP) | HTML report attached. Tries msmtp, sendmail, then curl SMTP. Grade and score in subject line. |
| Slack | Color-coded attachment block. Grade, score, failure list, environment name. |
| Microsoft Teams | Adaptive Card with fact rows for grade, score, failures, warnings, date. |
| PagerDuty | Events API v2. Auto-creates incidents on failures. Dedup key prevents duplicates. Auto-resolves when grade returns to A/B. |
| OpsGenie | Alerts API with priority mapping (F=P1 through A=P5). Tags include component names. |
| Syslog (RFC 5424) | UDP to configurable host:port. One message per failed check. Compatible with Splunk, ELK, and any SIEM. |
| Custom Webhook | HTTP POST with full JSON payload. Connect to ServiceNow, Grafana, Datadog, or anything with an API. |
| ServiceNow | The `--servicenow` flag auto-creates incidents from health check failures, mapping the health grade to ServiceNow priority levels. Configure with `SERVICENOW_INSTANCE`, `SERVICENOW_USER`, and `SERVICENOW_PASS` environment variables. |
| Jira | The `--jira` flag auto-creates Jira issues from failures. Configure with `JIRA_URL`, `JIRA_USER`, `JIRA_TOKEN`, and `JIRA_PROJECT` environment variables. |
| Email Digest | The `--email-report` flag sends the full HTML report as an email attachment with grade and score summary in the subject line. |
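For the syslog channel, the sketch below shows what one RFC 5424 message per failed check might look like on the wire. The facility (local0), the severity (error), and the `vcf-health-check` application name are assumptions; the header layout and the priority calculation follow RFC 5424.

```python
import socket
from datetime import datetime, timezone

FACILITY_LOCAL0 = 16  # assumed facility
SEVERITY_ERROR = 3    # assumed severity for a failed check


def rfc5424_message(hostname: str, app_name: str, text: str, when: datetime,
                    facility: int = FACILITY_LOCAL0,
                    severity: int = SEVERITY_ERROR) -> str:
    """Build '<PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG'."""
    pri = facility * 8 + severity  # PRI value as defined by RFC 5424
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # PROCID, MSGID, and STRUCTURED-DATA are left as the nil value "-".
    return f"<{pri}>1 {timestamp} {hostname} {app_name} - - - {text}"


def send_udp(message: str, host: str, port: int = 514) -> None:
    """Send one syslog message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))


if __name__ == "__main__":
    msg = rfc5424_message("hc01", "vcf-health-check",
                          "FAIL: SDDC Manager backup is older than 48 hours",
                          datetime.now(timezone.utc))
    print(msg)
```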
Threshold-based alerting: Set a grade threshold (default: C) and only get notified when the grade drops to that level or below. No noise. No alert fatigue. Just the signal you need.
Severity mapping: PagerDuty and OpsGenie automatically map the health grade to severity levels. An F grade pages the on-call as P1/Critical. A C grade creates a warning. A/B grades resolve existing incidents automatically.
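The threshold and severity rules above can be sketched as follows. The payload fields (`routing_key`, `event_action`, `dedup_key`, `payload.summary`, `payload.source`, `payload.severity`) are the standard PagerDuty Events API v2 fields. The severity chosen for a D grade, the dedup key format, and the function names are assumptions.

```python
GRADE_ORDER = ["A", "B+", "B", "C", "D", "F"]  # best to worst

# OpsGenie priority mapping from the text (F=P1 through A=P5).
OPSGENIE_PRIORITY = {"A": "P5", "B": "P4", "C": "P3", "D": "P2", "F": "P1"}

# PagerDuty severities; F and C come from the text, D is an assumption.
PAGERDUTY_SEVERITY = {"C": "warning", "D": "error", "F": "critical"}


def should_notify(grade: str, threshold: str = "C") -> bool:
    """Notify only when the grade is at the threshold or worse."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(threshold)


def pagerduty_event(routing_key: str, environment: str, grade: str, score: float) -> dict:
    """Build an Events API v2 body: trigger on C/D/F, resolve on A/B."""
    base = grade.rstrip("+")                    # treat B+ like B
    dedup_key = f"vcf-health-{environment}"     # hypothetical key format
    if base in ("A", "B"):
        return {"routing_key": routing_key, "event_action": "resolve",
                "dedup_key": dedup_key}
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,  # same key on every run prevents duplicates
        "payload": {
            "summary": f"VCF health grade {grade} ({score:.1f}%) in {environment}",
            "source": environment,
            "severity": PAGERDUTY_SEVERITY[base],
        },
    }
```

Because the dedup key depends only on the environment, a later A or B run resolves the same incident that an earlier failing run opened.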
A Desktop Application Built for Operations Teams
VCF Health Check is a professional desktop application — not a script you run in a terminal. It is built with Python and Tkinter, runs on Windows, macOS, and Linux, and provides a complete graphical interface for every feature.
Splash Screen and First Impressions
The application launches with an animated splash screen — gradient background, floating particle effects, progress bar, and fade-in branding. It is a small detail, but it sets the tone: this is a professional tool, not a weekend project.
Dashboard
A single screen showing your current health grade as a large color-coded badge (A green through F red), numeric score percentage, component-by-component health cards in a grid layout, executive summary text, previous grade comparison, and a score trend chart with grade zone shading. If you manage multiple profiles, a multi-environment overview section shows every environment's grade at a glance.
Two quick-action buttons: "Run New Check" and "View Reports."
Environment Configuration
Nine collapsible form sections covering every configuration option:
- Management Endpoints: vCenter, SDDC Manager, NSX VIP, NSX Node, VCF Operations, Fleet/vRSLCM
- Credentials: SSO, NSX, VCF Operations, Fleet, ESXi — each with show/hide password toggle
- ESXi Hosts: Space-separated list of host IPs/FQDNs
- Timeouts: Per-component configurable timeouts (Infrastructure, vCenter, SDDC, NSX, Operations, Fleet)
- Thresholds: Certificate warning days, datastore warning/critical %, task count, snapshot hours, backup hours, DFW rule count, cluster CPU warn/critical %, cluster memory warn/critical %
- Notifications: SMTP settings, Slack webhook, Teams webhook, generic webhook, notify threshold grade
- Incident Management: PagerDuty routing key, OpsGenie API key
- Scoring Weights: Per-component weight multiplier for health score calculation
- Customer Branding: Company name, logo file, contact email, environment label
Every field has a tooltip on hover. Input validation highlights errors (IP/FQDN format, email format, URL format, numeric ranges). Unsaved-changes detection warns before navigating away.
Run Options: Checkboxes for `--fix`, `--cleanup-tasks`, `--diff`, `--csv`, `--markdown`, `--quiet`, and `--json-only`.
Profile Management
Managing multiple VCF environments is first-class:
- Save / Load: Named profiles stored with encrypted credentials
- Clone: Duplicate a profile for a similar environment
- Import .env / Export .env: Exchange configurations with the bash engine
- Import JSON / Export JSON: Portable configuration backup
- Reset to Defaults: One-click return to default values
- Profile Dropdown: Switch between environments instantly from the sidebar
Operators get read-only access to profiles. Admins get full create/save/delete/clone/import/export.
Run Check
- "Run Full Health Check" button to execute all checks
- "Validate Only" button to test connectivity without running checks
- "Stop" button to terminate a running check
- Live terminal output with color-coded results (green=PASS, yellow=WARN, red=FAIL)
- Elapsed timer updating every second
- In-GUI Scheduling: Dropdown for Off / Every 30 Minutes / Every Hour / Every 4 Hours / Daily
- After completion: one-click buttons to open TXT, HTML, JSON, CSV, and Markdown reports
Reports View
Scrollable list of all historical report sets. Each row shows the grade badge, numeric score, date/time, and pass/warn/fail/total counts. One-click buttons to open HTML, JSON, TXT, or PDF reports. Admin-only "Cleanup Old Reports" button deletes reports older than the retention threshold.
Run History and Trends
- Statistics Card: Total runs, latest grade, trend arrow (improving/stable/declining), average score, best grade, worst grade
- Score Trend Chart: Line graph with grade zone bands (A green / B yellow-green / C orange / D dark-orange / F red), data points, area fill, date labels
- Grade Distribution: Count of runs at each grade level
Suppressions (Admin Only)
Manage known issues that should not count as failures. Each suppression rule has a regex pattern matched against check messages and a note field for the reason or ticket reference. Matched checks are recorded as SKIP and excluded from the failure count. Changes persist to known-issues.json.
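A suppression pass of this kind can be sketched in a few lines. The JSON schema shown (a list of objects with `pattern` and `note` keys), the check dictionary shape, and the choice to leave passing checks untouched are all assumptions; the actual layout of known-issues.json is not documented here.

```python
import json
import re
from typing import Dict, List


def load_rules(path: str) -> List[Dict[str, str]]:
    """Read suppression rules from a JSON file (hypothetical schema)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def apply_suppressions(checks: List[Dict[str, str]],
                       rules: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return a copy of checks with matching non-PASS results marked SKIP."""
    compiled = [(re.compile(rule["pattern"]), rule.get("note", "")) for rule in rules]
    result = []
    for check in checks:
        updated = dict(check)
        if check["status"] != "PASS":  # assumption: passing checks are never suppressed
            for pattern, note in compiled:
                if pattern.search(check["message"]):
                    updated["status"] = "SKIP"
                    updated["suppressed_by"] = note  # reason or ticket reference
                    break
        result.append(updated)
    return result
```

For example, a rule with the pattern `backup is older than` and the note `CHG-1234` would turn a matching backup-age failure into a SKIP while leaving an unrelated service failure counted.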
User Management (Admin Only)
- Add users with username, role (admin/operator), and initial password
- Reset passwords for any user
- Delete users (cannot delete the last admin)
- User list shows username and role tag
Audit Log (Admin Only)
Terminal-style scrollable display of the last 500 audit entries. Every action is logged: login, logout, password changes, profile operations, health check runs, report access, user management, suppression changes, settings modifications, license activation.
Filter/search input, refresh button, clear log button, and CSV export for external analysis.
Settings (Admin Only)
- Script Paths: Health check script and bash interpreter with browse and auto-detect buttons
- Appearance: System, Light, or Dark theme (sidebar always stays dark)
- Security: Password expiry days and session timeout minutes
- License: License key input, activate button, status display (Active / Expired / Trial / Grace Period)
- LDAP/AD: Server URL, Base DN, Bind credentials, Enable checkbox (only visible if ldap3 package installed)
Help View
Built-in 13-section guide covering every feature: Getting Started, Dashboard, Environment, Run Check, Reports, Run History, Suppressions, Users, Audit Log, Settings, Scheduling, Troubleshooting, and System Requirements. No external documentation needed.
About View
Application version, copyright, license status with expiry date, system information (Python version, platform, Tkinter version, paths, profile count, user count), and contact information. Accessible via sidebar navigation or F1 keyboard shortcut.
Security Built for Enterprise
Credential Encryption
All profile credentials encrypted at rest using Fernet symmetric encryption (from the cryptography Python package). A machine-local encryption key is generated at ~/.vcf-hc-key with chmod 600 permissions. Legacy base64-only encoding is automatically detected and upgraded to Fernet on the next profile save. Exported .env files are created with chmod 600.
Password Hashing
User passwords hashed with PBKDF2-HMAC-SHA256 using 310,000 iterations and a 16-byte random salt unique to each password. Legacy hash formats (SHA-256 unsalted, lower-iteration PBKDF2) are automatically detected and upgraded after successful login.
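The hashing scheme described above maps directly onto Python's standard library. The sketch below uses the stated parameters (PBKDF2-HMAC-SHA256, 310,000 iterations, 16-byte random salt); the function names and the way salt and digest are returned are illustrative assumptions, not the application's storage format.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 310_000  # from the text
SALT_BYTES = 16       # from the text


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```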
Role-Based Access Control
Two roles: admin and operator. Admins have full access to all views and operations. Operators can run checks and view reports but cannot modify settings, manage users, edit suppressions, or delete profiles.
Authentication and Session Security
- Login required on application start
- Session timeout with automatic lock and re-authentication dialog (configurable minutes)
- Brute-force lockout: 5 failed attempts triggers a 15-minute account lock with timestamp tracking
- Password expiry with configurable days and login-time warnings
- Change Password available to any logged-in user
- Optional LDAP/Active Directory authentication for enterprise environments
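The brute-force lockout rule in the list above (5 failed attempts, 15-minute lock) can be sketched as a small in-memory tracker. The `LoginGuard` class is hypothetical, and resetting the failure counter when the lock expires is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict

MAX_ATTEMPTS = 5                       # from the text
LOCK_DURATION = timedelta(minutes=15)  # from the text


class LoginGuard:
    """Tracks failed logins per user and enforces a timed lock."""

    def __init__(self) -> None:
        self.failures: Dict[str, int] = {}
        self.locked_until: Dict[str, datetime] = {}

    def is_locked(self, user: str, now: datetime) -> bool:
        until = self.locked_until.get(user)
        if until is None:
            return False
        if now < until:
            return True
        # Lock has expired: clear it and start counting again (assumed behavior).
        del self.locked_until[user]
        self.failures[user] = 0
        return False

    def record_failure(self, user: str, now: datetime) -> None:
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= MAX_ATTEMPTS:
            self.locked_until[user] = now + LOCK_DURATION  # timestamped lock

    def record_success(self, user: str) -> None:
        self.failures.pop(user, None)
        self.locked_until.pop(user, None)
```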
Audit Trail
Every action logged to vcf-health-audit.log with ISO timestamp, username, role, action category, and detail. Exportable to CSV for compliance reporting. Admin-only access.
Shell Script Security
Lock file prevents concurrent execution. All temporary files written to mktemp-based directories. Credentials passed via environment variables, never as command-line arguments. SSL verification configurable with custom CA bundle support.
The Grading System
VCF Health Check does not give you a wall of data. It gives you a single letter grade that instantly communicates the state of your environment to anyone.
| Grade | Criteria | What It Means |
|---|---|---|
| A | Score >= 95%, zero failures, zero warnings | Everything is healthy. No action needed. |
| B+ | Score >= 90%, zero failures; or >= 85% with <= 1 failure | Minor issues only. Review at your convenience. |
| B | Zero failures at any score; or >= 80% with <= 3 failures | Warnings present but no critical problems. Plan remediation. |
| C | Score >= 70% with <= 5 failures | Multiple issues need attention this week. |
| D | Score >= 50% | Significant problems. Address immediately. |
| F | Score < 50% | Critical state. Environment at risk of outage. |
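Read top to bottom, the grading table translates into a short decision function. This is a sketch under one assumption: the rows are evaluated in order and the first matching row wins. The `grade` function name is illustrative.

```python
def grade(score: float, failures: int, warnings: int) -> str:
    """Return the letter grade for a score (0-100) and failure/warning counts."""
    if score >= 95 and failures == 0 and warnings == 0:
        return "A"
    if (score >= 90 and failures == 0) or (score >= 85 and failures <= 1):
        return "B+"
    if failures == 0 or (score >= 80 and failures <= 3):
        return "B"
    if score >= 70 and failures <= 5:
        return "C"
    if score >= 50:
        return "D"
    return "F"


if __name__ == "__main__":
    # A high score with warnings but no failures cannot reach A.
    print(grade(97, failures=0, warnings=2))
    # More than five failures rules out C even at 75%.
    print(grade(75, failures=6, warnings=0))
```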
Weighted Scoring: Each component carries a configurable weight multiplier. By default, vCenter, SDDC Manager, and NSX are weighted 2x (because failures in these components cascade). Infrastructure, VCF Operations, and Fleet are weighted 1x. Adjust weights to match your environment's priorities.
Formula: Sum of (component_pass_checks × component_weight) divided by sum of (component_total_checks × component_weight).
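As a worked example of the weighted formula, the sketch below computes a score from per-component pass and total counts using the default weights. The `weighted_score` name and the input shape are assumptions for illustration.

```python
from typing import Dict, Tuple

# Default weights as described in the text.
DEFAULT_WEIGHTS = {
    "Infrastructure": 1, "vCenter": 2, "SDDC Manager": 2,
    "NSX": 2, "VCF Operations": 1, "Fleet": 1,
}


def weighted_score(results: Dict[str, Tuple[int, int]],
                   weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """results maps component name -> (passed_checks, total_checks)."""
    passed = sum(p * weights.get(name, 1) for name, (p, _) in results.items())
    total = sum(t * weights.get(name, 1) for name, (_, t) in results.items())
    return 100.0 * passed / total if total else 0.0


if __name__ == "__main__":
    # vCenter passes 3 of 4 checks (weight 2), Infrastructure passes 2 of 2 (weight 1):
    # (3*2 + 2*1) / (4*2 + 2*1) = 8 / 10 = 80%
    print(weighted_score({"vCenter": (3, 4), "Infrastructure": (2, 2)}))
```

Without weighting, the same results would score 5 of 6, or about 83%. The single vCenter failure costs more because vCenter is weighted 2x.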
Automation That Runs While You Sleep
OS-Level Scheduling
| Platform | How | One Command |
|---|---|---|
| Windows | Task Scheduler | `--schedule` generates XML and registers the task |
| Linux (cron) | crontab | `--cron daily` generates the cron entry |
| Linux (systemd) | Timer + Service | `--cron` generates both unit files |
| macOS | crontab | `--cron` generates the entry |
In-GUI Scheduling
The Run Check view includes a scheduling dropdown: Off, Every 30 Minutes, Every Hour, Every 4 Hours, or Daily. Select a frequency and the application runs health checks automatically in the background while the GUI stays open.
Auto-Remediation
Some problems have obvious, safe fixes. Enable the `--fix` flag and VCF Health Check will:
- Cancel stale SDDC Manager tasks blocking the lifecycle queue
- Apply safe configuration corrections for known drift conditions
- Acknowledge stale vCenter alarms older than 72 hours
- Clean up failed vCenter tasks older than 7 days
- Restart stopped NSX services
- Report a summary of all fix actions taken
For everything else, the remediation playbook tells you exactly what to do — step by step, with estimated time to fix each issue, grouped by component, with checkboxes you can tick off as you work through them.
Built for MSPs and Service Providers
If you manage VCF environments for multiple clients, VCF Health Check was designed with you in mind.
Branded Reports
Put your company logo and name on every health check report. Configure company name, logo file, and contact email per profile. Your clients see your brand, not ours. Every report — HTML, PDF, TXT — carries your branding.
Client Tracking
Organize environments by client using the optional client management module. Track usage per client. Generate billing reports per client. Know exactly how much monitoring work you are doing for each customer.
Usage Analytics and Billing
The optional usage tracker module shows run counts per client and environment, with CSV, JSON, and billing summary exports. Built-in data for invoicing without manual tracking.
White Label
At the Enterprise tier, completely rebrand VCF Health Check as your own product. Your name, your branding, your product — powered by our engine.
API Export
Pull health check data programmatically into your own dashboards. The JSON report, Prometheus metrics, and webhook integrations let you feed data into ServiceNow, Grafana, Datadog, or any system your operations team uses.
Multi-Environment Visibility
Managing production, staging, and development environments? Monitoring VCF deployments across multiple data centers or client sites?
Multi-Environment Dashboard
The `--merge-reports` flag takes a directory of JSON reports from multiple environments and generates a single dark-themed HTML dashboard. Each environment appears as a card showing: environment name, grade badge, numeric score, last run date, and pass/warn/fail counts. One page, every environment, instant visibility.
In-GUI Multi-Environment Overview
When you have more than one profile configured, the Dashboard view automatically shows a multi-environment section with cards for every profile — grade, score, and last run date. No extra configuration needed.
Configuration That Fits Your Environment
Configurable Thresholds
Every threshold is adjustable. Nothing is hardcoded:
| Threshold | Default | What It Controls |
|---|---|---|
| Certificate Warning | 30 days | How far in advance to warn about expiring certificates |
| Datastore Warning | 80% | Datastore capacity warning level |
| Datastore Critical | 90% | Datastore capacity critical level |
| Cluster CPU Warning | 70% | Cluster CPU utilization warning |
| Cluster CPU Critical | 85% | Cluster CPU utilization critical |
| Cluster Memory Warning | 70% | Cluster memory utilization warning |
| Cluster Memory Critical | 85% | Cluster memory utilization critical |
| Snapshot Warning | 72 hours | VM snapshot age before warning |
| Backup Warning | 48 hours | SDDC Manager backup age before warning |
| DFW Rule Warning | 500 rules | NSX firewall rule count before warning |
| Task Warning | 5 tasks | In-progress SDDC task count before warning |
| Report Retention | 30 days | How long to keep historical reports |
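The warning/critical pairs in this table all follow the same pattern, sketched below for the percentage-based thresholds. The metric keys, the `classify` helper, inclusive comparisons, and mapping "critical" to a FAIL result are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    warn: float      # percent used at which a warning is raised
    critical: float  # percent used at which the check is critical


# Defaults taken from the table above.
DEFAULTS: Dict[str, Threshold] = {
    "datastore": Threshold(warn=80, critical=90),
    "cluster_cpu": Threshold(warn=70, critical=85),
    "cluster_memory": Threshold(warn=70, critical=85),
}


def classify(metric: str, used_percent: float,
             thresholds: Dict[str, Threshold] = DEFAULTS) -> str:
    """Return PASS, WARN, or FAIL for a utilization percentage."""
    limit = thresholds[metric]
    if used_percent >= limit.critical:
        return "FAIL"
    if used_percent >= limit.warn:
        return "WARN"
    return "PASS"
```

Overriding a default is a matter of passing a different mapping, for example `classify("datastore", 83, {"datastore": Threshold(85, 95)})`.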
Scoring Weights
| Component | Default Weight | Why |
|---|---|---|
| Infrastructure | 1x | Foundation layer — DNS, certs, network |
| vCenter | 2x | Core compute management — failures cascade |
| SDDC Manager | 2x | Lifecycle control plane — failures block operations |
| NSX | 2x | Network overlay — failures isolate workloads |
| VCF Operations | 1x | Monitoring — failure reduces visibility but not operations |
| Fleet | 1x | Aria lifecycle — failure blocks updates but not operations |
Adjust weights to match your priorities. A financial services company might weight NSX at 3x because microsegmentation is compliance-critical. An MSP might weight vCenter at 3x because it is the primary client-facing component.
Expected Down Components
If a component is intentionally offline (maintenance, decommissioning, not yet deployed), use the `--known-down` flag or the EXPECTED_DOWN configuration. Checks against that component are suppressed so they do not drag down the grade.
CLI Power
| Flag | What It Does |
|---|---|
| `--only COMPONENT` | Run checks for one component only |
| `--skip COMPONENT` | Skip one component |
| `--known-down COMPONENT` | Suppress failures for an expected-down component |
| `--env FILE` | Load configuration from a .env file |
| `--fix` | Auto-remediate safe issues |
| `--cleanup-tasks` | Cancel stale SDDC Manager tasks |
| `--diff` | Compare current run to previous run |
| `--validate` | Test connectivity only, no checks |
| `--json-only` | Skip HTML/TXT, generate JSON only |
| `--csv` | Generate CSV report |
| `--markdown` | Generate Markdown report |
| `--prometheus` | Emit Prometheus metrics to stdout |
| `--ansible` | Generate Ansible inventory YAML |
| `--archive` | Compress report set into tar.gz |
| `--syslog HOST:PORT` | Override syslog destination |
| `--backup-config` | Export topology and settings JSON |
| `--merge-reports DIR` | Merge reports into multi-environment dashboard |
| `--quiet` | Suppress terminal output |
| `--schedule` | Register Windows Task Scheduler task |
| `--cron [daily/hourly]` | Generate cron entry and systemd timer |
| `--servicenow` | Create ServiceNow incidents from failures |
| `--jira` | Create Jira issues from failures |
| `--email-report` | Send HTML report as email attachment |
Architecture
VCF Health Check is a desktop application that runs entirely on your infrastructure. No cloud. No agents. No SaaS. It uses the existing management APIs that are already exposed by every VCF component.
How It Works
1. You configure your environment endpoints and credentials in the GUI or a .env file
2. You click "Run Full Health Check" (or the scheduler triggers it)
3. The bash engine authenticates against every component API and runs all checks
4. Infrastructure checks run sequentially. vCenter checks run sequentially (API session reuse). SDDC Manager, NSX, VCF Operations, and Fleet checks run simultaneously in parallel background subshells. ESXi per-host checks are parallelized with background process tracking. Python ESXi sensor checks use ThreadPoolExecutor with 8 concurrent workers.
5. The Python report generator reads all results and produces HTML, JSON, and text reports
6. Notifications fire based on the grade threshold
7. Results are stored for historical trending
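The per-host parallelism mentioned in step 4 can be sketched with the standard library. Only the use of `ThreadPoolExecutor` with 8 workers comes from the text; the `check_hosts` helper and the shape of the per-host check function are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

MAX_WORKERS = 8  # concurrent workers, per the text


def check_hosts(hosts: List[str], check_one: Callable[[str], str]) -> Dict[str, str]:
    """Run check_one against every host concurrently and collect the results.

    pool.map yields results in input order, so zip pairs each host
    with its own result even though the checks finish out of order.
    """
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return dict(zip(hosts, pool.map(check_one, hosts)))


if __name__ == "__main__":
    # Stand-in for a real sensor query; a real check would call the host's API.
    def fake_sensor_check(host: str) -> str:
        return "PASS" if host.endswith("1") else "WARN"

    print(check_hosts(["esxi01", "esxi02"], fake_sensor_check))
```

Threads suit this workload because each check spends most of its time waiting on the network rather than using the CPU.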
Codebase
| File | Lines | Language | Role |
|---|---|---|---|
| vcf-health-check-gui.py | 6,389 | Python / Tkinter | Desktop GUI application |
| vcf-health-check.sh | 3,757 | Bash | Health check engine |
| vcf_checks.py | 1,053 | Python | Report generator and vCenter API checks |
| Total | 11,199 | | |
267 automated tests across two test suites validate every feature.
System Requirements
| Requirement | Detail |
|---|---|
| Python | 3.8 or later |
| Bash | 4.0 or later (Git Bash on Windows) |
| curl | For all API calls |
| OpenSSL | For certificate expiry checks |
| nslookup | For DNS checks |
| Network | HTTPS (443) to all VCF management endpoints |
| Optional: cryptography | Python package for credential encryption |
| Optional: ldap3 | Python package for LDAP/AD authentication |
| Optional: Chrome/Edge | For PDF report generation |
Runs on Windows 10/11, macOS, and any Linux distribution. No installation required — run the files directly. No cloud dependency. All data stays on your infrastructure.
What Makes This Different
Built exclusively for VMware Cloud Foundation. Not adapted from a generic monitoring platform. Not a plugin for a larger tool. Every one of the 150+ checks was designed specifically for VCF components and the relationships between them. Orphaned VM detection. Unclaimed vSAN disk identification. Cluster capacity thresholds. SDDC Manager stale task detection. BOM compliance. These are VCF problems that generic tools do not know to check.
Runs in under 5 minutes. A complete health audit of your entire VCF environment in under five minutes. SDDC Manager, NSX, VCF Operations, and Fleet checks run in parallel, and per-host ESXi checks are parallelized as well.
Gives you a grade. Not a wall of data. A single letter grade (A through F) that instantly communicates the state of your environment to anyone — from the engineer who needs to fix things to the executive who needs to report on them.
Tells you how to fix things. Not just what is wrong. Every failure and warning comes with specific remediation steps, estimated time to resolve, and a downloadable playbook checklist you can hand to an engineer and say "fix these."
Works offline. No cloud dependency. No data leaving your network. No SaaS subscription that requires internet access. Install it, run it, keep your data on your infrastructure.
Generates reports you can actually use. Ten report formats. Hand the HTML to a client. Feed the JSON to ServiceNow. Scrape the Prometheus metrics with Grafana. Generate an Ansible inventory from your live environment. Export a remediation playbook. Print the PDF and pin it to the wall. The reports are designed to be useful, not just complete.
Enterprise security from day one. Fernet-encrypted credentials, PBKDF2 password hashing with 310,000 iterations, role-based access control, brute-force lockout, session timeouts, full audit trail, optional LDAP integration. This is not a script with passwords in a text file.
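For readers unfamiliar with PBKDF2, the sketch below shows what hashing a password with 310,000 iterations looks like using Python's standard library. The choice of SHA-256, the 16-byte salt, and the function names are assumptions made for illustration; the text above states only the algorithm family and the iteration count.

```python
import hashlib
import hmac
import os

ITERATIONS = 310_000  # iteration count stated in the text


def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256 (hash choice assumed)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

The high iteration count makes each guess deliberately slow, which is what limits offline brute-force attempts against a stolen hash.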
Pricing
VCF Health Check is priced per environment per month. The more environments you monitor, the lower your per-environment cost. Annual billing saves 15%.
One-time $500 onboarding fee includes your setup call, configuration assistance, and branded report setup.
Per-Environment Rates
| Environments | Standard | Professional | Enterprise |
|---|---|---|---|
| 1-9 | $299/env/mo | $399/env/mo | $499/env/mo |
| 10-24 | $249/env/mo | $349/env/mo | $449/env/mo |
| 25-99 | $199/env/mo | $299/env/mo | $399/env/mo |
| 100+ | $149/env/mo | $249/env/mo | $349/env/mo |
Standard Edition
All 150+ health checks. All 10 report formats. All 10 notification channels. Desktop GUI application with encrypted credential storage, role-based access, and audit trail. Everything you need to monitor your VCF environments professionally.
Professional Edition
Everything in Standard, plus branded reports with your company logo, automated scheduling (Windows Task Scheduler, cron, systemd), and a multi-environment dashboard. This is the edition for service providers who want to look professional and save time.
Enterprise Edition
Everything in Professional, plus API export for dashboard integrations (ServiceNow, Grafana, Datadog), full white-label capability to rebrand the product as your own, usage analytics and billing module, and priority support. This is the edition for organizations that want maximum flexibility and zero limitations.
Example Monthly Costs
A solo consultant monitoring 3 environments on Standard pays $897 per month (3 × $299). An MSP monitoring 25 environments on Professional pays $7,475 per month (25 × $299). A large partner monitoring 50 environments on Enterprise pays $19,950 per month (50 × $399). Custom volume pricing for deployments of 100+ environments is available on request.
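The example costs follow directly from the rate table: the tier is chosen by total environment count, and that tier's rate applies to every environment. A minimal sketch of the arithmetic, with `monthly_cost` as a hypothetical helper name, looks like this. It covers monthly billing only and leaves out the onboarding fee and the annual discount.

```python
# Per-environment monthly rates from the table: (minimum environments, rates by edition).
TIERS = [
    (100, {"Standard": 149, "Professional": 249, "Enterprise": 349}),
    (25, {"Standard": 199, "Professional": 299, "Enterprise": 399}),
    (10, {"Standard": 249, "Professional": 349, "Enterprise": 449}),
    (1, {"Standard": 299, "Professional": 399, "Enterprise": 499}),
]


def monthly_cost(environments, edition):
    """Return the monthly cost in dollars for a number of environments on one edition."""
    if environments < 1:
        raise ValueError("at least one environment is required")
    # Tiers are ordered from largest to smallest, so the first match is the right one.
    for minimum, rates in TIERS:
        if environments >= minimum:
            return environments * rates[edition]


if __name__ == "__main__":
    print(monthly_cost(3, "Standard"))
    print(monthly_cost(25, "Professional"))
    print(monthly_cost(50, "Enterprise"))
```

Because the tier rate applies to all environments rather than only those above the threshold, crossing a tier boundary lowers the per-environment price for the whole fleet.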
Partner Program
VCF Health Check is available through channel partnerships for VMware/Broadcom solution providers, MSPs, and distributors. Whether you resell VCF solutions, manage client environments, or distribute VMware technology — we have a partnership model designed for your business.
Channel Partner Benefits
- Branded Reports — deliver health check reports with your company logo and name on every page
- White-Label Capability — rebrand VCF Health Check as your own product (Enterprise tier)
- Per-Client Tracking — usage analytics with billing summary exports for each of your customers
- Multi-Environment Dashboard — consolidated health view across all client environments
- Scheduling & Automation — automated health checks on your schedule with zero manual intervention
Distribution Partnerships
We offer distribution agreements with competitive reseller margins for technology distributors. One partnership provides access to dozens of downstream partner customers through your existing reseller network. White-label Enterprise licensing lets your partners resell the product under their own brand.
Become a Partner
We are actively onboarding VMware/Broadcom partners, MSPs, and distributors. To explore a partnership:
- Email: [mhayes@virtualcontrolllc.com](mailto:mhayes@virtualcontrolllc.com)
- Request a Partner Demo — see branded reports, white-label capability, and multi-environment dashboards in action
- Custom Pricing — volume and distribution pricing available for qualified partners
Get Started
Contact Virtual Control LLC to discuss your VCF monitoring needs and get your license key.
Every new installation includes a Setup Wizard that walks you through activation in under two minutes. Paste your key, confirm your paths, and you are running health checks immediately.
No lengthy onboarding. No professional services engagement. No training required. If you can manage a VCF environment, you can use VCF Health Check.
Your VCF environments are too important to monitor with hope and manual spot checks. Give your team — and your clients — the confidence that comes from knowing everything is healthy, every single day.
Copyright 2026 Virtual Control LLC. All rights reserved.
VCF Health Check is a product of Virtual Control LLC.
VMware and VMware Cloud Foundation are trademarks of Broadcom Inc.