Virtual Control LLC

VCF Health Check

Automated Health Monitoring & Compliance for VMware Cloud Foundation
One click. Under five minutes. Complete visibility.
150+ Health Checks
<5 min Full Audit
10 Report Formats
10 Alert Channels

Your VCF Environment Deserves Better Than "It Seems Fine"


The Problem You Already Know About

You are running VMware Cloud Foundation. Your environment has vCenter, SDDC Manager, NSX, ESXi hosts, certificates, DNS records, datastores, vSAN clusters, and dozens of interconnected services — all of which need to be healthy for your infrastructure to function.

Right now, how do you know everything is healthy?

You check vCenter manually. You SSH into SDDC Manager. You open NSX and look around. You hope nothing slipped through the cracks. When something does break, you spend hours figuring out what went wrong, tracing it back through logs, and explaining to stakeholders why it happened.

A certificate expired silently. DNS stopped resolving for a management component. A vSAN disk group went degraded and nobody noticed. An SDDC Manager backup has not run in 47 days. A vCenter service crashed and is not running. ESXi hosts drifted out of compliance. Orphaned VM objects are consuming storage nobody can account for. Unclaimed disks are sitting idle while the team debates whether to buy more capacity.

These are not hypothetical problems. They are the problems that wake you up at 2 AM, cost your company thousands in downtime, and erode trust with the clients you serve.


What If You Could Check Everything in Under 5 Minutes?

VCF Health Check connects to your entire VCF environment, runs more than 150 individual checks across every major component, and gives you a single grade (A through F) with a detailed report, all in under five minutes.

One click. Under five minutes. Complete visibility.

No manual SSH sessions. No hopping between six management consoles. No spreadsheets of things to remember to check. VCF Health Check does it all automatically, every time, consistently, and tells you exactly what needs attention — with specific remediation steps for every issue found.


What Gets Checked

VCF Health Check is not a generic monitoring tool. It was built specifically for VMware Cloud Foundation environments and understands the relationships between components that matter. Every check was designed to catch a real problem that has caused a real outage somewhere.

Infrastructure and Network (14 Checks)

DNS forward resolution for every management component — vCenter, SDDC Manager, NSX VIP, and NSX Node. DNS reverse resolution (PTR record) consistency for the same four endpoints. SSL certificate expiry monitoring with configurable advance warnings (default: 30 days). Network latency tracking with TCP connect time measurement to vCenter, SDDC Manager, and NSX — reported in a color-coded latency table. ESXi host HTTPS reachability checks running in parallel for speed, plus per-host SSH port verification and coredump configuration validation. Clock drift detection compares time across all management endpoints using HTTP Date headers, warning at 5-second drift and failing at 30-second drift.

If a DNS record is wrong, you will know before it causes a deployment failure. If a certificate is expiring in 30 days, you will know before it causes an outage at 2 AM. If latency to a management endpoint is spiking, you will know before users start complaining.
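The clock drift check described above can be illustrated with a short sketch. This is not the product's code: the function names are hypothetical, the Date header values are assumed to have been fetched from each endpoint already, and treating the thresholds as inclusive is an assumption. Only the 5-second and 30-second limits come from the description.

```python
from email.utils import parsedate_to_datetime

WARN_SECONDS = 5    # warning threshold stated in the text
FAIL_SECONDS = 30   # failure threshold stated in the text


def max_drift_seconds(date_headers):
    """Largest gap, in seconds, between the clocks of any two endpoints."""
    times = [parsedate_to_datetime(value) for value in date_headers.values()]
    return (max(times) - min(times)).total_seconds()


def classify_drift(date_headers):
    """Map the observed drift to PASS, WARN, or FAIL (inclusive limits assumed)."""
    drift = max_drift_seconds(date_headers)
    if drift >= FAIL_SECONDS:
        return "FAIL"
    if drift >= WARN_SECONDS:
        return "WARN"
    return "PASS"


# Hypothetical Date headers as returned by three management endpoints.
headers = {
    "vcenter": "Tue, 04 Mar 2025 10:00:00 GMT",
    "sddc-manager": "Tue, 04 Mar 2025 10:00:07 GMT",
    "nsx": "Tue, 04 Mar 2025 10:00:01 GMT",
}
print(classify_drift(headers))  # prints WARN (7 seconds of drift)
```

HTTP Date headers have one-second resolution, so a check of this kind can only detect drift of a second or more.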

vCenter Server (36 Checks)

The deepest coverage of any VCF health tool:

Every one of these checks matters. Every one of them has caused a real outage somewhere.

New in v9.0: vCenter alarm summary queries all active alarms via PropertyCollector, reporting critical and warning counts. ESXi BOM compliance validates each host build number against a customizable `vcf-bom-reference.json` reference file. Password expiry monitoring covers SSO accounts and ESXi root credentials via the SDDC credential store. vSAN resync and rebuild detection catches active resync operations that indicate data movement. Clock drift detection via HTTP Date header comparison identifies time synchronization issues. SDDC upgrade readiness queries the `/v1/upgradables` API to detect available updates.

SDDC Manager (19 Checks)

New in v9.0: upgrade readiness checking via the upgradables API, enhanced backup age and size tracking with configurable thresholds, and credential store password expiry monitoring.

SDDC Manager is the control plane of your entire VCF deployment. If it is unhealthy, everything downstream is at risk. A single stuck task can block every lifecycle operation in the environment.

NSX Manager (13 Checks)

API reachability with VIP-first and individual node fallback, version detection, control cluster stability, management cluster stability, transport node state (connected count vs total), critical and warning alarm counts, admin password expiry, edge cluster member health per edge cluster, distributed firewall rule count with configurable threshold (default 500), certificate validation, and transport zone membership count.

Network topology validation verifies transport zone and node state health. Transport zone membership counts confirm proper host bindings.

NSX problems cascade fast. A failing transport node or a degraded control cluster can take down your entire network overlay in minutes. A DFW rule count exceeding 500 signals a rule sprawl problem that will impact performance and make troubleshooting a nightmare.

VCF Operations and Aria Suite (4 Checks)

Suite API reachability and token acquisition, node online status per cluster member, collector health verification, and adapter data receiving verification (active adapters vs total).

Fleet and vRSLCM (2 Checks)

API reachability (through VCF Operations proxy or direct endpoint) and environment deployment status.

Custom Plugin Checks

Need to check something specific to your environment? Drop a shell script in the plugin directory. VCF Health Check auto-discovers plugins, runs them, parses their output, and integrates results directly into the grading, scoring, and reporting system. No code changes required.


The Report That Changes How You Work

After every health check, VCF Health Check generates comprehensive reports automatically. Not a wall of text. Not a raw log dump. Professionally designed, interactive reports that you can hand to your CTO, your client, or your operations team.

Interactive HTML Report

A single self-contained HTML file with no external dependencies. Open it in any browser. Everything is inline — CSS, JavaScript, SVG charts, and all data.

Visualization:

Navigation and Filtering:

Comparison:

Actionable Output:

Usability:

All 10 Report Formats

| Format | Generated | Description |
|---|---|---|
| Interactive HTML | Always | Self-contained file with all features above |
| JSON | Always | Machine-readable structured data for integrations |
| Plain Text | Always | Terminal-friendly with ASCII trend line |
| PDF | Auto (if Chrome/Edge available) | Print-ready from the HTML report |
| CSV | `--csv` flag | One row per check for spreadsheets |
| Markdown | `--markdown` flag | Tables grouped by component for wikis |
| Prometheus/OpenMetrics | `--prometheus` flag | Gauge metrics for Prometheus scraping |
| Ansible Inventory | `--ansible` flag | YAML groups with host variables |
| Multi-Environment Dashboard | `--merge-reports` flag | Consolidated grid of all environments |
| Config Backup | `--backup-config` flag | Topology and settings JSON (no credentials) |

Plus: a JSON Schema file (draft 2020-12) generated alongside every JSON report for downstream validation.


Alerts Where You Already Work

VCF Health Check integrates with ten notification channels:

| Channel | How It Works |
|---|---|
| Email (SMTP) | HTML report attached. Tries msmtp, sendmail, then curl SMTP. Grade and score in subject line. |
| Slack | Color-coded attachment block. Grade, score, failure list, environment name. |
| Microsoft Teams | Adaptive Card with fact rows for grade, score, failures, warnings, date. |
| PagerDuty | Events API v2. Auto-creates incidents on failures. Dedup key prevents duplicates. Auto-resolves when grade returns to A/B. |
| OpsGenie | Alerts API with priority mapping (F=P1 through A=P5). Tags include component names. |
| Syslog (RFC 5424) | UDP to configurable host:port. One message per failed check. Compatible with Splunk, ELK, and any SIEM. |
| Custom Webhook | HTTP POST with full JSON payload. Connect to ServiceNow, Grafana, Datadog, or anything with an API. |
| ServiceNow | The `--servicenow` flag auto-creates incidents from health check failures, mapping the health grade to ServiceNow priority levels. Configure with `SERVICENOW_INSTANCE`, `SERVICENOW_USER`, and `SERVICENOW_PASS` environment variables. |
| Jira | The `--jira` flag auto-creates Jira issues from failures. Configure with `JIRA_URL`, `JIRA_USER`, `JIRA_TOKEN`, and `JIRA_PROJECT` environment variables. |
| Email Digest | The `--email-report` flag sends the full HTML report as an email attachment with grade and score summary in the subject line. |

Threshold-based alerting: Set a grade threshold (default: C) and only get notified when the grade drops to that level or below. No noise. No alert fatigue. Just the signal you need.

Severity mapping: PagerDuty and OpsGenie automatically map the health grade to severity levels. An F grade pages the on-call as P1/Critical. A C grade creates a warning. A/B grades resolve existing incidents automatically.
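The threshold and severity rules above can be sketched as follows. This is an illustration under stated assumptions, not the shipped logic: the function names are hypothetical, the text only fixes the endpoints of the OpsGenie mapping (F=P1, A=P5), so the intermediate priorities are guesses, and the PagerDuty severity for a D grade is assumed. The `trigger`/`resolve` actions and the `critical`/`error`/`warning` severities are standard PagerDuty Events API v2 values.

```python
GRADE_ORDER = ["A", "B+", "B", "C", "D", "F"]  # best to worst

# F=P1 and A=P5 come from the text; the grades in between are assumptions.
OPSGENIE_PRIORITY = {"F": "P1", "D": "P2", "C": "P3", "B": "P4", "B+": "P4", "A": "P5"}


def should_notify(grade, threshold="C"):
    """Notify only when the grade is at the threshold or worse."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(threshold)


def pagerduty_event(grade):
    """Return (event_action, severity) for a PagerDuty Events API v2 call."""
    if grade in ("A", "B+", "B"):
        return ("resolve", None)           # healthy grades close the incident
    severity = {"C": "warning", "D": "error", "F": "critical"}[grade]
    return ("trigger", severity)


for grade in GRADE_ORDER:
    print(grade, should_notify(grade), OPSGENIE_PRIORITY[grade], pagerduty_event(grade))
```

With the default threshold of C, grades A, B+, and B stay silent while C, D, and F produce notifications.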


A Desktop Application Built for Operations Teams

VCF Health Check is a professional desktop application — not a script you run in a terminal. It is built with Python and Tkinter, runs on Windows, macOS, and Linux, and provides a complete graphical interface for every feature.

Splash Screen and First Impressions

The application launches with an animated splash screen — gradient background, floating particle effects, progress bar, and fade-in branding. It is a small detail, but it sets the tone: this is a professional tool, not a weekend project.

Dashboard

A single screen showing your current health grade as a large color-coded badge (A green through F red), numeric score percentage, component-by-component health cards in a grid layout, executive summary text, previous grade comparison, and a score trend chart with grade zone shading. If you manage multiple profiles, a multi-environment overview section shows every environment's grade at a glance.

Two quick-action buttons: "Run New Check" and "View Reports."

Environment Configuration

Nine collapsible form sections covering every configuration option:

Every field has a tooltip on hover. Input validation highlights errors (IP/FQDN format, email format, URL format, numeric ranges). Unsaved-changes detection warns before navigating away.

Run Options: Checkboxes for --fix, --cleanup-tasks, --diff, --csv, --markdown, --quiet, and --json-only.

Profile Management

Managing multiple VCF environments is first-class:

Operators get read-only access to profiles. Admins get full create/save/delete/clone/import/export.

Run Check

Reports View

Scrollable list of all historical report sets. Each row shows the grade badge, numeric score, date/time, and pass/warn/fail/total counts. One-click buttons to open HTML, JSON, TXT, or PDF reports. Admin-only "Cleanup Old Reports" button deletes reports older than the retention threshold.

Run History and Trends

Suppressions (Admin Only)

Manage known issues that should not count as failures. Each suppression rule has a regex pattern matched against check messages and a note field for the reason or ticket reference. Matched checks are recorded as SKIP and excluded from the failure count. Changes persist to known-issues.json.
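A minimal sketch of how regex-based suppression of this kind could work is shown below. The field names (`pattern`, `note`, `message`, `status`) and the function name are assumptions for illustration; the actual layout of known-issues.json is not documented here.

```python
import re


def apply_suppressions(checks, rules):
    """Mark checks whose message matches a suppression rule as SKIP."""
    compiled = [(re.compile(rule["pattern"]), rule.get("note", "")) for rule in rules]
    result = []
    for check in checks:
        updated = dict(check)              # leave the caller's data untouched
        for pattern, note in compiled:
            if pattern.search(check["message"]):
                updated["status"] = "SKIP"
                updated["suppressed_by"] = note
                break
        result.append(updated)
    return result


# Hypothetical rules, shaped the way a known-issues file might store them.
rules = [{"pattern": r"esx0[12]\.lab\.local.*SSH", "note": "TICKET-123: SSH left enabled on purpose"}]
checks = [
    {"message": "esx01.lab.local: SSH port open", "status": "FAIL"},
    {"message": "Datastore ds01 at 93% capacity", "status": "FAIL"},
]

processed = apply_suppressions(checks, rules)
failures = sum(1 for c in processed if c["status"] == "FAIL")
print(failures)  # prints 1
```

Keeping the note alongside the SKIP status means a report can still show why a check was excluded from the failure count.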

User Management (Admin Only)

Audit Log (Admin Only)

Terminal-style scrollable display of the last 500 audit entries. Every action is logged: login, logout, password changes, profile operations, health check runs, report access, user management, suppression changes, settings modifications, license activation.

Filter/search input, refresh button, clear log button, and CSV export for external analysis.

Settings (Admin Only)

Help View

Built-in 13-section guide covering every feature: Getting Started, Dashboard, Environment, Run Check, Reports, Run History, Suppressions, Users, Audit Log, Settings, Scheduling, Troubleshooting, and System Requirements. No external documentation needed.

About View

Application version, copyright, license status with expiry date, system information (Python version, platform, Tkinter version, paths, profile count, user count), and contact information. Accessible via sidebar navigation or F1 keyboard shortcut.


Security Built for Enterprise

Credential Encryption

All profile credentials encrypted at rest using Fernet symmetric encryption (from the cryptography Python package). A machine-local encryption key is generated at ~/.vcf-hc-key with chmod 600 permissions. Legacy base64-only encoding is automatically detected and upgraded to Fernet on the next profile save. Exported .env files are created with chmod 600.

Password Hashing

User passwords hashed with PBKDF2-HMAC-SHA256 using 310,000 iterations and a 16-byte random salt unique to each password. Legacy hash formats (SHA-256 unsalted, lower-iteration PBKDF2) are automatically detected and upgraded after successful login.
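The hashing scheme described above maps directly onto Python's standard library. The sketch below uses the stated parameters (PBKDF2-HMAC-SHA256, 310,000 iterations, 16-byte random salt); the function names and the way salt and digest are returned are assumptions, not the application's storage format.

```python
import hashlib
import hmac
import os

ITERATIONS = 310_000   # iteration count stated in the text
SALT_BYTES = 16        # salt size stated in the text


def hash_password(password, salt=None):
    """Return (salt, digest) for the given password."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)      # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # prints True
print(verify_password("wrong guess", salt, digest))                   # prints False
```

The constant-time comparison avoids leaking how many leading bytes of a guess were correct.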

Role-Based Access Control

Two roles: admin and operator. Admins have full access to all views and operations. Operators can run checks and view reports but cannot modify settings, manage users, edit suppressions, or delete profiles.

Authentication and Session Security

Audit Trail

Every action logged to vcf-health-audit.log with ISO timestamp, username, role, action category, and detail. Exportable to CSV for compliance reporting. Admin-only access.

Shell Script Security

Lock file prevents concurrent execution. All temporary files written to mktemp-based directories. Credentials passed via environment variables, never as command-line arguments. SSL verification configurable with custom CA bundle support.


The Grading System

VCF Health Check does not give you a wall of data. It gives you a single letter grade that instantly communicates the state of your environment to anyone.

| Grade | Criteria | What It Means |
|---|---|---|
| A | Score >= 95%, zero failures, zero warnings | Everything is healthy. No action needed. |
| B+ | Score >= 90%, zero failures; or >= 85% with <= 1 failure | Minor issues only. Review at your convenience. |
| B | Zero failures at any score; or >= 80% with <= 3 failures | Warnings present but no critical problems. Plan remediation. |
| C | Score >= 70% with <= 5 failures | Multiple issues need attention this week. |
| D | Score >= 50% | Significant problems. Address immediately. |
| F | Score < 50% | Critical state. Environment at risk of outage. |

Weighted Scoring: Each component carries a configurable weight multiplier. By default, vCenter, SDDC Manager, and NSX are weighted 2x (because failures in these components cascade). Infrastructure, VCF Operations, and Fleet are weighted 1x. Adjust weights to match your environment's priorities.

Formula: score = sum of (component_pass_checks × component_weight) divided by sum of (component_total_checks × component_weight).
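As a worked example of the weighted formula and the grade criteria, consider the sketch below. The function names, the component keys, and the top-down order in which the grade rules are evaluated are assumptions. The weights and cutoffs are the documented defaults.

```python
DEFAULT_WEIGHTS = {
    "infrastructure": 1, "vcenter": 2, "sddc_manager": 2,
    "nsx": 2, "vcf_operations": 1, "fleet": 1,
}


def weighted_score(results, weights=DEFAULT_WEIGHTS):
    """results maps component name to (passed_checks, total_checks)."""
    passed = sum(p * weights.get(name, 1) for name, (p, _) in results.items())
    total = sum(t * weights.get(name, 1) for name, (_, t) in results.items())
    return 100.0 * passed / total if total else 0.0


def grade(score, failures, warnings):
    """Apply the grade criteria from best to worst (evaluation order assumed)."""
    if score >= 95 and failures == 0 and warnings == 0:
        return "A"
    if (score >= 90 and failures == 0) or (score >= 85 and failures <= 1):
        return "B+"
    if failures == 0 or (score >= 80 and failures <= 3):
        return "B"
    if score >= 70 and failures <= 5:
        return "C"
    if score >= 50:
        return "D"
    return "F"


# Hypothetical run: (14*1 + 34*2 + 12*2) / (14*1 + 36*2 + 13*2) = 106 / 112
results = {"infrastructure": (14, 14), "vcenter": (34, 36), "nsx": (12, 13)}
score = weighted_score(results)
print(round(score, 1), grade(score, failures=1, warnings=2))  # prints 94.6 B+
```

Because vCenter and NSX carry a 2x weight, the three missed checks in this example cost 6 of 112 weighted points rather than 3 of 63.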


Automation That Runs While You Sleep

OS-Level Scheduling

| Platform | How | One Command |
|---|---|---|
| Windows | Task Scheduler | `--schedule` generates XML and registers the task |
| Linux (cron) | crontab | `--cron daily` generates the cron entry |
| Linux (systemd) | Timer + Service | `--cron` generates both unit files |
| macOS | crontab | `--cron` generates the entry |

In-GUI Scheduling

The Run Check view includes a scheduling dropdown: Off, Every 30 Minutes, Every Hour, Every 4 Hours, or Daily. Select a frequency and the application runs health checks automatically in the background while the GUI stays open.

Auto-Remediation

Some problems have obvious, safe fixes. Enable the --fix flag and VCF Health Check will:

For everything else, the remediation playbook tells you exactly what to do — step by step, with estimated time to fix each issue, grouped by component, with checkboxes you can tick off as you work through them.


Built for MSPs and Service Providers

If you manage VCF environments for multiple clients, VCF Health Check was designed with you in mind.

Branded Reports

Put your company logo and name on every health check report. Configure company name, logo file, and contact email per profile. Your clients see your brand, not ours. Every report — HTML, PDF, TXT — carries your branding.

Client Tracking

Organize environments by client using the optional client management module. Track usage per client. Generate billing reports per client. Know exactly how much monitoring work you are doing for each customer.

Usage Analytics and Billing

The optional usage tracker module shows run counts per client and environment, with CSV, JSON, and billing summary exports. Built-in data for invoicing without manual tracking.

White Label

At the Enterprise tier, completely rebrand VCF Health Check as your own product. Your name, your branding, your product — powered by our engine.

API Export

Pull health check data programmatically into your own dashboards. The JSON report, Prometheus metrics, and webhook integrations let you feed data into ServiceNow, Grafana, Datadog, or any system your operations team uses.


Multi-Environment Visibility

Managing production, staging, and development environments? Monitoring VCF deployments across multiple data centers or client sites?

Multi-Environment Dashboard

The --merge-reports flag takes a directory of JSON reports from multiple environments and generates a single dark-themed HTML dashboard. Each environment appears as a card showing: environment name, grade badge, numeric score, last run date, and pass/warn/fail counts. One page, every environment, instant visibility.

In-GUI Multi-Environment Overview

When you have more than one profile configured, the Dashboard view automatically shows a multi-environment section with cards for every profile — grade, score, and last run date. No extra configuration needed.


Configuration That Fits Your Environment

Configurable Thresholds

Every threshold is adjustable. Nothing is hardcoded:

| Threshold | Default | What It Controls |
|---|---|---|
| Certificate Warning | 30 days | How far in advance to warn about expiring certificates |
| Datastore Warning | 80% | Datastore capacity warning level |
| Datastore Critical | 90% | Datastore capacity critical level |
| Cluster CPU Warning | 70% | Cluster CPU utilization warning |
| Cluster CPU Critical | 85% | Cluster CPU utilization critical |
| Cluster Memory Warning | 70% | Cluster memory utilization warning |
| Cluster Memory Critical | 85% | Cluster memory utilization critical |
| Snapshot Warning | 72 hours | VM snapshot age before warning |
| Backup Warning | 48 hours | SDDC Manager backup age before warning |
| DFW Rule Warning | 500 rules | NSX firewall rule count before warning |
| Task Warning | 5 tasks | In-progress SDDC task count before warning |
| Report Retention | 30 days | How long to keep historical reports |
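To show how the paired warning and critical thresholds could be applied, here is a small sketch. The dictionary keys, the function name, the inclusive comparisons, and the mapping of "warning" to WARN and "critical" to FAIL are assumptions. The percentages are the documented defaults.

```python
# (warning %, critical %) pairs taken from the default thresholds.
THRESHOLDS = {
    "datastore": (80, 90),
    "cluster_cpu": (70, 85),
    "cluster_memory": (70, 85),
}


def classify_usage(kind, percent, thresholds=THRESHOLDS):
    """Return PASS, WARN, or FAIL for a utilization percentage."""
    warn, crit = thresholds[kind]
    if percent >= crit:
        return "FAIL"
    if percent >= warn:
        return "WARN"
    return "PASS"


print(classify_usage("datastore", 83))      # prints WARN
print(classify_usage("datastore", 92))      # prints FAIL
print(classify_usage("cluster_cpu", 69))    # prints PASS
```

Passing a modified copy of the dictionary is one way an environment-specific override could be modeled.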

Scoring Weights

| Component | Default Weight | Why |
|---|---|---|
| Infrastructure | 1x | Foundation layer: DNS, certs, network |
| vCenter | 2x | Core compute management: failures cascade |
| SDDC Manager | 2x | Lifecycle control plane: failures block operations |
| NSX | 2x | Network overlay: failures isolate workloads |
| VCF Operations | 1x | Monitoring: failure reduces visibility but not operations |
| Fleet | 1x | Aria lifecycle: failure blocks updates but not operations |

Adjust weights to match your priorities. A financial services company might weight NSX at 3x because microsegmentation is compliance-critical. An MSP might weight vCenter at 3x because it is the primary client-facing component.

Expected Down Components

If a component is intentionally offline (maintenance, decommissioning, not yet deployed), use the --known-down flag or the EXPECTED_DOWN configuration. Checks against that component are suppressed so they do not drag down the grade.

CLI Power

| Flag | What It Does |
|---|---|
| `--only COMPONENT` | Run checks for one component only |
| `--skip COMPONENT` | Skip one component |
| `--known-down COMPONENT` | Suppress failures for an expected-down component |
| `--env FILE` | Load configuration from a .env file |
| `--fix` | Auto-remediate safe issues |
| `--cleanup-tasks` | Cancel stale SDDC Manager tasks |
| `--diff` | Compare current run to previous run |
| `--validate` | Test connectivity only, no checks |
| `--json-only` | Skip HTML/TXT, generate JSON only |
| `--csv` | Generate CSV report |
| `--markdown` | Generate Markdown report |
| `--prometheus` | Emit Prometheus metrics to stdout |
| `--ansible` | Generate Ansible inventory YAML |
| `--archive` | Compress report set into tar.gz |
| `--syslog HOST:PORT` | Override syslog destination |
| `--backup-config` | Export topology and settings JSON |
| `--merge-reports DIR` | Merge reports into multi-environment dashboard |
| `--quiet` | Suppress terminal output |
| `--schedule` | Register Windows Task Scheduler task |
| `--cron [daily/hourly]` | Generate cron entry and systemd timer |
| `--servicenow` | Create ServiceNow incidents from failures |
| `--jira` | Create Jira issues from failures |
| `--email-report` | Send HTML report as email attachment |

Architecture

VCF Health Check is a desktop application that runs entirely on your infrastructure. No cloud. No agents. No SaaS. It uses the existing management APIs that are already exposed by every VCF component.

How It Works

1. You configure your environment endpoints and credentials in the GUI or a .env file

2. You click "Run Full Health Check" (or the scheduler triggers it)

3. The bash engine authenticates against every component API and runs all checks

4. Infrastructure checks run sequentially. vCenter checks run sequentially (API session reuse). SDDC Manager, NSX, VCF Operations, and Fleet checks run simultaneously in parallel background subshells. ESXi per-host checks are parallelized with background process tracking. Python ESXi sensor checks use ThreadPoolExecutor with 8 concurrent workers.

5. The Python report generator reads all results and produces HTML, JSON, and text reports

6. Notifications fire based on the grade threshold

7. Results are stored for historical trending
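Step 4 mentions that per-host checks run through a `ThreadPoolExecutor` with 8 concurrent workers. The sketch below shows that pattern in isolation. The function names are hypothetical, and a plain TCP connect to port 443 stands in for the real HTTPS reachability check, which is a simplification.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # worker count stated in step 4


def https_port_open(host, port=443, timeout=3.0):
    """Simplified reachability probe: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hosts(hosts, check=https_port_open):
    """Run one check per host in parallel and return {host: result}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # pool.map preserves input order, so results line up with hosts.
        return dict(zip(hosts, pool.map(check, hosts)))


# Demonstration with a stub check so the example does not touch the network.
demo = check_hosts(
    ["esx01.example.local", "esx02.example.local", "old-host.example.local"],
    check=lambda host: host.startswith("esx"),
)
print(demo)
```

Since each probe spends most of its time waiting on the network, threads are a reasonable fit here and the total runtime approaches that of the slowest host rather than the sum of all hosts.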

Codebase

| File | Lines | Language | Role |
|---|---|---|---|
| vcf-health-check-gui.py | 6,389 | Python / Tkinter | Desktop GUI application |
| vcf-health-check.sh | 3,757 | Bash | Health check engine |
| vcf_checks.py | 1,053 | Python | Report generator and vCenter API checks |
| Total | 11,199 | | |

267 automated tests across two test suites validate every feature.

System Requirements

| Requirement | Detail |
|---|---|
| Python | 3.8 or later |
| Bash | 4.0 or later (Git Bash on Windows) |
| curl | For all API calls |
| OpenSSL | For certificate expiry checks |
| nslookup | For DNS checks |
| Network | HTTPS (443) to all VCF management endpoints |
| Optional: cryptography | Python package for credential encryption |
| Optional: ldap3 | Python package for LDAP/AD authentication |
| Optional: Chrome/Edge | For PDF report generation |

Runs on Windows 10/11, macOS, and any Linux distribution. No installation required — run the files directly. No cloud dependency. All data stays on your infrastructure.


What Makes This Different

Built exclusively for VMware Cloud Foundation. Not adapted from a generic monitoring platform. Not a plugin for a larger tool. Every one of the 150+ checks was designed specifically for VCF components and the relationships between them. Orphaned VM detection. Unclaimed vSAN disk identification. Cluster capacity thresholds. SDDC Manager stale task detection. BOM compliance. These are VCF problems that generic tools do not know to check.

Runs in under 5 minutes. A complete health audit of your entire VCF environment in under five minutes. Parallel execution means SDDC Manager, NSX, VCF Operations, and Fleet checks run simultaneously instead of one after another.

Gives you a grade. Not a wall of data. A single letter grade (A through F) that instantly communicates the state of your environment to anyone — from the engineer who needs to fix things to the executive who needs to report on them.

Tells you how to fix things. Not just what is wrong. Every failure and warning comes with specific remediation steps, estimated time to resolve, and a downloadable playbook checklist you can hand to an engineer and say "fix these."

Works offline. No cloud dependency. No data leaving your network. No SaaS subscription that requires internet access. Install it, run it, keep your data on your infrastructure.

Generates reports you can actually use. Ten report formats. Hand the HTML to a client. Feed the JSON to ServiceNow. Scrape the Prometheus metrics with Grafana. Generate an Ansible inventory from your live environment. Export a remediation playbook. Print the PDF and pin it to the wall. The reports are designed to be useful, not just complete.

Enterprise security from day one. Fernet-encrypted credentials, PBKDF2 password hashing with 310,000 iterations, role-based access control, brute-force lockout, session timeouts, full audit trail, optional LDAP integration. This is not a script with passwords in a text file.


Pricing

VCF Health Check is priced per environment per month. The more environments you monitor, the lower your per-environment cost. Annual billing saves 15%.

One-time $500 onboarding fee includes your setup call, configuration assistance, and branded report setup.

Per-Environment Rates

| Environments | Standard | Professional | Enterprise |
|---|---|---|---|
| 1-9 | $299/env/mo | $399/env/mo | $499/env/mo |
| 10-24 | $249/env/mo | $349/env/mo | $449/env/mo |
| 25-99 | $199/env/mo | $299/env/mo | $399/env/mo |
| 100+ | $149/env/mo | $249/env/mo | $349/env/mo |

Standard Edition

All 150+ health checks. All 10 report formats. All 10 notification channels. Desktop GUI application with encrypted credential storage, role-based access, and audit trail. Everything you need to monitor your VCF environments professionally.

Professional Edition

Everything in Standard, plus branded reports with your company logo, automated scheduling (Windows Task Scheduler, cron, systemd), and multi-environment dashboard. This is the edition for service providers who want to look professional and save time.

Enterprise Edition

Everything in Professional, plus API export for dashboard integrations (ServiceNow, Grafana, Datadog), full white-label capability to rebrand the product as your own, usage analytics and billing module, and priority support. This is the edition for organizations that want maximum flexibility and zero limitations.

Example Monthly Costs

A solo consultant monitoring 3 environments on Standard pays $897 per month. An MSP monitoring 25 environments on Professional pays $7,475 per month. A large partner monitoring 50 environments on Enterprise pays $19,950 per month. Volume pricing for 100+ environments is available on request.
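The example costs above follow from applying the tier rate to every environment in the account. The sketch below reproduces that arithmetic. The function names are hypothetical, and the annual calculation (12 months at a 15% discount, excluding the one-time onboarding fee) is an assumed reading of "annual billing saves 15%".

```python
# (min environments, max environments or None, per-environment monthly rates)
RATE_TIERS = [
    (1, 9, {"standard": 299, "professional": 399, "enterprise": 499}),
    (10, 24, {"standard": 249, "professional": 349, "enterprise": 449}),
    (25, 99, {"standard": 199, "professional": 299, "enterprise": 399}),
    (100, None, {"standard": 149, "professional": 249, "enterprise": 349}),
]
ANNUAL_DISCOUNT = 0.15


def monthly_cost(environments, edition):
    """Tier rate for the account size, applied to every environment."""
    for low, high, rates in RATE_TIERS:
        if environments >= low and (high is None or environments <= high):
            return environments * rates[edition]
    raise ValueError("environment count must be at least 1")


def annual_cost(environments, edition):
    """Assumed: twelve months billed at a 15% discount."""
    return round(monthly_cost(environments, edition) * 12 * (1 - ANNUAL_DISCOUNT), 2)


print(monthly_cost(3, "standard"))        # prints 897
print(monthly_cost(25, "professional"))   # prints 7475
print(monthly_cost(50, "enterprise"))     # prints 19950
```

Note that crossing a tier boundary lowers the rate for all environments, not only the ones above the boundary, which is what the 25-environment example implies.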


Partner Program

VCF Health Check is available through channel partnerships for VMware/Broadcom solution providers, MSPs, and distributors. Whether you resell VCF solutions, manage client environments, or distribute VMware technology — we have a partnership model designed for your business.

Channel Partner Benefits

Distribution Partnerships

We offer distribution agreements with competitive reseller margins for technology distributors. One partnership provides access to dozens of downstream partner customers through your existing reseller network. White-label Enterprise licensing enables your partners to rebrand and resell under their own brand.

Become a Partner

We are actively onboarding VMware/Broadcom partners, MSPs, and distributors. To explore a partnership:


Get Started

Contact Virtual Control LLC to discuss your VCF monitoring needs and get your license key.

Every new installation includes a Setup Wizard that walks you through activation in under two minutes. Paste your key, confirm your paths, and you are running health checks immediately.

No lengthy onboarding. No professional services engagement. No training required. If you can manage a VCF environment, you can use VCF Health Check.

Your VCF environments are too important to monitor with hope and manual spot checks. Give your team — and your clients — the confidence that comes from knowing everything is healthy, every single day.