Virtual Control

VMware Cloud Foundation Solutions

VCF Operations Health Check Handbook

Analytics cluster health validation covering node status, adapters, collection, certificates, Cassandra, alert engine, and remote collectors.

Analytics, Adapters, Collectors, Alerts, Cassandra

VCF 9.0

VMware Cloud Foundation

Proprietary & Confidential

Comprehensive Health Verification for VCF Operations in VCF 9

Author: Virtual Control LLC
Date: March 2026
Version: 1.0
Classification: Internal Use
Platform: VMware Cloud Foundation 9.0 / VCF Operations 8.18+

1. Overview & Purpose

This handbook provides a complete health check procedure for VCF Operations (formerly VMware Aria Operations / vRealize Operations) deployed within a VCF 9.0 environment. VCF Operations provides:

Performance monitoring — Real-time and historical analytics for all VCF components
Capacity management — Predictive capacity analysis and right-sizing recommendations
Alerting — Proactive notification of health, risk, and efficiency issues
Compliance — Regulatory and best-practice compliance dashboards
Troubleshooting — Root cause analysis and workload optimization

When to Run

Trigger	Priority
After deployment / node addition	Critical
Before/after VCF upgrades	Critical
Weekly routine health check	Recommended
When dashboards show stale data	Troubleshooting
When alerts are not firing	Troubleshooting

Environment Variables:
$OPS = VCF Operations FQDN (e.g., vcf-ops.lab.local)
$OPS_USER = admin
$OPS_PASS = VCF Operations admin password
$OPS_TOKEN = Suite API auth token

2. Prerequisites

Required Access

Access Type	Target	Credentials
HTTPS (443)	VCF Operations VIP	admin / password
SSH (22)	Each VCF Operations node	root / password
Suite API (443)	VCF Operations VIP	admin / auth token
CASA Admin	VCF Operations master node	root / admin

Token Acquisition (Suite API)

export OPS="vcf-ops.lab.local"
export OPS_USER="admin"
export OPS_PASS="YourPassword123!"

# Acquire auth token
OPS_TOKEN=$(curl -sk -X POST \
  "https://$OPS/suite-api/api/auth/token/acquire" \
  -H "Content-Type: application/json" \
  -d "{
    \"username\":\"$OPS_USER\",
    \"password\":\"$OPS_PASS\",
    \"authSource\":\"local\"
  }" | jq -r '.token')

echo "Token: ${OPS_TOKEN:0:20}..."

# Convenience function
ops_api() {
  curl -sk -H "Authorization: vRealizeOpsToken $OPS_TOKEN" \
    -H "Content-Type: application/json" \
    "https://$OPS/suite-api$1" 2>/dev/null
}

Token Expiry: Suite API tokens expire after 6 hours by default. Re-acquire if you receive 401 responses.
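The 6-hour lifetime can also be tracked locally, so that a long health-check run re-acquires the token before it lapses instead of reacting to 401 responses. The following is a minimal sketch, assuming you record the acquisition time yourself; `token_needs_refresh`, `TOKEN_TTL_SECONDS`, and the 5-minute margin are illustrative names and values, not part of the Suite API.

```shell
# Hypothetical helper: decide whether a Suite API token should be re-acquired.
# Assumes the default 6-hour lifetime stated above; adjust if yours differs.
TOKEN_TTL_SECONDS=$((6 * 3600))
TOKEN_REFRESH_MARGIN=300   # illustrative: refresh 5 minutes early

# Usage: token_needs_refresh <acquired_epoch> [now_epoch]
# Returns 0 (true) when the token age has reached TTL minus the margin.
token_needs_refresh() {
  acquired=$1
  now=${2:-$(date +%s)}
  [ $(( now - acquired )) -ge $(( TOKEN_TTL_SECONDS - TOKEN_REFRESH_MARGIN )) ]
}
```

One way to use it: set `OPS_TOKEN_ACQUIRED=$(date +%s)` right after the acquire call, then run `token_needs_refresh "$OPS_TOKEN_ACQUIRED"` before each batch of checks and repeat the acquisition block when it returns true.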

3. Quick Reference — All Checks Summary

#	Check	Method	PASS	WARN	FAIL
4.1	Cluster State	CASA/SSH	`RUNNING` / `INITIALIZED`	`STARTING`	`OFFLINE` / `ERROR`
4.2	Slice Status	CASA	All `ONLINE`	Any `STARTING`	Any `OFFLINE`
5.1	Node Status	Suite API	All nodes `ONLINE`	Any `STARTING`	Any `OFFLINE`
5.2	Node CPU	SSH	< 70%	70-85%	> 85%
5.2	Node Memory	SSH	< 75%	75-90%	> 90%
6.1	Adapters	Suite API	All `COLLECTING`	Any `NOT_COLLECTING` (non-critical)	Any critical adapter not collecting
7	Collection	Suite API	Last collection < 10 min	10-30 min gap	> 30 min gap
8	Certificates	SSH/openssl	> 30 days to expiry	7-30 days	< 7 days / expired
9	License	Suite API	Valid, objects < capacity	> 80% capacity	Expired or over capacity
10	Disk	SSH	< 70% all partitions	70-85%	> 85%
11	Active Alerts	Suite API	0 critical	Warning alerts	Critical alerts
12	Collectors	Suite API	All `ONLINE`	Any `UNKNOWN`	Any `OFFLINE`
15	Suite API	curl	Response < 2s	2-5s	> 5s or error
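Most rows in this table reduce to comparing a number against a WARN and a FAIL threshold. A small sketch of that comparison is shown here; `classify_pct` is a hypothetical helper name, and treating a value exactly on a boundary as WARN is an assumption, since the table does not say which side the boundary belongs to.

```shell
# Hypothetical helper: map a numeric value onto PASS / WARN / FAIL.
# Usage: classify_pct <value> <warn_threshold> <fail_threshold>
# Below warn -> PASS; above fail -> FAIL; anything in between -> WARN.
classify_pct() {
  awk -v v="$1" -v w="$2" -v f="$3" 'BEGIN {
    if (v + 0 > f + 0)       print "FAIL"
    else if (v + 0 >= w + 0) print "WARN"
    else                     print "PASS"
  }'
}

classify_pct 62 70 85     # CPU at 62% against the 70 / 85 thresholds
classify_pct 91.5 75 90   # memory at 91.5% against the 75 / 90 thresholds
```

awk is used for the comparison so that fractional values work; plain shell arithmetic handles integers only.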

4. Analytics Cluster Status

4.1 Cluster State via CASA

What: Verify the VCF Operations analytics cluster is fully initialized and running.

Why: A cluster not in RUNNING state means data collection, alerting, and dashboards may be stale or non-functional.

SSH Method (on master node)

ssh root@$OPS

# Check cluster status via CASA admin
$VMWARE_PYTHON_PATH/bin/python \
  /usr/lib/vmware-vcops/tools/opscli/admin-cli.py \
  getClusterStatus

Expected Output (Healthy):

Cluster Status: RUNNING
Cluster Uptime: 15 days 8 hours
Master Node: vcf-ops-01.lab.local (ONLINE)
Data Node: vcf-ops-02.lab.local (ONLINE)
Data Node: vcf-ops-03.lab.local (ONLINE)
Remote Collector: rc-01.lab.local (ONLINE)

Alternative — CASA API

curl -sk "https://$OPS/casa/cluster/status" \
  -u "admin:$OPS_PASS" | jq .

Expected Output:

{
  "cluster_status": "RUNNING",
  "slice_status": "ONLINE",
  "node_statuses": [
    {"node_name": "vcf-ops-01", "status": "ONLINE", "role": "MASTER"},
    {"node_name": "vcf-ops-02", "status": "ONLINE", "role": "DATA"},
    {"node_name": "vcf-ops-03", "status": "ONLINE", "role": "DATA"}
  ]
}

Pass / Warn / Fail

Result	Criteria	Indicator
PASS	Cluster `RUNNING`, all nodes `ONLINE`	Fully operational
WARN	Cluster `STARTING` or any node `STARTING`	Coming online
FAIL	Cluster `OFFLINE` or `ERROR`	Data collection stopped
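For scripted runs, the state string can be mapped to a result directly. This is a sketch only: `cluster_result` is a hypothetical name, `INITIALIZED` is accepted as PASS because the quick reference table lists it, and any state not named in this handbook is treated as FAIL by assumption.

```shell
# Hypothetical helper: translate a cluster state into a check result.
# Unknown states fall through to FAIL so that an unexpected value is
# never silently accepted.
cluster_result() {
  case "$1" in
    RUNNING|INITIALIZED) echo "PASS" ;;
    STARTING)            echo "WARN" ;;
    *)                   echo "FAIL" ;;
  esac
}
```

With the CASA API call shown earlier, the input would be the `cluster_status` field, for example the output of `jq -r '.cluster_status'`.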

Remediation:
1. Bring cluster online: Use CASA admin UI (https://<master>/casa) → Cluster Operations → Start
2. Via CLI: $VMWARE_PYTHON_PATH/bin/python /usr/lib/vmware-vcops/tools/opscli/admin-cli.py bringClusterOnline
3. Check cluster logs: /storage/log/vcops/casa/casa.log

4.2 Slice Status

What: Verify all analytics slices are online.

# Via CASA API
curl -sk "https://$OPS/casa/slice/status" \
  -u "admin:$OPS_PASS" | jq .

Expected Output:

{
  "slices": [
    {"slice_id": 0, "status": "ONLINE", "node": "vcf-ops-01"},
    {"slice_id": 1, "status": "ONLINE", "node": "vcf-ops-02"},
    {"slice_id": 2, "status": "ONLINE", "node": "vcf-ops-03"}
  ]
}
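The per-slice statuses can be reduced to a single result with the rule from the quick reference table (all `ONLINE` passes, any `OFFLINE` fails). A minimal sketch follows; `aggregate_status` is a hypothetical helper, and treating an empty list as FAIL and any unrecognised status as WARN are assumptions.

```shell
# Hypothetical helper: reduce a list of statuses to one result.
# Any OFFLINE -> FAIL; otherwise any non-ONLINE status -> WARN.
aggregate_status() {
  result="PASS"
  [ $# -eq 0 ] && { echo "FAIL"; return; }   # no data counts as a failure
  for s in "$@"; do
    case "$s" in
      ONLINE)  ;;
      OFFLINE) echo "FAIL"; return ;;
      *)       result="WARN" ;;
    esac
  done
  echo "$result"
}
```

Given the response shape above, the arguments could come from `jq -r '.slices[].status'`. The same reduction applies to node and collector status lists.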

4.3 Node Roles

Role	Description	Count
MASTER	Primary analytics node, cluster coordinator	1
MASTER_REPLICA	Failover for master	1 (if HA)
DATA	Analytics processing and storage	1+
REMOTE_COLLECTOR	Remote data collection proxy	0+

5. Node Health

5.1 Individual Node Status

# List all nodes via Suite API
ops_api "/api/deployment/node" | jq '.nodeList[] | {
  name: .name,
  ip: .ip,
  role: .role,
  status: .status,
  version: .version
}'

5.2 Resource Utilization per Node

ssh root@$OPS

# CPU
top -b -n 1 | head -5

# Memory
free -m

# Disk (critical partitions)
df -h / /storage /storage/db /storage/log /storage/core

Critical Partitions

Partition	Purpose	PASS	WARN	FAIL
`/storage`	Analytics data	< 70%	70-85%	> 85%
`/storage/db`	Cassandra / xDB	< 70%	70-85%	> 85%
`/storage/log`	Log files	< 70%	70-85%	> 85%
`/` (root)	OS	< 70%	70-85%	> 85%
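The partition thresholds can be applied to `df` output mechanically. The sketch below assumes POSIX-format output (`df -P`), where the fifth column is the usage percentage and the sixth is the mount point; `classify_df` is a hypothetical name, and mount points containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` output on stdin and print one line per
# filesystem as "<mount> <use%> <result>" using the 70 / 85 thresholds above.
classify_df() {
  awk 'NR > 1 {
    pct = $5; sub(/%/, "", pct)            # "42%" -> 42
    if (pct + 0 > 85)       r = "FAIL"
    else if (pct + 0 >= 70) r = "WARN"
    else                    r = "PASS"
    print $6, pct "%", r
  }'
}
```

On a node this would be run as `df -P / /storage /storage/db /storage/log | classify_df`.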

5.3 Heartbeat Verification

# Check last heartbeat per node
ops_api "/api/deployment/node" | jq '.nodeList[] | {
  name: .name,
  lastHeartbeat: .lastHeartbeat,
  heartbeatStatus: .heartbeatStatus
}'

6. Adapter Health

6.1 Adapter Instances

What: Verify all configured adapter instances are collecting data.

# List all adapters
ops_api "/api/adapters" | jq '.adapterInstancesInfoDto[] | {
  id: .id,
  adapterKind: .resourceKey.adapterKindKey,
  name: .resourceKey.name,
  collectorId: .collectorId,
  collectionState: .collectionState,
  collectionStatus: .collectionStatus
}'

Expected Output:

{
  "id": "abc123",
  "adapterKind": "VMWARE",
  "name": "vCenter - vcenter.lab.local",
  "collectorId": "1",
  "collectionState": "COLLECTING",
  "collectionStatus": "DATA_RECEIVING"
}

Key Adapters to Verify

Adapter Kind	Name Pattern	Critical
`VMWARE`	vCenter adapter	Yes
`NSXTAdapter`	NSX-T adapter	Yes
`VsanAdapter`	vSAN adapter	Yes
`SDDCHealthAdapter`	SDDC Health	Yes
`PythonRemediationVcenterAdapter`	Automation	No
`LogInsightAdapter`	Log Insight integration	No

Result	Criteria	Indicator
PASS	All critical adapters `COLLECTING`	Data flowing
WARN	Non-critical adapter not collecting	Limited functionality
FAIL	Any critical adapter (vCenter, NSX, vSAN, SDDC Health) not collecting	Stale data / no monitoring
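Combining the criticality column with the collection state gives a per-adapter result. This sketch treats every adapter kind marked critical in the table as a FAIL when it is not collecting, which is an interpretation of the criteria; `adapter_result` and `CRITICAL_ADAPTER_KINDS` are hypothetical names.

```shell
# Adapter kinds marked "Critical: Yes" in the table above.
CRITICAL_ADAPTER_KINDS="VMWARE NSXTAdapter VsanAdapter SDDCHealthAdapter"

# Hypothetical helper. Usage: adapter_result <adapterKind> <collectionState>
adapter_result() {
  [ "$2" = "COLLECTING" ] && { echo "PASS"; return; }
  for kind in $CRITICAL_ADAPTER_KINDS; do
    [ "$kind" = "$1" ] && { echo "FAIL"; return; }
  done
  echo "WARN"   # not collecting, but not on the critical list
}
```

The two arguments correspond to the `adapterKind` and `collectionState` fields selected by the jq filter in 6.1.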

6.2 Collection Status

# Check last collection time for a specific adapter
ADAPTER_ID="<adapter-id>"
ops_api "/api/adapters/$ADAPTER_ID" | jq '{
  name: .resourceKey.name,
  collectionState: .collectionState,
  lastCollected: .lastCollected,
  numberOfMetricsCollected: .numberOfMetricsCollected,
  numberOfResourcesCollected: .numberOfResourcesCollected
}'

6.3 Credential Validation

# List credentials
ops_api "/api/credentials" | jq '.credentialInstances[] | {
  id: .id,
  name: .name,
  adapterKind: .adapterKindKey
}'

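# Restart collection on an adapter instance after updating its credential.
# A start that fails, or an adapter that drops back to NOT_COLLECTING,
# usually indicates a credential or connectivity problem.
# Note: the monitoringstate start/stop operations are PUT requests.
curl -sk -X PUT \
  -H "Authorization: vRealizeOpsToken $OPS_TOKEN" \
  -H "Content-Type: application/json" \
  "https://$OPS/suite-api/api/adapters/$ADAPTER_ID/monitoringstate/start"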

Remediation for adapter not collecting:
1. Verify credential: Update password if changed on target
2. Test connectivity: curl -sk https://<target>:443 from OPS node
3. Restart adapter: Suite API → PUT /api/adapters/<id>/monitoringstate/stop then /start
4. Check adapter logs: /storage/log/vcops/adapterkind/<adapter-kind>/

7. Collection Status

What: Verify data collection is current and no gaps exist.

# Get collection stats
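# totalCount is normally nested under pageInfo; fall back to the top level if it is not
ops_api "/api/resources?adapterKind=VMWARE&resourceKind=VirtualMachine&pageSize=5" | jq '{
  totalCount: (.pageInfo.totalCount // .totalCount),
  resources: [.resourceList[].resourceKey.name]
}'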

Check for Collection Gaps

# Recent collection cycles on the node
ssh root@$OPS
grep "Collection completed" /storage/log/vcops/analytics/analytics.log | tail -10

Result	Criteria	Indicator
PASS	Last collection < 10 minutes ago	Current data
WARN	Last collection 10-30 minutes ago	Slight delay
FAIL	Last collection > 30 minutes ago	Stale data
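The age thresholds above can be evaluated from a last-collection timestamp. This is a sketch under the assumption that the timestamp is available as Unix epoch seconds; if the API returns milliseconds, divide by 1000 first. `collection_result` is a hypothetical helper.

```shell
# Hypothetical helper.
# Usage: collection_result <last_collected_epoch_s> [now_epoch_s]
collection_result() {
  last=$1
  now=${2:-$(date +%s)}
  age_min=$(( (now - last) / 60 ))
  if   [ "$age_min" -gt 30 ]; then echo "FAIL"
  elif [ "$age_min" -ge 10 ]; then echo "WARN"
  else                             echo "PASS"
  fi
}
```

The optional second argument exists so the check can be replayed against a fixed point in time, for example when reviewing a past incident.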

8. Certificate Health

# Check web certificate
echo | openssl s_client -connect $OPS:443 2>/dev/null | \
  openssl x509 -noout -dates -subject

# Check all certificates on the node
ssh root@$OPS
find /storage/vcops/user/conf/ssl -name "*.pem" -exec \
  sh -c 'echo "=== $1 ===" && openssl x509 -in "$1" -noout -enddate' _ {} \;

Result	Criteria	Indicator
PASS	All certificates > 30 days from expiry	Healthy
WARN	Any certificate 7-30 days from expiry	Plan renewal
FAIL	Any certificate < 7 days or expired	Immediate action
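The `openssl` commands print an expiry date, while the criteria are expressed in days remaining. The sketch below bridges the two; it assumes GNU `date` (for `-d` parsing), and `days_until` and `cert_result` are hypothetical helper names.

```shell
# Hypothetical helpers; days_until requires GNU date for -d parsing.
# Usage: days_until "<notAfter=... line from openssl>" [now_epoch]
days_until() {
  end=${1#notAfter=}                      # strip the openssl prefix if present
  now=${2:-$(date +%s)}
  end_epoch=$(date -u -d "$end" +%s) || return 1
  echo $(( (end_epoch - now) / 86400 ))
}

# Usage: cert_result <days_remaining>
cert_result() {
  if   [ "$1" -lt 7 ];  then echo "FAIL"
  elif [ "$1" -le 30 ]; then echo "WARN"
  else                       echo "PASS"
  fi
}
```

The input to `days_until` is the single line printed by `openssl x509 -noout -enddate`. For a quick yes/no check without date arithmetic, `openssl x509 -checkend <seconds> -noout` exits non-zero if the certificate expires within the given window.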

9. Capacity & Licensing

# Check license status
ops_api "/api/deployment/licenses" | jq '.licenseDetails[] | {
  licenseKey: .licenseKey[0:8],
  edition: .edition,
  capacity: .capacity,
  usage: .usage,
  expirationDate: .expirationDate
}'

Expected Output:

{
  "licenseKey": "XXXXX-XX",
  "edition": "Enterprise",
  "capacity": 500,
  "usage": 320,
  "expirationDate": "2027-03-01"
}

Result	Criteria	Indicator
PASS	License valid, usage < 80% of capacity	Healthy
WARN	Usage > 80% capacity or < 60 days to expiry	Plan expansion
FAIL	License expired or usage > capacity	Functionality limited
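The licensing criteria combine a usage ratio with an expiry window. As a sketch, assuming capacity, usage, and days to expiry have already been extracted as integers (`license_result` is a hypothetical helper):

```shell
# Hypothetical helper.
# Usage: license_result <capacity> <usage> <days_to_expiry>
license_result() {
  cap=$1; used=$2; days=$3
  if [ "$cap" -le 0 ] || [ "$days" -le 0 ] || [ "$used" -gt "$cap" ]; then
    echo "FAIL"; return
  fi
  # Compare used/cap against 80% without division, to avoid integer truncation.
  if [ $(( used * 100 )) -gt $(( cap * 80 )) ] || [ "$days" -lt 60 ]; then
    echo "WARN"
  else
    echo "PASS"
  fi
}
```

With the expected output above (capacity 500, usage 320), usage is 64% of capacity, so the result depends only on how far away the expiration date is.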

10. Disk & Database Health

Disk Usage

ssh root@$OPS
df -h /storage /storage/db /storage/log

Cassandra / xDB Health

# Check Cassandra status
ssh root@$OPS
/opt/vmware/vcops/cassandra/apache-cassandra/bin/nodetool status

Expected Output:

Datacenter: vrops
==========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns    Host ID
UN  192.168.1.77    15.2 GiB   256     100.0%  abc123...
UN  192.168.1.78    14.8 GiB   256     100.0%  def456...
UN  192.168.1.79    15.0 GiB   256     100.0%  ghi789...

Status	Meaning
`UN`	Up, Normal — healthy
`DN`	Down, Normal — node offline
`UL`	Up, Leaving — decommissioning
`UJ`	Up, Joining — bootstrapping

Data Retention

# Check retention settings
ops_api "/api/deployment/retention" | jq .

Remediation for disk full:
1. Reduce retention: Lower data retention period via Administration → Global Settings
2. Clean old logs: find /storage/log -name "*.gz" -mtime +14 -delete
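3. Cassandra compaction: /opt/vmware/vcops/cassandra/apache-cassandra/bin/nodetool compact (compaction needs temporary free space while it rewrites data, so free space with the steps above first)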
4. Expand disk: Power off node → expand VMDK → extend partition

11. Alert Engine Health

# Count active alerts by criticality
ops_api "/api/alerts?status=ACTIVE&criticality=CRITICAL" | jq '.pageInfo.totalCount // .totalCount'
ops_api "/api/alerts?status=ACTIVE&criticality=IMMEDIATE" | jq '.pageInfo.totalCount // .totalCount'
ops_api "/api/alerts?status=ACTIVE&criticality=WARNING" | jq '.pageInfo.totalCount // .totalCount'

Check Alert Plugins (Notifications)

ops_api "/api/alertplugins" | jq '.notificationPluginInstances[] | {
  id: .id,
  name: .name,
  pluginType: .pluginTypeId,
  enabled: .enabled
}'

Test SMTP Notification

# Verify SMTP relay
ssh root@$OPS
echo "Test" | mail -s "VCF Ops Health Check Test" admin@lab.local

Result	Criteria	Indicator
PASS	0 critical alerts, notifications working	Healthy
WARN	Warning alerts present	Review and tune
FAIL	Critical alerts or notifications broken	Immediate review

12. Remote Collectors

# List remote collectors
ops_api "/api/collectors" | jq '.collector[] | {
  id: .id,
  name: .name,
  ip: .ip,
  status: .state,
  version: .version,
  usingVRealize: .usingVRealize
}'

Expected Output:

{
  "id": "1",
  "name": "vcf-ops-rc-01",
  "ip": "192.168.1.80",
  "status": "ONLINE",
  "version": "8.18.0.12345678"
}

Result	Criteria	Indicator
PASS	All collectors `ONLINE`	Healthy
WARN	Any collector `UNKNOWN`	Communication issue
FAIL	Any collector `OFFLINE`	Data collection impacted

13. Management Packs

# List installed management packs (solutions)
ops_api "/api/solutions" | jq '.solution[] | {
  id: .id,
  name: .name,
  version: .version,
  adapterKind: .adapterKindKeys
}'

14. Integration Health

vCenter Adapter

# Check vCenter adapter specifically
ops_api "/api/adapters?adapterKindKey=VMWARE" | jq '.adapterInstancesInfoDto[] | {
  name: .resourceKey.name,
  collectionState: .collectionState,
  lastCollected: .lastCollected
}'

NSX Adapter

ops_api "/api/adapters?adapterKindKey=NSXTAdapter" | jq '.adapterInstancesInfoDto[] | {
  name: .resourceKey.name,
  collectionState: .collectionState
}'

vSAN Adapter

ops_api "/api/adapters?adapterKindKey=VsanAdapter" | jq '.adapterInstancesInfoDto[] | {
  name: .resourceKey.name,
  collectionState: .collectionState
}'

SDDC Health Adapter

ops_api "/api/adapters?adapterKindKey=SDDCHealthAdapter" | jq '.adapterInstancesInfoDto[] | {
  name: .resourceKey.name,
  collectionState: .collectionState
}'

15. API Health (Suite API)

Token Acquisition Test

time curl -sk -X POST \
  "https://$OPS/suite-api/api/auth/token/acquire" \
  -H "Content-Type: application/json" \
  -d "{\"username\":\"$OPS_USER\",\"password\":\"$OPS_PASS\",\"authSource\":\"local\"}" \
  | jq -r '.token' > /dev/null

Result	Criteria	Indicator
PASS	Token acquired in < 2 seconds	API responsive
WARN	2-5 seconds	API slow
FAIL	> 5 seconds or failed	API issue
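Reading the `time` output by eye works for a one-off check. For scripted runs, curl can report the elapsed time itself through its `%{time_total}` and `%{http_code}` write-out variables, and the value can be classified as below. `latency_result` is a hypothetical helper; treating any non-2xx status as FAIL is an assumption based on the "or failed" criterion.

```shell
# Hypothetical helper. Usage: latency_result <seconds> [http_code]
# Fractional seconds are accepted, e.g. the value of curl's %{time_total}.
latency_result() {
  awk -v t="$1" -v code="${2:-200}" 'BEGIN {
    if (code !~ /^2/ || t + 0 > 5) print "FAIL"
    else if (t + 0 >= 2)           print "WARN"
    else                           print "PASS"
  }'
}
```

The two inputs can be captured in one request by adding `-o /dev/null -w '%{http_code} %{time_total}'` to the token acquisition call.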

Endpoint Responsiveness

# Quote entries containing "?" so the shell never treats them as glob patterns
ENDPOINTS=(/api/deployment/node /api/adapters "/api/resources?pageSize=1" "/api/alerts?pageSize=1")
for EP in "${ENDPOINTS[@]}"; do
  # curl reports the status code and total request time (in seconds) itself
  curl -sk -o /dev/null -w "$EP: HTTP %{http_code} (%{time_total}s)\n" \
    -H "Authorization: vRealizeOpsToken $OPS_TOKEN" \
    "https://$OPS/suite-api$EP"
done

16. NTP & DNS

ssh root@$OPS
# NTP
timedatectl status
chronyc tracking

# DNS
nslookup vcenter.lab.local
nslookup nsx-vip.lab.local
cat /etc/resolv.conf
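This section gives no pass/fail criteria for time sync, so the following is only a sketch of how the `chronyc tracking` output could be evaluated. It reads the `Leap status` and `System time` lines; the 1-second offset limit is an illustrative default rather than a documented VCF Operations threshold, and `ntp_result` is a hypothetical helper.

```shell
# Hypothetical helper: read `chronyc tracking` output on stdin.
# FAIL when chrony reports it is not synchronised (or no status is found);
# WARN when the system-time offset exceeds <max_offset> seconds.
# Usage: chronyc tracking | ntp_result [max_offset]
ntp_result() {
  awk -F' *: *' -v max="${1:-1}" '
    $1 == "Leap status" { leap = $2 }
    $1 == "System time" { split($2, a, " "); offset = a[1] + 0 }
    END {
      if (leap == "" || leap == "Not synchronised") print "FAIL"
      else if (offset > max + 0)                    print "WARN"
      else                                          print "PASS"
    }'
}
```

A tighter limit can be passed as the first argument, for example `chronyc tracking | ntp_result 0.1`.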

17. Backup Configuration

# Check via CASA
curl -sk "https://$OPS/casa/deployment/backup/schedule" \
  -u "admin:$OPS_PASS" | jq .

Result	Criteria	Indicator
PASS	Backup configured, recent success	Protected
WARN	> 24h since last backup	Check schedule
FAIL	No backup configured	Data at risk

18. Resource Utilization

ssh root@$OPS
# CPU and Load
uptime
top -b -n 1 | head -5

# Memory
free -m

# Disk
df -h

# Analytics process memory (RSS: the Java heap plus JVM overhead, not heap alone)
ps aux | grep '[a]nalytics' | awk '{print $6/1024 " MB"}'

Resource	PASS	WARN	FAIL
CPU	< 70%	70-85%	> 85%
Memory	< 75%	75-90%	> 90%
Disk (any partition)	< 70%	70-85%	> 85%
Java Heap	< 80% allocated	80-90%	> 90% (OOM risk)
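The memory row can be computed from `free -m` rather than estimated by eye. This sketch assumes a procps-ng `free` whose `Mem:` line ends with an `available` column (the seventh field) and counts usage as total minus available, so reclaimable cache is not treated as used; `mem_result` is a hypothetical helper.

```shell
# Hypothetical helper: read `free -m` output on stdin and print
# "<used%> <result>" for the Mem: line using the 75 / 90 thresholds above.
mem_result() {
  awk '$1 == "Mem:" {
    pct = ($2 - $7) * 100 / $2          # (total - available) / total
    if (pct > 90)       r = "FAIL"
    else if (pct >= 75) r = "WARN"
    else                r = "PASS"
    printf "%.0f%% %s\n", pct, r
  }'
}
```

Usage on a node: `free -m | mem_result`.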

19. Port Reference Table

Inbound Ports

Source	Port	Protocol	Purpose
Admin Browser	443	TCP	Web UI / Suite API
Admin	22	TCP	SSH
Admin	443	TCP	CASA admin interface
Remote Collector	443	TCP	Collector → cluster
vCenter	443	TCP	Webhook notifications

Outbound Ports

Destination	Port	Protocol	Purpose
vCenter	443	TCP	Data collection (vSphere API)
NSX Manager	443	TCP	NSX data collection
ESXi Hosts	443	TCP	Host metrics
SDDC Manager	443	TCP	SDDC Health data
VCF Ops Logs	443/9543	TCP	Log integration
SMTP Server	25/587	TCP	Email notifications
DNS Server	53	TCP/UDP	Name resolution
NTP Server	123	UDP	Time synchronization

Inter-Node Ports

Port	Protocol	Purpose
443	TCP	HTTPS / Suite API
3091	TCP	Cluster communication
3092	TCP	Cluster communication
7000	TCP	Cassandra inter-node
7001	TCP	Cassandra SSL inter-node
9042	TCP	Cassandra native transport
9160	TCP	Cassandra Thrift
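Reachability of the inter-node ports can be spot-checked from one node to another without installing extra tools. The sketch below relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command; `check_port` is a hypothetical helper, and the 3-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: report whether a TCP port accepts connections.
# Usage: check_port <host> <port>
check_port() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "open"
  else
    echo "unreachable"   # refused, filtered, or timed out
  fi
}
```

For example, looping over `443 3091 3092 7000 7001 9042 9160` and calling `check_port <peer-node> "$p"` for each covers the table above. A port that is listening only on a different interface will also show as unreachable.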

20. Common Issues & Remediation

20.1 Cluster Offline

Symptom	Likely Cause	Resolution
CASA shows cluster `OFFLINE`	Node crash or network partition	Bring cluster online via CASA admin
Cluster won't start	Disk full on master node	Free disk space, then start cluster
Split-brain between nodes	Network connectivity loss	Restore network, restart cluster

Bring cluster online:
1. CASA UI: https://<master>/casa → Cluster → Bring Online
2. CLI: $VMWARE_PYTHON_PATH/bin/python /usr/lib/vmware-vcops/tools/opscli/admin-cli.py bringClusterOnline
3. Force start (last resort): admin-cli.py forceClusterOnline

20.2 Slice Degraded

Symptom	Likely Cause	Resolution
Slice `OFFLINE` on one node	Node resource exhaustion	Check disk/memory, restart node slice
Multiple slices offline	Cluster issue	Restart entire cluster

20.3 Adapter Failures

Symptom	Likely Cause	Resolution
`NOT_COLLECTING`	Credential change	Update credential in VCF Ops
`COLLECTING` but stale data	Target unreachable	Check network connectivity
Adapter crash	Memory issue	Increase adapter memory, restart

20.4 Disk Full

# Quick disk cleanup
ssh root@$OPS
# 1. Clean old logs
find /storage/log -name "*.gz" -mtime +14 -delete
# 2. Check large files
du -sh /storage/* | sort -rh | head -10
# 3. Compact Cassandra
/opt/vmware/vcops/cassandra/apache-cassandra/bin/nodetool compact

20.5 Collection Gaps

Symptom	Likely Cause	Resolution
Dashboards show gaps	Collection cycle missed	Restart adapter
Historical data missing	Retention policy deleted it	Adjust retention
New objects not appearing	Discovery cycle pending	Wait for next cycle or force discovery

20.6 Certificate Expiry

Impact	Resolution
Suite API returns TLS errors	Replace certificate via CASA admin
Remote collectors disconnect	Replace collector certificate, re-register
Browser security warnings	Install custom CA certificate

21. CLI Quick Reference Card

CASA Admin CLI

Command	Purpose
`admin-cli.py getClusterStatus`	Cluster status
`admin-cli.py bringClusterOnline`	Start cluster
`admin-cli.py takeClusterOffline`	Stop cluster
`admin-cli.py forceClusterOnline`	Force start cluster
`admin-cli.py getNodeStatus`	Node status
`admin-cli.py getSliceStatus`	Slice status

CLI Path: $VMWARE_PYTHON_PATH/bin/python /usr/lib/vmware-vcops/tools/opscli/admin-cli.py <command>

System Commands

Command	Purpose
`df -h /storage /storage/db /storage/log`	Disk usage
`free -m`	Memory usage
`top -b -n 1 | head -5`	CPU / load
`timedatectl`	Time sync
`chronyc tracking`	NTP details
`systemctl status vmware-vcops-analytics`	Analytics service
`systemctl status vmware-vcops-collector`	Collector service
`systemctl status vmware-vcops-web`	Web service
`systemctl status vmware-vcops-casa`	CASA service

Cassandra Commands

Command	Purpose
`nodetool status`	Cassandra cluster status
`nodetool info`	Local node info
`nodetool compactionstats`	Active compactions
`nodetool compact`	Trigger compaction
`nodetool repair`	Repair data
`nodetool describecluster`	Cluster schema

Log Locations

Log	Path
Analytics	`/storage/log/vcops/analytics/analytics.log`
Collector	`/storage/log/vcops/collector/collector.log`
CASA	`/storage/log/vcops/casa/casa.log`
Web	`/storage/log/vcops/web/web.log`
Adapter (per-type)	`/storage/log/vcops/adapterkind/<kind>/`
Cassandra	`/storage/log/vcops/cassandra/system.log`

22. API Quick Reference (Suite API)

Authentication

# Acquire token
curl -sk -X POST "https://$OPS/suite-api/api/auth/token/acquire" \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"password","authSource":"local"}'

# Release token
curl -sk -X POST "https://$OPS/suite-api/api/auth/token/release" \
  -H "Authorization: vRealizeOpsToken $OPS_TOKEN" \
  -H "Content-Type: application/json"

Key Endpoints

Endpoint	Method	Purpose
`/api/auth/token/acquire`	POST	Get auth token
`/api/auth/token/release`	POST	Release auth token
`/api/deployment/node`	GET	List cluster nodes
`/api/deployment/licenses`	GET	License info
`/api/deployment/retention`	GET	Data retention config
`/api/adapters`	GET	List all adapters
`/api/adapters/<id>`	GET	Adapter details
`/api/adapters?adapterKindKey=VMWARE`	GET	Filter by adapter kind
`/api/adapters/<id>/monitoringstate/start`	PUT	Start adapter
`/api/adapters/<id>/monitoringstate/stop`	PUT	Stop adapter
`/api/credentials`	GET	List credentials
`/api/resources`	GET	List resources
`/api/resources/<id>/stats/latest`	GET	Latest metrics
`/api/alerts`	GET	List alerts
`/api/alerts?status=ACTIVE`	GET	Active alerts only
`/api/alerts?criticality=CRITICAL`	GET	Critical alerts only
`/api/alertplugins`	GET	Notification plugins
`/api/collectors`	GET	Remote collectors
`/api/solutions`	GET	Management packs
`/api/reports`	GET	Report definitions

Common Query Parameters

Parameter	Example	Purpose
`pageSize`	`?pageSize=100`	Results per page
`page`	`?page=0`	Page number
`adapterKind`	`?adapterKind=VMWARE`	Filter by adapter
`resourceKind`	`?resourceKind=VirtualMachine`	Filter by resource type
`status`	`?status=ACTIVE`	Alert status filter
`criticality`	`?criticality=CRITICAL`	Alert criticality
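When a result set is larger than one page, `pageSize` and `page` are used together. The number of requests needed is a ceiling division of the total count by the page size, sketched here with a hypothetical `page_count` helper; pages are assumed to be zero-indexed, as the `?page=0` example suggests.

```shell
# Hypothetical helper: pages needed to read <total> items at <page_size>
# items per page (ceiling division using integer arithmetic).
# Usage: page_count <total> <page_size>
page_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

page_count 250 100   # 250 items at 100 per page
```

A paging loop would then request `page=0` through `page=N-1`, where N is the value printed by `page_count`.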
