- Introduces HyperPod troubleshooting skills that let AI coding assistants (e.g., Claude Code, Cursor, Kiro) diagnose cluster health, GPU faults, NCCL communication issues, and performance bottlenecks from natural-language prompts.
- Automates evidence collection via AWS Systems Manager, analyzes logs, and provides actionable recommendations without requiring changes to existing HyperPod deployments orchestrated by Slurm or Amazon EKS.
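
To make the log-analysis step concrete, here is a minimal sketch of the kind of triage that could run on log text once it has been collected from a node. It is not the skills' actual implementation: the function name `summarize_log`, the two regular expressions, and the sample log lines are all assumptions chosen for illustration. The sketch only counts NVIDIA driver Xid codes and gathers NCCL warning messages.

```python
import re
from collections import Counter

# Illustrative patterns only (assumed, not taken from the HyperPod skills).
# Xid events appear in kernel logs as "NVRM: Xid (PCI:<bus id>): <code>, ...".
XID_PATTERN = re.compile(r"NVRM: Xid \(PCI:[^)]*\): (\d+)")
# NCCL prints warnings containing the literal marker "NCCL WARN".
NCCL_WARN_PATTERN = re.compile(r"NCCL WARN (.*)")


def summarize_log(text: str) -> dict:
    """Count GPU Xid codes and collect NCCL warning messages from log text."""
    xid_codes = Counter()
    nccl_warnings = []
    for line in text.splitlines():
        xid_match = XID_PATTERN.search(line)
        if xid_match:
            xid_codes[int(xid_match.group(1))] += 1
            continue
        nccl_match = NCCL_WARN_PATTERN.search(line)
        if nccl_match:
            nccl_warnings.append(nccl_match.group(1).strip())
    return {"xid_codes": dict(xid_codes), "nccl_warnings": nccl_warnings}


if __name__ == "__main__":
    # Hypothetical log excerpt, written by hand for this example.
    sample_log = "\n".join([
        "[ 1234.5] NVRM: Xid (PCI:0000:10:1c): 79, pid=1234, GPU has fallen off the bus.",
        "node-1:4321:4400 [0] NCCL WARN Timeout waiting for connection",
        "[ 1240.1] NVRM: Xid (PCI:0000:10:1c): 79, pid=1234, GPU has fallen off the bus.",
    ])
    print(summarize_log(sample_log))
```

A summary like this gives an assistant structured evidence (which Xid codes occurred and how often, which NCCL warnings were raised) to reason over before it suggests a remediation.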