Analyze pod crashes
This guide requires an Upbound control plane instance running UXP v2.0 or later. Support for Upbound SaaS is coming soon.
Upbound Crossplane can run Intelligent Control Planes, which use AI-augmented functions to perform tasks.
In this guide, you'll use AI to analyze and remediate common app deployment
issues like out-of-memory errors and CrashLoopBackOff.
This guide walks through
Prerequisites
Before you begin, make sure you have:
- An Upbound Account
- The up CLI installed
- An Anthropic API key
- An AWS account
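As a quick sanity check before continuing, the sketch below reports which command-line tools are missing from your PATH. The helper name missing_tools is hypothetical. The tool list (git, kubectl, up) is taken from the commands this guide runs later, not from an official requirements list.

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the up CLI.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# git, kubectl, and up are the commands invoked in the steps of this guide.
missing=$(missing_tools git kubectl up)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

The check covers only the binaries. The Upbound account, Anthropic API key, and AWS account still need to be confirmed separately.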
Set up your environment
Clone the repository upbound/configuration-deployment-analysis to your machine:
git clone git@github.com:upbound/configuration-deployment-analysis.git
This repository contains a control plane project that defines watch operations. These watch operations define workflows for:
- watching for events emitted by pods in a cluster
- analyzing them using LLMs and suggesting remediations
- gating the remediations behind a human-in-the-loop approval
- applying an approved remediation to address the issue
Configure credentials and runtime settings
In the project directory, edit operations/init-operation/operation.yaml and set the function's input.spec.anthropicApiKey field to your Anthropic API key. This operation runs whenever you launch the control plane and configures all the required runtime settings.
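To confirm the edit before launching, a small check like the one below can help. It is a minimal sketch that assumes the field appears on a single line in the form `anthropicApiKey: <value>`. The actual layout of operation.yaml may differ, and has_api_key is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed when the file contains an anthropicApiKey field followed by a
# non-empty value on the same line (assumed layout; adjust if yours differs).
has_api_key() {
  grep -Eq "anthropicApiKey:[[:space:]]*[\"']?[^\"'[:space:]]" "$1"
}

file=operations/init-operation/operation.yaml
if [ -f "$file" ] && has_api_key "$file"; then
  echo "anthropicApiKey is set in $file"
else
  echo "anthropicApiKey looks unset in $file" >&2
fi
```

Run it from the root of the project directory. The check only confirms that some value is present, not that the key is valid.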
Launch the local UXP cluster
In the root of the project directory, launch the control plane locally:
up project run --local
After the control plane is created, apply a ClusterRole to grant Crossplane
admin access. This object is already defined in the examples folder of the
project directory:
kubectl apply -f examples/admin.yaml
Apply example deployments and watch for issues
Apply the examples to demonstrate the flows for catching and remediating out-of-memory and CrashLoopBackOff issues:
kubectl apply -f examples/oomkilled.yaml -f examples/crashloopbackoff.yaml
These workloads intentionally cause the deployed pods to exhibit the respective errors.
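To see the failures appear, you can filter the pod list for the two statuses. The sketch below is one way to do that. It lists pods in all namespaces because this guide does not say which namespace the example workloads use, and failing_pods is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only `kubectl get pods` rows whose status shows one of the two
# failure modes demonstrated by the example workloads.
failing_pods() {
  grep -E 'OOMKilled|CrashLoopBackOff'
}

# -A lists every namespace; narrow it with -n once you know where the
# example deployments landed.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods -A --no-headers 2>/dev/null | failing_pods \
    || echo "No failing pods found yet."
fi
```

Pods may take a short while to reach these states, so rerun the command if nothing shows up at first.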
Observe analyses and remediations
The operation function pipeline creates Analysis and Remediation resources based on observed behaviors. Fetch an analysis object to observe suggestions for remediation made by the LLM:
kubectl get Analysis -n crossplane-system
kubectl describe Analysis -n crossplane-system <your-analysis-object>
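When several analyses exist, describing them one at a time gets tedious. The sketch below loops over every object of a given kind and describes each. The helper name describe_all is hypothetical. It relies only on `kubectl get -o name` and `kubectl describe`.

```shell
#!/bin/sh
# Describe every object of one kind in a namespace.
# Arguments: resource kind, namespace.
describe_all() {
  kubectl get "$1" -n "$2" -o name | while read -r obj; do
    kubectl describe "$obj" -n "$2"
  done
}

# Analysis objects live in crossplane-system, as in the commands above.
if command -v kubectl >/dev/null 2>&1; then
  describe_all Analysis crossplane-system
fi
```

The same helper can be pointed at Remediation resources with `describe_all Remediation crossplane-system`, assuming they are created in the same namespace, which this guide does not state.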
Clean up
Clean up the local control plane to prevent it from continuing to invoke your LLM. Run the following command:
up project stop
Next steps
Read the concept documentation for Intelligent Control Planes to learn more about using AI-powered functions in your function pipelines.