Troubleshooting Guide

Solutions to common issues and debugging strategies

Common Issues

Pods Not Starting

Symptoms:

  • Pods stuck in Pending or CrashLoopBackOff
  • ImagePullBackOff errors

Diagnosis:

# Check pod status
kubectl get pods -n composio

# Describe problematic pod
kubectl describe pod <pod-name> -n composio

# Check pod logs
kubectl logs <pod-name> -n composio

# Check events
kubectl get events -n composio --sort-by='.lastTimestamp'

Solutions:

  • ImagePullBackOff: Verify ECR credentials are correct. ECR authorization tokens expire after 12 hours, so an older secret may simply be stale
    # Recreate ECR secret
    kubectl delete secret ecr-secret -n composio
    helm upgrade composio ./composio -n composio \
        --set externalSecrets.ecr.token="$(aws ecr get-login-password --region us-east-1)"
  • Insufficient Resources: Check cluster has enough CPU/memory
    # Check node resources
    kubectl top nodes
    kubectl describe nodes
  • Secret Issues: Verify all secrets exist
    # Check secrets
    kubectl get secrets -n composio
    kubectl describe secret composio-secrets -n composio
    kubectl describe secret external-postgres-secret -n composio

Service Not Accessible

Symptoms:

  • Cannot connect to Apollo API
  • Connection timeouts

Diagnosis:

# Check service status
kubectl get svc -n composio

# Test service connectivity from within the cluster
# (run the pod in the composio namespace so the short service name resolves)
kubectl run test-pod -n composio --rm -i --tty --image=curlimages/curl --restart=Never -- \
    curl http://composio-apollo:9900/api/v1/health

# Check endpoints
kubectl get endpoints -n composio

Solutions:

  • Verify pods are running and ready
  • Check service selectors match pod labels
  • Verify network policies aren't blocking traffic
  • Use port-forward for debugging:
    kubectl port-forward -n composio svc/composio-apollo 8080:9900
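
The selector check in the list above can be scripted. The following is a minimal sketch, assuming the Service is named composio-apollo as in the commands above; pods_for_service is a hypothetical helper name and not part of the chart. It reads the Service's selector and lists the pods that selector matches, so an empty pod list suggests the selector and the pod labels disagree.

```shell
# Hypothetical helper: list the pods a Service's selector matches.
# Usage: pods_for_service <service-name> [namespace]
pods_for_service() {
    svc="$1"
    ns="${2:-composio}"

    # Render the selector map as a comma-separated label query (key=value,...)
    selector=$(kubectl get svc "$svc" -n "$ns" \
        -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}' \
        | sed 's/,$//')

    if [ -z "$selector" ]; then
        echo "No selector found for Service $svc (missing Service or no selector)" >&2
        return 1
    fi

    echo "Selector: $selector"
    # No pods listed here means the selector does not match any pod labels
    kubectl get pods -n "$ns" -l "$selector" -o wide
}

# Example: pods_for_service composio-apollo composio
```

Network policies in the namespace can be listed separately with kubectl get networkpolicy -n composio.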

Knative Issues

Knative Components Not Starting

Diagnosis:

# Check Knative installation
kubectl get pods -n knative-serving

# Check Knative setup job
kubectl get jobs -n composio -l app.kubernetes.io/component=knative-setup
kubectl logs -n composio job/knative-setup-<revision>

# Verify CRDs
kubectl get crd | grep knative

Solutions:

  • Reinstall Knative components:
    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.15.0/serving-crds.yaml
    kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.15.0/serving-core.yaml
    kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.15.0/kourier.yaml
    
    kubectl patch configmap/config-network \
        --namespace knative-serving \
        --type merge \
        --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'
  • Ensure cluster has sufficient resources
  • Check for conflicting ingress controllers

Mercury Service Not Ready

Diagnosis:

# Check Knative service status
kubectl get ksvc -n composio
kubectl describe ksvc composio-mercury -n composio

# Check Mercury pods
kubectl get pods -n composio -l serving.knative.dev/service=composio-mercury

# View Mercury logs
kubectl logs -n composio -l serving.knative.dev/service=composio-mercury

Solutions:

  • Check autoscaler annotations are correct
  • Verify resource requests don't exceed cluster capacity
  • Check Mercury container logs for application errors
  • Ensure Knative activator is running
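
The checks above can be narrowed down with two small helpers. This is a sketch under the assumption that the Knative Service is named composio-mercury as in the diagnosis commands; the function names are hypothetical. The first prints only the status conditions that are not True, which usually points at the failing step (configuration, revision, or route). The second lists the core Knative Serving deployments, including the activator.

```shell
# Hypothetical helper: print every status condition of a Knative Service
# that is not "True", with its reason and message.
# Usage: ksvc_failed_conditions <ksvc-name> [namespace]
ksvc_failed_conditions() {
    kubectl get ksvc "$1" -n "${2:-composio}" \
        -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}' \
        | awk -F '\t' 'NF > 0 && $2 != "True" { print $1 ": " $3 " (" $4 ")" }'
}

# Hypothetical helper: show the core Knative Serving deployments.
knative_core_status() {
    kubectl get deployment activator autoscaler controller webhook -n knative-serving
}

# Example:
#   ksvc_failed_conditions composio-mercury composio
#   knative_core_status
```

Autoscaler annotations live on the revision template and can be printed with kubectl get ksvc composio-mercury -n composio -o jsonpath='{.spec.template.metadata.annotations}'.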

Database Issues

Database Connection Failures

Symptoms:

  • Apollo/Thermos pods failing to start
  • Error messages about database connectivity

Diagnosis:

# Check database secret
kubectl get secret external-postgres-secret -n composio -o yaml

# Decode database URL
kubectl get secret external-postgres-secret -n composio \
    -o jsonpath='{.data.url}' | base64 -d

# Test connectivity from a pod in the composio namespace
kubectl run test-db -n composio --rm -i --tty --image=postgres:15 --restart=Never -- \
    psql "$(kubectl get secret external-postgres-secret -n composio -o jsonpath='{.data.url}' | base64 -d)"

# Check Apollo logs for connection errors
kubectl logs -n composio deployment/composio-apollo | grep -i "database\|postgres"

Solutions:

  • Verify POSTGRES_URL format is correct: postgresql://user:password@host:5432/database?sslmode=require
  • Check network connectivity to database
  • Verify database firewall rules allow cluster IPs
  • Ensure SSL mode matches database configuration
  • Re-run secret setup script:
    export POSTGRES_URL="your-connection-string"
    ./secret-setup.sh -r composio -n composio
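
Before re-running the setup script, the connection string can be checked against the format given above. The sketch below is only a rough shape check, not a full URL parser; check_postgres_url is a hypothetical helper, and it assumes special characters in the password are URL-encoded.

```shell
# Hypothetical helper: rough shape check for a PostgreSQL connection URL.
# Expected form: postgresql://user:password@host:port/database[?params]
# Usage: check_postgres_url "$POSTGRES_URL"
check_postgres_url() {
    url="$1"

    if ! printf '%s\n' "$url" | grep -Eq \
        '^postgres(ql)?://[^:@/]+:[^@/]+@[^:@/]+:[0-9]+/[^/?]+(\?.*)?$'; then
        echo "invalid: expected postgresql://user:password@host:port/database" >&2
        return 1
    fi

    # sslmode is optional in the URL, but the server may require it
    case "$url" in
        *sslmode=*) ;;
        *) echo "warning: no sslmode parameter set" >&2 ;;
    esac

    echo "ok"
}

# Example: check_postgres_url "$POSTGRES_URL" && ./secret-setup.sh -r composio -n composio
```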

Database Initialization Job Failures

Diagnosis:

# Check DB init jobs
kubectl get jobs -n composio | grep db-init

# Check job logs
kubectl logs -n composio job/composio-apollo-db-init
kubectl logs -n composio job/composio-thermos-db-init

Solutions:

  • Check database permissions for migrations
  • Verify database schema compatibility
  • Delete and recreate failed jobs:
    kubectl delete job composio-apollo-db-init -n composio
    helm upgrade composio ./composio -n composio
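
To find which Jobs need to be deleted, the failed ones can be listed directly. This is a minimal sketch; failed_jobs is a hypothetical helper that reports any Job whose status records at least one failed pod.

```shell
# Hypothetical helper: list Jobs in a namespace that have failed pods.
# Usage: failed_jobs [namespace]
failed_jobs() {
    kubectl get jobs -n "${1:-composio}" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.failed}{"\n"}{end}' \
        | awk -F '\t' '$2 + 0 > 0 { print $1 }'
}

# Example: delete each failed db-init Job, then let Helm recreate it
# for job in $(failed_jobs composio | grep db-init); do
#     kubectl delete job "$job" -n composio
# done
# helm upgrade composio ./composio -n composio
```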

GKE-Specific Issues

GKE Autopilot Resource Issues

Symptoms:

  • Pods rejected with resource limit errors
  • Autopilot adjusting resource requests

Diagnosis:

# Check Autopilot events
kubectl get events -n composio --sort-by='.lastTimestamp' | grep -i autopilot

# Check resource quotas
kubectl describe quota -n composio

# View the cluster's Autopilot workload policy configuration
gcloud container clusters describe "$CLUSTER_NAME" --region="$REGION" \
    --format="value(autopilot.workloadPolicyConfig)"

Solutions:

  • Ensure resource requests are within Autopilot limits
  • Disable privileged containers and sysctls (Autopilot restriction)
  • Use appropriate QoS classes for pods
  • Review and adjust resource limits in values.yaml
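
Two of the checks above can be scripted as follows. This is a sketch with hypothetical helper names: the first lists pods that request privileged mode in any regular container (init containers are not inspected), and the second prints the CPU and memory requests per pod for comparison with the Autopilot limits.

```shell
# Hypothetical helper: list pods with at least one privileged container.
# Usage: privileged_pods [namespace]
privileged_pods() {
    kubectl get pods -n "${1:-composio}" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].securityContext.privileged}{"\n"}{end}' \
        | awk -F '\t' '$2 ~ /true/ { print $1 }'
}

# Hypothetical helper: show CPU and memory requests for every pod.
# Usage: pod_requests [namespace]
pod_requests() {
    kubectl get pods -n "${1:-composio}" \
        -o custom-columns='POD:.metadata.name,CPU:.spec.containers[*].resources.requests.cpu,MEMORY:.spec.containers[*].resources.requests.memory'
}
```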

Cloud SQL Connectivity

Diagnosis:

# Test Cloud SQL connectivity
kubectl run -it --rm debug --image=postgres:15 --restart=Never -- \
    psql "postgresql://postgres:your_password@${POSTGRES_IP}:5432/composio?sslmode=require"

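# Check Cloud SQL logs (Cloud SQL uses the cloudsql_database resource type;
# database_id has the form <project-id>:<instance-name>)
gcloud logging read \
    'resource.type="cloudsql_database" AND resource.labels.database_id:"composio-postgres"' \
    --limit 50 --format json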

# Verify instance status
gcloud sql instances describe composio-postgres

Solutions:

  • Ensure Cloud SQL instance allows connections from GKE cluster
  • Check VPC network connectivity
  • Verify SSL certificates are valid
  • Use Cloud SQL Proxy for enhanced security:
    # Install Cloud SQL Proxy sidecar
    # See: https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine
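
A quick way to confirm the instance is up and to see which networks it authorizes is sketched below. It assumes the instance name composio-postgres used above and a public-IP setup; with private IP the authorized networks list is typically empty. cloudsql_status is a hypothetical helper, and the field paths follow the Cloud SQL Admin API instance resource.

```shell
# Hypothetical helper: report whether a Cloud SQL instance is running and
# which networks it authorizes.
# Usage: cloudsql_status [instance-name]
cloudsql_status() {
    instance="${1:-composio-postgres}"

    state=$(gcloud sql instances describe "$instance" --format='value(state)')
    networks=$(gcloud sql instances describe "$instance" \
        --format='value(settings.ipConfiguration.authorizedNetworks[].value)')

    echo "state: $state"
    echo "authorized networks: ${networks:-none}"

    # RUNNABLE is the state reported for a running instance
    [ "$state" = "RUNNABLE" ]
}
```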

Resource Issues

Out of Memory (OOM)

Diagnosis:

# Check for OOM kills
kubectl get events -n composio | grep -i oom

# Check current resource usage
kubectl top pods -n composio
kubectl top nodes

# Describe pod to see resource limits
kubectl describe pod <pod-name> -n composio

Solutions:

  • Increase memory limits in values.yaml
  • Enable HPA to distribute load
  • Check for memory leaks in application logs
  • Optimize application configuration
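
Events expire, so an OOM kill may no longer show up there. The container's last termination reason is kept on the pod itself; the sketch below (oomkilled_pods is a hypothetical helper) lists pods whose most recent container termination was an OOM kill.

```shell
# Hypothetical helper: list pods with a container last terminated by OOMKilled.
# Usage: oomkilled_pods [namespace]
oomkilled_pods() {
    kubectl get pods -n "${1:-composio}" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
        | awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Example: inspect limits for each affected pod
# for pod in $(oomkilled_pods composio); do
#     kubectl describe pod "$pod" -n composio
# done
```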

Persistent Volume Issues

Diagnosis:

# Check PVC status
kubectl get pvc -n composio

# Check PV status
kubectl get pv

# Describe PVC
kubectl describe pvc <pvc-name> -n composio

Solutions:

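  • Multi-Attach Error: The volume is still attached to the node running the old pod. MinIO uses a RollingUpdate strategy with maxUnavailable=1, so wait for the old pod to terminate before the replacement mounts the volume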
  • Verify storage class is available
  • Check cluster has storage provisioner
  • Delete pod to force remount if stuck
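
To spot claims that never bound, the PVC list can be filtered by phase. This is a minimal sketch; unbound_pvcs is a hypothetical helper. The storage class it reports can then be compared with the output of kubectl get storageclass.

```shell
# Hypothetical helper: list PVCs that are not in the Bound phase.
# Usage: unbound_pvcs [namespace]
unbound_pvcs() {
    kubectl get pvc -n "${1:-composio}" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.spec.storageClassName}{"\n"}{end}' \
        | awk -F '\t' 'NF > 0 && $2 != "Bound" { print $1 " is " $2 " (storage class: " $3 ")" }'
}

# Example:
#   unbound_pvcs composio
#   kubectl get storageclass
```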

Debugging Tools & Commands

Essential Debug Commands

# Namespace overview (kubectl get all covers common resource types only)
kubectl get all -n composio

# Pod logs (follow)
kubectl logs -f <pod-name> -n composio

# Previous pod logs (after crash)
kubectl logs <pod-name> -n composio --previous

# Multi-container pod logs
kubectl logs <pod-name> -c <container-name> -n composio

# Execute commands in pod
kubectl exec -it <pod-name> -n composio -- /bin/sh

# Port forward for local access
kubectl port-forward -n composio svc/composio-apollo 8080:9900

# Watch resource changes
watch kubectl get pods -n composio

# Get pod YAML
kubectl get pod <pod-name> -n composio -o yaml

# Check pod events
kubectl describe pod <pod-name> -n composio

# Network debugging
kubectl run netshoot --rm -i --tty --image=nicolaka/netshoot --restart=Never

# DNS debugging
kubectl run dnsutils --rm -i --tty --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 --restart=Never -- nslookup composio-apollo.composio.svc.cluster.local

Monitoring Commands

# Resource usage
kubectl top pods -n composio
kubectl top nodes

# HPA status
kubectl get hpa -n composio
kubectl describe hpa <hpa-name> -n composio

# Service endpoints
kubectl get endpoints -n composio

# ConfigMaps and Secrets
kubectl get cm -n composio
kubectl get secrets -n composio

# Knative services
kubectl get ksvc -n composio
kubectl get revisions -n composio

Support Bundle Collection

For complex issues, collect a comprehensive support bundle:

# Enable support bundle in values.yaml
supportBundle:
  enabled: true

# Upgrade deployment
helm upgrade composio ./composio -n composio

# Collect support bundle
kubectl support-bundle ./composio/support-bundle-spec.yaml -n composio

# Or manually collect key information
mkdir -p support-bundle
kubectl get all -n composio -o yaml > support-bundle/all-resources.yaml
kubectl get events -n composio --sort-by='.lastTimestamp' > support-bundle/events.txt
# kubectl logs needs a pod name or selector, so loop over every pod
for pod in $(kubectl get pods -n composio -o name); do
    kubectl logs -n composio "$pod" --all-containers=true --prefix=true
done > support-bundle/all-logs.txt
kubectl describe nodes > support-bundle/nodes.txt

Getting Help

  • Discord Community: Get real-time help from the community and maintainers
  • GitHub Issues: Report bugs and request features
  • Documentation: Explore comprehensive guides and API references