Autoscaling

This guide covers configuring the Kubernetes Horizontal Pod Autoscaler (HPA) for AutoCom so that its workloads scale automatically based on CPU and memory metrics.

Overview

Kubernetes Horizontal Pod Autoscaler automatically adjusts the number of pod replicas based on observed metrics. AutoCom uses HPA for:

  • API - Scale based on CPU and memory usage
  • Frontend - Scale based on CPU and memory usage
  • Nginx - Scale based on CPU usage

Prerequisites

Metrics Server

HPA requires metrics-server to collect resource metrics:

# Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system

# Install metrics-server (if not present)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

For Docker Desktop, you may need to patch metrics-server to skip kubelet TLS verification:

kubectl patch deployment metrics-server -n kube-system --type='json' -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}
]'

Verify Metrics

# Check node metrics
kubectl top nodes

# Check pod metrics
kubectl top pods -n autocom

HPA Configuration

API Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: autocom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max

Frontend Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: autocom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15

Nginx Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: autocom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
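
Assuming the three manifests above are saved as api-hpa.yaml, frontend-hpa.yaml, and nginx-hpa.yaml (the file names here are illustrative), they can be applied and verified with:

# Apply the HPA manifests
kubectl apply -f api-hpa.yaml -f frontend-hpa.yaml -f nginx-hpa.yaml

# Confirm the autoscalers exist and show their targets
kubectl get hpa -n autocom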

Scaling Behavior

Scale Up Behavior

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # Scale up immediately
    policies:
      - type: Percent
        value: 100              # Double pods
        periodSeconds: 15       # Every 15 seconds
      - type: Pods
        value: 4                # Or add 4 pods
        periodSeconds: 15       # Every 15 seconds
    selectPolicy: Max           # Use the larger of the two

Explanation:

  • stabilizationWindowSeconds: 0 - No delay before scaling up
  • selectPolicy: Max - Choose the policy that adds the most pods
  • Can double pods OR add 4, whichever is greater
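
For example, with 3 replicas currently running, the Percent policy allows adding 3 pods (100% of 3) and the Pods policy allows adding 4, so up to 4 pods can be added in that 15-second period.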

Scale Down Behavior

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # Wait 5 minutes
    policies:
      - type: Percent
        value: 50                # Remove 50% of pods
        periodSeconds: 60        # Every 60 seconds

Explanation:

  • stabilizationWindowSeconds: 300 - Wait 5 minutes before scaling down
  • Prevents flapping during temporary load drops
  • Removes at most 50% of pods per minute
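
For example, if 10 replicas are running when load drops, at most 5 pods (50% of 10) can be removed in any 60-second period, and only after the 5-minute stabilization window has elapsed.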

Resource Limits

For utilization-based HPA metrics to work, deployments must define resource requests (utilization is calculated as a percentage of the request):

resources:
  requests:
    cpu: 200m      # 0.2 CPU cores
    memory: 384Mi  # 384 MiB
  limits:
    cpu: 1000m     # 1 CPU core
    memory: 768Mi  # 768 MiB

Recommended Values

Component   CPU Request   CPU Limit   Memory Request   Memory Limit
API         200m          1000m       384Mi            768Mi
Nginx       100m          500m        128Mi            256Mi
Frontend    100m          500m        256Mi            512Mi
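
One way to apply these values without editing the deployment manifests is kubectl set resources; the example below uses the Nginx row from the table and assumes the deployment is named nginx:

# Patch the running nginx deployment with the recommended requests/limits
kubectl set resources deployment nginx -n autocom \
  --requests=cpu=100m,memory=128Mi \
  --limits=cpu=500m,memory=256Mi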

Monitoring HPA

Check HPA Status

kubectl get hpa -n autocom

Example output:

NAME           REFERENCE             TARGETS                        MINPODS   MAXPODS   REPLICAS
api-hpa        Deployment/api        cpu: 1%/70%, memory: 4%/80%    2         10        2
frontend-hpa   Deployment/frontend   cpu: 4%/70%, memory: 23%/80%   2         8         2
nginx-hpa      Deployment/nginx      cpu: 1%/70%                    2         6         2

Detailed Status

kubectl describe hpa api-hpa -n autocom

Shows:

  • Current metrics
  • Scaling events
  • Conditions

Watch Scaling Events

kubectl get hpa -n autocom -w

Testing Autoscaling

Generate Load

Using the hey load-testing tool:

# Install hey
go install github.com/rakyll/hey@latest

# Generate load (1000 requests, 50 concurrent)
hey -n 1000 -c 50 http://localhost:8080/api/v1/health

Using kubectl run

# Create a load generator pod
kubectl run load-generator --image=busybox -n autocom -- /bin/sh -c "while true; do wget -q -O- http://nginx/api/v1/health; done"

# Watch HPA
kubectl get hpa -n autocom -w

# Clean up
kubectl delete pod load-generator -n autocom

Expected Behavior

  1. Load increases → CPU usage rises
  2. CPU exceeds 70% threshold
  3. HPA triggers scale up
  4. New pods start (within 30 seconds)
  5. Load distributed across more pods
  6. CPU usage decreases per pod
  7. After load stops, wait 5 minutes
  8. HPA triggers scale down
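
To observe this sequence while the load test runs, watch the deployment and the HPA side by side (shown here for the API):

# Watch the replica count change as the HPA reacts
kubectl get deployment api -n autocom -w

# In a second terminal, watch the HPA metrics and review its scaling events
kubectl get hpa api-hpa -n autocom -w
kubectl describe hpa api-hpa -n autocom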

Tuning Tips

For Bursty Traffic

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Pods
        value: 6         # Add more pods quickly
        periodSeconds: 10

For Steady Traffic

behavior:
  scaleUp:
    stabilizationWindowSeconds: 30  # Wait before scaling
    policies:
      - type: Percent
        value: 50         # Scale more gradually
        periodSeconds: 30

For Cost Optimization

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # Wait 10 minutes
    policies:
      - type: Percent
        value: 25         # Remove only 25% at a time
        periodSeconds: 120

Vertical Pod Autoscaler (VPA)

For automatic resource request tuning:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: autocom
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"  # Just recommend, don't auto-update

Note: VPA requires the VPA controller to be installed separately.
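
Once the VPA controller is running and the object above is applied, its recommendations can be read from the VPA status (the exact field layout may vary slightly between VPA versions):

# Inspect the recommended requests produced by the VPA
kubectl describe vpa api-vpa -n autocom

# Or pull just the recommendation out of the status
kubectl get vpa api-vpa -n autocom -o jsonpath='{.status.recommendation}'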

Troubleshooting

HPA Shows "Unknown" Metrics

# Check if metrics-server is running
kubectl get pods -n kube-system | grep metrics-server

# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server

Pods Not Scaling

  1. Check resource requests are set
  2. Verify metrics-server is working
  3. Check HPA events: kubectl describe hpa api-hpa -n autocom
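
A quick way to confirm step 1 is to read the requests straight from the deployment spec, for example for the API:

# Print the resources block of the first container in the api deployment
kubectl get deployment api -n autocom \
  -o jsonpath='{.spec.template.spec.containers[0].resources}'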

Scaling Too Aggressively

  1. Increase stabilizationWindowSeconds
  2. Reduce the scale-up policy value (percent or pods per period)
  3. Add more conservative scaleDown policies

Scaling Too Slowly

  1. Decrease stabilizationWindowSeconds
  2. Increase the scale-up policy value
  3. Add more aggressive scaleUp policies

Summary

Setting            API    Frontend   Nginx
Min Replicas       2      2          2
Max Replicas       10     8          6
CPU Target         70%    70%        70%
Memory Target      80%    80%        -
Scale Up Delay     0s     0s         0s
Scale Down Delay   300s   300s       300s

Next Steps