Autoscaling
This guide covers configuring Horizontal Pod Autoscaling (HPA) for AutoCom so that its components scale automatically based on CPU and memory metrics.
Overview
The Kubernetes Horizontal Pod Autoscaler automatically adjusts the number of pod replicas based on observed metrics. AutoCom uses HPA for:
- API - Scales based on CPU and memory usage
- Frontend - Scales based on CPU and memory usage
- Nginx - Scales based on CPU usage
Prerequisites
Metrics Server
HPA requires metrics-server to collect resource metrics:
# Check if metrics-server is installed
kubectl get deployment metrics-server -n kube-system
# Install metrics-server (if not present)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
For Docker Desktop, you may need to patch metrics-server:
kubectl patch deployment metrics-server -n kube-system --type='json' -p='[
{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}
]'
Verify Metrics
# Check node metrics
kubectl top nodes
# Check pod metrics
kubectl top pods -n autocom
HPA Configuration
API Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: autocom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
Frontend Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
  namespace: autocom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
Nginx Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
  namespace: autocom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
    scaleUp:
      stabilizationWindowSeconds: 0
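Assuming each manifest above is saved to its own file (the filenames below are illustrative), the autoscalers can be created or updated with kubectl apply:
# Create or update the three HPAs (filenames are illustrative)
kubectl apply -f api-hpa.yaml -f frontend-hpa.yaml -f nginx-hpa.yaml
# Confirm they were created
kubectl get hpa -n autocom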
Scaling Behavior
Scale Up Behavior
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # Scale up immediately
    policies:
      - type: Percent
        value: 100                    # Double pods
        periodSeconds: 15             # Every 15 seconds
      - type: Pods
        value: 4                      # Or add 4 pods
        periodSeconds: 15             # Every 15 seconds
    selectPolicy: Max                 # Use the larger of the two
Explanation:
- stabilizationWindowSeconds: 0 - no delay before scaling up
- selectPolicy: Max - choose the policy that adds the most pods
- Each period, the deployment can double its pods OR add 4, whichever is greater. For example, at 3 replicas the Percent policy allows +3 while the Pods policy allows +4, so HPA adds 4.
Scale Down Behavior
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # Wait 5 minutes
    policies:
      - type: Percent
        value: 50                     # Remove 50% of pods
        periodSeconds: 60             # Every 60 seconds
Explanation:
- stabilizationWindowSeconds: 300 - wait 5 minutes before scaling down
- Prevents flapping during temporary load drops
- Removes at most 50% of pods per minute
Resource Limits
For utilization-based HPA metrics to work, the target deployments must set resource requests:
resources:
  requests:
    cpu: 200m        # 0.2 CPU cores
    memory: 384Mi    # 384 MiB
  limits:
    cpu: 1000m       # 1 CPU core
    memory: 768Mi    # 768 MiB
Recommended Values
| Component | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| API | 200m | 1000m | 384Mi | 768Mi |
| Nginx | 100m | 500m | 128Mi | 256Mi |
| Frontend | 100m | 500m | 256Mi | 512Mi |
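As an illustration, the API values from the table plug into the deployment's container spec as follows (the container name and image are placeholders; only the resource figures come from the table above):
# Excerpt from the API Deployment spec (container name and image are illustrative)
spec:
  template:
    spec:
      containers:
        - name: api
          image: autocom/api:latest   # placeholder image
          resources:
            requests:
              cpu: 200m
              memory: 384Mi
            limits:
              cpu: 1000m
              memory: 768Mi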
Monitoring HPA
Check HPA Status
kubectl get hpa -n autocom
Example output:
NAME           REFERENCE             TARGETS                        MINPODS   MAXPODS   REPLICAS
api-hpa        Deployment/api        cpu: 1%/70%, memory: 4%/80%    2         10        2
frontend-hpa   Deployment/frontend   cpu: 4%/70%, memory: 23%/80%   2         8         2
nginx-hpa      Deployment/nginx      cpu: 1%/70%                    2         6         2
Detailed Status
kubectl describe hpa api-hpa -n autocom
Shows:
- Current metrics
- Scaling events
- Conditions
Watch Scaling Events
kubectl get hpa -n autocom -w
Testing Autoscaling
Generate Load
Using the hey load-testing tool:
# Install hey
go install github.com/rakyll/hey@latest
# Generate load (1000 requests, 50 concurrent)
hey -n 1000 -c 50 http://localhost:8080/api/v1/health
Using kubectl run
# Create a load generator pod
kubectl run load-generator --image=busybox -n autocom -- /bin/sh -c "while true; do wget -q -O- http://nginx/api/v1/health; done"
# Watch HPA
kubectl get hpa -n autocom -w
# Clean up
kubectl delete pod load-generator -n autocom
Expected Behavior
- Load increases → CPU usage rises
- CPU exceeds 70% threshold
- HPA triggers scale up
- New pods start (within 30 seconds)
- Load distributed across more pods
- CPU usage decreases per pod
- After load stops, wait 5 minutes
- HPA triggers scale down
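To observe this sequence while a test runs, you can watch replica counts alongside the HPA output shown earlier (deployment names match the manifests above):
# Watch replica counts change as the HPAs react to load
kubectl get deployments -n autocom -w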
Tuning Tips
For Bursty Traffic
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Pods
        value: 6              # Add more pods quickly
        periodSeconds: 10
For Steady Traffic
behavior:
  scaleUp:
    stabilizationWindowSeconds: 30    # Wait before scaling
    policies:
      - type: Percent
        value: 50                     # Scale more gradually
        periodSeconds: 30
For Cost Optimization
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600   # Wait 10 minutes
    policies:
      - type: Percent
        value: 25                     # Remove only 25% at a time
        periodSeconds: 120
Vertical Pod Autoscaler (VPA)
For automatic resource request tuning:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: autocom
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # Just recommend, don't auto-update
Note: VPA requires the VPA controller to be installed separately.
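With the VPA controller installed and the object above created, its resource recommendations can be read back with kubectl (the resource name matches the manifest above):
# Inspect the recommended CPU/memory requests produced by the VPA recommender
kubectl describe vpa api-vpa -n autocom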
Troubleshooting
HPA Shows "Unknown" Metrics
# Check if metrics-server is running
kubectl get pods -n kube-system | grep metrics-server
# Check metrics-server logs
kubectl logs -n kube-system deployment/metrics-server
Pods Not Scaling
- Check resource requests are set
- Verify metrics-server is working
- Check HPA events:
kubectl describe hpa api-hpa -n autocom
Scaling Too Aggressively
- Increase stabilizationWindowSeconds (see the patch example below)
- Reduce the scale-up value percentage
- Add scaleDown policies
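For instance, to lengthen the scale-down window on a live HPA without re-applying the full manifest, a merge patch works (the example targets api-hpa from above and sets a 10-minute window):
# Raise the scale-down stabilization window to 10 minutes on a live HPA
kubectl patch hpa api-hpa -n autocom --type merge \
  -p '{"spec":{"behavior":{"scaleDown":{"stabilizationWindowSeconds":600}}}}'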
Scaling Too Slowly
- Decrease stabilizationWindowSeconds
- Increase the scale-up value
- Add more aggressive scaleUp policies
Summary
| Setting | API | Frontend | Nginx |
|---|---|---|---|
| Min Replicas | 2 | 2 | 2 |
| Max Replicas | 10 | 8 | 6 |
| CPU Target | 70% | 70% | 70% |
| Memory Target | 80% | 80% | - |
| Scale Up Delay | 0s | 0s | 0s |
| Scale Down Delay | 300s | 300s | 300s |
Next Steps
- Kubernetes Deployment for full cluster setup
- PHP Optimization for performance tuning