# Performance & Load Testing

## Performance Journey
AutoCom went through a systematic optimization process, benchmarked with k6 at each step:
| Stage | Config | Users | req/s | p50 | p95 |
|---|---|---|---|---|---|
| Baseline | PHP-FPM, 1 Postgres | 50 | 27 | 343ms | 2.44s |
| + Octane | FrankenPHP, 1 Postgres | 50 | 102 | 4.75ms | 12ms |
| + Stress | FrankenPHP, 1 Postgres | 200 | 144 | 59ms | 442ms |
| + 500 users | FrankenPHP, 1 Postgres | 500 | 201 | 933ms | 2.61s |
| + CloudNativePG | FrankenPHP, 3 Postgres | 500 | 456 | 122ms | 1.43s |
Total improvement: 17x throughput, 10x concurrent user capacity.
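The headline multipliers follow directly from the baseline and final rows of the table above; a quick sanity check:

```python
# Baseline vs. final stage from the benchmark table above
baseline_rps, final_rps = 27, 456      # req/s at baseline vs. with CloudNativePG
baseline_users, final_users = 50, 500  # sustained concurrent users

throughput_gain = final_rps / baseline_rps
user_gain = final_users / baseline_users

print(f"throughput: {throughput_gain:.1f}x")   # 16.9x, rounded to 17x
print(f"concurrent users: {user_gain:.0f}x")   # 10x
```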
## What Made the Difference

### Laravel Octane + FrankenPHP (5-10x)
PHP-FPM boots Laravel from scratch on every request — 30-50ms overhead for config loading, service container, module registration. Octane boots once, keeps everything in memory.
```
PHP-FPM: Request → Boot Laravel (30ms) → Your Code (5ms) → Response = 35ms
Octane:  Request → Your Code (5ms) → Response = 5ms
```
See Octane Deployment Guide for setup and trade-offs.
### CloudNativePG Read/Write Split (2.3x)
A single Postgres instance became the bottleneck at 200+ users. CloudNativePG with 1 primary + 2 replicas distributes reads:

- All `SELECT` queries → replicas (load balanced)
- All `INSERT`/`UPDATE`/`DELETE` → primary
- `sticky => true` prevents stale reads within the same request
- The `MarkRecentWrite` middleware forces primary reads for 5 seconds after writes
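In Laravel, this split lives in the connection config. A sketch of the relevant fragment (the `-rw`/`-ro` host names assume CloudNativePG's default service naming for a cluster called `autocom-db`; verify against the actual deployment):

```php
// config/database.php — read/write split with sticky connections (sketch)
'pgsql' => [
    'driver' => 'pgsql',
    'read' => [
        // Replica service: CloudNativePG load-balances across read replicas
        'host' => ['autocom-db-ro'],
    ],
    'write' => [
        // Primary service: all writes go here
        'host' => ['autocom-db-rw'],
    ],
    // After a write, later reads in the same request reuse the write
    // connection, so the request never sees a lagging replica
    'sticky' => true,
    'database' => env('DB_DATABASE', 'autocom'),
    'username' => env('DB_USERNAME'),
    'password' => env('DB_PASSWORD'),
],
```

`sticky` only covers a single request; the `MarkRecentWrite` middleware extends the guarantee across requests by pinning reads to the primary for a short window after a write.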
See Backup & Recovery for CloudNativePG configuration.
### HPA Autoscaling

Horizontal Pod Autoscaler scales API pods from 1 to 5 based on CPU:

```shell
kubectl autoscale deployment api --min=1 --max=5 --cpu-percent=60
```
Under the 500-user stress test, HPA scaled from 1 → 5 pods within 30 seconds.
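The imperative command above can also be expressed as a declarative manifest, which is easier to keep in version control. A sketch (metadata names assumed to match the deployment):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
  namespace: autocom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```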
## Load Testing with k6

Load tests live in a separate GitLab repo: `autocommerce/load-tests`.
### Setup

```shell
brew install k6
git clone git@gitlab.wexron.io:autocommerce/load-tests.git
cd load-tests
```
### Running Tests

```shell
# Get an auth token
TOKEN=$(kubectl exec -n autocom deployment/api -- php artisan tinker --execute="
\$t = \App\Models\Tenant::first(); tenancy()->initialize(\$t);
echo trim(\App\Models\User::first()->createToken('k6')->accessToken);
" 2>&1 | grep "eyJ" | tr -d '\n\r ')

# Smoke test (1 user, verify everything works)
k6 run scripts/smoke.js -e BASE_URL=http://localhost:30080 -e TENANT_ID=demo -e AUTH_TOKEN="$TOKEN"

# Load test (50 users, 4 minutes)
k6 run scripts/load.js -e BASE_URL=http://localhost:30080 -e TENANT_ID=demo -e AUTH_TOKEN="$TOKEN"

# Stress test (ramp to 200 users)
k6 run scripts/stress.js -e BASE_URL=http://localhost:30080 -e TENANT_ID=demo -e AUTH_TOKEN="$TOKEN"

# Extreme stress (500 users)
k6 run scripts/stress-500.js -e BASE_URL=http://localhost:30080 -e TENANT_ID=demo -e AUTH_TOKEN="$TOKEN"
```
### Test Scenarios
| Script | Users | Duration | Purpose |
|---|---|---|---|
| `smoke.js` | 1 | 30s | Verify endpoints work |
| `load.js` | 50 | 4m | Normal production load |
| `stress.js` | 10→200 | 9m | Find degradation point |
| `stress-500.js` | 50→500 | 9m | Extreme stress test |
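A k6 script for a scenario like `stress.js` generally has the following shape; this is an illustrative sketch for the k6 runtime (run with `k6 run`, not Node), and the endpoint path, ramp values, and the tenant header name are assumptions, not copied from the actual scripts:

```javascript
// Sketch of a k6 stress script; runs only under the k6 runtime
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Ramp pattern in the spirit of stress.js: climb to 200 VUs, then recover
  stages: [
    { duration: '2m', target: 50 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 0 },
  ],
  // Stress-test thresholds, matching the Thresholds section of this page
  thresholds: {
    http_req_duration: ['p(95)<2000', 'p(99)<3000'],
    http_req_failed: ['rate<0.1'],
  },
};

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/api/orders`, {
    headers: {
      Authorization: `Bearer ${__ENV.AUTH_TOKEN}`,
      'X-Tenant': __ENV.TENANT_ID, // header name assumed
    },
  });
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```

k6 exits non-zero when a threshold fails, which is what makes these scripts usable as CI gates.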
### Thresholds
Tests use these default thresholds:
| Metric | Load test | Stress test |
|---|---|---|
| p95 response | < 500ms | < 2000ms |
| p99 response | < 1000ms | < 3000ms |
| Error rate | < 1% | < 10% |
## Optimization Checklist
When performance degrades, check in this order:
- Is Octane running? — `kubectl logs -n autocom deployment/api --tail=1` should show FrankenPHP
- How many workers? — `kubectl exec deployment/api -- php artisan octane:status`
- Is caching working? — check the `X-Cache: HIT` header on GET responses
- Database connections? — `kubectl exec deployment/postgres -- psql -c "SELECT count(*) FROM pg_stat_activity"`
- Replication lag? — `kubectl get cluster autocom-db -n autocom`
- Pod resources? — `kubectl top pods -n autocom`
- HPA scaled? — `kubectl get hpa -n autocom`
## Response Caching

The `CacheApiResponse` middleware automatically caches all GET responses with per-endpoint TTLs:
| Endpoint | TTL | Reason |
|---|---|---|
| `/modules`, `/roles`, `/permissions` | 60-120s | Rarely changes |
| `/orders`, `/customers`, `/products` | 5s | Fresh but absorbs bursts |
| `/notifications`, `/dashboard` | 10s | Semi-real-time |
| `/translate/languages`, `/presets` | 5min-1hr | Static data |
| `/auth/*`, `/install/*` | Never | Mutations/security |
Skip the cache by sending a `Cache-Control: no-cache` header.
## Results Tracking

Load test results live in the same GitLab repo, `autocommerce/load-tests`. Its `RESULTS.md` file documents every run with:
- Date, config, and changes made
- Side-by-side metric comparisons (throughput, p50, p95, error rate)
- Analysis of what improved or degraded
- Next action items
This provides a historical record of performance decisions and their measured impact.
## Infrastructure Stack
| Component | Technology | Purpose |
|---|---|---|
| App server | Laravel Octane + FrankenPHP | Zero-bootstrap API serving |
| Database | CloudNativePG (1 primary + 2 replicas) | Read/write split, auto-failover |
| Cache/Queue | Valkey 8 (Redis-compatible, BSD-3 license) | Sessions, cache, queues |
| Proxy | Nginx | CORS, gzip, reverse proxy to Octane |
| Autoscaling | K8s HPA | 1→5 API pods based on CPU |
| Backups | MinIO + CloudNativePG WAL archiving | 6-hour snapshots, 7-day PITR |
### Why Valkey over Redis?
AutoCom uses Valkey instead of Redis for open-source compliance:
- License: BSD-3 (fully open source) vs Redis's RSAL/SSPL dual license
- Compatibility: Drop-in replacement — same protocol, same commands, same Laravel driver
- Performance: On par with Redis 7, with additional improvements in Valkey 8
- Governance: Linux Foundation project, community-driven