Deployment Rings

Deployment rings are how AutoCom safely runs different module versions for different tenants on the same shared platform. Each ring is an independent slice of the cluster (its own pods, its own pinned module versions), and tenants are routed to exactly one ring at a time.

This page explains the design, the operator runbook, and the trade-offs. For background on the underlying versioning system that rings sit on top of, read Module Versioning Overview first.

The Problem Rings Solve

PHP class loading is process-global. Once Composer's autoloader binds Modules\Orders\OrderController to a file, that mapping is locked for the life of the PHP process — and Octane processes live for hours, serving thousands of requests across many tenants. You physically cannot have two versions of the same class in the same process.

So per-request, per-tenant version swapping is not a thing in PHP. The constraint has to move to a higher boundary: different processes for different versions.

That's what rings are. Each ring is its own set of api/horizon/nginx pods running its own pinned module versions. Tenants are routed to a specific ring's pods at the ingress layer based on a single column in the central database.

Architecture

                    ┌───────────────────────────────────┐
                    │  Shared (single instance)         │
                    │  ├── PostgreSQL (per-tenant DBs)  │
                    │  ├── Redis                        │
                    │  ├── MinIO / S3                   │
                    │  ├── Frontend (one Next.js image) │
                    │  ├── Docs                         │
                    │  └── indian-post (microservice)   │
                    └───────────────────────────────────┘
                                  ▲
                                  │ central DB has tenants.module_ring_id
                                  │
        ┌─────────────────┬───────┴────────┬─────────────────┐
        │                 │                │                 │
   ring: stable      ring: edge       ring: canary    ring: experimental
   ─────────────     ─────────────    ─────────────   ─────────────
   k8s namespace:    autocom-edge     autocom-canary  autocom-experimental
   autocom

   manifest:         manifest:        manifest:       manifest:
   stable.lock.json  edge.lock.json   canary.lock.    experimental.lock
                                      json            .json

   Pods:             Pods:            Pods:           Pods:
   api × N           api × M          api × 2         api × 1
   horizon × N       horizon × M      horizon × 1     horizon × 1
   nginx × 2         nginx × 2        nginx × 1       nginx × 1

   Tenants:          Tenants:         Tenants:        Tenants:
   - acme            - widgets        - test-canary   - internal-qa
   - foo             - bar
   - …               - …

Per-ring vs shared services

Component          Per-ring  Shared  Why
api                   ✓              Loads module code at boot — needs ring-specific code
horizon               ✓              Same Octane workers, same module code
nginx                 ✓              Routes to the ring's api pods specifically
frontend                        ✓    Same UI for everyone; API hostname is ring-aware
docs                            ✓    Same documentation for everyone
redis                           ✓    Cache; tenant-isolated by key prefix
postgres (CNPG)                 ✓    Tenant data; per-tenant DBs inside one cluster
indian-post                     ✓    Stateless microservice
minio                           ✓    Storage; tenant-isolated by bucket prefix

The shared layer is shared because it's either stateful (DB, MinIO) or version-irrelevant (frontend, docs, microservices). A tenant's data lives in its tenant DB regardless of which ring serves it.

K8s manifest layout

The base manifests are split into two subdirectories so ring overlays can pull in only the per-ring services without duplicating shared infra:

k8s/
├── base/
│   ├── kustomization.yaml          ← meta-kustomization that includes both subdirs
│   ├── namespace.yaml
│   ├── per-ring/                   ← one set per ring
│   │   ├── kustomization.yaml
│   │   ├── api.yaml
│   │   ├── horizon.yaml
│   │   ├── nginx.yaml              ← uses nginxinc/nginx-unprivileged for PSA restricted
│   │   └── network-policies.yaml
│   ├── shared/                     ← single instance regardless of ring count
│   │   ├── kustomization.yaml
│   │   ├── frontend.yaml
│   │   ├── docs.yaml
│   │   ├── redis.yaml
│   │   └── indian-post.yaml
│   └── resource-quota.yaml
└── overlays/
    ├── shared/                     ← deploy the shared layer once on cluster bootstrap
    │   └── kustomization.yaml
    └── rings/
        ├── _shared-aliases/        ← kustomize component with ExternalName services
        │   ├── kustomization.yaml  ← bridges non-stable rings to shared services
        │   └── aliases.yaml
        ├── stable/                 ← deploys into 'autocom' (no aliases needed)
        ├── edge/                   ← deploys into 'autocom-edge' (uses _shared-aliases)
        └── canary/                 ← deploys into 'autocom-canary' (uses _shared-aliases)

The existing single-deployment overlays (local, production, staging) reference k8s/base/ and pull in both layers — backward-compatible. Ring overlays reference k8s/base/per-ring/ only.

The _shared-aliases component creates ExternalName services in each non-stable ring's namespace that point at the real services in autocom. So redis, autocom-db-rw, frontend, docs, indian-post, minio etc. are reachable from a canary pod by short name — no app-config changes needed.
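As a sketch, one such alias might look like the following (the service name is taken from the list above; the real aliases.yaml may differ in labels and exact targets):

```yaml
# ExternalName alias: makes "redis" resolvable by short name from pods in
# autocom-canary by pointing it at the real service in the shared
# 'autocom' namespace.
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: autocom-canary
spec:
  type: ExternalName
  externalName: redis.autocom.svc.cluster.local
```

Because the alias lives in the ring's own namespace, app configs can keep using the bare hostname (`redis`) regardless of which ring a pod belongs to.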

Environment variables that steer a pod to its ring

Each per-ring pod is shaped by two env vars injected from the ring's overlay configmap-ring.yaml:

RING_NAME
    Ring identity, reported by ModuleLoaderService::ringName() and logged on
    every module load. Defaults to stable if unset.

MODULE_LOCK_PATH
    Explicit path to the lock file this pod should read; takes precedence over
    the per-ring default. When unset, resolution falls back to
    modules/manifests/{RING_NAME}.lock.json, then to the legacy
    modules/manifest.lock.json.

ModuleLoaderService::ringLockPath() resolves in this order:

  1. env('MODULE_LOCK_PATH') if set — explicit override wins
  2. modules/manifests/{ringName()}.lock.json if that file exists
  3. modules/manifest.lock.json as a final fallback

Set MODULE_LOCK_PATH directly when you need a pod to read a non-default lock file — e.g. to test a promotion before applying it, or to run an ephemeral ring off a one-off manifest.
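The same resolution order, re-expressed as a small POSIX shell sketch so it can be sanity-checked from inside a pod (the function name and its base-dir argument are illustrative, not part of the codebase):

```shell
# Sketch of the order implemented by ModuleLoaderService::ringLockPath().
# resolve_lock_path and "$base" are hypothetical names for illustration.
resolve_lock_path() {
    base="$1"
    ring="${RING_NAME:-stable}"                        # ring identity, default stable
    if [ -n "$MODULE_LOCK_PATH" ]; then
        echo "$MODULE_LOCK_PATH"                       # 1. explicit override wins
    elif [ -f "$base/modules/manifests/$ring.lock.json" ]; then
        echo "$base/modules/manifests/$ring.lock.json" # 2. per-ring manifest
    else
        echo "$base/modules/manifest.lock.json"        # 3. legacy fallback
    fi
}
```

Note that MODULE_LOCK_PATH wins even when the per-ring file exists, which is exactly what makes the override useful for testing a promotion before applying it.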

Hostname Convention

Ring routing happens at the ingress layer based on hostname:

acme.stable.acme.io       → stable ring (autocom namespace)
acme.edge.acme.io         → edge ring (autocom-edge namespace)
acme.canary.acme.io       → canary ring (autocom-canary namespace)
acme.acme.io              → backward-compat alias for stable ring

The shorter <tenant>.acme.io form is preserved as a backward-compatible alias for the stable ring so existing tenant URLs keep working. New ring-specific hostnames are issued only when a tenant moves off stable.
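Assuming a standard Kubernetes Ingress per ring (the resource name here is a guess), the host-based routing amounts to something like:

```yaml
# Sketch: routes the canary hostname to the canary ring's nginx service.
# Port 8080 matches the unprivileged nginx image noted under Known Pitfalls.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tenants-canary
  namespace: autocom-canary
spec:
  rules:
    - host: acme.canary.acme.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 8080
```

The key property is that only the host rule differs between rings; the backend is always the nginx service in that ring's own namespace.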

To verify which ring a request is hitting:

curl http://acme.canary.acme.io/api/health
{
  "status": "ok",
  "ring": {
    "name": "canary",
    "lock_path": "modules/manifests/canary.lock.json"
  },
  "services": { "database": "up", "redis": "up" }
}

The ring.name field in /api/health is the operator's primary sanity check.

Data Model

module_rings table (central DB)

module_rings
├── id
├── name                  unique short identifier (stable, edge, canary)
├── display_name          UI label
├── description
├── manifest_path         path to this ring's lock file
├── k8s_namespace         which k8s namespace this ring lives in
├── k8s_service           in-cluster nginx service hostname
├── hostname_segment      ring segment for public URLs (e.g. "canary")
├── promotion_order       0=most conservative, higher=more bleeding edge
├── is_active             whether tenants can be assigned here
├── is_default            true for the default ring (exactly one)
└── timestamps

Seeded automatically with one row: stable. New rings are created via:

php artisan tinker
>>> ModuleRing::create([
...     'name' => 'edge',
...     'display_name' => 'Edge',
...     'manifest_path' => 'modules/manifests/edge.lock.json',
...     'k8s_namespace' => 'autocom-edge',
...     'k8s_service' => 'nginx.autocom-edge.svc.cluster.local',
...     'hostname_segment' => 'edge',
...     'promotion_order' => 10,
...     'is_active' => true,
... ]);

tenants.module_ring_id column

A foreign key into module_rings. Default for all existing tenants: the stable ring. New tenants default to the ring marked is_default = true.

Operator Runbook

See current state

# All rings + tenant counts
php artisan ring:list

# Detail for a specific ring + the tenants in it
php artisan ring:show stable --tenants

# Detail + the modules pinned in this ring's manifest
php artisan ring:show edge --modules

Move a tenant to a different ring

php artisan ring:assign acme canary
# (interactive confirmation)

# Skip confirmation
php artisan ring:assign acme canary --force

# Add an audit reason (logged to the operation history)
php artisan ring:assign acme canary --reason="opted into beta program"

The change takes effect on the next request from that tenant. In-flight requests on the old ring complete normally. Long-running connections (Reverb websockets) drop and reconnect to the new ring.

Promote a module version from one ring to another

# Promote orders @ whatever-version-edge-has → stable
php artisan ring:promote orders --from=edge --to=stable

# Pin a specific version directly (no source ring lookup)
php artisan ring:promote orders --with-version=1.5.0 --to=edge

# Skip confirmation
php artisan ring:promote orders --from=edge --to=stable --force

This rewrites the destination ring's manifests/<ring>.lock.json file with the new entry. It does NOT trigger any deploy itself — you commit the change to git, push to main, and CI rolls out the destination ring on the next pipeline run.
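The lock-file schema isn't documented on this page; assuming for illustration a flat alias-to-version JSON map, the file rewrite a promotion performs is essentially:

```shell
# Sketch only -- assumes {"alias": "version"} lock files, which may not match
# the real manifest schema. promote_entry copies one module's pinned version
# from a source ring's lock file into a destination ring's lock file.
promote_entry() {
    module="$1" from_lock="$2" to_lock="$3"
    version=$(jq -r --arg m "$module" '.[$m]' "$from_lock")
    jq --arg m "$module" --arg v "$version" '.[$m] = $v' "$to_lock" \
        > "$to_lock.tmp" && mv "$to_lock.tmp" "$to_lock"
    echo "promoted $module -> $version"
}
```

As with ring:promote itself, nothing deploys until the changed file is committed and pushed.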

Add a new ring

End-to-end runbook for spinning up a new ring (e.g. canary). Steps 1–5 are one-time-per-ring; step 6 is for every tenant move thereafter.

Step 1: Register the ring in the database

kubectl exec -n autocom deployment/api -- php artisan tinker --execute='
  \App\Models\ModuleRing::create([
    "name" => "canary",
    "display_name" => "Canary",
    "description" => "Bleeding edge — internal QA tenants only",
    "manifest_path" => "modules/manifests/canary.lock.json",
    "k8s_namespace" => "autocom-canary",
    "k8s_service" => "nginx.autocom-canary.svc.cluster.local",
    "hostname_segment" => "canary",
    "promotion_order" => 20,
    "is_active" => true,
    "is_default" => false,
  ]);
'

Step 2: Initialize the canary lock file

cp modules/manifests/stable.lock.json modules/manifests/canary.lock.json
git add modules/manifests/canary.lock.json
git commit -m "rings: initialize canary lock file from stable"
git push gitlab-main main

Step 3: Create the namespace + copy required secrets

The ring's pods need the same image-pull, app, and OAuth secrets that exist in the autocom namespace:

kubectl create namespace autocom-canary

# Copy gitlab-registry image pull secret
kubectl get secret gitlab-registry -n autocom -o yaml | \
  sed 's/namespace: autocom/namespace: autocom-canary/' | \
  kubectl apply -f -

# Copy autocom-secrets (APP_KEY, DB credentials, registry token, etc.)
kubectl get secret autocom-secrets -n autocom -o yaml | \
  sed 's/namespace: autocom/namespace: autocom-canary/' | \
  kubectl apply -f -

# Copy passport-keys for OAuth
kubectl get secret passport-keys -n autocom -o yaml | \
  sed 's/namespace: autocom/namespace: autocom-canary/' | \
  kubectl apply -f -

# Copy the autocom-config ConfigMap (DB host, Redis host, app settings)
kubectl get configmap autocom-config -n autocom -o yaml | \
  sed 's/namespace: autocom/namespace: autocom-canary/' | \
  kubectl apply -f -

This copy is one-time per ring. Nothing syncs it. If you rotate APP_KEY, add a new secret, or update the config map in autocom, the canary ring keeps serving the old values until an operator manually re-copies every object above. There is no auto-sync (no External Secrets, no Sealed Secrets wiring, no Reflector). If you add a second or third ring, write yourself a short shell script that does the copy for all of them at once so you don't forget one during a rotation.
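Such a script could look like the sketch below. The ring and object lists are assumptions based on the commands above; extend both to match your cluster. KUBECTL is overridable so the script can be dry-run:

```shell
# Sketch: re-copy every shared secret/configmap from the stable namespace
# into each non-stable ring namespace. Run after any rotation in 'autocom'.
# RING_NAMESPACES and OBJECTS are assumptions; extend them as rings grow.
SRC_NS=autocom
RING_NAMESPACES="autocom-edge autocom-canary"
OBJECTS="secret/gitlab-registry secret/autocom-secrets secret/passport-keys configmap/autocom-config"

sync_shared_objects() {
    kc="${KUBECTL:-kubectl}"   # override with KUBECTL=echo for a dry run
    for ns in $RING_NAMESPACES; do
        for obj in $OBJECTS; do
            echo "syncing $obj -> $ns"
            $kc get "$obj" -n "$SRC_NS" -o yaml \
                | sed "s/namespace: $SRC_NS/namespace: $ns/" \
                | $kc apply -f -
        done
    done
}
```

Keeping the object list in one place means a rotation touches one script run instead of four hand-typed copy pipelines per ring.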

Step 4: Apply the canary ring overlay

kubectl apply -k k8s/overlays/rings/canary

This creates 3 deployments (api, horizon, nginx) tagged with ring: canary, plus the 8 ExternalName service aliases that bridge the canary namespace to the shared services (db, redis, minio, frontend, docs, indian-post) running in autocom.

Step 5: Verify the ring is alive and reporting itself

kubectl rollout status deployment/api -n autocom-canary --timeout=120s

# Hit the canary api directly — should report ring=canary
kubectl exec -n autocom-canary deployment/api -- \
  curl -s http://localhost:8000/api/health | jq .ring

# Compare to stable
kubectl exec -n autocom deployment/api -- \
  curl -s http://localhost:8000/api/health | jq .ring

If both return the same ring name, the overlay didn't apply correctly. Most common cause: the api pod fell back to env-default stable because RING_NAME wasn't injected — check kubectl describe pod -n autocom-canary -l app=api | grep RING_NAME.

Step 6: Move tenants in

php artisan ring:assign acme canary
# or with audit reason:
php artisan ring:assign acme canary --reason="opted into beta program" --force

The change takes effect on the tenant's next request. No data migration, no downtime, no app restart.

Promote via CI (audit-trail-friendly)

For changes that need an audit trail, use the manual CI promotion pipeline instead of the artisan command:

  1. In the GitLab UI, go to autocommerce/main → Pipelines → Run Pipeline
  2. Set variables:
    • PROMOTE_ALIAS=orders
    • PROMOTE_FROM=canary
    • PROMOTE_TO=edge
  3. Click Run

The ring-promote job rewrites the destination manifest, commits with a message like promote(edge): orders 1.5.0 → 1.6.0, and pushes to main. The git history becomes your full audit log.

Ring Promotion Strategy

A typical flow for shipping a module version safely:

Developer ships Orders v1.6.0
         │
         ▼
CI release pipeline publishes the Orders 1.6.0 package to the registry
         │
         ▼
Operator: ring:promote orders --to=canary --with-version=1.6.0
         │
         ▼
Canary ring deploys → 1–3 internal tenants run it for 24–48h
         │
         ▼
Operator: ring:promote orders --from=canary --to=edge
         │
         ▼
Edge ring deploys → opt-in customers run it for ~1 week
         │
         ▼
Operator: ring:promote orders --from=edge --to=stable
         │
         ▼
Stable ring deploys → all remaining tenants run it

Each promotion step is reversible: roll the destination ring's manifest back, redeploy. The package registry retains every released version forever, so rollback is just a manifest-file edit.

Trade-offs

What rings give you

  • Real isolation. A bad version on canary cannot crash, OOM, or starve tenants on stable — they're literally in different OS processes.
  • Independent scaling. Hot tenants on one ring don't affect cold tenants on another. Each ring scales on its own metrics.
  • Trivially incremental. Day 1 has exactly one ring (stable), which is what you have today. Add a canary ring only when a customer needs it.
  • Tenant promotion is one DB row update. Move a tenant from stable → canary → next request lands on the new ring.
  • Industry-standard pattern. Microsoft 365 (Insider/Beta/Current/Monthly Enterprise/Semi-Annual), GitHub Enterprise, GitLab.com, Notion, Linear — every major SaaS doing safe rollouts uses rings.

What rings do NOT solve

  • Cross-ring data sharing in real time. Tenants on different rings can't directly call each other's modules. They share the same Postgres so their data is reachable, but their code is isolated. This is by design — that's where the safety comes from.

  • In-place tenant promotion without a request gap. Moving a tenant from stable → edge takes effect on the next request. There's a tiny window where in-flight requests on the old ring complete normally and new requests start going to the new ring. Fine for everything except long-running websockets, which drop and reconnect.

  • Different DB schemas per ring. Tenant DBs are shared across rings. If Orders 1.5.0 adds a column, that column has to be backward-compatible (additive only) so Orders 1.4.0 on the stable ring doesn't choke. This is a real discipline cost — it's how every SaaS doing rolling upgrades operates.

  • Frontend versioning. The frontend is single-version. If you need different UI per ring, that's a separate problem (feature flags or per-ring frontend deploys).

  • Operational cost. N rings means N×(api+horizon+nginx) pods. Day-1 cost is one ring (zero overhead). Adding a second ring roughly doubles the per-tenant pod budget. Plan capacity accordingly.

Verifying Ring Assignment

The /api/health endpoint exposes the current ring. Operators can verify routing in 5 seconds:

# Hit the same tenant via different hostnames — different rings respond
$ curl http://acme.stable.acme.io/api/health  | jq .ring
{ "name": "stable", "lock_path": "modules/manifests/stable.lock.json" }

$ curl http://acme.canary.acme.io/api/health | jq .ring
{ "name": "canary", "lock_path": "modules/manifests/canary.lock.json" }

If both return the same ring name, your ingress routing isn't wired up correctly. Check:

  • kubectl get ingress -A — is the canary ring's ingress present?
  • kubectl get pods -n autocom-canary — are the canary pods running?
  • kubectl logs -n autocom-canary deployment/api | grep ringName — what RING_NAME did the pod boot with?

Quick Reference

# Inspect
php artisan ring:list                            # all rings
php artisan ring:show stable --tenants --modules # one ring, full detail

# Assign tenants
php artisan ring:assign acme canary
php artisan ring:assign acme canary --reason="opted in"

# Promote module versions across rings
php artisan ring:promote orders --from=canary --to=edge
php artisan ring:promote orders --with-version=1.5.0 --to=edge --force

# Generate per-ring manifests
php artisan module:lock --ring=stable
php artisan module:lock --ring=edge
php artisan module:verify --ring=canary

# Apply ring overlays to the cluster
kubectl apply -k k8s/overlays/rings/stable
kubectl apply -k k8s/overlays/rings/edge
kubectl apply -k k8s/overlays/rings/canary

# Verify which ring a request is hitting
curl http://<tenant>.<ring>.acme.io/api/health | jq .ring

Known Pitfalls

Lessons from the rings rollout that aren't obvious until they bite.

PSA restricted + image filesystem writes

If a cluster enforces Pod Security Admission restricted, every pod must run as a non-root user. Two images in the per-ring manifests hit this and need specific fixes:

  • nginx: the default nginx:alpine runs its entrypoint as root and chowns /var/cache/nginx/* during boot. Under PSA restricted that fails with chown: Operation not permitted. k8s/base/per-ring/nginx.yaml uses nginxinc/nginx-unprivileged:alpine instead (listens on 8080, not 80). Don't swap it back.
  • api / horizon (Octane): backend/Dockerfile.octane chowns the whole /app tree to UID 1000 at build time. The first attempt only chowned storage/ and bootstrap/cache/, which looked complete but missed /app/public/frankenphp-worker.php — Octane's InstallsFrankenPhpDependencies writes that file on first boot and fails with Permission denied otherwise. If you fork the image, keep the chown -R 1000:1000 /app line in place.

Octane memory ceiling scales with module count

Horizon loads every module's service provider at boot. With 23 modules (AI, Workflows, WMS, Reseller*, all the themes…) the steady-state footprint peaks around 900Mi–1Gi. The per-ring manifest sets limits.memory: 1536Mi with that headroom; earlier values of 384Mi and 768Mi both tripped OOMKilled during module bootstrap. If you add another ~10 modules, re-profile and bump again.

Secrets don't sync between rings

See the note under Step 3 of "Add a new ring" — every secret and config map you need in a new ring namespace is copied manually and stays stale until you copy it again. This is the single biggest operational gotcha for multi-ring setups. A follow-up to wire External Secrets or the Reflector controller into the shared overlay would close this.

No auto-rollback on failed hooks

php artisan module:install and module:upgrade do NOT auto-rollback if the module's onInstall / onUpgrade hook throws. The content swap commits before the hook runs, so a failed hook leaves the module installed in an inconsistent state. The operator has to manually run module:rollback <alias> after investigating. See Module Lifecycle → Error handling for the full semantics.

See Also