Chat & Streaming

The chat endpoint runs a PydanticAI agent that can call tools, remember past conversations, and stream responses token-by-token.

Basic Chat

POST /api/v1/ai/agent/chat
X-Tenant: your-tenant-id
Authorization: Bearer your-token

{
  "message": "Where is my order ORD-20260315-A1B2?",
  "role": "support"
}

Response:

{
  "conversation_id": "uuid",
  "message": {
    "role": "assistant",
    "content": "Your order ORD-20260315-A1B2 is currently shipped and out for delivery. Tracking: https://track.delhivery.com/DLV123456789. Expected delivery by tomorrow."
  },
  "rag_sources": [],
  "usage": {
    "provider": "openai",
    "model": "gpt-4o",
    "input_tokens": 245,
    "output_tokens": 52,
    "estimated_cost": 0.001132,
    "latency_ms": 1823
  }
}

Behind the scenes:

  1. Laravel builds TenantContext from the tenant's AI config
  2. Python creates a PydanticAI agent with the support role's system prompt and tools
  3. Agent decides to call orders.get tool → callback to Laravel → ModuleApiBus
  4. Agent receives order data, composes response
  5. Conversation turn saved to tenant DB
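Steps 3–4 above can be sketched as a minimal tool dispatcher. This is an illustration only: `TenantContext`, the `orders_get` stub, and the registry shape are simplified assumptions, not the real ModuleApiBus contract (which is an HTTP callback into Laravel).

```python
from dataclasses import dataclass

@dataclass
class TenantContext:
    """Simplified stand-in for the context Laravel builds from the tenant's AI config."""
    tenant_id: str
    role: str
    system_prompt: str

def orders_get(args: dict) -> dict:
    # Stub: in the real system this is an HTTP callback to Laravel's ModuleApiBus
    return {"order_id": args["order_id"], "status": "shipped"}

# Hypothetical registry mapping tool names the agent can call to handlers
TOOLS = {"orders.get": orders_get}

def dispatch_tool_call(name: str, args: dict) -> dict:
    """Step 3: the agent names a tool; route it and return the data (step 4)."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](args)

ctx = TenantContext("your-tenant-id", "support", "You are a support agent.")
result = dispatch_tool_call("orders.get", {"order_id": "ORD-20260315-A1B2"})
```

The agent then folds `result` into its reply, e.g. the shipped-status answer shown in the response above.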

Continuing a Conversation

Pass conversation_id to continue a conversation with memory:

POST /api/v1/ai/agent/chat
{
  "message": "Can you cancel it?",
  "conversation_id": "uuid-from-previous-response"
}

The agent loads the previous messages and knows "it" refers to the order discussed earlier.

How Memory Works

Request with conversation_id
  → Python calls Laravel: POST /internal/agent/memory/load
  → Laravel loads messages from ai_conversation_messages (tenant DB)
  → Last 10 messages injected as context into the agent prompt
  → Agent responds with full conversation awareness
  → New turn saved: POST /internal/agent/memory/save

Memory is stored in the tenant's PostgreSQL database via two tables:

  • ai_conversations — tracks session, role, customer phone, total tokens/cost
  • ai_conversation_messages — individual messages with role, content, tool calls
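The "last 10 messages" window above amounts to a simple slice over the stored turns, in chronological order. A sketch (the message fields here are illustrative, not the actual `ai_conversation_messages` schema):

```python
WINDOW = 10  # number of recent messages injected as agent context

def memory_window(messages: list[dict], limit: int = WINDOW) -> list[dict]:
    """Return the most recent `limit` messages, oldest first."""
    return messages[-limit:]

# 12 stored turns -> only the last 10 reach the prompt
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
    for i in range(12)
]
context = memory_window(history)
```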

SSE Streaming

For real-time token-by-token responses:

POST /api/v1/ai/agent/chat/stream
Content-Type: application/json

{
  "message": "Summarize my sales this week",
  "role": "analytics"
}

Response: Server-Sent Events stream

event: token
data: {"text": "This"}

event: token
data: {"text": " week"}

event: token
data: {"text": " your"}

event: token
data: {"text": " total"}

...

event: usage
data: {"provider": "openai", "model": "gpt-4o", "input_tokens": 180, "output_tokens": 95, "estimated_cost": 0.0014, "latency_ms": 2150}

event: done
data: {"conversation_id": "uuid", "finish_reason": "stop"}
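On the wire, each event above is just an `event:` line plus a `data:` line carrying JSON, terminated by a blank line. A minimal formatter for that framing (an illustration of the SSE format, not the service's actual code):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Event: event name, JSON payload, blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

chunk = sse_event("token", {"text": "This"})
```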

Event Types

Event   Data                               When
token   {"text": "..."}                    Each token as it's generated
usage   {provider, model, tokens, cost}    After generation completes
done    {conversation_id, finish_reason}   Stream complete
error   {"message": "..."}                 On failure

Frontend Integration

const response = await fetch('/api/v1/ai/agent/chat/stream', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`,
    'X-Tenant': tenantId,
  },
  body: JSON.stringify({ message: userInput, role: 'support' }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters split across chunks intact
  buffer += decoder.decode(value, { stream: true });

  // SSE events end with a blank line; keep any partial event in the buffer
  const events = buffer.split('\n\n');
  buffer = events.pop();

  for (const event of events) {
    for (const line of event.split('\n')) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        // Append to UI
      }
    }
  }
}

Request Parameters

Parameter        Type     Default    Description
message          string   required   User's message (max 10,000 chars)
conversation_id  string   null       Continue an existing conversation
role             string   "support"  Agent role: support, operations, analytics, or custom
include_rag      boolean  true       Whether to search the knowledge base for context
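Applying those defaults can be sketched with a small dataclass mirroring the documented contract (a sketch only, not the service's actual validation code):

```python
from dataclasses import dataclass
from typing import Optional

MAX_MESSAGE_CHARS = 10_000

@dataclass
class ChatRequest:
    message: str                           # required, max 10,000 chars
    conversation_id: Optional[str] = None  # continue an existing conversation
    role: str = "support"                  # support, operations, analytics, or custom
    include_rag: bool = True               # search the knowledge base for context

    def __post_init__(self):
        if not self.message or len(self.message) > MAX_MESSAGE_CHARS:
            raise ValueError("message is required, max 10,000 characters")

req = ChatRequest(message="Where is my order?")
```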

Usage Tracking

Every chat request is logged to ai_usage_logs in the tenant database with:

  • Provider and model used
  • Token counts (input/output)
  • Estimated cost in USD
  • Latency in milliseconds
  • Operation type (e.g., agent_chat:support)
  • Conversation ID for correlation
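The estimated cost can be reproduced from the token counts and per-million-token rates. The rates below are illustrative assumptions (not pinned provider pricing), but with $2.50/M input and $10.00/M output they reproduce the basic-chat example's figure:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Estimated USD cost from token counts and per-million-token rates."""
    return (input_tokens * in_rate_per_m + output_tokens * out_rate_per_m) / 1_000_000

# 245 input + 52 output tokens at the assumed rates -> ~0.0011325 USD,
# matching the estimated_cost shown in the basic-chat response
cost = estimate_cost(245, 52, 2.50, 10.00)
```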

View usage via the existing GET /api/v1/ai/usage endpoint.