# Chat & Streaming

The chat endpoint runs a PydanticAI agent that can call tools, remember past conversations, and stream responses token-by-token.
## Basic Chat

```http
POST /api/v1/ai/agent/chat
X-Tenant: your-tenant-id
Authorization: Bearer your-token

{
  "message": "Where is my order ORD-20260315-A1B2?",
  "role": "support"
}
```
Response:

```json
{
  "conversation_id": "uuid",
  "message": {
    "role": "assistant",
    "content": "Your order ORD-20260315-A1B2 is currently shipped and out for delivery. Tracking: https://track.delhivery.com/DLV123456789. Expected delivery by tomorrow."
  },
  "rag_sources": [],
  "usage": {
    "provider": "openai",
    "model": "gpt-4o",
    "input_tokens": 245,
    "output_tokens": 52,
    "estimated_cost": 0.001132,
    "latency_ms": 1823
  }
}
```
Behind the scenes:

- Laravel builds `TenantContext` from the tenant's AI config
- Python creates a PydanticAI agent with the `support` role's system prompt and tools
- Agent decides to call the `orders.get` tool → callback to Laravel → ModuleApiBus
- Agent receives order data, composes response
- Conversation turn saved to tenant DB
## Continuing a Conversation

Pass `conversation_id` to continue a conversation with memory:

```http
POST /api/v1/ai/agent/chat

{
  "message": "Can you cancel it?",
  "conversation_id": "uuid-from-previous-response"
}
```
The agent loads the previous messages and knows "it" refers to the order discussed earlier.
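One way to thread that context on the client is a small helper that includes `conversation_id` only once the first response has returned one. The helper name `buildChatPayload` is hypothetical — a sketch, not part of the API:

```javascript
// Hypothetical helper: builds the chat request body, threading
// conversation_id across turns so the agent retains memory.
function buildChatPayload(message, options = {}) {
  const payload = { message, role: options.role ?? 'support' };
  if (options.conversationId) {
    payload.conversation_id = options.conversationId;
  }
  return payload;
}

// First turn: no conversation_id — the server creates one.
const first = buildChatPayload('Where is my order ORD-20260315-A1B2?');

// Follow-up turn: reuse the conversation_id from the previous response.
const followUp = buildChatPayload('Can you cancel it?', {
  conversationId: 'uuid-from-previous-response',
});
```

Omitting the key entirely on the first turn (rather than sending `null`) keeps the request matching the basic-chat example above.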
## How Memory Works

```
Request with conversation_id
  → Python calls Laravel: POST /internal/agent/memory/load
  → Laravel loads messages from ai_conversation_messages (tenant DB)
  → Last 10 messages injected as context into the agent prompt
  → Agent responds with full conversation awareness
  → New turn saved: POST /internal/agent/memory/save
```
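The context-window step above — keeping only the most recent 10 messages — can be sketched as a one-liner. The message shape is an assumption based on the flow description:

```javascript
// Window size taken from the flow above: last 10 messages only.
const CONTEXT_WINDOW = 10;

function buildContextWindow(messages) {
  // messages: oldest-first array of { role, content } objects
  return messages.slice(-CONTEXT_WINDOW);
}

// Example: a 25-turn history is trimmed to the last 10 turns.
const history = Array.from({ length: 25 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i}`,
}));
const context = buildContextWindow(history);
```

`Array.prototype.slice` with a negative index also handles short histories gracefully: fewer than 10 messages are returned unchanged.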
Memory is stored in the tenant's PostgreSQL database in two tables:

- `ai_conversations` — tracks session, role, customer phone, total tokens/cost
- `ai_conversation_messages` — individual messages with role, content, tool calls
## SSE Streaming

For real-time token-by-token responses:

```http
POST /api/v1/ai/agent/chat/stream
Content-Type: application/json

{
  "message": "Summarize my sales this week",
  "role": "analytics"
}
```
Response: a Server-Sent Events stream

```
event: token
data: {"text": "This"}

event: token
data: {"text": " week"}

event: token
data: {"text": " your"}

event: token
data: {"text": " total"}

...

event: usage
data: {"provider": "openai", "model": "gpt-4o", "input_tokens": 180, "output_tokens": 95, "estimated_cost": 0.0014, "latency_ms": 2150}

event: done
data: {"conversation_id": "uuid", "finish_reason": "stop"}
```
### Event Types

| Event | Data | When |
|---|---|---|
| `token` | `{"text": "..."}` | Each token as it's generated |
| `usage` | `{provider, model, tokens, cost}` | After generation completes |
| `done` | `{conversation_id, finish_reason}` | Stream complete |
| `error` | `{"message": "..."}` | On failure |
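A minimal parser for the stream above pairs each `event:` line with the `data:` line that follows it, then dispatches by event name. This is a sketch simplified for this stream's shape (one `data:` line per event), not a full SSE implementation:

```javascript
// Minimal SSE parser: tracks the current event name and dispatches
// each data payload to a handler. Simplification: this stream sends
// exactly one data line per event, so we dispatch on the data line
// instead of waiting for the blank-line separator.
function parseSseChunk(chunk, onEvent) {
  let eventName = 'message'; // SSE default when no event: line precedes
  for (const line of chunk.split('\n')) {
    if (line.startsWith('event: ')) {
      eventName = line.slice(7).trim();
    } else if (line.startsWith('data: ')) {
      onEvent(eventName, JSON.parse(line.slice(6)));
      eventName = 'message'; // reset after dispatch, per the SSE spec
    }
  }
}

// Usage with a sample of the stream shown above:
const streamText =
  'event: token\ndata: {"text": "This"}\n\n' +
  'event: token\ndata: {"text": " week"}\n\n' +
  'event: done\ndata: {"conversation_id": "uuid", "finish_reason": "stop"}\n\n';

let answer = '';
parseSseChunk(streamText, (event, data) => {
  if (event === 'token') answer += data.text;   // accumulate tokens
  if (event === 'error') console.error(data.message);
});
```

Dispatching by event name keeps `usage` and `done` payloads out of the rendered text while still making them available to the handler.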
## Frontend Integration

```javascript
const response = await fetch('/api/v1/ai/agent/chat/stream', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`,
    'X-Tenant': tenantId,
  },
  body: JSON.stringify({ message: userInput, role: 'support' }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Network chunks can split an SSE event mid-line, so buffer until a
  // complete line is available before parsing.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the trailing partial line for the next chunk

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      // Append data.text to the UI (token events), or handle usage/done
    }
  }
}
```
## Request Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `message` | string | required | User's message (max 10,000 chars) |
| `conversation_id` | string | `null` | Continue an existing conversation |
| `role` | string | `"support"` | Agent role: `support`, `operations`, `analytics`, or custom |
| `include_rag` | boolean | `true` | Whether to search the KB for context |
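A client-side pre-check mirroring the table can catch bad requests before they hit the API. The limits come from the table above; everything else (the function itself, the exact error strings) is an illustrative sketch:

```javascript
// Hypothetical pre-flight validation based on the parameter table.
// Returns an array of error strings; empty means the payload looks valid.
function validateChatRequest({
  message,
  conversation_id = null,
  role = 'support',
  include_rag = true,
}) {
  const errors = [];
  if (typeof message !== 'string' || message.length === 0) {
    errors.push('message is required');
  } else if (message.length > 10000) {
    errors.push('message exceeds 10,000 characters');
  }
  if (conversation_id !== null && typeof conversation_id !== 'string') {
    errors.push('conversation_id must be a string');
  }
  // Custom roles are allowed, so only obviously bad values are flagged.
  if (typeof role !== 'string' || role.length === 0) {
    errors.push('role must be a non-empty string');
  }
  if (typeof include_rag !== 'boolean') {
    errors.push('include_rag must be a boolean');
  }
  return errors;
}
```

Note the check deliberately does not restrict `role` to the three built-in names, since the table says custom roles are accepted.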
## Usage Tracking

Every chat request is logged to `ai_usage_logs` in the tenant database with:

- Provider and model used
- Token counts (input/output)
- Estimated cost in USD
- Latency in milliseconds
- Operation type (e.g., `agent_chat:support`)
- Conversation ID for correlation
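The cost estimate is simple arithmetic: token counts multiplied by per-token rates. The sketch below uses illustrative placeholder rates in USD per million tokens — not the service's actual pricing table:

```javascript
// Illustrative per-million-token rates (assumed, not the real pricing).
const RATES = {
  'openai/gpt-4o': { input: 2.5, output: 10.0 },
};

function estimateCost(provider, model, inputTokens, outputTokens) {
  const rate = RATES[`${provider}/${model}`];
  if (!rate) return null; // unknown provider/model: no estimate
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

// With the assumed rates, 245 input + 52 output tokens:
// 245 * 2.5/1M + 52 * 10/1M = 0.0011325 USD
const cost = estimateCost('openai', 'gpt-4o', 245, 52);
```

Keeping the rate table keyed by `provider/model` makes it easy to log a `null` estimate when a model's pricing is unknown rather than guessing.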
View usage via the existing `GET /api/v1/ai/usage` endpoint.