# Chat & Streaming

The chat endpoint runs a PydanticAI agent that can call tools, remember past conversations, and stream responses token-by-token.
## Basic Chat

```http
POST /api/v1/ai/agent/chat
X-Tenant: your-tenant-id
Authorization: Bearer your-token

{
  "message": "Where is my order ORD-20260315-A1B2?",
  "role": "support"
}
```
Response:

```json
{
  "conversation_id": "uuid",
  "message": {
    "role": "assistant",
    "content": "Your order ORD-20260315-A1B2 is currently shipped and out for delivery. Tracking: https://track.delhivery.com/DLV123456789. Expected delivery by tomorrow."
  },
  "rag_sources": [],
  "usage": {
    "provider": "openai",
    "model": "gpt-4o",
    "input_tokens": 245,
    "output_tokens": 52,
    "estimated_cost": 0.001132,
    "latency_ms": 1823
  }
}
```
Behind the scenes:

- Laravel builds `TenantContext` from the tenant's AI config
- Python creates a PydanticAI agent with the `support` role's system prompt and tools
- Agent decides to call the `orders.get` tool → callback to Laravel → ModuleApiBus
- Agent receives order data, composes response
- Conversation turn saved to tenant DB
## Continuing a Conversation

Pass `conversation_id` to continue a conversation with memory:

```http
POST /api/v1/ai/agent/chat

{
  "message": "Can you cancel it?",
  "conversation_id": "uuid-from-previous-response"
}
```
The agent loads the previous messages and knows "it" refers to the order discussed earlier.
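One way to thread that context on the client is a small helper that includes `conversation_id` only once the first response has returned one. The helper name `buildChatPayload` is hypothetical — a sketch, not part of the API:

```javascript
// Hypothetical helper: builds the chat request body, threading
// conversation_id across turns so the agent retains memory.
function buildChatPayload(message, options = {}) {
  const payload = { message, role: options.role ?? 'support' };
  if (options.conversationId) {
    payload.conversation_id = options.conversationId;
  }
  return payload;
}

// First turn: no conversation_id — the server creates one.
const first = buildChatPayload('Where is my order ORD-20260315-A1B2?');

// Follow-up turn: reuse the conversation_id from the previous response.
const followUp = buildChatPayload('Can you cancel it?', {
  conversationId: 'uuid-from-previous-response',
});
```

Omitting the key entirely on the first turn (rather than sending `null`) keeps the request matching the basic-chat example above.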
## How Memory Works

```
Request with conversation_id
  → Python calls Laravel: POST /internal/agent/memory/load
  → Laravel loads messages from ai_conversation_messages (tenant DB)
  → Last 10 messages injected as context into the agent prompt
  → Agent responds with full conversation awareness
  → New turn saved: POST /internal/agent/memory/save
```
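The context-window step above — keeping only the most recent 10 messages — can be sketched as a one-liner. The message shape is an assumption based on the flow description:

```javascript
// Window size taken from the flow above: last 10 messages only.
const CONTEXT_WINDOW = 10;

function buildContextWindow(messages) {
  // messages: oldest-first array of { role, content } objects
  return messages.slice(-CONTEXT_WINDOW);
}

// Example: a 25-turn history is trimmed to the last 10 turns.
const history = Array.from({ length: 25 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i}`,
}));
const context = buildContextWindow(history);
```

`Array.prototype.slice` with a negative index also handles short histories gracefully: fewer than 10 messages are returned unchanged.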
Memory is stored in the tenant's PostgreSQL database in two tables:

- `ai_conversations` — tracks session, role, customer phone, total tokens/cost
- `ai_conversation_messages` — individual messages with role, content, tool calls
## SSE Streaming

For real-time token-by-token responses:

```http
POST /api/v1/ai/agent/chat/stream
Content-Type: application/json

{
  "message": "Summarize my sales this week",
  "role": "analytics"
}
```
Response: a Server-Sent Events stream

```
event: token
data: {"text": "This"}

event: token
data: {"text": " week"}

event: token
data: {"text": " your"}

event: token
data: {"text": " total"}

...

event: usage
data: {"provider": "openai", "model": "gpt-4o", "input_tokens": 180, "output_tokens": 95, "estimated_cost": 0.0014, "latency_ms": 2150}

event: done
data: {"conversation_id": "uuid", "finish_reason": "stop"}
```
### Event Types

| Event | Data | When |
|---|---|---|
| `token` | `{"text": "..."}` | Each token as it's generated |
| `usage` | `{provider, model, tokens, cost}` | After generation completes |
| `done` | `{conversation_id, finish_reason}` | Stream complete |
| `error` | `{"message": "..."}` | On failure |
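A minimal parser for the stream above pairs each `event:` line with the `data:` line that follows it, then dispatches by event name. This is a sketch simplified for this stream's shape (one `data:` line per event), not a full SSE implementation:

```javascript
// Minimal SSE parser: tracks the current event name and dispatches
// each data payload to a handler. Simplification: this stream sends
// exactly one data line per event, so we dispatch on the data line
// instead of waiting for the blank-line separator.
function parseSseChunk(chunk, onEvent) {
  let eventName = 'message'; // SSE default when no event: line precedes
  for (const line of chunk.split('\n')) {
    if (line.startsWith('event: ')) {
      eventName = line.slice(7).trim();
    } else if (line.startsWith('data: ')) {
      onEvent(eventName, JSON.parse(line.slice(6)));
      eventName = 'message'; // reset after dispatch, per the SSE spec
    }
  }
}

// Usage with a sample of the stream shown above:
const streamText =
  'event: token\ndata: {"text": "This"}\n\n' +
  'event: token\ndata: {"text": " week"}\n\n' +
  'event: done\ndata: {"conversation_id": "uuid", "finish_reason": "stop"}\n\n';

let answer = '';
parseSseChunk(streamText, (event, data) => {
  if (event === 'token') answer += data.text;   // accumulate tokens
  if (event === 'error') console.error(data.message);
});
```

Dispatching by event name keeps `usage` and `done` payloads out of the rendered text while still making them available to the handler.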
## Frontend Integration

```javascript
const response = await fetch('/api/v1/ai/agent/chat/stream', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${token}`,
    'X-Tenant': tenantId,
  },
  body: JSON.stringify({ message: userInput, role: 'support' }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Network chunks can split an SSE event mid-line, so buffer until a
  // complete line is available before parsing.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the trailing partial line for the next chunk

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      // Append data.text to the UI (token events), or handle usage/done
    }
  }
}
```
## Request Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `message` | string | required | User's message (max 10,000 chars) |
| `conversation_id` | string | `null` | Continue an existing conversation |
| `role` | string | `"support"` | Agent role: `support`, `operations`, `analytics`, or custom |
| `include_rag` | boolean | `true` | Whether to search the KB for context |
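A client-side pre-check mirroring the table can catch bad requests before they hit the API. The limits come from the table above; everything else (the function itself, the exact error strings) is an illustrative sketch:

```javascript
// Hypothetical pre-flight validation based on the parameter table.
// Returns an array of error strings; empty means the payload looks valid.
function validateChatRequest({
  message,
  conversation_id = null,
  role = 'support',
  include_rag = true,
}) {
  const errors = [];
  if (typeof message !== 'string' || message.length === 0) {
    errors.push('message is required');
  } else if (message.length > 10000) {
    errors.push('message exceeds 10,000 characters');
  }
  if (conversation_id !== null && typeof conversation_id !== 'string') {
    errors.push('conversation_id must be a string');
  }
  // Custom roles are allowed, so only obviously bad values are flagged.
  if (typeof role !== 'string' || role.length === 0) {
    errors.push('role must be a non-empty string');
  }
  if (typeof include_rag !== 'boolean') {
    errors.push('include_rag must be a boolean');
  }
  return errors;
}
```

Note the check deliberately does not restrict `role` to the three built-in names, since the table says custom roles are accepted.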
## Usage Tracking

Every chat request is logged to `ai_usage_logs` in the tenant database with:

- Provider and model used
- Token counts (input/output)
- Estimated cost in USD
- Latency in milliseconds
- Operation type (e.g., `agent_chat:support`)
- Conversation ID for correlation
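The cost estimate is simple arithmetic: token counts multiplied by per-token rates. The sketch below uses illustrative placeholder rates in USD per million tokens — not the service's actual pricing table:

```javascript
// Illustrative per-million-token rates (assumed, not the real pricing).
const RATES = {
  'openai/gpt-4o': { input: 2.5, output: 10.0 },
};

function estimateCost(provider, model, inputTokens, outputTokens) {
  const rate = RATES[`${provider}/${model}`];
  if (!rate) return null; // unknown provider/model: no estimate
  return (inputTokens * rate.input + outputTokens * rate.output) / 1e6;
}

// With the assumed rates, 245 input + 52 output tokens:
// 245 * 2.5/1M + 52 * 10/1M = 0.0011325 USD
const cost = estimateCost('openai', 'gpt-4o', 245, 52);
```

Keeping the rate table keyed by `provider/model` makes it easy to log a `null` estimate when a model's pricing is unknown rather than guessing.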
View usage via the existing `GET /api/v1/ai/usage` endpoint.