POST /api/arena/chat/stream
Stream AI model responses in real-time using Server-Sent Events. Tokens are delivered as they are generated.
Streaming Chat (Stable)
Stream AI model responses token-by-token using Server-Sent Events (SSE). Ideal for real-time chat interfaces.
Authentication: Required
Request Body
- User message text. Validation (Line 452): must not be null or whitespace.
- Model name, or "auto" for random selection. Selection logic: Lines 468-470.
- System prompt for chat context. Default: implementation-defined and not guaranteed (provider-specific).
- Maximum tokens in response. Provider limits: model-dependent.
- Sampling temperature (0.0-2.0). Default: not specified by the API contract (provider-defined).
- Thread UUID (not persisted in streaming mode). Note: streaming endpoints do NOT write to the thread_messages table.
Response
Content-Type: text/event-stream (Line 450)
Headers (implicit in ASP.NET SSE):
Event Types
- Incremental text chunk (Lines 486-502). Emitted: 0+ times during the stream.
- Stream completion. Emitted: once, at the end of a successful stream.
- Stream error (Lines 456-463, 512-519). Error codes:
  - INVALID_REQUEST: Prompt validation failed (Line 459)
  - STREAM_ERROR: Provider or network error (Line 515)
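The three event types above can be folded into client state with a small reducer. This is a minimal sketch, not the API's actual schema: only the object discriminator, the ai.error event name, and the two error codes appear on this page. The names ai.chunk, ai.done, text, code, and message are assumptions made for illustration.

```typescript
// Sketch only. "object" and "ai.error" come from this page;
// "ai.chunk", "ai.done", "text", "code", and "message" are ASSUMED names.
type StreamEvent =
  | { object: "ai.chunk"; text: string }
  | { object: "ai.done" }
  | { object: "ai.error"; code: string; message?: string };

interface StreamState {
  text: string;         // text accumulated from chunk events
  done: boolean;        // true once the stream has ended (success or error)
  error: string | null; // user-facing error text, if any
}

// Map the documented error codes to a message suitable for display.
function describeError(code: string): string {
  if (code === "INVALID_REQUEST") return "The prompt was empty or invalid.";
  if (code === "STREAM_ERROR") return "The provider or network failed mid-stream.";
  return "Unknown streaming error: " + code;
}

// Apply one parsed event to the current state and return the new state.
function applyEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.object) {
    case "ai.chunk":
      return { ...state, text: state.text + event.text };
    case "ai.done":
      return { ...state, done: true };
    case "ai.error":
      return { ...state, done: true, error: describeError(event.code) };
  }
}
```

Keeping the reducer pure makes it easy to unit-test without a live stream; the network layer only needs to parse events and call applyEvent.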
Side Effects
- Database Mutations: NONE for streaming endpoints
- Logging: Internal only (not persisted to user-facing tables)
- Internal Audit: message_logs table write not enforced by server contract (implementation detail)
Behavior
SSE Implementation (Lines 450, 486-502):
- Content-Type set immediately (Line 450)
- Each event written as data: {json}\n\n (Line 495)
- Immediate flush after each event (Line 496)

Client Disconnect:
- Cancellation token monitors client disconnect (Line 504)
- Token is passed to the provider's StreamAsync method
- Provider monitors the token and terminates the stream on cancellation
- No error written to client (they already disconnected)

Errors During Streaming:
- If stream already started, error event written
- If stream closed (client disconnect), write attempt silently fails
- No 500 status code returned (SSE already in progress)

JSON Serialization:
- Null values omitted
- camelCase property names
- Single-line JSON (no embedded newlines)
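The wire format described above (one data: line holding single-line JSON with nulls omitted, followed by a blank line) can be illustrated as follows. The server itself is ASP.NET, so this TypeScript function is only a sketch of the framing rule, not the controller's code; formatSseEvent is a hypothetical name.

```typescript
// Illustrative only: formatSseEvent is a hypothetical helper, not server code.
// Serializes one event as an SSE frame: "data: " + single-line JSON + "\n\n".
function formatSseEvent(payload: Record<string, unknown>): string {
  // The replacer drops null-valued properties. JSON.stringify without an
  // indent argument emits one line and escapes newlines inside strings.
  const json = JSON.stringify(payload, (_key, value) =>
    value === null ? undefined : value
  );
  return "data: " + json + "\n\n";
}
```

Because newlines inside string values are escaped by the serializer, the only raw newlines in a frame are the two that terminate it.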
Event Lifecycle
Successful Stream: zero or more incremental text chunk events, followed by one stream completion event.
Constraints
- No Dual-Chat Streaming: Only single-chat mode supports streaming
- No Database Write: Streaming responses are NOT persisted to the thread_messages table
  - Reason: Partial streams cannot be meaningfully stored
  - Workaround: Use non-streaming /api/arena/chat if persistence is required
- UTF-8 Encoding: All events are UTF-8 encoded (Line 464 decoder implicit)
- Single-Line JSON: Event payloads MUST NOT contain newlines (breaks SSE format)
- Provider Support: Only providers implementing the StreamAsync method support streaming
SSE Format Specification
Event Format:
- data: prefix (required)
- Space after colon
- JSON object (single line)
- \n\n double newline (event terminator)

Client Parsing:
- Read lines until \n\n
- Extract line starting with data:
- Parse JSON from position 6 onward
- Handle event based on object field
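The parsing steps above can be sketched as an incremental parser that tolerates events split across network chunks. SseParser is a hypothetical helper name; the sketch assumes the input has already been decoded to text and that each event carries exactly one data: line, as this page describes.

```typescript
// Hypothetical helper: buffers text until a blank line ("\n\n") closes an event.
class SseParser {
  private buffer = "";

  // Feed one decoded text chunk; returns every JSON payload it completes.
  // Throws if a data line does not contain valid JSON.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const events: unknown[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const block = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      for (const line of block.split("\n")) {
        if (line.indexOf("data: ") === 0) {
          // "data: " is six characters, so the JSON starts at index 6.
          events.push(JSON.parse(line.slice(6)));
        }
      }
      end = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

With the Fetch API, one way to use this is to read chunks from the response body, decode them with a TextDecoder in streaming mode so multi-byte UTF-8 characters split across chunks are handled, and pass each decoded string to push.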
Error Conditions
| Code | Format | Cause | Controller Line |
|---|---|---|---|
| INVALID_REQUEST | SSE event | Prompt null/whitespace | 452-463 |
| STREAM_ERROR | SSE event | Provider exception | 506-525 |
| N/A | Silent | Client disconnect | 504 (token) |
Edge Cases
- Empty prompt: Error event written, stream terminates (Lines 452-463)
- Client disconnect mid-stream: Provider cancels, no error event
- Provider timeout: Exception caught, error event written if possible
- Invalid model name: Fallback to Groq provider (Lines 476-484)
- JSON serialization failure: Caught and logged, event skipped (Lines 498-501)
- Stream already flushed: Error write attempt fails silently (Lines 520-524)
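Some of these cases can be avoided before a stream is opened. The sketch below checks the two constraints this page states explicitly: a prompt that is not empty or whitespace, and a temperature within 0.0-2.0. The field names prompt, model, and temperature are assumptions, since the request schema's property names are not shown here.

```typescript
// Hypothetical request shape: these field names are ASSUMED for illustration.
interface StreamChatRequest {
  prompt: string;
  model?: string;       // model name or "auto"
  temperature?: number; // documented range: 0.0-2.0
}

// Returns a list of problems; an empty list means the request looks sendable.
function validateRequest(req: StreamChatRequest): string[] {
  const problems: string[] = [];
  if (req.prompt.trim().length === 0) {
    problems.push("prompt must not be empty or whitespace");
  }
  const t = req.temperature;
  // The negated range check also rejects NaN.
  if (t !== undefined && !(t >= 0 && t <= 2)) {
    problems.push("temperature must be between 0.0 and 2.0");
  }
  return problems;
}
```

Client-side checks only save a round trip; the server still validates the prompt and reports INVALID_REQUEST as an SSE event.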
Performance Characteristics
First Token Latency: Provider-dependent
- Groq: 100-300ms typical
- Bytez: Higher latency

Chunk Frequency:
- Groq: High frequency (near real-time)
- Some providers: Batched chunks

Timeouts and Cancellation:
- Provider-level timeouts apply
- Client can cancel anytime
Rate Limits
No explicit rate limiting. Provider limits apply:
- Groq free tier: 30 req/min
- Streaming counts same as non-streaming
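Since the API does not throttle on its own, a client that wants to stay under a provider limit has to do so itself. The following is a minimal sliding-window sketch; SlidingWindowLimiter is a hypothetical helper, and how the provider actually counts requests (per key, per account, per IP) is not stated on this page.

```typescript
// Hypothetical helper: allows at most `limit` calls in any `windowMs` span.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if a slot is free at time `now` (ms).
  tryAcquire(now: number): boolean {
    const cutoff = now - this.windowMs;
    // Forget calls that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}
```

For the Groq free tier figure above, this would be constructed as new SlidingWindowLimiter(30, 60000) and called with Date.now() before each request, streaming or not.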
Client Implementation Notes
- Fetch API Recommended: EventSource doesn't support POST or custom headers
- Backpressure: Client must read the stream continuously or the buffer will fill
- Reconnection: Not automatic; the client must implement retry logic
- Error Handling: Parse ai.error events and display them to the user
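For the retry logic mentioned above, one common choice is capped exponential backoff. This is a sketch of the delay calculation only; the base delay of 500 ms and the cap of 8000 ms are illustrative assumptions, not values recommended by this API.

```typescript
// Capped exponential backoff: delay = min(capMs, baseMs * 2^attempt).
// The defaults are illustrative assumptions, not API recommendations.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Delays for the first `maxRetries` retries (attempt numbers start at 0).
function retrySchedule(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}

// retrySchedule(5) returns [500, 1000, 2000, 4000, 8000]
```

Each retry is a fresh POST and streamed output is not persisted, so a client would normally discard any partial text from the failed attempt before retrying.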