POST /api/arena/chat/stream

Streaming Chat (Stable)

Stream AI model responses token-by-token using Server-Sent Events (SSE). Ideal for real-time chat interfaces.

Authentication Required

Request Body

prompt
string
required
User message text. Validation (Line 452):
if (request == null || string.IsNullOrWhiteSpace(request.Prompt)) {
    // Write SSE error event
    await Response.WriteAsync($"data: {errorEvent}\n\n");
    return;
}
model
string
Model name or "auto" for random selection. Selection Logic (Lines 468-470):
var selectedModel = string.IsNullOrWhiteSpace(request.Model) || request.Model == "auto"
    ? await _modelSelector.GetRandomModelAsync()
    : request.Model;
system
string
System prompt for chat context. Default: Implementation-defined and not guaranteed (provider-specific)
maxTokens
integer
Maximum tokens in response. Provider Limits: Model-dependent
temperature
number
Sampling temperature (0.0-2.0). Default: Not specified by API contract (provider-defined)
threadId
string
Thread UUID (not persisted in streaming mode). Note: Streaming endpoints do NOT write to the thread_messages table
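Putting the fields above together, a complete request body might look like the sketch below. Every value is illustrative only; none of them are API defaults, and the threadId is a made-up example UUID.

```javascript
// Illustrative request body exercising every documented field.
// Values are examples only; none of them are API defaults.
const body = {
  prompt: 'Explain quantum computing',              // required
  model: 'auto',                                    // or a concrete model name
  system: 'You are a concise tutor.',               // optional system prompt
  maxTokens: 512,                                   // provider limits still apply
  temperature: 0.7,                                 // 0.0-2.0
  threadId: '5d1e8f0a-3c2b-4a6d-9e7f-1b2c3d4e5f60'  // not persisted when streaming
};

const payload = JSON.stringify(body);
```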

Response

Content-Type: text/event-stream (Line 450). Headers (implicit in ASP.NET SSE):
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Event Types

ai.stream.delta
object
Incremental text chunk (Lines 486-502). Format:
{
  "object": "ai.stream.delta",
  "delta": {
    "type": "output_text",
    "text": "chunk"
  }
}
Emitted: 0+ times during stream
ai.stream.done
object
Stream completion. Format:
{
  "object": "ai.stream.done",
  "finish_reason": "stop",
  "usage": {
    "totalTokens": 135
  }
}
Emitted: Once at end of successful stream
ai.error
object
Stream error (Lines 456-463, 512-519). Format:
{
  "object": "ai.error",
  "code": "STREAM_ERROR",
  "message": "Stream failed: timeout"
}
Error Codes:
  • INVALID_REQUEST: Prompt validation failed (Line 459)
  • STREAM_ERROR: Provider or network error (Line 515)
const response = await fetch('http://localhost:5079/api/arena/chat/stream', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${jwt}`,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream'
  },
  body: JSON.stringify({
    prompt: 'Explain quantum computing',
    model: 'llama-3.3-70b-versatile'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let fullResponse = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true handles multi-byte characters split across reads
  buffer += decoder.decode(value, { stream: true });

  // Events end with a blank line; keep any trailing partial event
  // in the buffer until the next read completes it.
  const events = buffer.split('\n\n');
  buffer = events.pop();

  for (const line of events) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.substring(6));

      if (event.object === 'ai.stream.delta') {
        fullResponse += event.delta.text;
        console.log(fullResponse); // Update UI
      } else if (event.object === 'ai.stream.done') {
        console.log('Stream complete');
      } else if (event.object === 'ai.error') {
        console.error(event.message);
      }
    }
  }
}
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":"Quantum"}}\n\n
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":" computing"}}\n\n
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":" uses"}}\n\n
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":" quantum"}}\n\n
...
data: {"object":"ai.stream.done","finish_reason":"stop","usage":{"totalTokens":135}}\n\n

Side Effects

Database Mutations: NONE for streaming endpoints
Logging: Internal only (not persisted to user-facing tables)
Internal Audit: message_logs table write not enforced by server contract (implementation detail)

Behavior

SSE Implementation (Lines 450, 486-502):
Response.ContentType = "text/event-stream";

Func<AIStreamEvent, Task> onEvent = async (e) => {
    var json = JsonConvert.SerializeObject(e, ...);
    await Response.WriteAsync($"data: {json}\n\n");
    await Response.Body.FlushAsync();
};

await provider.StreamAsync(request, onEvent, HttpContext.RequestAborted);
Key Characteristics:
  1. Content-Type set immediately (Line 450)
  2. Each event written as data: {json}\n\n (Line 495)
  3. Immediate flush after each event (Line 496)
  4. Cancellation token monitors client disconnect (Line 504)
Provider Selection (Lines 472-484):
var info = _modelSelector.GetModelInfo(selectedModel);
var providerName = info?.Provider ?? "groq";

IChatProvider provider;
try {
    provider = _chatProviderFactory.GetProvider(providerName);
} catch (Exception ex) {
    provider = _chatProviderFactory.GetGroqProvider(); // Fallback
}
Client Disconnect Handling (Line 504):
HttpContext.RequestAborted // CancellationToken
  • Passed to provider’s StreamAsync method
  • Provider monitors token and terminates stream on cancellation
  • No error written to client (they already disconnected)
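From the client's side, the same disconnect path can be triggered deliberately with an AbortController; aborting the fetch closes the connection, which is what signals HttpContext.RequestAborted on the server. A minimal sketch:

```javascript
// Sketch: client-initiated cancellation. Aborting the fetch closes
// the SSE connection, which the server observes via RequestAborted.
const controller = new AbortController();

// Pass controller.signal to fetch alongside the usual options:
// fetch('http://localhost:5079/api/arena/chat/stream', {
//   method: 'POST',
//   signal: controller.signal,
//   /* headers and body as in the main example */
// });

// Later, e.g. when the user presses "Stop":
controller.abort();
```

After the abort, any pending reader.read() rejects with an AbortError, so the read loop should be wrapped in try/catch if cancellation is expected.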
Error Handling (Lines 506-525):
try {
    await provider.StreamAsync(...);
} catch (Exception ex) {
    // Try to write error event if stream still open
    try {
        await Response.WriteAsync($"data: {errorEvent}\n\n");
        await Response.Body.FlushAsync();
    } catch { /* Ignore */ }
}
Error Recovery:
  • If stream already started, error event written
  • If stream closed (client disconnect), write attempt silently fails
  • No 500 status code returned (SSE already in progress)
JSON Serialization (Lines 490-494):
var json = JsonConvert.SerializeObject(e, new JsonSerializerSettings {
    NullValueHandling = NullValueHandling.Ignore,
    ContractResolver = new CamelCasePropertyNamesContractResolver()
});
  • Null values omitted
  • camelCase property names
  • Single-line JSON (no embedded newlines)

Event Lifecycle

Successful Stream:
1. Response.ContentType set to "text/event-stream"
2. Provider starts streaming
3. For each chunk:
   - ai.stream.delta event
   - Immediate flush to client
4. Stream completes:
   - ai.stream.done event
5. Connection closes
Error During Stream:
1. Response.ContentType set
2. Provider starts streaming
3. Several ai.stream.delta events
4. Provider throws exception
5. ai.error event (if stream still open)
6. Connection closes
Client Disconnect:
1. Response.ContentType set
2. Provider starts streaming
3. Several ai.stream.delta events
4. Client closes connection
5. HttpContext.RequestAborted signaled
6. Provider terminates stream
7. No error event (client gone)

Constraints

No Dual-Chat Streaming: Only single-chat mode supports streaming
No Database Write: Streaming responses are NOT persisted to the thread_messages table
  Reason: Partial streams cannot be meaningfully stored
  Workaround: Use the non-streaming /api/arena/chat if persistence is required
UTF-8 Encoding: All events UTF-8 encoded (Line 464, decoder implicit)
Single-Line JSON: Event payloads MUST NOT contain newlines (breaks SSE format)
Provider Support: Only providers implementing the StreamAsync method support streaming
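The persistence workaround amounts to calling the non-streaming endpoint instead. A sketch, assuming /api/arena/chat accepts the same body fields as the streaming endpoint; its response shape is not documented on this page:

```javascript
// Sketch only: non-streaming call for when persistence is required.
// Assumes /api/arena/chat takes the same body fields as the streaming
// endpoint; the JSON response shape is an assumption, not documented here.
async function chatWithPersistence(jwt, prompt, threadId) {
  const response = await fetch('http://localhost:5079/api/arena/chat', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${jwt}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ prompt, threadId })
  });
  return response.json(); // full response, persisted server-side
}
```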

SSE Format Specification

Event Format:
data: {json}\n\n
Fields:
  • data: prefix (required)
  • Space after colon
  • JSON object (single line)
  • \n\n double newline (event terminator)
Client Parsing:
  1. Read lines until \n\n
  2. Extract line starting with data:
  3. Parse JSON from position 6 onward
  4. Handle event based on object field
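The four steps above can be collected into a small helper (the function name is illustrative):

```javascript
// Illustrative helper implementing the parsing steps above: split on
// the blank-line terminator, take "data: " lines, and parse the JSON
// payload that follows the 6-character prefix.
function parseSSE(text) {
  const events = [];
  for (const block of text.split('\n\n')) {
    if (block.startsWith('data: ')) {
      events.push(JSON.parse(block.substring(6)));
    }
  }
  return events;
}

const sample =
  'data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":"Hi"}}\n\n' +
  'data: {"object":"ai.stream.done","finish_reason":"stop"}\n\n';
console.log(parseSSE(sample).length); // 2
```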

Error Conditions

Code             Format     Cause                   Controller Line
INVALID_REQUEST  SSE event  Prompt null/whitespace  452-463
STREAM_ERROR     SSE event  Provider exception      506-525
N/A              Silent     Client disconnect       504 (token)
No HTTP Status Codes: Once SSE starts, errors communicated via events (status already 200)

Edge Cases

  1. Empty prompt: Error event written, stream terminates (Lines 452-463)
  2. Client disconnect mid-stream: Provider cancels, no error event
  3. Provider timeout: Exception caught, error event written if possible
  4. Invalid model name: Fallback to Groq provider (Lines 476-484)
  5. JSON serialization failure: Caught and logged, event skipped (Lines 498-501)
  6. Stream already flushed: Error write attempt fails silently (Lines 520-524)

Performance Characteristics

First Token Latency: Provider-dependent
  • Groq: 100-300ms typical
  • Bytez: Higher latency
Chunk Frequency: Provider-dependent
  • Groq: High frequency (near real-time)
  • Some providers: Batched chunks
Total Stream Time: No timeout enforced in controller
  • Provider-level timeouts apply
  • Client can cancel anytime

Rate Limits

No explicit rate limiting. Provider limits apply:
  • Groq free tier: 30 req/min
  • Streaming counts same as non-streaming
Streaming Advantage: User sees partial response even if rate-limited mid-stream

Client Implementation Notes

Fetch API Recommended: EventSource doesn’t support POST or custom headers
Backpressure: Client must read the stream continuously or the buffer will fill
Reconnection: Not automatic; the client must implement retry logic
Error Handling: Parse ai.error events and display them to the user
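Since reconnection is not automatic, one way to implement retry is a small wrapper that restarts the whole stream. Attempt count and backoff below are illustrative choices, not API recommendations:

```javascript
// Illustrative retry wrapper: re-invokes a stream function until it
// succeeds or the attempt budget is exhausted. Attempt count and
// linear backoff are example values only.
async function withRetry(startStream, attempts = 3, delayMs = 500) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await startStream();
    } catch (err) {
      if (i === attempts - 1) throw err;
      // Simple linear backoff between attempts.
      await new Promise((r) => setTimeout(r, delayMs * (i + 1)));
    }
  }
}
```

Note that a retried stream replays the full response from the beginning, so any deltas rendered from the failed attempt should be discarded before the new attempt starts.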