POST /api/arena/chat/stream

Streaming Chat (Stable)

Stream AI model responses token-by-token using Server-Sent Events (SSE). Ideal for real-time chat interfaces.

Authentication Required

Request Body

prompt
string
required
User message text. Validation (Line 452):
if (request == null || string.IsNullOrWhiteSpace(request.Prompt)) {
    // Write SSE error event
    await Response.WriteAsync($"data: {errorEvent}\n\n");
    return;
}
model
string
Model name or "auto" for random selection. Selection Logic (Lines 468-470):
var selectedModel = string.IsNullOrWhiteSpace(request.Model) || request.Model == "auto"
    ? await _modelSelector.GetRandomModelAsync()
    : request.Model;
system
string
System prompt for chat context. Default: Implementation-defined and not guaranteed (provider-specific)
maxTokens
integer
Maximum tokens in response. Provider Limits: Model-dependent
temperature
number
Sampling temperature (0.0-2.0). Default: Not specified by API contract (provider-defined)
threadId
string
Thread UUID (not persisted in streaming mode). Note: Streaming endpoints do NOT write to the thread_messages table
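Putting the fields above together, a complete request body might look like the sketch below. Every value is illustrative only; none of them are API defaults, and the threadId is a made-up example UUID.

```javascript
// Illustrative request body exercising every documented field.
// Values are examples only; none of them are API defaults.
const body = {
  prompt: 'Explain quantum computing',              // required
  model: 'auto',                                    // or a concrete model name
  system: 'You are a concise tutor.',               // optional system prompt
  maxTokens: 512,                                   // provider limits still apply
  temperature: 0.7,                                 // 0.0-2.0
  threadId: '5d1e8f0a-3c2b-4a6d-9e7f-1b2c3d4e5f60'  // not persisted when streaming
};

const payload = JSON.stringify(body);
```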

Response

Content-Type: text/event-stream (Line 450). Headers (implicit in ASP.NET SSE):
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Event Types

ai.stream.delta
object
Incremental text chunk (Lines 486-502). Format:
{
  "object": "ai.stream.delta",
  "delta": {
    "type": "output_text",
    "text": "chunk"
  }
}
Emitted: 0+ times during stream
ai.stream.done
object
Stream completion. Format:
{
  "object": "ai.stream.done",
  "finish_reason": "stop",
  "usage": {
    "totalTokens": 135
  }
}
Emitted: Once at end of successful stream
ai.error
object
Stream error (Lines 456-463, 512-519). Format:
{
  "object": "ai.error",
  "code": "STREAM_ERROR",
  "message": "Stream failed: timeout"
}
Error Codes:
  • INVALID_REQUEST: Prompt validation failed (Line 459)
  • STREAM_ERROR: Provider or network error (Line 515)
const response = await fetch('http://localhost:5079/api/arena/chat/stream', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${jwt}`,
    'Content-Type': 'application/json',
    'Accept': 'text/event-stream'
  },
  body: JSON.stringify({
    prompt: 'Explain quantum computing',
    model: 'llama-3.3-70b-versatile'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let fullResponse = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true handles multi-byte characters split across reads
  buffer += decoder.decode(value, { stream: true });

  // Events end with a blank line; keep any trailing partial event
  // in the buffer until the next read completes it.
  const events = buffer.split('\n\n');
  buffer = events.pop();

  for (const line of events) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.substring(6));

      if (event.object === 'ai.stream.delta') {
        fullResponse += event.delta.text;
        console.log(fullResponse); // Update UI
      } else if (event.object === 'ai.stream.done') {
        console.log('Stream complete');
      } else if (event.object === 'ai.error') {
        console.error(event.message);
      }
    }
  }
}
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":"Quantum"}}\n\n
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":" computing"}}\n\n
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":" uses"}}\n\n
data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":" quantum"}}\n\n
...
data: {"object":"ai.stream.done","finish_reason":"stop","usage":{"totalTokens":135}}\n\n

Side Effects

Database Mutations: NONE for streaming endpoints
Logging: Internal only (not persisted to user-facing tables)
Internal Audit: message_logs table write not enforced by server contract (implementation detail)

Behavior

SSE Implementation (Lines 450, 486-502):
Response.ContentType = "text/event-stream";

Func<AIStreamEvent, Task> onEvent = async (e) => {
    var json = JsonConvert.SerializeObject(e, ...);
    await Response.WriteAsync($"data: {json}\n\n");
    await Response.Body.FlushAsync();
};

await provider.StreamAsync(request, onEvent, HttpContext.RequestAborted);
Key Characteristics:
  1. Content-Type set immediately (Line 450)
  2. Each event written as data: {json}\n\n (Line 495)
  3. Immediate flush after each event (Line 496)
  4. Cancellation token monitors client disconnect (Line 504)
Provider Selection (Lines 472-484):
var info = _modelSelector.GetModelInfo(selectedModel);
var providerName = info?.Provider ?? "groq";

IChatProvider provider;
try {
    provider = _chatProviderFactory.GetProvider(providerName);
} catch (Exception ex) {
    provider = _chatProviderFactory.GetGroqProvider(); // Fallback
}
Client Disconnect Handling (Line 504):
HttpContext.RequestAborted // CancellationToken
  • Passed to provider’s StreamAsync method
  • Provider monitors token and terminates stream on cancellation
  • No error written to client (they already disconnected)
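From the client's side, the same disconnect path can be triggered deliberately with an AbortController; aborting the fetch closes the connection, which is what signals HttpContext.RequestAborted on the server. A minimal sketch:

```javascript
// Sketch: client-initiated cancellation. Aborting the fetch closes
// the SSE connection, which the server observes via RequestAborted.
const controller = new AbortController();

// Pass controller.signal to fetch alongside the usual options:
// fetch('http://localhost:5079/api/arena/chat/stream', {
//   method: 'POST',
//   signal: controller.signal,
//   /* headers and body as in the main example */
// });

// Later, e.g. when the user presses "Stop":
controller.abort();
```

After the abort, any pending reader.read() rejects with an AbortError, so the read loop should be wrapped in try/catch if cancellation is expected.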
Error Handling (Lines 506-525):
try {
    await provider.StreamAsync(...);
} catch (Exception ex) {
    // Try to write error event if stream still open
    try {
        await Response.WriteAsync($"data: {errorEvent}\n\n");
        await Response.Body.FlushAsync();
    } catch { /* Ignore */ }
}
Error Recovery:
  • If stream already started, error event written
  • If stream closed (client disconnect), write attempt silently fails
  • No 500 status code returned (SSE already in progress)
JSON Serialization (Lines 490-494):
var json = JsonConvert.SerializeObject(e, new JsonSerializerSettings {
    NullValueHandling = NullValueHandling.Ignore,
    ContractResolver = new CamelCasePropertyNamesContractResolver()
});
  • Null values omitted
  • camelCase property names
  • Single-line JSON (no embedded newlines)

Event Lifecycle

Successful Stream:
1. Response.ContentType set to "text/event-stream"
2. Provider starts streaming
3. For each chunk:
   - ai.stream.delta event
   - Immediate flush to client
4. Stream completes:
   - ai.stream.done event
5. Connection closes
Error During Stream:
1. Response.ContentType set
2. Provider starts streaming
3. Several ai.stream.delta events
4. Provider throws exception
5. ai.error event (if stream still open)
6. Connection closes
Client Disconnect:
1. Response.ContentType set
2. Provider starts streaming
3. Several ai.stream.delta events
4. Client closes connection
5. HttpContext.RequestAborted signaled
6. Provider terminates stream
7. No error event (client gone)

Constraints

No Dual-Chat Streaming: Only single-chat mode supports streaming
No Database Write: Streaming responses are NOT persisted to the thread_messages table
  Reason: Partial streams cannot be meaningfully stored
  Workaround: Use the non-streaming /api/arena/chat if persistence is required
UTF-8 Encoding: All events UTF-8 encoded (Line 464, decoder implicit)
Single-Line JSON: Event payloads MUST NOT contain newlines (breaks SSE format)
Provider Support: Only providers implementing the StreamAsync method support streaming
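The persistence workaround amounts to calling the non-streaming endpoint instead. A sketch, assuming /api/arena/chat accepts the same body fields as the streaming endpoint; its response shape is not documented on this page:

```javascript
// Sketch only: non-streaming call for when persistence is required.
// Assumes /api/arena/chat takes the same body fields as the streaming
// endpoint; the JSON response shape is an assumption, not documented here.
async function chatWithPersistence(jwt, prompt, threadId) {
  const response = await fetch('http://localhost:5079/api/arena/chat', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${jwt}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ prompt, threadId })
  });
  return response.json(); // full response, persisted server-side
}
```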

SSE Format Specification

Event Format:
data: {json}\n\n
Fields:
  • data: prefix (required)
  • Space after colon
  • JSON object (single line)
  • \n\n double newline (event terminator)
Client Parsing:
  1. Read lines until \n\n
  2. Extract line starting with data:
  3. Parse JSON from position 6 onward
  4. Handle event based on object field
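The four steps above can be collected into a small helper (the function name is illustrative):

```javascript
// Illustrative helper implementing the parsing steps above: split on
// the blank-line terminator, take "data: " lines, and parse the JSON
// payload that follows the 6-character prefix.
function parseSSE(text) {
  const events = [];
  for (const block of text.split('\n\n')) {
    if (block.startsWith('data: ')) {
      events.push(JSON.parse(block.substring(6)));
    }
  }
  return events;
}

const sample =
  'data: {"object":"ai.stream.delta","delta":{"type":"output_text","text":"Hi"}}\n\n' +
  'data: {"object":"ai.stream.done","finish_reason":"stop"}\n\n';
console.log(parseSSE(sample).length); // 2
```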

Error Conditions

Code             Format     Cause                   Controller Line
INVALID_REQUEST  SSE event  Prompt null/whitespace  452-463
STREAM_ERROR     SSE event  Provider exception      506-525
N/A              Silent     Client disconnect       504 (token)
No HTTP Status Codes: Once SSE starts, errors communicated via events (status already 200)

Edge Cases

  1. Empty prompt: Error event written, stream terminates (Lines 452-463)
  2. Client disconnect mid-stream: Provider cancels, no error event
  3. Provider timeout: Exception caught, error event written if possible
  4. Invalid model name: Fallback to Groq provider (Lines 476-484)
  5. JSON serialization failure: Caught and logged, event skipped (Lines 498-501)
  6. Stream already flushed: Error write attempt fails silently (Lines 520-524)

Performance Characteristics

First Token Latency: Provider-dependent
  • Groq: 100-300ms typical
  • Bytez: Higher latency
Chunk Frequency: Provider-dependent
  • Groq: High frequency (near real-time)
  • Some providers: Batched chunks
Total Stream Time: No timeout enforced in controller
  • Provider-level timeouts apply
  • Client can cancel anytime

Rate Limits

No explicit rate limiting. Provider limits apply:
  • Groq free tier: 30 req/min
  • Streaming counts same as non-streaming
Streaming Advantage: User sees partial response even if rate-limited mid-stream

Client Implementation Notes

Fetch API Recommended: EventSource doesn’t support POST or custom headers
Backpressure: Client must read the stream continuously or the buffer will fill
Reconnection: Not automatic; the client must implement retry logic
Error Handling: Parse ai.error events and display them to the user
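Since reconnection is not automatic, one way to implement retry is a small wrapper that restarts the whole stream. Attempt count and backoff below are illustrative choices, not API recommendations:

```javascript
// Illustrative retry wrapper: re-invokes a stream function until it
// succeeds or the attempt budget is exhausted. Attempt count and
// linear backoff are example values only.
async function withRetry(startStream, attempts = 3, delayMs = 500) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await startStream();
    } catch (err) {
      if (i === attempts - 1) throw err;
      // Simple linear backoff between attempts.
      await new Promise((r) => setTimeout(r, delayMs * (i + 1)));
    }
  }
}
```

Note that a retried stream replays the full response from the beginning, so any deltas rendered from the failed attempt should be discarded before the new attempt starts.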