What is Streaming?
Streaming delivers AI responses incrementally as they're generated, rather than waiting for the complete response.

User experience:
- Non-streaming: Wait 3 seconds → Full response appears
- Streaming: Text appears word-by-word as generated
Why Server-Sent Events?
DualMind uses Server-Sent Events (SSE), not WebSocket.

SSE vs WebSocket
| Feature | SSE | WebSocket |
|---|---|---|
| Direction | Server → Client only | Bidirectional |
| Protocol | HTTP | Custom (upgrade from HTTP) |
| Reconnection | Automatic | Manual implementation required |
| Debugging | Standard browser DevTools | Specialized tools |
| Complexity | Low | Higher |
| Use Case | Server pushes data | Client-server chat |
DualMind only needs server-to-client streaming during response generation. There’s no client data to stream back during inference, making SSE the perfect protocol.
Why Not WebSocket?
WebSocket adds unnecessary complexity:
- Requires connection upgrade handshake
- No automatic reconnection on disconnect
- Harder to debug (non-HTTP protocol)
- Bidirectional capability unused
SSE Event Format
Events follow a strict format:
- Each event prefixed with data:
- JSON must be single-line (no newlines in payload)
- Each event terminated with double newline (\n\n)
- UTF-8 encoding required
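Taken together, these rules make each frame trivial to serialize. A minimal sketch (a hypothetical helper, not DualMind's actual code):

```typescript
// Serialize one SSE frame per the rules above: "data: " prefix,
// single-line JSON payload, "\n\n" terminator.
function formatSSEEvent(payload: object): string {
  // JSON.stringify escapes newlines inside strings, so the payload
  // is guaranteed to stay on a single line.
  return `data: ${JSON.stringify(payload)}\n\n`;
}

const frame = formatSSEEvent({ type: "ai.stream.delta", delta: { text: "Hello" } });
// frame === 'data: {"type":"ai.stream.delta","delta":{"text":"Hello"}}\n\n'
```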
Event Types
All events use the ai.stream.* namespace:
ai.stream.start
Purpose: Stream initialization, provides model metadata
When sent: First event, immediately after the connection is established
ai.stream.delta
Purpose: Incremental text chunk
Payload: {"type":"ai.stream.delta","delta":{"text":"..."}}
When sent: Repeatedly as the model generates text (0+ times)
Client handling: Append delta.text to the displayed response
ai.stream.done
Purpose: Stream completion with final metrics
When sent: After the final delta event; the stream then terminates
ai.error
Purpose: Stream failure notification
When sent: On provider failure, timeout, or other errors
Stream behavior: Terminates after the error event
Event Lifecycle
A successful stream follows this sequence:
start → delta (0+ times) → done (connection closes)

Failure path:
start → error (connection closes)
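The lifecycle above can be enforced with a small state machine on the client. A hypothetical sketch (not DualMind's client code):

```typescript
// Enforce the lifecycle: start → delta (0+) → done | error.
type LifecyclePhase = "idle" | "open" | "closed";

class StreamLifecycle {
  phase: LifecyclePhase = "idle";
  text = ""; // accumulated response text

  handle(event: { type: string; delta?: { text: string } }): void {
    if (this.phase === "closed") throw new Error("event after terminal event");
    switch (event.type) {
      case "ai.stream.start":
        if (this.phase !== "idle") throw new Error("duplicate start");
        this.phase = "open";
        break;
      case "ai.stream.delta":
        if (this.phase !== "open") throw new Error("delta before start");
        this.text += event.delta?.text ?? "";
        break;
      case "ai.stream.done":
      case "ai.error":
        this.phase = "closed"; // connection closes after either terminal event
        break;
    }
  }
}
```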
Client Implementation
Browser (Fetch API with ReadableStream)
Modern browsers can consume SSE via the Fetch API with a ReadableStream:
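A sketch of that approach follows. The endpoint URL and request body are illustrative assumptions, not DualMind's documented API; the parsing logic follows the event format described above.

```typescript
interface StreamEvent {
  type: string;
  delta?: { text: string };
}

// Split a text buffer into complete SSE events ("data: ..." + "\n\n")
// and return the unconsumed tail (a partial event awaiting more bytes).
function parseSSE(buffer: string): { events: StreamEvent[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? "";
  const events: StreamEvent[] = [];
  for (const part of parts) {
    const line = part.trim();
    if (line.startsWith("data: ")) {
      events.push(JSON.parse(line.slice("data: ".length)));
    }
  }
  return { events, rest };
}

// Read the response body incrementally and feed delta text to the UI.
async function streamChat(url: string, body: unknown, onText: (t: string) => void): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder(); // events are UTF-8
  let buffer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSSE(buffer);
    buffer = rest;
    for (const ev of events) {
      if (ev.type === "ai.stream.delta") onText(ev.delta?.text ?? "");
      if (ev.type === "ai.stream.done" || ev.type === "ai.error") return;
    }
  }
}
```

Buffering the remainder matters because network chunks do not align with event boundaries; a delta can arrive split across two reads.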
Custom HTTP Client
For environments without EventSource, any HTTP client that reads the response body incrementally can parse the same event format.

Server Behavior
Connection Lifecycle
Disconnect Handling
If the client disconnects during streaming:
- RequestAborted cancellation token fires
- Provider receives cancellation signal
- Stream processing terminates immediately
- No database writes occur for partial streams
Partial streams are not persisted to thread messages. Only complete non-streaming requests write to the database.
Timeout Behavior
Streaming requests have the same 45-second timeout as non-streaming:
- Provider timeout (45s)
- Fallback to alternative model (45s)
- Fallback to Bytez provider (45s)
- Return error if all fail
On timeout, streaming clients receive an ai.error event instead of a JSON error response.
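The fallback chain above can be sketched as sequential attempts, each with its own 45-second budget. Function names and the failure payload are illustrative, not DualMind's actual implementation:

```typescript
type Attempt = (signal: AbortSignal) => Promise<string>;

async function generateWithFallback(
  attempts: Attempt[], // e.g. [primaryModel, alternativeModel, bytezProvider]
  timeoutMs = 45_000,  // 45s budget per attempt
): Promise<{ ok: true; text: string } | { ok: false; error: string }> {
  for (const attempt of attempts) {
    try {
      // AbortSignal.timeout aborts the attempt once its budget elapses
      return { ok: true, text: await attempt(AbortSignal.timeout(timeoutMs)) };
    } catch {
      // timeout or provider failure: fall through to the next attempt
    }
  }
  return { ok: false, error: "all providers failed" };
}
```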
Streaming vs Non-Streaming
| Aspect | Non-Streaming | Streaming |
|---|---|---|
| Response time | Wait for full response | Immediate first token |
| Perceived latency | High | Low |
| Response format | Single JSON object | SSE event stream |
| Database persistence | Yes (if threadId provided) | No |
| Client complexity | Simple (await response) | Higher (event handling) |
| Bandwidth | Single burst | Gradual delivery |
When to Use Streaming
Use streaming when:
- User experience is a priority (perceived speed)
- Long responses (>500 tokens)
- Interactive chat interfaces
- Real-time feedback is important

Avoid streaming when:
- Database persistence is required immediately
- Processing the full response programmatically
- Client doesn't support SSE
- Response needs to be votable (dual-chat)
Dual-Chat Streaming
Current status: Not supported

Why not?
- Two parallel SSE streams complicate client-side handling
- Arena comparisons require complete responses for fair voting
- Use case unclear (voters need full text anyway)
Error Scenarios
Provider Timeout
If the provider exceeds 45 seconds, the backend emits an ai.error event and the stream terminates.

Network Interruption
If the network drops during a stream:
- Browser's EventSource auto-reconnects (with the Last-Event-ID header)
- DualMind doesn't support resumption (no event IDs)
- Client must start new request
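Because there are no event IDs, recovery means discarding partial text and issuing a brand-new request. A hypothetical client-side retry wrapper:

```typescript
// Restart the whole stream on failure; nothing can be resumed mid-stream.
async function streamWithRestart(
  startStream: () => Promise<string>, // runs one full stream, resolves with the complete text
  maxRetries = 2,
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await startStream();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // give up after maxRetries restarts
      // partial text from the dropped stream is discarded before retrying
    }
  }
}
```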
Malformed Events
If the provider sends invalid JSON:
- Backend catches parse errors
- Sends an ai.error event
- Stream terminates
Performance Considerations
Bandwidth
Streaming uses more total bandwidth than non-streaming:
- Event overhead: data: prefix + \n\n suffix per chunk
- JSON overhead: repeated {"type":"ai.stream.delta","delta":{"text":"..."}} envelope per chunk
Server Resources
Streaming holds the HTTP connection open while the response is delivered:
- Non-streaming: 2-3 seconds
- Streaming: 2-3 seconds (same inference time, different delivery)
Next Steps
Chat Modes
Single vs dual chat explained
Thread Management
Persisting conversations
System Overview
Architecture decisions