What DualMind Does
DualMind enables developers to compare AI language models by running the same prompt through multiple models and collecting user votes on quality. The platform focuses on simplicity and transparency in model evaluation.
Design Philosophy
HTTP-Only Architecture
DualMind uses HTTP for all communication. No WebSocket, no SignalR, no complex bidirectional protocols. Why this decision?
- Simplicity: HTTP is universally supported and well-understood
- Debugging: Standard browser tools work perfectly
- Reliability: HTTP has proven failure modes and retry patterns
- Scaling: Stateless requests scale horizontally without connection management
For real-time streaming, the system uses Server-Sent Events (SSE) - a simple, unidirectional protocol built on HTTP.
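As a rough illustration of why SSE is easy to work with, here is a minimal parser for the wire format: events are separated by blank lines, and each event's payload arrives on one or more `data:` lines. This is a generic SSE sketch in Python (the backend itself is .NET), not DualMind's actual stream handling.

```python
def parse_sse(raw: str) -> list[str]:
    """Extract data payloads from a raw Server-Sent Events stream.

    Events are separated by blank lines; an event may span several
    "data:" lines, which are joined with newlines.
    """
    events = []
    data_lines = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))
            data_lines = []
    if data_lines:  # flush a final event with no trailing blank line
        events.append("\n".join(data_lines))
    return events

stream = "data: Hello\n\ndata: wor\ndata: ld\n\n"
print(parse_sse(stream))  # ['Hello', 'wor\nld']
```

Because the framing is just newline-delimited text over plain HTTP, this is debuggable with `curl` and the browser network tab — the "standard tools" advantage noted above.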
Provider Abstraction
AI providers (Groq, Bytez) are abstracted behind a unified interface. Clients never know which provider serves a request. Why abstraction matters:
- Reliability: Automatic failover when the primary provider fails
- Flexibility: Add new providers without changing client code
- Cost Optimization: Route traffic based on availability and pricing
- Independence: No lock-in to a single provider's API
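The abstraction can be sketched as a common interface plus a router that fails over in priority order. The class and method names below are illustrative assumptions (in Python rather than the backend's C#), not DualMind's real types:

```python
from abc import ABC, abstractmethod

class ProviderError(Exception):
    """Raised when a provider cannot serve a request."""

class AIProvider(ABC):
    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class ProviderRouter:
    """Tries providers in priority order, failing over on errors.

    Callers see one complete() method and never learn which
    provider actually answered.
    """
    def __init__(self, providers: list[AIProvider]):
        self.providers = providers

    def complete(self, model: str, prompt: str) -> str:
        last_error = None
        for provider in self.providers:
            try:
                return provider.complete(model, prompt)
            except ProviderError as exc:
                last_error = exc  # fall through to the next provider
        raise ProviderError("all providers failed") from last_error

# Usage with stand-in providers:
class AlwaysFails(AIProvider):
    def complete(self, model, prompt):
        raise ProviderError("rate limited")

class Echo(AIProvider):
    def complete(self, model, prompt):
        return f"{model}: {prompt}"

router = ProviderRouter([AlwaysFails(), Echo()])
print(router.complete("llama-3", "hi"))  # llama-3: hi
```

Adding a new provider means adding one class behind the interface; no client code changes, which is the flexibility point above.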
System Components
Backend API (.NET 8.0)
The backend handles all business logic:
- JWT validation and user sync
- Model selection strategies
- Provider routing and fallback
- Thread and vote persistence
- SSE stream management
Database (PostgreSQL via Supabase)
PostgreSQL stores all persistent data:
- User records (synced from Supabase Auth)
- AI model registry
- Threads and messages
- Comparisons and votes
Authentication (Supabase Auth)
User authentication is delegated entirely to Supabase:
- Password, OAuth, magic link support
- JWT token issuance
- Token refresh handling
- User metadata management
AI Providers
Groq (Primary):
- Fast inference (LPU acceleration)
- Low latency for chat applications
- Llama and Mixtral models
Bytez (Fallback):
- Reliability when Groq is unavailable
- Alternative model access
- Redundancy for production uptime
Data Flow
Request Lifecycle
Client Authentication
User authenticates via Supabase Auth and receives a JWT containing the user ID and email.
JWT Validation
Backend validates the token signature, issuer, audience, and expiration, then extracts the user ID from the sub claim.
User Sync
System performs UPSERT on users table to ensure user exists in database (idempotent operation).
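The lifecycle can be sketched in two small functions: decode the claims, then UPSERT the user row. This Python sketch (the backend is C#) deliberately omits signature verification, which happens earlier in the pipeline, and the in-memory `users` dict stands in for the users table; all names are illustrative.

```python
import base64
import json

def decode_claims(token: str) -> dict:
    """Decode a JWT payload. Sketch only: signature is NOT verified here."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def sync_user(users: dict, claims: dict) -> dict:
    """UPSERT keyed on the sub claim: create the row if missing,
    refresh the email if present. Running it twice is a no-op,
    which is what makes the sync idempotent."""
    user_id = claims["sub"]
    row = users.setdefault(user_id, {"id": user_id})
    row["email"] = claims.get("email")
    return row
```

Idempotency matters here because the sync runs on every authenticated request: a repeat caller must map onto the same row, never a duplicate.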
Key Design Decisions
Why Dual-Chat Arena?
Comparing models side-by-side reveals quality differences that single-model testing misses. Users vote on which response is better, creating a crowd-sourced quality metric. Benefits:
- Objective comparison (same prompt, same conditions)
- User preferences guide model selection
- Competitive benchmarking across model families
- Real-world quality assessment
Why Thread Persistence?
Conversations are more valuable when they're persistent. Threads enable:
- Continuing conversations across sessions
- Sharing comparisons with others
- Building conversation history
- Organizing different topics
Threads support three visibility levels:
- Private: Only the owner can access (default)
- Public: Anyone can view (shareable in directory)
- Unlisted: Accessible via link (shareable without listing)
Why SSE Instead of WebSocket?
Server-Sent Events are simpler than WebSocket for unidirectional streaming:

| Feature | SSE | WebSocket |
|---|---|---|
| Direction | Server → Client | Bidirectional |
| Protocol | Plain HTTP | HTTP Upgrade handshake |
| Reconnection | Automatic | Manual |
| Debugging | Standard HTTP tools | Specialized tools |
| Complexity | Low | Higher |
DualMind doesn’t need client→server streaming during response generation, making SSE the perfect fit.
Why Provider Fallback Chain?
AI providers have varying uptime and rate limits. The fallback chain ensures reliability:
- Groq attempt (45s timeout)
- Groq retry with fallback model (45s timeout)
- Bytez attempt (45s timeout)
- Return error to client
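The chain above can be sketched as a loop where each attempt gets its own 45-second budget and any failure or timeout advances to the next step. This is a Python analog of the C# backend's logic; the callables and names are stand-ins, not the real provider clients.

```python
from concurrent.futures import ThreadPoolExecutor

TIMEOUT_S = 45  # per-attempt budget from the chain above

def try_with_timeout(fn, prompt, timeout=TIMEOUT_S):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, prompt)
    try:
        return future.result(timeout=timeout)
    except Exception:
        return None  # timeout or provider error: move to the next step
    finally:
        pool.shutdown(wait=False)  # don't block on a hung provider call

def complete_with_fallback(prompt, groq, groq_fallback, bytez):
    # steps 1-3 of the chain; step 4 is the raised error below
    for attempt in (groq, groq_fallback, bytez):
        result = try_with_timeout(attempt, prompt)
        if result is not None:
            return result
    raise RuntimeError("all providers failed")
```

Worst case the client waits ~135 seconds (three 45-second budgets) before seeing an error, which is the price of exhausting the chain before giving up.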
Constraints and Tradeoffs
45-Second Timeout
Each provider request times out after 45 seconds. Why this limit?
- Prevents indefinite hanging
- Enforces reasonable response times
- Enables fallback to alternative provider
- Matches typical AI inference duration
No Background Jobs
DualMind has zero background workers. All processing happens synchronously in HTTP request handlers. Why no background jobs?
- Simpler architecture (no queue management)
- Immediate feedback (no “processing” states)
- Easier debugging (single request trace)
- Lower operational complexity
Idempotency Patterns
Not all operations are idempotent:

| Operation | Idempotent? | Reason |
|---|---|---|
| User sync | ✅ Yes | UPSERT operation |
| GET requests | ✅ Yes | Read-only |
| Chat request | ❌ No | Creates new AI response each time |
| Thread creation | ❌ No | Creates new UUID each time |
| Vote submission | ❌ No | Allows vote changes via multiple rows |
| DELETE | ✅ Yes | Deleting deleted resource succeeds |
Non-idempotent operations are explicit design choices. Chat requests SHOULD generate fresh responses, not cached results.
Security Model
Trust Boundary
The trust boundary is at JWT validation. Once a valid JWT is confirmed:
- The user ID extracted from the sub claim is trusted
- The user owns resources with a matching user_id
- Private threads are accessible only to their owner
No Row-Level Security in Backend
Backend uses the Supabase service role key, bypassing RLS. All access control is implemented in application code. Why not RLS?
- Full control over authorization logic
- Complex rules easier in C# than PostgreSQL policies
- Performance (no RLS evaluation overhead)
- Flexibility for future authorization models
Instead, the application enforces:
- JWT signature validation
- Ownership checks before mutations
- Visibility enforcement for threads
- Input validation on all endpoints
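With RLS bypassed, the ownership check before any mutation is just a comparison between the resource's stored user_id and the validated JWT's sub claim. A minimal sketch, with an assumed helper name (and in Python rather than the backend's C#):

```python
def assert_owner(resource_user_id: str, jwt_sub: str) -> None:
    """Reject mutations on resources the caller does not own.

    jwt_sub comes from an already-validated token, which is the
    trust boundary described above; this failure would typically
    surface to the client as HTTP 403.
    """
    if resource_user_id != jwt_sub:
        raise PermissionError("caller does not own this resource")
```

Because every mutation path runs a check like this in application code, the service role key's RLS bypass never reaches the client.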
Scalability Considerations
Stateless Design
All requests are stateless. No session storage, no in-memory state, no sticky sessions required. Benefits:
- Horizontal scaling (add more API instances)
- Load balancer friendly
- No session replication complexity
- Graceful instance restarts
Database as State Store
PostgreSQL is the single source of truth. No distributed caching, no separate session stores. Why this works:
- Supabase provides connection pooling
- Read queries are fast with proper indexing
- Writes are infrequent relative to reads
- UUIDs enable partition-friendly design
Parallel Execution
Dual-chat executes both models in parallel using Task.WhenAll(). This roughly halves response time compared to sequential execution when the two models have similar latency.
Example: If each model takes 2 seconds, total time is ~2 seconds (parallel) vs 4 seconds (sequential).
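The fan-out in the example can be sketched with asyncio.gather, the Python analog of C#'s Task.WhenAll (the real backend uses the latter). The model names and delays here are stand-ins:

```python
import asyncio
import time

async def call_model(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for an AI provider call
    return f"{name}: response"

async def dual_chat() -> list[str]:
    # both calls start immediately; total time ~ max of the two delays,
    # not their sum
    return await asyncio.gather(call_model("model-a", 0.1),
                                call_model("model-b", 0.1))

start = time.perf_counter()
responses = asyncio.run(dual_chat())
elapsed = time.perf_counter() - start
print(responses)  # ['model-a: response', 'model-b: response']
```

With two 0.1-second calls, `elapsed` lands near 0.1 seconds rather than 0.2, mirroring the 2-second example above.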
Next Steps
- Chat Modes: Learn single vs. dual chat differences
- Model Selection: Understand selection strategies
- Streaming Protocol: How SSE streaming works
- Thread Management: Visibility and persistence