What DualMind Does
DualMind enables developers to compare AI language models by running the same prompt through multiple models and collecting user votes on quality. The platform focuses on simplicity and transparency in model evaluation.
Design Philosophy
HTTP-Only Architecture
DualMind uses HTTP for all communication. No WebSocket, no SignalR, no complex bidirectional protocols. Why this decision?
- Simplicity: HTTP is universally supported and well-understood
- Debugging: Standard browser tools work perfectly
- Reliability: HTTP has proven failure modes and retry patterns
- Scaling: Stateless requests scale horizontally without connection management
For real-time streaming, the system uses Server-Sent Events (SSE) - a simple, unidirectional protocol built on HTTP.
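As a rough illustration of why SSE is easy to work with, here is a minimal parser for the wire format: events are separated by blank lines, and each event's payload arrives on one or more `data:` lines. This is a generic SSE sketch in Python (the backend itself is .NET), not DualMind's actual stream handling.

```python
def parse_sse(raw: str) -> list[str]:
    """Extract data payloads from a raw Server-Sent Events stream.

    Events are separated by blank lines; an event may span several
    "data:" lines, which are joined with newlines.
    """
    events = []
    data_lines = []
    for line in raw.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))
            data_lines = []
    if data_lines:  # flush a final event with no trailing blank line
        events.append("\n".join(data_lines))
    return events

stream = "data: Hello\n\ndata: wor\ndata: ld\n\n"
print(parse_sse(stream))  # ['Hello', 'wor\nld']
```

Because the framing is just newline-delimited text over plain HTTP, this is debuggable with `curl` and the browser network tab — the "standard tools" advantage noted above.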
Provider Abstraction
AI providers (Groq, Bytez) are abstracted behind a unified interface. Clients never know which provider serves a request. Why abstraction matters:
- Reliability: Automatic failover when the primary provider fails
- Flexibility: Add new providers without changing client code
- Cost Optimization: Route traffic based on availability and pricing
- Independence: No lock-in to a single provider's API
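The abstraction can be sketched as a common interface plus a router that fails over in priority order. The class and method names below are illustrative assumptions (in Python rather than the backend's C#), not DualMind's real types:

```python
from abc import ABC, abstractmethod

class ProviderError(Exception):
    """Raised when a provider cannot serve a request."""

class AIProvider(ABC):
    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class ProviderRouter:
    """Tries providers in priority order, failing over on errors.

    Callers see one complete() method and never learn which
    provider actually answered.
    """
    def __init__(self, providers: list[AIProvider]):
        self.providers = providers

    def complete(self, model: str, prompt: str) -> str:
        last_error = None
        for provider in self.providers:
            try:
                return provider.complete(model, prompt)
            except ProviderError as exc:
                last_error = exc  # fall through to the next provider
        raise ProviderError("all providers failed") from last_error

# Usage with stand-in providers:
class AlwaysFails(AIProvider):
    def complete(self, model, prompt):
        raise ProviderError("rate limited")

class Echo(AIProvider):
    def complete(self, model, prompt):
        return f"{model}: {prompt}"

router = ProviderRouter([AlwaysFails(), Echo()])
print(router.complete("llama-3", "hi"))  # llama-3: hi
```

Adding a new provider means adding one class behind the interface; no client code changes, which is the flexibility point above.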
System Components
Backend API (.NET 8.0)
The backend handles all business logic:
- JWT validation and user sync
- Model selection strategies
- Provider routing and fallback
- Thread and vote persistence
- SSE stream management
Database (PostgreSQL via Supabase)
PostgreSQL stores all persistent data:
- User records (synced from Supabase Auth)
- AI model registry
- Threads and messages
- Comparisons and votes
Authentication (Supabase Auth)
User authentication is delegated entirely to Supabase:
- Password, OAuth, magic link support
- JWT token issuance
- Token refresh handling
- User metadata management
AI Providers
Groq (Primary):
- Fast inference (LPU acceleration)
- Low latency for chat applications
- Llama and Mixtral models
Bytez (Fallback):
- Reliability when Groq is unavailable
- Alternative model access
- Redundancy for production uptime
Data Flow
Request Lifecycle
Client Authentication
User authenticates via Supabase Auth and receives a JWT containing the user ID and email.
JWT Validation
Backend validates the token signature, issuer, audience, and expiration, then extracts the user ID from the sub claim.
User Sync
System performs UPSERT on users table to ensure user exists in database (idempotent operation).
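The lifecycle can be sketched in two small functions: decode the claims, then UPSERT the user row. This Python sketch (the backend is C#) deliberately omits signature verification, which happens earlier in the pipeline, and the in-memory `users` dict stands in for the users table; all names are illustrative.

```python
import base64
import json

def decode_claims(token: str) -> dict:
    """Decode a JWT payload. Sketch only: signature is NOT verified here."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def sync_user(users: dict, claims: dict) -> dict:
    """UPSERT keyed on the sub claim: create the row if missing,
    refresh the email if present. Running it twice is a no-op,
    which is what makes the sync idempotent."""
    user_id = claims["sub"]
    row = users.setdefault(user_id, {"id": user_id})
    row["email"] = claims.get("email")
    return row
```

Idempotency matters here because the sync runs on every authenticated request: a repeat caller must map onto the same row, never a duplicate.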
Key Design Decisions
Why Dual-Chat Arena?
Comparing models side-by-side reveals quality differences that single-model testing misses. Users vote on which response is better, creating a crowd-sourced quality metric. Benefits:
- Objective comparison (same prompt, same conditions)
- User preferences guide model selection
- Competitive benchmarking across model families
- Real-world quality assessment
Why Thread Persistence?
Conversations are more valuable when they're persistent. Threads enable:
- Continuing conversations across sessions
- Sharing comparisons with others
- Building conversation history
- Organizing different topics
Threads support three visibility levels:
- Private: Only the owner can access (default)
- Public: Anyone can view (shareable in directory)
- Unlisted: Accessible via link (shareable without listing)
Why SSE Instead of WebSocket?
Server-Sent Events are simpler than WebSocket for unidirectional streaming:

| Feature | SSE | WebSocket |
|---|---|---|
| Direction | Server → Client | Bidirectional |
| Protocol | Plain HTTP | HTTP Upgrade handshake |
| Reconnection | Automatic | Manual |
| Debugging | Standard HTTP tools | Specialized tools |
| Complexity | Low | Higher |
DualMind doesn’t need client→server streaming during response generation, making SSE the perfect fit.
Why Provider Fallback Chain?
AI providers have varying uptime and rate limits. The fallback chain ensures reliability:
- Groq attempt (45s timeout)
- Groq retry with fallback model (45s timeout)
- Bytez attempt (45s timeout)
- Return error to client
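The chain above can be sketched as a loop where each attempt gets its own 45-second budget and any failure or timeout advances to the next step. This is a Python analog of the C# backend's logic; the callables and names are stand-ins, not the real provider clients.

```python
from concurrent.futures import ThreadPoolExecutor

TIMEOUT_S = 45  # per-attempt budget from the chain above

def try_with_timeout(fn, prompt, timeout=TIMEOUT_S):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, prompt)
    try:
        return future.result(timeout=timeout)
    except Exception:
        return None  # timeout or provider error: move to the next step
    finally:
        pool.shutdown(wait=False)  # don't block on a hung provider call

def complete_with_fallback(prompt, groq, groq_fallback, bytez):
    # steps 1-3 of the chain; step 4 is the raised error below
    for attempt in (groq, groq_fallback, bytez):
        result = try_with_timeout(attempt, prompt)
        if result is not None:
            return result
    raise RuntimeError("all providers failed")
```

Worst case the client waits ~135 seconds (three 45-second budgets) before seeing an error, which is the price of exhausting the chain before giving up.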
Constraints and Tradeoffs
45-Second Timeout
Each provider request times out after 45 seconds. Why this limit?
- Prevents indefinite hanging
- Enforces reasonable response times
- Enables fallback to alternative provider
- Matches typical AI inference duration
No Background Jobs
DualMind has zero background workers. All processing happens synchronously in HTTP request handlers. Why no background jobs?
- Simpler architecture (no queue management)
- Immediate feedback (no “processing” states)
- Easier debugging (single request trace)
- Lower operational complexity
Idempotency Patterns
Not all operations are idempotent:

| Operation | Idempotent? | Reason |
|---|---|---|
| User sync | ✅ Yes | UPSERT operation |
| GET requests | ✅ Yes | Read-only |
| Chat request | ❌ No | Creates new AI response each time |
| Thread creation | ❌ No | Creates new UUID each time |
| Vote submission | ❌ No | Allows vote changes via multiple rows |
| DELETE | ✅ Yes | Deleting deleted resource succeeds |
Non-idempotent operations are explicit design choices. Chat requests SHOULD generate fresh responses, not cached results.
Security Model
Trust Boundary
The trust boundary is at JWT validation. Once a valid JWT is confirmed:
- The user ID extracted from the sub claim is trusted
- The user owns resources with a matching user_id
- Private threads are accessible only to their owner
No Row-Level Security in Backend
Backend uses the Supabase service role key, bypassing RLS. All access control is implemented in application code. Why not RLS?
- Full control over authorization logic
- Complex rules easier in C# than PostgreSQL policies
- Performance (no RLS evaluation overhead)
- Flexibility for future authorization models
Instead, the application enforces:
- JWT signature validation
- Ownership checks before mutations
- Visibility enforcement for threads
- Input validation on all endpoints
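With RLS bypassed, the ownership check before any mutation is just a comparison between the resource's stored user_id and the validated JWT's sub claim. A minimal sketch, with an assumed helper name (and in Python rather than the backend's C#):

```python
def assert_owner(resource_user_id: str, jwt_sub: str) -> None:
    """Reject mutations on resources the caller does not own.

    jwt_sub comes from an already-validated token, which is the
    trust boundary described above; this failure would typically
    surface to the client as HTTP 403.
    """
    if resource_user_id != jwt_sub:
        raise PermissionError("caller does not own this resource")
```

Because every mutation path runs a check like this in application code, the service role key's RLS bypass never reaches the client.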
Scalability Considerations
Stateless Design
All requests are stateless. No session storage, no in-memory state, no sticky sessions required. Benefits:
- Horizontal scaling (add more API instances)
- Load balancer friendly
- No session replication complexity
- Graceful instance restarts
Database as State Store
PostgreSQL is the single source of truth. No distributed caching, no separate session stores. Why this works:
- Supabase provides connection pooling
- Read queries are fast with proper indexing
- Writes are infrequent relative to reads
- UUIDs enable partition-friendly design
Parallel Execution
Dual-chat executes both models in parallel using Task.WhenAll(). This roughly halves response time compared to sequential execution when the two models have similar latency.
Example: If each model takes 2 seconds, total time is ~2 seconds (parallel) vs 4 seconds (sequential).
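The fan-out in the example can be sketched with asyncio.gather, the Python analog of C#'s Task.WhenAll (the real backend uses the latter). The model names and delays here are stand-ins:

```python
import asyncio
import time

async def call_model(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for an AI provider call
    return f"{name}: response"

async def dual_chat() -> list[str]:
    # both calls start immediately; total time ~ max of the two delays,
    # not their sum
    return await asyncio.gather(call_model("model-a", 0.1),
                                call_model("model-b", 0.1))

start = time.perf_counter()
responses = asyncio.run(dual_chat())
elapsed = time.perf_counter() - start
print(responses)  # ['model-a: response', 'model-b: response']
```

With two 0.1-second calls, `elapsed` lands near 0.1 seconds rather than 0.2, mirroring the 2-second example above.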
Next Steps
- Chat Modes: Learn single vs. dual chat differences
- Model Selection: Understand selection strategies
- Streaming Protocol: How SSE streaming works
- Thread Management: Visibility and persistence