
What DualMind Does

DualMind enables developers to compare AI language models by running the same prompt through multiple models and collecting user votes on quality. The platform focuses on simplicity and transparency in model evaluation.

Design Philosophy

HTTP-Only Architecture

DualMind uses HTTP for all communication. No WebSocket, no SignalR, no complex bidirectional protocols. Why this decision?
  • Simplicity: HTTP is universally supported and well-understood
  • Debugging: Standard browser tools work perfectly
  • Reliability: HTTP has proven failure modes and retry patterns
  • Scaling: Stateless requests scale horizontally without connection management
For real-time streaming, the system uses Server-Sent Events (SSE), a simple, unidirectional protocol built on HTTP.

Provider Abstraction

AI providers (Groq, Bytez) are abstracted behind a unified interface. Clients never know which provider serves a request. Why abstraction matters:
  • Reliability: Automatic failover when primary provider fails
  • Flexibility: Add new providers without changing client code
  • Cost Optimization: Route traffic based on availability and pricing
  • Independence: Not locked into single provider’s API
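
As a sketch, the abstraction might look like the following (TypeScript is used for illustration; DualMind's backend is .NET, and these type names are assumptions rather than the real API):

```typescript
// Illustrative sketch of a unified provider interface. The key property:
// the result type exposed to clients carries no provider identity.
interface ChatProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Clients only ever see this shape, so they cannot depend on
// Groq- or Bytez-specific behavior.
interface ChatResult {
  text: string;
}

async function handleChat(provider: ChatProvider, prompt: string): Promise<ChatResult> {
  const text = await provider.complete(prompt);
  return { text }; // provider.name deliberately not included
}
```

Because every provider satisfies the same interface, adding a new one is a matter of writing an adapter; no client code changes.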

System Components

The backend handles all business logic:
  • JWT validation and user sync
  • Model selection strategies
  • Provider routing and fallback
  • Thread and vote persistence
  • SSE stream management
Technology Choice: .NET was chosen for performance, strong typing, and excellent async support needed for parallel model execution.
PostgreSQL stores all persistent data:
  • User records (synced from Supabase Auth)
  • AI model registry
  • Threads and messages
  • Comparisons and votes
Technology Choice: PostgreSQL provides ACID compliance, foreign key enforcement, and complex query support. Supabase adds real-time capabilities and automatic API generation.
User authentication is delegated entirely to Supabase:
  • Password, OAuth, magic link support
  • JWT token issuance
  • Token refresh handling
  • User metadata management
Technology Choice: Offloading auth to Supabase eliminates security complexity and provides battle-tested authentication flows.
Groq (Primary):
  • Fast inference (LPU acceleration)
  • Low latency for chat applications
  • Llama and Mixtral models
Bytez (Fallback):
  • Reliability when Groq unavailable
  • Alternative model access
  • Redundancy for production uptime
Technology Choice: Multiple providers ensure high availability. Groq prioritized for speed, Bytez for reliability.

Data Flow

Request Lifecycle

  1. Client Authentication: User authenticates via Supabase Auth and receives a JWT containing the user ID and email.
  2. API Request: Client sends an HTTP request to the DualMind API with the JWT in the Authorization header.
  3. JWT Validation: Backend validates the token signature, issuer, audience, and expiration, then extracts the user ID from the sub claim.
  4. User Sync: System performs an UPSERT on the users table to ensure the user exists in the database (an idempotent operation).
  5. Business Logic: Request is processed: model selection, provider routing, arena comparison, etc.
  6. Database Persistence: Results are written to the appropriate tables (messages, comparisons, votes).
  7. Response: JSON response is returned to the client with the AI output, usage stats, and timing.
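
The claim-extraction step can be sketched as follows (a TypeScript illustration; the real backend validates the signature, issuer, audience, and expiration before trusting any claim, which this toy decoder deliberately skips):

```typescript
// Read the sub claim out of a JWT payload. A JWT is three base64url
// segments: header.payload.signature. NOTE: no signature verification
// here; never trust claims in production without validating the token.
function extractSub(jwt: string): string {
  const payloadB64 = jwt.split(".")[1];
  // JWTs use base64url; convert to standard base64 before decoding.
  const b64 = payloadB64.replace(/-/g, "+").replace(/_/g, "/");
  const payload = JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
  return payload.sub;
}
```
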

Key Design Decisions

Why Dual-Chat Arena?

Comparing models side-by-side reveals quality differences that single-model testing misses. Users vote on which response is better, creating a crowd-sourced quality metric. Benefits:
  • Objective comparison (same prompt, same conditions)
  • User preferences guide model selection
  • Competitive benchmarking across model families
  • Real-world quality assessment

Why Thread Persistence?

Conversations are more valuable when they’re persistent. Threads enable:
  • Continuing conversations across sessions
  • Sharing comparisons with others
  • Building conversation history
  • Organizing different topics
Visibility Modes:
  • Private: Only owner can access (default)
  • Public: Anyone can view (shareable in directory)
  • Unlisted: Accessible via link (shareable without listing)
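
The three visibility modes can be sketched as a pair of checks (TypeScript for illustration; the actual enforcement lives in the .NET backend, and these names are assumptions):

```typescript
type Visibility = "private" | "public" | "unlisted";

interface Thread {
  ownerId: string;
  visibility: Visibility;
}

// Read access: private threads are owner-only; public and unlisted
// threads are reachable by anyone who has the link.
function canView(thread: Thread, viewerId: string | null): boolean {
  if (thread.visibility === "private") {
    return thread.ownerId === viewerId;
  }
  return true;
}

// Directory listing is separate from access: only public threads appear.
function isListedInDirectory(thread: Thread): boolean {
  return thread.visibility === "public";
}
```

The unlisted mode illustrates the distinction: access is granted, but discovery is not.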

Why SSE Instead of WebSocket?

Server-Sent Events are simpler than WebSocket for unidirectional streaming:
| Feature      | SSE                 | WebSocket         |
| ------------ | ------------------- | ----------------- |
| Direction    | Server → Client     | Bidirectional     |
| Protocol     | HTTP                | Custom upgrade    |
| Reconnection | Automatic           | Manual            |
| Debugging    | Standard HTTP tools | Specialized tools |
| Complexity   | Low                 | Higher            |
DualMind doesn’t need client→server streaming during response generation, making SSE the perfect fit.
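
Part of SSE's simplicity is that the wire format is plain text over HTTP: events are separated by blank lines, and each event carries one or more "data:" lines. A minimal parser sketch (the payloads in the test are illustrative, not DualMind's actual frame format):

```typescript
// Extract the data payloads from a raw SSE stream. Events are separated
// by a blank line; each event may carry multiple "data:" lines, which
// are joined with a newline per the SSE specification.
function parseSseData(raw: string): string[] {
  const events: string[] = [];
  for (const block of raw.split("\n\n")) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart());
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return events;
}
```

In a browser, the built-in EventSource API does this parsing (and automatic reconnection) for you.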

Why Provider Fallback Chain?

AI providers have varying uptime and rate limits. The fallback chain ensures reliability:
  1. Groq attempt (45s timeout)
  2. Groq retry with fallback model (45s timeout)
  3. Bytez attempt (45s timeout)
  4. Return error to client
This 3-tier approach balances speed (Groq first) with reliability (Bytez backup).
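
The chain can be sketched as an ordered list of attempts with a per-attempt timeout (TypeScript for illustration; the labels and types are assumptions, and the real timeout is 45 seconds rather than the short value used in the test):

```typescript
interface Attempt {
  label: string;
  run: () => Promise<string>;
}

// Reject if a provider call exceeds the timeout, freeing the chain
// to move on to the next tier.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms),
    ),
  ]);
}

// Walk the tiers in order; the first success wins, otherwise surface an error.
async function runFallbackChain(attempts: Attempt[], timeoutMs: number): Promise<string> {
  for (const attempt of attempts) {
    try {
      return await withTimeout(attempt.run(), timeoutMs);
    } catch {
      // This tier failed or timed out; fall through to the next one.
    }
  }
  throw new Error("all providers failed");
}
```
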

Constraints and Tradeoffs

Understanding system constraints helps developers build reliable integrations.

45-Second Timeout

Each provider request times out after 45 seconds. Why this limit?
  • Prevents indefinite hanging
  • Enforces reasonable response times
  • Enables fallback to alternative provider
  • Matches typical AI inference duration
Tradeoff: Very long prompts or complex reasoning may time out. Solution: Use streaming for long responses.

No Background Jobs

DualMind has zero background workers. All processing happens synchronously in HTTP request handlers. Why no background jobs?
  • Simpler architecture (no queue management)
  • Immediate feedback (no “processing” states)
  • Easier debugging (single request trace)
  • Lower operational complexity
Tradeoff: Long operations block request thread. Solution: 45s timeout and parallel execution for dual-chat.

Idempotency Patterns

Not all operations are idempotent:
| Operation       | Idempotent? | Reason                                |
| --------------- | ----------- | ------------------------------------- |
| User sync       | ✅ Yes      | UPSERT operation                      |
| GET requests    | ✅ Yes      | Read-only                             |
| Chat request    | ❌ No       | Creates new AI response each time     |
| Thread creation | ❌ No       | Creates new UUID each time            |
| Vote submission | ❌ No       | Allows vote changes via multiple rows |
| DELETE          | ✅ Yes      | Deleting deleted resource succeeds    |
Non-idempotent operations are explicit design choices. Chat requests SHOULD generate fresh responses, not cached results.
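
The user-sync row can be demonstrated with an in-memory sketch (a Map stands in for the PostgreSQL users table; names are illustrative):

```typescript
// In-memory sketch of the idempotent user-sync UPSERT. The real system
// performs this against the users table keyed by the JWT's sub claim.
interface UserRecord {
  id: string;
  email: string;
}

const users = new Map<string, UserRecord>();

function syncUser(id: string, email: string): UserRecord {
  // Insert-or-update: running this twice with the same input leaves
  // exactly one row, which is what makes the operation idempotent.
  const record = { id, email };
  users.set(id, record);
  return record;
}
```
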

Security Model

Trust Boundary

The trust boundary is at JWT validation. Once a valid JWT is confirmed:
  • User ID extracted from sub claim is trusted
  • User owns resources with matching user_id
  • Private threads accessible only to owner

No Row-Level Security in Backend

Backend uses Supabase service role key, bypassing RLS. All access control implemented in application code. Why not RLS?
  • Full control over authorization logic
  • Complex rules easier in C# than PostgreSQL policies
  • Performance (no RLS evaluation overhead)
  • Flexibility for future authorization models
Security Measures:
  • JWT signature validation
  • Ownership checks before mutations
  • Visibility enforcement for threads
  • Input validation on all endpoints
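
A sketch of the ownership-check pattern (TypeScript for illustration; the actual checks are implemented in C# in the backend, and these names are hypothetical):

```typescript
// Because the service-role connection bypasses RLS, every mutation must
// verify ownership in application code before touching the row.
interface OwnedResource {
  userId: string;
}

function assertOwnership(resource: OwnedResource, requesterId: string): void {
  if (resource.userId !== requesterId) {
    throw new Error("403: requester does not own this resource");
  }
}

function deleteThread(
  thread: OwnedResource & { deleted: boolean },
  requesterId: string,
): void {
  assertOwnership(thread, requesterId); // check first, mutate second
  thread.deleted = true;
}
```
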

Scalability Considerations

Stateless Design

All requests are stateless. No session storage, no in-memory state, no sticky sessions required. Benefits:
  • Horizontal scaling (add more API instances)
  • Load balancer friendly
  • No session replication complexity
  • Graceful instance restarts

Database as State Store

PostgreSQL is the single source of truth. No distributed caching, no separate session stores. Why this works:
  • Supabase provides connection pooling
  • Read queries are fast with proper indexing
  • Writes are infrequent relative to reads
  • UUIDs enable partition-friendly design

Parallel Execution

Dual-chat executes both models in parallel using Task.WhenAll(). This halves response time compared to sequential execution. Example: If each model takes 2 seconds, total time is ~2 seconds (parallel) vs 4 seconds (sequential).
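
The same pattern in TypeScript, where Promise.all plays the role of Task.WhenAll (model names and delays are fabricated for the demo):

```typescript
// Promise.all starts both calls before awaiting either, so total latency
// tracks the slower model rather than the sum of both.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fakeModel(name: string, ms: number): Promise<string> {
  await delay(ms); // stand-in for an AI provider call
  return `${name} done`;
}

async function dualChat(): Promise<[string, string]> {
  const [a, b] = await Promise.all([
    fakeModel("model-a", 100),
    fakeModel("model-b", 100),
  ]);
  return [a, b];
}
```
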

Next Steps

  • Chat Modes: Learn single vs dual chat differences
  • Model Selection: Understand selection strategies
  • Streaming Protocol: How SSE streaming works
  • Thread Management: Visibility and persistence