
Why two modes?

DualMind offers two distinct chat modes serving different purposes:
| Mode | Models | Use Case | Voting |
|------|--------|----------|--------|
| Single Chat | 1 model | Quick AI responses | No |
| Dual Chat | 2 models | Side-by-side model comparison | Yes |

Single Chat

Purpose

Single chat provides a straightforward AI inference endpoint: send a prompt, receive a response.

Use Cases:
  • Building chatbots with consistent model choice
  • Testing specific model behavior
  • Production applications requiring predictable responses
  • Conversations where model identity matters

How It Works

1. **Model Selection**: the client specifies a model name or requests random selection.
2. **Provider Routing**: the backend routes to Groq (primary) or Bytez (fallback).
3. **Inference**: the AI model processes the prompt and generates a response.
4. **Response**: the client receives the message, model info, usage stats, and timing.
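The request side of this flow can be sketched in TypeScript. The field names (`prompt`, `model`, `threadId`) are assumptions for illustration, not the actual DualMind API contract; omitting `model` stands in for requesting random selection.

```typescript
// Hypothetical single-chat request shape; field names are illustrative.
interface ChatRequest {
  prompt: string;
  model?: string;    // explicit model name; omitted => random selection
  threadId?: string; // optional: enables message logging for the thread
}

// Build a request body, leaving `model` out when random selection is wanted.
function buildChatRequest(prompt: string, model?: string): ChatRequest {
  return model ? { prompt, model } : { prompt };
}
```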

Characteristics

| Aspect | Behavior |
|--------|----------|
| Response time | ~1-3 seconds typical |
| Model choice | Explicit or random |
| Database writes | Message logged if threadId provided |
| Streaming support | Yes (via SSE endpoint) |
| Parallel execution | No (single model) |

Single chat is the foundation. Dual chat builds on this by running two parallel single-chat executions.

Dual Chat (Arena Mode)

Purpose

Dual chat compares two models with the same prompt under identical conditions. Users vote on which response is better, creating comparative quality data.

Use Cases:
  • Benchmarking model quality
  • Collecting user preferences
  • Building model leaderboards
  • Running blind comparisons (users don’t see model names until after voting)

How It Works

1. **Model Pairing**: the backend selects two models based on the selection mode (random, topper, or manual).
2. **Parallel Execution**: both models receive the identical prompt simultaneously via Task.WhenAll().
3. **Independent Fallback**: each model has its own 45s timeout and independent fallback chain.
4. **Arena Verdict**: the backend calculates a winner by response length and token count.
5. **Comparison Logging**: a comparison record is created with both responses and timing metrics.

Selection Modes

Mode: random

Selects two different models randomly from the active model pool.

Why use this?
  • Unbiased comparisons
  • Discovering unexpected model differences
  • Equal exposure for all models
  • Building diverse comparison dataset

Parallel Execution

Both models execute simultaneously, not sequentially.

Performance benefit:
Sequential: Model1 (2s) + Model2 (2s) = 4 seconds total
Parallel:   max(Model1 (2s), Model2 (2s)) = ~2 seconds total
If one model fails, the other completes independently. Partial failures result in single-model response rather than complete failure.
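A minimal TypeScript sketch of this pattern (the backend uses C#'s Task.WhenAll; `Promise.allSettled` is the closest analogue here). `callModel` is a hypothetical provider call, and the 45-second default mirrors the per-model timeout described above.

```typescript
type ModelResult = { model: string; text: string };

// Race a provider call against a timeout so a hung model can't block the arena.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)),
  ]);
}

async function runArena(
  callModel: (model: string, prompt: string) => Promise<string>,
  models: [string, string],
  prompt: string,
  timeoutMs = 45_000,
): Promise<(ModelResult | null)[]> {
  // Both calls start immediately; each has its own timeout, so one failure
  // or timeout never blocks the other (the partial-failure behavior above).
  const settled = await Promise.allSettled(
    models.map((m) => withTimeout(callModel(m, prompt), timeoutMs)),
  );
  return settled.map((s, i) =>
    s.status === "fulfilled" ? { model: models[i], text: s.value } : null,
  );
}
```

A failed model yields `null` while the healthy one still returns, matching the single-model-response fallback described above.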

Arena Verdict

The system automatically computes a comparison verdict:
| Metric | Calculation |
|--------|-------------|
| Winner by length | Model with longer response text |
| Winner by tokens | Model with higher completion token count |
| Verdict text | Human-readable summary |

Example Verdict:
“Agent 2 (Mixtral) provided a slightly longer response with more detailed explanations”
Automatic verdict is informational only. User votes determine actual winner for statistics.
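The length and token comparison can be sketched as a pure function; the response shape and field names here are assumptions for illustration:

```typescript
// Hypothetical per-agent response shape for verdict computation.
interface AgentResponse {
  name: string;             // e.g. "Agent 1"
  text: string;             // full response text
  completionTokens: number; // completion token count from usage stats
}

// Compute the informational verdict: longer text and higher token count win.
function arenaVerdict(a: AgentResponse, b: AgentResponse) {
  const winnerByLength = a.text.length >= b.text.length ? a.name : b.name;
  const winnerByTokens = a.completionTokens >= b.completionTokens ? a.name : b.name;
  return {
    winnerByLength,
    winnerByTokens,
    verdict: `${winnerByLength} provided the longer response`,
  };
}
```

As noted above, this verdict is informational only; user votes determine the statistical winner.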

Key Differences

| Feature | Single Chat | Dual Chat |
|---------|-------------|-----------|
| Models executed | 1 | 2 |
| Execution | Sequential (with fallback) | Parallel |
| Comparison ID | None | Generated UUID |
| Voting support | No | Yes |
| Response structure | Single message | Two agent responses |
| Performance | Faster (1 model) | Slower (2 models) but parallelized |
| Database writes | 1 message | 2 messages + 1 comparison |

When to Use Each Mode

Choose Single Chat When:

  • Building a chatbot with consistent model behavior
  • Model identity is known and important
  • Minimizing latency (faster than dual-chat)
  • User doesn’t need to compare models
  • Streaming responses (SSE) are a priority

Choose Dual Chat When:

  • Quality comparison is the goal
  • Collecting user votes on model preference
  • Building leaderboards or benchmarks
  • Running blind tests (hide model names initially)
  • Research requires comparative data

Streaming Considerations

Single Chat: Full streaming support via the SSE endpoint.
Dual Chat: No streaming support currently.

Why no dual-chat streaming?
  • Complexity: Two parallel SSE streams harder to manage client-side
  • Use case: Arena comparisons typically need full responses for fair comparison
  • Future: Could support if use case emerges
For long prompts in dual-chat, consider using single-chat SSE endpoint twice sequentially if streaming is needed.
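Client-side, a single-chat SSE stream can be handled by splitting the standard `data:`-framed events. The framing below follows the SSE specification; the payload contents of the DualMind endpoint are not specified here, so they are treated as opaque strings.

```typescript
// Extract `data:` payloads from a raw SSE chunk (standard SSE framing).
// Payload contents depend on the endpoint; here they are treated as opaque.
function parseSseData(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice("data: ".length));
}
```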

Database Persistence

Single Chat Writes

If threadId provided:
  • 1 row in thread_messages table
  • Model response, prompt, timing stored
  • Links to thread for conversation history

Dual Chat Writes

Always writes:
  • 1 row in comparisons table (comparison ID, both models, responses, timing)
  • 1 row in thread_messages (if threadId provided)
  • Links message to comparison via comparison_id foreign key
Future votes reference the comparison ID.

Model Selection Transparency

Single Chat

The model name is returned in the response, so the user always knows which model generated it.

Dual Chat

Models are identified as “agent1” and “agent2” in the response. Model names are included, enabling:
  • Revealed Arena: Show model names immediately
  • Blind Arena: Hide names until after vote (client-side logic)
For unbiased voting, hide model names until the user submits a vote. This prevents brand bias from affecting quality assessment.
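The blind-arena behavior is purely client-side logic; a minimal sketch, with assumed field names:

```typescript
// Hypothetical arena view state: real model names plus a vote flag.
interface ArenaView {
  agent1Model: string;
  agent2Model: string;
  hasVoted: boolean;
}

// Before the vote, show neutral labels; reveal real names only afterwards.
function displayNames(view: ArenaView): [string, string] {
  return view.hasVoted
    ? [view.agent1Model, view.agent2Model]
    : ["Agent 1", "Agent 2"];
}
```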

Next Steps

Model Selection

How models are chosen for inference

Voting System

How user votes affect statistics

Streaming Protocol

SSE implementation for single chat

Thread Management

Persisting conversations