
Selection Mechanism

DualMind maintains a registry of active AI models in the database. Selection strategies determine which model(s) execute for each request.

Active Model Registry

The ai_models table contains:
  • Model name (unique identifier)
  • Display name (human-readable label)
  • Provider (Groq, Bytez)
  • Status (active or inactive)
Only active models are eligible for random selection.
Inactive models can still be explicitly requested by name but won’t appear in the random selection pool.
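The eligibility rules above can be sketched as follows. The `AiModel` shape and field names are assumptions mirroring the table description, not the actual schema:

```typescript
// Hypothetical row shape mirroring the ai_models table described above.
interface AiModel {
  name: string;        // unique identifier
  displayName: string; // human-readable label
  provider: "groq" | "bytez";
  status: "active" | "inactive";
}

// Only active models enter the random selection pool.
function randomSelectionPool(models: AiModel[]): AiModel[] {
  return models.filter((m) => m.status === "active");
}

// An inactive model can still be resolved when requested explicitly by name.
function resolveByName(models: AiModel[], name: string): AiModel | undefined {
  return models.find((m) => m.name === name);
}
```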

Single Chat Selection

Explicit Selection

Client specifies exact model name:
{
  "model": "llama-3.3-70b-versatile"
}
Backend validates that the model exists and is active, then routes to the appropriate provider.

Random Selection

Client requests automatic model choice:
{
  "model": null
}
or
{
  "model": "auto"
}
Backend selects a random model from the active pool using ModelSelector.GetRandomModelAsync(). Why random selection exists:
  • Simplifies client integration (no model knowledge required)
  • Distributes usage across model pool
  • Enables experimentation without hardcoding models
  • Useful for testing robustness across different models
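A minimal sketch of this selection, assuming uniform probability over the active pool (the actual GetRandomModelAsync() implementation is not shown in this document):

```typescript
// Uniform random pick from the active pool; throws when the pool is empty,
// since there is no model to fall back to at selection time.
function getRandomModel(activePool: string[]): string {
  if (activePool.length === 0) {
    throw new Error("no active models available");
  }
  return activePool[Math.floor(Math.random() * activePool.length)];
}
```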

Dual Chat Selection

Dual chat requires selecting two different models. Three strategies are available:

Random Mode

{
  "selectionMode": "random"
}
Selects two different models at random from active pool. Algorithm:
  1. Get all active models
  2. Randomly select first model
  3. Randomly select second model (excluding first)
  4. Return model pair
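The four steps above can be sketched as follows (a minimal illustration, not the production code):

```typescript
// Pick one model uniformly, then pick a second from the pool with the
// first excluded, so every ordered pair has equal probability.
function getRandomPair(activePool: string[]): [string, string] {
  if (activePool.length < 2) {
    throw new Error("dual chat needs at least two active models");
  }
  const first = activePool[Math.floor(Math.random() * activePool.length)];
  const rest = activePool.filter((m) => m !== first);
  const second = rest[Math.floor(Math.random() * rest.length)];
  return [first, second];
}
```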
Why this exists:
  • Unbiased model comparisons
  • Every model pair has equal probability
  • Discovers unexpected quality differences
  • Builds diverse comparison dataset
Use cases:
  • General-purpose arena
  • Exploratory quality assessment
  • Collecting broad preference data

Topper Mode

{
  "selectionMode": "topper"
}
Pairs the top-performing model (highest win rate) against a random model. Algorithm:
  1. Query model_stats view for model with highest win rate
  2. Randomly select second model (excluding topper)
  3. Return model pair with topper as agent1
Why this exists:
  • Benchmarks new models against current leader
  • Provides consistent baseline for comparisons
  • Helps stabilize model rankings faster
  • Identifies models capable of beating champion
Use cases:
  • Leaderboard-focused arenas
  • Quality assurance (new models must beat leader)
  • Tournament-style comparisons
If no votes exist yet, “topper” mode falls back to random selection, since no model has an established win rate.
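The topper algorithm and its no-votes fallback can be sketched as below. The `ModelStat` shape is an assumption standing in for the model_stats view:

```typescript
// Hypothetical aggregate shape standing in for the model_stats view.
interface ModelStat {
  name: string;
  appearances: number;
  wins: number;
}

// Returns [topper, challenger]. Falls back to a fully random pair when
// no model has any recorded appearances (i.e. no votes exist yet).
function getTopperPair(activePool: string[], stats: ModelStat[]): [string, string] {
  const pick = (pool: string[]) => pool[Math.floor(Math.random() * pool.length)];
  const voted = stats.filter((s) => s.appearances > 0 && activePool.includes(s.name));
  if (voted.length === 0) {
    const first = pick(activePool);
    return [first, pick(activePool.filter((m) => m !== first))];
  }
  // Highest win rate first; the topper always occupies the agent1 slot.
  voted.sort((a, b) => b.wins / b.appearances - a.wins / a.appearances);
  const topper = voted[0].name;
  return [topper, pick(activePool.filter((m) => m !== topper))];
}
```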

Manual Mode

{
  "selectionMode": "manual",
  "model1": "llama-3.3-70b-versatile",
  "model2": "mixtral-8x7b-32768"
}
Client explicitly specifies both models. Validation:
  • Both models must be specified
  • Both models must exist and be active
  • Models must be different
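The three validation rules can be sketched as one check (error messages are illustrative, not the API's actual responses):

```typescript
// Validates a manual-mode request: both models specified, distinct,
// and present in the active registry. Throws on the first violation.
function validateManualSelection(
  model1: string | undefined,
  model2: string | undefined,
  activeModels: Set<string>,
): void {
  if (!model1 || !model2) throw new Error("both models must be specified");
  if (model1 === model2) throw new Error("models must be different");
  for (const m of [model1, model2]) {
    if (!activeModels.has(m)) throw new Error(`model not found or inactive: ${m}`);
  }
}
```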
Why this exists:
  • Controlled A/B testing (always same pair)
  • Reproducing specific comparisons
  • Custom tournament brackets
  • Comparing specific model families
Use cases:
  • Research requiring consistent model pairs
  • Head-to-head matchups
  • Regression testing (did quality change?)

Selection Strategy Comparison

Strategy | Model 1   | Model 2       | Consistency | Bias
Random   | Random    | Random (≠ M1) | Low         | None
Topper   | Top model | Random        | Medium      | Toward leader
Manual   | Specified | Specified    | High        | User-defined

Model Metadata

Each selected model includes:
{
  "name": "llama-3.3-70b-versatile",
  "displayName": "Llama 3.3 70B",
  "provider": "groq"
}
Why three fields?
  • name: API identifier for requests
  • displayName: User-friendly label for UI
  • provider: Backend routing information

Provider Routing

After model selection, backend routes to provider:
Model Name Pattern | Provider
llama-*, mixtral-* | Groq
Others             | Bytez
Provider routing is transparent to the client: the selection strategy chooses models, not providers.
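A sketch of the pattern-based routing in the table above, assuming the patterns are simple name prefixes:

```typescript
// Routes a selected model name to its provider. The llama-* and
// mixtral-* prefixes go to Groq; everything else goes to Bytez.
function routeProvider(modelName: string): "groq" | "bytez" {
  if (modelName.startsWith("llama-") || modelName.startsWith("mixtral-")) {
    return "groq";
  }
  return "bytez";
}
```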

Fallback Behavior

Single Chat Fallback

If selected model fails:
  1. Retry with llama-3.3-70b-versatile via Groq
  2. If retry fails, route to Bytez provider
  3. If Bytez fails, return error
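The three-step chain above can be sketched like this. The `complete(model, provider, prompt)` call is hypothetical; a real implementation would wrap the Groq and Bytez SDKs:

```typescript
type Complete = (model: string, provider: string, prompt: string) => Promise<string>;

// Mirrors the fallback chain above: original attempt, then the default
// model via Groq, then the Bytez provider; a final failure propagates.
async function completeWithFallback(
  model: string,
  provider: string,
  prompt: string,
  complete: Complete,
): Promise<string> {
  try {
    return await complete(model, provider, prompt);
  } catch {
    try {
      // Step 1: retry with the default model via Groq.
      return await complete("llama-3.3-70b-versatile", "groq", prompt);
    } catch {
      // Step 2: route to the Bytez provider.
      // Step 3: if Bytez also fails, the error reaches the caller.
      return await complete(model, "bytez", prompt);
    }
  }
}
```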

Dual Chat Fallback

Each model has independent fallback chain. Possible outcomes:
  • ✅ Both models succeed
  • ⚠️ One model succeeds, one fails (partial result)
  • ❌ Both models fail (error response)
Partial result handling: if one model fails, the client receives a single-model response rather than a comparison.
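Because each model runs its own fallback chain independently, the three outcomes above map naturally onto settled promises. A minimal sketch, with `run` standing in for the per-model completion pipeline:

```typescript
interface DualResult {
  model: string;
  output?: string; // present when this model succeeded
  error?: string;  // present when this model's whole chain failed
}

// Runs both models concurrently and reports each outcome independently,
// so one failure still yields a partial (single-model) result.
async function dualComplete(
  model1: string,
  model2: string,
  run: (model: string) => Promise<string>,
): Promise<DualResult[]> {
  const settled = await Promise.allSettled([run(model1), run(model2)]);
  return settled.map((s, i) => {
    const model = i === 0 ? model1 : model2;
    return s.status === "fulfilled"
      ? { model, output: s.value }
      : { model, error: String(s.reason) };
  });
}
```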

Model Statistics

Selection strategies use aggregated statistics from the model_votes table:
SELECT
  mv.model_id,
  COUNT(*) AS total_wins,
  (SELECT COUNT(*) FROM comparisons c
     WHERE c.model1_id = mv.model_id OR c.model2_id = mv.model_id) AS appearances,
  COUNT(*) * 100.0 / (SELECT COUNT(*) FROM comparisons c
     WHERE c.model1_id = mv.model_id OR c.model2_id = mv.model_id) AS win_rate
FROM model_votes mv
GROUP BY mv.model_id
Win Rate Calculation: (wins / total_appearances) * 100
“Topper” mode queries this data to find the highest win rate. Statistics update in real time as votes are submitted.
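The win-rate formula can be expressed directly; this sketch also makes the no-appearances case explicit, which is the condition under which “topper” mode falls back to random:

```typescript
// Win rate as a percentage: (wins / total_appearances) * 100.
// Returns null when a model has no appearances, i.e. no established rate.
function winRate(wins: number, appearances: number): number | null {
  if (appearances === 0) return null;
  return (wins / appearances) * 100;
}
```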

Adding New Models

Models are added to registry via admin endpoints or direct database insertion. Required fields:
  • model_name: Unique identifier
  • display_name: Human-readable label
  • provider_name: "groq" or "bytez"
  • status: "active" or "inactive"
Once added with status='active', the model immediately enters the random selection pool.

Model Lifecycle

1. Registration: model added to the ai_models table with status='inactive'.
2. Activation: status changed to 'active'; the model enters the selection pool.
3. Usage: model selected for comparisons, accumulates vote data.
4. Evaluation: win rate calculated from vote statistics.
5. Potential deactivation: if the model performs poorly or is deprecated, status is set to 'inactive'.

Selection Fairness

Random Selection Fairness

All active models have equal probability in pure random mode. No weighting based on:
  • Past performance
  • Provider
  • Inference cost
  • Model size
Why no weighting?
  • Simpler implementation
  • Unbiased data collection
  • Fair exposure for new models

Topper Bias

Topper mode intentionally biases toward top model:
  • Top model appears in 100% of comparisons, occupying one of the two slots
  • Other models share the remaining 50% of slots
Effect: Top model accumulates data faster, enabling quicker identification if quality degrades.

Next Steps

Chat Modes

Single vs dual chat explained

Voting System

How votes affect statistics

System Overview

Architecture and design decisions