
Selection Mechanism

DualMind maintains a registry of active AI models in the database. Selection strategies determine which model(s) execute for each request.

Active Model Registry

The ai_models table contains:
  • Model name (unique identifier)
  • Display name (human-readable label)
  • Provider (Groq, Bytez)
  • Status (active or inactive)
Only active models are eligible for random selection.
Inactive models can still be explicitly requested by name but won’t appear in the random selection pool.
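The eligibility rules above can be sketched as follows. The `AiModel` shape and field names are assumptions mirroring the table description, not the actual schema:

```typescript
// Hypothetical row shape mirroring the ai_models table described above.
interface AiModel {
  name: string;        // unique identifier
  displayName: string; // human-readable label
  provider: "groq" | "bytez";
  status: "active" | "inactive";
}

// Only active models enter the random selection pool.
function randomSelectionPool(models: AiModel[]): AiModel[] {
  return models.filter((m) => m.status === "active");
}

// An inactive model can still be resolved when requested explicitly by name.
function resolveByName(models: AiModel[], name: string): AiModel | undefined {
  return models.find((m) => m.name === name);
}
```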

Single Chat Selection

Explicit Selection

Client specifies exact model name:
{
  "model": "llama-3.3-70b-versatile"
}
Backend validates that the model exists and is active, then routes to the appropriate provider.

Random Selection

Client requests automatic model choice:
{
  "model": null
}
or
{
  "model": "auto"
}
Backend selects a random model from the active pool using ModelSelector.GetRandomModelAsync(). Why random selection exists:
  • Simplifies client integration (no model knowledge required)
  • Distributes usage across model pool
  • Enables experimentation without hardcoding models
  • Useful for testing robustness across different models
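A minimal sketch of this selection, assuming uniform probability over the active pool (the actual GetRandomModelAsync() implementation is not shown in this document):

```typescript
// Uniform random pick from the active pool; throws when the pool is empty,
// since there is no model to fall back to at selection time.
function getRandomModel(activePool: string[]): string {
  if (activePool.length === 0) {
    throw new Error("no active models available");
  }
  return activePool[Math.floor(Math.random() * activePool.length)];
}
```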

Dual Chat Selection

Dual chat requires selecting two different models. Three strategies are available:

Random Mode

{
  "selectionMode": "random"
}
Selects two different models at random from active pool. Algorithm:
  1. Get all active models
  2. Randomly select first model
  3. Randomly select second model (excluding first)
  4. Return model pair
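The four steps above can be sketched as follows (a minimal illustration, not the production code):

```typescript
// Pick one model uniformly, then pick a second from the pool with the
// first excluded, so every ordered pair has equal probability.
function getRandomPair(activePool: string[]): [string, string] {
  if (activePool.length < 2) {
    throw new Error("dual chat needs at least two active models");
  }
  const first = activePool[Math.floor(Math.random() * activePool.length)];
  const rest = activePool.filter((m) => m !== first);
  const second = rest[Math.floor(Math.random() * rest.length)];
  return [first, second];
}
```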
Why this exists:
  • Unbiased model comparisons
  • Every model pair has equal probability
  • Discovers unexpected quality differences
  • Builds diverse comparison dataset
Use cases:
  • General-purpose arena
  • Exploratory quality assessment
  • Collecting broad preference data

Topper Mode

{
  "selectionMode": "topper"
}
Pairs the top-performing model (highest win rate) against a random model. Algorithm:
  1. Query model_stats view for model with highest win rate
  2. Randomly select second model (excluding topper)
  3. Return model pair with topper as agent1
Why this exists:
  • Benchmarks new models against current leader
  • Provides consistent baseline for comparisons
  • Helps stabilize model rankings faster
  • Identifies models capable of beating champion
Use cases:
  • Leaderboard-focused arenas
  • Quality assurance (new models must beat leader)
  • Tournament-style comparisons
If no votes exist yet, “topper” mode falls back to random selection, since no model has an established win rate.
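The topper algorithm and its no-votes fallback can be sketched as below. The `ModelStat` shape is an assumption standing in for the model_stats view:

```typescript
// Hypothetical aggregate shape standing in for the model_stats view.
interface ModelStat {
  name: string;
  appearances: number;
  wins: number;
}

// Returns [topper, challenger]. Falls back to a fully random pair when
// no model has any recorded appearances (i.e. no votes exist yet).
function getTopperPair(activePool: string[], stats: ModelStat[]): [string, string] {
  const pick = (pool: string[]) => pool[Math.floor(Math.random() * pool.length)];
  const voted = stats.filter((s) => s.appearances > 0 && activePool.includes(s.name));
  if (voted.length === 0) {
    const first = pick(activePool);
    return [first, pick(activePool.filter((m) => m !== first))];
  }
  // Highest win rate first; the topper always occupies the agent1 slot.
  voted.sort((a, b) => b.wins / b.appearances - a.wins / a.appearances);
  const topper = voted[0].name;
  return [topper, pick(activePool.filter((m) => m !== topper))];
}
```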

Manual Mode

{
  "selectionMode": "manual",
  "model1": "llama-3.3-70b-versatile",
  "model2": "mixtral-8x7b-32768"
}
Client explicitly specifies both models. Validation:
  • Both models must be specified
  • Both models must exist and be active
  • Models must be different
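The three validation rules can be sketched as one check (error messages are illustrative, not the API's actual responses):

```typescript
// Validates a manual-mode request: both models specified, distinct,
// and present in the active registry. Throws on the first violation.
function validateManualSelection(
  model1: string | undefined,
  model2: string | undefined,
  activeModels: Set<string>,
): void {
  if (!model1 || !model2) throw new Error("both models must be specified");
  if (model1 === model2) throw new Error("models must be different");
  for (const m of [model1, model2]) {
    if (!activeModels.has(m)) throw new Error(`model not found or inactive: ${m}`);
  }
}
```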
Why this exists:
  • Controlled A/B testing (always same pair)
  • Reproducing specific comparisons
  • Custom tournament brackets
  • Comparing specific model families
Use cases:
  • Research requiring consistent model pairs
  • Head-to-head matchups
  • Regression testing (did quality change?)

Selection Strategy Comparison

Strategy | Model 1   | Model 2       | Consistency | Bias
Random   | Random    | Random (≠ M1) | Low         | None
Topper   | Top model | Random        | Medium      | Toward leader
Manual   | Specified | Specified    | High        | User-defined

Model Metadata

Each selected model includes:
{
  "name": "llama-3.3-70b-versatile",
  "displayName": "Llama 3.3 70B",
  "provider": "groq"
}
Why three fields?
  • name: API identifier for requests
  • displayName: User-friendly label for UI
  • provider: Backend routing information

Provider Routing

After model selection, backend routes to provider:
Model Name Pattern | Provider
llama-*, mixtral-* | Groq
Others             | Bytez
Provider routing is transparent to the client: the selection strategy chooses models, not providers.
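A sketch of the pattern-based routing in the table above, assuming the patterns are simple name prefixes:

```typescript
// Routes a selected model name to its provider. The llama-* and
// mixtral-* prefixes go to Groq; everything else goes to Bytez.
function routeProvider(modelName: string): "groq" | "bytez" {
  if (modelName.startsWith("llama-") || modelName.startsWith("mixtral-")) {
    return "groq";
  }
  return "bytez";
}
```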

Fallback Behavior

Single Chat Fallback

If selected model fails:
  1. Retry with llama-3.3-70b-versatile via Groq
  2. If retry fails, route to Bytez provider
  3. If Bytez fails, return error
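The three-step chain above can be sketched like this. The `complete(model, provider, prompt)` call is hypothetical; a real implementation would wrap the Groq and Bytez SDKs:

```typescript
type Complete = (model: string, provider: string, prompt: string) => Promise<string>;

// Mirrors the fallback chain above: original attempt, then the default
// model via Groq, then the Bytez provider; a final failure propagates.
async function completeWithFallback(
  model: string,
  provider: string,
  prompt: string,
  complete: Complete,
): Promise<string> {
  try {
    return await complete(model, provider, prompt);
  } catch {
    try {
      // Step 1: retry with the default model via Groq.
      return await complete("llama-3.3-70b-versatile", "groq", prompt);
    } catch {
      // Step 2: route to the Bytez provider.
      // Step 3: if Bytez also fails, the error reaches the caller.
      return await complete(model, "bytez", prompt);
    }
  }
}
```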

Dual Chat Fallback

Each model has independent fallback chain. Possible outcomes:
  • ✅ Both models succeed
  • ⚠️ One model succeeds, one fails (partial result)
  • ❌ Both models fail (error response)
Partial result handling: if one model fails, the client receives a single-model response rather than a comparison.
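Because each model runs its own fallback chain independently, the three outcomes above map naturally onto settled promises. A minimal sketch, with `run` standing in for the per-model completion pipeline:

```typescript
interface DualResult {
  model: string;
  output?: string; // present when this model succeeded
  error?: string;  // present when this model's whole chain failed
}

// Runs both models concurrently and reports each outcome independently,
// so one failure still yields a partial (single-model) result.
async function dualComplete(
  model1: string,
  model2: string,
  run: (model: string) => Promise<string>,
): Promise<DualResult[]> {
  const settled = await Promise.allSettled([run(model1), run(model2)]);
  return settled.map((s, i) => {
    const model = i === 0 ? model1 : model2;
    return s.status === "fulfilled"
      ? { model, output: s.value }
      : { model, error: String(s.reason) };
  });
}
```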

Model Statistics

Selection strategies use aggregated statistics from the model_votes table:
SELECT
  mv.model_id,
  COUNT(*) AS total_wins,
  (SELECT COUNT(*) FROM comparisons c
     WHERE c.model1_id = mv.model_id OR c.model2_id = mv.model_id) AS appearances,
  COUNT(*) * 100.0 / (SELECT COUNT(*) FROM comparisons c
     WHERE c.model1_id = mv.model_id OR c.model2_id = mv.model_id) AS win_rate
FROM model_votes mv
GROUP BY mv.model_id
Win Rate Calculation: (wins / total_appearances) * 100
“Topper” mode queries this data to find the highest win rate. Statistics update in real time as votes are submitted.
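The win-rate formula can be expressed directly; this sketch also makes the no-appearances case explicit, which is the condition under which “topper” mode falls back to random:

```typescript
// Win rate as a percentage: (wins / total_appearances) * 100.
// Returns null when a model has no appearances, i.e. no established rate.
function winRate(wins: number, appearances: number): number | null {
  if (appearances === 0) return null;
  return (wins / appearances) * 100;
}
```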

Adding New Models

Models are added to registry via admin endpoints or direct database insertion. Required fields:
  • model_name: Unique identifier
  • display_name: Human-readable label
  • provider_name: "groq" or "bytez"
  • status: "active" or "inactive"
Once added with status='active', the model immediately enters the random selection pool.

Model Lifecycle

1. Registration: model added to the ai_models table with status='inactive'.
2. Activation: status changed to 'active'; the model enters the selection pool.
3. Usage: model selected for comparisons, accumulates vote data.
4. Evaluation: win rate calculated from vote statistics.
5. Potential deactivation: if the model performs poorly or is deprecated, status is set to 'inactive'.

Selection Fairness

Random Selection Fairness

All active models have equal probability in pure random mode. No weighting based on:
  • Past performance
  • Provider
  • Inference cost
  • Model size
Why no weighting?
  • Simpler implementation
  • Unbiased data collection
  • Fair exposure for new models

Topper Bias

Topper mode intentionally biases toward top model:
  • Top model appears in 100% of comparisons, occupying one of the two slots
  • Other models share the remaining 50% of slots
Effect: Top model accumulates data faster, enabling quicker identification if quality degrades.

Next Steps

Chat Modes

Single vs dual chat explained

Voting System

How votes affect statistics

System Overview

Architecture and design decisions