Why voting exists
Voting enables crowd-sourced quality assessment. Instead of assuming which model is better, let users decide based on actual responses.
Core Idea: Same prompt → Two models → User votes on winner → update
Vote Choices
Users vote on comparison outcomes with four options:
Left (Model 1 Wins)
The user prefers Model 1’s response over Model 2’s. Database effect: Insert one vote row with winner_model_id = model1_id
Right (Model 2 Wins)
The user prefers Model 2’s response over Model 1’s. Database effect: Insert one vote row with winner_model_id = model2_id
Tie (Both Good)
Both models provided quality responses of equal value. Database effect: Insert two vote rows:
- One with winner_model_id = model1_id
- One with winner_model_id = model2_id
Both Bad
Neither model provided an acceptable response. Database effect: Insert one vote row with winner_model_id = NULL
Effect on statistics: Comparison counts toward both models’ appearance totals but neither gets a win.
“Tie” awards both models a win. “Both-bad” awards neither a win. These are distinct outcomes with different statistical impacts.
Vote Choice Comparison
| Choice | Winner | Database Rows | Effect on Winner | Effect on Loser |
|---|---|---|---|---|
| Left | Model 1 | 1 | +1 win | +0 wins |
| Right | Model 2 | 1 | +1 win | +0 wins |
| Tie | Both | 2 | +1 win each | N/A |
| Both-bad | Neither | 1 | +0 wins each | +0 wins |
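The row-level effects in the table can be sketched as a small mapping function. This is a minimal illustration, not the actual implementation: the `Comparison` and `VoteRow` shapes, the `comparison_id` column, and the `rowsForVote` helper are assumptions made for the example; only `winner_model_id` and the four choice values come from the text above.

```typescript
// Hypothetical types for illustration; only winner_model_id and the
// four choice values are taken from the documentation.
type VoteChoice = "left" | "right" | "tie" | "both-bad";

interface Comparison {
  id: string;
  model1Id: string;
  model2Id: string;
}

interface VoteRow {
  comparison_id: string; // assumed column name
  winner_model_id: string | null; // null records a "both-bad" vote
}

// Translate one vote choice into the rows that would be inserted.
function rowsForVote(comparison: Comparison, voteChoice: VoteChoice): VoteRow[] {
  const row = (winner: string | null): VoteRow => ({
    comparison_id: comparison.id,
    winner_model_id: winner,
  });
  switch (voteChoice) {
    case "left":
      return [row(comparison.model1Id)];
    case "right":
      return [row(comparison.model2Id)];
    case "tie":
      // A tie awards both models a win, so it produces two rows.
      return [row(comparison.model1Id), row(comparison.model2Id)];
    case "both-bad":
      // Neither model wins: one row with a NULL winner.
      return [row(null)];
  }
}

const example: Comparison = { id: "c1", model1Id: "model-a", model2Id: "model-b" };
console.log(rowsForVote(example, "tie").length); // 2
```

Keeping the mapping in one pure function makes the difference between “tie” (two winner rows) and “both-bad” (one NULL row) easy to test in isolation.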
Vote Statistics
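Votes aggregate into model performance metrics:
Win Count
Total number of votes where the model was the winner. Calculation: count of vote rows whose winner_model_id is the model’s ID.
Appearance Count
Total number of comparisons the model participated in. Calculation: count of comparisons in which the model appears as Model 1 or Model 2.
Win Rate
Percentage of appearances where the model won. Calculation: win count ÷ appearance count × 100.
Voting Workflow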
User Reviews Responses
Reads both AI responses (ideally without seeing model names for unbiased judgment).
User Submits Vote
Calls vote endpoint with comparisonId and voteChoice (left, right, tie, both-bad).
Vote Immutability
Votes are immutable once submitted. No UPDATE operations are allowed. Vote changes require:
- DELETE existing vote(s)
- INSERT new vote(s)
Benefits:
- Simpler audit trail
- Prevents accidental vote manipulation
- Explicit about vote changes (not silent updates)
The current implementation allows duplicate votes on the same comparison. Vote “changes” are implemented as additional rows, not replacements.
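The delete-then-insert rule can be sketched against an in-memory list of rows. This illustrates the stated rule only; per the note above, the current implementation appends rows instead. The `StoredVote` shape, its `id` and `comparison_id` fields, and the `changeVote` helper are hypothetical names chosen for the example.

```typescript
// Hypothetical row shape; id and comparison_id are assumed fields.
interface StoredVote {
  id: number;
  comparison_id: string;
  winner_model_id: string | null;
}

// Express a vote change as DELETE + INSERT. No existing row is updated
// in place; the function returns a new array.
function changeVote(
  votes: readonly StoredVote[],
  oldVoteIds: number[],
  comparisonId: string,
  newWinners: (string | null)[],
): StoredVote[] {
  // DELETE: drop the rows belonging to the vote being replaced.
  const kept = votes.filter((v) => !oldVoteIds.includes(v.id));
  // INSERT: add fresh rows with new ids for the new choice.
  let nextId = votes.reduce((max, v) => Math.max(max, v.id), 0) + 1;
  const inserted: StoredVote[] = newWinners.map((winner) => ({
    id: nextId++,
    comparison_id: comparisonId,
    winner_model_id: winner,
  }));
  return [...kept, ...inserted];
}

// Changing a "left" vote into a "tie": one row out, two rows in.
const before: StoredVote[] = [
  { id: 1, comparison_id: "c1", winner_model_id: "model-a" },
];
const after = changeVote(before, [1], "c1", ["model-a", "model-b"]);
console.log(after.map((v) => v.id)); // [2, 3]
```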
Vote Persistence
Votes link to comparisons, not thread messages.
Statistical Guarantees
Eventually Consistent
Win rates reflect all votes in the database at query time. No caching lag.
Benefit: Real-time leaderboard updates
Tradeoff: Expensive queries for large vote datasets (requires aggregation)
No Vote Deduplication
The system allows multiple votes on the same comparison by the same user. Intentional design choice: Enables vote revisions without complex state tracking.
Implication: Clients should prevent duplicate submissions if desired (UI-level enforcement).
Leaderboard Construction
The model leaderboard ranks models by:
- Win rate (primary)
- Appearance count (tiebreaker for models with equal win rate)
Models with zero appearances have undefined win rate and should be excluded from rankings.
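The ranking rule can be sketched as a filter, a win-rate computation, and a two-key sort. This is a minimal sketch under stated assumptions: the `ModelStats` shape and `buildLeaderboard` name are hypothetical, win rate is taken as wins ÷ appearances × 100, and the tiebreaker is assumed to favor the model with more appearances (the text does not state the direction).

```typescript
// Hypothetical input shape: per-model totals already aggregated from votes.
interface ModelStats {
  modelId: string;
  wins: number;
  appearances: number;
}

interface LeaderboardEntry extends ModelStats {
  winRate: number; // percentage
}

function buildLeaderboard(stats: ModelStats[]): LeaderboardEntry[] {
  return stats
    // Zero appearances would give 0 / 0 = NaN, so exclude those models.
    .filter((s) => s.appearances > 0)
    .map((s) => ({ ...s, winRate: (s.wins / s.appearances) * 100 }))
    // Primary key: win rate, descending.
    // Tiebreaker (assumed direction): more appearances ranks higher.
    .sort((a, b) => b.winRate - a.winRate || b.appearances - a.appearances);
}

const ranked = buildLeaderboard([
  { modelId: "a", wins: 3, appearances: 4 },
  { modelId: "b", wins: 6, appearances: 8 },
  { modelId: "c", wins: 1, appearances: 4 },
  { modelId: "d", wins: 0, appearances: 0 }, // excluded: undefined win rate
]);
console.log(ranked.map((e) => e.modelId)); // ["b", "a", "c"]
```

Filtering before dividing keeps NaN out of the sort comparator, where it would otherwise produce an unpredictable order.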
Blind vs Revealed Voting
DualMind API doesn’t enforce blind voting. Client UIs decide whether to show model names before collecting votes.
Blind Voting (Recommended)
Hide model names until after vote submission. Benefits:
- Eliminates brand bias
- Pure quality assessment
- More objective comparisons
Revealed Voting
Show model names immediately. Benefits:
- Users can factor in model reputation
- Useful for testing specific models
- Transparent about what’s being compared
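Since the choice between blind and revealed voting is left to the client, one way a UI might handle it is a small display helper. This is a hypothetical client-side sketch: `ResponseCard`, `DisplayedCard`, and `displayCards` are invented for the example and are not part of the DualMind API.

```typescript
// Hypothetical client-side shapes; not part of the API.
interface ResponseCard {
  modelName: string; // real model name, known to the client
  text: string; // the model's response
}

interface DisplayedCard {
  label: string;
  text: string;
}

// In blind mode, show neutral labels until the vote has been submitted.
function displayCards(
  cards: [ResponseCard, ResponseCard],
  blind: boolean,
  hasVoted: boolean,
): [DisplayedCard, DisplayedCard] {
  const hideNames = blind && !hasVoted;
  const toDisplay = (card: ResponseCard, position: number): DisplayedCard => ({
    label: hideNames ? `Model ${position}` : card.modelName,
    text: card.text,
  });
  return [toDisplay(cards[0], 1), toDisplay(cards[1], 2)];
}

const cards: [ResponseCard, ResponseCard] = [
  { modelName: "model-a", text: "First answer" },
  { modelName: "model-b", text: "Second answer" },
];
console.log(displayCards(cards, true, false).map((c) => c.label)); // ["Model 1", "Model 2"]
```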
Vote Choice Psychology
Why “Tie” Exists
Some responses are genuinely of equal quality. Forcing users to pick creates artificial preference data.
Effect: “Tie” votes prevent:
- Random guessing when responses are equal
- Quality responses losing unfairly
- Biased data from forced choices
Why “Both-Bad” Exists
Sometimes both models produce poor responses. This outcome deserves representation.
Effect: “Both-bad” votes:
- Penalize both models appropriately
- Identify prompts that challenge all models
- Provide feedback that specific comparison was unhelpful
Statistical Edge Cases
New Model Bootstrap
A model with zero votes has an undefined win rate.
Solution: Either exclude it from rankings or assign a default (e.g., 0% or 50%).
Current behavior: Included in the API response with the actual win rate calculation (may be NaN if zero appearances).
Tie Inflation
If users frequently vote “tie”, all models accumulate wins without clear differentiation.
Mitigation: Monitor tie rate. A high tie rate may indicate:
- Models are actually very similar
- Prompts don’t reveal quality differences
- Users defaulting to “tie” instead of judging carefully
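Monitoring the tie rate amounts to tracking the share of submitted choices that are “tie”. The sketch below assumes the raw vote choices are available to the monitoring code; the `tieRate` helper is a hypothetical name, and no particular threshold for “high” is implied.

```typescript
type Choice = "left" | "right" | "tie" | "both-bad";

// Fraction of votes that were "tie", or null when there are no votes yet.
function tieRate(choices: Choice[]): number | null {
  if (choices.length === 0) return null;
  const ties = choices.filter((c) => c === "tie").length;
  return ties / choices.length;
}

console.log(tieRate(["left", "tie", "tie", "both-bad"])); // 0.5
```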
Vote Manipulation
Malicious users could submit many votes favoring a specific model.
Current state: No rate limiting or duplicate detection.
Potential mitigation: Limit votes per user per comparison (requires user tracking).
Next Steps
Chat Modes
How dual-chat enables voting
Model Selection
How “topper” mode uses win rates
System Overview
Architecture decisions