POST /api/arena/chat
curl -X POST 'http://localhost:5079/api/arena/chat' \
  -H 'Authorization: Bearer YOUR_JWT_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Explain quantum computing in simple terms",
    "model": "llama-3.3-70b-versatile",
    "temperature": 0.7,
    "maxTokens": 500,
    "threadId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
  }'
{
  "object": "ai.response",
  "output": {
    "type": "message",
    "content": [
      {
        "type": "output_text",
        "text": "Quantum computing uses quantum bits (qubits) which can exist in multiple states simultaneously through superposition..."
      }
    ]
  },
  "success": true,
  "message": "Quantum computing uses quantum bits (qubits)...",
  "model": {
    "name": "llama-3.3-70b-versatile",
    "displayName": "Llama 3.3 70B",
    "provider": "groq"
  },
  "prompt": "Explain quantum computing in simple terms",
  "selectionMode": "manual",
  "usage": {
    "promptTokens": 15,
    "completionTokens": 120,
    "totalTokens": 135
  },
  "responseTimeMs": 1245,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

Single Chat (Stable)

Send a prompt to a single AI model. Supports automatic or manual model selection with a 3-tier fallback chain.

Authentication Required

JWT Claims Extraction (Controller Lines 154-166):
  • sub | ClaimTypes.NameIdentifier → User UUID (required)
  • email | ClaimTypes.Email → User email
  • full_name | name | ClaimTypes.Name → Display name
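The claim fallback order above can be sketched as follows (TypeScript, a hedged client-side illustration with invented names; the controller itself reads `ClaimTypes` via ASP.NET from the validated JWT):

```typescript
// Illustrative sketch: applying the documented claim fallback order
// to an already-decoded JWT payload. Not the controller code.
interface JwtPayload {
  sub?: string;
  email?: string;
  full_name?: string;
  name?: string;
}

function extractIdentity(claims: JwtPayload) {
  // sub is the user UUID and is required; its absence means UNAUTHORIZED.
  if (!claims.sub) throw new Error("UNAUTHORIZED: missing sub claim");
  return {
    userId: claims.sub,
    email: claims.email,                          // optional
    displayName: claims.full_name ?? claims.name, // first present claim wins
  };
}
```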

Request Body

prompt
string
required
User message text.
Validation (Line 89):
if (request == null || string.IsNullOrWhiteSpace(request.Prompt))
    return BadRequest("INVALID_REQUEST");
Constraints:
  • MUST NOT be null
  • MUST NOT be whitespace-only
  • No max length enforced (provider-dependent)
model
string
Model name or "auto" for random selection.
Selection Logic (Lines 102-108):
var selectedModel = string.IsNullOrWhiteSpace(request.Model) || request.Model == "auto"
    ? await _modelSelector.GetRandomModelAsync()
    : request.Model;

var selectionMode = ... ? "automatic" : "manual";
Behavior:
  • null, empty, "auto" → Random active model
  • Specific name → Direct model usage
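The selection rule can be rendered as a small pure function (TypeScript sketch of the C# snippet above; `randomModel` stands in for `_modelSelector.GetRandomModelAsync()`):

```typescript
// Mirrors the documented selection logic: null/empty/"auto" → random model.
function resolveModel(
  requested: string | null | undefined,
  randomModel: () => string, // stand-in for GetRandomModelAsync()
): { model: string; selectionMode: "automatic" | "manual" } {
  const auto = !requested || requested.trim() === "" || requested === "auto";
  return auto
    ? { model: randomModel(), selectionMode: "automatic" }
    : { model: requested!, selectionMode: "manual" };
}
```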
system
string
System prompt (maps internally to request.System).
Default: Implementation-defined and not guaranteed (provider-specific)
maxTokens
integer
Maximum response tokens.
Limits: Provider-dependent (checked at provider level)
temperature
number
Sampling temperature.
Range: 0.0 (deterministic) to 2.0 (maximum creativity)
Default: Not specified by API contract (provider-defined)
sessionId
string
Session identifier.
Auto-generation (Line 87):
var sessionId = Guid.NewGuid();
threadId
string
Thread UUID for message persistence.
Validation (Lines 168-174):
if (!string.IsNullOrEmpty(request.ThreadId)) {
    if (Guid.TryParse(request.ThreadId, out Guid threadIdGuid)) {
        await _threadMessagesService.LogSingleAsync(...);
    }
}
Behavior: if threadId is omitted or is not a valid GUID, the message is silently not persisted to any thread
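A client can pre-validate threadId before sending (hedged sketch; note that .NET's `Guid.TryParse` also accepts formats without hyphens or with braces, so this regex is stricter than the server-side check):

```typescript
// Canonical "D"-format GUID: 8-4-4-4-12 hex digits.
const GUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// True when the server would persist the message to the thread.
function willPersistToThread(threadId?: string): boolean {
  return !!threadId && GUID_RE.test(threadId);
}
```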

Response

object
string
Always "ai.response" (Line 121)
output
object
Structured output; content[0].text carries the AI response text (see the example response)
success
boolean
Always true on success (Line 130)
message
string
AI response text (Line 131, mirrors output.content[0].text)
model
object
Resolved model metadata: name, displayName, provider
prompt
string
Echo of user prompt (Line 138)
selectionMode
string
"automatic" or "manual" (Line 139)
usage
object
Token counts: promptTokens, completionTokens, totalTokens
responseTimeMs
integer
Total duration in milliseconds (Lines 115, 140).
Includes: Fallback retry time if primary provider failed
timestamp
string
ISO8601 UTC timestamp (Line 147)
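A minimal TypeScript reading of this shape (hedged sketch; the types cover only the fields documented here, with names taken from the example request/response in this document):

```typescript
// Partial typing of the documented response envelope.
interface ArenaChatResponse {
  object: "ai.response";
  output: { type: string; content: Array<{ type: string; text: string }> };
  success: boolean;
  message: string; // mirrors output.content[0].text
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
  responseTimeMs: number;
  timestamp: string; // ISO8601 UTC
}

function firstText(res: ArenaChatResponse): string {
  // Prefer the structured output; fall back to the mirrored `message` field.
  return res.output.content.find((c) => c.type === "output_text")?.text ?? res.message;
}
```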

Side Effects

Database Mutations (Lines 150-174):
  1. message_logs table (Line 150):
    await _messageLogger.LogMessageAsync(sessionId, finalModel, "single", request, response);
    
  2. users table UPSERT (Lines 152-166):
    await _userSyncService.EnsureUserExistsAsync(userId, email, name);
    
    • Executes on every authenticated request
    • Idempotent UPSERT operation
    • Creates user if not exists, updates if exists
  3. thread_messages table (conditional, Lines 168-174):
    if (!string.IsNullOrEmpty(request.ThreadId)) {
        await _threadMessagesService.LogSingleAsync(threadIdGuid, request.Prompt, finalModel, response);
    }
    
    • Only if threadId provided and valid GUID
    • Links message to existing thread

Behavior

Provider Execution with Fallback (Lines 111, 528-593):
ExecuteWithFallbackAsync flowchart:

1. Primary provider attempt (45s timeout)
   ├─ Success → Return response
   └─ Failure/Timeout → Step 2

2. If non-Groq provider failed:
   ├─ Fallback to Groq with llama-3.3-70b-versatile (45s timeout)
   ├─ Success → Return response
   └─ Failure → Step 3

3. If Groq failed or reached here:
   ├─ Retry with llama-3.3-70b-versatile (45s timeout)
   ├─ Success → Return response
   └─ Failure → Throw exception (500 error)

Max total time: 135 seconds (3 × 45s)
Timeout Implementation (Lines 540-552):
var chatTask = provider.ChatAsync(model, prompt, system, maxTokens, temperature);
var timeoutTask = Task.Delay(45000); // 45 seconds

var completedTask = await Task.WhenAny(chatTask, timeoutTask);
if (completedTask == chatTask) {
    return await chatTask;
} else {
    throw new TimeoutException($"Provider '{providerName}' timed out after 45s");
}
Model Selection (Lines 102-104):
  • null or "auto": Query ai_models table for random active model
  • Specific model name: Direct lookup in model registry
User Sync Timing:
  • Happens after AI inference (Lines 152-166)
  • Awaited inline, so it completes before the response returns
  • Failure behavior not enforced by server contract
Thread Message Persistence:
  • Happens after AI inference and user sync
  • Only if threadId provided
  • Only if threadId valid GUID
  • Failure would bubble to 500 error

Error Conditions

Code | HTTP | Cause | Controller Line
INVALID_REQUEST | 400 | Prompt null or whitespace | 91-97
UNAUTHORIZED | 401 | Missing/invalid JWT | Middleware
API_ERROR | 500 | Provider failure | 180-186
API_ERROR | 500 | Uncaught exception | 178-187
Exception Messages (Lines 184, 428):
message = ex.InnerException?.Message ?? ex.Message
Inner exception messages are exposed to the client, so provider timeout and connection errors appear in error responses

Edge Cases

  1. Invalid threadId GUID: Silently skipped, no error (Line 170 guard)
  2. User sync failure: Logged as warning, request continues (implicit in UserSyncService)
  3. Model not found: Fallback chain triggered
  4. All providers fail: 500 error after ~135s
  5. Empty model name: Treated as "auto" (Line 102 check)

Rate Limits

No explicit rate limiting in controller. Provider-level limits apply:
  • Groq free tier: 30 req/min, 14,400 tokens/min
  • Groq paid tier: Higher limits (check API dashboard)
  • Bytez: Provider-dependent
429 Handling: Not explicitly caught; a provider 429 triggers the fallback chain like any other failure
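A client that wants smoother behavior under these limits can back off on 429 itself (hedged sketch: the endpoint URL matches the example in this document, while the exponential 1s/2s/4s schedule is an assumption, not part of the API contract):

```typescript
// Exponential backoff schedule (assumed: 1s, 2s, 4s, ...).
function backoffDelaysMs(maxRetries: number, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

// Retry the chat call on HTTP 429, waiting between attempts.
async function chatWithRetry(body: unknown, token: string, maxRetries = 3): Promise<Response> {
  const delays = backoffDelaysMs(maxRetries);
  for (let attempt = 0; ; attempt++) {
    const res = await fetch("http://localhost:5079/api/arena/chat", {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
  }
}
```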