POST /api/arena/chat
curl -X POST 'http://localhost:5079/api/arena/chat' \
  -H 'Authorization: Bearer YOUR_JWT_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Explain quantum computing in simple terms",
    "model": "llama-3.3-70b-versatile",
    "temperature": 0.7,
    "maxTokens": 500,
    "threadId": "f47ac10b-58cc-4372-a567-0e02b2c3d479"
  }'
{
  "object": "ai.response",
  "output": {
    "type": "message",
    "content": [
      {
        "type": "output_text",
        "text": "Quantum computing uses quantum bits (qubits) which can exist in multiple states simultaneously through superposition..."
      }
    ]
  },
  "success": true,
  "message": "Quantum computing uses quantum bits (qubits)...",
  "model": {
    "name": "llama-3.3-70b-versatile",
    "displayName": "Llama 3.3 70B",
    "provider": "groq"
  },
  "prompt": "Explain quantum computing in simple terms",
  "selectionMode": "manual",
  "usage": {
    "promptTokens": 15,
    "completionTokens": 120,
    "totalTokens": 135
  },
  "responseTimeMs": 1245,
  "timestamp": "2024-01-15T10:30:00.000Z"
}

Single Chat (Stable)

Send a prompt to a single AI model. Supports automatic or manual model selection with a 3-tier fallback chain.

Authentication Required

JWT Claims Extraction (Controller Lines 154-166):
  • sub | ClaimTypes.NameIdentifier → User UUID (required)
  • email | ClaimTypes.Email → User email
  • full_name | name | ClaimTypes.Name → Display name
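The claim fallback order above can be sketched as follows (TypeScript, a hedged client-side illustration with invented names; the controller itself reads `ClaimTypes` via ASP.NET from the validated JWT):

```typescript
// Illustrative sketch: applying the documented claim fallback order
// to an already-decoded JWT payload. Not the controller code.
interface JwtPayload {
  sub?: string;
  email?: string;
  full_name?: string;
  name?: string;
}

function extractIdentity(claims: JwtPayload) {
  // sub is the user UUID and is required; its absence means UNAUTHORIZED.
  if (!claims.sub) throw new Error("UNAUTHORIZED: missing sub claim");
  return {
    userId: claims.sub,
    email: claims.email,                          // optional
    displayName: claims.full_name ?? claims.name, // first present claim wins
  };
}
```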

Request Body

prompt
string
required
User message text.
Validation (Line 89):
if (request == null || string.IsNullOrWhiteSpace(request.Prompt))
    return BadRequest("INVALID_REQUEST");
Constraints:
  • MUST NOT be null
  • MUST NOT be whitespace-only
  • No max length enforced (provider-dependent)
model
string
Model name or "auto" for random selection.
Selection Logic (Lines 102-108):
var selectedModel = string.IsNullOrWhiteSpace(request.Model) || request.Model == "auto"
    ? await _modelSelector.GetRandomModelAsync()
    : request.Model;

var selectionMode = ... ? "automatic" : "manual";
Behavior:
  • null, empty, "auto" → Random active model
  • Specific name → Direct model usage
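The selection rule can be rendered as a small pure function (TypeScript sketch of the C# snippet above; `randomModel` stands in for `_modelSelector.GetRandomModelAsync()`):

```typescript
// Mirrors the documented selection logic: null/empty/"auto" → random model.
function resolveModel(
  requested: string | null | undefined,
  randomModel: () => string, // stand-in for GetRandomModelAsync()
): { model: string; selectionMode: "automatic" | "manual" } {
  const auto = !requested || requested.trim() === "" || requested === "auto";
  return auto
    ? { model: randomModel(), selectionMode: "automatic" }
    : { model: requested!, selectionMode: "manual" };
}
```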
system
string
System prompt (maps internally to request.System).
Default: Implementation-defined and not guaranteed (provider-specific)
maxTokens
integer
Maximum response tokens.
Limits: Provider-dependent (checked at provider level)
temperature
number
Sampling temperature.
Range: 0.0 (deterministic) to 2.0 (maximum creativity)
Default: Not specified by API contract (provider-defined)
sessionId
string
Session identifier.
Auto-generation (Line 87):
var sessionId = Guid.NewGuid();
threadId
string
Thread UUID for message persistence.
Validation (Lines 168-174):
if (!string.IsNullOrEmpty(request.ThreadId)) {
    if (Guid.TryParse(request.ThreadId, out Guid threadIdGuid)) {
        await _threadMessagesService.LogSingleAsync(...);
    }
}
Behavior: if threadId is omitted or is not a valid GUID, the message is silently not persisted to any thread
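A client can pre-validate threadId before sending (hedged sketch; note that .NET's `Guid.TryParse` also accepts formats without hyphens or with braces, so this regex is stricter than the server-side check):

```typescript
// Canonical "D"-format GUID: 8-4-4-4-12 hex digits.
const GUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// True when the server would persist the message to the thread.
function willPersistToThread(threadId?: string): boolean {
  return !!threadId && GUID_RE.test(threadId);
}
```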

Response

object
string
Always "ai.response" (Line 121)
output
object
Structured output; content[0].text carries the AI response text (see the example response)
success
boolean
Always true on success (Line 130)
message
string
AI response text (Line 131, mirrors output.content[0].text)
model
object
Resolved model metadata: name, displayName, provider
prompt
string
Echo of user prompt (Line 138)
selectionMode
string
"automatic" or "manual" (Line 139)
usage
object
Token counts: promptTokens, completionTokens, totalTokens
responseTimeMs
integer
Total duration in milliseconds (Lines 115, 140).
Includes: Fallback retry time if primary provider failed
timestamp
string
ISO8601 UTC timestamp (Line 147)
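A minimal TypeScript reading of this shape (hedged sketch; the types cover only the fields documented here, with names taken from the example request/response in this document):

```typescript
// Partial typing of the documented response envelope.
interface ArenaChatResponse {
  object: "ai.response";
  output: { type: string; content: Array<{ type: string; text: string }> };
  success: boolean;
  message: string; // mirrors output.content[0].text
  usage: { promptTokens: number; completionTokens: number; totalTokens: number };
  responseTimeMs: number;
  timestamp: string; // ISO8601 UTC
}

function firstText(res: ArenaChatResponse): string {
  // Prefer the structured output; fall back to the mirrored `message` field.
  return res.output.content.find((c) => c.type === "output_text")?.text ?? res.message;
}
```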

Side Effects

Database Mutations (Lines 150-174):
  1. message_logs table (Line 150):
    await _messageLogger.LogMessageAsync(sessionId, finalModel, "single", request, response);
    
  2. users table UPSERT (Lines 152-166):
    await _userSyncService.EnsureUserExistsAsync(userId, email, name);
    
    • Executes on every authenticated request
    • Idempotent UPSERT operation
    • Creates user if not exists, updates if exists
  3. thread_messages table (conditional, Lines 168-174):
    if (!string.IsNullOrEmpty(request.ThreadId)) {
        await _threadMessagesService.LogSingleAsync(threadIdGuid, request.Prompt, finalModel, response);
    }
    
    • Only if threadId provided and valid GUID
    • Links message to existing thread

Behavior

Provider Execution with Fallback (Lines 111, 528-593):
ExecuteWithFallbackAsync flowchart:

1. Primary provider attempt (45s timeout)
   ├─ Success → Return response
   └─ Failure/Timeout → Step 2

2. If non-Groq provider failed:
   ├─ Fallback to Groq with llama-3.3-70b-versatile (45s timeout)
   ├─ Success → Return response
   └─ Failure → Step 3

3. If Groq failed or reached here:
   ├─ Retry with llama-3.3-70b-versatile (45s timeout)
   ├─ Success → Return response
   └─ Failure → Throw exception (500 error)

Max total time: 135 seconds (3 × 45s)
Timeout Implementation (Lines 540-552):
var chatTask = provider.ChatAsync(model, prompt, system, maxTokens, temperature);
var timeoutTask = Task.Delay(45000); // 45 seconds

var completedTask = await Task.WhenAny(chatTask, timeoutTask);
if (completedTask == chatTask) {
    return await chatTask;
} else {
    throw new TimeoutException($"Provider '{providerName}' timed out after 45s");
}
Model Selection (Lines 102-104):
  • null or "auto": Query ai_models table for random active model
  • Specific model name: Direct lookup in model registry
User Sync Timing:
  • Happens after AI inference (Lines 152-166)
  • Awaited inline, so it completes before the response returns
  • Failure behavior not enforced by server contract
Thread Message Persistence:
  • Happens after AI inference and user sync
  • Only if threadId provided
  • Only if threadId valid GUID
  • Failure would bubble to 500 error

Error Conditions

Code | HTTP | Cause | Controller Line
INVALID_REQUEST | 400 | Prompt null or whitespace | 91-97
UNAUTHORIZED | 401 | Missing/invalid JWT | Middleware
API_ERROR | 500 | Provider failure | 180-186
API_ERROR | 500 | Uncaught exception | 178-187
Exception Messages (Lines 184, 428):
message = ex.InnerException?.Message ?? ex.Message
Inner exception messages are exposed to the client, so provider timeout and connection errors appear in error responses

Edge Cases

  1. Invalid threadId GUID: Silently skipped, no error (Line 170 guard)
  2. User sync failure: Logged as warning, request continues (implicit in UserSyncService)
  3. Model not found: Fallback chain triggered
  4. All providers fail: 500 error after ~135s
  5. Empty model name: Treated as "auto" (Line 102 check)

Rate Limits

No explicit rate limiting in controller. Provider-level limits apply:
  • Groq free tier: 30 req/min, 14,400 tokens/min
  • Groq paid tier: Higher limits (check API dashboard)
  • Bytez: Provider-dependent
429 Handling: Not explicitly caught; a provider 429 triggers the fallback chain like any other failure
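A client that wants smoother behavior under these limits can back off on 429 itself (hedged sketch: the endpoint URL matches the example in this document, while the exponential 1s/2s/4s schedule is an assumption, not part of the API contract):

```typescript
// Exponential backoff schedule (assumed: 1s, 2s, 4s, ...).
function backoffDelaysMs(maxRetries: number, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);
}

// Retry the chat call on HTTP 429, waiting between attempts.
async function chatWithRetry(body: unknown, token: string, maxRetries = 3): Promise<Response> {
  const delays = backoffDelaysMs(maxRetries);
  for (let attempt = 0; ; attempt++) {
    const res = await fetch("http://localhost:5079/api/arena/chat", {
      method: "POST",
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
  }
}
```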