Provider Architecture

Interface Abstraction

public interface IChatProvider
{
    Task<ChatResponse> ChatCompletionAsync(ChatRequest request);
    Task ChatCompletionStreamAsync(ChatRequest request, Func<string, Task> onChunk, CancellationToken cancellationToken = default);
    string ProviderName { get; }
}
Implementations:
  • GroqProvider: Primary LPU-accelerated inference
  • BytezProvider: Fallback for reliability

Provider Factory

public class ChatProviderFactory
{
    private readonly IServiceProvider _serviceProvider;

    public ChatProviderFactory(IServiceProvider serviceProvider) =>
        _serviceProvider = serviceProvider;

    public IChatProvider GetProvider(string providerName)
    {
        return providerName.ToLower() switch {
            "groq" => _serviceProvider.GetRequiredService<GroqProvider>(),
            "bytez" => _serviceProvider.GetRequiredService<BytezProvider>(),
            _ => throw new ArgumentException($"Unknown provider: {providerName}")
        };
    }
}
Model → Provider Routing:
llama-*     → Groq
mixtral-*   → Groq
gemma-*     → Groq
Others      → Bytez

Groq Provider

API Configuration

private readonly HttpClient _httpClient;
private const string BaseUrl = "https://api.groq.com/openai/v1";
private readonly string _apiKey;

public GroqProvider(HttpClient httpClient, IConfiguration config)
{
    _httpClient = httpClient;
    _apiKey = config["GROQ_API_KEY"]
        ?? throw new InvalidOperationException("GROQ_API_KEY is not configured");
    _httpClient.DefaultRequestHeaders.Add("Authorization", $"Bearer {_apiKey}");
}

Chat Completion Request

public async Task<ChatResponse> ChatCompletionAsync(ChatRequest request)
{
    var startTime = DateTime.UtcNow;

    var payload = new {
        model = request.Model,
        messages = new[] {
            new { role = "system", content = request.SystemMessage ?? "You are a helpful assistant." },
            new { role = "user", content = request.Prompt }
        },
        temperature = request.Temperature ?? 0.7,
        max_tokens = request.MaxTokens,
        stream = false
    };
    
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(45));
    
    var response = await _httpClient.PostAsJsonAsync(
        $"{BaseUrl}/chat/completions",
        payload,
        cts.Token
    );
    
    response.EnsureSuccessStatusCode();
    
    var result = await response.Content.ReadFromJsonAsync<GroqResponse>();
    
    return new ChatResponse {
        Message = result.Choices[0].Message.Content,
        Usage = result.Usage,
        ResponseTimeMs = (int)(DateTime.UtcNow - startTime).TotalMilliseconds
    };
}

Streaming Implementation

public async Task ChatCompletionStreamAsync(
    ChatRequest request,
    Func<string, Task> onChunk,
    CancellationToken cancellationToken)
{
    var payload = new { /* same as above but stream = true */ };
    
    var httpRequest = new HttpRequestMessage(HttpMethod.Post, $"{BaseUrl}/chat/completions") {
        Content = JsonContent.Create(payload)
    };
    
    using var response = await _httpClient.SendAsync(
        httpRequest,
        HttpCompletionOption.ResponseHeadersRead,
        cancellationToken
    );
    
    response.EnsureSuccessStatusCode();
    
    using var stream = await response.Content.ReadAsStreamAsync(cancellationToken);
    using var reader = new StreamReader(stream);
    
    string line;
    while (!cancellationToken.IsCancellationRequested &&
           (line = await reader.ReadLineAsync()) != null) {
        
        if (string.IsNullOrWhiteSpace(line) || !line.StartsWith("data: "))
            continue;
            
        var json = line.Substring(6); // Remove "data: " prefix
        
        if (json == "[DONE]")
            break;
            
        var chunk = JsonSerializer.Deserialize<GroqStreamChunk>(json);
        var delta = chunk?.Choices?.FirstOrDefault()?.Delta?.Content;
        
        if (!string.IsNullOrEmpty(delta))
            await onChunk(delta);
    }
}

Error Handling

try {
    return await ChatCompletionAsync(request);
}
catch (TaskCanceledException) {
    throw new ProviderTimeoutException("Groq API timeout after 45 seconds");
}
catch (HttpRequestException ex) when (ex.StatusCode == HttpStatusCode.TooManyRequests) {
    throw new ProviderRateLimitException("Groq rate limit exceeded");
}
catch (HttpRequestException ex) {
    throw new ProviderException($"Groq API error: {ex.Message}");
}

Bytez Provider

API Configuration

private const string BaseUrl = "https://api.bytez.com/v1";
private readonly string _apiKey;

public BytezProvider(IConfiguration config)
{
    _apiKey = config["BYTEZ_API_KEY"];
}

Implementation Differences

  • Request Format: similar to the OpenAI API
  • Response Parsing: expects an OpenAI-compatible response structure
  • Timeout: same 45-second limit
  • Streaming: not currently implemented (the fallback path is non-streaming only)
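
Given those differences, the Bytez completion call can be sketched as a near-copy of the Groq path. This is a sketch only: the endpoint path, header scheme, and response type reuse are assumptions based on the OpenAI-compatible format noted above.

```csharp
// Sketch: assumes Bytez exposes an OpenAI-compatible /chat/completions endpoint.
public async Task<ChatResponse> ChatCompletionAsync(ChatRequest request)
{
    var payload = new {
        model = request.Model,
        messages = new[] { new { role = "user", content = request.Prompt } },
        temperature = request.Temperature ?? 0.7,
        max_tokens = request.MaxTokens,
        stream = false  // streaming is not implemented on the fallback path
    };

    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(45));

    using var httpRequest = new HttpRequestMessage(
        HttpMethod.Post, $"{BaseUrl}/chat/completions") {
        Content = JsonContent.Create(payload)
    };
    httpRequest.Headers.Authorization =
        new AuthenticationHeaderValue("Bearer", _apiKey);

    using var response = await _httpClient.SendAsync(httpRequest, cts.Token);
    response.EnsureSuccessStatusCode();

    // Reusing GroqResponse assumes the response shapes match; a dedicated
    // BytezResponse type would be safer if the formats ever diverge.
    var result = await response.Content.ReadFromJsonAsync<GroqResponse>(cancellationToken: cts.Token);

    return new ChatResponse {
        Message = result.Choices[0].Message.Content,
        Usage = result.Usage
    };
}
```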

Failover Chain

Single Chat Failover

public async Task<ChatResponse> ExecuteWithFallbackAsync(ChatRequest request)
{
    // Attempt 1: Primary provider with selected model
    try {
        var provider = _providerFactory.GetProvider(request.Provider);
        return await provider.ChatCompletionAsync(request);
    }
    catch (ProviderException ex) {
        _logger.LogWarning($"Primary attempt failed: {ex.Message}");
    }
    
    // Attempt 2: Groq with fallback model
    try {
        request.Model = "llama-3.3-70b-versatile";
        var groqProvider = _providerFactory.GetProvider("groq");
        return await groqProvider.ChatCompletionAsync(request);
    }
    catch (ProviderException ex) {
        _logger.LogWarning($"Groq retry failed: {ex.Message}");
    }
    
    // Attempt 3: Bytez provider
    try {
        var bytezProvider = _providerFactory.GetProvider("bytez");
        return await bytezProvider.ChatCompletionAsync(request);
    }
    catch (ProviderException ex) {
        _logger.LogError($"All providers failed: {ex.Message}");
        throw new AllProvidersFailedException("All providers exhausted");
    }
}
Total Max Time: 135 seconds worst case (3 × 45s)
Logging: each failure is logged with its attempt number

Dual Chat Failover

Each model has independent fallback chain:
var task1 = ExecuteWithFallbackAsync(new ChatRequest { Model = model1 });
var task2 = ExecuteWithFallbackAsync(new ChatRequest { Model = model2 });

await Task.WhenAll(task1, task2);
Outcomes:
  1. ✅ Both succeed → Full dual-chat response
  2. ⚠️ One succeeds → Partial response with error note
  3. ❌ Both fail → 500 error
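
The partial-success outcome can be sketched like this; `DualChatResult` and the error-note wording are illustrative, not the actual types.

```csharp
// Await both chains independently so one failure doesn't discard the other.
var task1 = ExecuteWithFallbackAsync(new ChatRequest { Model = model1 });
var task2 = ExecuteWithFallbackAsync(new ChatRequest { Model = model2 });

try { await Task.WhenAll(task1, task2); }
catch (AllProvidersFailedException) { /* inspect per-task results below */ }

var r1 = task1.IsCompletedSuccessfully ? task1.Result : null;
var r2 = task2.IsCompletedSuccessfully ? task2.Result : null;

if (r1 == null && r2 == null)
    throw new AllProvidersFailedException("Both dual-chat models failed"); // → 500

// Partial response: surface the successful side plus an error note.
return new DualChatResult {
    Response1 = r1,
    Response2 = r2,
    ErrorNote = (r1 == null || r2 == null)
        ? "One model failed; showing partial results."
        : null
};
```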

Performance Characteristics

Response Times

Groq (LPU):
  • Simple prompt: 500-1500ms
  • Complex prompt: 1500-3000ms
  • Streaming first token: 100-300ms
Bytez:
  • Simple prompt: 1500-3000ms
  • Complex prompt: 3000-5000ms

Timeout Strategy

45-Second Rationale:
  • Balances user patience with completion probability
  • Most responses complete within 30 seconds
  • Allows fallback attempts within reasonable total time
Alternative Approaches:
  • Shorter timeout (30s): More frequent fallbacks
  • Longer timeout (60s): Fewer fallbacks but slower failover
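
To experiment with these alternatives without a redeploy, the timeout can be read from configuration instead of being hard-coded. A sketch, using the `PROVIDER_TIMEOUT_SECONDS` environment variable documented under Configuration:

```csharp
// Sketch: fall back to the documented 45s default when the variable is
// missing or unparseable.
var seconds = int.TryParse(config["PROVIDER_TIMEOUT_SECONDS"], out var s) ? s : 45;
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(seconds));
```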

Rate Limits

Groq Free Tier:
  • 30 requests/minute
  • 14,400 tokens/minute
Groq Paid Tier:
  • Higher limits (check API dashboard)
Bytez:
  • Provider-specific limits (not documented here)
Handling: 429 Too Many Requests triggers fallback chain
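
Before triggering the fallback chain, a 429 can optionally be retried using the standard Retry-After header. A hedged sketch; the single-retry policy and the 5-second wait cap are assumptions, not current behavior:

```csharp
// Sketch: honor Retry-After on 429 before handing off to the fallback chain.
private static async Task<bool> ShouldRetryAfterThrottleAsync(
    HttpResponseMessage response, CancellationToken cancellationToken)
{
    if (response.StatusCode != HttpStatusCode.TooManyRequests)
        return false;

    var retryAfter = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(2);
    if (retryAfter > TimeSpan.FromSeconds(5))
        return false;  // too long to wait; let the fallback chain take over

    await Task.Delay(retryAfter, cancellationToken);
    return true;  // caller retries the request once, then falls back
}
```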

Model Registry Integration

Model Lookup

public async Task<AIModel> GetModelByNameAsync(string modelName)
{
    return await _supabaseClient
        .From<AIModel>()
        .Where(m => m.ModelName == modelName)
        .Single();
}
Response:
public class AIModel
{
    public Guid ModelId { get; set; }
    public string ModelName { get; set; }
    public string DisplayName { get; set; }
    public string ProviderName { get; set; }  // "groq" or "bytez"
    public string ApiUrl { get; set; }
    public string Status { get; set; }        // "active" or "inactive"
}

Provider Assignment

Static Mapping (current):
string provider = model.ModelName switch {
    var m when m.StartsWith("llama-") => "groq",
    var m when m.StartsWith("mixtral-") => "groq",
    var m when m.StartsWith("gemma-") => "groq",
    _ => "bytez"
};
Database-Driven (future):
string provider = model.ProviderName; // From ai_models.provider_name

Provider-Specific Features

Groq LPU Advantages

  • Low Latency: hardware-optimized tensor processing
  • Fast Streaming: sub-300ms first-token latency
  • Cost: competitive per-token pricing
  • Models: Llama, Mixtral, and Gemma families

Bytez Reliability

  • Uptime: independent of Groq (provider diversification)
  • Fallback: critical for production availability
  • Models: variety beyond Groq’s catalog

Monitoring & Observability

Metrics to Track

Provider Success Rate:
_metrics.IncrementCounter("provider.success", tags: new[] { $"provider:{providerName}" });
_metrics.IncrementCounter("provider.failure", tags: new[] { $"provider:{providerName}" });
Response Time Percentiles:
  • p50 (median)
  • p95
  • p99
Timeout Rate:
_metrics.IncrementCounter("provider.timeout", tags: new[] { $"provider:{providerName}" });
Fallback Frequency:
  • How often does primary fail?
  • How often does secondary succeed?
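
Fallback frequency can be derived from the success/failure counters above. A sketch; the `GetCount` accessor is hypothetical, though most metrics libraries expose an equivalent query API:

```csharp
// Hypothetical counter reads; substitute your metrics library's query API.
double failures  = _metrics.GetCount("provider.failure", "provider:groq");
double successes = _metrics.GetCount("provider.success", "provider:groq");
double total     = successes + failures;

// Fraction of requests where the primary provider failed and a
// fallback attempt was needed.
double fallbackRate = total == 0 ? 0 : failures / total;
```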

Alerting Thresholds

  • High Timeout Rate: more than 10% of requests time out
  • Low Success Rate: below 95% for the primary provider
  • Fallback Dependency: more than 20% of requests use the fallback

Configuration

Environment Variables

# Groq Configuration
GROQ_API_KEY=gsk_...
GROQ_API_URL=https://api.groq.com/openai/v1

# Bytez Configuration
BYTEZ_API_KEY=btz_...
BYTEZ_API_URL=https://api.bytez.com/v1

# Timeout Configuration
PROVIDER_TIMEOUT_SECONDS=45

HttpClient Configuration

services.AddHttpClient<GroqProvider>(client => {
    client.BaseAddress = new Uri(Configuration["GROQ_API_URL"]);
    client.Timeout = TimeSpan.FromSeconds(45);
    client.DefaultRequestHeaders.Add("User-Agent", "DualMind/1.0");
});
Resilience: HttpClient connection pooling and DNS refresh handled automatically

Future Enhancements

Adaptive Routing

Route based on:
  • Model performance metrics
  • Current provider latency
  • Rate limit status
  • Cost optimization
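
A sketch of what such adaptive routing could look like; `_latencyTracker` and `_rateLimitState` are hypothetical components, and the 1.5× latency margin is illustrative:

```csharp
// Sketch: pick a provider from recent latency and rate-limit state.
public string SelectProvider(string modelName)
{
    // Skip Groq entirely while it is known to be throttled.
    if (_rateLimitState.IsThrottled("groq"))
        return "bytez";

    var groqP95  = _latencyTracker.GetP95("groq");
    var bytezP95 = _latencyTracker.GetP95("bytez");

    // Prefer Groq (faster on LPU) unless its recent p95 is clearly worse.
    return groqP95 <= bytezP95 * 1.5 ? "groq" : "bytez";
}
```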

Circuit Breaker

if (_groqCircuitBreaker.IsOpen) {
    // Skip Groq, go directly to Bytez
    return await _bytezProvider.ChatCompletionAsync(request);
}
Trigger: open the circuit after N consecutive failures
Reset: close the circuit after a cooldown period
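
A minimal breaker implementing that trigger/reset behavior; the thresholds are illustrative, and a production system would likely use a library such as Polly instead:

```csharp
// Sketch: thread-safety omitted for brevity (use Interlocked or a lock
// if providers are called concurrently).
public class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _cooldown;
    private int _consecutiveFailures;
    private DateTime _openedAtUtc;

    public CircuitBreaker(int failureThreshold = 5, TimeSpan? cooldown = null)
    {
        _failureThreshold = failureThreshold;
        _cooldown = cooldown ?? TimeSpan.FromSeconds(30);
    }

    // Open while the failure count is at threshold and the cooldown
    // has not yet elapsed.
    public bool IsOpen =>
        _consecutiveFailures >= _failureThreshold &&
        DateTime.UtcNow - _openedAtUtc < _cooldown;

    public void RecordSuccess() => _consecutiveFailures = 0;

    public void RecordFailure()
    {
        if (++_consecutiveFailures == _failureThreshold)
            _openedAtUtc = DateTime.UtcNow;  // start the cooldown window
    }
}
```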

Response Caching

Deterministic Requests (temperature = 0):
var cacheKey = HashPrompt(request.Prompt + request.Model);
if (_cache.TryGetValue(cacheKey, out var cachedResponse))
    return cachedResponse;
Non-Deterministic: No caching (current behavior)
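
Expanding the snippet above into a self-contained sketch; the `IMemoryCache` field, the hash scheme, and the 10-minute TTL are assumptions:

```csharp
// Sketch: cache only deterministic (temperature = 0) requests.
private static string HashPrompt(string input)
{
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(input));
    return Convert.ToHexString(bytes);
}

public async Task<ChatResponse> GetOrExecuteAsync(ChatRequest request)
{
    if (request.Temperature != 0)
        return await ExecuteWithFallbackAsync(request);  // no caching

    // Separator avoids collisions between prompt/model boundary shifts.
    var cacheKey = HashPrompt(request.Prompt + "|" + request.Model);

    if (_cache.TryGetValue(cacheKey, out ChatResponse cached))
        return cached;

    var response = await ExecuteWithFallbackAsync(request);
    _cache.Set(cacheKey, response, TimeSpan.FromMinutes(10));  // assumed TTL
    return response;
}
```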

Next Steps

Request Lifecycle

Provider execution in request flow

System Invariants

Provider timeout and fallback invariants