Provider Architecture
Interface Abstraction
GroqProvider: Primary LPU-accelerated inference
BytezProvider: Fallback for reliability
Provider Factory
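The interface abstraction and factory can be sketched in Python (the actual implementation language and names may differ; `ChatProvider`, `create_provider`, and the method signatures here are illustrative assumptions):

```python
from abc import ABC, abstractmethod

class ChatProvider(ABC):
    """Common interface both providers implement (names are illustrative)."""

    @abstractmethod
    def complete(self, model: str, messages: list) -> str:
        """Return the assistant's reply for a chat transcript."""

class GroqProvider(ChatProvider):
    def complete(self, model: str, messages: list) -> str:
        # Real implementation calls the Groq API here.
        return f"groq:{model}"

class BytezProvider(ChatProvider):
    def complete(self, model: str, messages: list) -> str:
        # Real implementation calls the Bytez API here.
        return f"bytez:{model}"

# Factory: map a provider name to a constructed instance.
_PROVIDERS = {"groq": GroqProvider, "bytez": BytezProvider}

def create_provider(name: str) -> ChatProvider:
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None
```

Keeping both providers behind one interface is what makes the failover chain below a simple loop rather than provider-specific branching.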
Groq Provider
API Configuration
Chat Completion Request
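Groq exposes an OpenAI-compatible chat completions endpoint. A minimal stdlib sketch of building the request (the payload fields shown are the common OpenAI-style ones; the project's actual request builder may differ):

```python
import json
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(api_key: str, model: str, messages: list,
                  temperature: float = 0.7) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for Groq."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        GROQ_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```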
Streaming Implementation
Error Handling
Bytez Provider
API Configuration
Implementation Differences
Request Format: Similar to OpenAI API
Response Parsing: Expects OpenAI-compatible response structure
Timeout: Same 45-second limit
Streaming: Not currently implemented (fallback is non-streaming only)

Failover Chain
Single Chat Failover
Dual Chat Failover
Each model has an independent fallback chain:
- ✅ Both succeed → Full dual-chat response
- ⚠️ One succeeds → Partial response with error note
- ❌ Both fail → 500 error
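The three outcomes above follow from running each model's chain independently and only failing the whole request when both chains are exhausted. A sketch:

```python
def dual_chat(prompt, chains):
    """Run independent fallback chains; tolerate one side failing.

    chains maps a model label to its ordered list of provider callables.
    Returns (results, errors); a 500 is only warranted when results is empty.
    """
    results, errors = {}, {}
    for label, providers in chains.items():
        for provider in providers:
            try:
                results[label] = provider(prompt)
                errors.pop(label, None)  # a success clears earlier failures
                break
            except Exception as exc:
                errors[label] = exc
    return results, errors
```

With one side succeeding, `results` holds the partial response and `errors` holds the note for the failed side.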
Performance Characteristics
Response Times
Groq (LPU):
- Simple prompt: 500-1500ms
- Complex prompt: 1500-3000ms
- Streaming first token: 100-300ms
Bytez:
- Simple prompt: 1500-3000ms
- Complex prompt: 3000-5000ms
Timeout Strategy
45-Second Rationale:
- Balances user patience with completion probability
- Most responses complete within 30 seconds
- Allows fallback attempts within reasonable total time
Alternative timeouts:
- Shorter timeout (30s): More frequent fallbacks
- Longer timeout (60s): Fewer fallbacks but slower failover
Rate Limits
Groq Free Tier:
- 30 requests/minute
- 14,400 tokens/minute
Groq Paid Tiers:
- Higher limits (check API dashboard)

Bytez:
- Provider-specific limits (not documented here)
A 429 Too Many Requests response triggers the fallback chain.
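A client-side guard can avoid hitting the 30 requests/minute free-tier ceiling before the server returns a 429. A sliding-window sketch (class and parameter names are illustrative):

```python
import time
from collections import deque

class RateLimiter:
    """Client-side guard for Groq's free-tier 30 requests/minute."""
    def __init__(self, max_requests=30, window_s=60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._times = deque()  # timestamps of requests in the current window

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()
        if len(self._times) < self.max_requests:
            self._times.append(now)
            return True
        return False  # caller should fall back or wait
```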
Model Registry Integration
Model Lookup
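Model lookup resolves a model id to its primary and fallback providers. A sketch with a hypothetical registry (the model ids and dict shape here are assumptions, not the project's actual registry):

```python
# Hypothetical registry: model id -> primary/fallback provider names.
MODEL_REGISTRY = {
    "llama-3.1-8b": {"provider": "groq", "fallback": "bytez"},
    "mixtral-8x7b": {"provider": "groq", "fallback": "bytez"},
}

def lookup_model(model_id):
    """Resolve a model id to its (primary, fallback) provider pair."""
    entry = MODEL_REGISTRY.get(model_id)
    if entry is None:
        raise KeyError(f"unknown model: {model_id}")
    return entry["provider"], entry["fallback"]
```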
Provider Assignment
Static Mapping (current):

Provider-Specific Features
Groq LPU Advantages
Low Latency: Hardware-optimized tensor processing
Fast Streaming: Sub-300ms first token latency
Cost: Competitive pricing on a per-token basis
Models: Llama, Mixtral, Gemma families

Bytez Reliability
Uptime: Independent from Groq (diversification)
Fallback: Critical for production availability
Models: Variety beyond Groq's catalog

Monitoring & Observability
Metrics to Track
Provider Success Rate:
- How often does primary fail?
- How often does secondary succeed?

Latency Percentiles:
- p50 (median)
- p95
- p99
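These metrics can be accumulated with a small per-provider tracker; a sketch using a nearest-rank percentile (class and field names are illustrative):

```python
class ProviderMetrics:
    """Track per-provider latency percentiles and success/fallback rates."""
    def __init__(self):
        self.latencies_ms = []
        self.successes = 0
        self.failures = 0
        self.fallbacks = 0

    def record(self, latency_ms, ok, used_fallback=False):
        self.latencies_ms.append(latency_ms)
        self.successes += ok
        self.failures += not ok
        self.fallbacks += used_fallback

    def percentile(self, p):
        """Nearest-rank percentile over recorded latencies."""
        xs = sorted(self.latencies_ms)
        if not xs:
            return None
        idx = min(len(xs) - 1, int(round(p / 100 * (len(xs) - 1))))
        return xs[idx]

    def success_rate(self):
        total = self.successes + self.failures
        return self.successes / total if total else None
```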
Alerting Thresholds
High Timeout Rate: > 10% of requests time out
Low Success Rate: < 95% for primary provider
Fallback Dependency: > 20% of requests use fallback

Configuration
Environment Variables
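A sketch of loading provider configuration from the environment (the variable names `GROQ_API_KEY`, `BYTEZ_API_KEY`, and `PROVIDER_TIMEOUT_S` are hypothetical; adjust to the project's actual configuration):

```python
import os

def load_config(env=None):
    """Load provider credentials and timeout from environment variables."""
    env = os.environ if env is None else env
    missing = [k for k in ("GROQ_API_KEY", "BYTEZ_API_KEY") if k not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    return {
        "groq_api_key": env["GROQ_API_KEY"],
        "bytez_api_key": env["BYTEZ_API_KEY"],
        "timeout_s": float(env.get("PROVIDER_TIMEOUT_S", "45")),
    }
```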
HttpClient Configuration
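The key configuration point is a single shared client carrying the 45-second timeout and common headers, so both providers behave consistently. A stdlib Python sketch of that shape (the actual project may use a platform HttpClient; this class is illustrative):

```python
import json
import urllib.request

class HttpClient:
    """Minimal shared HTTP client: one timeout, common headers."""
    def __init__(self, api_key, timeout_s=45.0):
        self.timeout_s = timeout_s
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def post_json(self, url, payload):
        """POST a JSON payload and decode the JSON response."""
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers=self.headers,
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=self.timeout_s) as resp:
            return json.loads(resp.read())
```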
Future Enhancements
Adaptive Routing
Route based on:- Model performance metrics
- Current provider latency
- Rate limit status
- Cost optimization
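A routing decision over those signals might be sketched as follows (the candidate dict shape and scoring rule are assumptions for illustration):

```python
def choose_provider(candidates):
    """Pick the best provider from live metrics.

    candidates: dicts with latency_ms, success_rate, rate_limited keys.
    Returns None when nothing is eligible (caller falls back to static order).
    """
    eligible = [c for c in candidates if not c["rate_limited"]]
    if not eligible:
        return None
    # Prefer high success rate, then low latency.
    return min(eligible, key=lambda c: (-c["success_rate"], c["latency_ms"]))
```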
Circuit Breaker
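A circuit breaker stops sending requests to a provider after repeated failures and probes it again after a cooldown. A minimal sketch (threshold and cooldown values are illustrative defaults):

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; retry after a cooldown."""
    def __init__(self, threshold=5, cooldown_s=30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self, now=None):
        if self.opened_at is None:
            return True  # circuit closed: requests flow normally
        now = time.monotonic() if now is None else now
        if now - self.opened_at >= self.cooldown_s:
            # Half-open: let one request through to probe recovery.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok, now=None):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic() if now is None else now
```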
Response Caching
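Since a request with temperature = 0 is deterministic, identical inputs yield identical outputs and the response can be cached under a key derived from the full request. A sketch (function names are hypothetical):

```python
import hashlib
import json

_cache = {}

def cache_key(model, messages, temperature):
    """Stable key over the full request; only temperature == 0 is cacheable."""
    blob = json.dumps({"model": model, "messages": messages, "t": temperature},
                      sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def cached_complete(provider, model, messages, temperature=0.0):
    if temperature != 0.0:
        return provider(model, messages)  # non-deterministic: never cache
    key = cache_key(model, messages, temperature)
    if key not in _cache:
        _cache[key] = provider(model, messages)
    return _cache[key]
```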
Deterministic Requests (temperature = 0):

Next Steps
- Request Lifecycle: Provider execution in request flow
- System Invariants: Provider timeout and fallback invariants