Performance Guide

This guide covers performance optimization strategies for Hapax, including HTTP/3, caching, queuing, and load management.

Performance Features

HTTP/3 Support

Hapax supports HTTP/3 (QUIC) for improved performance:

server:
  http3:
    enabled: true
    port: 443
    tls_cert_file: "/etc/certs/server.crt"
    tls_key_file: "/etc/certs/server.key"
    idle_timeout: 30s
    max_bi_streams_concurrent: 100     # Concurrent bidirectional streams
    max_uni_streams_concurrent: 100     # Concurrent unidirectional streams
    max_stream_receive_window: 6291456       # 6MB stream window
    max_connection_receive_window: 15728640   # 15MB connection window
    enable_0rtt: true            # Enable 0-RTT for faster connections
    max_0rtt_size: 16384         # 16KB max 0-RTT size
    allow_0rtt_replay: false     # Disable replay protection
    udp_receive_buffer_size: 8388608   # 8MB UDP buffer

Benefits of HTTP/3:

  • Improved connection establishment
  • Better multiplexing
  • Reduced head-of-line blocking
  • Enhanced mobile performance
  • Faster connection recovery

Response Caching

Three caching strategies available:

llm:
  cache:
    enable: true
    type: "redis"        # Options: memory, redis, file
    ttl: 24h            # Cache entry lifetime
    max_size: 1000      # Maximum entries/size
    redis:              # Redis-specific settings
      address: "localhost:6379"
      password: ${REDIS_PASSWORD}
      db: 0

Cache types:

  • Memory: Fast, non-persistent, cleared on restart
  • Redis: Persistent, distributed, good for clusters
  • File: Persistent, good for single instances

Request Queuing

Queue system for high-load scenarios:

queue:
  enabled: true
  initial_size: 1000         # Starting queue capacity
  state_path: "/var/lib/hapax/queue.state"  # Persistence path
  save_interval: 30s         # State save frequency

Benefits:

  • Handles traffic spikes
  • Prevents system overload
  • Optional state persistence
  • Configurable queue size

Circuit Breaker

Protects system from cascading failures:

circuit_breaker:
  max_requests: 100          # Requests in half-open state
  interval: 30s              # Monitoring interval
  timeout: 10s              # Time in open state
  failure_threshold: 5      # Failures before opening

States:

  • Closed: Normal operation
  • Open: Stop requests after failures
  • Half-Open: Testing recovery

Provider Failover

Automatic provider switching for reliability:

providers:
  anthropic:
    type: anthropic
    model: claude-3-haiku
    api_key: ${ANTHROPIC_API_KEY}
  openai:
    type: openai
    model: gpt-4
    api_key: ${OPENAI_API_KEY}

provider_preference:
  - anthropic
  - openai

Features:

  • Automatic failover
  • Health monitoring
  • Configurable preference order
  • Seamless switching

Performance Tuning

Memory Optimization

Adjust these settings based on available memory:

  • max_header_bytes: HTTP header size limit
  • max_stream_receive_window: Per-stream buffer
  • max_connection_receive_window: Per-connection buffer
  • Cache size limits

Concurrency Settings

Tune these for your workload:

  • max_bi_streams_concurrent: Bidirectional streams
  • max_uni_streams_concurrent: Unidirectional streams
  • Queue size and persistence
  • Circuit breaker thresholds

Network Optimization

Network performance settings:

  • HTTP/3 buffer sizes
  • UDP receive buffer size
  • Idle timeouts
  • 0-RTT configuration

Monitoring Performance

Use built-in metrics:

routes:
  - path: "/metrics"
    handler: "metrics"
    version: "v1"
    methods: ["GET"]
    middleware: ["auth"]

Available metrics:

  • Request latencies
  • Queue lengths
  • Cache hit rates
  • Circuit breaker states
  • Provider health status

Best Practices

Development Environment

server:
  port: 8080
  http3:
    enabled: false
llm:
  cache:
    type: "memory"
    max_size: 1000
queue:
  enabled: false

Production Environment

server:
  port: 443
  http3:
    enabled: true
    max_bi_streams_concurrent: 200
    max_stream_receive_window: 8388608  # 8MB
llm:
  cache:
    type: "redis"
    ttl: 24h
queue:
  enabled: true
  initial_size: 5000
  state_path: "/var/lib/hapax/queue.state"
circuit_breaker:
  max_requests: 200
  failure_threshold: 10

High-Load Environment

server:
  http3:
    max_bi_streams_concurrent: 500
    max_stream_receive_window: 16777216  # 16MB
    max_connection_receive_window: 33554432  # 32MB
    udp_receive_buffer_size: 16777216  # 16MB
llm:
  cache:
    type: "redis"
    max_size: 10000
queue:
  enabled: true
  initial_size: 10000
circuit_breaker:
  max_requests: 500
  interval: 60s

Troubleshooting

Common performance issues and solutions:

High Latency

  • Enable HTTP/3
  • Increase stream windows
  • Adjust UDP buffer size
  • Check provider health

Memory Usage

  • Reduce cache size
  • Lower stream limits
  • Adjust queue size
  • Monitor metrics

Request Failures

  • Check circuit breaker logs
  • Verify provider health
  • Adjust retry settings
  • Enable failover

Queue Overflow

  • Increase queue size
  • Enable persistence
  • Adjust circuit breaker
  • Scale horizontally

Back to top

Licensed under Apache License, Version 2.0.