Performance Guide
This guide covers performance optimization strategies for Hapax, including HTTP/3, caching, queuing, and load management.
Performance Features
HTTP/3 Support
Hapax supports HTTP/3 (QUIC) for improved performance:
server:
http3:
enabled: true
port: 443
tls_cert_file: "/etc/certs/server.crt"
tls_key_file: "/etc/certs/server.key"
idle_timeout: 30s
max_bi_streams_concurrent: 100 # Concurrent bidirectional streams
max_uni_streams_concurrent: 100 # Concurrent unidirectional streams
max_stream_receive_window: 6291456 # 6MB stream window
max_connection_receive_window: 15728640 # 15MB connection window
enable_0rtt: true # Enable 0-RTT for faster connections
max_0rtt_size: 16384 # 16KB max 0-RTT size
allow_0rtt_replay: false # Disable replay protection
udp_receive_buffer_size: 8388608 # 8MB UDP buffer
Benefits of HTTP/3:
- Improved connection establishment
- Better multiplexing
- Reduced head-of-line blocking
- Enhanced mobile performance
- Faster connection recovery
Response Caching
Three caching strategies available:
llm:
cache:
enable: true
type: "redis" # Options: memory, redis, file
ttl: 24h # Cache entry lifetime
max_size: 1000 # Maximum entries/size
redis: # Redis-specific settings
address: "localhost:6379"
password: ${REDIS_PASSWORD}
db: 0
Cache types:
- Memory: Fast, non-persistent, cleared on restart
- Redis: Persistent, distributed, good for clusters
- File: Persistent, good for single instances
Request Queuing
Queue system for high-load scenarios:
queue:
enabled: true
initial_size: 1000 # Starting queue capacity
state_path: "/var/lib/hapax/queue.state" # Persistence path
save_interval: 30s # State save frequency
Benefits:
- Handles traffic spikes
- Prevents system overload
- Optional state persistence
- Configurable queue size
Circuit Breaker
Protects system from cascading failures:
circuit_breaker:
max_requests: 100 # Requests in half-open state
interval: 30s # Monitoring interval
timeout: 10s # Time in open state
failure_threshold: 5 # Failures before opening
States:
- Closed: Normal operation
- Open: Stop requests after failures
- Half-Open: Testing recovery
Provider Failover
Automatic provider switching for reliability:
providers:
anthropic:
type: anthropic
model: claude-3-haiku
api_key: ${ANTHROPIC_API_KEY}
openai:
type: openai
model: gpt-4
api_key: ${OPENAI_API_KEY}
provider_preference:
- anthropic
- openai
Features:
- Automatic failover
- Health monitoring
- Configurable preference order
- Seamless switching
Performance Tuning
Memory Optimization
Adjust these settings based on available memory:
max_header_bytes
: HTTP header size limitmax_stream_receive_window
: Per-stream buffermax_connection_receive_window
: Per-connection buffer- Cache size limits
Concurrency Settings
Tune these for your workload:
max_bi_streams_concurrent
: Bidirectional streamsmax_uni_streams_concurrent
: Unidirectional streams- Queue size and persistence
- Circuit breaker thresholds
Network Optimization
Network performance settings:
- HTTP/3 buffer sizes
- UDP receive buffer size
- Idle timeouts
- 0-RTT configuration
Monitoring Performance
Use built-in metrics:
routes:
- path: "/metrics"
handler: "metrics"
version: "v1"
methods: ["GET"]
middleware: ["auth"]
Available metrics:
- Request latencies
- Queue lengths
- Cache hit rates
- Circuit breaker states
- Provider health status
Best Practices
Development Environment
server:
port: 8080
http3:
enabled: false
llm:
cache:
type: "memory"
max_size: 1000
queue:
enabled: false
Production Environment
server:
port: 443
http3:
enabled: true
max_bi_streams_concurrent: 200
max_stream_receive_window: 8388608 # 8MB
llm:
cache:
type: "redis"
ttl: 24h
queue:
enabled: true
initial_size: 5000
state_path: "/var/lib/hapax/queue.state"
circuit_breaker:
max_requests: 200
failure_threshold: 10
High-Load Environment
server:
http3:
max_bi_streams_concurrent: 500
max_stream_receive_window: 16777216 # 16MB
max_connection_receive_window: 33554432 # 32MB
udp_receive_buffer_size: 16777216 # 16MB
llm:
cache:
type: "redis"
max_size: 10000
queue:
enabled: true
initial_size: 10000
circuit_breaker:
max_requests: 500
interval: 60s
Troubleshooting
Common performance issues and solutions:
High Latency
- Enable HTTP/3
- Increase stream windows
- Adjust UDP buffer size
- Check provider health
Memory Usage
- Reduce cache size
- Lower stream limits
- Adjust queue size
- Monitor metrics
Request Failures
- Check circuit breaker logs
- Verify provider health
- Adjust retry settings
- Enable failover
Queue Overflow
- Increase queue size
- Enable persistence
- Adjust circuit breaker
- Scale horizontally