Security Guide

Hapax implements a comprehensive security architecture that protects your LLM service through multiple integrated layers. This guide explains how each security component works and how to configure them for your specific needs.

Security Architecture

The security system in Hapax is built around a modular architecture where each component provides specific protections while working together as a cohesive system. At the core of this architecture is a configuration system that manages all security settings through a structured YAML format.

The server component forms the first line of defense, handling network security, transport encryption, and request validation. It implements timeouts, connection limits, and protocol-specific protections. Behind this, the queue system manages resource allocation and prevents system exhaustion by controlling request flow.

For external communications, the provider access system securely manages API keys and implements automatic failover mechanisms. This is complemented by a comprehensive monitoring system that provides real-time visibility into security events through structured logging and metrics collection.

Here’s how these components are organized in the configuration:

server:
  # Network and transport security settings
  read_timeout: 30s
  write_timeout: 45s
  max_header_bytes: 2097152
  http3:
    enabled: false
    # TLS and protocol security settings

queue:
  # Resource protection settings
  enabled: true
  initial_size: 1000
  # State management settings

llm:
  # Provider security settings
  api_key: ${LLM_API_KEY}
  # Backup provider configuration
  
logging:
  # Audit and monitoring settings
  level: info
  format: json

metrics:
  # Security metrics collection
  enabled: true
  # Metric configuration

Core Security Features

Hapax provides security by default through carefully chosen configurations that protect your service from the moment it starts. These defaults are designed to prevent common security issues while remaining flexible enough to adapt to different deployment scenarios.

The default configuration implements several key protections:

Request timeouts prevent resource exhaustion by limiting how long a request can take. The read timeout (30 seconds) controls how long the server will wait for a complete request, while the write timeout (45 seconds) ensures responses don’t hang indefinitely.

Header size limits (2MB by default) protect against memory exhaustion attacks by restricting the amount of data that can be sent in HTTP headers. This is particularly important for preventing denial of service attempts through oversized headers.

API keys are supplied through environment variables rather than written into configuration files, which keeps them out of version control and shared configuration. The system supports multiple provider keys with automatic failover capabilities.

Error handling includes automatic retries for specific types of failures, with configurable delays and backoff strategies. This helps maintain service stability during transient issues while preventing excessive retry attempts that could worsen an outage.

When to Enhance Security

Your security needs will evolve as your deployment grows and faces different challenges. Consider implementing additional security measures in these scenarios:

Production Deployment: When moving from development to production, you’ll need to enable additional security features like TLS certificates, rate limiting, and audit logging.

Sensitive Data Handling: If your service processes sensitive information, implement encryption at rest and in transit, along with strict access controls.

Audit Requirements: When compliance or internal policies require tracking of system access and changes, enable comprehensive audit logging and monitoring.

High-Scale Operations: As your service scales, implement additional protections against resource exhaustion and denial of service attacks.

Compliance Requirements: When meeting specific compliance standards, enable relevant security controls and monitoring capabilities.

Core Security Components

Request Queue Protection

The queue system protects against resource exhaustion and keeps the service stable by controlling the flow of requests. Here’s how it is configured:

queue:
  enabled: true                # Enable queue protection
  initial_size: 1000          # Starting queue capacity
  state_path: "/var/queue"    # State persistence location
  save_interval: "30s"        # State backup frequency

The queue implements several critical security mechanisms:

Request Lifecycle Management: Requests are managed through a FIFO queue. When a request arrives, it is assigned a dedicated channel that signals its completion status, ensuring proper cleanup even when processing fails. The queue position is stored in the request context under a type-safe key (queuePositionKey), so it can be tracked and monitored throughout the request’s lifetime. Cleanup relies on three mechanisms: channel-based signaling for completion, defer statements that release resources even during panics, and queue management that removes completed requests.

Thread Safety Protection: Thread safety is achieved through several layers. A read-write mutex (RWMutex) protects all queue operations, preventing race conditions during concurrent access. The queue size is maintained with atomic operations, so the count stays accurate under high concurrency. Request completion is coordinated through channel-based synchronization, which provides a safe way to signal when requests are finished. Defer-based cleanup ensures that resources are always released, even on error paths.

State Persistence Security: The queue’s state persistence keeps operation reliable across server restarts by using an atomic write pattern. When saving state, the system first writes to a temporary file with 0644 permissions (read/write for the owner, read-only for everyone else) and then atomically renames it over the old state file, so a crash during a save cannot leave a partially written file behind. If the storage directory does not exist, it is created with 0755 permissions. On startup, the system attempts to restore the previous state and falls back to the initial configuration if the state file is missing or invalid.

Health Monitoring: The queue implements comprehensive health monitoring through Prometheus metrics. It actively tracks the number of requests in various states: those waiting in the queue and those being processed. The system measures queue wait times and processing duration for performance analysis. Error conditions, such as queue capacity limits and persistence failures, are tracked through dedicated error metrics. All metrics are automatically updated through deferred functions and atomic operations to ensure accuracy even during concurrent operations.

Resilience is built into every queue operation. Completed requests are cleaned up automatically through a combination of defer statements and channel-based signaling, and when a request finishes or fails, all associated resources are released. Because no references to finished requests are retained, Go’s garbage collector can reclaim their memory, which prevents leaks. During shutdown, the system performs a graceful cleanup, waiting for in-flight requests to complete and making a final state save before terminating.

API Key Management

The API key management system in Hapax provides secure handling of provider credentials through a flexible configuration system. The LLMConfig structure supports multiple providers (OpenAI, Anthropic, Ollama) with individual API keys and settings. Each key is stored using environment variable substitution (e.g., ${OPENAI_API_KEY}), ensuring sensitive credentials are never hardcoded in configuration files.

The system supports a primary provider configuration with backup providers for failover scenarios. Each provider configuration includes the API key, model selection, and endpoint URL, allowing for fine-grained control over service access. For example:

llm:
  # Primary provider configuration
  provider: "anthropic"
  api_key: ${ANTHROPIC_API_KEY}
  endpoint: "https://api.anthropic.com/v1"
  
  # Backup provider configuration
  backup_providers:
    - provider: "openai"
      api_key: ${OPENAI_API_KEY}
      model: "gpt-3.5-turbo"
    - provider: "ollama"
      api_key: ${OLLAMA_API_KEY}
      model: "llama2"

Provider health is actively monitored through a dedicated health check system. The ProviderHealthCheck configuration enables automatic monitoring with configurable intervals and failure thresholds:

health_check:
  enabled: true              # Enables continuous monitoring
  interval: 15s              # Health check frequency
  timeout: 5s                # Maximum time for health check
  failure_threshold: 2       # Failures before marking unhealthy

The system implements sophisticated error handling through a RetryConfig that manages transient failures and rate limiting. The retry mechanism uses exponential backoff with configurable parameters:

retry:
  max_retries: 5             # Maximum retry attempts
  initial_delay: 100ms       # Starting delay
  max_delay: 5s              # Maximum backoff time
  multiplier: 1.5            # Exponential increase factor
  retryable_errors:          # Specific error handling
    - rate_limit
    - timeout
    - server_error

This configuration enables automatic failover between providers based on their health status and availability. The system monitors each provider’s performance and automatically switches to backup providers when necessary, ensuring continuous service availability while maintaining secure API key handling.

HTTP/3 Security

HTTP/3 support in Hapax requires TLS on every connection. The HTTP3Config structure defines all security-related settings and enforces secure defaults. Here’s the detailed configuration:

server:
  http3:
    enabled: true                        # HTTP/3 must be explicitly enabled
    port: 443                           # Standard HTTPS/QUIC port
    tls_cert_file: "/certs/server.crt"  # Required TLS certificate
    tls_key_file: "/certs/server.key"   # Required private key
    
    # Connection Management
    idle_timeout: 30s                   # Maximum idle connection time
    max_bi_streams_concurrent: 100      # Bidirectional stream limit
    max_uni_streams_concurrent: 100     # Unidirectional stream limit
    
    # Flow Control
    max_stream_receive_window: "6MB"    # Per-stream buffer limit
    max_connection_receive_window: "15MB" # Per-connection buffer limit
    udp_receive_buffer_size: "8MB"      # UDP socket buffer size
    
    # Early Data Security
    enable_0rtt: true                   # Optional 0-RTT support
    max_0rtt_size: "16KB"              # 0-RTT data size limit
    allow_0rtt_replay: false           # Replay protection enabled

The transport layer enforces TLS 1.3 for all HTTP/3 connections, as QUIC itself requires. Every full TLS 1.3 handshake uses an ephemeral key exchange, which provides forward secrecy. The system requires a valid TLS certificate and private key, specified through the TLSCertFile and TLSKeyFile fields (tls_cert_file and tls_key_file in YAML).
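A minimal sketch of a TLS configuration suited to HTTP/3, using Go's crypto/tls. The newHTTP3TLSConfig function is hypothetical; MinVersion pins TLS 1.3 and "h3" is the ALPN protocol ID for HTTP/3. Passing the resulting config to a QUIC server (for example through a library such as quic-go) is not shown.

```go
package main

import (
	"crypto/tls"
	"errors"
)

// newHTTP3TLSConfig loads the certificate pair and refuses anything below TLS 1.3.
func newHTTP3TLSConfig(certFile, keyFile string) (*tls.Config, error) {
	if certFile == "" || keyFile == "" {
		return nil, errors.New("http3: tls_cert_file and tls_key_file are required")
	}
	cert, err := tls.LoadX509KeyPair(certFile, keyFile) // fails on missing or mismatched files
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS13, // QUIC (RFC 9001) requires TLS 1.3
		NextProtos:   []string{"h3"},   // ALPN protocol ID for HTTP/3 (RFC 9114)
	}, nil
}
```

Validating the paths at construction time gives the startup check described below: a missing or unreadable certificate stops the server before it accepts any traffic.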

Early data (0-RTT) can be enabled to reduce connection latency, but 0-RTT data is not protected against replay: an attacker who captures it can send it again. The allow_0rtt_replay setting controls this risk and should remain false, as in the example above. Early data is limited to 16KB by default to prevent abuse, and all requests are validated for proper headers and content.

Resource control is managed through a comprehensive flow control system. The configuration allows fine-tuning of various buffer sizes: per-stream receive windows are limited to 6MB, connection-level windows to 15MB, and UDP socket buffers to 8MB. These limits prevent memory exhaustion while maintaining good performance. Additionally, concurrent stream limits (100 for both bidirectional and unidirectional) prevent resource exhaustion from excessive stream creation.

The configuration system includes several safety measures. Certificate paths must be valid and accessible, port numbers are validated, and all security-related settings are checked during startup. The system enforces secure defaults, with HTTP/3 disabled by default and replay protection enabled when 0-RTT is used.

Request Processing

The request processing system implements security through a carefully designed middleware chain that handles different aspects of request security. Each middleware component provides specific security features while working together to create a comprehensive protection system:

server:
  middleware:
    - request_timer     # Performance monitoring
    - panic_recovery    # Error protection
    - cors             # Cross-origin security
    - queue            # Resource management
    - rate_limit       # Request throttling
    - auth             # Access control
    - logging          # Audit trail

The RequestTimer middleware provides essential performance monitoring and tracking. It wraps each HTTP handler to measure request processing time accurately. The middleware records the start time when a request arrives, tracks its execution through the handler chain, and calculates the total duration upon completion. This timing information is added to the response headers through X-Response-Time, enabling precise monitoring of request handling performance.

The PanicRecovery middleware keeps individual request failures isolated. It uses a deferred recover to catch panics raised while a request is processed on its goroutine. When a panic is detected, the middleware returns a controlled 500 Internal Server Error response instead of letting the panic propagate, so a failure in one request does not affect the rest of the system. Panics in goroutines that a handler starts itself are not covered and must be recovered inside those goroutines.

The CORS (Cross-Origin Resource Sharing) middleware implements a comprehensive security policy for cross-origin requests. It carefully controls which origins can access the API by setting appropriate security headers:

cors:
  allow_origins: ["*"]                  # Origin control
  allow_methods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"]  # Method restrictions
  allow_headers: ["Accept", "Authorization", "Content-Type", "X-CSRF-Token"]  # Header validation

The middleware answers preflight OPTIONS requests and attaches the configured CORS headers to responses. Note that allow_origins: ["*"], as shown above, permits every origin; in production, list only the specific origins that need access so that unknown sites cannot call the API from a browser.

The request processing system is designed to be both secure and efficient. Each middleware component focuses on a specific security aspect while maintaining high performance through careful implementation. The system uses efficient data structures and minimizes memory allocations, ensuring that security features don’t significantly impact request processing speed.

Network Security

The network security system implements comprehensive protection through carefully configured server settings and timeout mechanisms. The ServerConfig structure defines essential security parameters that protect against various network-based attacks:

server:
  # Basic Server Security
  port: 8080                    # Configurable server port
  read_timeout: 30s             # Request read protection
  write_timeout: 30s            # Response write protection
  max_header_bytes: 1048576     # Header size control (1MB)
  shutdown_timeout: 30s         # Graceful shutdown period

  # Health Monitoring
  health_check:
    enabled: true               # Active monitoring
    interval: 60s               # Check frequency
    timeout: 5s                 # Check deadline
    threshold: 3                # Failure limit
    checks:
      memory: "system"          # Resource monitoring
      latency: "http"          # Performance tracking

The server implements multiple timeout mechanisms to prevent resource exhaustion and denial of service attacks. The ReadTimeout setting limits the time allowed for reading the entire request, including the body, which protects against clients that send requests very slowly to hold connections open (Slowloris-style attacks). The WriteTimeout setting limits the time spent writing the response, which protects against clients that read responses very slowly (slow-read attacks) and ensures timely response delivery.

Resource control is implemented through limits on request components. The MaxHeaderBytes setting caps header size (1MB in this example), preventing header-based memory exhaustion attacks. The shutdown_timeout setting bounds how long a graceful shutdown waits for open connections to finish, and idle connections are cleaned up automatically.

Health monitoring provides continuous security validation through active checks. The health check system monitors various aspects of server operation:

health_check:
  memory_threshold: "90%"       # Memory usage limit
  cpu_threshold: "80%"          # CPU usage limit
  disk_threshold: "95%"         # Storage limit
  latency_threshold: "500ms"    # Response time limit

When health checks detect issues, the system can take automatic action to protect itself, such as temporarily rejecting new connections or initiating graceful shutdown procedures.

The network security system is designed for robust operation in production environments. It includes proper error handling for network issues, automatic recovery from transient failures, and detailed logging of security-relevant events. The system maintains high availability while protecting against common network-based attacks through its comprehensive security configuration.

Monitoring and Auditing

The monitoring and auditing system implements comprehensive observability through structured logging, health checks, and metrics collection. The configuration system provides detailed control over monitoring behavior:

logging:
  level: info                   # Logging verbosity
  format: json                  # Structured output

routes:
  - path: /metrics
    handler: metrics
    version: v1
    methods: [GET]
    middleware: [auth]          # Protected metrics endpoint

  - path: /health
    handler: health
    version: v1
    methods: [GET]             # Health check endpoint

  - path: /v1/completions
    handler: completion
    health_check:
      enabled: true
      interval: 30s            # Check frequency
      timeout: 5s              # Check deadline
      threshold: 3             # Failure limit
      checks:
        api: http              # API availability
        latency: threshold     # Performance monitoring

The logging system provides detailed security event tracking through structured JSON output. This format ensures that security-relevant events are easily parseable and can be integrated with security information and event management (SIEM) systems. The logging level can be adjusted to capture different levels of detail, from basic security events to detailed debug information.

Health monitoring is implemented through a sophisticated check system that monitors various aspects of the service:

health_check:
  enabled: true
  interval: 15s               # Check frequency
  timeout: 5s                 # Check deadline
  failure_threshold: 2        # Failure limit

The health check system actively monitors API endpoints, verifies latency thresholds, and tracks resource usage. When issues are detected, the system can automatically take corrective action or alert operators. Each route can have its own health check configuration, allowing for fine-grained monitoring of different service components.

Metrics collection is implemented through a protected /metrics endpoint that provides detailed performance and security metrics:

metrics:
  - request_duration_seconds    # Latency tracking
  - request_size_bytes         # Request size monitoring
  - response_size_bytes        # Response size tracking
  - active_requests            # Concurrent request count
  - error_total{type}          # Error tracking by type
  - circuit_breaker_state      # Circuit breaker status

The metrics system provides essential data for security monitoring, performance tracking, and capacity planning. All metrics are collected with appropriate labels to enable detailed analysis and alerting. The metrics endpoint is protected by authentication middleware to prevent unauthorized access to sensitive operational data.

The monitoring and auditing system is designed for production environments, providing comprehensive visibility into the service’s security posture while maintaining high performance. The system includes automatic cleanup of old logs, proper metric type selection for efficiency, and careful management of monitoring overhead.

Production Security Checklist

Before deploying Hapax to production, ensure all security features are properly configured and enabled. The following configuration represents a secure production setup:

server:
  # Server Security
  read_timeout: 30s
  write_timeout: 45s
  max_header_bytes: 2097152      # 2MB limit
  shutdown_timeout: 30s

  # HTTP/3 Security
  http3:
    enabled: true
    port: 443
    tls_cert_file: "/path/to/cert"
    tls_key_file: "/path/to/key"
    allow_0rtt_replay: false

# Provider Security
llm:
  provider: "ollama"
  api_key: "${OLLAMA_API_KEY}"   # Environment variable
  health_check:
    enabled: true
    interval: 15s
    timeout: 5s
    failure_threshold: 2
  
  # Backup Providers
  backup_providers:
    - provider: "anthropic"
      model: "claude-3-haiku"
      api_key: "${ANTHROPIC_API_KEY}"
    - provider: "openai"
      model: "gpt-3.5-turbo"
      api_key: "${OPENAI_API_KEY}"

# Circuit Breaker Protection
circuit_breaker:
  max_requests: 100
  interval: 30s
  timeout: 10s
  failure_threshold: 5

# Request Queue Management
queue:
  enabled: true
  initial_size: 1000
  state_path: "/var/queue"
  save_interval: 30s

# Monitoring Configuration
logging:
  level: "info"
  format: "json"

# Route Security
routes:
  - path: "/v1/completions"
    middleware: ["auth", "rate-limit", "cors", "logging"]
    health_check:
      enabled: true
      interval: 30s
      timeout: 5s
      threshold: 3

The configuration system implements several critical security features:

Environment Variable Protection: The system includes a sophisticated environment variable expansion mechanism that supports secure variable resolution:

func expandEnvVars(s string) (string, error) {
    // Secure environment variable expansion with:
    // 1. Default value handling: ${VAR:-default}
    // 2. Nested variable resolution
    // 3. Syntax validation
    // 4. Logging for traceability
    // ... (implementation omitted)
}

Default Security Configuration: The DefaultConfig() function provides secure defaults for all critical settings:

Timeouts configured to prevent resource exhaustion
HTTP/3 disabled by default for explicit opt-in
Replay protection enabled for 0-RTT when HTTP/3 is used
Authentication required for sensitive endpoints
Health monitoring enabled with reasonable thresholds
Circuit breaker configured for fail-fast behavior
Queue system configured for controlled request handling

The production deployment should verify these additional security measures:

TLS certificates are valid and properly configured
API keys are securely stored in environment variables
Logging is configured for security event tracking
Health checks are enabled and properly configured
Circuit breaker thresholds match production load
Queue system is properly sized for expected traffic
Monitoring endpoints are protected by authentication
CORS settings are appropriately restrictive
Rate limiting is configured for production traffic

Conclusion

Hapax implements a comprehensive security architecture that protects all aspects of the service through multiple layers of defense. The security features are deeply integrated into the core functionality:

Transport Security: The HTTP/3 implementation provides strong transport security through mandatory TLS 1.3, with careful configuration of timeouts, buffer sizes, and replay protection. The system supports both traditional HTTPS and modern QUIC protocols, ensuring secure communication across different network conditions.

Request Processing: The middleware chain implements essential security features such as request timing, panic recovery, CORS protection, and authentication. Each component is carefully designed to handle specific security concerns while maintaining high performance through efficient implementation.

Resource Protection: The system includes multiple resource protection mechanisms:

Circuit breaker pattern for fail-fast behavior
Request queue for traffic management
Rate limiting for abuse prevention
Memory protection through buffer limits
Graceful shutdown for clean termination

Monitoring and Auditing: Comprehensive observability is achieved through:

Structured JSON logging
Prometheus metrics collection
Health check system
Performance monitoring
Security event tracking

The configuration system provides secure defaults while allowing customization for different deployment scenarios. The environment variable expansion system ensures secure handling of sensitive configuration values, and the validation system prevents misconfigurations that could impact security.

For production deployments, this guide provides a security checklist and configuration examples that demonstrate secure settings for all components, along with guidance on security best practices, configuration options, and operational considerations.

The security architecture is designed to be both comprehensive and maintainable, with clear separation of concerns and well-defined interfaces between components. This design allows for future security enhancements while maintaining compatibility with existing deployments.