Configuration

Backend configuration is read from backend/.env or environment variables. APP_ENV=production enables fail-closed startup validation for JWT, credential encryption, and S3 secrets.

Security

Production deployments must override AUTH_JWT_SECRET, CREDENTIAL_ENCRYPTION_KEY, S3_ACCESS_KEY, and S3_SECRET_KEY
Provider API keys are encrypted at rest (AES-256)
Personal API keys are stored as HMAC-SHA256 hashes (non-reversible)
The frontend reaches the backend at the URL set in NEXT_PUBLIC_API_URL
Backend CORS origins are configured with CORS_ORIGINS
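The HMAC-SHA256 storage of personal API keys can be sketched with Python's standard library. This is an illustration only, not the project's code: the function names and the server-side `pepper` secret are assumptions.

```python
import hashlib
import hmac


def hash_api_key(api_key: str, pepper: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of an API key.

    `pepper` is a hypothetical server-side secret. Only the digest is
    stored, so the original key cannot be recovered from the database.
    """
    return hmac.new(pepper, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_api_key(presented_key: str, stored_digest: str, pepper: bytes) -> bool:
    """Recompute the digest for the presented key and compare in constant time."""
    candidate = hash_api_key(presented_key, pepper)
    return hmac.compare_digest(candidate, stored_digest)
```

Verification never decrypts anything: the presented key is hashed again and the two digests are compared.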

Environment Variables

Application

Variable	Description	Default
`APP_ENV`	Environment (`development`/`production`)	`development`
`LOG_LEVEL`	Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)	`INFO`
`ROOT_PATH`	API root path for reverse-proxy sub-path deployments (e.g. `/rag`)	—
`CORS_ORIGINS`	Comma-separated allowed CORS origins	`http://localhost:3000`
`FRONTEND_URL`	Public frontend URL (used in invite emails)	`http://localhost:3000`

Authentication

Variable	Description	Default
`AUTH_MODE`	Auth mode: `local`, `oidc`, or `both`	`local`
`AUTH_JWT_SECRET`	JWT signing secret (min 32 chars in prod)	required
`OIDC_ISSUER_URL`	OIDC provider issuer URL	—
`OIDC_AUDIENCE`	OIDC token audience	`m8ty-rag`
`OIDC_TENANT_CLAIM`	OIDC claim for tenant ID auto-selection	—

Infrastructure

Variable	Description	Default
`DATABASE_URL`	PostgreSQL connection string (the database must have the pgvector extension installed if Qdrant is not used)	required
`QDRANT_URL`	Qdrant vector database URL. If empty, pgvector (PostgreSQL) is used as vector backend	—
`QDRANT_COLLECTION_NAME`	Qdrant collection name	`documents`
`REDIS_URL`	Redis URL (task queue, caching)	required
`S3_ENDPOINT`	S3-compatible storage endpoint	required
`S3_ACCESS_KEY`	S3 access key	required
`S3_SECRET_KEY`	S3 secret key	required
`S3_ARTIFACTS_BUCKET`	S3 bucket name for document artifacts (auto-created on startup)	`m8ty-artifacts`
`CREDENTIAL_ENCRYPTION_KEY`	AES key for provider credential encryption	required

Vector Backend

The platform supports two vector storage backends, selected automatically:

Backend	Condition	Requirement
Qdrant	`QDRANT_URL` is set	Qdrant server running
pgvector	`QDRANT_URL` is empty or unset	PostgreSQL with pgvector extension (use `pgvector/pgvector:pg16` Docker image)

Both backends provide identical search behavior (dense cosine similarity, named vector spaces). The /health endpoint reports the active backend: {"status": "ok", "vector_backend": "qdrant"} or "pgvector".
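The selection rule is simple enough to sketch in a few lines. The function names below are hypothetical, and treating a whitespace-only `QDRANT_URL` as empty is an assumption rather than documented behavior.

```python
def select_vector_backend(env: dict[str, str]) -> str:
    """Return "qdrant" when QDRANT_URL is set and non-empty, else "pgvector"."""
    qdrant_url = (env.get("QDRANT_URL") or "").strip()
    return "qdrant" if qdrant_url else "pgvector"


def health_payload(env: dict[str, str]) -> dict[str, str]:
    """Build a /health response body in the shape described above."""
    return {"status": "ok", "vector_backend": select_vector_backend(env)}
```

Passing `os.environ` as `env` would give the backend that the current process environment selects.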

Embedding & LLM

Variable	Description	Default
`DEFAULT_EMBEDDING_PROVIDER`	Default embedding provider	`openai`
`OPENAI_API_KEY`	OpenAI API key for embeddings	—
`OLLAMA_URL`	Ollama endpoint	`http://localhost:11434`
`LITELLM_URL`	LiteLLM proxy URL	`http://localhost:4000`
`LITELLM_API_KEY`	LiteLLM API key	—

Email (SMTP)

Variable	Description	Default
`SMTP_ENABLED`	Enable SMTP email sending	`false`
`SMTP_HOST`	SMTP server hostname	—
`SMTP_PORT`	SMTP server port	`587`
`SMTP_USER`	SMTP authentication username	—
`SMTP_PASSWORD`	SMTP authentication password	—
`SMTP_FROM`	Sender email address	—
`SMTP_TLS`	Use TLS for SMTP	`true`

Observability

Variable	Description	Default
`LANGFUSE_PUBLIC_KEY`	Langfuse public key (optional)	—
`LANGFUSE_SECRET_KEY`	Langfuse secret key (optional)	—
`LANGFUSE_HOST`	Langfuse host	`https://cloud.langfuse.com`

Frontend

Variable	Description	Default
`NEXT_PUBLIC_API_URL`	Frontend → backend URL	`http://localhost:8000`

Deployment

Docker Compose (Development & Small Teams)

The recommended way to run RAGTY locally or for small teams:

docker compose up --build

This starts all services:

Service	Container	Port	Description
Backend	m8ty-rag-backend	8000	FastAPI application server
Worker	m8ty-rag-worker	—	ARQ async task processor (parsing, embedding)
Frontend	m8ty-rag-frontend	3000	Next.js web UI
Qdrant	qdrant	6333	Vector database (optional — remove to use pgvector)
PostgreSQL	postgres	5432	Relational database (also vector backend when Qdrant is not used)
Redis	redis	6379	Task queue + caching
MinIO	minio	9000/9001	Object storage (S3-compatible)

Stop:

docker compose down

Rebuild after code changes:

docker compose build backend frontend
docker compose up -d

Docker Networking

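Inside Docker, services reach each other by container name rather than localhost. A service running on the host itself is reached through host.docker.internal: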

Host URL	Docker Internal URL
`http://localhost:8300`	`http://host.docker.internal:8300`
`http://localhost:6333`	`http://qdrant:6333` (only if Qdrant is enabled)
`redis://localhost:6379`	`redis://redis:6379`
`http://localhost:9000`	`http://minio:9000`

Resource Requirements

Component	Min RAM	Recommended
Backend + Worker	2 GB	4 GB
DeepDoc parsing (PDF)	4 GB	8 GB
Qdrant (optional)	1 GB	4 GB (scales with document count)
PostgreSQL	256 MB	1 GB
Redis	128 MB	512 MB
Total (with Qdrant)	8 GB	16 GB
Total (pgvector only)	7 GB	13 GB

Kubernetes (Production)

RAGTY is fully Kubernetes-compatible. Each service runs as a separate deployment with its own scaling configuration.

Architecture on K8s

Ingress (NGINX/Traefik)
├── /        → Frontend Deployment (replicas: 2+)
├── /api/*   → Backend Deployment (replicas: 2+)
└── /mcp/    → Backend Deployment

Worker Deployment (replicas: 1-4, scaled by queue depth)
  └── GPU Worker (optional, concurrency=1 per GPU)

StatefulSets:
  ├── Qdrant (optional, persistent volume — omit when using pgvector)
  ├── PostgreSQL (or managed: RDS, CloudSQL)
  └── Redis (or managed: ElastiCache, Memorystore)

S3-compatible storage:
  └── AWS S3 / GCS / Azure Blob / MinIO

Helm Chart

The GitLab CI pipeline deploys via Helm:

helm upgrade m8ty-rag ./devops/helm/m8ty-rag \
  --install \
  --set backend.image=registry.example.com/m8ty-rag/backend:latest \
  --set frontend.image=registry.example.com/m8ty-rag/frontend:latest \
  --set 'imagePullSecrets[0].name=docker-pull-secret' \
  -f values-production.yaml \
  -n production

Key Kubernetes Considerations

Topic	Recommendation
Scaling	Backend is stateless — scale horizontally. Worker scales based on queue depth.
GPU Workers	Use node selectors/tolerations for GPU nodes. Set `concurrency=1` per GPU to prevent VRAM OOM.
Storage	Use managed S3 (not MinIO) in production. If using Qdrant, it needs persistent volumes with SSD. If using pgvector, storage is handled by PostgreSQL.
Database	Use managed PostgreSQL (RDS, CloudSQL, Azure Database) for HA.
Redis	Use managed Redis (ElastiCache, Memorystore) for HA.
Secrets	Store `AUTH_JWT_SECRET`, `CREDENTIAL_ENCRYPTION_KEY`, S3 keys in Kubernetes Secrets or external vault.
Health checks	Backend: `GET /health`. Frontend: TCP port 3000. Worker: ARQ heartbeat.
Ingress	Route `/api/*` and `/mcp/` to backend, everything else to frontend.
TLS	Terminate at ingress level (cert-manager + Let's Encrypt recommended).

Production Environment Variables

In production, set APP_ENV=production to enable:

Fail-closed validation (missing secrets = startup failure)
Minimum 32-char JWT secret enforcement
Mandatory credential encryption key
No development defaults
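A minimal sketch of what fail-closed validation could look like, assuming the four secrets named in the Security section are the required set. The function names are hypothetical, and the "no development defaults" rule is not modelled here.

```python
REQUIRED_SECRETS = (
    "AUTH_JWT_SECRET",
    "CREDENTIAL_ENCRYPTION_KEY",
    "S3_ACCESS_KEY",
    "S3_SECRET_KEY",
)
MIN_JWT_SECRET_LENGTH = 32


def production_config_errors(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the config passes."""
    if env.get("APP_ENV", "development") != "production":
        return []  # the strict checks only apply in production
    errors = []
    for name in REQUIRED_SECRETS:
        if not env.get(name):
            errors.append(f"{name} is missing")
    jwt_secret = env.get("AUTH_JWT_SECRET", "")
    if jwt_secret and len(jwt_secret) < MIN_JWT_SECRET_LENGTH:
        errors.append("AUTH_JWT_SECRET must be at least 32 characters")
    return errors


def validate_or_exit(env: dict[str, str]) -> None:
    """Fail closed: refuse to start when any check fails."""
    errors = production_config_errors(env)
    if errors:
        raise SystemExit("Invalid production configuration: " + "; ".join(errors))
```

Collecting all problems before exiting lets an operator fix every missing secret in one pass instead of one restart per error.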

Reverse Proxy (Sub-Path Deployment)

When the backend is served behind a reverse proxy at a sub-path (e.g. https://example.com/rag/ instead of root), set ROOT_PATH so FastAPI generates correct OpenAPI docs and redirect URLs:

ROOT_PATH=/rag

Nginx Example

location /rag/ {
    proxy_pass http://backend:8000/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Prefix /rag;
}

location / {
    proxy_pass http://frontend:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}

Kubernetes Ingress Example

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: m8ty-rag
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /rag(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: backend
                port:
                  number: 8000

When using ROOT_PATH, also update NEXT_PUBLIC_API_URL in the frontend to include the sub-path:

NEXT_PUBLIC_API_URL=https://example.com/rag
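The relationship between the public URL, the stripped path the backend sees, and ROOT_PATH can be illustrated with plain string handling. This sketch does not use FastAPI and only mirrors what the proxy and the URL settings do; all function names are hypothetical.

```python
def normalize_root_path(root_path: str) -> str:
    """Return "" for no prefix, else one leading slash and no trailing slash."""
    trimmed = root_path.strip().strip("/")
    return f"/{trimmed}" if trimmed else ""


def to_backend_path(public_path: str, root_path: str) -> str:
    """Mimic the proxy: drop the sub-path prefix before forwarding."""
    prefix = normalize_root_path(root_path)
    if prefix and (public_path == prefix or public_path.startswith(prefix + "/")):
        return public_path[len(prefix):] or "/"
    return public_path


def to_public_url(origin: str, root_path: str, backend_path: str = "") -> str:
    """Build the externally visible URL for a backend path."""
    return origin.rstrip("/") + normalize_root_path(root_path) + backend_path
```

For example, `to_public_url("https://example.com", "/rag")` yields the NEXT_PUBLIC_API_URL value shown above, while `to_backend_path("/rag/health", "/rag")` yields the `/health` path that the backend actually receives.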

SSL / HTTPS

TLS must always be terminated at the reverse proxy or ingress layer. Never run Uvicorn or Next.js with TLS directly in production: it complicates certificate renewal, and the health check and workers do not support it.

Option 1: Nginx with Let's Encrypt (Docker Compose)

Install Certbot on the host, obtain a certificate, then configure Nginx:

server {
    listen 80;
    server_name example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Frontend
    location / {
        proxy_pass http://frontend:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Backend API
    location /api/ {
        proxy_pass http://backend:8000/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Option 2: Caddy (automatic HTTPS, Docker Compose)

Caddy obtains and renews Let's Encrypt certificates automatically — no manual certificate management needed:

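# docker-compose.yml: add this service under the existing services: key
caddy:
  image: caddy:2-alpine
  ports:
    - "80:80"
    - "443:443"
  volumes:
    - ./Caddyfile:/etc/caddy/Caddyfile
    - caddy_data:/data

# docker-compose.yml: declare the named volume at the top level
volumes:
  caddy_data: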

# Caddyfile
example.com {
    handle /api/* {
        uri strip_prefix /api
        reverse_proxy backend:8000
    }
    handle {
        reverse_proxy frontend:3000
    }
}

Option 3: Kubernetes with cert-manager

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: m8ty-rag
  annotations:
    # assumes a ClusterIssuer named letsencrypt-prod already exists
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts: [example.com]
      secretName: m8ty-rag-tls
  rules:
    - host: example.com
      http:
        paths:
          - path: /api/
            pathType: Prefix
            backend:
              service: { name: backend, port: { number: 8000 } }
          - path: /
            pathType: Prefix
            backend:
              service: { name: frontend, port: { number: 3000 } }

Required environment variables for HTTPS

When running behind HTTPS, update these variables:

CORS_ORIGINS=https://example.com
FRONTEND_URL=https://example.com
NEXT_PUBLIC_API_URL=https://example.com/api
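A small pre-deployment check can catch a variable that still points at an `http://` URL. The sketch below is a hypothetical helper, not part of the project; it only inspects the URL scheme of the three variables listed above.

```python
from urllib.parse import urlparse


def https_env_problems(env: dict[str, str]) -> list[str]:
    """Return one message per variable (or CORS entry) that is not https."""
    problems = []
    # CORS_ORIGINS is a comma-separated list, so check each entry separately.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if not origins:
        problems.append("CORS_ORIGINS is empty")
    for origin in origins:
        if urlparse(origin).scheme != "https":
            problems.append(f"CORS_ORIGINS entry is not https: {origin}")
    for name in ("FRONTEND_URL", "NEXT_PUBLIC_API_URL"):
        value = env.get(name, "")
        if urlparse(value).scheme != "https":
            problems.append(f"{name} is not https: {value!r}")
    return problems
```

An empty list means all three variables use `https`; the check says nothing about whether the certificate or routing is correct.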

Workers

Workers are background processes that handle long-running, resource-intensive tasks asynchronously, keeping the API server fast and responsive.

Why Workers?

Document parsing can take seconds to minutes depending on file size and complexity. Without workers:

Upload API would block for minutes (timeout risk)
Users would see a frozen UI
A single large PDF would block all other requests

With workers, the flow is:

User uploads document → API returns immediately (HTTP 202)
                      → Task queued in Redis
                      → Worker picks up task in background
                      → Worker: parse → chunk → embed → store
                      → Document state updated to "DONE"
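The flow above can be imitated in-process with `asyncio.Queue` standing in for the Redis-backed queue. This is a sketch of the pattern only, not the ARQ worker itself: every state name except "DONE" is an assumption, and the stages are placeholders.

```python
import asyncio


async def upload(queue: asyncio.Queue, states: dict, doc_id: str) -> int:
    """Record the document, enqueue the work, and return HTTP 202 at once."""
    states[doc_id] = "QUEUED"
    await queue.put(doc_id)
    return 202


async def worker(queue: asyncio.Queue, states: dict) -> None:
    """Pull tasks forever and run parse, chunk, embed, store for each one."""
    while True:
        doc_id = await queue.get()
        for stage in ("PARSING", "CHUNKING", "EMBEDDING", "STORING"):
            states[doc_id] = stage
            await asyncio.sleep(0)  # placeholder for the real stage
        states[doc_id] = "DONE"
        queue.task_done()


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    states: dict = {}
    worker_task = asyncio.create_task(worker(queue, states))
    status = await upload(queue, states, "doc-1")
    state_after_upload = states["doc-1"]  # the worker has not run yet
    await queue.join()  # wait until the worker has finished the task
    worker_task.cancel()
    await asyncio.gather(worker_task, return_exceptions=True)
    return status, state_after_upload, states["doc-1"]
```

Running `asyncio.run(demo())` shows the key property: the upload call returns before any processing starts, and the document reaches "DONE" later in the background.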

Worker Types

Worker	Task	Concurrency	Resource
CPU Worker	MarkItDown parsing, embedding API calls, metadata stamping	High (8+)	CPU only
GPU Worker	DeepDoc vision (layout detection, OCR, table structure)	1 per GPU	GPU VRAM

GPU workers are gated to concurrency=1 to prevent VRAM Out-of-Memory crashes. Multiple GPU workers can run on separate GPU devices.
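Gating of this kind is commonly done with a semaphore. The sketch below is an assumption about the mechanism, not the project's implementation; it measures how many simulated GPU jobs run at the same time.

```python
import asyncio


async def run_gpu_jobs(job_count: int, gpu_concurrency: int = 1) -> int:
    """Run jobs behind a semaphore and return the peak number running at once."""
    gate = asyncio.Semaphore(gpu_concurrency)
    running = 0
    peak = 0

    async def gpu_job() -> None:
        nonlocal running, peak
        async with gate:  # at most gpu_concurrency jobs pass this point
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # placeholder for layout detection or OCR
            running -= 1

    await asyncio.gather(*(gpu_job() for _ in range(job_count)))
    return peak
```

With the default of 1, five queued jobs run strictly one after another, so only one job's working set occupies VRAM at any moment.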

Configuration

Variable	Description	Default
`WORKER_COUNT`	Number of worker processes	`1`
`WORKER_MAX_JOBS`	Max concurrent jobs per worker	`1`
`GPU_WORKER_CONCURRENCY`	Simultaneous GPU tasks	`1`

Scalability

RAGTY is designed for horizontal scaling at every layer:

Scaling Strategy

                    ┌──────────────────────────────────────┐
                    │          Load Balancer / Ingress      │
                    └─────────────────┬────────────────────┘
                                      │
              ┌───────────────────────┼───────────────────────┐
              │                       │                       │
     ┌────────┴────────┐    ┌────────┴────────┐    ┌────────┴────────┐
     │  Backend (N)     │    │  Backend (N)     │    │  Backend (N)     │
     │  Stateless API   │    │  Stateless API   │    │  Stateless API   │
     └─────────────────┘    └─────────────────┘    └─────────────────┘
              │                       │                       │
              └───────────────────────┼───────────────────────┘
                                      │
         ┌────────────────────────────┼────────────────────────────┐
         │                            │                            │
┌────────┴────────┐          ┌────────┴────────┐         ┌────────┴────────┐
│  Worker (N)      │          │  Vector Backend  │         │  PostgreSQL HA   │
│  Scale by queue  │          │  Qdrant or       │         │  Read replicas   │
└─────────────────┘          │  pgvector        │         └─────────────────┘
                              └─────────────────┘

Component Scaling

Component	Scaling Model	How
Backend API	Horizontal (stateless)	Add replicas behind load balancer. No session state — any instance handles any request.
Workers	Horizontal (queue-based)	Add worker replicas. Redis queue distributes tasks automatically. More workers = faster document processing.
Qdrant	Vertical + Sharding	Add RAM for larger collections. Shard across nodes for millions of vectors. Replicas for read throughput. Only relevant when `QDRANT_URL` is set.
pgvector	Scales with PostgreSQL	Suitable for smaller deployments. Uses IVFFlat indexes. Scales via PostgreSQL read replicas. Used when `QDRANT_URL` is empty.
PostgreSQL	Vertical + Read replicas	Primary for writes, replicas for read-heavy queries (dataset listing, user auth).
Redis	Vertical	Single instance handles thousands of queue operations/sec. Redis Cluster for extreme scale.
Object Storage (S3)	Managed (effectively unlimited)	Managed S3/GCS/Azure Blob scales automatically. No intervention needed.
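Queue-based worker scaling reduces to a small calculation. The sketch below uses the 1 to 4 replica range from the Kubernetes architecture section; the `jobs_per_worker` target is a hypothetical tuning value, not a documented setting.

```python
import math


def desired_worker_replicas(
    queue_depth: int,
    jobs_per_worker: int = 10,  # assumed backlog each worker should absorb
    min_replicas: int = 1,
    max_replicas: int = 4,
) -> int:
    """Return a replica count proportional to queue depth, clamped to a range."""
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    wanted = math.ceil(max(queue_depth, 0) / jobs_per_worker)
    return max(min_replicas, min(max_replicas, wanted))
```

An autoscaler would evaluate this periodically against the current Redis queue length and adjust the Worker deployment accordingly.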

Bottlenecks & Solutions

Bottleneck	Symptom	Solution
Parsing too slow	Documents stuck in "processing"	Add more workers
Search latency high	Slow query responses	Add Qdrant replicas (if using Qdrant), or switch from pgvector to Qdrant for large collections; enable caching
Embedding API slow	Parse tasks queue up	Use local embeddings (FastEmbed) or batch requests
Upload bursts	API timeouts	Scale backend replicas, increase Redis queue capacity
Large collections	Qdrant memory pressure	Shard Qdrant collection, use disk-backed indexes; or switch to pgvector for moderate scale

Multi-Tenancy & Isolation

Each tenant's data is isolated at every layer:

Qdrant (when enabled): Metadata filter tenant_id on every query (no cross-tenant data leakage)
pgvector (when Qdrant is not used): tenant_id filter applied via SQL WHERE clause on every query
PostgreSQL: All tables have tenant_id column with enforced filtering
S3: Artifacts stored under tenant-scoped prefixes (s3://bucket/{tenant_id}/...)
API: Every request goes through resolve_tenant_scope — fail-closed (no scope = no access)
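The layers above share one idea: resolve the tenant once, refuse to continue without it, then thread the ID into every storage call. The sketch below is a hypothetical stand-in for `resolve_tenant_scope`; its signature, the `chunks` table, and the key layout after the tenant prefix are assumptions.

```python
class TenantScopeError(PermissionError):
    """Raised when a request carries no usable tenant scope."""


def resolve_tenant_scope(claims: dict) -> str:
    """Fail closed: return the tenant ID or raise, never fall back to a default."""
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id.strip():
        raise TenantScopeError("no tenant scope, access denied")
    return tenant_id


def artifact_key(tenant_id: str, document_id: str, filename: str) -> str:
    """Build an S3 object key under the tenant-scoped prefix."""
    return f"{tenant_id}/{document_id}/{filename}"


def tenant_filtered_query(tenant_id: str) -> tuple[str, tuple]:
    """Return a parameterized SQL query that always filters by tenant_id."""
    return ("SELECT id, content FROM chunks WHERE tenant_id = %s", (tenant_id,))
```

Passing the tenant ID as a bound parameter, rather than formatting it into the SQL string, keeps the filter safe from injection.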

Tracing & Health

Langfuse Tracing

Every query is traced with spans for each pipeline stage:

trace: user_query
├── span: query_condensation (if multi-turn)
├── span: embedding_generation
├── span: hybrid_search
│   ├── span: dense_search
│   └── span: sparse_bm25
├── span: reranking
└── span: llm_generation
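The nesting of spans can be illustrated with a small recorder built on context managers. This is deliberately not the Langfuse SDK: `SpanRecorder` and `trace_query` are hypothetical names, and the sketch only reproduces the hierarchy shown above.

```python
from contextlib import contextmanager


class SpanRecorder:
    """Collect span names, indented by nesting depth."""

    def __init__(self) -> None:
        self.lines: list[str] = []
        self._depth = 0

    @contextmanager
    def span(self, name: str):
        self.lines.append("  " * self._depth + name)
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1  # restore depth even if the stage raises


def trace_query(recorder: SpanRecorder, multi_turn: bool = False) -> list[str]:
    """Open one span per pipeline stage in the order listed above."""
    with recorder.span("user_query"):
        if multi_turn:
            with recorder.span("query_condensation"):
                pass
        with recorder.span("embedding_generation"):
            pass
        with recorder.span("hybrid_search"):
            with recorder.span("dense_search"):
                pass
            with recorder.span("sparse_bm25"):
                pass
        with recorder.span("reranking"):
            pass
        with recorder.span("llm_generation"):
            pass
    return recorder.lines
```

Each `pass` marks where the real stage would run; a tracing SDK would additionally attach timings, inputs, and outputs to every span.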

Health Check

GET /health → {"status": "ok", "vector_backend": "qdrant"} (or "pgvector")

14 June 2026