Instance Help

Configuration

Backend configuration is read from backend/.env or environment variables. APP_ENV=production enables fail-closed startup validation for JWT, credential encryption, and S3 secrets.

Security

  • Production deployments must override AUTH_JWT_SECRET, CREDENTIAL_ENCRYPTION_KEY, S3_ACCESS_KEY, and S3_SECRET_KEY

  • Provider API keys are encrypted at rest (AES-256)

  • Personal API keys are stored as HMAC-SHA256 hashes (non-reversible)

  • The URL the frontend uses to reach the backend is configured with NEXT_PUBLIC_API_URL

  • Backend CORS origins are configured with CORS_ORIGINS
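The HMAC scheme for personal API keys can be sketched with Python's standard library. This is a minimal illustration, not the backend's code: the server-side secret (`pepper`) and the function names `hash_api_key` and `verify_api_key` are hypothetical, because the document does not say which key RAGTY uses for the HMAC.

```python
import hashlib
import hmac
import secrets


def hash_api_key(api_key: str, pepper: bytes) -> str:
    """Return the hex HMAC-SHA256 digest that is stored instead of the key."""
    return hmac.new(pepper, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_api_key(presented: str, stored_digest: str, pepper: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    return hmac.compare_digest(hash_api_key(presented, pepper), stored_digest)


# Usage: generate a key once, show it to the user, persist only the digest.
pepper = secrets.token_bytes(32)        # hypothetical server-side HMAC secret
api_key = secrets.token_urlsafe(32)     # value shown to the user exactly once
stored_digest = hash_api_key(api_key, pepper)
```

Only the digest is persisted, so a database leak does not expose usable keys, and `hmac.compare_digest` keeps the check constant-time.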

Environment Variables

Application

| Variable | Description | Default |
|---|---|---|
| APP_ENV | Environment (development/production) | development |
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
| ROOT_PATH | API root path for reverse-proxy sub-path deployments (e.g. /rag) | |
| CORS_ORIGINS | Comma-separated allowed CORS origins | http://localhost:3000 |
| FRONTEND_URL | Public frontend URL (used in invite emails) | http://localhost:3000 |
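A comma-separated CORS_ORIGINS value has to become a list of origins before it can be handed to CORS middleware. A minimal sketch, assuming surrounding whitespace and trailing slashes should be ignored; the backend's exact parsing is not documented, and `parse_cors_origins` is a hypothetical name.

```python
import os
from typing import Optional


def parse_cors_origins(raw: Optional[str], default: str = "http://localhost:3000") -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into a list of origins."""
    value = default if raw is None else raw
    # Trim whitespace and trailing slashes; browsers send Origin without a path.
    origins = [item.strip().rstrip("/") for item in value.split(",")]
    return [origin for origin in origins if origin]  # drop empty entries


allowed_origins = parse_cors_origins(os.environ.get("CORS_ORIGINS"))
```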

Authentication

| Variable | Description | Default |
|---|---|---|
| AUTH_MODE | Auth mode: local, oidc, or both | local |
| AUTH_JWT_SECRET | JWT signing secret (at least 32 characters in production) | required |
| OIDC_ISSUER_URL | OIDC provider issuer URL | |
| OIDC_AUDIENCE | OIDC token audience | m8ty-rag |
| OIDC_TENANT_CLAIM | OIDC claim used for automatic tenant ID selection | |
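The three AUTH_MODE values determine which login methods are active. A hedged sketch: `parse_auth_mode` and `AuthConfig` are hypothetical names, and accepting the value case-insensitively is an assumption rather than documented behavior.

```python
from dataclasses import dataclass

VALID_AUTH_MODES = {"local", "oidc", "both"}


@dataclass(frozen=True)
class AuthConfig:
    local_enabled: bool  # username/password login with AUTH_JWT_SECRET-signed tokens
    oidc_enabled: bool   # login through the OIDC provider


def parse_auth_mode(mode: str = "local") -> AuthConfig:
    """Map AUTH_MODE to the login methods it enables; reject unknown values."""
    normalized = mode.strip().lower()
    if normalized not in VALID_AUTH_MODES:
        raise ValueError(f"AUTH_MODE must be one of {sorted(VALID_AUTH_MODES)}, got {mode!r}")
    return AuthConfig(
        local_enabled=normalized in {"local", "both"},
        oidc_enabled=normalized in {"oidc", "both"},
    )
```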

Infrastructure

| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string (the database must have the pgvector extension if Qdrant is not used) | required |
| QDRANT_URL | Qdrant vector database URL. If empty, pgvector (PostgreSQL) is used as the vector backend | |
| QDRANT_COLLECTION_NAME | Qdrant collection name | documents |
| REDIS_URL | Redis URL (task queue, caching) | required |
| S3_ENDPOINT | S3-compatible storage endpoint | required |
| S3_ACCESS_KEY | S3 access key | required |
| S3_SECRET_KEY | S3 secret key | required |
| S3_ARTIFACTS_BUCKET | S3 bucket name for document artifacts (auto-created on startup) | m8ty-artifacts |
| CREDENTIAL_ENCRYPTION_KEY | AES key for provider credential encryption | required |

Vector Backend

The platform supports two vector storage backends, selected automatically:

| Backend | Condition | Requirement |
|---|---|---|
| Qdrant | QDRANT_URL is set | A running Qdrant server |
| pgvector | QDRANT_URL is empty or unset | PostgreSQL with the pgvector extension (use the pgvector/pgvector:pg16 Docker image) |

Both backends provide identical search behavior (dense cosine similarity, named vector spaces). The /health endpoint reports the active backend: {"status": "ok", "vector_backend": "qdrant"} or "pgvector".
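The selection rule fits in a few lines. This sketch only restates the documented behavior (a non-empty QDRANT_URL selects Qdrant, otherwise pgvector) and the /health payload shape; the function names are hypothetical, and treating a whitespace-only value as empty is an assumption.

```python
from collections.abc import Mapping


def select_vector_backend(env: Mapping[str, str]) -> str:
    """Qdrant when QDRANT_URL is non-empty, otherwise pgvector."""
    qdrant_url = env.get("QDRANT_URL", "").strip()  # whitespace-only counts as empty (assumption)
    return "qdrant" if qdrant_url else "pgvector"


def health_payload(env: Mapping[str, str]) -> dict[str, str]:
    """Body returned by GET /health, including the active vector backend."""
    return {"status": "ok", "vector_backend": select_vector_backend(env)}
```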

Embedding & LLM

| Variable | Description | Default |
|---|---|---|
| DEFAULT_EMBEDDING_PROVIDER | Default embedding provider | openai |
| OPENAI_API_KEY | OpenAI API key for embeddings | |
| OLLAMA_URL | Ollama endpoint | http://localhost:11434 |
| LITELLM_URL | LiteLLM proxy URL | http://localhost:4000 |
| LITELLM_API_KEY | LiteLLM API key | |

Email (SMTP)

| Variable | Description | Default |
|---|---|---|
| SMTP_ENABLED | Enable SMTP email sending | false |
| SMTP_HOST | SMTP server hostname | |
| SMTP_PORT | SMTP server port | 587 |
| SMTP_USER | SMTP authentication username | |
| SMTP_PASSWORD | SMTP authentication password | |
| SMTP_FROM | Sender email address | |
| SMTP_TLS | Use TLS for SMTP | true |

Observability

| Variable | Description | Default |
|---|---|---|
| LANGFUSE_PUBLIC_KEY | Langfuse public key (optional) | |
| LANGFUSE_SECRET_KEY | Langfuse secret key (optional) | |
| LANGFUSE_HOST | Langfuse host | https://cloud.langfuse.com |

Frontend

| Variable | Description | Default |
|---|---|---|
| NEXT_PUBLIC_API_URL | Backend URL used by the frontend | http://localhost:8000 |

Deployment

Docker Compose (Development & Small Teams)

The recommended way to run RAGTY locally or for small teams:

```bash
docker compose up --build
```

This starts all services:

| Service | Container | Port | Description |
|---|---|---|---|
| Backend | m8ty-rag-backend | 8000 | FastAPI application server |
| Worker | m8ty-rag-worker | | ARQ async task processor (parsing, embedding) |
| Frontend | m8ty-rag-frontend | 3000 | Next.js web UI |
| Qdrant | qdrant | 6333 | Vector database (optional; remove it and leave QDRANT_URL empty to use pgvector) |
| PostgreSQL | postgres | 5432 | Relational database (also the vector backend when Qdrant is not used) |
| Redis | redis | 6379 | Task queue + caching |
| MinIO | minio | 9000/9001 | Object storage (S3-compatible) |

Stop:

```bash
docker compose down
```

Rebuild after code changes:

```bash
docker compose build backend frontend
docker compose up -d
```

Docker Networking

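Inside Docker, containers reach each other by container name rather than localhost. A service running on the host machine itself is reached through host.docker.internal:

| Host URL | Docker internal URL |
|---|---|
| http://localhost:8300 | http://host.docker.internal:8300 |
| http://localhost:6333 | http://qdrant:6333 (only if Qdrant is enabled) |
| redis://localhost:6379 | redis://redis:6379 |
| http://localhost:9000 | http://minio:9000 |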

Resource Requirements

| Component | Minimum RAM | Recommended RAM |
|---|---|---|
| Backend + Worker | 2 GB | 4 GB |
| DeepDoc parsing (PDF) | 4 GB | 8 GB |
| Qdrant (optional) | 1 GB | 4 GB (scales with document count) |
| PostgreSQL | 256 MB | 1 GB |
| Redis | 128 MB | 512 MB |
| Total (with Qdrant) | 8 GB | 16 GB |
| Total (pgvector only) | 7 GB | 13 GB |

Kubernetes (Production)

RAGTY is fully Kubernetes-compatible. The frontend, backend, and worker each run as a separate Deployment with its own scaling configuration; the stateful services run as StatefulSets or as managed services.

Architecture on K8s

```
Ingress (NGINX/Traefik)
├── /       → Frontend Deployment (replicas: 2+)
├── /api/*  → Backend Deployment (replicas: 2+)
└── /mcp/   → Backend Deployment

Worker Deployment (replicas: 1-4, scaled by queue depth)
└── GPU Worker (optional, concurrency=1 per GPU)

StatefulSets:
├── Qdrant (optional, persistent volume; omit when using pgvector)
├── PostgreSQL (or managed: RDS, CloudSQL)
└── Redis (or managed: ElastiCache, Memorystore)

S3-compatible storage:
└── AWS S3 / GCS / Azure Blob / MinIO
```
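Scaling workers by queue depth usually means choosing a target number of queued jobs per replica and clamping the result to the allowed range (1-4 here). A minimal sketch of that proportional rule, similar to what a Kubernetes HorizontalPodAutoscaler computes for an average-value metric target; the `jobs_per_replica=10` target and the function name are hypothetical, and how RAGTY measures queue depth is not documented.

```python
import math


def desired_worker_replicas(queue_depth: int, jobs_per_replica: int = 10,
                            min_replicas: int = 1, max_replicas: int = 4) -> int:
    """Proportional scaling: one replica per `jobs_per_replica` queued jobs, clamped."""
    if queue_depth < 0 or jobs_per_replica <= 0:
        raise ValueError("queue_depth must be >= 0 and jobs_per_replica > 0")
    wanted = math.ceil(queue_depth / jobs_per_replica)
    return max(min_replicas, min(max_replicas, wanted))  # keep within the 1-4 range
```

With the default target, an empty queue keeps one worker, 25 queued jobs ask for 3, and anything above 30 is capped at 4.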

Helm Chart

The GitLab CI pipeline deploys via Helm:

```bash
helm upgrade m8ty-rag ./devops/helm/m8ty-rag \
  --install \
  --set backend.image=registry.example.com/m8ty-rag/backend:latest \
  --set frontend.image=registry.example.com/m8ty-rag/frontend:latest \
  --set 'imagePullSecrets[0].name=docker-pull-secret' \
  -f values-production.yaml \
  -n production
```

Key Kubernetes Considerations

| Topic | Recommendation |
|---|---|
| Scaling | The backend is stateless; scale it horizontally. Worker replicas scale with queue depth. |
| GPU Workers | Use node selectors/tolerations for GPU nodes. Set concurrency=1 per GPU to prevent VRAM OOM. |
| Storage | Use managed S3 (not MinIO) in production. If using Qdrant, give it SSD-backed persistent volumes. If using pgvector, storage is handled by PostgreSQL. |
| Database | Use managed PostgreSQL (RDS, CloudSQL, Azure Database) for HA. |
| Redis | Use managed Redis (ElastiCache, Memorystore) for HA. |
| Secrets | Store AUTH_JWT_SECRET, CREDENTIAL_ENCRYPTION_KEY, and the S3 keys in Kubernetes Secrets or an external vault. |
| Health checks | Backend: GET /health. Frontend: TCP port 3000. Worker: ARQ heartbeat. |
| Ingress | Route /api/* and /mcp/ to the backend and everything else to the frontend. |
| TLS | Terminate at the ingress (cert-manager + Let's Encrypt recommended). |
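The ingress routing rule can be expressed as a small function. A sketch only: it models path-element prefix matching (so `/api` and `/api/...` match but `/apix` does not), which is how Kubernetes `pathType: Prefix` behaves; `matches_prefix` and `route` are hypothetical names.

```python
BACKEND_PREFIXES = ("/api", "/mcp")


def matches_prefix(path: str, prefix: str) -> bool:
    """Element-wise prefix match: "/api" matches "/api" and "/api/x", not "/apix"."""
    return path == prefix or path.startswith(prefix + "/")


def route(path: str) -> str:
    """Upstream service for a request path under the documented ingress rule."""
    if any(matches_prefix(path, prefix) for prefix in BACKEND_PREFIXES):
        return "backend"
    return "frontend"
```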

Production Environment Variables

In production, set APP_ENV=production to enable:

  • Fail-closed validation (missing secrets = startup failure)

  • Minimum 32-char JWT secret enforcement

  • Mandatory credential encryption key

  • No development defaults
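Fail-closed validation means the process refuses to start instead of falling back to defaults. A minimal sketch of these checks, assuming the four secrets listed under Security are the required set; `validate_settings` and `fail_closed` are hypothetical names, and the real backend may check more.

```python
from collections.abc import Mapping

REQUIRED_IN_PRODUCTION = (
    "AUTH_JWT_SECRET",
    "CREDENTIAL_ENCRYPTION_KEY",
    "S3_ACCESS_KEY",
    "S3_SECRET_KEY",
)
MIN_JWT_SECRET_LENGTH = 32


def validate_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means startup may continue."""
    if env.get("APP_ENV", "development") != "production":
        return []  # development keeps its defaults
    problems = [f"{name} is not set" for name in REQUIRED_IN_PRODUCTION if not env.get(name)]
    secret = env.get("AUTH_JWT_SECRET", "")
    if secret and len(secret) < MIN_JWT_SECRET_LENGTH:
        problems.append(f"AUTH_JWT_SECRET must be at least {MIN_JWT_SECRET_LENGTH} characters")
    return problems


def fail_closed(env: Mapping[str, str]) -> None:
    """Abort startup instead of running with missing or weak secrets."""
    problems = validate_settings(env)
    if problems:
        raise SystemExit("Refusing to start: " + "; ".join(problems))
```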

Reverse Proxy (Sub-Path Deployment)

When the backend is served behind a reverse proxy at a sub-path (e.g. https://example.com/rag/ instead of root), set ROOT_PATH so FastAPI generates correct OpenAPI docs and redirect URLs:

```bash
ROOT_PATH=/rag
```

Nginx Example

```nginx
# The trailing slash in proxy_pass replaces the /rag/ prefix with / before forwarding.
location /rag/ {
    proxy_pass http://backend:8000/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Prefix /rag;
}

location / {
    proxy_pass http://frontend:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
```

Kubernetes Ingress Example

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: m8ty-rag
  annotations:
    # Forward only capture group 2 (the part after /rag) to the backend.
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /rag(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: backend
                port:
                  number: 8000
```

When using ROOT_PATH, also update NEXT_PUBLIC_API_URL in the frontend to include the sub-path:

```bash
NEXT_PUBLIC_API_URL=https://example.com/rag
```
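The regular expression in the Ingress example decides what the backend receives: with `rewrite-target: /$2`, only the second capture group is forwarded. This Python sketch applies the same pattern to show the mapping; it illustrates the regex only, not ingress-nginx internals, and `rewritten_path` is a hypothetical name.

```python
import re
from typing import Optional

# Same pattern as the Ingress path; rewrite-target /$2 forwards "/" + group 2.
RAG_PATH = re.compile(r"/rag(/|$)(.*)")


def rewritten_path(request_path: str) -> Optional[str]:
    """Return the path the backend receives, or None if the rule does not match."""
    match = RAG_PATH.match(request_path)  # anchored at the start of the path
    if match is None:
        return None
    return "/" + match.group(2)
```

So `/rag/health` reaches the backend as `/health`, while `/ragx` is not matched by this rule at all.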

SSL / HTTPS

SSL must always be terminated at the reverse proxy or ingress layer. Do not run Uvicorn or Next.js with TLS directly in production: it complicates certificate renewal and is not supported by the health checks and workers.

Option 1: Nginx with Let's Encrypt (Docker Compose)

Install Certbot on the host, obtain a certificate, then configure Nginx:

```nginx
server {
    listen 80;
    server_name example.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # Frontend
    location / {
        proxy_pass http://frontend:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Backend API (the trailing slash in proxy_pass strips the /api/ prefix)
    location /api/ {
        proxy_pass http://backend:8000/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Option 2: Caddy (automatic HTTPS, Docker Compose)

Caddy obtains and renews Let's Encrypt certificates automatically, so no manual certificate management is needed:

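```yaml
# docker-compose.yml: add this service
services:
  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - caddy_data:/data   # persists issued certificates across restarts

# Named volumes must be declared at the top level.
volumes:
  caddy_data:
```

```
# Caddyfile
example.com {
    handle /api/* {
        uri strip_prefix /api
        reverse_proxy backend:8000
    }
    handle {
        reverse_proxy frontend:3000
    }
}
```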

Option 3: Kubernetes with cert-manager

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
```

```yaml
# ingress.yaml
# Requires a ClusterIssuer named letsencrypt-prod to exist in the cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: m8ty-rag
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts: [example.com]
      secretName: m8ty-rag-tls
  rules:
    - host: example.com
      http:
        paths:
          - path: /api/
            pathType: Prefix
            backend:
              service: { name: backend, port: { number: 8000 } }
          - path: /
            pathType: Prefix
            backend:
              service: { name: frontend, port: { number: 3000 } }
```

Required environment variables for HTTPS

When running behind HTTPS, update these variables:

```bash
CORS_ORIGINS=https://example.com
FRONTEND_URL=https://example.com
NEXT_PUBLIC_API_URL=https://example.com/api
```
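A quick consistency check on these three values catches the common mistake of leaving one of them on plain HTTP. A hedged sketch: `check_https_settings` is hypothetical, and it assumes the browser origin that must appear in CORS_ORIGINS is the FRONTEND_URL origin, as in the example above.

```python
from collections.abc import Mapping
from urllib.parse import urlsplit


def check_https_settings(env: Mapping[str, str]) -> list[str]:
    """Return problems with the HTTPS-related settings (empty list means consistent)."""
    problems = []
    frontend = urlsplit(env.get("FRONTEND_URL", ""))
    for name in ("FRONTEND_URL", "NEXT_PUBLIC_API_URL"):
        if urlsplit(env.get(name, "")).scheme != "https":
            problems.append(f"{name} should use https")
    # The browser sends the frontend's origin, so CORS_ORIGINS must list it.
    frontend_origin = f"{frontend.scheme}://{frontend.netloc}"
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    if frontend_origin not in origins:
        problems.append(f"CORS_ORIGINS should include {frontend_origin}")
    return problems
```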

Workers

Workers are background processes that handle long-running, resource-intensive tasks asynchronously, which keeps the API server fast and responsive.

Why Workers?

Document parsing can take seconds to minutes depending on file size and complexity. Without workers:

  • Upload API would block for minutes (timeout risk)

  • Users would see a frozen UI

  • A single large PDF would block all other requests

With workers, the flow is:

```
User uploads document
  → API returns immediately (HTTP 202)
  → Task queued in Redis
  → Worker picks up task in background
  → Worker: parse → chunk → embed → store
  → Document state updated to "DONE"
```
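The flow above can be simulated in-process. In this sketch an `asyncio.Queue` stands in for the Redis/ARQ queue, and `chunk`, `embed`, and the `Document` states other than "DONE" are hypothetical placeholders; it shows the shape of the pipeline, not RAGTY's task code.

```python
import asyncio
import contextlib
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    state: str = "QUEUED"  # hypothetical initial state
    chunks: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)


def chunk(text: str, size: int = 20) -> list[str]:
    """Fixed-size character chunks (placeholder for the real chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(chunks: list[str]) -> list[list[float]]:
    """Placeholder vectors; a real worker calls the embedding provider here."""
    return [[float(len(c))] for c in chunks]


async def worker(queue: asyncio.Queue, store: dict[str, Document]) -> None:
    while True:
        doc = await queue.get()              # worker picks up a task
        try:
            doc.state = "PROCESSING"
            parsed = doc.text.strip()        # parse (stand-in for MarkItDown/DeepDoc)
            doc.chunks = chunk(parsed)       # chunk
            doc.vectors = embed(doc.chunks)  # embed
            store[doc.doc_id] = doc          # store
            doc.state = "DONE"
        finally:
            queue.task_done()


async def upload_and_process(docs: list[Document]) -> dict[str, Document]:
    queue: asyncio.Queue = asyncio.Queue()
    store: dict[str, Document] = {}
    consumer = asyncio.create_task(worker(queue, store))
    for doc in docs:
        queue.put_nowait(doc)  # an API handler would return HTTP 202 right here
    await queue.join()         # only the demo waits; the API never blocks on this
    consumer.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await consumer
    return store
```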

Worker Types

| Worker | Task | Concurrency | Resource |
|---|---|---|---|
| CPU Worker | MarkItDown parsing, embedding API calls, metadata stamping | High (8+) | CPU only |
| GPU Worker | DeepDoc vision (layout detection, OCR, table structure) | 1 per GPU | GPU VRAM |

GPU workers are gated to concurrency=1 to prevent VRAM Out-of-Memory crashes. Multiple GPU workers can run on separate GPU devices.
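Gating GPU work to one task at a time is what a semaphore with a count of 1 does. A minimal asyncio sketch, assuming one semaphore per GPU device and using `asyncio.sleep` as a stand-in for DeepDoc inference; `run_on_gpu`, `process_batch`, and `GpuStats` are hypothetical names, and how the worker actually enforces the limit is not documented.

```python
import asyncio
from dataclasses import dataclass

GPU_WORKER_CONCURRENCY = 1  # documented default: one GPU task at a time


@dataclass
class GpuStats:
    active: int = 0  # tasks currently holding the GPU
    peak: int = 0    # highest number observed at once


async def run_on_gpu(task_id: int, gate: asyncio.Semaphore, stats: GpuStats) -> int:
    async with gate:  # waits while the concurrency limit is reached
        stats.active += 1
        stats.peak = max(stats.peak, stats.active)
        await asyncio.sleep(0.01)  # stand-in for DeepDoc layout detection / OCR
        stats.active -= 1
    return task_id


async def process_batch(n_tasks: int, concurrency: int = GPU_WORKER_CONCURRENCY) -> GpuStats:
    gate = asyncio.Semaphore(concurrency)  # one semaphore per GPU device
    stats = GpuStats()
    await asyncio.gather(*(run_on_gpu(i, gate, stats) for i in range(n_tasks)))
    return stats
```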

Configuration

| Variable | Description | Default |
|---|---|---|
| WORKER_COUNT | Number of worker processes | 1 |
| WORKER_MAX_JOBS | Maximum concurrent jobs per worker | 1 |
| GPU_WORKER_CONCURRENCY | Number of simultaneous GPU tasks | 1 |

Scalability

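RAGTY is designed to scale at every layer. The stateless API and the workers scale horizontally; the data stores scale vertically, with read replicas, or by sharding: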

Scaling Strategy

```
              ┌──────────────────────────────────────┐
              │       Load Balancer / Ingress        │
              └──────────────────┬───────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌────────┴────────┐     ┌────────┴────────┐     ┌────────┴────────┐
│   Backend (N)   │     │   Backend (N)   │     │   Backend (N)   │
│  Stateless API  │     │  Stateless API  │     │  Stateless API  │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌────────┴────────┐     ┌────────┴────────┐     ┌────────┴────────┐
│   Worker (N)    │     │ Vector Backend  │     │  PostgreSQL HA  │
│ Scale by queue  │     │    Qdrant or    │     │  Read replicas  │
└─────────────────┘     │    pgvector     │     └─────────────────┘
                        └─────────────────┘
```

Component Scaling

| Component | Scaling model | How |
|---|---|---|
| Backend API | Horizontal (stateless) | Add replicas behind the load balancer. There is no session state, so any instance can handle any request. |
| Workers | Horizontal (queue-based) | Add worker replicas; the Redis queue distributes tasks automatically. More workers increase document-processing throughput. |
| Qdrant | Vertical + sharding | Add RAM for larger collections, shard across nodes for millions of vectors, and add replicas for read throughput. Only relevant when QDRANT_URL is set. |
| pgvector | Scales with PostgreSQL | Suitable for smaller deployments. Uses IVFFlat indexes and scales via PostgreSQL read replicas. Used when QDRANT_URL is empty. |
| PostgreSQL | Vertical + read replicas | Primary for writes, replicas for read-heavy queries (dataset listing, user auth). |
| Redis | Vertical | A single instance handles thousands of queue operations per second; use Redis Cluster for extreme scale. |
| Object Storage (S3) | Managed (effectively unlimited) | Managed S3/GCS/Azure Blob scales automatically with no intervention. |

Bottlenecks & Solutions

| Bottleneck | Symptom | Solution |
|---|---|---|
| Parsing too slow | Documents stuck in "processing" | Add more workers |
| Search latency high | Slow query responses | Add Qdrant replicas (if using Qdrant) or switch from pgvector to Qdrant for large collections; enable caching |
| Embedding API slow | Parse tasks queue up | Use local embeddings (FastEmbed) or batch requests |
| Upload bursts | API timeouts | Scale backend replicas, increase Redis queue capacity |
| Large collections | Qdrant memory pressure | Shard the Qdrant collection and use disk-backed indexes; if the collection is only moderately sized, pgvector may be sufficient |
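Batching embedding requests cuts round trips: 1,000 chunks at a batch size of 100 means 10 API calls instead of 1,000. A minimal batching helper; the name and batch size are illustrative, and each embedding provider enforces its own limit on inputs per request.

```python
from collections.abc import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive groups so one embedding request covers many chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

On Python 3.12 and later, `itertools.batched` yields the same groups as tuples.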

Multi-Tenancy & Isolation

Each tenant's data is isolated at every layer:

  • Qdrant (when enabled): a tenant_id metadata filter is applied to every query (no cross-tenant data leakage)

  • pgvector (when Qdrant is not used): tenant_id filter applied via SQL WHERE clause on every query

  • PostgreSQL: All tables have a tenant_id column with enforced filtering

  • S3: Artifacts stored under tenant-scoped prefixes (s3://bucket/{tenant_id}/...)

  • API: Every request goes through resolve_tenant_scope — fail-closed (no scope = no access)
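The isolation rules above can be sketched as three small helpers: a fail-closed scope resolver, a pgvector query that always carries the tenant filter, and a tenant-prefixed S3 key. `resolve_tenant_scope` here is a simplified stand-in named after the documented function (its real signature is not documented); the table name `chunks`, its columns, and the `%(name)s` placeholder style (as used by psycopg) are assumptions. `<=>` is pgvector's cosine-distance operator.

```python
from dataclasses import dataclass
from typing import Optional


class TenantScopeError(PermissionError):
    """Raised when a request has no resolvable tenant."""


@dataclass(frozen=True)
class TenantScope:
    tenant_id: str


def resolve_tenant_scope(tenant_id: Optional[str]) -> TenantScope:
    """Simplified stand-in: no tenant means no access, never a global scope."""
    if not tenant_id:
        raise TenantScopeError("request has no tenant scope")
    return TenantScope(tenant_id)


def pgvector_search_sql(limit: int = 10) -> str:
    """Parameterized similarity query that always filters on tenant_id."""
    # Hypothetical table/columns; `<=>` orders by cosine distance (nearest first).
    return (
        "SELECT id, content FROM chunks "
        "WHERE tenant_id = %(tenant_id)s "
        "ORDER BY embedding <=> %(query_vector)s "
        f"LIMIT {int(limit)}"
    )


def artifact_key(scope: TenantScope, *parts: str) -> str:
    """S3 object key under the tenant prefix: {tenant_id}/..."""
    return "/".join((scope.tenant_id, *parts))
```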

Tracing & Health

Langfuse Tracing

Every query is traced with spans for each pipeline stage:

```
trace: user_query
├── span: query_condensation (if multi-turn)
├── span: embedding_generation
├── span: hybrid_search
│   ├── span: dense_search
│   └── span: sparse_bm25
├── span: reranking
└── span: llm_generation
```
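Nested spans like these are typically recorded with context managers. The sketch below builds the same tree with a tiny in-process tracer; it is not the Langfuse SDK, and `Tracer`, `Span`, `traced_query`, and `outline` are hypothetical names.

```python
import time
from collections.abc import Iterator
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    children: list["Span"] = field(default_factory=list)
    duration_ms: float = 0.0


class Tracer:
    """Tiny in-process tracer that records nested spans (not the Langfuse SDK)."""

    def __init__(self, root_name: str) -> None:
        self.root = Span(root_name)
        self._stack = [self.root]  # innermost open span is last

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        node = Span(name)
        self._stack[-1].children.append(node)  # attach to the current parent
        self._stack.append(node)
        start = time.perf_counter()
        try:
            yield node
        finally:
            node.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()


def traced_query(multi_turn: bool) -> Tracer:
    tracer = Tracer("user_query")
    if multi_turn:
        with tracer.span("query_condensation"):
            pass  # rewrite the follow-up into a standalone question
    with tracer.span("embedding_generation"):
        pass
    with tracer.span("hybrid_search"):
        with tracer.span("dense_search"):
            pass
        with tracer.span("sparse_bm25"):
            pass
    with tracer.span("reranking"):
        pass
    with tracer.span("llm_generation"):
        pass
    return tracer


def outline(span: Span, depth: int = 0) -> list[str]:
    """Indented list of span names, mirroring the trace tree."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(outline(child, depth + 1))
    return lines
```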

Health Check

GET /health → {"status": "ok", "vector_backend": "qdrant"} (or "pgvector" when Qdrant is not used)
14 June 2026