Configuration
Backend configuration is read from backend/.env or environment variables. APP_ENV=production enables fail-closed startup validation for JWT, credential encryption, and S3 secrets.
Security
Production deployments must override AUTH_JWT_SECRET, CREDENTIAL_ENCRYPTION_KEY, S3_ACCESS_KEY, and S3_SECRET_KEY
Provider API keys are encrypted at rest (AES-256)
Personal API keys are stored as HMAC-SHA256 hashes (non-reversible)
Frontend backend access is configured with NEXT_PUBLIC_API_URL
Backend CORS origins are configured with CORS_ORIGINS
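The HMAC-SHA256 storage of personal API keys mentioned above can be sketched as follows. This is a minimal illustration under assumptions, not the project's actual code: the function names hash_api_key and verify_api_key and the server_secret parameter are hypothetical.

```python
import hashlib
import hmac


def hash_api_key(api_key: str, server_secret: bytes) -> str:
    """Return the hex HMAC-SHA256 digest that would be stored instead of the key."""
    return hmac.new(server_secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_api_key(presented_key: str, stored_digest: str, server_secret: bytes) -> bool:
    """Recompute the digest for the presented key and compare in constant time."""
    candidate = hash_api_key(presented_key, server_secret)
    return hmac.compare_digest(candidate, stored_digest)
```

Only the digest is persisted, so a leaked database row does not reveal the key itself, and the constant-time comparison avoids leaking how many leading characters matched.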
Environment Variables
Application
Variable | Description | Default |
|---|---|---|
| Environment ( |
|
| Logging level ( |
|
| API root path for reverse-proxy sub-path deployments (e.g. | — |
| Comma-separated allowed CORS origins |
|
| Public frontend URL (used in invite emails) |
|
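A comma-separated CORS origin value, as described in the table above, might be parsed roughly like this. The helper name parse_cors_origins and its normalization rules (trimming whitespace, dropping a trailing slash, removing duplicates) are assumptions for illustration only.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and trailing slashes."""
    origins = []
    for item in raw.split(","):
        origin = item.strip().rstrip("/")
        # Skip empty entries (e.g. a trailing comma) and duplicates.
        if origin and origin not in origins:
            origins.append(origin)
    return origins
```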
Authentication
Variable | Description | Default |
|---|---|---|
| Auth mode: |
|
| JWT signing secret (min 32 chars in prod) | required |
| OIDC provider issuer URL | — |
| OIDC token audience |
|
| OIDC claim for tenant ID auto-selection | — |
Infrastructure
Variable | Description | Default |
|---|---|---|
| PostgreSQL connection string (must include pgvector extension if Qdrant is not used) | required |
| Qdrant vector database URL. If empty, pgvector (PostgreSQL) is used as vector backend | — |
| Qdrant collection name |
|
| Redis URL (task queue, caching) | required |
| S3-compatible storage endpoint | required |
| S3 access key | required |
| S3 secret key | required |
| S3 bucket name for document artifacts (auto-created on startup) |
|
| AES key for provider credential encryption | required |
Vector Backend
The platform supports two vector storage backends, selected automatically:
Backend | Condition | Requirement |
|---|---|---|
Qdrant |
| Qdrant server running |
pgvector |
| PostgreSQL with pgvector extension (use |
Both backends provide identical search behavior (dense cosine similarity, named vector spaces). The /health endpoint reports the active backend: {"status": "ok", "vector_backend": "qdrant"} or "pgvector".
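The automatic selection and the /health payload can be pictured with a small sketch. It assumes the backend is chosen purely from whether a Qdrant URL is configured; the function names select_vector_backend and health_payload are hypothetical.

```python
from typing import Optional


def select_vector_backend(qdrant_url: Optional[str]) -> str:
    """Pick Qdrant when a URL is configured, otherwise fall back to pgvector."""
    return "qdrant" if qdrant_url and qdrant_url.strip() else "pgvector"


def health_payload(qdrant_url: Optional[str]) -> dict:
    """Build the body described for the /health endpoint."""
    return {"status": "ok", "vector_backend": select_vector_backend(qdrant_url)}
```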
Embedding & LLM
Variable | Description | Default |
|---|---|---|
| Default embedding provider |
|
| OpenAI API key for embeddings | — |
| Ollama endpoint |
|
| LiteLLM proxy URL |
|
| LiteLLM API key | — |
Email (SMTP)
Variable | Description | Default |
|---|---|---|
| Enable SMTP email sending |
|
| SMTP server hostname | — |
| SMTP server port |
|
| SMTP authentication username | — |
| SMTP authentication password | — |
| Sender email address | — |
| Use TLS for SMTP |
|
Observability
Variable | Description | Default |
|---|---|---|
| Langfuse public key (optional) | — |
| Langfuse secret key (optional) | — |
| Langfuse host |
|
Frontend
Variable | Description | Default |
|---|---|---|
| Frontend → backend URL |
|
Deployment
Docker Compose (Development & Small Teams)
The recommended way to run RAGTY locally or for small teams:
This starts all services:
| Service | Container | Port | Description |
|---|---|---|---|
| Backend | m8ty-rag-backend | 8000 | FastAPI application server |
| Worker | m8ty-rag-worker | — | ARQ async task processor (parsing, embedding) |
| Frontend | m8ty-rag-frontend | 3000 | Next.js web UI |
| Qdrant | qdrant | 6333 | Vector database (optional; remove to use pgvector) |
| PostgreSQL | postgres | 5432 | Relational database (also vector backend when Qdrant is not used) |
| Redis | redis | 6379 | Task queue + caching |
| MinIO | minio | 9000/9001 | Object storage (S3-compatible) |
Stop:
Rebuild after code changes:
Docker Networking
Services communicate via container names inside Docker:
Host URL | Docker Internal URL |
|---|---|
|
|
|
|
|
|
|
|
Resource Requirements
| Component | Min RAM | Recommended |
|---|---|---|
| Backend + Worker | 2 GB | 4 GB |
| DeepDoc parsing (PDF) | 4 GB | 8 GB |
| Qdrant (optional) | 1 GB | 4 GB (scales with document count) |
| PostgreSQL | 256 MB | 1 GB |
| Redis | 128 MB | 512 MB |
| Total (with Qdrant) | 8 GB | 16 GB |
| Total (pgvector only) | 7 GB | 13 GB |
Kubernetes (Production)
RAGTY is fully Kubernetes-compatible. Each service runs as a separate deployment with its own scaling configuration.
Architecture on K8s
Helm Chart
The GitLab CI pipeline deploys via Helm:
Key Kubernetes Considerations
Topic | Recommendation |
|---|---|
Scaling | Backend is stateless — scale horizontally. Worker scales based on queue depth. |
GPU Workers | Use node selectors/tolerations for GPU nodes. Set |
Storage | Use managed S3 (not MinIO) in production. If using Qdrant, it needs persistent volumes with SSD. If using pgvector, storage is handled by PostgreSQL. |
Database | Use managed PostgreSQL (RDS, CloudSQL, Azure Database) for HA. |
Redis | Use managed Redis (ElastiCache, Memorystore) for HA. |
Secrets | Store |
Health checks | Backend: |
Ingress | Route |
TLS | Terminate at ingress level (cert-manager + Let's Encrypt recommended). |
Production Environment Variables
In production, set APP_ENV=production to enable:
Fail-closed validation (missing secrets = startup failure)
Minimum 32-char JWT secret enforcement
Mandatory credential encryption key
No development defaults
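The fail-closed checks listed above could look roughly like the following. This is a sketch under assumptions: the function name validate_production_settings and the exact error messages are hypothetical, and the real startup validation may check more than this.

```python
REQUIRED_SECRETS = (
    "AUTH_JWT_SECRET",
    "CREDENTIAL_ENCRYPTION_KEY",
    "S3_ACCESS_KEY",
    "S3_SECRET_KEY",
)
MIN_JWT_SECRET_LENGTH = 32


def validate_production_settings(env: dict) -> None:
    """Raise RuntimeError if production settings are missing or too weak."""
    if env.get("APP_ENV") != "production":
        return  # non-production: checks are skipped in this sketch
    # Fail closed: any missing or empty secret aborts startup.
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    if len(env["AUTH_JWT_SECRET"]) < MIN_JWT_SECRET_LENGTH:
        raise RuntimeError("AUTH_JWT_SECRET must be at least 32 characters")
```

Calling such a function with os.environ before the application object is created would make a misconfigured deployment exit immediately instead of running with weak defaults.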
Reverse Proxy (Sub-Path Deployment)
When the backend is served behind a reverse proxy at a sub-path (e.g. https://example.com/rag/ instead of root), set ROOT_PATH so FastAPI generates correct OpenAPI docs and redirect URLs:
Nginx Example
Kubernetes Ingress Example
When using ROOT_PATH, also update NEXT_PUBLIC_API_URL in the frontend to include the sub-path:
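As a rough illustration of how the two values relate, the frontend URL is the public origin joined with the same sub-path that ROOT_PATH holds. The helper public_api_url below is hypothetical and only shows the string composition, using the example host from this section.

```python
def public_api_url(origin: str, root_path: str) -> str:
    """Join the public origin and ROOT_PATH without duplicate or trailing slashes."""
    path = root_path.strip("/")
    base = origin.rstrip("/")
    return f"{base}/{path}" if path else base


# With ROOT_PATH=/rag, NEXT_PUBLIC_API_URL would be:
print(public_api_url("https://example.com", "/rag"))
```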
SSL / HTTPS
SSL must always be terminated at the reverse proxy or ingress layer. Never run Uvicorn or Next.js with TLS directly in production — it complicates certificate renewal and is not supported by the health check and workers.
Option 1: Nginx with Let's Encrypt (Docker Compose)
Install Certbot on the host, obtain a certificate, then configure Nginx:
Option 2: Caddy (automatic HTTPS, Docker Compose)
Caddy obtains and renews Let's Encrypt certificates automatically — no manual certificate management needed:
Option 3: Kubernetes with cert-manager
Required environment variables for HTTPS
When running behind HTTPS, update these variables:
Workers
Workers are background processes that handle long-running, resource-intensive tasks asynchronously, keeping the API server fast and responsive.
Why Workers?
Document parsing can take seconds to minutes depending on file size and complexity. Without workers:
Upload API would block for minutes (timeout risk)
Users would see a frozen UI
A single large PDF would block all other requests
With workers, the flow is:
Worker Types
| Worker | Task | Concurrency | Resource |
|---|---|---|---|
| CPU Worker | MarkItDown parsing, embedding API calls, metadata stamping | High (8+) | CPU only |
| GPU Worker | DeepDoc vision (layout detection, OCR, table structure) | 1 per GPU | GPU VRAM |
GPU workers are gated to concurrency=1 to prevent VRAM Out-of-Memory crashes. Multiple GPU workers can run on separate GPU devices.
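The effect of gating GPU work to a concurrency of 1 can be demonstrated with a semaphore. This is an illustrative sketch, not the worker's actual implementation: run_gpu_jobs and its simulated job are hypothetical, and the real gate may be enforced by the task queue rather than in-process.

```python
import asyncio


async def run_gpu_jobs(job_count: int, gpu_concurrency: int = 1) -> int:
    """Run simulated GPU jobs and return the peak number running at once."""
    gate = asyncio.Semaphore(gpu_concurrency)
    running = 0
    peak = 0

    async def gpu_job() -> None:
        nonlocal running, peak
        async with gate:  # at most gpu_concurrency jobs pass this point at a time
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for layout detection / OCR
            running -= 1

    await asyncio.gather(*(gpu_job() for _ in range(job_count)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_gpu_jobs(5)))
```

However many jobs are submitted, only one holds the gate at a time, so VRAM usage stays bounded by a single job's footprint.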
Configuration
Variable | Description | Default |
|---|---|---|
| Number of worker processes | 1 |
| Max concurrent jobs per worker | 1 |
| Simultaneous GPU tasks | 1 |
Scalability
RAGTY is designed for horizontal scaling at every layer:
Scaling Strategy
Component Scaling
| Component | Scaling Model | How |
|---|---|---|
| Backend API | Horizontal (stateless) | Add replicas behind a load balancer. No session state, so any instance can handle any request. |
| Workers | Horizontal (queue-based) | Add worker replicas. The Redis queue distributes tasks automatically. More workers = faster document processing. |
| Qdrant | Vertical + Sharding | Add RAM for larger collections. Shard across nodes for millions of vectors. Add replicas for read throughput. Only relevant when Qdrant is enabled. |
| pgvector | Scales with PostgreSQL | Suitable for smaller deployments. Uses IVFFlat indexes. Scales via PostgreSQL read replicas. Used when Qdrant is not used. |
| PostgreSQL | Vertical + Read replicas | Primary for writes, replicas for read-heavy queries (dataset listing, user auth). |
| Redis | Vertical | A single instance handles thousands of queue operations/sec. Use Redis Cluster for extreme scale. |
| Object Storage (S3) | Infinite | Managed S3/GCS/Azure Blob scales automatically. No intervention needed. |
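Queue-based worker scaling can be reasoned about with simple arithmetic. The policy below is purely an assumption for illustration (the document does not specify one): it sizes the worker pool so that each replica has at most jobs_per_worker queued jobs, clamped to a minimum and maximum. All names are hypothetical.

```python
import math


def desired_worker_replicas(
    queue_depth: int,
    jobs_per_worker: int,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Return a replica count proportional to queue depth, within fixed bounds."""
    needed = math.ceil(queue_depth / jobs_per_worker) if queue_depth > 0 else 0
    return max(min_replicas, min(max_replicas, needed))
```

For example, 9 queued jobs with 4 jobs per worker gives ceil(9 / 4) = 3 replicas, while an empty queue falls back to the minimum of 1.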
Bottlenecks & Solutions
| Bottleneck | Symptom | Solution |
|---|---|---|
| Parsing too slow | Documents stuck in "processing" | Add more workers |
| Search latency high | Slow query responses | Add Qdrant replicas (if using Qdrant), or switch from pgvector to Qdrant for large collections; enable caching |
| Embedding API slow | Parse tasks queue up | Use local embeddings (FastEmbed) or batch requests |
| Upload bursts | API timeouts | Scale backend replicas, increase Redis queue capacity |
| Large collections | Qdrant memory pressure | Shard Qdrant collection, use disk-backed indexes; or switch to pgvector for moderate scale |
Multi-Tenancy & Isolation
Each tenant's data is isolated at every layer:
Qdrant (when enabled): Metadata filter tenant_id on every query (no cross-tenant data leakage)
pgvector (when Qdrant is not used): tenant_id filter applied via SQL WHERE clause on every query
PostgreSQL: All tables have tenant_id column with enforced filtering
S3: Artifacts stored under tenant-scoped prefixes (s3://bucket/{tenant_id}/...)
API: Every request goes through resolve_tenant_scope, which is fail-closed (no scope = no access)
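The fail-closed behaviour and tenant-scoped prefixes described above can be sketched as follows. Only the name resolve_tenant_scope comes from this document; its signature here, the TenantScopeError exception, and the artifact_key helper are assumptions for illustration.

```python
from typing import Optional


class TenantScopeError(PermissionError):
    """Raised when a request carries no usable tenant scope."""


def resolve_tenant_scope(tenant_id: Optional[str]) -> str:
    """Return the tenant ID or refuse the request; never default to all tenants."""
    if not tenant_id or not tenant_id.strip():
        raise TenantScopeError("no tenant scope, access denied")
    return tenant_id.strip()


def artifact_key(tenant_id: Optional[str], filename: str) -> str:
    """Build a tenant-scoped object key under the {tenant_id}/ prefix."""
    return f"{resolve_tenant_scope(tenant_id)}/{filename}"
```

Because every storage path and query filter is derived from the resolved scope, a request without a tenant fails before it can touch any data.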
Tracing & Health
Langfuse Tracing
Every query is traced with spans for each pipeline stage: