RAG Server (RAGTY)

Multi-tenant Retrieval-Augmented Generation platform with vision-based document parsing, hybrid vector search, and AI-powered chat with citations.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by giving them access to your private data at query time — without retraining the model.

The Problem RAG Solves

LLMs like GPT-4 or Claude are powerful but have critical limitations:

Knowledge cutoff — they don't know about your internal documents, policies, or recent data
Hallucination — without grounding, they invent plausible-sounding but wrong answers
No access control — public LLMs can't respect your organization's data permissions
No citations — you can't verify where an answer came from

How RAG Works

Instead of relying solely on the LLM's training data, RAG retrieves relevant passages from your knowledge base and augments the LLM's prompt with them before generating a response.
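
The retrieve-then-augment flow can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not RAGTY's implementation: it ranks passages by cosine similarity over plain word counts rather than learned embeddings, and the helper names (tokenize, cosine, retrieve, build_prompt) and the sample knowledge base are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Augment the question with numbered passages so the answer can cite them."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base; the contents are invented for this sketch.
knowledge_base = [
    "Employees receive 30 days of paid vacation per year.",
    "The payment gateway is configured in the admin console.",
    "Q3 planning: the team agreed to ship the search redesign.",
]
question = "How many vacation days do employees get?"
prompt = build_prompt(question, retrieve(question, knowledge_base, k=1))
print(prompt)  # this string is what would be sent to the LLM
```

In a real deployment the word-count vectors would be replaced by embeddings from a model and the prompt would go to an LLM; the shape of the flow stays the same.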

Real-World Examples

Scenario	Without RAG	With RAG
"What is our vacation policy?"	LLM guesses generic policies	Retrieves HR handbook → exact policy with page reference
"How do I configure the payment gateway?"	Outdated or wrong instructions	Retrieves latest internal docs → correct config steps
"What did we agree in the Q3 planning?"	"I don't have access to that"	Retrieves meeting notes → summary with citations
"Show me compliance requirements for DACH"	Generic EU regulation info	Retrieves your compliance docs → specific requirements

Why a Dedicated RAG Server?

Building RAG properly requires solving several hard problems:

Document parsing — PDFs with tables, scanned pages, complex layouts need vision-based parsing (not just text extraction)
Chunking — documents must be split into meaningful pieces that preserve context
Hybrid search — combining semantic search (understands meaning) with keyword search (finds exact terms like product IDs)
Multi-tenancy — each team/customer sees only their own documents
Scalability — handling thousands of documents and concurrent users
Observability — knowing why a particular answer was generated

RAGTY addresses each of these problems in a single, production-ready platform.
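
Two of the problems listed above, chunking and hybrid search, are easy to illustrate in miniature. The sketch below is an assumption for illustration only: this document does not say which chunking strategy or fusion method RAGTY uses. It shows fixed-size word windows with overlap, and Reciprocal Rank Fusion (RRF), one common way to merge a semantic ranking with a keyword ranking. The function names, the document IDs, and the default sizes are hypothetical; k = 60 is a conventional RRF constant.

```python
from collections import defaultdict


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Chunking demo on a 12-word toy document.
toy = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(toy, size=5, overlap=2)
print(chunks)

# Fusion demo: hypothetical results from a semantic and a keyword search.
semantic_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-c", "doc-a", "doc-d"]
print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, and rank-based fusion avoids having to normalise similarity scores that come from two different search engines.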

Architecture

Tech Stack

Component	Technology
Backend	Python 3.12, FastAPI 0.115, Pydantic 2.11
Frontend	Next.js 15, React 19, TypeScript 5.8
Vector Database	Qdrant or pgvector (dense cosine similarity)
Cache/Queue	Redis 7 (ARQ task queue)
Object Storage	MinIO/S3
Relational DB	PostgreSQL 16
Document Parsing	DeepDoc (vision-based), MarkItDown
Embeddings	OpenAI, FastEmbed, Ollama
LLM	LiteLLM proxy (OpenAI, Anthropic, local models)
Observability	Langfuse tracing, Ragas quality metrics

Prerequisites

uv (Python package manager)
Node.js 22+
Docker & Docker Compose

Quick Start

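Build and start the full stack from the repository root, where docker-compose.yml lives:

docker compose up --build

Once the containers are up, the services are reachable at: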

Service	URL
Frontend	http://localhost:3000
Backend	http://localhost:8000
MCP	http://localhost:8000/mcp/
Qdrant	http://localhost:6333 (optional)
Redis	localhost:6379
PostgreSQL	localhost:5432
MinIO	http://localhost:9000
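
To confirm that the stack came up, a small script can probe the ports from the table above. This is a minimal sketch that only checks whether something accepts TCP connections on each port; it does not call any health endpoint, since none is documented here. The helper names host_port and is_listening are made up for the example, and the addresses assume the default port mappings shown in the table.

```python
import socket
from urllib.parse import urlsplit

# Addresses copied from the service table; Qdrant may be absent if pgvector is used.
SERVICES = {
    "Frontend": "http://localhost:3000",
    "Backend": "http://localhost:8000",
    "Qdrant": "http://localhost:6333",
    "Redis": "localhost:6379",
    "PostgreSQL": "localhost:5432",
    "MinIO": "http://localhost:9000",
}


def host_port(address: str) -> tuple[str, int]:
    """Extract (host, port) from a URL or a bare host:port string."""
    parts = urlsplit(address if "://" in address else f"//{address}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no host and port in {address!r}")
    return parts.hostname, parts.port


def is_listening(address: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the address succeeds within the timeout."""
    try:
        with socket.create_connection(host_port(address), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, address in SERVICES.items():
        status = "up" if is_listening(address) else "not reachable"
        print(f"{name:<12}{address:<28}{status}")
```

A successful connection only shows that the port is open, not that the service behind it is healthy.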

Local Development

Backend

cd backend
uv sync
uv run uvicorn app.main:app --reload

Frontend

cd frontend
npm install
npm run dev

Project Structure

m8ty-rag/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI composition root
│   │   ├── auth/                # Authentication bounded context
│   │   ├── chat/                # Dialog/chat bounded context
│   │   ├── embedding/           # Embedding bounded context
│   │   ├── ingestion/           # Dataset/document ingestion
│   │   ├── search/              # Hybrid search bounded context
│   │   ├── mcp/                 # MCP server (tools for AI agents)
│   │   ├── tenant/              # Multi-tenancy bounded context
│   │   └── shared/              # Config, dependencies, security
│   ├── pyproject.toml
│   └── Dockerfile
├── frontend/
│   ├── src/app/                 # Next.js App Router pages
│   ├── src/components/          # Reusable UI components
│   ├── src/lib/                 # API client, config
│   └── Dockerfile
└── docker-compose.yml

Topics

Multi-Tenancy — Roles, permissions, tenant isolation
MCP Server — AI agent integration, tools, client configuration
API Keys — Personal and dataset-scoped key management
Document Ingestion — Parsing, chunking, embedding pipeline
Search & Chat — Hybrid retrieval, reranking, streaming chat
Configuration — Environment variables, security, deployment

12 June 2026