Search & Chat

Hybrid Search

Search combines two retrieval strategies fused with Reciprocal Rank Fusion (RRF):

Dense (HNSW): semantic similarity between embedding vectors, searched through an HNSW index; matches on meaning even when the wording differs
Sparse (BM25): keyword scoring on exact terms; catches acronyms and IDs that embeddings tend to blur
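A minimal RRF sketch is shown below. It assumes each retriever returns an ordered list of chunk IDs (best first) and uses the commonly cited smoothing constant k = 60. The function name `rrf_fuse` and the value of k are illustrative and are not taken from this backend.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of chunk IDs with Reciprocal Rank Fusion.

    A chunk at 1-based rank r in one list contributes 1 / (k + r);
    contributions are summed across all lists it appears in.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["c3", "c1", "c7"]   # e.g. HNSW nearest neighbours
sparse = ["c1", "c9", "c3"]  # e.g. BM25 keyword hits
fused = rrf_fuse([dense, sparse])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF looks only at ranks, it needs no normalization between BM25 scores and cosine similarities, which live on unrelated scales. A chunk ranked well by both retrievers (`c1` here) beats one that tops only a single list.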

Query Pipeline

Optional Features

Feature	Description	When to use
Query condensation	Rewrites the latest turn of a multi-turn conversation into a standalone query	Multi-turn chat
HyDE	Generates a hypothetical answer and embeds it in place of the query	Ambiguous queries
Reranking	Cross-encoder rescoring of top candidates	Higher precision needed
Parent resolution	Returns parent chunks for broader context	Need surrounding context
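The sketch below shows one way the optional features in the table could wrap hybrid retrieval. The stage order, the `SearchOptions` switches, and the choice to apply HyDE only to the dense side (BM25 still sees the original query) are all assumptions made for illustration. The callables stand in for the real LLM, index, and reranker calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchOptions:
    # Hypothetical per-request switches mirroring the feature table.
    condense: bool = False
    hyde: bool = False
    rerank: bool = False
    resolve_parents: bool = False


def run_pipeline(
    query: str,
    history: list[str],
    opts: SearchOptions,
    condense: Callable[[list[str], str], str],
    hyde: Callable[[str], str],
    retrieve: Callable[[str, str], list[str]],
    rerank: Callable[[str, list[str]], list[str]],
    to_parent: Callable[[str], str],
) -> list[str]:
    # 1. Turn a follow-up such as "and for contractors?" into a standalone query.
    if opts.condense and history:
        query = condense(history, query)
    # 2. HyDE: embed a hypothetical answer instead of the query (dense side only).
    dense_text = hyde(query) if opts.hyde else query
    # 3. Hybrid retrieval: dense on dense_text, BM25 on the query, fused with RRF.
    chunks = retrieve(dense_text, query)
    # 4. Cross-encoder rescoring of the fused candidates.
    if opts.rerank:
        chunks = rerank(query, chunks)
    # 5. Replace child chunks with their parents, keeping first-seen order.
    if opts.resolve_parents:
        chunks = list(dict.fromkeys(to_parent(c) for c in chunks))
    return chunks


result = run_pipeline(
    "and for contractors?",
    ["What is the vacation policy?"],
    SearchOptions(condense=True, resolve_parents=True),
    condense=lambda hist, q: hist[-1] + " " + q,   # toy stand-in for an LLM rewrite
    hyde=lambda q: q,
    retrieve=lambda dense_q, sparse_q: ["p1#0", "p1#1", "p2#0"],
    rerank=lambda q, cs: cs,
    to_parent=lambda c: c.split("#")[0],
)
print(result)  # ['p1', 'p2']
```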

Metadata Filtering

Every search is automatically scoped by:

tenant_id: strict isolation between tenants
parse_generation: excludes stale chunks left over from earlier parses of a re-parsed document
available_int: excludes soft-deleted documents
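As a rough illustration, the three scoping rules can be expressed as a single predicate over chunk metadata. This sketch assumes that `available_int == 1` means "not soft-deleted" and that the current parse generation of each document is known; the function name `scope_filter` and the `current_generation` mapping are hypothetical.

```python
from __future__ import annotations


def scope_filter(chunk: dict, tenant_id: str, current_generation: dict[str, int]) -> bool:
    """Return True if a chunk may appear in this tenant's search results.

    Assumes each chunk carries tenant_id, document_id, parse_generation and
    available_int, and that current_generation maps document_id to the
    latest parse generation of that document.
    """
    same_tenant = chunk["tenant_id"] == tenant_id                  # strict isolation
    is_current = chunk["parse_generation"] == current_generation.get(chunk["document_id"])
    is_available = chunk["available_int"] == 1                     # not soft-deleted
    return same_tenant and is_current and is_available


chunks = [
    {"id": "a", "tenant_id": "t1", "document_id": "d1", "parse_generation": 2, "available_int": 1},
    {"id": "b", "tenant_id": "t1", "document_id": "d1", "parse_generation": 1, "available_int": 1},
    {"id": "c", "tenant_id": "t2", "document_id": "d9", "parse_generation": 1, "available_int": 1},
    {"id": "d", "tenant_id": "t1", "document_id": "d2", "parse_generation": 1, "available_int": 0},
]
generations = {"d1": 2, "d2": 1, "d9": 1}
visible = [c["id"] for c in chunks if scope_filter(c, "t1", generations)]
print(visible)  # ['a']
```

In a real deployment these conditions would typically be pushed down into the HNSW and BM25 queries as filters rather than applied after retrieval, so that filtered-out chunks do not use up the top_k budget.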

Streaming Chat

The RAG chat provides conversational access to knowledge bases with citation support.

OpenAI-Compatible Endpoint

POST /v1/chat/completions
Authorization: Bearer ragty-YOUR_KEY
Content-Type: application/json

Request

{
  "messages": [
    {"role": "user", "content": "What is the vacation policy?"}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 2048
}

Field	Type	Required	Description
`messages`	array	Yes	Conversation history; the last message must have role `user`
`stream`	boolean	No	Stream response via SSE (default: true)
`model`	string	No	Override the dialog's configured model
`temperature`	float	No	Override temperature (0.0–2.0)
`max_tokens`	integer	No	Override max output tokens
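A client could check a request body against the field table before sending it, as in the hypothetical `build_chat_request` helper below. It enforces the documented constraints (non-empty `messages` ending in a `user` turn, `stream` defaulting to true, `temperature` within 0.0 to 2.0). The positivity check on `max_tokens` is an added assumption, and the server's own validation may differ.

```python
from __future__ import annotations


def build_chat_request(
    messages: list[dict[str, str]],
    stream: bool = True,
    model: str | None = None,
    temperature: float | None = None,
    max_tokens: int | None = None,
) -> dict:
    """Build a /v1/chat/completions body, checking the documented constraints."""
    if not messages or messages[-1].get("role") != "user":
        raise ValueError("messages must be non-empty and end with a 'user' message")
    body: dict = {"messages": messages, "stream": stream}
    # Overrides are sent only when set, so the dialog's configuration applies otherwise.
    if model is not None:
        body["model"] = model
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be within 0.0-2.0")
        body["temperature"] = temperature
    if max_tokens is not None:
        if max_tokens <= 0:  # assumption: zero or negative limits are rejected
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    return body


body = build_chat_request(
    [{"role": "user", "content": "What is the vacation policy?"}],
    temperature=0.7,
    max_tokens=2048,
)
```

With these arguments, `body` matches the example request shown above.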

Response (SSE)

data: {"choices":[{"delta":{"content":"The vacation"},"index":0}]}

data: {"choices":[{"delta":{"content":" policy allows"},"index":0}]}

data: [DONE]
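The stream follows the usual OpenAI convention: each `data:` line carries a JSON chunk whose `choices[].delta.content` holds the next piece of text, and `data: [DONE]` ends the stream. The sketch below parses that format using only the standard library. The `stream_chat` wrapper and any host you pass to it are illustrative assumptions, not a published client.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text deltas from an OpenAI-style SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(base_url: str, api_key: str, body: dict) -> Iterator[str]:
    """POST to /v1/chat/completions and yield content deltas as they arrive."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The response iterates as raw byte lines; decode each before parsing.
        yield from iter_sse_content(line.decode("utf-8") for line in resp)


sample = [
    'data: {"choices":[{"delta":{"content":"The vacation"},"index":0}]}',
    "",
    'data: {"choices":[{"delta":{"content":" policy allows"},"index":0}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # The vacation policy allows
```

Against a live server, a call such as `stream_chat("https://rag.example.com", "ragty-YOUR_KEY", body)` would yield the same deltas incrementally; the host there is a placeholder.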

Citations

Search results include positional metadata for citation overlays:

Field	Description
`document_id`	Source document UUID
`page_num`	Page number in original document
`layout_type`	Block type: `text`, `table`, `figure`
`position`	Bounding box coordinates `[x0, y0, x1, y1]`

The frontend uses these to highlight the exact source location in a PDF viewer.
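To draw an overlay, the viewer has to map `position` onto the rendered page. The sketch below assumes the coordinates are in PDF points and that the page is rendered at `zoom` pixels per point. PDF's native origin is bottom-left, but many extraction tools report top-left coordinates, so the flip is left as a flag. `Citation` and `to_viewer_rect` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document_id: str
    page_num: int
    layout_type: str                              # "text", "table" or "figure"
    position: tuple[float, float, float, float]   # (x0, y0, x1, y1)


def to_viewer_rect(
    c: Citation,
    page_height: float,
    zoom: float,
    origin_bottom_left: bool = False,
) -> tuple[float, float, float, float]:
    """Map a citation's box to (left, top, width, height) in viewer pixels.

    Assumes position is in PDF points and the page is drawn at `zoom`
    pixels per point. Set origin_bottom_left if y grows upward.
    """
    x0, y0, x1, y1 = c.position
    if origin_bottom_left:
        # Flip so that y is measured down from the top edge of the page.
        y0, y1 = page_height - y1, page_height - y0
    return (x0 * zoom, y0 * zoom, (x1 - x0) * zoom, (y1 - y0) * zoom)


cite = Citation("doc-uuid", 3, "table", (72.0, 100.0, 272.0, 150.0))
print(to_viewer_rect(cite, page_height=792.0, zoom=1.5))  # (108.0, 150.0, 300.0, 75.0)
```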

Dialogs

A dialog is a pre-configured chat profile that binds together:

An LLM model
A system prompt
One or more datasets to search
Search parameters (top_k, similarity threshold, reranking)

Dialogs can be shared externally via dialog API tokens for embedded chat widgets.
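A dialog can be pictured as a small configuration record, with per-request overrides layered on top. In the sketch below, the field names and the default values for `top_k`, `similarity_threshold`, `temperature`, and `max_tokens` are hypothetical placeholders. The override rule mirrors the optional `model`, `temperature`, and `max_tokens` fields of the chat endpoint.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dialog:
    # Hypothetical shape of a dialog record; names and defaults are placeholders.
    model: str                        # e.g. "openai/gpt-4o"
    system_prompt: str
    dataset_ids: list[str]
    top_k: int = 8
    similarity_threshold: float = 0.2
    rerank: bool = False
    temperature: float = 0.7
    max_tokens: int = 1024


def effective_params(dialog: Dialog, request: dict) -> dict:
    """Per-request overrides (model, temperature, max_tokens) win over dialog settings."""
    return {
        "model": request.get("model", dialog.model),
        "temperature": request.get("temperature", dialog.temperature),
        "max_tokens": request.get("max_tokens", dialog.max_tokens),
    }


hr = Dialog(model="openai/gpt-4o", system_prompt="Answer from the HR handbook.", dataset_ids=["hr-handbook"])
print(effective_params(hr, {"max_tokens": 2048}))
```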

LLM Integration

All LLM calls go through the LiteLLM proxy; the backend imports no vendor SDKs directly:

User Request → RAG Backend → LiteLLM Proxy → OpenAI / Anthropic / Ollama / etc.

Model identifiers use LiteLLM's provider/model form: openai/gpt-4o, anthropic/claude-3-5-sonnet, ollama/llama3

Configure each provider under Settings → Providers by supplying its API key and base URL.
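Because the proxy speaks the OpenAI chat format, the backend only needs plain HTTP to reach any provider. The sketch below splits a `provider/model` identifier and prepares (without sending) a request to the proxy. It assumes the proxy runs at `http://localhost:4000`, LiteLLM's default port, and that the proxy's model list exposes these identifiers as model names. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Assumed proxy address; LiteLLM's proxy listens on port 4000 by default.
LITELLM_BASE_URL = "http://localhost:4000"


def split_model_id(model_id: str) -> tuple[str, str]:
    """Split "provider/model", e.g. "ollama/llama3" -> ("ollama", "llama3")."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


def build_proxy_request(model_id: str, messages: list[dict], api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style chat call routed through the proxy."""
    split_model_id(model_id)  # fail fast on malformed identifiers
    body = {"model": model_id, "messages": messages}
    return urllib.request.Request(
        f"{LITELLM_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_proxy_request("ollama/llama3", [{"role": "user", "content": "hi"}], "sk-test")
print(split_model_id("anthropic/claude-3-5-sonnet"))  # ('anthropic', 'claude-3-5-sonnet')
```

Switching vendors then means changing only the identifier string. Sending the request with `urllib.request.urlopen(req)` needs a running proxy.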

12 June 2026