
Search & Chat

Search combines two retrieval strategies fused with Reciprocal Rank Fusion (RRF):

  1. Dense (HNSW) — Semantic similarity via embeddings (understands meaning)

  2. Sparse (BM25) — Keyword matching for exact terms (handles acronyms, IDs)
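
The fusion step above can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not the product's actual implementation: the function name rrf_fuse, the chunk IDs, and the constant k=60 (a commonly used default for RRF) are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is an assumed default, not a documented one.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense = ["c3", "c1", "c7"]   # hypothetical HNSW results, best first
sparse = ["c1", "c9", "c3"]  # hypothetical BM25 results, best first
fused = rrf_fuse([dense, sparse])
print(fused)
```

Because RRF uses only rank positions, the dense similarity scores and the BM25 scores never need to be normalized against each other. In this example c1 comes out first: it sits near the top of both lists, while c3 is first in one list but third in the other.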

Query Pipeline

[Figure: RAG Query Pipeline]

Optional Features

  Feature             Description                                                When to use
  Query condensation  Rewrites multi-turn conversations into standalone queries  Multi-turn chat
  HyDE                Generates hypothetical answer, embeds that instead         Ambiguous queries
  Reranking           Cross-encoder rescoring of top candidates                  Higher precision needed
  Parent resolution   Returns parent chunks for broader context                  Need surrounding context
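
Of the features above, parent resolution is the easiest to illustrate. The sketch below assumes each retrieved chunk may map to a larger parent chunk and that several hits can share one parent; the function name resolve_parents and the parent_of mapping are hypothetical and are not part of the documented API.

```python
def resolve_parents(hits, parent_of):
    """Replace each hit with its parent chunk ID, keeping first-seen order.

    hits:      chunk IDs ordered best first
    parent_of: hypothetical mapping of child chunk ID -> parent chunk ID
    """
    seen = set()
    parents = []
    for chunk_id in hits:
        # A chunk without a known parent stands in for itself.
        parent_id = parent_of.get(chunk_id, chunk_id)
        if parent_id not in seen:
            seen.add(parent_id)
            parents.append(parent_id)
    return parents


hits = ["a1", "b1", "a2", "x"]
parent_of = {"a1": "A", "a2": "A", "b1": "B"}
print(resolve_parents(hits, parent_of))
```

Deduplicating by parent keeps the context sent to the LLM from repeating the same passage when two neighbouring child chunks both match.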

Metadata Filtering

Every search is automatically scoped by:

  • tenant_id: strict isolation between tenants

  • parse_generation: excludes stale chunks from re-parsed documents

  • available_int: excludes soft-deleted documents
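
As a rough illustration of the scoping rules above, the following sketch filters an in-memory list of chunks. Only the three field names come from this page; the Chunk class, the document_id field on it, the per-document generation lookup, and the convention that available_int equals 1 for visible documents are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    document_id: str
    tenant_id: str
    parse_generation: int
    available_int: int  # assumption: 1 = visible, 0 = soft-deleted


def scope_chunks(chunks, tenant_id, current_generation):
    """Keep only chunks that pass all three automatic filters.

    current_generation: hypothetical mapping of document ID -> latest parse generation
    """
    return [
        c for c in chunks
        if c.tenant_id == tenant_id
        and c.parse_generation == current_generation.get(c.document_id)
        and c.available_int == 1
    ]


chunks = [
    Chunk("a", "d1", "t1", 2, 1),  # current, visible, right tenant
    Chunk("b", "d1", "t1", 1, 1),  # stale: d1 has since been re-parsed
    Chunk("c", "d2", "t2", 1, 1),  # belongs to another tenant
    Chunk("d", "d3", "t1", 1, 0),  # soft-deleted
]
visible = scope_chunks(chunks, "t1", {"d1": 2, "d2": 1, "d3": 1})
print([c.chunk_id for c in visible])
```

In a real deployment these conditions would more likely be pushed down into the index query than applied in application code.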

Streaming Chat

The RAG chat provides conversational access to knowledge bases with citation support.

OpenAI-Compatible Endpoint

  POST /v1/chat/completions
  Authorization: Bearer ragty-YOUR_KEY
  Content-Type: application/json

Request

  {
    "messages": [
      {"role": "user", "content": "What is the vacation policy?"}
    ],
    "stream": true,
    "temperature": 0.7,
    "max_tokens": 2048
  }

  Field        Type     Required  Description
  messages     array    Yes       Conversation history (last must be user)
  stream       boolean  No        Stream response via SSE (default: true)
  model        string   No        Override the dialog's configured model
  temperature  float    No        Override temperature (0.0–2.0)
  max_tokens   integer  No        Override max output tokens
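
A client can check the constraints in the table before sending anything. The helpers below are a sketch under stated assumptions: build_chat_request and build_headers are hypothetical names, and leaving an optional field out of the body is assumed to make the server fall back to the dialog's configured value.

```python
def build_headers(api_key):
    """Headers for POST /v1/chat/completions, as shown in the endpoint example."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def build_chat_request(messages, stream=True, model=None,
                       temperature=None, max_tokens=None):
    """Validate the fields and return the JSON-serializable request body."""
    if not messages or messages[-1].get("role") != "user":
        raise ValueError("the last message must have role 'user'")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0 to 2.0")
    body = {"messages": messages, "stream": stream}
    # Optional overrides are sent only when set (assumed to defer to the dialog otherwise).
    for key, value in (("model", model),
                       ("temperature", temperature),
                       ("max_tokens", max_tokens)):
        if value is not None:
            body[key] = value
    return body


body = build_chat_request(
    [{"role": "user", "content": "What is the vacation policy?"}],
    temperature=0.7,
    max_tokens=2048,
)
print(body)
```

The resulting dictionary can be serialized with json.dumps and sent with any HTTP client that supports streaming responses.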

Response (SSE)

  data: {"choices":[{"delta":{"content":"The vacation"},"index":0}]}

  data: {"choices":[{"delta":{"content":" policy allows"},"index":0}]}

  data: [DONE]
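
Reassembling the answer from such a stream takes only a few lines. This sketch assumes every event arrives as a single data: line carrying JSON in the shape shown above and that the stream ends with the [DONE] sentinel; collect_stream is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Join the delta content of an SSE stream into the full answer text."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            # A delta may carry no content; treat that as an empty string.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


stream = [
    'data: {"choices":[{"delta":{"content":"The vacation"},"index":0}]}',
    "",
    'data: {"choices":[{"delta":{"content":" policy allows"},"index":0}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(stream))
```

An interactive client would typically render each delta as it arrives instead of buffering the whole answer as this helper does.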

Citations

Search results include positional metadata for citation overlays:

  Field        Description
  document_id  Source document UUID
  page_num     Page number in original document
  layout_type  Block type: text, table, figure
  position     Bounding box coordinates [x0, y0, x1, y1]

The frontend uses these to highlight the exact source location in a PDF viewer.
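
To make the highlighting step concrete, here is a sketch of turning a citation into an overlay rectangle. The four field names come from the table above; the Citation class, the highlight_rect helper, the scale parameter, and the assumption that the coordinates use a top-left origin in the viewer's page units are all illustrative, since this page does not specify the coordinate system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document_id: str
    page_num: int
    layout_type: str  # "text", "table", or "figure"
    position: tuple   # (x0, y0, x1, y1)


def highlight_rect(citation, scale=1.0):
    """Convert a bounding box into a left/top/width/height overlay rectangle.

    Assumes a top-left origin; scale maps page units to screen pixels.
    """
    x0, y0, x1, y1 = citation.position
    return {
        "page": citation.page_num,
        "left": x0 * scale,
        "top": y0 * scale,
        "width": (x1 - x0) * scale,
        "height": (y1 - y0) * scale,
    }


cite = Citation("hypothetical-uuid", 3, "text", (10, 20, 110, 50))
print(highlight_rect(cite, scale=2.0))
```

If the parser reports coordinates with a bottom-left origin, as PDF user space does, the y values would need to be flipped against the page height first.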

Dialogs

A dialog is a pre-configured chat profile that binds together:

  • An LLM model

  • A system prompt

  • One or more datasets to search

  • Search parameters (top_k, similarity threshold, reranking)

Dialogs can be shared externally via dialog API tokens for embedded chat widgets.
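
One way to picture a dialog is as a small configuration record. The sketch below mirrors the four items listed above; the class names, field names other than top_k, and all default values are placeholders chosen for the example and do not reflect the product's schema or defaults.

```python
from dataclasses import dataclass, field


@dataclass
class SearchParams:
    top_k: int = 10                    # placeholder default
    similarity_threshold: float = 0.0  # placeholder default
    reranking: bool = False            # placeholder default


@dataclass
class Dialog:
    name: str
    model: str          # e.g. "openai/gpt-4o"
    system_prompt: str
    dataset_ids: list   # one or more datasets to search
    search: SearchParams = field(default_factory=SearchParams)

    def __post_init__(self):
        if not self.dataset_ids:
            raise ValueError("a dialog needs at least one dataset")


hr_bot = Dialog(
    name="HR assistant",
    model="openai/gpt-4o",
    system_prompt="Answer only from the provided documents.",
    dataset_ids=["hypothetical-dataset-id"],
    search=SearchParams(top_k=5, reranking=True),
)
print(hr_bot.search)
```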

LLM Integration

All LLM calls go through the LiteLLM proxy; the backend imports no vendor SDKs directly:

User Request → RAG Backend → LiteLLM Proxy → OpenAI / Anthropic / Ollama / etc.

Model identifiers: openai/gpt-4o, anthropic/claude-3-5-sonnet, ollama/llama3
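
The identifiers above follow a provider/model pattern. A small helper such as the one below, which is hypothetical and not part of the product, could split and validate them before they are stored in a dialog.

```python
def split_model_id(model_id):
    """Split 'provider/model' into its two parts, rejecting malformed input."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


for model_id in ("openai/gpt-4o", "anthropic/claude-3-5-sonnet", "ollama/llama3"):
    print(split_model_id(model_id))
```

Splitting on the first slash only keeps any further slashes inside the model part.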

Configure each provider under Settings → Providers by entering its API key and base URL.

12 June 2026