Query System

Overview

The query system provides graph-aware retrieval-augmented generation (RAG) with multiple query modes, configurable search depth, entity disambiguation, multi-hop traversal, and SSE streaming responses.

Query Modes

| Mode | Strategy | Best For |
| --- | --- | --- |
| local | Entity neighborhood search | Specific entities, direct associations |
| global | Community summary-based retrieval | Broad overviews, landscape questions |
| hybrid | Merges local + global with deduplication | Complex questions needing both breadth and depth |
| naive | Direct LLM answer, no graph retrieval | Simple factual questions |
| (omit) | Auto-classification based on query content | Default; the system picks the best mode |

Example request using hybrid mode:

curl -X POST "http://localhost:8000/v1/sessions/{session_id}/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "What proteins are associated with cardiovascular disease?", "mode": "hybrid"}'
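
The same request can be built from Python. The following is a minimal sketch using only the standard library. The helper name build_query_request, the session ID, and the token value are hypothetical; only the URL shape, headers, and the "question" and "mode" fields come from the curl example above. The request is constructed but not sent.

```python
import json
import urllib.request

# Mode names taken from the table above; omitting "mode" lets the server choose.
VALID_MODES = {"local", "global", "hybrid", "naive"}


def build_query_request(base_url, session_id, token, question, mode=None):
    """Build (but do not send) a POST request for the query endpoint."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"question": question}
    if mode is not None:
        payload["mode"] = mode
    return urllib.request.Request(
        url=f"{base_url}/v1/sessions/{session_id}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical session ID and token, for illustration only.
req = build_query_request(
    "http://localhost:8000",
    "my-session",
    "secret-token",
    "What proteins are associated with cardiovascular disease?",
    mode="hybrid",
)
# urllib.request.urlopen(req) would send the request to a running server.
```

Validating the mode on the client side is a design choice of this sketch; the server may well perform its own validation.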

Search Depth

The search_depth parameter controls how aggressively the two-phase agent gathers data:

| Depth | Discovery Expansion | Cypher Fallback | Best For |
| --- | --- | --- | --- |
| fast | | | Quick lookups, low latency |
| balanced | 1 round | If < 3 targets | Default; good coverage |
| deep | Up to 3 rounds | Always | Thorough research, complex questions |

Example request body:

{"question": "...", "search_depth": "balanced"}
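
One way to picture the depth settings is as a small policy table that the agent consults before running the Cypher fallback. This is a sketch, not the project's implementation: DepthPolicy, DEPTH_POLICIES, and should_run_cypher_fallback are hypothetical names. The balanced and deep values follow the table above. The table gives no values for fast, so treating it as "no expansion, no fallback" is an assumption made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthPolicy:
    expansion_rounds: int  # maximum number of discovery expansion rounds
    fallback: str          # "never", "if_few_targets", or "always"


DEPTH_POLICIES = {
    # ASSUMPTION: the table lists no values for "fast"; zero rounds and
    # no fallback is a guess, not documented behavior.
    "fast": DepthPolicy(expansion_rounds=0, fallback="never"),
    "balanced": DepthPolicy(expansion_rounds=1, fallback="if_few_targets"),
    "deep": DepthPolicy(expansion_rounds=3, fallback="always"),
}

# "If < 3 targets" from the balanced row of the table.
MIN_TARGETS = 3


def should_run_cypher_fallback(search_depth, targets_found):
    """Decide whether the Cypher fallback runs for a given depth setting."""
    policy = DEPTH_POLICIES[search_depth]
    if policy.fallback == "always":
        return True
    if policy.fallback == "if_few_targets":
        return targets_found < MIN_TARGETS
    return False
```

For example, should_run_cypher_fallback("balanced", 2) is true because fewer than three targets were found, while the same call with 3 targets is false.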

Multi-Hop Traversal

Discover indirect connections through intermediate nodes:

{"question": "What pathways connect BRCA1 to breast cancer?", "hop_depth": 2}
  • hop_depth=1 — direct relationships only (~100ms)
  • hop_depth=2 — one intermediate node (~300ms)
  • hop_depth=3 — two intermediate nodes (~1s)
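
The meaning of hop_depth can be illustrated with a breadth-first search over a toy in-memory graph. This is a sketch of the concept only: the real system queries a graph database, and the function name, adjacency-dict format, and placeholder node names below are assumptions made for illustration.

```python
from collections import deque


def neighbors_within_hops(graph, start, hop_depth):
    """Return {node: hops} for nodes reachable from start in <= hop_depth edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == hop_depth:
            continue  # do not expand beyond the requested depth
        for nxt in graph.get(node, ()):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    del distances[start]  # the start node is not its own neighbor
    return distances


# Toy graph with placeholder intermediate nodes (not real biological data).
toy_graph = {
    "BRCA1": ["intermediate_1"],
    "intermediate_1": ["breast cancer"],
    "breast cancer": ["intermediate_2"],
}

direct = neighbors_within_hops(toy_graph, "BRCA1", 1)
two_hop = neighbors_within_hops(toy_graph, "BRCA1", 2)
```

With hop_depth=1 only intermediate_1 is found; with hop_depth=2 the search passes through that one intermediate node and also reaches "breast cancer".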

Traversal includes timeout protection: the primary query is limited to 5s, and if it times out, a fallback query with a reduced seed set runs with a 2s limit.
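
The timeout-then-fallback behavior could look roughly like the sketch below, written with asyncio.wait_for. The 5s and 2s defaults come from the text; everything else is an assumption: the function name, reducing the seed set by taking the first few seeds, and returning an empty list when the fallback also times out. The demo uses very short timeouts and a fake query so it finishes quickly.

```python
import asyncio


async def traverse_with_fallback(run_query, seeds, primary_timeout=5.0,
                                 fallback_timeout=2.0, fallback_seed_count=3):
    """Run the primary query; on timeout, retry once with fewer seeds."""
    try:
        return await asyncio.wait_for(run_query(seeds), timeout=primary_timeout)
    except asyncio.TimeoutError:
        # ASSUMPTION: the seed set is reduced by keeping the first N seeds.
        reduced = seeds[:fallback_seed_count]
        try:
            return await asyncio.wait_for(run_query(reduced),
                                          timeout=fallback_timeout)
        except asyncio.TimeoutError:
            # ASSUMPTION: give up with an empty result if the fallback fails too.
            return []


async def fake_query(seeds):
    """Stand-in for a graph query: slow for large seed sets, fast otherwise."""
    await asyncio.sleep(0.5 if len(seeds) > 2 else 0.0)
    return list(seeds)


demo_result = asyncio.run(
    traverse_with_fallback(fake_query, ["d", "a", "c", "b"],
                           primary_timeout=0.05, fallback_timeout=0.05,
                           fallback_seed_count=2)
)
```

Here the four-seed primary query exceeds its limit, so the result comes from the fallback run over the first two seeds.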

See Multi-Hop Traversal and Timeout Protection.

SSE Streaming

Set "stream": true in the request body to receive the answer token by token over Server-Sent Events (SSE):

curl -N "http://localhost:8000/v1/sessions/{session_id}/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "Overview of protein biomarkers in kidney disease", "stream": true}'

Event sequence: retrieval_started → retrieval_complete → synthesis_started → token (repeated) → complete
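
A client has to split the stream into events and collect the token events into an answer. The sketch below is a simplified SSE line parser. It assumes the server names events with the standard "event:" field and carries payloads in "data:" lines; the payload shapes in the sample (an empty JSON object for status events, plain text for tokens) are invented for illustration and may differ from what the API actually sends.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates one event; events without data are skipped.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the data
            data.append(value)


# Hypothetical stream following the documented event sequence.
sample_stream = [
    "event: retrieval_started", "data: {}", "",
    "event: retrieval_complete", "data: {}", "",
    "event: synthesis_started", "data: {}", "",
    "event: token", "data: Hello", "",
    "event: token", "data: world", "",
    "event: complete", "data: {}", "",
]

events = list(parse_sse(sample_stream))
answer = " ".join(data for name, data in events if name == "token")
```

In a real client the lines would come from the HTTP response body, read incrementally, rather than from a list.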

Entity Disambiguation

Queries are automatically preprocessed to resolve entity mentions to canonical IDs:

  • "p53" → resolves to P04637 (TP53) with synonyms
  • "tumor necrosis factor" → resolves to TNF with UniProt ID

The expanded query includes synonyms for better retrieval.
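
Conceptually, the preprocessing step looks up each mention and appends its canonical ID and synonyms to the query. The sketch below shows this with a one-entry lookup table built from the p53 example above. The Entity class, the expand_query function, the naive substring matching, and the "OR" output format are all assumptions for illustration; the actual resolver is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    canonical_id: str
    symbol: str
    synonyms: tuple


# Toy lookup table; a real resolver would use a much larger index.
ENTITIES = {
    "p53": Entity("P04637", "TP53", ("TP53", "p53", "tumor protein p53")),
}


def expand_query(question, entities=ENTITIES):
    """Append the canonical ID and synonyms of each recognized mention."""
    lowered = question.lower()
    extra = []
    for mention, entity in entities.items():
        if mention.lower() not in lowered:  # naive substring match
            continue
        for term in (entity.canonical_id, *entity.synonyms):
            # Skip terms the question already contains or that were added before.
            if term.lower() not in lowered and term not in extra:
                extra.append(term)
    if not extra:
        return question
    return f"{question} ({' OR '.join(extra)})"


expanded = expand_query("What does p53 regulate?")
```

Substring matching is deliberately simplistic; it would, for instance, also fire on longer tokens that merely contain "p53".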

Query Agents

Four agent implementations with different execution strategies:

| Agent | Status | Approach |
| --- | --- | --- |
| TwoPhaseAgent | Active (default) | Deterministic tool execution → LLM synthesis |
| DynamicQueryAgent | Active | 6-strategy parallel search + CrossEncoder re-ranking |
| Neo4jQueryAgent | Active | Extends DynamicQueryAgent with Neo4j-specific retrievers |
| LangGraphAgent | Shelved | ReAct loop (shelved due to hallucination issues) |

See Agent Architecture for detailed documentation.

Observation Memory

The Observer/Reflector pattern compresses session context for long conversations:

  • Observer: Watches query-response pairs, extracts key facts
  • Reflector: Periodically compresses observations into a summary
  • Configuration: the REFLECTION_THRESHOLD and MAX_CONTEXT_TOKENS environment variables
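
The pattern can be sketched as a small class that records observations and folds them into a summary once enough have accumulated. Several things here are assumptions: the class and method names, reading REFLECTION_THRESHOLD as a count of observations with a default of 5, and the summarize callable standing in for whatever (presumably LLM-based) compression the system uses. MAX_CONTEXT_TOKENS is not modeled.

```python
import os


class ObservationMemory:
    """Toy Observer/Reflector: record facts, compress them periodically."""

    def __init__(self, summarize, threshold=None):
        # ASSUMPTION: REFLECTION_THRESHOLD counts observations; default is a guess.
        if threshold is None:
            threshold = int(os.environ.get("REFLECTION_THRESHOLD", "5"))
        self.threshold = threshold
        self.summarize = summarize  # callable: list[str] -> str
        self.summary = ""
        self.observations = []

    def observe(self, question, answer):
        """Observer step: store one query-response pair as an observation."""
        self.observations.append(f"Q: {question} | A: {answer}")
        if len(self.observations) >= self.threshold:
            self.reflect()

    def reflect(self):
        """Reflector step: fold the old summary and observations into one summary."""
        parts = ([self.summary] if self.summary else []) + self.observations
        self.summary = self.summarize(parts)
        self.observations = []

    def context(self):
        """Return the compressed context to prepend to the next query."""
        parts = ([self.summary] if self.summary else []) + self.observations
        return "\n".join(parts)


memory = ObservationMemory(lambda parts: f"summary of {len(parts)} items", threshold=2)
memory.observe("a", "b")
memory.observe("c", "d")  # reaches the threshold and triggers a reflection
memory.observe("e", "f")
```

After the three calls, the context holds one summary line followed by the single observation recorded since the last reflection.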

Precomputed Query Cache

Twenty high-value queries are precomputed and cached in Redis so that they can be answered without running retrieval again:

# Warm the cache
uv run python scripts/precompute_queries.py --service local --database neo4j

# Check cache stats
curl "http://localhost:8000/v1/cache/stats" -H "Authorization: Bearer $API_AUTH_TOKEN"
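
The idea behind a precomputed cache can be sketched with a plain dict standing in for Redis. The key scheme (a SHA-256 hash of the normalized question plus mode, with a "precomputed:" prefix), the JSON value format, and the function names are all assumptions; the actual layout used by scripts/precompute_queries.py is not described here.

```python
import hashlib
import json


def cache_key(question, mode=None):
    """Derive a stable key from a case- and whitespace-normalized question."""
    normalized = " ".join(question.lower().split())
    raw = json.dumps([normalized, mode])
    return "precomputed:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


def warm(cache, questions, compute, mode=None):
    """Precompute answers and store them as JSON strings."""
    for question in questions:
        cache[cache_key(question, mode)] = json.dumps(compute(question))


def lookup(cache, question, mode=None):
    """Return the cached answer, or None on a cache miss."""
    cached = cache.get(cache_key(question, mode))
    return None if cached is None else json.loads(cached)


cache = {}  # a dict stands in for Redis in this sketch
warm(
    cache,
    ["Overview of protein biomarkers in kidney disease"],
    lambda q: {"answer": f"stub answer for: {q}"},
)
hit = lookup(cache, "  overview of PROTEIN biomarkers in kidney disease ")
miss = lookup(cache, "An unrelated question")
```

Normalizing before hashing means that differences in case or spacing still hit the cache, while any other rewording misses and falls through to the normal query path.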

See Precomputed Query Cache.