Query System

Overview

The query system provides graph-aware retrieval-augmented generation (RAG) with multiple query modes, configurable search depth, entity disambiguation, multi-hop traversal, and SSE streaming responses.

Query Modes

| Mode | Strategy | Best For |
| --- | --- | --- |
| `local` | Entity neighborhood search | Specific entities, direct associations |
| `global` | Community summary-based retrieval | Broad overviews, landscape questions |
| `hybrid` | Merges local + global with deduplication | Complex questions needing both breadth and depth |
| `naive` | Direct LLM answer, no graph retrieval | Simple factual questions |
| (omit) | Auto-classification based on query content | Default; the system picks the best mode |

curl -X POST "http://localhost:8000/v1/sessions/{session_id}/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "What proteins are associated with cardiovascular disease?", "mode": "hybrid"}'
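The same request can be sketched in Python. This is a minimal illustration, not the project's client: the helper names `build_query_payload` and `build_request` are hypothetical, and the request object is only constructed, not sent. Leaving `mode` unset omits the field, which per the table above lets the server auto-classify.

```python
import json
import urllib.request

# Modes listed in the table above; omitting the field means auto-classification.
VALID_MODES = {"local", "global", "hybrid", "naive"}


def build_query_payload(question, mode=None):
    """Return the JSON body for a query; `mode=None` leaves the field out."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"question": question}
    if mode is not None:
        payload["mode"] = mode
    return payload


def build_request(session_id, token, question, mode=None,
                  base_url="http://localhost:8000"):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps(build_query_payload(question, mode)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/sessions/{session_id}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the response format is not described in this section, so it is not handled here.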

Search Depth

The `search_depth` parameter controls how aggressively the two-phase agent gathers data:

| Depth | Discovery | Expansion | Cypher Fallback | Best For |
| --- | --- | --- | --- | --- |
| `fast` | ✅ | ❌ | ❌ | Quick lookups, low latency |
| `balanced` | ✅ | 1 round | If < 3 targets | Default; good coverage |
| `deep` | ✅ | Up to 3 rounds | Always | Thorough research, complex questions |

{"question": "...", "search_depth": "balanced"}
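One way to read the depth table is as a small policy lookup. The sketch below is an illustration only: `DepthPlan`, `DEPTH_PLANS`, and `use_cypher_fallback` are hypothetical names, and treating "targets" as a simple count found during discovery is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthPlan:
    max_expansion_rounds: int
    cypher_fallback: str  # "never", "if_few_targets", or "always"


# Values transcribed from the search depth table.
DEPTH_PLANS = {
    "fast": DepthPlan(0, "never"),
    "balanced": DepthPlan(1, "if_few_targets"),
    "deep": DepthPlan(3, "always"),
}

MIN_TARGETS = 3  # "If < 3 targets" in the balanced row


def use_cypher_fallback(search_depth, targets_found):
    """Decide whether the Cypher fallback runs for this depth (assumed logic)."""
    policy = DEPTH_PLANS[search_depth].cypher_fallback
    if policy == "always":
        return True
    if policy == "if_few_targets":
        return targets_found < MIN_TARGETS
    return False
```

For example, `use_cypher_fallback("balanced", 2)` is true under this reading, while `use_cypher_fallback("balanced", 3)` is false.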

Multi-Hop Traversal

Set `hop_depth` to discover indirect connections through intermediate nodes:

{"question": "What pathways connect BRCA1 to breast cancer?", "hop_depth": 2}

- `hop_depth=1`: direct relationships only (~100 ms)
- `hop_depth=2`: one intermediate node (~300 ms)
- `hop_depth=3`: two intermediate nodes (~1 s)

Traversal includes timeout protection: the primary query is limited to 5 s, and if it times out, a fallback query with a reduced seed set runs with a 2 s limit.

See Multi-Hop Traversal and Timeout Protection.
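The timeout protection described above can be sketched with `asyncio.wait_for`. This is a hedged illustration, not the actual implementation: `run_query` is a hypothetical async callable, and both the way the seed set is reduced (keeping the first few seeds) and the empty result on a second timeout are assumptions.

```python
import asyncio


async def traverse_with_fallback(run_query, seeds, primary_timeout=5.0,
                                 fallback_timeout=2.0, fallback_seed_count=5):
    """Run a traversal; on timeout, retry once with a reduced seed set."""
    try:
        return await asyncio.wait_for(run_query(seeds), timeout=primary_timeout)
    except asyncio.TimeoutError:
        reduced = seeds[:fallback_seed_count]  # assumption: keep the first N seeds
        try:
            return await asyncio.wait_for(run_query(reduced),
                                          timeout=fallback_timeout)
        except asyncio.TimeoutError:
            return []  # assumption: give up with an empty result


async def fake_query(seeds):
    # Stand-in for a graph query whose cost grows with the number of seeds.
    await asyncio.sleep(0.1 * len(seeds))
    return [f"path-from-{s}" for s in seeds]


# Scaled-down timeouts so the demo finishes quickly: the 10-seed query
# exceeds the primary limit, and the 1-seed fallback completes in time.
paths = asyncio.run(traverse_with_fallback(
    fake_query, list("ABCDEFGHIJ"),
    primary_timeout=0.5, fallback_timeout=0.2, fallback_seed_count=1))
```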

SSE Streaming

Set `"stream": true` to receive the answer token by token over Server-Sent Events (SSE):

curl -N "http://localhost:8000/v1/sessions/{session_id}/query" \
  -H "Authorization: Bearer $API_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"question": "Overview of protein biomarkers in kidney disease", "stream": true}'

Event sequence: retrieval_started → retrieval_complete → synthesis_started → token (repeated) → complete
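A client needs to split the stream into events and collect the `token` events into an answer. The sketch below assumes standard SSE framing (`event:` and `data:` lines, with a blank line ending each event) and a hypothetical `{"text": ...}` payload for `token` events; neither the wire format nor the payload shape is specified in this section.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line terminates the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def collect_answer(events):
    """Concatenate token payloads until the `complete` event arrives."""
    tokens = []
    for event, data in events:
        if event == "token":
            tokens.append(json.loads(data)["text"])  # assumed payload shape
        elif event == "complete":
            break
    return "".join(tokens)


sample = [
    "event: retrieval_started", "data: {}", "",
    "event: retrieval_complete", "data: {}", "",
    "event: synthesis_started", "data: {}", "",
    "event: token", 'data: {"text": "Kidney "}', "",
    "event: token", 'data: {"text": "biomarkers"}', "",
    "event: complete", "data: {}", "",
]
events = list(parse_sse(sample))
answer = collect_answer(events)
```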

Entity Disambiguation

Queries are automatically preprocessed to resolve entity mentions to canonical IDs:

"p53" → resolves to P04637 (TP53) with synonyms
"tumor necrosis factor" → resolves to TNF with UniProt ID

The expanded query includes synonyms for better retrieval.
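As a rough illustration of this preprocessing step, the sketch below resolves mentions against a toy alias table and appends missing synonyms to the query. The `ALIASES` table, the `expand_query` helper, and the `(A OR B)` expansion format are all assumptions made for the example; the real resolver and its output format are not described here.

```python
import re

# Toy alias table for illustration; a real system would query an entity index.
ALIASES = {
    "p53": {
        "id": "P04637",
        "symbol": "TP53",
        "synonyms": ["p53", "TP53", "tumor protein p53"],
    },
}


def expand_query(question, aliases=ALIASES):
    """Return (expanded_question, resolved_ids) for mentions found in the text."""
    resolved, extra_terms = [], []
    lowered = question.lower()
    for mention, entity in aliases.items():
        pattern = rf"\b{re.escape(mention)}\b"
        if re.search(pattern, question, flags=re.IGNORECASE):
            resolved.append(entity["id"])
            # Add only the synonyms that the question does not already contain.
            extra_terms += [s for s in entity["synonyms"]
                            if s.lower() not in lowered]
    if not extra_terms:
        return question, resolved
    return f"{question} ({' OR '.join(extra_terms)})", resolved


expanded, ids = expand_query("What does p53 regulate?")
```

Here `expanded` becomes `What does p53 regulate? (TP53 OR tumor protein p53)` and `ids` is `["P04637"]`.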

Query Agents

The system includes four agent implementations with different execution strategies; three are active and one is shelved:

| Agent | Status | Approach |
| --- | --- | --- |
| TwoPhaseAgent | Active (default) | Deterministic tool execution → LLM synthesis |
| DynamicQueryAgent | Active | 6-strategy parallel search + CrossEncoder re-ranking |
| Neo4jQueryAgent | Active | Extends Dynamic with Neo4j-specific retrievers |
| LangGraphAgent | Shelved | ReAct loop (shelved due to hallucination issues) |

See Agent Architecture for detailed documentation.

Observation Memory

The Observer/Reflector pattern compresses session context for long conversations:

- Observer: watches query-response pairs and extracts key facts
- Reflector: periodically compresses observations into a summary
- Configuration: the `REFLECTION_THRESHOLD` and `MAX_CONTEXT_TOKENS` environment variables
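The Observer/Reflector pattern can be sketched as follows. Everything beyond the two variable names is an assumption: the sketch treats `REFLECTION_THRESHOLD` as the number of pending observations that triggers a reflection, approximates tokens by whitespace-separated words, and uses a hypothetical `summarize` callable where the real system would presumably call an LLM. The defaults passed to `from_env` are placeholders.

```python
import os


class ObservationMemory:
    def __init__(self, reflection_threshold, max_context_tokens, summarize):
        self.reflection_threshold = reflection_threshold
        self.max_context_tokens = max_context_tokens
        self.summarize = summarize  # hypothetical: (summary, observations) -> str
        self.summary = ""
        self.observations = []

    @classmethod
    def from_env(cls, summarize):
        # Default values here are placeholders, not the project's defaults.
        return cls(int(os.environ.get("REFLECTION_THRESHOLD", 5)),
                   int(os.environ.get("MAX_CONTEXT_TOKENS", 2000)),
                   summarize)

    def observe(self, question, answer):
        """Observer step: record the exchange (stand-in for fact extraction)."""
        self.observations.append(f"Q: {question} A: {answer}")
        if len(self.observations) >= self.reflection_threshold:
            self.reflect()

    def reflect(self):
        """Reflector step: fold pending observations into the summary."""
        self.summary = self.summarize(self.summary, self.observations)
        self.observations = []

    def context(self):
        """Return summary plus pending observations, trimmed to the budget."""
        words = " ".join([self.summary, *self.observations]).split()
        return " ".join(words[-self.max_context_tokens:])


def count_summarizer(summary, observations):
    # Trivial stand-in for an LLM summarizer.
    return f"{summary} [{len(observations)} facts]".strip()


memory = ObservationMemory(2, 50, count_summarizer)
memory.observe("What is TP53?", "A tumor suppressor gene.")
pending_after_first = len(memory.observations)
memory.observe("Where is it expressed?", "In most tissues.")
```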

Precomputed Query Cache

Twenty high-value queries are precomputed and cached in Redis so they can be answered instantly:

# Warm the cache
uv run python scripts/precompute_queries.py --service local --database neo4j

# Check cache stats
curl "http://localhost:8000/v1/cache/stats" -H "Authorization: Bearer $API_AUTH_TOKEN"

See Precomputed Query Cache.

- Agent Architecture: detailed agent comparison
- Multi-Hop Traversal: full feature documentation
- Metrics Dashboard: query intelligence metrics