Query System
Overview
The query system provides graph-aware retrieval-augmented generation (RAG) with multiple query modes, configurable search depth, entity disambiguation, multi-hop traversal, and SSE streaming responses.
Query Modes
| Mode | Strategy | Best For |
|---|---|---|
| `local` | Entity neighborhood search | Specific entities, direct associations |
| `global` | Community summary-based retrieval | Broad overviews, landscape questions |
| `hybrid` | Merges local + global with deduplication | Complex questions needing both breadth and depth |
| `naive` | Direct LLM answer, no graph retrieval | Simple factual questions |
| (omit) | Auto-classification based on query content | Default: the system picks the best mode |
curl -X POST "http://localhost:8000/v1/sessions/{session_id}/query" \
-H "Authorization: Bearer $API_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"question": "What proteins are associated with cardiovascular disease?", "mode": "hybrid"}'
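The same request can be built from Python. The sketch below uses only the standard library and constructs the request without sending it. `build_query_request` and `VALID_MODES` are illustrative names rather than part of the API, and the base URL, session ID, and token are placeholders.

```python
import json
import urllib.request

# Modes from the table above; omitting "mode" lets the server auto-classify.
VALID_MODES = {"local", "global", "hybrid", "naive"}


def build_query_request(base_url, session_id, token, question, mode=None):
    """Build (but do not send) a POST request for the query endpoint."""
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"question": question}
    if mode is not None:
        payload["mode"] = mode
    return urllib.request.Request(
        url=f"{base_url}/v1/sessions/{session_id}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_query_request(
    "http://localhost:8000",
    "my-session",        # placeholder session ID
    "secret-token",      # placeholder for $API_AUTH_TOKEN
    "What proteins are associated with cardiovascular disease?",
    mode="hybrid",
)
# To send it against a running server: urllib.request.urlopen(req)
```

Validating the mode on the client side is a design choice of this sketch; the server may accept or reject values differently.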
Search Depth
Controls how aggressively the two-phase agent gathers data:
| Depth | Discovery | Expansion | Cypher Fallback | Best For |
|---|---|---|---|---|
| `fast` | ✅ | ❌ | ❌ | Quick lookups, low latency |
| `balanced` | ✅ | 1 round | If < 3 targets | Default; good coverage |
| `deep` | ✅ | Up to 3 rounds | Always | Thorough research, complex questions |
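One way to read the table is as a small policy lookup. The sketch below is not the agent's actual code: `DepthPolicy`, `DEPTH_POLICIES`, and `should_run_cypher_fallback` are hypothetical names, and only the values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthPolicy:
    expansion_rounds: int  # maximum expansion rounds after discovery
    cypher_fallback: str   # "never", "if_few_targets", or "always"


# Values transcribed from the table above; the structure is hypothetical.
DEPTH_POLICIES = {
    "fast": DepthPolicy(expansion_rounds=0, cypher_fallback="never"),
    "balanced": DepthPolicy(expansion_rounds=1, cypher_fallback="if_few_targets"),
    "deep": DepthPolicy(expansion_rounds=3, cypher_fallback="always"),
}


def should_run_cypher_fallback(depth, targets_found):
    """Return True if the Cypher fallback would run for this depth."""
    policy = DEPTH_POLICIES[depth]
    if policy.cypher_fallback == "always":
        return True
    if policy.cypher_fallback == "if_few_targets":
        return targets_found < 3  # "If < 3 targets" in the table
    return False
```

For example, `should_run_cypher_fallback("balanced", 2)` is true under this reading, while the same call with 3 targets is false.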
Multi-Hop Traversal
Discover indirect connections through intermediate nodes:
- `hop_depth=1`: direct relationships only (~100ms)
- `hop_depth=2`: one intermediate node (~300ms)
- `hop_depth=3`: two intermediate nodes (~1s)
Traversal includes timeout protection: the primary query is limited to 5s, and if it times out, a fallback query runs with a reduced seed set and a 2s limit.
See Multi-Hop Traversal and Timeout Protection.
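The primary-then-fallback pattern can be sketched with `asyncio.wait_for`. This is an illustration under assumptions, not the system's implementation: `traverse_with_fallback`, `run_traversal`, and `fallback_seed_count` are hypothetical, the rule for reducing the seed set (keep the first few seeds) is a guess, and returning an empty list after the second timeout is also assumed.

```python
import asyncio

PRIMARY_TIMEOUT_S = 5.0   # primary query limit, from the text above
FALLBACK_TIMEOUT_S = 2.0  # fallback query limit, from the text above


async def traverse_with_fallback(run_traversal, seeds, hop_depth,
                                 primary_timeout=PRIMARY_TIMEOUT_S,
                                 fallback_timeout=FALLBACK_TIMEOUT_S,
                                 fallback_seed_count=5):
    """Try the full seed set first, then retry once with fewer seeds."""
    try:
        return await asyncio.wait_for(
            run_traversal(seeds, hop_depth), timeout=primary_timeout)
    except asyncio.TimeoutError:
        pass  # fall through to the reduced query
    reduced = seeds[:fallback_seed_count]  # assumed reduction rule
    try:
        return await asyncio.wait_for(
            run_traversal(reduced, hop_depth), timeout=fallback_timeout)
    except asyncio.TimeoutError:
        return []  # assumed: give up with an empty result


async def fake_traversal(seeds, hop_depth):
    """Stand-in for the graph query: slow when given many seeds."""
    await asyncio.sleep(0.2 if len(seeds) > 2 else 0.0)
    return [f"{seed}-hop{hop_depth}" for seed in seeds]


# Short timeouts so the demo finishes quickly.
result = asyncio.run(traverse_with_fallback(
    fake_traversal, ["a", "b", "c", "d"], hop_depth=2,
    primary_timeout=0.05, fallback_timeout=0.05, fallback_seed_count=2))
```

In the demo the four-seed query exceeds its limit, so `result` comes from the two-seed fallback.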
SSE Streaming
Request real-time token-by-token streaming:
curl -N "http://localhost:8000/v1/sessions/{session_id}/query" \
-H "Authorization: Bearer $API_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"question": "Overview of protein biomarkers in kidney disease", "stream": true}'
Event sequence: retrieval_started → retrieval_complete → synthesis_started → token (repeated) → complete
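A client consumes this stream by splitting it into events and concatenating the `token` payloads. The sketch below assumes standard Server-Sent Events framing (`event:` and `data:` fields, with a blank line ending each event) and assumes that each `token` event carries plain text. The actual payload format is not specified here, and `parse_sse` is an illustrative helper.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                  # a blank line ends one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):      # comment or keep-alive line
            continue
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):   # SSE strips one leading space
                value = value[1:]
            data.append(value)


# Hand-written sample following the event sequence above (payloads assumed).
sample = [
    "event: retrieval_started", "data: {}", "",
    "event: retrieval_complete", "data: {}", "",
    "event: synthesis_started", "data: {}", "",
    "event: token", "data: Kidney", "",
    "event: token", "data:  disease", "",
    "event: complete", "data: {}", "",
]

events = list(parse_sse(sample))
names = [name for name, _ in events]
answer = "".join(data for name, data in events if name == "token")
```

Against a live server, the lines would come from the streaming response body instead of the `sample` list.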
Entity Disambiguation
Queries are automatically preprocessed to resolve entity mentions to canonical IDs:
- "p53" resolves to P04637 (TP53), with synonyms
- "tumor necrosis factor" resolves to TNF, with its UniProt ID
The expanded query includes synonyms for better retrieval.
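The resolution and expansion step can be pictured as an alias lookup followed by appending synonyms. The sketch below is a toy version under assumptions: `ENTITY_TABLE` and `expand_query` are hypothetical, the synonym lists are illustrative, and the `OR`-joined suffix is only one possible way to attach synonyms to a query.

```python
import re

# Hypothetical alias table; the two entries mirror the examples above.
# Synonym lists are illustrative, not taken from the real resolver.
ENTITY_TABLE = {
    "p53": {"symbol": "TP53", "uniprot": "P04637",
            "synonyms": ["TP53", "tumor protein p53"]},
    "tumor necrosis factor": {"symbol": "TNF",
                              "synonyms": ["TNF", "TNF-alpha"]},
}


def expand_query(question, table=ENTITY_TABLE):
    """Return (expanded_question, resolved), where resolved maps each
    matched mention to its canonical symbol."""
    resolved = {}
    extra_terms = []
    lowered = question.lower()
    for mention, entry in table.items():
        pattern = rf"\b{re.escape(mention)}\b"
        if re.search(pattern, question, flags=re.IGNORECASE):
            resolved[mention] = entry["symbol"]
            for term in entry["synonyms"]:
                # Skip synonyms already present in the question.
                if term.lower() not in lowered and term not in extra_terms:
                    extra_terms.append(term)
    if extra_terms:
        question = f"{question} ({' OR '.join(extra_terms)})"
    return question, resolved


expanded, resolved = expand_query("What does p53 regulate?")
```

A production resolver would likely draw on a curated ontology or database rather than a hard-coded dictionary.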
Query Agents
Four agent implementations with different execution strategies:
| Agent | Status | Approach |
|---|---|---|
| TwoPhaseAgent | Active (default) | Deterministic tool execution → LLM synthesis |
| DynamicQueryAgent | Active | 6-strategy parallel search + CrossEncoder re-ranking |
| Neo4jQueryAgent | Active | Extends Dynamic with Neo4j-specific retrievers |
| LangGraphAgent | Shelved | ReAct loop (shelved due to hallucination issues) |
See Agent Architecture for detailed documentation.
Observation Memory
The Observer/Reflector pattern compresses session context for long conversations:
- Observer: Watches query-response pairs, extracts key facts
- Reflector: Periodically compresses observations into a summary
- Configured via the `REFLECTION_THRESHOLD` and `MAX_CONTEXT_TOKENS` env vars
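The Observer/Reflector loop can be sketched as a small class. This is a simplified illustration: `ObservationMemory` is a hypothetical name, the threshold is assumed to count pending observations (the real meaning of `REFLECTION_THRESHOLD` may differ), `MAX_CONTEXT_TOKENS` is not modelled, and the fact extraction and summarization that an LLM would perform are replaced with plain string handling.

```python
class ObservationMemory:
    """Toy Observer/Reflector: collect observations, compress at a threshold."""

    def __init__(self, reflection_threshold=5, summarize=None):
        # Assumed semantics: reflect once this many observations are pending.
        self.reflection_threshold = reflection_threshold
        # Default summarizer just joins text; a real one would call an LLM.
        self.summarize = summarize or (
            lambda summary, obs: " ".join(filter(None, [summary, *obs])))
        self.summary = ""
        self.observations = []

    def observe(self, question, answer):
        """Observer step: record a fact for one query-response pair."""
        self.observations.append(f"Q: {question} -> A: {answer}")
        if len(self.observations) >= self.reflection_threshold:
            self.reflect()

    def reflect(self):
        """Reflector step: fold pending observations into the summary."""
        self.summary = self.summarize(self.summary, self.observations)
        self.observations = []

    def context(self):
        """Return the compressed summary plus any pending observations."""
        return "\n".join(filter(None, [self.summary, *self.observations]))


memory = ObservationMemory(
    reflection_threshold=2,
    summarize=lambda summary, obs: f"{summary}[{len(obs)} facts]",
)
memory.observe("q1", "a1")
memory.observe("q2", "a2")  # threshold reached, so reflect() runs
memory.observe("q3", "a3")  # stays pending until the next reflection
```

Keeping the summary separate from pending observations means recent turns remain verbatim while older ones are compressed.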
Precomputed Query Cache
20 high-value queries are precomputed and cached in Redis for instant responses:
# Warm the cache
uv run python scripts/precompute_queries.py --service local --database neo4j
# Check cache stats
curl "http://localhost:8000/v1/cache/stats" -H "Authorization: Bearer $API_AUTH_TOKEN"
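Conceptually, a precomputed cache maps a normalized question to a stored answer and counts hits and misses. The sketch below uses a dictionary as a stand-in for Redis. The key scheme, the normalization, and the names `cache_key` and `PrecomputedCache` are all assumptions for illustration, not the behaviour of `precompute_queries.py` or of the `/v1/cache/stats` endpoint.

```python
import hashlib


def cache_key(question):
    """Derive a stable key; case and extra whitespace are ignored (assumed)."""
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return "query:" + digest[:16]


class PrecomputedCache:
    """Dict-backed stand-in for the Redis cache, with hit/miss counters."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def warm(self, answers):
        """Store precomputed answers given as {question: answer}."""
        for question, answer in answers.items():
            self._store[cache_key(question)] = answer

    def get(self, question):
        """Return the cached answer, or None so the caller runs the query."""
        answer = self._store.get(cache_key(question))
        if answer is None:
            self.misses += 1
        else:
            self.hits += 1
        return answer


cache = PrecomputedCache()
cache.warm({"What is TP53?": "precomputed answer"})
hit = cache.get("  what is   TP53? ")   # matches after normalization
miss = cache.get("An unseen question")  # not precomputed
```

Exact-match keys only help when users ask one of the precomputed questions nearly verbatim; anything else falls through to the normal query path.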
Related Pages
- Agent Architecture — detailed agent comparison
- Multi-Hop Traversal — full feature documentation
- Metrics Dashboard — query intelligence metrics