Query Agent Architecture
Agent Hierarchy
┌─────────────────────────────────────────────────────────┐
│ TwoPhaseAgent [ACTIVE, default]                         │
│ Phase 1: Deterministic execution of ALL 8+ tools        │
│ Phase 2: Single LLM call to synthesize results          │
│ Guaranteed tool execution: LLM cannot skip tools        │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ DynamicQueryAgent (Base class)                          │
│ 6 search strategies in a deterministic pipeline         │
│ QueryProcessor → EntityDiscovery → SearchStrategies     │
│   → ResultProcessor (CrossEncoder re-ranking) → REFRAG  │
└───────────────────────┬─────────────────────────────────┘
                        │ extends
┌───────────────────────▼─────────────────────────────────┐
│ Neo4jQueryAgent (Subclass)                              │
│ Adds: VectorRetriever, HybridRetriever, GraphRAG,       │
│ APOC KNN expansion; built on the neo4j_graphrag library │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ LangGraphAgent [SHELVED]                                │
│ ReAct loop via LangGraph: LLM decides tool calls        │
│ Problem: LLM often skips tools and hallucinates answers │
└─────────────────────────────────────────────────────────┘
TwoPhaseAgent (Active Default)
File: src/agents/two_phase_agent.py
The active production agent. It addresses the hallucination problem seen with the shelved LangGraphAgent, where the LLM skips tool calls, by separating data gathering from synthesis.
Phase 1: Discovery (no LLM). Executes all graph tools deterministically:
- Entity lookup, search, vector similarity
- Domain-specific tools (biomarkers, pathways, interactions)
- Community summaries (for global queries)
Phase 2: Synthesis (single LLM call). Takes all gathered data and produces a coherent answer with citations.
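The two-phase control flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in src/agents/two_phase_agent.py. The tool callables, the `llm` callable, and the helper names `run_discovery`, `build_synthesis_prompt`, and `answer` are all hypothetical. The sketch shows the key property: Phase 1 calls every tool unconditionally, and Phase 2 makes exactly one LLM call over the collected results.

```python
from typing import Any, Callable, Dict

Tool = Callable[[str], Any]  # hypothetical tool signature: query text in, data out


def run_discovery(query: str, tools: Dict[str, Tool]) -> Dict[str, Any]:
    """Phase 1: call every tool unconditionally; no LLM chooses what runs."""
    results: Dict[str, Any] = {}
    for name, tool in tools.items():
        try:
            results[name] = tool(query)
        except Exception as exc:  # record the failure instead of dropping the tool
            results[name] = {"error": str(exc)}
    return results


def build_synthesis_prompt(query: str, evidence: Dict[str, Any]) -> str:
    """Pack every tool result into one prompt so the LLM answers from data only."""
    lines = [
        f"Question: {query}",
        "Answer using only the tool results below and cite the tool names.",
    ]
    for name, value in evidence.items():
        lines.append(f"[{name}] {value}")
    return "\n".join(lines)


def answer(query: str, tools: Dict[str, Tool], llm: Callable[[str], str]) -> str:
    evidence = run_discovery(query, tools)               # Phase 1: no LLM involved
    return llm(build_synthesis_prompt(query, evidence))  # Phase 2: one LLM call
```

Because the tool loop finishes before the model is involved, a tool can fail but cannot be skipped. A failure reaches the synthesis step as an explicit error entry rather than silently disappearing.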
Search depth controls Phase 2 expansion:
- fast: Skip expansion, return discovery results directly
- balanced: One round of neighbor expansion; Cypher fallback if fewer than 3 targets are found
- deep: Up to 3 iterative expansion rounds; Cypher fallback always runs
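One way to encode the three depth settings is a small policy table plus an expansion loop that stops early when a round finds nothing new. The names `DepthPolicy`, `expand`, `should_run_cypher_fallback`, and the `neighbors` callback are hypothetical. Only the round counts and the fewer-than-3-targets threshold come from the list above. Treating `fast` as never using the Cypher fallback is an assumption based on "return discovery results directly".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class DepthPolicy:
    max_expansion_rounds: int
    cypher_fallback: str  # "never", "if_sparse", or "always"


# Round counts and fallback rules transcribed from the search-depth list.
DEPTH_POLICIES: Dict[str, DepthPolicy] = {
    "fast": DepthPolicy(max_expansion_rounds=0, cypher_fallback="never"),
    "balanced": DepthPolicy(max_expansion_rounds=1, cypher_fallback="if_sparse"),
    "deep": DepthPolicy(max_expansion_rounds=3, cypher_fallback="always"),
}
MIN_TARGETS = 3  # "balanced" falls back to Cypher below this many targets


def expand(seeds: Iterable[str], neighbors: Callable[[str], Iterable[str]], depth: str) -> Set[str]:
    """Breadth-first neighbor expansion, capped by the depth policy."""
    found: Set[str] = set(seeds)
    frontier: Set[str] = set(found)
    for _ in range(DEPTH_POLICIES[depth].max_expansion_rounds):
        new = {n for node in frontier for n in neighbors(node)} - found
        if not new:
            break  # nothing new to add, so stop before the round cap ("up to")
        found |= new
        frontier = new
    return found


def should_run_cypher_fallback(depth: str, num_targets: int) -> bool:
    rule = DEPTH_POLICIES[depth].cypher_fallback
    if rule == "always":
        return True
    if rule == "if_sparse":
        return num_targets < MIN_TARGETS
    return False
```

On a simple chain a→b→c→d→e, `fast` returns only the seed, `balanced` adds one hop, and `deep` stops after three hops.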
DynamicQueryAgent
File: src/agents/dynamic_query_agent.py
Base class with a 6-strategy parallel search pipeline:
- Direct entity queries — exact match on known entities
- Schema-guided expansion — follow graph schema to find related nodes
- Relationship traversal — walk edges from seed entities
- Topic-based queries — keyword-based graph search
- LLM-generated Cypher — Text2Cypher for complex questions
- Literature search — fulltext search across abstracts
Results are re-ranked using a CrossEncoder model and optionally compressed via REFRAG.
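The fan-out, merge, and re-rank steps can be sketched with the standard library alone. Everything here is hypothetical. The strategy callables stand in for the six strategies, and `overlap_score` is a toy token-overlap scorer that replaces the CrossEncoder so the example runs without a model. With the sentence-transformers library, a `CrossEncoder` model's `predict` method scores a list of (query, passage) pairs and could be wrapped to fill the `score` slot. REFRAG compression is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Strategy = Callable[[str], List[str]]  # hypothetical: query in, passages out
Scorer = Callable[[str, str], float]   # (query, passage) -> relevance


def run_strategies(query: str, strategies: Dict[str, Strategy]) -> List[str]:
    """Run all strategies concurrently, then merge results without duplicates."""
    with ThreadPoolExecutor(max_workers=max(len(strategies), 1)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in strategies.items()}
        merged: List[str] = []
        seen = set()
        # Collect in registration order so the merged list is deterministic.
        for name in strategies:
            for passage in futures[name].result():
                if passage not in seen:
                    seen.add(passage)
                    merged.append(passage)
    return merged


def rerank(query: str, candidates: List[str], score: Scorer, top_k: int = 5) -> List[str]:
    """Sort by pairwise relevance score; ties keep merge order (stable sort)."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:top_k]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a CrossEncoder: share of query tokens present in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / max(len(q), 1)
```

Strategies run in parallel, but results are collected in a fixed order. As a result, the merged list and the tie-breaking in `rerank` do not depend on which thread finishes first.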
Neo4jQueryAgent
File: src/agents/neo4j_query_agent.py
Extends DynamicQueryAgent with Neo4j-specific components:
- VectorRetriever — chunk_embeddings index for semantic search
- HybridRetriever — node_embeddings + fulltext index
- GraphRAG — LLM + chunk retriever for graph-aware generation
- APOC KNN — K-nearest neighbor graph expansion
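HybridRetriever combines a vector index and a fulltext index, whose raw scores are on different scales. The sketch below shows one common fusion rule under that assumption: divide each list by its own maximum score, then keep each node's best normalized score. It illustrates the technique and is not the neo4j_graphrag source. The function names, node IDs, and scores are made up.

```python
from typing import Dict, List, Tuple


def max_normalize(hits: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the list's best score."""
    top = max(hits.values(), default=0.0)
    if top <= 0:
        return {node_id: 0.0 for node_id in hits}
    return {node_id: score / top for node_id, score in hits.items()}


def fuse_hybrid(
    vector_hits: Dict[str, float],
    fulltext_hits: Dict[str, float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Merge two result lists, keeping each node's best normalized score."""
    combined: Dict[str, float] = {}
    for hits in (max_normalize(vector_hits), max_normalize(fulltext_hits)):
        for node_id, score in hits.items():
            combined[node_id] = max(combined.get(node_id, 0.0), score)
    # Highest score first; node ID breaks ties so the order is reproducible.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

For example, `fuse_hybrid({"n1": 0.8, "n2": 0.4}, {"n2": 8.0, "n3": 2.0})` gives n1 and n2 a score of 1.0 each and n3 a score of 0.25. Node n2 is rescued by its strong fulltext match even though its vector score is only half of n1's.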
Agent Selection
Agents are created via the factory pattern in src/factories/:
| agent_type | Agent Class | Use Case |
|---|---|---|
| graph_rag_agent (default) | TwoPhaseAgent | Production queries |
| Standard / Hybrid | Neo4jQueryAgent | GraphRAG + hybrid retrieval |
| Dynamic | DynamicQueryAgent | Multi-strategy search |
| Agentic | LangGraphAgent | Shelved |
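A factory of this kind is often just a registry keyed by `agent_type`. The sketch below assumes case-insensitive lowercase keys and uses empty stub classes in place of the real agents. `AGENT_REGISTRY`, `create_agent`, and the `allow_shelved` flag are hypothetical and are not taken from src/factories/.

```python
from typing import Dict, Optional, Type


# Empty stand-ins for the real agent classes (inheritance mirrors the diagram).
class TwoPhaseAgent: ...
class DynamicQueryAgent: ...
class Neo4jQueryAgent(DynamicQueryAgent): ...
class LangGraphAgent: ...


AGENT_REGISTRY: Dict[str, Type] = {
    "graph_rag_agent": TwoPhaseAgent,
    "standard": Neo4jQueryAgent,
    "hybrid": Neo4jQueryAgent,
    "dynamic": DynamicQueryAgent,
    "agentic": LangGraphAgent,
}
DEFAULT_AGENT_TYPE = "graph_rag_agent"
SHELVED = {"agentic"}


def create_agent(agent_type: Optional[str] = None, allow_shelved: bool = False):
    """Look up agent_type (case-insensitive) and instantiate the matching class."""
    key = (agent_type or DEFAULT_AGENT_TYPE).strip().lower()
    if key not in AGENT_REGISTRY:
        raise ValueError(f"unknown agent_type {agent_type!r}; expected one of {sorted(AGENT_REGISTRY)}")
    if key in SHELVED and not allow_shelved:
        raise ValueError(f"agent_type {key!r} is shelved")
    return AGENT_REGISTRY[key]()
```

Refusing shelved agents unless the caller opts in keeps the default path on TwoPhaseAgent. It still leaves LangGraphAgent reachable for experiments.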
Graph Tools (used by TwoPhaseAgent)
16 tools registered in src/kg_tools/:
| Tool | Description |
|---|---|
| get_top_biomarkers | Top protein biomarkers for a disease |
| get_disease_proteins | All proteins associated with a disease |
| get_protein_pathways | Pathways, GO terms, locations for a protein |
| get_pathway_members | All proteins in a pathway |
| get_protein_interactions | Protein-protein interactions |
| get_protein_details | Full protein profile |
| search_entities | Fuzzy search across all entity types |
| get_disease_hierarchy | MONDO ontology parent/child tree |
| get_community_summary | LLM-generated community summary |
| get_evidence_for_association | PMIDs, confidence, excerpts for an association |
| get_graph_statistics | Node/relationship counts, degree distribution |
| get_shared_diseases | Diseases shared between proteins |
| get_shared_proteins | Proteins shared between diseases |
| get_entity_disambiguation | Resolve aliases to canonical IDs |
| get_relationship_evidence_depth | Full evidence chain for a relationship |
| run_cypher_query | Read-only Cypher execution (30 s timeout) |
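A registry for tools like these can be a decorator that records each function by name. The sketch below is hypothetical: `TOOL_REGISTRY`, `register_tool`, and `is_read_only` are invented names, and run_cypher_query is stubbed out without a database. Its keyword check for write clauses is deliberately naive and can reject harmless queries, for example one whose string literal contains "create". A real guard would run the query in a read-only database session and let the driver enforce the 30 s timeout.

```python
import re
from typing import Any, Callable, Dict

TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a function in TOOL_REGISTRY under `name`."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if name in TOOL_REGISTRY:
            raise ValueError(f"duplicate tool name: {name}")
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


# Naive write-clause detector; string matching only, no Cypher parsing.
_WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b", re.IGNORECASE
)


def is_read_only(cypher: str) -> bool:
    return _WRITE_CLAUSES.search(cypher) is None


@register_tool("run_cypher_query")
def run_cypher_query(cypher: str, timeout_s: float = 30.0) -> Dict[str, Any]:
    """Stub: validate the query, then return what would be sent to Neo4j."""
    if not is_read_only(cypher):
        raise PermissionError("write clauses are not allowed in run_cypher_query")
    # A real implementation would execute the query here with timeout_s applied.
    return {"query": cypher, "timeout_s": timeout_s}
```

Registering by name lets a Phase 1 loop iterate over `TOOL_REGISTRY.items()` without a hand-maintained list of tools.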
See KG Tools Guide for full reference.