Query Agent Architecture

Agent Hierarchy

┌─────────────────────────────────────────────────────────┐
│  TwoPhaseAgent  [ACTIVE — default]                      │
│  Phase 1: Deterministic execution of ALL 8+ tools       │
│  Phase 2: Single LLM call to synthesize results         │
│  Guaranteed tool execution — 0 hallucinated data        │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  DynamicQueryAgent  (Base class)                        │
│  6 search strategies executed in deterministic pipeline  │
│  QueryProcessor → EntityDiscovery → SearchStrategies    │
│  → ResultProcessor (CrossEncoder re-ranking) → REFRAG   │
└───────────────────────┬─────────────────────────────────┘
                        │ extends
┌───────────────────────▼─────────────────────────────────┐
│  Neo4jQueryAgent  (Subclass)                            │
│  Adds: VectorRetriever, HybridRetriever, GraphRAG,     │
│  APOC KNN expansion — uses neo4j_graphrag library       │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  LangGraphAgent  [SHELVED]                              │
│  ReAct loop via LangGraph — LLM decides tool calls      │
│  Problem: LLM often skips tools, hallucinating answers   │
└─────────────────────────────────────────────────────────┘

TwoPhaseAgent (Active Default)

File: src/agents/two_phase_agent.py

The active production agent. Solves the hallucination problem by separating data gathering from synthesis.

Phase 1 — Discovery (no LLM): Executes all graph tools deterministically:

- Entity lookup, search, vector similarity
- Domain-specific tools (biomarkers, pathways, interactions)
- Community summaries (for global queries)

Phase 2 — Synthesis (single LLM call): Takes all gathered data and produces a coherent answer with citations.
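The two phases can be sketched as follows. This is a minimal illustration, not the code in two_phase_agent.py: the class name `TwoPhaseSketch`, the tool signature (question in, data out), and the `synthesize` callable standing in for the LLM call are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical types: a tool takes the user question and returns any data;
# the synthesizer stands in for the single LLM call.
Tool = Callable[[str], Any]
Synthesizer = Callable[[str, Dict[str, Any]], str]


@dataclass
class TwoPhaseSketch:
    tools: Dict[str, Tool]
    synthesize: Synthesizer
    errors: Dict[str, str] = field(default_factory=dict)

    def discover(self, question: str) -> Dict[str, Any]:
        """Phase 1: run every registered tool; no LLM decides what to call."""
        results: Dict[str, Any] = {}
        for name, tool in self.tools.items():
            try:
                results[name] = tool(question)
            except Exception as exc:
                # Assumption: a failing tool is recorded and skipped so the
                # remaining tools still run.
                self.errors[name] = str(exc)
        return results

    def answer(self, question: str) -> str:
        """Phase 2: exactly one synthesis call over the gathered data."""
        evidence = self.discover(question)
        return self.synthesize(question, evidence)
```

Because the tool loop never consults the model, the synthesis step can only cite data that a tool actually returned, which is the property the agent relies on to avoid hallucinated results.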

Search depth controls Phase 2 expansion:

- fast: Skip expansion, return discovery results directly
- balanced: One round of neighbor expansion, Cypher fallback if < 3 targets
- deep: Up to 3 iterative expansion rounds + always Cypher fallback
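The three depth settings can be read as a round budget plus a fallback rule. The sketch below encodes that reading; the function `gather_targets`, the `expand` and `cypher_fallback` callables, and the early stop when a round finds nothing new are assumptions for illustration, not the agent's actual interface.

```python
from typing import Callable, Set

# Hypothetical callables: expand(frontier) returns neighbors of the given
# targets; cypher_fallback() returns extra targets from a generated query.
Expand = Callable[[Set[str]], Set[str]]
Fallback = Callable[[], Set[str]]

MAX_ROUNDS = {"fast": 0, "balanced": 1, "deep": 3}
MIN_TARGETS = 3  # balanced mode falls back to Cypher below this count


def gather_targets(seeds: Set[str], depth: str, expand: Expand,
                   cypher_fallback: Fallback) -> Set[str]:
    if depth not in MAX_ROUNDS:
        raise ValueError(f"unknown search depth: {depth!r}")
    targets = set(seeds)
    frontier = set(seeds)
    for _ in range(MAX_ROUNDS[depth]):
        new = expand(frontier) - targets
        if not new:  # assumption: stop early once a round adds nothing
            break
        targets |= new
        frontier = new
    use_fallback = depth == "deep" or (
        depth == "balanced" and len(targets) < MIN_TARGETS
    )
    if use_fallback:
        targets |= cypher_fallback()
    return targets
```

With this shape, `fast` returns the seeds untouched, `balanced` pays for the Cypher fallback only when expansion comes back thin, and `deep` always pays for it.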

DynamicQueryAgent

File: src/agents/dynamic_query_agent.py

Base class with a 6-strategy parallel search pipeline:

  1. Direct entity queries — exact match on known entities
  2. Schema-guided expansion — follow graph schema to find related nodes
  3. Relationship traversal — walk edges from seed entities
  4. Topic-based queries — keyword-based graph search
  5. LLM-generated Cypher — Text2Cypher for complex questions
  6. Literature search — fulltext search across abstracts

Results are re-ranked using a CrossEncoder model and optionally compressed via REFRAG.
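A rough sketch of the fan-out and re-rank steps is shown below. It assumes each strategy is a plain function from query to a list of passages and uses a generic `score` callable where the real pipeline uses a CrossEncoder model; the names `run_strategies` and `rerank` are illustrative and not taken from dynamic_query_agent.py.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Strategy = Callable[[str], List[str]]   # query -> candidate passages
Scorer = Callable[[str, str], float]    # (query, passage) -> relevance


def run_strategies(query: str, strategies: Dict[str, Strategy]) -> List[str]:
    """Run all strategies concurrently and merge results without duplicates."""
    merged: List[str] = []
    seen = set()
    with ThreadPoolExecutor(max_workers=max(1, len(strategies))) as pool:
        futures = [pool.submit(fn, query) for fn in strategies.values()]
        # Collect in submission order so the merged list is deterministic.
        for future in futures:
            for passage in future.result():
                if passage not in seen:
                    seen.add(passage)
                    merged.append(passage)
    return merged


def rerank(query: str, passages: List[str], score: Scorer,
           top_k: int = 5) -> List[str]:
    """Order passages by descending relevance and keep the best top_k."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:top_k]
```

Collecting futures in submission order keeps the merged output stable across runs even though the strategies themselves finish in arbitrary order.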

Neo4jQueryAgent

File: src/agents/neo4j_query_agent.py

Extends DynamicQueryAgent with Neo4j-specific components:

- VectorRetriever: chunk_embeddings index for semantic search
- HybridRetriever: node_embeddings + fulltext index
- GraphRAG: LLM + chunk retriever for graph-aware generation
- APOC KNN: K-nearest neighbor graph expansion

Agent Selection

Agents are created via the factory pattern in src/factories/:

| agent_type | Agent Class | Use Case |
|---|---|---|
| graph_rag_agent (default) | TwoPhaseAgent | Production queries |
| Standard / Hybrid | Neo4jQueryAgent | GraphRAG + hybrid retrieval |
| Dynamic | DynamicQueryAgent | Multi-strategy search |
| Agentic | LangGraphAgent | Shelved |
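One way such a factory could map `agent_type` values to classes is a simple registry. The sketch below uses empty placeholder classes and a hypothetical `create_agent` function; the case-insensitive lookup is an assumption, and the real code in src/factories/ may differ.

```python
from typing import Dict, Type


# Placeholder classes standing in for the real agents.
class DynamicQueryAgent: ...
class Neo4jQueryAgent(DynamicQueryAgent): ...
class TwoPhaseAgent: ...
class LangGraphAgent: ...


AGENT_REGISTRY: Dict[str, Type] = {
    "graph_rag_agent": TwoPhaseAgent,
    "standard": Neo4jQueryAgent,
    "hybrid": Neo4jQueryAgent,
    "dynamic": DynamicQueryAgent,
    "agentic": LangGraphAgent,  # shelved, kept only for completeness
}
DEFAULT_AGENT_TYPE = "graph_rag_agent"


def create_agent(agent_type: str = DEFAULT_AGENT_TYPE):
    """Instantiate the agent class registered for agent_type."""
    key = agent_type.strip().lower()  # assumption: lookup ignores case
    if key not in AGENT_REGISTRY:
        known = ", ".join(sorted(AGENT_REGISTRY))
        raise ValueError(f"unknown agent_type {agent_type!r}; expected one of: {known}")
    return AGENT_REGISTRY[key]()
```

Calling `create_agent()` with no argument yields the production default, and an unrecognized value fails loudly instead of silently falling back.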

Graph Tools (used by TwoPhaseAgent)

16 tools registered in src/kg_tools/:

| Tool | Description |
|---|---|
| get_top_biomarkers | Top protein biomarkers for a disease |
| get_disease_proteins | All proteins associated with a disease |
| get_protein_pathways | Pathways, GO terms, locations for a protein |
| get_pathway_members | All proteins in a pathway |
| get_protein_interactions | Protein-protein interactions |
| get_protein_details | Full protein profile |
| search_entities | Fuzzy search across all entity types |
| get_disease_hierarchy | MONDO ontology parent/child tree |
| get_community_summary | LLM-generated community summary |
| get_evidence_for_association | PMIDs, confidence, excerpts for an association |
| get_graph_statistics | Node/relationship counts, degree distribution |
| get_shared_diseases | Diseases shared between proteins |
| get_shared_proteins | Proteins shared between diseases |
| get_entity_disambiguation | Resolve aliases to canonical IDs |
| get_relationship_evidence_depth | Full evidence chain for a relationship |
| run_cypher_query | Read-only Cypher execution (30s timeout) |

See KG Tools Guide for full reference.
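For orientation, tool registration of this kind is often done with a decorator that records each function by name. The sketch below is an assumption about how such a registry might look, not the mechanism used in src/kg_tools/; the `register_tool` and `call_tool` helpers, the `search_entities` signature, and the in-memory entity list are all hypothetical.

```python
from typing import Any, Callable, Dict, List

TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record a tool under its function name; reject duplicate names."""
    if fn.__name__ in TOOL_REGISTRY:
        raise ValueError(f"tool already registered: {fn.__name__}")
    TOOL_REGISTRY[fn.__name__] = fn
    return fn


@register_tool
def search_entities(query: str) -> List[str]:
    # Placeholder data standing in for a graph lookup.
    known = ["TP53", "BRCA1", "Alzheimer disease"]
    return [name for name in known if query.lower() in name.lower()]


def call_tool(name: str, **kwargs: Any) -> Any:
    """Look a tool up by name and invoke it with keyword arguments."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](**kwargs)
```

A name-keyed registry like this lets an agent iterate over every tool in a fixed order, which fits the deterministic execution described for TwoPhaseAgent.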