
GraphRAG Multi-Environment Architecture

Three Environments

```mermaid
graph TB
    subgraph PROD["Prod (stable)"]
        PR[React] --> PA[FastAPI]
        PA --> PN[Neo4j<br/>607K nodes / 3.3M rels]
    end

    subgraph ALPHA["Alpha (expanding)"]
        AR[React] --> AA[FastAPI]
        AA --> AN[Neo4j psx]
        AA --> ANP[Neptune serverless]
    end

    subgraph BETA["Beta (new architecture)"]
        BR[React] --> BA[FastAPI]
        BA --> BN[Neptune serverless<br/>graph queries]
        BA --> BAU[Aurora pgvector<br/>vector search]
    end

    style PROD fill:#d4edda,stroke:#28a745
    style ALPHA fill:#fff3cd,stroke:#ffc107
    style BETA fill:#cce5ff,stroke:#007bff
```

| | Prod | Alpha | Beta |
|---|---|---|---|
| Graph | Neo4j | Neo4j + Neptune | Neptune |
| Vectors | Neo4j index | Neo4j index | Aurora pgvector |
| Status | ✅ Stable | 🔧 Bio-ingest tuning | 🔧 Wiring up |
| Data | 607K nodes, 3.3M rels | Restoring psx dump + 1.7M in Neptune | 1.7M in Neptune, Aurora pending |
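The table above could be kept in sync with the running services through a single per-environment config that the FastAPI app reads at startup. This sketch is hypothetical. `EnvConfig`, `ENVIRONMENTS`, `backends_for` and the backend strings are illustrative names, not the project's actual settings.

```python
from dataclasses import dataclass


# Hypothetical per-environment backend settings mirroring the table above.
@dataclass(frozen=True)
class EnvConfig:
    graph_backends: tuple[str, ...]  # where graph traversal runs
    vector_backend: str              # where embedding similarity search runs


ENVIRONMENTS = {
    "prod": EnvConfig(graph_backends=("neo4j",), vector_backend="neo4j"),
    "alpha": EnvConfig(graph_backends=("neo4j", "neptune"), vector_backend="neo4j"),
    "beta": EnvConfig(graph_backends=("neptune",), vector_backend="aurora_pgvector"),
}


def backends_for(env: str) -> EnvConfig:
    """Look up an environment's backends, failing loudly on an unknown name."""
    try:
        return ENVIRONMENTS[env.lower()]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
```

For example, `backends_for("beta").vector_backend` is `"aurora_pgvector"`. A frozen dataclass keeps a running service from mutating its environment's wiring by accident.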

Beta: Split Architecture

```mermaid
sequenceDiagram
    participant U as User
    participant QC as QueryCoordinator
    participant N as Neptune
    participant A as Aurora pgvector
    participant LLM as Bedrock LLM

    U->>QC: Query
    par Parallel
        QC->>N: Graph traversal (OpenCypher)
        QC->>A: Vector similarity (HNSW)
    end
    N-->>QC: Graph results
    A-->>QC: Vector results
    QC->>QC: RRF merge
    QC->>LLM: Merged context
    LLM-->>U: Answer + sources
```
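The QueryCoordinator flow in the diagram (fan out to both backends, wait for both, fuse with Reciprocal Rank Fusion) can be sketched as below. The sketch makes three assumptions:

  • Both backends return ranked lists of comparable IDs.
  • `graph_search` and `vector_search` are hypothetical stand-ins for the real Neptune and pgvector calls.
  • `k=60` is the constant commonly used for RRF, not a value taken from this project.

```python
import asyncio
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, top_n=10):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


async def graph_search(query):
    """Hypothetical stand-in for an OpenCypher traversal on Neptune."""
    await asyncio.sleep(0.01)
    return ["paper:1", "gene:BRCA1", "paper:7"]


async def vector_search(query):
    """Hypothetical stand-in for an HNSW similarity query on Aurora pgvector."""
    await asyncio.sleep(0.01)
    return ["paper:7", "paper:3", "paper:1"]


async def coordinate(query):
    # Run both searches concurrently, then fuse their rankings.
    graph_hits, vector_hits = await asyncio.gather(graph_search(query), vector_search(query))
    return rrf_merge([graph_hits, vector_hits])
```

`asyncio.run(coordinate("BRCA1"))` returns `['paper:1', 'paper:7', 'gene:BRCA1', 'paper:3']`. Items found by both backends outrank items found by only one. RRF uses only ranks, so Neptune's and pgvector's raw scores never need to be calibrated against each other.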

Why split?

  • pgvector HNSW outperforms Neo4j vector search at scale (>500K embeddings).
  • Neptune handles graph traversal plus native algorithms (PageRank, connected components).
  • Both are serverless, so they scale independently and cost little when idle (see the cost estimate below).


Ingestion Pipeline

```mermaid
flowchart LR
    S[PubMed / PMC / bioRxiv / PDFs / CSVs] --> E[LLM Extraction<br/>Bedrock]
    E --> J[JSONL on S3<br/>nodes + rels + embeddings]
    J --> |Neo4j| N1[neo4j-admin load]
    J --> |Neptune| N2[CSV bulk loader]
    J --> |Aurora| N3[batch INSERT]
```
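For the Neptune branch, the JSONL records have to become CSV files in Neptune's openCypher bulk-load format:

  • Nodes: `:ID`, `:LABEL` and typed property columns.
  • Relationships: `:ID`, `:START_ID`, `:END_ID`, `:TYPE`.

Below is a minimal sketch. It assumes a hypothetical record shape with a `kind` field; the real JSONL schema and the set of node properties may differ.

```python
import csv
import io
import json

# Hypothetical JSONL record shapes (the real schema may differ):
#   {"kind": "node", "id": "gene:BRCA1", "label": "Gene", "props": {"name": "BRCA1"}}
#   {"kind": "rel", "id": "r1", "start": "gene:BRCA1", "end": "disease:HBOC", "type": "ASSOCIATED_WITH"}
# Any other keys (e.g. embeddings) are ignored here.


def jsonl_to_neptune_csv(lines, node_props=("name",)):
    """Split JSONL records into node and edge CSV text for the openCypher bulk loader."""
    nodes, edges = io.StringIO(), io.StringIO()
    node_w, edge_w = csv.writer(nodes), csv.writer(edges)
    # Header rows: system columns first, then typed property columns.
    node_w.writerow([":ID", ":LABEL", *(f"{p}:String" for p in node_props)])
    edge_w.writerow([":ID", ":START_ID", ":END_ID", ":TYPE"])
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL
        rec = json.loads(line)
        if rec["kind"] == "node":
            props = rec.get("props", {})
            node_w.writerow([rec["id"], rec["label"], *(props.get(p, "") for p in node_props)])
        elif rec["kind"] == "rel":
            edge_w.writerow([rec["id"], rec["start"], rec["end"], rec["type"]])
    return nodes.getvalue(), edges.getvalue()
```

The sketch omits two steps: uploading the two CSV files to S3 and starting the loader against that prefix. Embeddings carried in the same records would go to the Aurora `INSERT` path instead.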
  • Extract-only mode: no DB needed during LLM extraction
  • Parallel: 8 queries × 3 sources simultaneously
  • Resumable with checkpointing
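The parallel and resumable bullets can be combined as below. A per-source semaphore caps concurrent extraction calls, and a checkpoint file records each finished (source, query) pair so a restarted run skips it. The sketch assumes that "8 queries × 3 sources" means up to 8 in-flight queries per source. `extract` is a hypothetical stand-in for the Bedrock call, and the file layout is illustrative.

```python
import asyncio
import json
from pathlib import Path

SOURCES = ["pubmed", "pmc", "biorxiv"]  # illustrative subset of the sources above
PER_SOURCE_CONCURRENCY = 8              # assumed reading of "8 queries x 3 sources"


async def extract(source, query):
    """Hypothetical stand-in for one Bedrock extraction call; returns JSONL-ready records."""
    await asyncio.sleep(0)
    return [{"source": source, "query": query}]


def load_done(checkpoint: Path) -> set:
    """(source, query) pairs already finished by an earlier run."""
    if not checkpoint.exists():
        return set()
    return {tuple(json.loads(line)) for line in checkpoint.read_text().splitlines() if line}


async def run(queries, checkpoint: Path, out: Path):
    done = load_done(checkpoint)
    sems = {s: asyncio.Semaphore(PER_SOURCE_CONCURRENCY) for s in SOURCES}
    lock = asyncio.Lock()  # serialize appends to the output and checkpoint files

    async def one(source, query):
        if (source, query) in done:
            return  # resumed run: skip work that was already checkpointed
        async with sems[source]:
            records = await extract(source, query)
        async with lock:
            with out.open("a") as f:
                f.writelines(json.dumps(r) + "\n" for r in records)
            with checkpoint.open("a") as f:
                f.write(json.dumps([source, query]) + "\n")

    await asyncio.gather(*(one(s, q) for s in SOURCES for q in queries))
```

Records are appended before the checkpoint line. A crash between the two writes therefore re-extracts that pair on restart, so delivery is at-least-once and the bulk-load step should deduplicate on node and relationship IDs.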

What's New (Recent Commits)

  • Neptune + Aurora migration: VectorService, QueryCoordinator (parallel search + RRF), and MigrationController (dual-write with feature flags)
  • JSONL-native ETL: decouples extraction from DB writes and bulk-loads from S3
  • Mega parallel ingestion: saturates LLM throughput across sources
  • Neo4j → Neptune abstraction: an OpenCypher translation layer
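The dual-write idea behind MigrationController can be sketched as follows:

  • The legacy store stays the source of truth.
  • Each new store receives mirrored writes only while its write flag is on.
  • Reads switch over per store via a separate read flag.

Everything here is hypothetical: `DualWriter`, `InMemoryStore` and the `write_<name>` / `read_<name>` flag naming. The actual controller's interface isn't described in this document.

```python
class InMemoryStore:
    """Hypothetical stand-in for a Neo4j, Neptune, or Aurora client."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class DualWriter:
    """Write the legacy store always; mirror to new stores only when flagged."""

    def __init__(self, legacy, mirrors, flags):
        self.legacy = legacy    # e.g. Neo4j: remains the source of truth
        self.mirrors = mirrors  # name -> new store, e.g. {"neptune": ...}
        self.flags = flags      # flag name -> bool, can be flipped at runtime
        self.mirror_errors = []

    def write(self, key, value):
        self.legacy.upsert(key, value)
        for name, store in self.mirrors.items():
            if not self.flags.get(f"write_{name}", False):
                continue
            try:
                store.upsert(key, value)
            except Exception as exc:  # a mirror failure must not fail the request
                self.mirror_errors.append((name, key, repr(exc)))

    def read(self, key, prefer=None):
        # Read from a new store only when its read flag is on; otherwise use legacy.
        if prefer and self.flags.get(f"read_{prefer}", False):
            return self.mirrors[prefer].get(key)
        return self.legacy.get(key)
```

Swallowing and recording mirror errors keeps a Neptune or Aurora outage from breaking prod writes. The recorded errors show which keys a backfill would need to cover before a read flag is flipped.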

Cost (Monthly Estimate)

| | Prod | Alpha | Beta |
|---|---|---|---|
| Compute (ECS) | ~$170 | ~$170 | ~$100 |
| Database (idle) | included | +$30 (Neptune) | +$50 (Neptune + Aurora) |
| Database (active) | included | +$150 (Neptune) | +$250 (both scaling) |
| Total | ~$200 | $200–350 | $150–350 |

Discussion

  1. Alpha → Prod? Promote once bio-ingest is tuned and validated against the gold set.
  2. Is the split worth it? It scales better but adds operational complexity. Do we need it at our size?
  3. Neptune cold starts: the first query after an idle period takes 5–15 s. Accept that, or add a keep-alive?
  4. Standard ingestion: adopt the JSONL ETL (extract → S3 → bulk load) for all environments?
  5. Next sprint: push the new Docker image, tune bio-ingest, and benchmark RRF against the two-phase agent.
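If the team opts for keep-alive (question 3), a minimal approach is a background task that issues a trivial query on a fixed interval. Recording each ping's latency also gives hard numbers on cold-start cost. `ping` is a hypothetical coroutine, for example one that runs a trivial read query. The right interval depends on how long Neptune stays warm, which isn't stated here.

```python
import asyncio
import time


async def keep_alive(ping, interval_s, stop: asyncio.Event, timeout_s=20.0):
    """Call `ping` every `interval_s` seconds until `stop` is set; return ping latencies."""
    latencies = []
    while not stop.is_set():
        start = time.monotonic()
        try:
            await asyncio.wait_for(ping(), timeout=timeout_s)
            latencies.append(time.monotonic() - start)
        except Exception as exc:  # log and keep going; one failed ping is not fatal
            print(f"keep-alive ping failed: {exc!r}")
        try:
            # Sleep for the interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass  # interval elapsed; ping again
    return latencies
```

A keep-alive trades cost for latency: a database that is never idle gives up the idle-cost savings that the serverless split relies on.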

Meeting Agenda (30 min)

| Time | Topic | Notes |
|---|---|---|
| 0–5 min | Demo | Live query on prod; show graph viz + answer synthesis |
| 5–10 min | Ingestion pipeline | JSONL ETL, parallel multi-source, extract-only mode, what's tunable |
| 10–15 min | Database backends | Neo4j vs Neptune vs split (Aurora pgvector): tradeoffs, cost, what we've learned |
| 15–20 min | Query agent | Two-phase agent, QueryCoordinator + RRF merge, search depth modes |
| 20–25 min | Environments & deployment | Prod/Alpha/Beta setup, CDK, Docker, how to iterate safely |
| 25–30 min | Priorities & open questions | What to focus on next, who owns what, blockers |