GraphRAG Multi-Environment Architecture
Three Environments
graph TB
subgraph PROD["Prod (stable)"]
PR[React] --> PA[FastAPI]
PA --> PN[Neo4j<br/>607K nodes / 3.3M rels]
end
subgraph ALPHA["Alpha (expanding)"]
AR[React] --> AA[FastAPI]
AA --> AN[Neo4j psx]
AA --> ANP[Neptune serverless]
end
subgraph BETA["Beta (new architecture)"]
BR[React] --> BA[FastAPI]
BA --> BN[Neptune serverless<br/>graph queries]
BA --> BAU[Aurora pgvector<br/>vector search]
end
style PROD fill:#d4edda,stroke:#28a745
style ALPHA fill:#fff3cd,stroke:#ffc107
style BETA fill:#cce5ff,stroke:#007bff
| | Prod | Alpha | Beta |
|---|---|---|---|
| Graph | Neo4j | Neo4j + Neptune | Neptune |
| Vectors | Neo4j index | Neo4j index | Aurora pgvector |
| Status | ✅ Stable | 🔧 Bio-ingest tuning | 🔧 Wiring up |
| Data | 607K nodes, 3.3M rels | Restoring psx dump + 1.7M Neptune | 1.7M Neptune, Aurora pending |
Beta: Split Architecture
sequenceDiagram
participant U as User
participant QC as QueryCoordinator
participant N as Neptune
participant A as Aurora pgvector
participant LLM as Bedrock LLM
U->>QC: Query
par Graph traversal
QC->>N: Graph traversal (OpenCypher)
and Vector search
QC->>A: Vector similarity (HNSW)
end
N-->>QC: Graph results
A-->>QC: Vector results
QC->>QC: RRF merge
QC->>LLM: Merged context
LLM-->>U: Answer + sources
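The parallel fan-out and RRF merge in the diagram above can be sketched as follows. This is a minimal illustration, not the project's QueryCoordinator: `graph_search` and `vector_search` are hypothetical stand-ins that return fixed ranked ID lists, and `k = 60` is a commonly used RRF constant assumed here rather than taken from the codebase.

```python
import asyncio
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


async def graph_search(query):
    # Hypothetical stand-in for the Neptune OpenCypher traversal.
    await asyncio.sleep(0)
    return ["doc_a", "doc_b", "doc_c"]


async def vector_search(query):
    # Hypothetical stand-in for the Aurora pgvector HNSW lookup.
    await asyncio.sleep(0)
    return ["doc_b", "doc_d", "doc_a"]


async def coordinate(query):
    # Run both backends concurrently, then fuse the two ranked lists.
    graph_hits, vector_hits = await asyncio.gather(
        graph_search(query), vector_search(query)
    )
    return rrf_merge([graph_hits, vector_hits])


if __name__ == "__main__":
    print(asyncio.run(coordinate("example query")))
```

Because RRF uses only rank positions, the graph and vector backends do not need comparable score scales, which is one reason it suits a split architecture.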
Why split?
- pgvector HNSW outperforms the Neo4j vector index at scale (>500K embeddings)
- Neptune handles graph traversal + native algorithms (PageRank, connected components)
- Both are serverless: they scale independently and cost near zero at idle
Ingestion Pipeline
flowchart LR
S[PubMed / PMC / bioRxiv / PDFs / CSVs] --> E[LLM Extraction<br/>Bedrock]
E --> J[JSONL on S3<br/>nodes + rels + embeddings]
J --> |Neo4j| N1[neo4j-admin load]
J --> |Neptune| N2[CSV bulk loader]
J --> |Aurora| N3[batch INSERT]
- Extract-only mode: no DB needed during LLM extraction
- Parallel: 8 queries × 3 sources simultaneously
- Resumable with checkpointing
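The three properties above (extract-only, parallel, resumable) could be combined roughly as in the sketch below. Everything here is an assumption for illustration: `extract` is a hypothetical stand-in for the Bedrock extraction call, the file names are invented, and the local directory stands in for S3. No database is touched at any point.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract(source, query):
    # Hypothetical stand-in for the Bedrock LLM extraction call.
    return {"source": source, "query": query, "nodes": [], "rels": []}


def load_done(checkpoint):
    # Each checkpoint line records one finished (source, query) task.
    if not checkpoint.exists():
        return set()
    lines = checkpoint.read_text().splitlines()
    return {tuple(json.loads(line)) for line in lines if line}


def run_extraction(sources, queries, out_dir, max_workers=24):
    # max_workers=24 mirrors 8 queries x 3 sources running at once.
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    output = out_dir / "extracted.jsonl"
    checkpoint = out_dir / "checkpoint.jsonl"
    done = load_done(checkpoint)
    todo = [(s, q) for s in sources for q in queries if (s, q) not in done]

    with ThreadPoolExecutor(max_workers=max_workers) as pool, \
            output.open("a") as out, checkpoint.open("a") as ckpt:
        # Workers only call the extractor; all file writes stay on this thread.
        results = pool.map(lambda task: extract(*task), todo)
        for task, record in zip(todo, results):
            out.write(json.dumps(record) + "\n")
            out.flush()
            # Checkpoint only after the record itself has been written.
            ckpt.write(json.dumps(task) + "\n")
            ckpt.flush()
    return len(todo)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    sources = ["pubmed", "pmc", "biorxiv"]
    queries = [f"query-{i}" for i in range(8)]
    print(run_extraction(sources, queries, workdir))  # first run does all tasks
    print(run_extraction(sources, queries, workdir))  # rerun skips finished tasks
```

Writing the record before its checkpoint entry means a crash can at worst cause one task to be re-extracted, never silently dropped.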
What's New (Recent Commits)
- Neptune + Aurora migration: VectorService, QueryCoordinator (parallel search + RRF), MigrationController (dual-write with feature flags)
- JSONL-native ETL: decouples extraction from DB writes and bulk loads from S3
- Mega parallel ingestion: saturates LLM throughput across sources
- Neo4j → Neptune abstraction: OpenCypher translation layer
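The dual-write idea mentioned for MigrationController above might look something like this. It is a sketch under stated assumptions, not the actual controller: `DualWriter`, `MigrationFlags`, and `InMemoryStore` are hypothetical names, and the in-memory store stands in for real Neo4j and Neptune clients.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    # Hypothetical stand-in for a Neo4j or Neptune client.
    name: str
    nodes: dict = field(default_factory=dict)

    def upsert_node(self, node_id, props):
        self.nodes[node_id] = props

    def get_node(self, node_id):
        return self.nodes.get(node_id)


@dataclass
class MigrationFlags:
    dual_write: bool = False      # also write to the new backend
    read_from_new: bool = False   # serve reads from the new backend


class DualWriter:
    def __init__(self, old, new, flags):
        self.old, self.new, self.flags = old, new, flags

    def upsert_node(self, node_id, props):
        # The old backend stays the source of truth during migration.
        self.old.upsert_node(node_id, props)
        if self.flags.dual_write:
            self.new.upsert_node(node_id, props)

    def get_node(self, node_id):
        store = self.new if self.flags.read_from_new else self.old
        return store.get_node(node_id)


if __name__ == "__main__":
    flags = MigrationFlags(dual_write=True)
    writer = DualWriter(InMemoryStore("neo4j"), InMemoryStore("neptune"), flags)
    writer.upsert_node("n1", {"label": "Gene"})
    print(writer.new.get_node("n1"))
```

A typical rollout under this scheme would enable `dual_write` first, backfill and compare the two stores, and only then flip `read_from_new`.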
Cost (Monthly Estimate)
| | Prod | Alpha | Beta |
|---|---|---|---|
| Compute (ECS) | ~$170 | ~$170 | ~$100 |
| Database (idle) | included | +$30 Neptune | +$50 (Neptune + Aurora) |
| Database (active) | included | +$150 Neptune | +$250 (both scaling) |
| Total | ~$200 | $200–350 | $150–350 |
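As a quick sanity check on the estimates above, the per-environment range can be recomputed from the itemized figures. The sketch assumes "included" means no extra database charge for Prod. The sums land close to the stated totals; where they come out slightly lower (Prod, and the upper end of Alpha), the stated totals may be rounded or may cover costs not itemized here.

```python
# Figures copied from the cost estimate above (USD per month, approximate).
COSTS = {
    "Prod": {"compute": 170, "db_idle": 0, "db_active": 0},
    "Alpha": {"compute": 170, "db_idle": 30, "db_active": 150},
    "Beta": {"compute": 100, "db_idle": 50, "db_active": 250},
}


def monthly_range(env):
    """Return (idle, active) monthly cost: compute plus database add-on."""
    c = COSTS[env]
    return c["compute"] + c["db_idle"], c["compute"] + c["db_active"]


if __name__ == "__main__":
    for env in COSTS:
        low, high = monthly_range(env)
        print(f"{env}: ${low} to ${high}")
```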
Discussion
- Alpha → Prod? Once bio-ingest is tuned and validated against the gold set.
- Split worth it? Better scaling but more operational complexity. Is it needed at our size?
- Neptune cold starts: 5–15s after idle. Accept the latency or add a keep-alive?
- Standard ingestion: adopt the JSONL ETL (extract → S3 → bulk load) for all environments?
- Next sprint: push new Docker image, tune bio-ingest, benchmark RRF vs. two-phase agent.
Meeting Agenda (30 min)
| Time | Topic | Notes |
|---|---|---|
| 0–5 min | Demo | Live query on prod, show graph viz + answer synthesis |
| 5–10 min | Ingestion pipeline | JSONL ETL, parallel multi-source, extract-only mode, what's tunable |
| 10–15 min | Database backends | Neo4j vs Neptune vs split (Aurora pgvector): tradeoffs, cost, what we've learned |
| 15–20 min | Query agent | Two-phase agent, QueryCoordinator + RRF merge, search depth modes |
| 20–25 min | Environments & deployment | Prod/Alpha/Beta setup, CDK, Docker, how to iterate safely |
| 25–30 min | Priorities & open questions | What to focus on next, who owns what, blockers |