GraphRAG Multi-Environment Architecture

Three Environments

graph TB
    subgraph PROD["Prod (stable)"]
        PR[React] --> PA[FastAPI]
        PA --> PN[Neo4j<br/>607K nodes / 3.3M rels]
    end

    subgraph ALPHA["Alpha (expanding)"]
        AR[React] --> AA[FastAPI]
        AA --> AN[Neo4j psx]
        AA --> ANP[Neptune serverless]
    end

    subgraph BETA["Beta (new architecture)"]
        BR[React] --> BA[FastAPI]
        BA --> BN[Neptune serverless<br/>graph queries]
        BA --> BAU[Aurora pgvector<br/>vector search]
    end

    style PROD fill:#d4edda,stroke:#28a745
    style ALPHA fill:#fff3cd,stroke:#ffc107
    style BETA fill:#cce5ff,stroke:#007bff

	Prod	Alpha	Beta
Graph	Neo4j	Neo4j + Neptune	Neptune
Vectors	Neo4j index	Neo4j index	Aurora pgvector
Status	✅ Stable	🔧 Bio-ingest tuning	🔧 Wiring up
Data	607K nodes, 3.3M rels	Restoring psx dump + 1.7M Neptune	1.7M Neptune, Aurora pending

Beta: Split Architecture

sequenceDiagram
    participant U as User
    participant QC as QueryCoordinator
    participant N as Neptune
    participant A as Aurora pgvector
    participant LLM as Bedrock LLM

    U->>QC: Query
    par Parallel
        QC->>N: Graph traversal (OpenCypher)
        QC->>A: Vector similarity (HNSW)
    end
    N-->>QC: Graph results
    A-->>QC: Vector results
    QC->>QC: RRF merge
    QC->>LLM: Merged context
    LLM-->>U: Answer + sources

Why split?
pgvector HNSW outperforms Neo4j vector search at scale (>500K embeddings)
Neptune handles graph traversal plus native algorithms (PageRank, connected components)
Both are serverless: they scale independently, with near-zero cost at idle
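The parallel fan-out and RRF merge in the sequence diagram above can be sketched roughly as follows. This is a minimal illustration, not the actual QueryCoordinator: `graph_search`, `vector_search`, `rrf_merge`, and `coordinate` are hypothetical names, the two search functions return canned IDs instead of calling Neptune or Aurora, and `k=60` is the commonly used RRF constant rather than a value taken from this project.

```python
import asyncio
from collections import defaultdict


async def graph_search(query: str) -> list[str]:
    """Hypothetical stand-in for the Neptune OpenCypher traversal."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return ["doc-3", "doc-1", "doc-7"]


async def vector_search(query: str) -> list[str]:
    """Hypothetical stand-in for the Aurora pgvector HNSW lookup."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return ["doc-1", "doc-5", "doc-3"]


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


async def coordinate(query: str) -> list[tuple[str, float]]:
    """Run both searches concurrently, then fuse their rankings."""
    graph_hits, vector_hits = await asyncio.gather(
        graph_search(query), vector_search(query)
    )
    return rrf_merge([graph_hits, vector_hits])


if __name__ == "__main__":
    for doc_id, score in asyncio.run(coordinate("example query")):
        print(f"{doc_id}\t{score:.5f}")
```

Because RRF uses only rank positions, the graph and vector backends do not need comparable score scales, which is one reason it suits a split architecture like this.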

Ingestion Pipeline

flowchart LR
    S[PubMed / PMC / bioRxiv / PDFs / CSVs] --> E[LLM Extraction<br/>Bedrock]
    E --> J[JSONL on S3<br/>nodes + rels + embeddings]
    J --> |Neo4j| N1[neo4j-admin load]
    J --> |Neptune| N2[CSV bulk loader]
    J --> |Aurora| N3[batch INSERT]

Extract-only mode: no DB needed during LLM extraction
Parallel: 8 queries × 3 sources simultaneously
Resumable with checkpointing
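One way extract-only mode with checkpointing could work is sketched below. It is an assumption-laden illustration rather than the pipeline's real code: `extract_to_jsonl`, `load_checkpoint`, and `fake_extract` are hypothetical names, the extractor is a stub in place of the Bedrock call, output goes to local files rather than S3, and document IDs are assumed to contain no newlines.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def load_checkpoint(path: Path) -> set[str]:
    """Return the IDs of documents already processed in earlier runs."""
    if not path.exists():
        return set()
    return set(path.read_text(encoding="utf-8").splitlines())


def fake_extract(text: str) -> dict:
    """Hypothetical stand-in for LLM extraction; no database is involved."""
    return {"nodes": [text.upper()], "rels": []}


def extract_to_jsonl(
    docs: Iterable[tuple[str, str]],
    extract: Callable[[str], dict],
    out_path: Path,
    checkpoint_path: Path,
) -> int:
    """Append one JSONL record per unprocessed document; return the count written."""
    done = load_checkpoint(checkpoint_path)
    written = 0
    with out_path.open("a", encoding="utf-8") as out, \
            checkpoint_path.open("a", encoding="utf-8") as ckpt:
        for doc_id, text in docs:
            if doc_id in done:
                continue  # already extracted in a previous run
            record = {"id": doc_id, **extract(text)}
            out.write(json.dumps(record) + "\n")
            out.flush()
            # Record the ID only after its JSONL line has been flushed.
            ckpt.write(doc_id + "\n")
            ckpt.flush()
            written += 1
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out, ckpt = Path(tmp) / "out.jsonl", Path(tmp) / "done.txt"
        docs = [("pmid-1", "first abstract"), ("pmid-2", "second abstract")]
        print(extract_to_jsonl(docs, fake_extract, out, ckpt))  # first run
        print(extract_to_jsonl(docs, fake_extract, out, ckpt))  # rerun skips both
```

Writing the checkpoint entry after the output line means a crash between the two writes can duplicate a record on resume but not lose one, so a loader using this scheme would want to deduplicate by `id`.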

What's New (Recent Commits)

Neptune + Aurora migration — VectorService, QueryCoordinator (parallel search + RRF), MigrationController (dual-write with feature flags)
JSONL-native ETL — decouple extraction from DB writes, bulk load from S3
Mega parallel ingestion — saturate LLM throughput across sources
Neo4j → Neptune abstraction — OpenCypher translation layer
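To make the dual-write idea concrete, here is a small sketch of how writes might be routed by feature flags during the migration. It does not reflect the actual MigrationController: `DualWriter`, `MigrationFlags`, `GraphStore`, and `InMemoryStore` are hypothetical names, and the in-memory store stands in for real Neo4j and Neptune clients.

```python
from dataclasses import dataclass, field
from typing import Protocol


class GraphStore(Protocol):
    """Minimal write interface both backends are assumed to share."""

    def write_node(self, node_id: str, props: dict) -> None: ...


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a Neo4j or Neptune client."""

    nodes: dict[str, dict] = field(default_factory=dict)

    def write_node(self, node_id: str, props: dict) -> None:
        self.nodes[node_id] = props


@dataclass
class MigrationFlags:
    write_legacy: bool = True    # keep writing to the old backend
    write_target: bool = False   # also write to the new backend


@dataclass
class DualWriter:
    legacy: GraphStore
    target: GraphStore
    flags: MigrationFlags

    def write_node(self, node_id: str, props: dict) -> list[str]:
        """Write to each enabled backend; return the names of those written."""
        written = []
        if self.flags.write_legacy:
            self.legacy.write_node(node_id, props)
            written.append("legacy")
        if self.flags.write_target:
            self.target.write_node(node_id, props)
            written.append("target")
        return written


if __name__ == "__main__":
    writer = DualWriter(InMemoryStore(), InMemoryStore(), MigrationFlags())
    print(writer.write_node("n1", {"name": "TP53"}))
    writer.flags.write_target = True  # flip the flag to start dual-writing
    print(writer.write_node("n2", {"name": "BRCA1"}))
```

Nodes written before the target flag is enabled exist only in the legacy store, so a scheme like this still needs a backfill step (for example the JSONL bulk load) before reads can move to the new backend.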

Cost (Monthly Estimate)

	Prod	Alpha	Beta
Compute (ECS)	~$170	~$170	~$100
Database (idle)	included	+$30 Neptune	+$50 (Neptune + Aurora)
Database (active)	included	+$150 Neptune	+$250 (both scaling)
Total	~$200	$200–350	$150–350

Discussion

Alpha → Prod? Once bio-ingest is tuned and validated against gold set
Split worth it? Better scaling but more operational complexity — needed at our size?
Neptune cold starts — 5–15s after idle. Accept or keep-alive?
Standard ingestion — JSONL ETL (extract → S3 → bulk load) for all envs?
Next sprint — push new Docker image, tune bio-ingest, benchmark RRF vs two-phase agent

Meeting Agenda (30 min)

Time	Topic	Notes
0–5 min	Demo	Live query on prod, show graph viz + answer synthesis
5–10 min	Ingestion pipeline	JSONL ETL, parallel multi-source, extract-only mode, what's tunable
10–15 min	Database backends	Neo4j vs Neptune vs split (Aurora pgvector) — tradeoffs, cost, what we've learned
15–20 min	Query agent	Two-phase agent, QueryCoordinator + RRF merge, search depth modes
20–25 min	Environments & deployment	Prod/Alpha/Beta setup, CDK, Docker, how to iterate safely
25–30 min	Priorities & open questions	What to focus on next, who owns what, blockers