Scaling Tradeoff: Neo4j (Prod) vs Neptune + Aurora (Beta)

Architecture Comparison

Aspect	Neo4j (Prod)	Neptune + Aurora (Beta)
Graph storage	Neo4j Community (single instance)	Neptune Serverless (managed)
Vector embeddings	Stored as node properties (768-dim float arrays)	Aurora pgvector (HNSW indexes, dedicated)
Vector search	Neo4j vector index (in-process)	Aurora pgvector (separate query path)
Query pattern	Single Bolt connection handles both	Parallel: graph via Neptune + vectors via Aurora, RRF merge
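
The Neptune + Aurora query pattern in the last row (parallel fetch, then an RRF merge) can be sketched as follows. This is a minimal illustration, not the QueryCoordinator's actual code: the two fetch functions are hypothetical placeholders that return hard-coded IDs, and `k=60` is a commonly used Reciprocal Rank Fusion constant assumed here.

```python
import asyncio
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse ranked ID lists with Reciprocal Rank Fusion (k=60 is an assumed default)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


async def fetch_graph_hits(query):
    # Hypothetical stand-in for a Neptune query; returns node IDs, best first.
    await asyncio.sleep(0)
    return ["n1", "n2", "n3"]


async def fetch_vector_hits(query):
    # Hypothetical stand-in for an Aurora pgvector nearest-neighbour query.
    await asyncio.sleep(0)
    return ["n3", "n1", "n4"]


async def hybrid_search(query):
    # Both backends are queried concurrently, then their rankings are fused.
    graph_hits, vector_hits = await asyncio.gather(
        fetch_graph_hits(query), fetch_vector_hits(query)
    )
    return rrf_merge([graph_hits, vector_hits])


merged = asyncio.run(hybrid_search("example"))
print(merged)
```

IDs that rank well in both lists rise to the top, while IDs found by only one backend are still kept, which is why RRF suits merging graph and vector results that are scored on different scales.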

Scaling Characteristics

Neo4j: Embeddings as Node Properties

Edges        Nodes (est)   Embedding Storage    Total DB Size    Vector Search Latency
─────────────────────────────────────────────────────────────────────────────────────
100K         50K           ~150 MB              ~500 MB          <50ms (warm)
500K         200K          ~600 MB              ~2 GB            50-100ms
1M           400K          ~1.2 GB              ~4 GB            100-300ms
5M           1M            ~3 GB                ~12 GB           300-800ms
10M          2M            ~6 GB                ~25 GB           500ms-2s (page cache pressure)

Why it degrades:

- Embeddings (768 × 4 bytes = 3KB per node) bloat node storage
- Neo4j stores properties inline with nodes; the vector index must fit in page cache
- At ~1M nodes with embeddings, page cache (1-2GB on Fargate) starts thrashing
- Vector index scans compete with graph traversals for memory
- Community Edition: single instance, no read replicas, no sharding

Practical ceiling: ~500K-1M nodes with embeddings before latency becomes unacceptable on a 4GB Fargate task.
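
The embedding storage column and the page cache argument follow from simple arithmetic. A minimal sketch, assuming float32 embeddings and counting raw embedding bytes only (no index or store overhead); the helper names are illustrative:

```python
BYTES_PER_FLOAT32 = 4
EMBEDDING_DIM = 768
GIB = 1024**3


def embedding_bytes(nodes, dim=EMBEDDING_DIM):
    """Raw size of the embeddings alone, excluding index and store overhead."""
    return nodes * dim * BYTES_PER_FLOAT32


def fits_in_page_cache(nodes, page_cache_gib):
    """True if the raw embeddings alone would fit in the given page cache."""
    return embedding_bytes(nodes) <= page_cache_gib * GIB


for nodes in (50_000, 200_000, 400_000, 1_000_000, 2_000_000):
    size_gib = embedding_bytes(nodes) / GIB
    verdict = "fits" if fits_in_page_cache(nodes, 2) else "exceeds"
    print(f"{nodes:>9,} nodes -> {size_gib:5.2f} GiB raw embeddings ({verdict} a 2 GiB page cache)")
```

With a 2 GiB page cache, the raw embeddings stop fitting somewhere between 400K and 1M nodes, which is consistent with the practical ceiling stated above.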

Neptune + Aurora: Separated Concerns

Edges        Nodes         Neptune (graph)      Aurora (vectors)   Combined Latency
─────────────────────────────────────────────────────────────────────────────────────
100K         50K           ~1 NCU ($0.11/hr)    0.5 ACU ($0.06/hr) <100ms (warm; cold start excluded)
500K         200K          ~1 NCU               0.5 ACU            80-120ms
1M           400K          ~1.5 NCU             0.5 ACU            80-150ms
5M           1M            ~2 NCU               1 ACU              100-200ms
10M          2M            ~2.5 NCU             2 ACU              120-250ms
50M          5M            ~4 NCU               4 ACU              150-300ms

Why it scales better:

- Neptune: purpose-built graph engine, no embedding bloat in graph storage
- Aurora pgvector: HNSW index is memory-optimized for vector search
- Each service scales independently (serverless auto-scaling)
- No memory contention between graph traversals and vector scans
- Neptune handles 100K+ edges/sec for bulk loads

Practical ceiling: Neptune supports billions of edges. Aurora pgvector handles millions of vectors with sub-200ms search.

Expected Performance Curves

[Figure: Scaling Tradeoff: Latency and Cost]

Methodology & Assumptions

Latency estimates are based on:

- Neo4j: Observed behavior of Neo4j Community 5.x on ECS Fargate (4GB RAM, 2 vCPU) with 768-dim embeddings stored as node properties. The vector index uses Neo4j's native HNSW implementation. Degradation above 500K nodes is driven by page cache pressure: embeddings consume ~3KB per node (768 floats × 4 bytes), so 1M nodes = 3GB of embedding data alone, exceeding the 1-2GB page cache allocation. Latency values at lower scales (50-120ms) are from our production system (607K nodes, 3.3M rels). Values above 1M edges are extrapolated from Neo4j Community benchmarks and page cache miss modeling.
- Neptune + Aurora: Neptune serverless latency is based on AWS published benchmarks for openCypher queries on serverless clusters (1-2.5 NCU range). Aurora pgvector latency uses HNSW index performance from pgvector benchmarks at various dataset sizes (sub-100ms for <1M vectors with ef_search=40). The combined latency includes ~20ms overhead for the QueryCoordinator's parallel fetch + RRF merge. Cold start penalty (10-30s on first query after idle) is excluded from p95, which assumes warm steady state.

Cost estimates are based on:

- Neo4j: ECS Fargate pricing in eu-north-1 (vCPU: $0.04048/hr, GB RAM: $0.004445/hr) + EFS storage ($0.33/GB-month). At higher scales, RAM must increase (8GB at 2M edges, 16GB at 10M) to maintain acceptable page cache hit rates.
- Neptune: Serverless NCU pricing ($0.1106/NCU-hour in eu-north-1). NCU consumption estimated from AWS documentation: 1 NCU handles ~500K edges with moderate query load, scaling sub-linearly. Storage is included in NCU pricing.
- Aurora: Serverless v2 ACU pricing ($0.12/ACU-hour in eu-north-1). 0.5 ACU minimum handles up to ~500K vectors; scales based on concurrent query load and index size. Storage: $0.10/GB-month for the pgvector data.
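
The monthly compute figures for the two serverless components can be reproduced from the hourly rates above. A minimal sketch, assuming a 730-hour month and constant capacity for the whole month (both are assumptions, not stated in this document); the names are illustrative:

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month

NEPTUNE_NCU_HR = 0.1106  # USD per NCU-hour, eu-north-1 (rate quoted above)
AURORA_ACU_HR = 0.12     # USD per ACU-hour, eu-north-1 (rate quoted above)


def serverless_monthly(units, rate_per_unit_hr, hours=HOURS_PER_MONTH):
    """Monthly compute cost if capacity stays at `units` for the whole month."""
    return units * rate_per_unit_hr * hours


neptune_1_ncu = serverless_monthly(1, NEPTUNE_NCU_HR)
aurora_half_acu = serverless_monthly(0.5, AURORA_ACU_HR)

print(f"Neptune 1 NCU:  ${neptune_1_ncu:.2f}/mo")
print(f"Aurora 0.5 ACU: ${aurora_half_acu:.2f}/mo")
```

Because serverless capacity moves with load, a constant-capacity estimate like this is closer to an upper bound for a bursty workload than to a bill forecast.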

Efficiency score = latency_ms × monthly_cost_usd / 1000. This penalizes architectures that are both slow and expensive. Lower is better.
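
As a worked example, the score can be computed for the 500K and 2M edge scale points using the p95 latency and total monthly cost figures from the cost breakdown section. A minimal sketch; the function and dictionary names are illustrative:

```python
def efficiency_score(latency_ms, monthly_cost_usd):
    """latency_ms × monthly_cost_usd / 1000; lower is better."""
    return latency_ms * monthly_cost_usd / 1000


# (p95 latency in ms, total monthly cost in USD) from the cost breakdown tables.
scale_points = {
    "500K edges": {"Neo4j": (100, 135), "Neptune + Aurora": (120, 130)},
    "2M edges": {"Neo4j": (400, 280), "Neptune + Aurora": (150, 220)},
}

scores = {
    scale: {arch: efficiency_score(*values) for arch, values in archs.items()}
    for scale, archs in scale_points.items()
}

for scale, by_arch in scores.items():
    for arch, score in by_arch.items():
        print(f"{scale:<11} {arch:<17} {score:6.1f}")
```

At 500K edges the two scores are close (13.5 vs 15.6), while at 2M edges Neo4j's score is more than three times that of Neptune + Aurora (112.0 vs 33.0).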

Limitations: These are modeled estimates, not measured benchmarks from identical workloads. Actual performance depends on query complexity (multi-hop vs single lookup), write concurrency, embedding dimensionality, and caching behavior. The crossover zone (~1M edges) should be validated with production query patterns before migration.

Cost Breakdown at Key Scale Points

At 500K edges (current prod)

Component	Neo4j (Prod)	Neptune + Aurora (Beta)
Compute	Fargate 4GB: ~$120/mo	Neptune 1 NCU: ~$80/mo + Aurora 0.5 ACU: ~$45/mo
Storage	EFS: ~$15/mo	Neptune: included + Aurora: ~$5/mo
Total	~$135/mo	~$130/mo
Latency (p95)	~100ms	~120ms (parallel fetch + merge overhead)

Verdict: Roughly equivalent. Neo4j is simpler to operate; Neptune+Aurora adds operational complexity at about the same cost (~$130/mo vs ~$135/mo) and slightly higher p95 latency.

At 2M edges (near-term target)

Component	Neo4j (Prod)	Neptune + Aurora (Beta)
Compute	Fargate 8GB needed: ~$240/mo	Neptune 1.5 NCU: ~$120/mo + Aurora 1 ACU: ~$90/mo
Storage	EFS: ~$40/mo	Neptune: included + Aurora: ~$10/mo
Total	~$280/mo	~$220/mo
Latency (p95)	~400ms (page cache pressure)	~150ms

Verdict: Neptune+Aurora wins on both cost and latency.

At 10M edges (long-term)

Component	Neo4j (Prod)	Neptune + Aurora (Beta)
Compute	Fargate 16GB or EC2: ~$500/mo	Neptune 2.5 NCU: ~$200/mo + Aurora 2 ACU: ~$180/mo
Storage	EFS: ~$100/mo	Neptune: included + Aurora: ~$20/mo
Total	~$600/mo	~$400/mo
Latency (p95)	~1-2s (unacceptable)	~250ms

Verdict: Neo4j Community is not viable at this scale. Staying on Neo4j would require Enterprise (sharding, clustering).

Key Tradeoffs

Factor	Neo4j (Prod)	Neptune + Aurora (Beta)
Simplicity	✅ Single database, single query language	❌ Two services, query coordinator needed
Cypher ecosystem	✅ Full Cypher, APOC, GDS	⚠️ OpenCypher subset (no APOC, limited GDS)
Vector search quality	⚠️ Basic vector index	✅ HNSW with tunable ef_search, IVFFlat option
Operational overhead	✅ One EFS backup	⚠️ Two services to monitor
Scaling ceiling	❌ ~1M nodes with embeddings	✅ Billions of edges, millions of vectors
Cold start	✅ None (always running)	⚠️ Neptune serverless: 10-30s cold start
Write throughput	⚠️ ~5K writes/sec (single instance)	✅ Neptune: 100K+ edges/sec (bulk load)
Multi-tenancy	❌ Community = 1 database	✅ Neptune = label-based isolation
Graph algorithms	✅ GDS library (PageRank, Louvain, etc)	⚠️ Neptune `neptune.algo.*` (limited set)
Fulltext search	✅ Native Lucene indexes	✅ Aurora pg_trgm + tsvector

Recommendation

Scale (edges)     Recommended Architecture
────────────────────────────────────────────
< 500K           Neo4j (simpler, cheaper, full Cypher)
500K - 2M        Either works; Neptune+Aurora if growth expected
2M - 10M         Neptune + Aurora (Neo4j Community hits ceiling)
> 10M            Neptune + Aurora (only viable option without Neo4j Enterprise)
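
The thresholds in this table can be encoded as a small lookup, for example to flag when a deployment crosses into the next band. A minimal sketch; the function name is hypothetical, and since the table leaves the band edges ambiguous, exactly 2M edges is treated here as part of the "either works" band:

```python
def recommend_architecture(edges):
    """Map an edge count to the recommendation bands in the table above."""
    if edges < 500_000:
        return "Neo4j"
    if edges <= 2_000_000:
        # Assumption: the 2M boundary itself falls in the "either works" band.
        return "Either (Neptune + Aurora if growth expected)"
    # Both the 2M-10M and >10M bands recommend Neptune + Aurora.
    return "Neptune + Aurora"


for edges in (100_000, 1_000_000, 5_000_000, 50_000_000):
    print(f"{edges:>11,} edges -> {recommend_architecture(edges)}")
```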

Current state:

- Prod (607K nodes, 3.3M rels) is approaching the Neo4j ceiling
- Beta (Neptune + Aurora) is the right architecture for continued growth
- Migration path: dual-write via MigrationController, validate with QueryCoordinator, cut over when confident

Crossover point: ~1M edges is where Neptune+Aurora becomes clearly superior on latency. At ~2M edges, Neo4j Community also becomes cost-inefficient: it needs 8GB or more of RAM to keep page cache hit rates acceptable (~$280/mo vs ~$220/mo for Neptune+Aurora).

Migration Strategy

Phase 1 (current): Prod on Neo4j, Beta on Neptune+Aurora
                    ↓ validate query parity
Phase 2:           Dual-write to both (MigrationController)
                    ↓ compare latency/accuracy
Phase 3:           Route reads to Neptune+Aurora, writes to both
                    ↓ confidence threshold met
Phase 4:           Cut over prod to Neptune+Aurora
                    Neo4j becomes read-only archive
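
Phases 2 and 3 depend on writing to both backends and comparing what they return. The sketch below illustrates that idea with in-memory stand-ins; it is not the MigrationController's actual interface, and every class and method name in it is hypothetical.

```python
class InMemoryStore:
    """Hypothetical stand-in for a backend client (Neo4j or Neptune + Aurora)."""

    def __init__(self, name):
        self.name = name
        self.edges = set()

    def write_edge(self, src, rel, dst):
        self.edges.add((src, rel, dst))

    def neighbors(self, src):
        return sorted(dst for s, _, dst in self.edges if s == src)


class DualWriter:
    """Write to both stores, serve reads from the primary, record shadow mismatches."""

    def __init__(self, primary, shadow):
        self.primary = primary
        self.shadow = shadow
        self.mismatches = []

    def write_edge(self, src, rel, dst):
        self.primary.write_edge(src, rel, dst)
        self.shadow.write_edge(src, rel, dst)

    def neighbors(self, src):
        result = self.primary.neighbors(src)
        # Shadow read: compare results without affecting what the caller sees.
        if self.shadow.neighbors(src) != result:
            self.mismatches.append(src)
        return result


neo4j = InMemoryStore("neo4j")
neptune = InMemoryStore("neptune+aurora")
writer = DualWriter(primary=neo4j, shadow=neptune)

writer.write_edge("a", "KNOWS", "b")
neo4j.write_edge("a", "KNOWS", "c")  # simulate drift: a write that bypassed dual-write

print(writer.neighbors("a"), writer.mismatches)
```

Swapping `primary` and `shadow` corresponds to the move from Phase 2 to Phase 3, and a mismatch rate that stays at zero is one possible form for the confidence threshold before cutover.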