AWS Credentials & Neo4j Tunnel

Once you have the project installed and your first knowledge graph built locally, you'll likely want to connect to the shared Neo4j instance running on AWS. This page covers configuring AWS credentials and opening an SSM tunnel to the remote database.


AWS Credentials

The project uses standard AWS credential resolution. The simplest approach for local development is to define named profiles in the AWS credentials and config files.

Configure ~/.aws/credentials

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
region = eu-north-1

[olink]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
region = eu-north-1

Configure ~/.aws/config

[sso-session olink-sso]
sso_start_url = https://d-c367264306.awsapps.com/start#
sso_region = eu-north-1
sso_registration_scopes = sso:account:access

[profile default]
sso_session = olink-sso
sso_account_id = 357836458011
sso_role_name = AWSPowerUserWithRolePrivileges
region = eu-north-1

[profile dsinternal]
sso_session = olink-sso
sso_account_id = 357836458011
sso_role_name = AWSPowerUserWithRolePrivileges
region = eu-north-1

[profile ds_internal_development]
sso_session = olink-sso
sso_account_id = 209479279306
sso_role_name = ds_internal_development
region = eu-north-1
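
To check which profiles a config file actually defines, a small parser can help. The following is a minimal sketch using Python's standard configparser; the function name list_profiles and the sample values (start URL, account ID, role name) are placeholders for illustration, not values from this project.

```python
import configparser

# Placeholder config text; substitute the contents of your own ~/.aws/config.
SAMPLE_CONFIG = """\
[sso-session olink-sso]
sso_start_url = https://example.awsapps.com/start
sso_region = eu-north-1

[profile dsinternal]
sso_session = olink-sso
sso_account_id = 111111111111
sso_role_name = ExampleRole
region = eu-north-1
"""


def list_profiles(config_text: str) -> dict[str, dict[str, str]]:
    """Return {profile_name: settings} for each profile section in AWS config text."""
    # Disable interpolation so a literal '%' in a value cannot raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    profiles: dict[str, dict[str, str]] = {}
    for section in parser.sections():
        if section == "default":
            profiles["default"] = dict(parser[section])
        elif section.startswith("profile "):
            name = section[len("profile "):].strip()
            profiles[name] = dict(parser[section])
        # Other sections (for example "sso-session ...") are not profiles.
    return profiles


for name, settings in list_profiles(SAMPLE_CONFIG).items():
    print(name, settings.get("sso_account_id"), settings.get("region"))
```

To inspect a real file, pass in the text of ~/.aws/config read with pathlib.Path.home() / ".aws" / "config".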

Using a named profile

Set the profile in your shell or .env:

# Shell
export AWS_PROFILE=olink

# Or in .env
AWS_PROFILE=olink

SSO profiles

If your organization uses AWS IAM Identity Center (SSO), configure it with:

aws configure sso --profile olink

Then log in before running any commands:

aws sso login --profile olink

Verify access

aws sts get-caller-identity --profile olink

You should see your account ID, ARN, and user ID.
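
If you want to run the same check from a script, one option is to call the CLI and parse its JSON output. This is a sketch, not part of the project: the helper names parse_identity and current_identity are hypothetical, and the sample values are placeholders. The keys Account, Arn and UserId are the fields returned by aws sts get-caller-identity.

```python
import json
import subprocess

# Placeholder output in the shape returned by `aws sts get-caller-identity`.
SAMPLE_OUTPUT = (
    '{"UserId": "AIDAEXAMPLEUSERID", "Account": "123456789012", '
    '"Arn": "arn:aws:iam::123456789012:user/example"}'
)


def parse_identity(raw_json: str) -> tuple[str, str, str]:
    """Extract (account ID, ARN, user ID) from get-caller-identity JSON."""
    data = json.loads(raw_json)
    return data["Account"], data["Arn"], data["UserId"]


def current_identity(profile: str = "olink") -> tuple[str, str, str]:
    """Run the AWS CLI for the given profile; raises CalledProcessError on failure."""
    result = subprocess.run(
        ["aws", "sts", "get-caller-identity", "--profile", profile, "--output", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_identity(result.stdout)


account, arn, user_id = parse_identity(SAMPLE_OUTPUT)
print(f"account={account} arn={arn} user_id={user_id}")
```

A failure from current_identity usually means the profile is missing or, for SSO profiles, that the login session has expired.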


Syncing Secrets from AWS

The project stores secrets in AWS Secrets Manager and config in SSM Parameter Store. Use the sync script to pull them into your local .env:

# Pull secrets + config into .env
uv run python cdk_resources/scripts/sync_env.py

# Dry-run (preview without writing)
uv run python cdk_resources/scripts/sync_env.py --dry-run

# Push local .env values up to AWS (careful!)
uv run python cdk_resources/scripts/sync_env.py --push

Options:

Flag              Description
--region          AWS region (default: eu-north-1)
--secret          Secrets Manager secret name (default: graphrag-secrets)
--branch          SSM parameter path branch (default: main)
--dry-run         Print what would be written without touching .env
--push            Push local values to AWS instead of pulling
--include-config  With --push, also push config params to SSM
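
As a rough illustration of what a pull with a preview involves, the sketch below merges freshly pulled values into existing .env text and returns the result instead of writing it. The function name merge_env and its merge rules (update existing keys in place, keep comments, append new keys) are assumptions made for illustration; sync_env.py may behave differently.

```python
def merge_env(existing_text: str, pulled: dict[str, str]) -> str:
    """Return new .env text with pulled values applied; nothing is written to disk."""
    remaining = dict(pulled)  # keys not yet placed in the output
    out: list[str] = []
    for line in existing_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                # Overwrite the existing value in place.
                out.append(f"{key}={remaining.pop(key)}")
                continue
        # Comments, blank lines and untouched keys pass through unchanged.
        out.append(line)
    # Keys that were not already present are appended at the end.
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


current = "# local settings\nAWS_PROFILE=olink\nNEO4J_USER=old\n"
pulled = {"NEO4J_USER": "neo4j", "NEO4J_URI": "bolt://localhost:7687"}
print(merge_env(current, pulled), end="")
```

Printing the merged text rather than writing it mirrors the idea of a dry run: you can review the result before any file changes.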

Neo4j SSM Tunnel

The production Neo4j instance runs inside a private VPC on ECS. To connect from your local machine, the project provides an SSM port-forwarding tunnel via a lightweight EC2 bastion instance.

The script lives at cdk_resources/scripts/neo4j-tunnel.sh.

Prerequisites

  1. AWS CLI v2 installed and configured (see above)
  2. Session Manager plugin installed:
    # macOS
    brew install --cask session-manager-plugin
    
  3. Sufficient IAM permissions: ec2:RunInstances, ec2:DescribeInstances, ec2:TerminateInstances, ssm:StartSession, ecs:ListTasks, ecs:DescribeTasks
  4. CDK stack deployed — the script expects the security group and IAM profile created by the CDK infrastructure stack

Commands

# Create a bastion instance and open the tunnel
bash cdk_resources/scripts/neo4j-tunnel.sh create graphrag

# Reconnect to an existing instance
bash cdk_resources/scripts/neo4j-tunnel.sh connect

# Check instance and tunnel status
bash cdk_resources/scripts/neo4j-tunnel.sh status

# Start a stopped instance (without opening tunnel)
bash cdk_resources/scripts/neo4j-tunnel.sh start

# Terminate the bastion instance permanently
bash cdk_resources/scripts/neo4j-tunnel.sh destroy

What create does

  1. Discovers the Neo4j ECS task's private IP via aws ecs list-tasks / describe-tasks
  2. Launches a t3.micro EC2 instance in the same VPC with SSM access
  3. Opens two persistent SSM port-forwarding sessions:
    • Bolt: localhost:7687 → Neo4j task :7687
    • HTTP/Browser: localhost:7474 → Neo4j task :7474
  4. Runs a keep-alive loop (every 30s) to prevent SSM idle timeout
  5. Auto-reconnects if a tunnel drops
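
The port-forwarding step can be sketched as follows. This assumes the tunnel uses the AWS-provided SSM document AWS-StartPortForwardingSessionToRemoteHost, which forwards a local port through an instance to a remote host; whether neo4j-tunnel.sh uses exactly this document is an assumption. The function name, instance ID and IP address are placeholders.

```python
import json


def port_forward_command(
    instance_id: str,
    remote_host: str,
    remote_port: int,
    local_port: int,
    region: str = "eu-north-1",
) -> list[str]:
    """Build the argv for an SSM port-forwarding session through a bastion."""
    # SSM document parameters are passed as lists of strings.
    parameters = {
        "host": [remote_host],
        "portNumber": [str(remote_port)],
        "localPortNumber": [str(local_port)],
    }
    return [
        "aws", "ssm", "start-session",
        "--region", region,
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSessionToRemoteHost",
        "--parameters", json.dumps(parameters),
    ]


# Placeholder bastion instance ID and Neo4j task IP.
bolt = port_forward_command("i-0123456789abcdef0", "10.0.0.12", 7687, 7687)
http = port_forward_command("i-0123456789abcdef0", "10.0.0.12", 7474, 7474)
print(" ".join(bolt))
print(" ".join(http))
```

Each returned list can be passed to subprocess.Popen to keep a session open, which also gives a natural place to restart a session if its process exits.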

Connecting your app

Once the tunnel is running, update your .env:

NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=<password from Secrets Manager>

Then start the API or run ingestion as normal — traffic routes through the tunnel transparently.
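
Before starting the app, it can be useful to confirm that the forwarded ports are accepting connections. The sketch below uses only the Python standard library; the helper name port_is_open is hypothetical, and the check only shows that something is listening locally, not that it is Neo4j.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


for name, port in (("Bolt", 7687), ("HTTP", 7474)):
    state = "open" if port_is_open("localhost", port) else "closed"
    print(f"{name} localhost:{port} is {state}")
```

If both ports report closed, the tunnel is probably not running; see the status command above.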

Tunnel lifecycle

State           What to do
First time      ./neo4j-tunnel.sh create graphrag
Resuming work   ./neo4j-tunnel.sh connect
Left overnight  ./neo4j-tunnel.sh start, then ./neo4j-tunnel.sh connect
Done for good   ./neo4j-tunnel.sh destroy

Cost

The bastion is a t3.micro (~$0.01/hr). Run destroy when you won't need the tunnel for an extended period.
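
To put the hourly figure in context, here is a small worked calculation using the approximate rate quoted above. It covers instance hours only; the function name is hypothetical and actual pricing may differ.

```python
HOURLY_RATE_USD = 0.01  # approximate t3.micro rate quoted above


def bastion_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Estimated instance cost in USD for the given number of running hours."""
    return round(hours * rate, 2)


print(bastion_cost(8 * 5))    # five 8-hour working days
print(bastion_cost(24 * 30))  # left running for a 30-day month
```

At this rate, a bastion left running for a full 30-day month costs about $7.20, compared with about $0.40 for a 40-hour working week.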

Troubleshooting

Problem                           Fix
No running tasks found            The Neo4j ECS service may be scaled to 0; check the AWS console
Tunnel security group not found   The CDK stack hasn't been deployed; run cdk deploy in cdk_resources/
Session Manager plugin not found  Install it: brew install --cask session-manager-plugin
Tunnel drops repeatedly           Check that your AWS credentials haven't expired (aws sts get-caller-identity)
Port already in use               Another tunnel or a local Neo4j is using 7687/7474; stop it or change BOLT_LOCAL_PORT in the script

Bootstrapping Secrets (First-Time Setup)

If you're setting up a fresh AWS environment (no existing secrets), use the bootstrap script:

bash cdk_resources/scripts/create-secrets.sh

This creates two Secrets Manager entries with placeholder values:

  • graphrag-secrets — API keys, passwords, org IDs
  • graphrag-entra-oidc-secrets — Entra ID OIDC credentials for Cognito

Then replace the placeholders with real values via the AWS Console or CLI, and pull them locally:

uv run python cdk_resources/scripts/sync_env.py

Summary

flowchart LR
    A[~/.aws/credentials] --> B[AWS CLI]
    B --> C[sync_env.py]
    C --> D[.env]
    B --> E[neo4j-tunnel.sh]
    E --> F[EC2 Bastion]
    F -->|SSM Port Forward| G[Neo4j ECS Task]
    D --> H[API / Ingestion]
    H -->|bolt://localhost:7687| F