Incremental Updates

Weekly maintenance workflow for keeping the knowledge graph current with new PubMed publications.

Quick Start

# Update with default settings (last 7 days, 100 abstracts max)
uv run python scripts/incremental_pubmed_update.py \
  --search-term "cardiovascular disease protein biomarker" \
  --database olink1

# Custom time window
uv run python scripts/incremental_pubmed_update.py \
  --search-term "kidney disease biomarker" --days 14 --database olink1

# Production (AWS Bedrock)
uv run python scripts/incremental_pubmed_update.py \
  --search-term "cancer protein marker" --service bedrock --database olink1

5-Step Pipeline

| Step | Duration | What happens |
|---|---|---|
| 1. Fetch new abstracts | 2-3 min | Date-filtered PubMed query, PMID deduplication |
| 2. Extract entities | 5-8 min | LLM-based extraction with token chunking |
| 3. Consolidate entities | 1-2 min | UniProt/MONDO ID matching |
| 4. Consolidate relationships | 1-2 min | Merge with existing edges, preserve evidence |
| 5. Generate embeddings | 2-3 min | Only for new content |
| Total | 11-18 min | Target: <15 min for 100 abstracts |
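The PMID deduplication in step 1 amounts to a set difference: keep only the fetched PMIDs that are not already in the graph. A minimal bash sketch, assuming both lists are plain text files with one PMID per line. The file layout and the `new_pmids` name are hypothetical; the project's script may track known PMIDs differently (for example, by querying the database directly).

```bash
# new_pmids FETCHED KNOWN: print PMIDs that appear in FETCHED but not in KNOWN.
# Hypothetical helper, not part of olink_rag; both arguments are files with one PMID per line.
new_pmids() {
  # comm needs sorted input; -2 hides lines only in KNOWN, -3 hides lines in both files.
  comm -23 <(sort -u "$1") <(sort -u "$2")
}
```

For example, `new_pmids fetched.txt known.txt > to_process.txt` leaves only the abstracts that still need extraction. The output is sorted lexicographically rather than numerically, which does not matter for membership checks.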

Using ingest_main.py with Date Filters

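# Ingest abstracts from the last 7 days for every query in data/queries.txt.
# `date -v-7d` is BSD/macOS syntax; with GNU date (Linux) use `date -d '7 days ago' +%Y-%m-%d`.
uv run python ingest_main.py \
  --queries-file data/queries.txt \
  --date-from $(date -v-7d +%Y-%m-%d) \
  --date-to $(date +%Y-%m-%d) \
  --database olink1 --service bedrock

# Then consolidate relationships (merging with existing edges) and generate graph embeddings.
uv run python ingest_main.py --consolidate-relationships --database olink1
uv run python ingest_main.py --add-graph-embeddings --database olink1 --service bedrock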
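`date -v-7d` only works with BSD/macOS `date`; GNU `date`, the default on most Linux servers, rejects `-v`. To run the same commands on either system, a small bash helper can pick the right syntax. This is a sketch, not part of the project, and the `days_ago` name is hypothetical.

```bash
# days_ago N: print the date N days before today as YYYY-MM-DD.
# Hypothetical helper, not part of olink_rag.
days_ago() {
  if date --version >/dev/null 2>&1; then
    # GNU coreutils date (typical on Linux) understands relative dates via -d.
    date -d "$1 days ago" +%Y-%m-%d
  else
    # BSD/macOS date has no --version option; it shifts the date with -v.
    date -v-"$1"d +%Y-%m-%d
  fi
}
```

With the helper defined, the date flags of the first command become `--date-from "$(days_ago 7)" --date-to "$(days_ago 0)"`.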

Weekly Cron Job

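# Every Monday at 02:00 (server local time). A crontab entry must be a single line:
# cron does not support backslash line continuation.
# `date -v-7d` is BSD/macOS syntax; with GNU date (Linux) use `date -d '7 days ago' +\%Y-\%m-\%d`.
0 2 * * 1 cd /path/to/olink_rag && uv run python ingest_main.py --queries-file data/queries.txt --date-from $(date -v-7d +\%Y-\%m-\%d) --date-to $(date +\%Y-\%m-\%d) --database olink1 --service bedrock >> logs/weekly_update.log 2>&1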
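The crontab entry above runs only the ingestion command. If the weekly job should also run the relationship-consolidation and embedding commands, a small wrapper script keeps the crontab line short and stops at the first failing step. This is a sketch under stated assumptions: the script name `weekly_update.sh`, its `run`/`dry-run` arguments, and the `REPO_DIR` variable are hypothetical, and `uv` is assumed to be installed in `$HOME/.local/bin`, which cron's minimal default PATH usually omits.

```bash
#!/usr/bin/env bash
# weekly_update.sh (hypothetical): run the three ingest_main.py steps in order.
# Usage: weekly_update.sh run       # execute the steps
#        weekly_update.sh dry-run   # only print the commands
set -euo pipefail

# Assumptions: adjust to your checkout and to where uv is installed.
REPO_DIR="${REPO_DIR:-/path/to/olink_rag}"
export PATH="$HOME/.local/bin:$PATH"

# Date N days ago as YYYY-MM-DD (GNU date on Linux, BSD date on macOS).
days_ago() {
  if date --version >/dev/null 2>&1; then
    date -d "$1 days ago" +%Y-%m-%d
  else
    date -v-"$1"d +%Y-%m-%d
  fi
}

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

weekly_update() {
  cd "$REPO_DIR"
  # With set -e, a failing step aborts the script before the next one starts.
  run uv run python ingest_main.py --queries-file data/queries.txt \
    --date-from "$(days_ago 7)" --date-to "$(days_ago 0)" \
    --database olink1 --service bedrock
  run uv run python ingest_main.py --consolidate-relationships --database olink1
  run uv run python ingest_main.py --add-graph-embeddings --database olink1 --service bedrock
}

case "${1:-}" in
  run)     weekly_update ;;
  dry-run) DRY_RUN=1 weekly_update ;;
  "")      ;;  # no argument: only define the functions (e.g. when sourced)
  *)       echo "usage: $0 {run|dry-run}" >&2; exit 2 ;;
esac
```

The crontab line would then reduce to something like `0 2 * * 1 /path/to/olink_rag/weekly_update.sh run >> /path/to/olink_rag/logs/weekly_update.log 2>&1` (script location hypothetical). Running `weekly_update.sh dry-run` once by hand is a cheap way to check the computed dates before the first scheduled run.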

For the full guide, including troubleshooting, see Incremental PubMed Update.