Incremental Updates
A weekly maintenance workflow that keeps the knowledge graph current with new PubMed publications.
Quick Start
# Update with default settings (last 7 days, 100 abstracts max)
uv run python scripts/incremental_pubmed_update.py \
--search-term "cardiovascular disease protein biomarker" \
--database olink1
# Custom time window
uv run python scripts/incremental_pubmed_update.py \
--search-term "kidney disease biomarker" --days 14 --database olink1
# Production (AWS Bedrock)
uv run python scripts/incremental_pubmed_update.py \
--search-term "cancer protein marker" --service bedrock --database olink1
5-Step Pipeline
| Step | Duration | What Happens |
|---|---|---|
| Fetch new abstracts | 2-3 min | Date-filtered PubMed query, PMID deduplication |
| Extract entities | 5-8 min | LLM-based extraction with token chunking |
| Consolidate entities | 1-2 min | UniProt/MONDO ID matching |
| Consolidate relationships | 1-2 min | Merge with existing edges, preserve evidence |
| Generate embeddings | 2-3 min | Only for new content |
| Total | 11-18 min | Target: <15 min for 100 abstracts |
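The PMID deduplication in the fetch step can be pictured with a minimal Python sketch. This is an illustration only, not the project's actual code: the function name `filter_new_pmids` and the sample PMIDs are hypothetical, and the sketch assumes the PMIDs already in the graph are available as a set of strings.

```python
from typing import Iterable, List, Set


def filter_new_pmids(fetched: Iterable[str], known: Set[str]) -> List[str]:
    """Return PMIDs not yet in the graph, keeping fetch order and dropping repeats.

    Hypothetical helper for illustration; the real pipeline may differ.
    """
    seen = set(known)  # copy so the caller's set is not mutated
    new_pmids: List[str] = []
    for pmid in fetched:
        pmid = pmid.strip()
        if pmid and pmid not in seen:
            seen.add(pmid)  # also guards against duplicates within this fetch
            new_pmids.append(pmid)
    return new_pmids


if __name__ == "__main__":
    already_ingested = {"100", "101"}            # hypothetical PMIDs
    from_pubmed = ["101", "102", "102", "103"]   # hypothetical fetch result
    print(filter_new_pmids(from_pubmed, already_ingested))  # ['102', '103']
```

Filtering before extraction keeps the slower LLM-based step limited to abstracts that are genuinely new.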
Using ingest_main.py with Date Filters
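# 1. Ingest abstracts from the last 7 days
# (date -v-7d is BSD/macOS syntax; with GNU date use: date -d '7 days ago' +%Y-%m-%d)
uv run python ingest_main.py \
--queries-file data/queries.txt \
--date-from $(date -v-7d +%Y-%m-%d) \
--date-to $(date +%Y-%m-%d) \
--database olink1 --service bedrock
# 2. Consolidate relationships with existing edges
uv run python ingest_main.py --consolidate-relationships --database olink1
# 3. Generate graph embeddings
uv run python ingest_main.py --add-graph-embeddings --database olink1 --service bedrock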
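The date window passed to `--date-from` and `--date-to` can also be computed in Python, which avoids the differences between BSD and GNU `date`. The following is a minimal sketch; the helper name `date_window` is hypothetical and not part of the project.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def date_window(days: int = 7, today: Optional[date] = None) -> Tuple[str, str]:
    """Return (date_from, date_to) as YYYY-MM-DD strings covering the last `days` days.

    Hypothetical helper for illustration; `today` is injectable to make testing easy.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


if __name__ == "__main__":
    date_from, date_to = date_window(7)
    print(f"--date-from {date_from} --date-to {date_to}")
```

`date.isoformat()` yields the same `YYYY-MM-DD` form as the `+%Y-%m-%d` format used in the shell commands.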
Weekly Cron Job
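# Runs Mondays at 02:00. A crontab entry must be a single line (cron does not support backslash continuations), and % must be escaped as \%.
# date -v-7d is BSD/macOS syntax; with GNU date use: date -d '7 days ago' +\%Y-\%m-\%d
0 2 * * 1 cd /path/to/olink_rag && uv run python ingest_main.py --queries-file data/queries.txt --date-from $(date -v-7d +\%Y-\%m-\%d) --date-to $(date +\%Y-\%m-\%d) --database olink1 --service bedrock >> logs/weekly_update.log 2>&1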
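One way to keep the crontab entry short is to have cron call a small wrapper script that assembles the command itself, so no `%` escaping is needed. The sketch below only builds the argument list from the flags shown above; the function name `build_ingest_argv` and the sample dates are hypothetical, and the defaults simply mirror the values used in this guide.

```python
import shlex
from typing import List


def build_ingest_argv(
    date_from: str,
    date_to: str,
    database: str = "olink1",
    service: str = "bedrock",
    queries_file: str = "data/queries.txt",
) -> List[str]:
    """Assemble the ingest command as an argument list (hypothetical helper)."""
    return [
        "uv", "run", "python", "ingest_main.py",
        "--queries-file", queries_file,
        "--date-from", date_from,
        "--date-to", date_to,
        "--database", database,
        "--service", service,
    ]


if __name__ == "__main__":
    argv = build_ingest_argv("2024-03-04", "2024-03-11")  # sample dates
    # Print a shell-safe rendering; a real wrapper could instead call
    # subprocess.run(argv, check=True) from the project directory.
    print(shlex.join(argv))
```

Passing a list rather than a single string means the dates and paths are never re-parsed by a shell.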
For the full guide including troubleshooting, see Incremental PubMed Update.