Olink RAG Wiki

A knowledge graph system for protein-disease relationships built on Neo4j. Ingests scientific literature (PubMed, bioRxiv, PMC, PDFs, tabular data), extracts biological entities using LLMs, and exposes a REST API for graph-aware retrieval-augmented generation (RAG) queries.
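As a rough illustration of what "graph-aware retrieval" can mean, the sketch below collects the relationships within a few hops of an entity and formats them as context lines for an LLM prompt. It is a toy in-memory stand-in, not the project's implementation: the real system stores the graph in Neo4j, and the names `Edge`, `ToyGraph`, `neighborhood`, and `build_context`, the relation labels, and the `doc-N` evidence IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One extracted relationship. Field names are illustrative assumptions."""
    source: str
    relation: str
    target: str
    evidence: str  # placeholder document ID, not a real citation


class ToyGraph:
    """In-memory stand-in for the Neo4j graph (assumption for this sketch)."""

    def __init__(self) -> None:
        self._edges_by_node: dict[str, list[Edge]] = {}

    def add(self, edge: Edge) -> None:
        # Index the edge under both endpoints so traversal ignores direction.
        self._edges_by_node.setdefault(edge.source, []).append(edge)
        self._edges_by_node.setdefault(edge.target, []).append(edge)

    def neighborhood(self, entity: str, depth: int = 1) -> list[Edge]:
        """Breadth-first collection of edges within `depth` hops of `entity`."""
        seen_nodes = {entity}
        seen_edges: set[Edge] = set()
        ordered_edges: list[Edge] = []
        frontier = [entity]
        for _ in range(depth):
            next_frontier: list[str] = []
            for node in frontier:
                for edge in self._edges_by_node.get(node, ()):
                    if edge not in seen_edges:
                        seen_edges.add(edge)
                        ordered_edges.append(edge)
                    other = edge.target if edge.source == node else edge.source
                    if other not in seen_nodes:
                        seen_nodes.add(other)
                        next_frontier.append(other)
            frontier = next_frontier
        return ordered_edges


def build_context(graph: ToyGraph, entity: str, depth: int = 1) -> str:
    """Render the neighborhood as plain-text lines to prepend to an LLM prompt."""
    return "\n".join(
        f"{e.source} -[{e.relation}]-> {e.target} (evidence: {e.evidence})"
        for e in graph.neighborhood(entity, depth)
    )


if __name__ == "__main__":
    graph = ToyGraph()
    graph.add(Edge("IL6", "ASSOCIATED_WITH", "rheumatoid arthritis", "doc-1"))
    graph.add(Edge("TNF", "ASSOCIATED_WITH", "rheumatoid arthritis", "doc-2"))
    graph.add(Edge("TNF", "INTERACTS_WITH", "TNFRSF1A", "doc-3"))
    print(build_context(graph, "IL6", depth=2))
```

Raising `depth` widens the context at the cost of a longer prompt, which is the usual trade-off behind a configurable search depth.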

Quick Navigation

| Section | Description |
|---|---|
| Getting Started | Installation, configuration, first KG build |
| Architecture | System design, data flow, component diagrams |
| Ingestion Pipeline | Data sources, extraction, entity/relationship consolidation |
| Query System | Agents, query modes, search depth, streaming |
| API Reference | REST endpoints, request/response schemas, SSE |
| Knowledge Graph Tools | CLI, MCP, LangChain tool interfaces |
| Infrastructure | Docker, AWS CDK, Redis, Neo4j, monitoring |
| Testing | Test strategy, markers, property-based tests |
| Operations | Incremental updates, validation, maintenance |
| Advanced Features | Multimodal, PPI, evidence weighting, communities |
| Integrations | External databases, PubPeer, Retraction Watch, MCP |
| Development | Contributing, onboarding, roadmap, restructuring |

Tech Stack

  • Language: Python 3.12+ / Package Manager: uv
  • API: FastAPI + Granian (ASGI)
  • Database: Neo4j (graph) + Redis (cache/sessions)
  • LLM: LangChain → Ollama / AWS Bedrock / SageMaker / OpenAI
  • Embeddings: sentence-transformers (HuggingFace)
  • Frontend: React + Sigma.js (separate repo)
  • Infrastructure: Docker Compose (dev) / AWS CDK (prod)
  • Monitoring: Grafana + Prometheus + OpenTelemetry
  • Testing: pytest + Hypothesis (927 tests)
  • Linting: Ruff / Type Checking: mypy (strict)

Project Structure

├── api/                  # FastAPI REST API (domain-based routers)
├── pipeline/             # Ingestion, enrichment, and processing pipelines
├── src/                  # Core business logic, services, models, utilities
├── tests/                # All tests (unit, integration, properties, performance)
├── cdk_resources/        # AWS CDK infrastructure (separate uv project)
├── frontend/             # React frontend (git submodule, separate repo)
├── scripts/              # Operational scripts
├── data/                 # Query files, test datasets
├── docs/                 # Documentation (this wiki + original docs)
├── evaluation/           # Evaluation harness and golden datasets
├── examples/             # Example scripts
├── grafana/              # Dashboards and Prometheus config
├── ingest_main.py        # CLI: literature ingestion
├── enrichment_main.py    # CLI: CSV/Parquet/TXT enrichment
├── parallel_ingest.py    # CLI: parallel multi-source ingestion
└── pyproject.toml        # Project config, dependencies, tool settings