Olink RAG Wiki
A knowledge graph system for protein-disease relationships, built on Neo4j. It ingests scientific literature and data (PubMed, bioRxiv, PMC, PDFs, tabular files), extracts biological entities using LLMs, and exposes a REST API for graph-aware retrieval-augmented generation (RAG) queries.
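To make "graph-aware retrieval" concrete, the following is a minimal, self-contained sketch of the general idea: collect facts within a few hops of an entity and place them in an LLM prompt as context. It is not the project's implementation. The real system stores its graph in Neo4j and serves queries through the REST API; the in-memory graph, the placeholder entity names, and the functions `build_graph`, `retrieve_facts`, and `build_prompt` are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Toy triples: (subject, relation, object, source).
# All entity names and source IDs are placeholders, not project data.
TRIPLES = [
    ("PROTEIN_A", "ASSOCIATED_WITH", "DISEASE_X", "doc-1"),
    ("PROTEIN_A", "INTERACTS_WITH", "PROTEIN_B", "doc-2"),
    ("PROTEIN_B", "ASSOCIATED_WITH", "DISEASE_Y", "doc-3"),
    ("PROTEIN_C", "ASSOCIATED_WITH", "DISEASE_Z", "doc-4"),
]


def build_graph(triples):
    """Index each fact under both of its endpoints (undirected adjacency)."""
    graph = defaultdict(list)
    for subj, rel, obj, source in triples:
        fact = (subj, rel, obj, source)
        graph[subj].append((obj, fact))
        graph[obj].append((subj, fact))
    return graph


def retrieve_facts(graph, entity, depth=1):
    """Breadth-first walk up to `depth` hops; facts are returned in discovery order."""
    seen_nodes = {entity}
    facts = []
    queue = deque([(entity, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested search depth
        for neighbor, fact in graph.get(node, []):
            if fact not in facts:
                facts.append(fact)
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                queue.append((neighbor, dist + 1))
    return facts


def build_prompt(question, facts):
    """Format retrieved facts as a context block followed by the question."""
    lines = [f"- {s} {r} {o} [{src}]" for s, r, o, src in facts]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    facts = retrieve_facts(graph, "PROTEIN_A", depth=2)
    print(build_prompt("Which diseases are linked to PROTEIN_A?", facts))
```

Raising `depth` pulls in facts from more distant neighbors, at the cost of a longer and potentially noisier context. This is the kind of trade-off a search-depth setting would control.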
Quick Navigation
| Section | Description |
|---|---|
| Getting Started | Installation, configuration, first KG build |
| Architecture | System design, data flow, component diagrams |
| Ingestion Pipeline | Data sources, extraction, entity/relationship consolidation |
| Query System | Agents, query modes, search depth, streaming |
| API Reference | REST endpoints, request/response schemas, SSE |
| Knowledge Graph Tools | CLI, MCP, LangChain tool interfaces |
| Infrastructure | Docker, AWS CDK, Redis, Neo4j, monitoring |
| Testing | Test strategy, markers, property-based tests |
| Operations | Incremental updates, validation, maintenance |
| Advanced Features | Multimodal, PPI, evidence weighting, communities |
| Integrations | External databases, PubPeer, Retraction Watch, MCP |
| Development | Contributing, onboarding, roadmap, restructuring |
Tech Stack
- Language: Python 3.12+ / Package Manager: uv
- API: FastAPI + Granian (ASGI)
- Database: Neo4j (graph) + Redis (cache/sessions)
- LLM: LangChain → Ollama / AWS Bedrock / SageMaker / OpenAI
- Embeddings: sentence-transformers (HuggingFace)
- Frontend: React + Sigma.js (separate repo)
- Infrastructure: Docker Compose (dev) / AWS CDK (prod)
- Monitoring: Grafana + Prometheus + OpenTelemetry
- Testing: pytest + Hypothesis (927 tests)
- Linting: Ruff / Type Checking: mypy (strict)
Project Structure
├── api/ # FastAPI REST API (domain-based routers)
├── pipeline/ # Ingestion, enrichment, and processing pipelines
├── src/ # Core business logic, services, models, utilities
├── tests/ # All tests (unit, integration, properties, performance)
├── cdk_resources/ # AWS CDK infrastructure (separate uv project)
├── frontend/ # React frontend (git submodule, separate repo)
├── scripts/ # Operational scripts
├── data/ # Query files, test datasets
├── docs/ # Documentation (this wiki + original docs)
├── evaluation/ # Evaluation harness and golden datasets
├── examples/ # Example scripts
├── grafana/ # Dashboards and Prometheus config
├── ingest_main.py # CLI: literature ingestion
├── enrichment_main.py # CLI: CSV/Parquet/TXT enrichment
├── parallel_ingest.py # CLI: parallel multi-source ingestion
└── pyproject.toml # Project config, dependencies, tool settings