Integrations

External Database Integrations

| Database | Content | Status | Module |
|---|---|---|---|
| UniProt | Protein function, structure, GO terms | Integrated (ID mapping) | pipeline/processors/uniprot_integrator.py |
| MONDO | Disease ontology hierarchy | Integrated (OBO ingestion) | pipeline/processors/obo_structure_processor.py |
| DisGeNET | Gene-disease associations with GDA scores | Partial | pipeline/processors/disgenet_integrator.py |
| Open Targets | Drug targets, genetic associations | Partial | pipeline/processors/opentargets_integrator.py |
| STRING | Protein-protein interactions | Via PPI framework | pipeline/ingest/ppi_integration_pipeline.py |
| FunCoup | Functional coupling predictions | Via PPI framework | pipeline/ingest/multisource_ppi_pipeline.py |
| BioGRID | Experimental PPIs | Via PPI framework | pipeline/ingest/multisource_ppi_pipeline.py |
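
STRING, FunCoup and BioGRID records all pass through the shared PPI framework rather than dedicated integrators. As a rough illustration of what combining such sources involves, the sketch below collapses per-source records into one undirected edge per protein pair and keeps each source's score separately, since the sources use different scoring scales. The `Interaction` type and `merge_interactions` function are hypothetical and are not the API of ppi_integration_pipeline.py or multisource_ppi_pipeline.py.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    protein_a: str  # e.g. a UniProt accession
    protein_b: str
    source: str     # "STRING", "FunCoup", "BioGRID", ...
    score: float    # source-specific confidence; scales differ between sources


def merge_interactions(interactions):
    """Collapse per-source records into one undirected edge per protein pair.

    Returns {(a, b): {source: best_score}} with a <= b, so that A-B and B-A
    records map to the same edge key.
    """
    edges = defaultdict(dict)
    for rec in interactions:
        key = tuple(sorted((rec.protein_a, rec.protein_b)))
        best = edges[key].get(rec.source)
        # Keep the highest score reported by each source for this pair.
        if best is None or rec.score > best:
            edges[key][rec.source] = rec.score
    return dict(edges)
```

Keeping scores keyed by source, instead of averaging them into one number, leaves the decision about how to combine incomparable scales to a later evidence-scoring step.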

PubPeer Integration

Post-publication peer review comments linked to Abstract nodes:

  • Async API client with rate limiting and retry logic
  • PubPeerComment nodes with HAS_PUBPEER_COMMENT relationships
  • Privacy-aware handling of anonymous comments
  • Requires an API key, available on request from contact@pubpeer.com
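```python
# Assumes `db` is an initialised database connection and that this runs
# inside an async function (top-level `await` needs an async context).
integrator = PubPeerIntegrator(db=db, api_key="your_key")
summary = await integrator.enrich_abstracts_with_comments(pmids=["12345678"])
```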
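
The rate limiting and retry behaviour listed above can be sketched with plain asyncio. This is a minimal sketch under assumptions: `RateLimiter` and `call_with_retry` are hypothetical names, transient failures are assumed to surface as `ConnectionError`, and the rate and backoff values are placeholders rather than PubPeer's actual limits or the integrator's settings.

```python
import asyncio
import random
import time


class RateLimiter:
    """Space calls so that at most `rate` start per second (hypothetical helper)."""

    def __init__(self, rate: float):
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_slot = 0.0

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._next_slot - now
            if delay > 0:
                await asyncio.sleep(delay)
            # Reserve the next slot one interval after this call's start time.
            self._next_slot = max(now, self._next_slot) + self._interval


async def call_with_retry(fetch, limiter, *, max_attempts=4, base_delay=0.5):
    """Call `fetch()` under the limiter, retrying transient errors with backoff."""
    for attempt in range(1, max_attempts + 1):
        await limiter.wait()
        try:
            return await fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff (base, 2x base, 4x base, ...) plus up to 10% jitter.
            backoff = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(backoff * (1 + random.random() * 0.1))
```

Holding the lock while sleeping serialises callers, which keeps the spacing guarantee simple at the cost of some concurrency; a token-bucket limiter would allow short bursts instead.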

See PubPeer Integration.

Retraction Watch Integration

Flags retracted papers and affected relationships:

  • RetractionNotice nodes with HAS_RETRACTION relationships
  • Tracks retraction nature: Retraction, Expression of Concern, Correction
  • Sets is_from_retracted_paper on affected relationships
  • Feeds into evidence weighting (evidence from retracted papers receives a weight of 0.0)
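```python
# Assumes `db` is an initialised database connection and that this runs
# inside an async function (top-level `await` needs an async context).
integrator = RetractionWatchIntegrator(db=db, api_key="your_key")
summary = await integrator.enrich_abstracts_with_retractions()
# Set is_from_retracted_paper on relationships derived from retracted papers
flagged = integrator.flag_affected_relationships()
```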
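
The flagging and weighting steps can be illustrated with a small, self-contained sketch. It assumes each relationship records the PMIDs of the papers supporting it, that every supporting paper contributes a base weight of 1.0, and that only a full Retraction (not an Expression of Concern or Correction) zeroes a paper's contribution. Those choices, and the names `Relationship` and `flag_relationships`, are illustrative assumptions rather than the behaviour of RetractionWatchIntegrator.

```python
from dataclasses import dataclass

# Assumption: only a full retraction zeroes a paper's evidence weight.
ZERO_WEIGHT_NATURES = {"Retraction"}


@dataclass
class Relationship:
    source_node: str
    target_node: str
    source_pmids: list  # PMIDs of the papers supporting this relationship
    weight: float = 1.0
    is_from_retracted_paper: bool = False


def evidence_weight(pmid, notices, base=1.0):
    """Per-paper weight: 0.0 for retracted papers, `base` otherwise."""
    return 0.0 if notices.get(pmid) in ZERO_WEIGHT_NATURES else base


def flag_relationships(relationships, notices):
    """Flag and reweight relationships; `notices` maps PMID -> retraction nature.

    Returns the number of relationships backed by at least one retracted paper.
    """
    flagged = 0
    for rel in relationships:
        retracted = [p for p in rel.source_pmids if notices.get(p) in ZERO_WEIGHT_NATURES]
        rel.is_from_retracted_paper = bool(retracted)
        rel.weight = sum(evidence_weight(p, notices) for p in rel.source_pmids)
        if retracted:
            flagged += 1
    return flagged
```

Summing per-paper weights means a relationship backed by other, unretracted papers keeps some support while still carrying the is_from_retracted_paper flag for review.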

See Retraction Watch Integration.

bioRxiv / medRxiv Preprints

Full preprint lifecycle management:

  • Unified API for both bioRxiv and medRxiv
  • Version tracking (all versions stored with timestamps)
  • Publication linking (automatic PMID/DOI linking when published)
  • Source type weighting (preprints weighted lower in evidence scoring)
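
A minimal sketch of the lifecycle described above, assuming a preprint is keyed by its DOI, keeps every version it has seen, and is reweighted as peer-reviewed once a published DOI is linked. The 0.5 preprint multiplier, the switch on publication, and all class and field names are hypothetical; the documentation only states that preprints are weighted lower.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical multipliers: the docs only say preprints are weighted lower.
SOURCE_TYPE_WEIGHTS = {"peer_reviewed": 1.0, "preprint": 0.5}


@dataclass
class PreprintVersion:
    version: int
    posted: date


@dataclass
class Preprint:
    doi: str
    server: str  # "biorxiv" or "medrxiv"
    versions: list = field(default_factory=list)
    published_doi: Optional[str] = None
    published_pmid: Optional[str] = None

    def add_version(self, version: int, posted: date) -> None:
        # Store every version once, ignoring duplicates re-fetched from the API.
        if all(v.version != version for v in self.versions):
            self.versions.append(PreprintVersion(version, posted))
            self.versions.sort(key=lambda v: v.version)

    @property
    def latest(self) -> PreprintVersion:
        return self.versions[-1]

    def link_publication(self, doi: str, pmid: Optional[str] = None) -> None:
        self.published_doi = doi
        self.published_pmid = pmid

    @property
    def source_type(self) -> str:
        # Assumption: a linked journal DOI marks the work as peer-reviewed.
        return "peer_reviewed" if self.published_doi else "preprint"

    def evidence_weight(self) -> float:
        return SOURCE_TYPE_WEIGHTS[self.source_type]
```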

See Preprint Integration.

MCP (Model Context Protocol)

Exposes KG tools to AI assistants via FastMCP:

  • 16 read-only tools (biomarkers, pathways, interactions, evidence, etc.)
  • Negligible-overhead wrapper pattern (<0.001 ms added per call compared with a direct call)
  • Automatic JSON schema generation from type annotations
  • Works with MCP clients including Claude, VS Code, Kiro and Cursor
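
FastMCP derives each tool's input schema from its type annotations; in the official MCP Python SDK, tools are registered with the `@mcp.tool()` decorator. The stdlib-only sketch below illustrates the general idea, not FastMCP's implementation: it builds a JSON schema from a function's signature and returns the original function object unchanged, which is one way to make the per-call cost identical to a direct call. `register_tool`, `TOOL_REGISTRY` and `get_biomarkers` are hypothetical.

```python
import inspect
from typing import Callable, get_type_hints

# Minimal mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", list: "array", dict: "object"}

TOOL_REGISTRY: dict = {}


def register_tool(func: Callable) -> Callable:
    """Record a JSON schema for `func` and return it unchanged.

    No wrapper layer is added, so calling the tool costs the same as
    calling the undecorated function.
    """
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    TOOL_REGISTRY[func.__name__] = {
        "description": inspect.getdoc(func) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }
    return func


@register_tool
def get_biomarkers(disease: str, limit: int = 10) -> list:
    """Return biomarkers associated with a disease (hypothetical read-only tool)."""
    return []  # a real tool would query the knowledge graph here
```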

See MCP Integration Research and KG Tools Guide.

Olink Integration

Olink-specific enrichment for proteomics panel data:

  • Panel definitions with protein targets
  • NPX (Normalised Protein eXpression) datasets
  • Assay validation data (CVs, specificity)
  • Cross-panel analysis

Processed via enrichment_main.py with Olink-specific column handlers.
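
Cross-panel analysis amounts to finding proteins measured on more than one panel and comparing their values. The sketch below assumes a long-format export with `UniProt`, `Assay`, `Panel` and `NPX` columns and uses a hypothetical handler table to normalise them; the handler names and structure are assumptions, not the column handlers in enrichment_main.py.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical column handlers: raw header -> (canonical field, type cast).
COLUMN_HANDLERS = {
    "UniProt": ("uniprot_id", str),
    "Assay": ("assay", str),
    "Panel": ("panel", str),
    "NPX": ("npx", float),
}


def normalise_row(raw: dict) -> dict:
    """Rename and cast known columns, dropping empty or "NA" values."""
    row = {}
    for column, (field_name, cast) in COLUMN_HANDLERS.items():
        if column in raw and raw[column] not in ("", "NA"):
            row[field_name] = cast(raw[column])
    return row


def cross_panel_proteins(raw_rows):
    """Return {uniprot_id: {panel: mean NPX}} for proteins assayed on 2+ panels.

    NPX is on a log2 scale, so this mean is taken on the log scale.
    """
    by_protein = defaultdict(lambda: defaultdict(list))
    for raw in raw_rows:
        row = normalise_row(raw)
        if {"uniprot_id", "panel", "npx"} <= row.keys():
            by_protein[row["uniprot_id"]][row["panel"]].append(row["npx"])
    return {
        uid: {panel: mean(values) for panel, values in panels.items()}
        for uid, panels in by_protein.items()
        if len(panels) >= 2
    }
```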

Future Integrations

Planned but not yet implemented:

| Integration | Priority | Description |
|---|---|---|
| Reactome | Medium | Biological pathway data |
| IntAct | Medium | Experimentally validated PPIs |
| KEGG | Medium | Metabolic/signalling pathways (subject to licensing) |
| ClinVar | Medium | Clinical variant significance |
| Human Protein Atlas | Medium | Tissue expression data |
| AlphaFold | Medium | 3D protein structure features |
| ENCODE/JASPAR | Low | TF binding data |
| X/Twitter | Low | Social media trend detection |

See Expansion Areas for the full roadmap.