Integrations¶
External Database Integrations¶
| Database | Content | Status | Module |
|---|---|---|---|
| UniProt | Protein function, structure, GO terms | Integrated (ID mapping) | pipeline/processors/uniprot_integrator.py |
| MONDO | Disease ontology hierarchy | Integrated (OBO ingestion) | pipeline/processors/obo_structure_processor.py |
| DisGeNET | Gene-disease associations with GDA scores | Partial | pipeline/processors/disgenet_integrator.py |
| Open Targets | Drug targets, genetic associations | Partial | pipeline/processors/opentargets_integrator.py |
| STRING | Protein-protein interactions | Via PPI framework | pipeline/ingest/ppi_integration_pipeline.py |
| FunCoup | Functional coupling predictions | Via PPI framework | pipeline/ingest/multisource_ppi_pipeline.py |
| BioGRID | Experimental PPIs | Via PPI framework | pipeline/ingest/multisource_ppi_pipeline.py |
PubPeer Integration¶
Post-publication peer review comments linked to Abstract nodes:
- Async API client with rate limiting and retry logic
- PubPeerComment nodes with HAS_PUBPEER_COMMENT relationships
- Privacy-aware handling of anonymous comments
- Requires an API key, which can be requested from contact@pubpeer.com
```python
# Must run inside an async function; `db` is an existing database handle.
integrator = PubPeerIntegrator(db=db, api_key="your_key")
summary = await integrator.enrich_abstracts_with_comments(pmids=["12345678"])
```
See PubPeer Integration.
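
The rate limiting and retry behaviour listed above can be illustrated with a minimal, standard-library-only sketch. It is not the PubPeerIntegrator implementation: `RateLimiter`, `call_with_retry`, and all parameter names and defaults are hypothetical, and the choice of `ConnectionError` as the retryable error is an assumption.

```python
import asyncio
import math
import random
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Space calls at least `min_interval` seconds apart (hypothetical helper)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = asyncio.Lock()
        self._last_call = -math.inf  # no call made yet

    async def wait(self) -> None:
        # The lock serialises callers so concurrent tasks queue up in turn.
        async with self._lock:
            delay = self._last_call + self.min_interval - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            self._last_call = time.monotonic()


async def call_with_retry(
    fetch: Callable[[], Awaitable[T]],
    limiter: RateLimiter,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Run `fetch` under the limiter, retrying transient failures with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        await limiter.wait()
        try:
            return await fetch()
        except ConnectionError:  # assumed to be the transient error type
            if attempt == max_attempts:
                raise
            # Exponential backoff (base, 2x base, 4x base, ...) plus jitter.
            backoff = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(backoff + random.uniform(0, base_delay))
            attempt += 1
```

Create the `RateLimiter` inside the running event loop (for example, inside the coroutine passed to `asyncio.run`) so that the lock is bound to the correct loop on older Python versions.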
Retraction Watch Integration¶
Flags retracted papers and affected relationships:
- RetractionNotice nodes with HAS_RETRACTION relationships
- Tracks retraction nature: Retraction, Expression of Concern, Correction
- Flags affected relationships with is_from_retracted_paper
- Integrates with evidence weighting (retracted = 0.0 weight)
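```python
# Must run inside an async function; `db` is an existing database handle.
integrator = RetractionWatchIntegrator(db=db, api_key="your_key")
summary = await integrator.enrich_abstracts_with_retractions()
# Mark relationships that were extracted from retracted papers.
flagged = integrator.flag_affected_relationships()
```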
See Retraction Watch Integration.
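
To make the flagging and weighting rule concrete, here is a small in-memory sketch. The `Relationship` shape, the function names, and the `notices` mapping are hypothetical. Only the `is_from_retracted_paper` flag and the 0.0 weight for retracted evidence come from the description above. Leaving Expressions of Concern and Corrections unflagged is an assumption of this sketch, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Relationship:
    """A KG edge extracted from one paper (hypothetical, simplified shape)."""
    source: str
    target: str
    pmid: str
    base_weight: float
    is_from_retracted_paper: bool = False


def flag_affected(
    relationships: Iterable[Relationship],
    notices: Dict[str, str],
) -> List[Relationship]:
    """Flag edges whose source paper has a notice of nature 'Retraction'.

    `notices` maps PMID -> retraction nature. Other natures are ignored
    here, which is an assumption of this sketch.
    """
    flagged = []
    for rel in relationships:
        if notices.get(rel.pmid) == "Retraction":
            rel.is_from_retracted_paper = True
            flagged.append(rel)
    return flagged


def evidence_weight(rel: Relationship) -> float:
    """Retracted evidence contributes 0.0; everything else keeps its weight."""
    return 0.0 if rel.is_from_retracted_paper else rel.base_weight
```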
bioRxiv / medRxiv Preprints¶
Full preprint lifecycle management:
- Unified API for both bioRxiv and medRxiv
- Version tracking (all versions stored with timestamps)
- Publication linking (automatic PMID/DOI linking when published)
- Source type weighting (preprints weighted lower in evidence scoring)
See Preprint Integration.
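
The lifecycle above (version tracking, publication linking, source type weighting) can be sketched with a small data model. All class names, field names, and the numeric weights are hypothetical: the source states only that preprints are weighted lower than published work.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical weights; the actual values are not given in this document.
SOURCE_TYPE_WEIGHTS = {"published": 1.0, "preprint": 0.5}


@dataclass
class PreprintVersion:
    version: int
    posted: date


@dataclass
class Preprint:
    doi: str
    server: str  # "biorxiv" or "medrxiv"
    versions: List[PreprintVersion] = field(default_factory=list)
    published_doi: Optional[str] = None

    def add_version(self, version: int, posted: date) -> None:
        """Store every version with its timestamp, ordered by version number."""
        self.versions.append(PreprintVersion(version, posted))
        self.versions.sort(key=lambda v: v.version)

    def latest(self) -> PreprintVersion:
        return self.versions[-1]

    def link_publication(self, published_doi: str) -> None:
        """Record the journal DOI once the preprint has been published."""
        self.published_doi = published_doi

    @property
    def source_type(self) -> str:
        return "published" if self.published_doi else "preprint"

    def evidence_weight(self) -> float:
        return SOURCE_TYPE_WEIGHTS[self.source_type]
```

In this sketch the weight changes as soon as a publication is linked, so evidence scoring does not need to know about the lifecycle itself.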
MCP (Model Context Protocol)¶
Exposes KG tools to AI assistants via FastMCP:
- 16 read-only tools (biomarkers, pathways, interactions, evidence, etc.)
- Near-zero-overhead wrapper pattern (<0.001 ms added per call compared with a direct call)
- Automatic JSON schema generation from type annotations
- Works with Claude, VS Code, Kiro, Cursor
See MCP Integration Research and KG Tools Guide.
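
The idea behind generating a JSON schema from type annotations can be shown with the standard library alone. This sketch does not reproduce FastMCP's own logic, which handles far more types. `tool_schema` and `get_protein_interactions` are hypothetical names, and the type mapping is deliberately minimal.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}


def tool_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties: Dict[str, Dict[str, Any]] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unrecognised types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            properties[name]["default"] = param.default
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_protein_interactions(
    protein_id: str, min_score: float = 0.7, limit: int = 20
) -> list:
    """Return interaction partners for a protein (hypothetical read-only tool)."""
    return []
```

Calling `tool_schema(get_protein_interactions)` yields a schema in which `protein_id` is the only required parameter, while `min_score` and `limit` carry their defaults.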
Olink Data¶
Olink-specific enrichment for proteomics panel data:
- Panel definitions with protein targets
- NPX (Normalised Protein eXpression) datasets
- Assay validation data (CVs, specificity)
- Cross-panel analysis
Processed via enrichment_main.py with Olink-specific column handlers.
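
A column-handler approach of the kind mentioned above might look like the following sketch. It does not reproduce the code in enrichment_main.py. The handler names and the registry are hypothetical, the column names (Assay, UniProt, Panel, NPX) are assumed to follow Olink's long-format NPX export, and the sample values in the usage note are made up.

```python
from typing import Any, Callable, Dict, Optional


def parse_text(value: str) -> str:
    """Trim surrounding whitespace from identifier-like columns."""
    return value.strip()


def parse_npx(value: str) -> Optional[float]:
    """NPX is a log2-scale value; empty or non-numeric cells become None."""
    try:
        return float(value)
    except ValueError:
        return None


# Hypothetical registry: one handler per recognised Olink column.
COLUMN_HANDLERS: Dict[str, Callable[[str], Any]] = {
    "Assay": parse_text,
    "UniProt": parse_text,
    "Panel": parse_text,
    "NPX": parse_npx,
}


def process_row(row: Dict[str, str]) -> Dict[str, Any]:
    """Apply the matching handler to each known column; skip unknown columns."""
    return {
        column: handler(row[column])
        for column, handler in COLUMN_HANDLERS.items()
        if column in row
    }
```

For example, `process_row({"Assay": " IL6 ", "UniProt": "P05231", "NPX": "5.12"})` returns the trimmed assay name, the UniProt accession, and the NPX value as a float. Supporting a new column then means registering one more handler.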
Future Integrations¶
Planned but not yet implemented:
| Integration | Priority | Description |
|---|---|---|
| Reactome | Medium | Biological pathway data |
| IntAct | Medium | Experimentally validated PPIs |
| KEGG | Medium | Metabolic/signalling pathways (subject to licensing) |
| ClinVar | Medium | Clinical variant significance |
| Human Protein Atlas | Medium | Tissue expression data |
| AlphaFold | Medium | 3D protein structure features |
| ENCODE/JASPAR | Low | TF binding data |
| X/Twitter | Low | Social media trend detection |
See Expansion Areas for the full roadmap.