
KG Extraction Model Comparison — Full Report

Date: 2026-05-22
Abstracts: 17 | Models: 8 | Runs per model: 3 | Total LLM calls: 408

Test Corpus

# | PMID | Title | Link
---|---|---|---
1 | 39579765 | Atlas of the plasma proteome in health and disease in 53,026 adults. | PubMed
2 | 34857953 | Large-scale integration of the plasma proteome with genetics and disea... | PubMed
3 | 38057571 | Organ aging signatures in the plasma proteome track health and disease... | PubMed
4 | 34233864 | MCP-1: Function, regulation, and involvement in disease. | PubMed
5 | 27590293 | Anti-LGI1 encephalitis: Clinical syndrome and long-term follow-up. | PubMed
6 | 27928579 | Protein-Protein Interface and Disease: Perspective from Biomolecular N... | PubMed
7 | 34735869 | Circulating C1q levels in health and disease, more than just a biomark... | PubMed
8 | 22568099 | [IgG4-related disease]. | PubMed
9 | 38538601 | S100A8/A9 as a prognostic biomarker with causal effects for post-acute... | PubMed
10 | 24497460 | Ensembles of protein termini and specific proteolytic signatures as ca... | PubMed
11 | 17157395 | Pregnancy-associated plasma protein-A: an emerging cardiac biomarker. | PubMed
12 | 31037637 | Clinical Applications Targeting Periostin. | PubMed
13 | 33532887 | STING-Mediated Lung Inflammation and Beyond. | PubMed
14 | 31371977 | Cholesterol-embolization syndrome: current perspectives. | PubMed
15 | 38125621 | YKL-40 as a biomarker in various inflammatory diseases: A review. | PubMed
16 | 24631021 | UPLC-MS(E) application in disease biomarker discovery: the discoveries... | PubMed
17 | 32781555 | Protein-Related Circular RNAs in Human Pathologies. | PubMed

Model Performance Summary

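Model | Size | Avg Entities | Avg Rels | Consistency | Unique Entities | Unique Rels | Avg Latency (ms) | Total Tokens
---|---|---|---|---|---|---|---|---
Llama 3.3 70B | 70B | 4.7 | 4.1 | 91.0% | 75 | 81 | 2566 | 45,910
Claude Sonnet 4.6 | ~70B | 4.2 | 3.6 | 90.6% | 73 | 61 | 5425 | 49,740
Llama 3.1 8B | 8B | 7.9 | 6.5 | 88.6% | 119 | 161 | 1989 | 59,788
Ministral 14B | 14B | 9.1 | 8.3 | 78.1% | 172 | 217 | 5214 | 72,403
Mistral Large 3 | 675B | 7.7 | 6.4 | 77.8% | 137 | 162 | 4920 | 60,222
Amazon Nova Pro | ~30B | 6.4 | 5.3 | 66.7% | 135 | 138 | 3034 | 51,691
Amazon Nova Lite | ~8B | 10.8 | 7.9 | 66.1% | 222 | 234 | 12459 | 68,390
Amazon Nova Micro | ~3B | 7.3 | 5.3 | 61.8% | 153 | 184 | 1737 | 54,372

Rows are sorted by Consistency. Sizes prefixed with ~ are estimates; Anthropic and Amazon do not publish parameter counts for these models.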
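The report does not define Consistency. One plausible reading, assumed here and not confirmed by the source, is the mean pairwise Jaccard similarity between a model's three runs on the same abstract, averaged over all abstracts. A minimal sketch under that assumption (`run_consistency` is a hypothetical helper):

```python
from itertools import combinations
from statistics import mean

def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def run_consistency(runs_per_abstract: list[list[set[str]]]) -> float:
    """Assumed Consistency metric: average Jaccard over every pair of runs
    on one abstract, then average those per-abstract scores."""
    per_abstract = [
        mean(jaccard(a, b) for a, b in combinations(runs, 2))
        for runs in runs_per_abstract
    ]
    return mean(per_abstract)

# Toy data for one model: two abstracts, three runs each (not from the report).
runs = [
    [{"mcp-1", "ccr2"}, {"mcp-1", "ccr2"}, {"mcp-1"}],
    [{"lgi1"}, {"lgi1"}, {"lgi1"}],
]
print(f"{run_consistency(runs):.1%}")  # 83.3%
```

Under this definition a model that returns identical sets on every run scores 100%, regardless of how many entities it extracts, which is why consistency alone says nothing about recall.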

Entity Overlap (Jaccard Similarity)

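Each cell is the Jaccard similarity |A ∩ B| / |A ∪ B| between two models' entity sets. Higher = more agreement between models on what to extract. The matrix is symmetric; self-comparisons are omitted.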

Model | Nova Lite | Nova Micro | Nova Pro | Sonnet 4.6 | Llama 3.1 8B | Llama 3.3 70B | Ministral 14B | Mistral Large 3
---|---|---|---|---|---|---|---|---
Nova Lite | - | 38% | 41% | 22% | 30% | 22% | 42% | 33%
Nova Micro | 38% | - | 35% | 30% | 35% | 27% | 31% | 41%
Nova Pro | 41% | 35% | - | 36% | 37% | 30% | 48% | 48%
Sonnet 4.6 | 22% | 30% | 36% | - | 37% | 47% | 32% | 41%
Llama 3.1 8B | 30% | 35% | 37% | 37% | - | 38% | 28% | 37%
Llama 3.3 70B | 22% | 27% | 30% | 47% | 38% | - | 27% | 32%
Ministral 14B | 42% | 31% | 48% | 32% | 28% | 27% | - | 44%
Mistral Large 3 | 33% | 41% | 48% | 41% | 37% | 32% | 44% | -
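The matrix can be reproduced from per-model entity sets with pairwise Jaccard similarity. This sketch assumes each model's set pools all of its extracted entity strings across abstracts and runs; the report does not state how the sets were pooled, and `overlap_matrix` is a hypothetical helper:

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def overlap_matrix(entities_by_model: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of models.
    The matrix is symmetric, so each pair is computed once and mirrored."""
    matrix: dict[tuple[str, str], float] = {}
    for m1, m2 in combinations(sorted(entities_by_model), 2):
        score = jaccard(entities_by_model[m1], entities_by_model[m2])
        matrix[(m1, m2)] = matrix[(m2, m1)] = score
    return matrix

# Toy entity sets (not the report's data).
models = {
    "model_a": {"il-6", "crp", "covid-19", "ip-10"},
    "model_b": {"il-6", "crp", "s100a8/a9"},
    "model_c": {"il-6"},
}
m = overlap_matrix(models)
print(f"{m[('model_a', 'model_b')]:.0%}")  # 40%
```

Jaccard penalizes size mismatch: a small set fully contained in a large one still scores low, which is one reason the conservative extractors (Claude Sonnet 4.6, Llama 3.3 70B) show their lowest overlap with the most prolific model, Nova Lite.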

Core Entities (found by ALL models)

29 of 379 unique entities (7.7%) were extracted by all eight models.

alzheimer's disease, anti-lgi1 encephalitis, c-reactive protein, c1q, canakinumab, cancer, cancers, cardiovascular diseases, cholesterol-embolization syndrome, colchicine, corticosteroids, covid-19, crp, cyclophosphamide, dermatomyositis, heart failure, igg4-related disease, il-6, il1, ip-10, lgi1, mcp-1, nlrp3, periostin, rheumatoid arthritis, s100a8/a9, sting1, systemic lupus erythematosus, ykl-40
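The core set is the intersection of every model's entity set, and the agreement figure is its size divided by the size of the union. A minimal sketch, assuming exact string matching on lowercased entities (`core_entities` is a hypothetical helper and the toy sets are not the report's data):

```python
def core_entities(entities_by_model: dict[str, set[str]]) -> tuple[set[str], float]:
    """Return the entities every model extracted and their share of all
    unique entities. Assumes at least one model with a non-empty set."""
    sets = list(entities_by_model.values())
    core = set.intersection(*sets)
    everything = set.union(*sets)
    return core, len(core) / len(everything)

# Toy sets (not the report's data).
toy = {
    "model_a": {"lgi1", "c1q", "cancer"},
    "model_b": {"lgi1", "c1q"},
    "model_c": {"lgi1", "c1q", "cancers"},
}
core, share = core_entities(toy)
print(sorted(core), f"{share:.0%}")  # ['c1q', 'lgi1'] 50%

# The report's headline figure: 29 core entities out of 379 unique.
print(f"{29 / 379:.1%}")  # 7.7%
```

Because matching is exact, "cancer" and "cancers" in the toy data fall outside the core even though they name the same concept; the real core list above contains both forms for the same reason.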

Entity Type Distribution

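Model | Anatomical structure | Association | Biomarker | Cell | CellType | Condition | Total
---|---|---|---|---|---|---|---
Amazon Nova Lite | 0 | 2 | 1 | 8 | 0 | 0 | 549
Amazon Nova Micro | 0 | 2 | 0 | 6 | 0 | 2 | 370
Amazon Nova Pro | 0 | 6 | 3 | 10 | 4 | 3 | 327
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0 | 212
Llama 3.1 8B | 3 | 0 | 0 | 0 | 0 | 9 | 404
Llama 3.3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 238
Ministral 14B | 0 | 4 | 0 | 12 | 0 | 0 | 465
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0 | 393

Total counts every entity a model extracted over its 51 extractions (17 abstracts × 3 runs) and equals Avg Entities × 51. Only the type columns listed here are shown, so a row's type counts do not sum to its Total.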
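The Total column and the Avg Entities column of the performance summary are two views of the same count: each model ran 51 extractions (17 abstracts × 3 runs), and 8 models × 51 gives the 408 LLM calls in the header. This check uses only figures printed in this report:

```python
ABSTRACTS, MODELS, RUNS_PER_MODEL = 17, 8, 3

# "Total" column of the Entity Type Distribution table.
totals = {
    "Amazon Nova Lite": 549, "Amazon Nova Micro": 370, "Amazon Nova Pro": 327,
    "Claude Sonnet 4.6": 212, "Llama 3.1 8B": 404, "Llama 3.3 70B": 238,
    "Ministral 14B": 465, "Mistral Large 3": 393,
}
# "Avg Entities" column of the Model Performance Summary table.
reported_avg = {
    "Amazon Nova Lite": 10.8, "Amazon Nova Micro": 7.3, "Amazon Nova Pro": 6.4,
    "Claude Sonnet 4.6": 4.2, "Llama 3.1 8B": 7.9, "Llama 3.3 70B": 4.7,
    "Ministral 14B": 9.1, "Mistral Large 3": 7.7,
}

extractions_per_model = ABSTRACTS * RUNS_PER_MODEL  # 51
total_calls = extractions_per_model * MODELS         # matches the header's 408

# Models whose Total / 51, rounded to one decimal, disagrees with Avg Entities.
mismatches = [
    model for model, total in totals.items()
    if round(total / extractions_per_model, 1) != reported_avg[model]
]
print(total_calls, mismatches)  # 408 []
```

Every row agrees, so Avg Entities is a per-extraction average rather than a per-abstract average pooled over runs.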

Model-Exclusive Entities

Entities extracted by exactly one model:

Model | Exclusive Count | Examples
---|---|---
Amazon Nova Lite | 51 | 11 major organs; 5,676 adults; adaptive immunity; age-related diseases; anti-inflammatory agents
Amazon Nova Micro | 27 | adaptive immune response; adults; atlas of the plasma proteome in health and disease; brain aging; cc chemokines
Amazon Nova Pro | 8 | 1,000 proteins with sex and age heterogeneity; 183 diseases with auc > 0.80; 26 promising targets with favorable safety profiles; 53,026 individuals; 650 proteins shared among at least 50 diseases
Claude Sonnet 4.6 | 3 | circrna; extracellular matrix; type i interferon
Llama 3.1 8B | 19 | cardiovascular disease; castleman's disease; cognition; development; endocrinological disease
Llama 3.3 70B | 11 | ccs; endocrinological; gastrointestinal; gwas; immunological
Ministral 14B | 21 | 45,334 lead associations in the gwas catalog; 938 genes encoding potential drug targets; chitinase-3-like protein 1 (chi3l1); circrna-derived peptides; diseased tissues
Mistral Large 3 | 9 | cardiac biomarker; chitinase protein family 18; chitinase protein family 18, subfamily a; disease state; incident diseases
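An entity is exclusive to a model when no other model's set contains it. Counting, for each entity, how many models extracted it finds all exclusives in one pass. A minimal sketch, assuming one set of entity strings per model (`exclusive_entities` is a hypothetical helper; the toy sets are not the report's data):

```python
from collections import Counter

def exclusive_entities(entities_by_model: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each model to the entities that no other model extracted."""
    # Each model contributes a set, so a count of 1 means exactly one model has it.
    model_count = Counter(e for ents in entities_by_model.values() for e in ents)
    return {
        model: {e for e in ents if model_count[e] == 1}
        for model, ents in entities_by_model.items()
    }

toy = {
    "model_a": {"circrna", "c1q"},
    "model_b": {"c1q", "gwas"},
    "model_c": {"c1q"},
}
print(exclusive_entities(toy))
# {'model_a': {'circrna'}, 'model_b': {'gwas'}, 'model_c': set()}
```

Because matching is exact string equality, a plural or abbreviation variant counts as exclusive: Llama 3.1 8B's "cardiovascular disease" is listed here even though every model extracted "cardiovascular diseases".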

Top 20 Entities (across all models)

Entity | Mentions | Models
---|---|---
IL-6 | 62 | 8/8
Alzheimer's Disease | 53 | 8/8
rheumatoid arthritis | 42 | 8/8
cancer | 41 | 8/8
cancers | 39 | 8/8
cardiovascular diseases | 38 | 8/8
BRCA1 | 33 | 3/8
diseases | 30 | 6/8
heart failure | 27 | 8/8
inflammatory diseases | 27 | 6/8
plasma proteome | 25 | 6/8
Cholesterol-embolization syndrome | 25 | 8/8
MCP-1 | 24 | 8/8
COVID-19 | 24 | 8/8
IP-10 | 24 | 8/8
LGI1 | 24 | 8/8
C1q | 24 | 8/8
IgG4-related disease | 24 | 8/8
S100A8/A9 | 24 | 8/8
NLRP3 | 24 | 8/8
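Mentions and Models are separate aggregations, which is why BRCA1 (3/8 models) can outrank entities that all eight models found. A minimal sketch of both, assuming the raw data is a list of (model, entity) pairs with one pair per extracted occurrence; `top_entities` and the toy data are hypothetical:

```python
from collections import Counter

def top_entities(mentions: list[tuple[str, str]], n_models: int, k: int = 20):
    """Return (entity, mention_count, 'models/n_models') for the k most-mentioned entities."""
    counts = Counter(entity for _, entity in mentions)
    models_per_entity: dict[str, set[str]] = {}
    for model, entity in mentions:
        models_per_entity.setdefault(entity, set()).add(model)
    return [
        (entity, count, f"{len(models_per_entity[entity])}/{n_models}")
        for entity, count in counts.most_common(k)
    ]

# Toy (model, entity) pairs, not the report's data.
mentions = [
    ("a", "IL-6"), ("a", "IL-6"), ("b", "IL-6"),
    ("a", "BRCA1"), ("b", "cancer"), ("c", "cancer"),
]
print(top_entities(mentions, n_models=3, k=2))
# [('IL-6', 3, '2/3'), ('cancer', 2, '2/3')]
```

Ranking by Mentions rewards entities that recur across abstracts and runs; ranking by Models instead would surface cross-model agreement.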

Key Findings

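  1. Llama 3.3 70B and Claude Sonnet 4.6 are the most consistent (91.0% and 90.6%) but extract the fewest entities (4.7 and 4.2 per abstract), a conservative style that likely favors precision over recall; precision itself is not measured in this report.
  2. Llama 3.1 8B offers the best cost/quality tradeoff: 88.6% consistency, roughly two-thirds more entities per abstract than Llama 3.3 70B (7.9 vs 4.7), and about 2 s average latency (1989 ms).
  3. Ministral 14B extracts the most relations (8.3 per abstract) and the second-most entities (9.1, behind Amazon Nova Lite's 10.8) with moderate consistency (78.1%), which makes it a candidate for recall-oriented tasks.
  4. Amazon Nova models are the least consistent (61.8% to 66.7%) across their range of sizes, and consistency does not track scale elsewhere either: Mistral Large 3 (675B) scores 77.8% versus 88.6% for Llama 3.1 8B. Model family appears to matter more than parameter count.
  5. Only 7.7% of unique entities (29 of 379) were extracted by all eight models, so model choice substantially changes KG content. Part of this gap is surface-form variation rather than real disagreement: the core list contains both "cancer" and "cancers" and both "c-reactive protein" and "crp", and Llama 3.1 8B's exclusive "cardiovascular disease" differs from the core "cardiovascular diseases" only by number.
  6. Highest pairwise agreement: Nova Pro / Ministral 14B (48.3%) and Claude Sonnet 4.6 / Llama 3.3 70B (46.5%). Nova Pro / Mistral Large 3 also rounds to 48% in the matrix.
  7. Most models produced zero garbage extractions, which suggests the structured prompt with anti-examples works well.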
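Normalizing entity strings before computing overlap would separate real disagreement from surface-form variation such as "cancer"/"cancers" or "crp"/"c-reactive protein" in the core list. The sketch below is a hypothetical normalizer, not part of the evaluated pipeline: the synonym table and the plural rule are illustrative assumptions, and a production system would more likely map entities to ontology identifiers.

```python
import re

# Hypothetical synonym table; a real pipeline would need a curated mapping.
SYNONYMS = {"crp": "c-reactive protein"}

def normalize(entity: str) -> str:
    """Normalize an entity string before computing overlap (illustrative rules only)."""
    e = re.sub(r"\s+", " ", entity.strip().lower())  # lowercase, collapse whitespace
    e = SYNONYMS.get(e, e)                            # map known abbreviations
    # Naive plural stripping: 'cancers' -> 'cancer'. Endings 'is' (arthritis),
    # 'us' (erythematosus) and 'ss' are left alone; the length guard protects
    # short acronyms such as 'gwas' and 'ccs'.
    if len(e) > 4 and e.endswith("s") and not e.endswith(("ss", "is", "us")):
        e = e[:-1]
    return e

raw = ["Cancer", "cancers", "CRP", "C-reactive protein",
       "Cardiovascular  Diseases", "dermatomyositis"]
print(sorted({normalize(e) for e in raw}))
# ['c-reactive protein', 'cancer', 'cardiovascular disease', 'dermatomyositis']
```

Six raw strings collapse to four concepts. Rule-based plural stripping will still mangle some terms, so any overlap figures recomputed this way should be reported alongside the exact-match numbers.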

Recommendations

Use Case | Recommended Model | Reason
---|---|---
Production ingestion (millions of papers) | Llama 3.1 8B | Best speed/cost/quality balance
High-precision extraction | Llama 3.3 70B | Highest consistency (91.0%), zero garbage
Maximum recall | Ministral 14B | Most relations per abstract (8.3) and second-most entities (9.1), at higher consistency than Nova Lite
Ensemble (golden dataset) | Llama 3.1 8B + Claude Sonnet 4.6 | Complementary strengths
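One way to use a two-model ensemble for building a golden dataset, assumed here rather than taken from the report, is to auto-accept entities both models extract for an abstract and route the disagreements to human review. `merge_for_review` is a hypothetical helper and the entity sets are toy data, not real model output:

```python
def merge_for_review(model_a: set[str], model_b: set[str]) -> dict[str, list[str]]:
    """Combine two models' entities for one abstract.
    'agreed' entities were found by both models; 'review' holds those found by only one."""
    return {
        "agreed": sorted(model_a & model_b),  # intersection: both models agree
        "review": sorted(model_a ^ model_b),  # symmetric difference: needs a human
    }

set_a = {"mcp-1", "ccr2", "monocytes"}
set_b = {"mcp-1", "ccr2"}
print(merge_for_review(set_a, set_b))
# {'agreed': ['ccr2', 'mcp-1'], 'review': ['monocytes']}
```

Pairing a conservative, high-consistency model with a higher-recall one keeps the agreed set small and reliable while the review queue captures what the conservative model missed.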

Overlap Visualization

Figure: Entity Overlap Network. Edge thickness = pairwise Jaccard similarity; node size = unique entity count.