Date: 2026-05-22
Abstracts: 17 | Models: 8 | Runs per model: 3 | Total LLM calls: 408
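The total follows from the benchmark grid: each abstract is sent to each model three times. A minimal sketch of enumerating that grid, assuming placeholder identifiers (the names below are hypothetical, not the harness's actual PMIDs or model IDs):

```python
from itertools import product

N_ABSTRACTS, N_MODELS, RUNS_PER_MODEL = 17, 8, 3

# Placeholder identifiers (hypothetical); a real harness would use PMIDs and model IDs.
abstracts = [f"abstract_{i}" for i in range(1, N_ABSTRACTS + 1)]
models = [f"model_{j}" for j in range(1, N_MODELS + 1)]
runs = range(1, RUNS_PER_MODEL + 1)

# One LLM call per (abstract, model, run) combination.
calls = list(product(abstracts, models, runs))
print(len(calls))  # 408
```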
Test Corpus
# | PMID | Title | Link
1 | 39579765 | Atlas of the plasma proteome in health and disease in 53,026 adults. | PubMed
2 | 34857953 | Large-scale integration of the plasma proteome with genetics and disea... | PubMed
3 | 38057571 | Organ aging signatures in the plasma proteome track health and disease... | PubMed
4 | 34233864 | MCP-1: Function, regulation, and involvement in disease. | PubMed
5 | 27590293 | Anti-LGI1 encephalitis: Clinical syndrome and long-term follow-up. | PubMed
6 | 27928579 | Protein-Protein Interface and Disease: Perspective from Biomolecular N... | PubMed
7 | 34735869 | Circulating C1q levels in health and disease, more than just a biomark... | PubMed
8 | 22568099 | [IgG4-related disease]. | PubMed
9 | 38538601 | S100A8/A9 as a prognostic biomarker with causal effects for post-acute... | PubMed
10 | 24497460 | Ensembles of protein termini and specific proteolytic signatures as ca... | PubMed
11 | 17157395 | Pregnancy-associated plasma protein-A: an emerging cardiac biomarker. | PubMed
12 | 31037637 | Clinical Applications Targeting Periostin. | PubMed
13 | 33532887 | STING-Mediated Lung Inflammation and Beyond. | PubMed
14 | 31371977 | Cholesterol-embolization syndrome: current perspectives. | PubMed
15 | 38125621 | YKL-40 as a biomarker in various inflammatory diseases: A review. | PubMed
16 | 24631021 | UPLC-MS(E) application in disease biomarker discovery: the discoveries... | PubMed
17 | 32781555 | Protein-Related Circular RNAs in Human Pathologies. | PubMed
Model | Size | Avg Entities | Avg Rels | Consistency | Unique Entities | Unique Rels | Avg Latency (ms) | Total Tokens
Llama 3.3 70B | 70B | 4.7 | 4.1 | 91.0% | 75 | 81 | 2566 | 45,910
Claude Sonnet 4.6 | ~70B | 4.2 | 3.6 | 90.6% | 73 | 61 | 5425 | 49,740
Llama 3.1 8B | 8B | 7.9 | 6.5 | 88.6% | 119 | 161 | 1989 | 59,788
Ministral 14B | 14B | 9.1 | 8.3 | 78.1% | 172 | 217 | 5214 | 72,403
Mistral Large 3 | 675B | 7.7 | 6.4 | 77.8% | 137 | 162 | 4920 | 60,222
Amazon Nova Pro | ~30B | 6.4 | 5.3 | 66.7% | 135 | 138 | 3034 | 51,691
Amazon Nova Lite | ~8B | 10.8 | 7.9 | 66.1% | 222 | 234 | 12459 | 68,390
Amazon Nova Micro | ~3B | 7.3 | 5.3 | 61.8% | 153 | 184 | 1737 | 54,372
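The report does not state how Consistency is computed. One plausible reading, sketched here purely as an assumption, is the share of entities that appear in every one of a model's repeated runs on an abstract, out of all entities that appear in any run. The function name and the sample data are hypothetical.

```python
def run_consistency(runs: list[set[str]]) -> float:
    """Share of entities found in every run, out of entities found in any run.

    Assumed definition for illustration; the report's actual metric may differ.
    """
    if not runs:
        raise ValueError("need at least one run")
    in_all = set.intersection(*runs)  # entities extracted in every run
    in_any = set.union(*runs)         # entities extracted in at least one run
    return len(in_all) / len(in_any) if in_any else 1.0


# Made-up example: three runs of one model on one abstract.
runs = [
    {"il-6", "crp", "covid-19"},
    {"il-6", "crp", "covid-19"},
    {"il-6", "crp"},
]
score = run_consistency(runs)
print(f"{score:.1%}")  # 66.7%
```

Averaging this score over all 17 abstracts would give one per-model figure comparable in spirit to the Consistency column.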
Entity Overlap (Jaccard Similarity)
Higher = more agreement between models on what to extract.
Model | Nova Lite | Nova Micro | Nova Pro | C.Sonnet 4.6 | Llama 3.1 8B | Llama 3.3 70B | Ministral 14B | Mis.Large 3
Nova Lite | - | 38% | 41% | 22% | 30% | 22% | 42% | 33%
Nova Micro | 38% | - | 35% | 30% | 35% | 27% | 31% | 41%
Nova Pro | 41% | 35% | - | 36% | 37% | 30% | 48% | 48%
C.Sonnet 4.6 | 22% | 30% | 36% | - | 37% | 47% | 32% | 41%
Llama 3.1 8B | 30% | 35% | 37% | 37% | - | 38% | 28% | 37%
Llama 3.3 70B | 22% | 27% | 30% | 47% | 38% | - | 27% | 32%
Ministral 14B | 42% | 31% | 48% | 32% | 28% | 27% | - | 44%
Mis.Large 3 | 33% | 41% | 48% | 41% | 37% | 32% | 44% | -
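Jaccard similarity between two entity sets A and B is |A ∩ B| / |A ∪ B|. The sketch below shows how a pairwise matrix like the one above could be computed from per-model entity sets. It assumes entities are compared case-insensitively after trimming whitespace; the model names and entities are made up.

```python
from itertools import combinations


def normalize(entity: str) -> str:
    # Assumption: entities are matched case-insensitively after trimming.
    return entity.strip().lower()


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(entities_by_model: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard score for every unordered pair of models."""
    normalized = {
        model: {normalize(e) for e in entities}
        for model, entities in entities_by_model.items()
    }
    return {
        (m1, m2): jaccard(normalized[m1], normalized[m2])
        for m1, m2 in combinations(sorted(normalized), 2)
    }


# Made-up entity sets for three hypothetical models.
toy = {
    "model_a": {"IL-6", "CRP", "COVID-19"},
    "model_b": {"il-6", "crp", "heart failure"},
    "model_c": {"IL-6"},
}
scores = pairwise_jaccard(toy)
print(scores[("model_a", "model_b")])  # 0.5
```

Because the measure is symmetric, only the upper triangle needs to be computed; the matrix above simply mirrors it.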
Core Entities (found by ALL models)
29 entities out of 379 total unique (7.7% agreement)
alzheimer's disease, anti-lgi1 encephalitis, c-reactive protein, c1q, canakinumab, cancer, cancers, cardiovascular diseases, cholesterol-embolization syndrome, colchicine, corticosteroids, covid-19, crp, cyclophosphamide, dermatomyositis, heart failure, igg4-related disease, il-6, il1, ip-10, lgi1, mcp-1, nlrp3, periostin, rheumatoid arthritis, s100a8/a9, sting1, systemic lupus erythematosus, ykl-40
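The core set is the intersection of every model's entity set, and the agreement figure is its size divided by the size of the union. A minimal sketch with made-up data (the function and model names are hypothetical); the last lines check the report's own figure of 29 out of 379.

```python
def core_entities(entities_by_model: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (entities found by all models, entities found by any model)."""
    sets = list(entities_by_model.values())
    if not sets:
        return set(), set()
    return set.intersection(*sets), set.union(*sets)


# Made-up entity sets for three hypothetical models.
toy = {
    "model_a": {"il-6", "crp", "covid-19"},
    "model_b": {"il-6", "crp", "heart failure"},
    "model_c": {"il-6", "periostin"},
}
core, all_unique = core_entities(toy)
toy_agreement = len(core) / len(all_unique)  # 1 core entity out of 5 unique

# The report's figure: 29 core entities out of 379 unique entities.
report_agreement = 29 / 379
print(f"{report_agreement:.1%}")  # 7.7%
```

Note that the core list keeps surface variants such as "cancer" and "cancers" or "crp" and "c-reactive protein" as separate entries, so merging synonyms before intersecting would change both counts.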
Entity Type Distribution
Model | Anatomical structure | Association | Biomarker | Cell | CellType | Condition | Total
Amazon Nova Lite | 0 | 2 | 1 | 8 | 0 | 0 | 549
Amazon Nova Micro | 0 | 2 | 0 | 6 | 0 | 2 | 370
Amazon Nova Pro | 0 | 6 | 3 | 10 | 4 | 3 | 327
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0 | 212
Llama 3.1 8B | 3 | 0 | 0 | 0 | 0 | 9 | 404
Llama 3.3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 238
Ministral 14B | 0 | 4 | 0 | 12 | 0 | 0 | 465
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0 | 393
Model-Exclusive Entities
Entities found by only one model (not extracted by any other):
Model | Exclusive Count | Examples
Amazon Nova Lite | 51 | 11 major organs; 5,676 adults; adaptive immunity; age-related diseases; anti-inflammatory agents
Amazon Nova Micro | 27 | adaptive immune response; adults; atlas of the plasma proteome in health and disease; brain aging; cc chemokines
Amazon Nova Pro | 8 | 1,000 proteins with sex and age heterogeneity; 183 diseases with auc > 0.80; 26 promising targets with favorable safety profiles; 53,026 individuals; 650 proteins shared among at least 50 diseases
Claude Sonnet 4.6 | 3 | circrna; extracellular matrix; type i interferon
Llama 3.1 8B | 19 | cardiovascular disease; castleman's disease; cognition; development; endocrinological disease
Llama 3.3 70B | 11 | ccs; endocrinological; gastrointestinal; gwas; immunological
Ministral 14B | 21 | 45,334 lead associations in the gwas catalog; 938 genes encoding potential drug targets; chitinase-3-like protein 1 (chi3l1); circrna-derived peptides; diseased tissues
Mistral Large 3 | 9 | cardiac biomarker; chitinase protein family 18; chitinase protein family 18, subfamily a; disease state; incident diseases
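An entity is exclusive to a model when it is in that model's set and in no other model's set, which is a set difference against the union of the others. A minimal sketch with made-up data (function and model names are hypothetical):

```python
def exclusive_entities(entities_by_model: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each model, the entities that no other model extracted."""
    exclusive = {}
    for model, entities in entities_by_model.items():
        # Union of every other model's entities (empty if there are none).
        others = set().union(
            *(ents for name, ents in entities_by_model.items() if name != model)
        )
        exclusive[model] = entities - others
    return exclusive


# Made-up entity sets for three hypothetical models.
toy = {
    "model_a": {"il-6", "crp", "adults"},
    "model_b": {"il-6", "crp"},
    "model_c": {"il-6", "type i interferon"},
}
result = exclusive_entities(toy)
print({model: sorted(ents) for model, ents in result.items()})
```

A high exclusive count can mean either useful extra recall or noise (for example, cohort sizes extracted as entities), so the examples column is worth reading alongside the count.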
Top 20 Entities (across all models)
Entity | Mentions | Models
IL-6 | 62 | 8/8
Alzheimer's Disease | 53 | 8/8
rheumatoid arthritis | 42 | 8/8
cancer | 41 | 8/8
cancers | 39 | 8/8
cardiovascular diseases | 38 | 8/8
BRCA1 | 33 | 3/8
diseases | 30 | 6/8
heart failure | 27 | 8/8
inflammatory diseases | 27 | 6/8
plasma proteome | 25 | 6/8
Cholesterol-embolization syndrome | 25 | 8/8
MCP-1 | 24 | 8/8
COVID-19 | 24 | 8/8
IP-10 | 24 | 8/8
LGI1 | 24 | 8/8
C1q | 24 | 8/8
IgG4-related disease | 24 | 8/8
S100A8/A9 | 24 | 8/8
NLRP3 | 24 | 8/8
Key Findings
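Llama 3.3 70B and Claude Sonnet 4.6 are the most consistent (91.0% and 90.6%) but extract the fewest entities (4.7 and 4.2 per abstract on average), a pattern consistent with conservative, precision-oriented extraction
Llama 3.1 8B offers the best cost/quality tradeoff: 88.6% consistency with roughly two-thirds more entities per abstract than Llama 3.3 70B (7.9 vs 4.7), at about 2 s average latency
Ministral 14B extracts the most relationships (8.3 avg) and the second-most entities (9.1 avg, behind Nova Lite's 10.8) with decent consistency (78%), which makes it a good fit for recall-oriented tasks
Amazon Nova models are the least consistent (62-67%) despite varying sizes, which suggests that model family matters more than parameter count in this benchmark
Only 7.7% of entities (29 of 379) are agreed upon by ALL models, so model choice significantly affects knowledge graph (KG) content
Highest pairwise agreement: Nova Pro ∩ Ministral 14B (48.3%), with Nova Pro ∩ Mistral Large 3 also rounding to 48% in the matrix, followed by Claude ∩ Llama 3.3 (46.5%)
Most models produced zero garbage entities, which suggests the structured prompt with anti-examples works well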
Recommendations
Use Case | Recommended Model | Reason
Production ingestion (millions of papers) | Llama 3.1 8B | Best speed/cost/quality balance
High-precision extraction | Llama 3.3 70B | Highest consistency, zero garbage
Maximum recall | Ministral 14B | Most relationships and second-most entities extracted, with higher consistency than Nova Lite
Ensemble (golden dataset) | Llama 3.1 8B + Claude Sonnet 4.6 | Complementary strengths
Overlap Visualization
Edge thickness = Jaccard similarity. Node size = unique entity count.
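A minimal sketch of how the graph's visual attributes could be derived from the tables in this report: edge width scales with Jaccard similarity and node size with unique entity count. Only a few pairs from the matrix are included, and the scaling constants and function name are hypothetical choices, not the settings of the original figure.

```python
# A few pairwise Jaccard scores taken from the overlap matrix in this report.
jaccard_scores = {
    ("Nova Pro", "Ministral 14B"): 0.48,
    ("C.Sonnet 4.6", "Llama 3.3 70B"): 0.47,
    ("Nova Lite", "C.Sonnet 4.6"): 0.22,
}

# Unique entity counts taken from the summary table in this report.
unique_entities = {
    "Nova Lite": 222,
    "Nova Pro": 135,
    "Ministral 14B": 172,
    "C.Sonnet 4.6": 73,
    "Llama 3.3 70B": 75,
}

MAX_EDGE_WIDTH = 8.0    # hypothetical: width of the strongest edge
MAX_NODE_SIZE = 1000.0  # hypothetical: size of the largest node


def scale(value: float, max_value: float, max_out: float) -> float:
    """Linearly map value in [0, max_value] to [0, max_out]."""
    return value / max_value * max_out


strongest = max(jaccard_scores.values())
largest = max(unique_entities.values())

# (source, target, width) triples and a node -> size mapping, ready for any plotting library.
edges = [(a, b, scale(s, strongest, MAX_EDGE_WIDTH)) for (a, b), s in jaccard_scores.items()]
node_sizes = {m: scale(c, largest, MAX_NODE_SIZE) for m, c in unique_entities.items()}

for a, b, width in edges:
    print(f"{a} -- {b}: width {width:.1f}")
```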