KG Extraction Model Comparison: Full Report

Date: 2026-05-22
Abstracts: 17 | Models: 8 | Runs per model: 3 | Total LLM calls: 408

Test Corpus

#	PMID	Title	Link
1	39579765	Atlas of the plasma proteome in health and disease in 53,026 adults.	PubMed
2	34857953	Large-scale integration of the plasma proteome with genetics and disea...	PubMed
3	38057571	Organ aging signatures in the plasma proteome track health and disease...	PubMed
4	34233864	MCP-1: Function, regulation, and involvement in disease.	PubMed
5	27590293	Anti-LGI1 encephalitis: Clinical syndrome and long-term follow-up.	PubMed
6	27928579	Protein-Protein Interface and Disease: Perspective from Biomolecular N...	PubMed
7	34735869	Circulating C1q levels in health and disease, more than just a biomark...	PubMed
8	22568099	[IgG4-related disease].	PubMed
9	38538601	S100A8/A9 as a prognostic biomarker with causal effects for post-acute...	PubMed
10	24497460	Ensembles of protein termini and specific proteolytic signatures as ca...	PubMed
11	17157395	Pregnancy-associated plasma protein-A: an emerging cardiac biomarker.	PubMed
12	31037637	Clinical Applications Targeting Periostin.	PubMed
13	33532887	STING-Mediated Lung Inflammation and Beyond.	PubMed
14	31371977	Cholesterol-embolization syndrome: current perspectives.	PubMed
15	38125621	YKL-40 as a biomarker in various inflammatory diseases: A review.	PubMed
16	24631021	UPLC-MS(E) application in disease biomarker discovery: the discoveries...	PubMed
17	32781555	Protein-Related Circular RNAs in Human Pathologies.	PubMed

Model Performance Summary

Model	Size	Avg Entities	Avg Rels	Consistency	Unique Entities	Unique Rels	Avg Latency	Total Tokens
Llama 3.3 70B	70B	4.7	4.1	91.0%	75	81	2566ms	45,910
Claude Sonnet 4.6	~70B	4.2	3.6	90.6%	73	61	5425ms	49,740
Llama 3.1 8B	8B	7.9	6.5	88.6%	119	161	1989ms	59,788
Ministral 14B	14B	9.1	8.3	78.1%	172	217	5214ms	72,403
Mistral Large 3	675B	7.7	6.4	77.8%	137	162	4920ms	60,222
Amazon Nova Pro	~30B	6.4	5.3	66.7%	135	138	3034ms	51,691
Amazon Nova Lite	~8B	10.8	7.9	66.1%	222	234	12459ms	68,390
Amazon Nova Micro	~3B	7.3	5.3	61.8%	153	184	1737ms	54,372

Entity Overlap (Jaccard Similarity)

Each cell is the Jaccard similarity between two models' extracted entity sets. Higher values mean more agreement between the models on what to extract.

	Nova Lite	Nova Micro	Nova Pro	C.Sonnet 4.6	Llama 3.1 8B	Llama 3.3 70B	Ministral 14B	Mis.Large 3
Nova Lite	—	38%	41%	22%	30%	22%	42%	33%
Nova Micro	38%	—	35%	30%	35%	27%	31%	41%
Nova Pro	41%	35%	—	36%	37%	30%	48%	48%
C.Sonnet 4.6	22%	30%	36%	—	37%	47%	32%	41%
Llama 3.1 8B	30%	35%	37%	37%	—	38%	28%	37%
Llama 3.3 70B	22%	27%	30%	47%	38%	—	27%	32%
Ministral 14B	42%	31%	48%	32%	28%	27%	—	44%
Mis.Large 3	33%	41%	48%	41%	37%	32%	44%	—
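
The overlap figures above can be reproduced in outline with a few lines of Python. This is a minimal sketch, not the report's actual pipeline: it assumes each model's output has already been reduced to a set of normalized entity strings, and the names `jaccard`, `pairwise_jaccard`, and the toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(entities_by_model: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of models (keys sorted by name)."""
    return {
        (m1, m2): jaccard(entities_by_model[m1], entities_by_model[m2])
        for m1, m2 in combinations(sorted(entities_by_model), 2)
    }


# Toy data for illustration only; not taken from the report's extraction output.
toy = {
    "model_a": {"il-6", "crp", "mcp-1", "covid-19"},
    "model_b": {"il-6", "crp", "ykl-40"},
}
scores = pairwise_jaccard(toy)
print(f"{scores[('model_a', 'model_b')]:.0%}")  # 2 shared entities out of 5 distinct
```

Because the measure is symmetric, only one triangle of the matrix needs to be computed; the table above mirrors it for readability.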

Core Entities (found by ALL models)

29 entities out of 379 total unique (7.7% agreement)

alzheimer's disease, anti-lgi1 encephalitis, c-reactive protein, c1q, canakinumab, cancer, cancers, cardiovascular diseases, cholesterol-embolization syndrome, colchicine, corticosteroids, covid-19, crp, cyclophosphamide, dermatomyositis, heart failure, igg4-related disease, il-6, il1, ip-10, lgi1, mcp-1, nlrp3, periostin, rheumatoid arthritis, s100a8/a9, sting1, systemic lupus erythematosus, ykl-40
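
The core set and its agreement ratio can be sketched as a set intersection over a set union. The sketch below assumes entities are compared as lowercased strings with no synonym merging, which is consistent with `cancer` and `cancers`, or `crp` and `c-reactive protein`, appearing as separate entries above. The function name and toy data are hypothetical.

```python
def core_entities(entities_by_model: dict[str, set[str]]) -> tuple[set[str], float]:
    """Return the entities found by every model and their share of all unique entities."""
    sets = list(entities_by_model.values())
    if not sets:
        return set(), 0.0
    core = set.intersection(*sets)   # extracted by all models
    all_unique = set.union(*sets)    # extracted by at least one model
    ratio = len(core) / len(all_unique) if all_unique else 0.0
    return core, ratio


# Toy data for illustration only.
toy = {
    "model_a": {"il-6", "crp", "gwas"},
    "model_b": {"il-6", "crp"},
    "model_c": {"il-6", "crp", "circrna"},
}
core, ratio = core_entities(toy)
print(sorted(core), f"{ratio:.1%}")

# The report's own figure: 29 core entities out of 379 unique entities.
print(f"{29 / 379:.1%}")
```

Adding synonym or plural normalization before the intersection would likely merge some entries and change both the core count and the total.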

Entity Type Distribution

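Only six entity types are shown here. The per-type columns do not sum to Total, which appears to count every entity extracted by a model across all 51 of its calls (17 abstracts × 3 runs), so most entity types are presumably omitted from this view.

Model	Anatomical structure	Association	Biomarker	Cell	CellType	Condition	Total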
Amazon Nova Lite	0	2	1	8	0	0	549
Amazon Nova Micro	0	2	0	6	0	2	370
Amazon Nova Pro	0	6	3	10	4	3	327
Claude Sonnet 4.6	0	0	0	0	0	0	212
Llama 3.1 8B	3	0	0	0	0	9	404
Llama 3.3 70B	0	0	0	0	0	0	238
Ministral 14B	0	4	0	12	0	0	465
Mistral Large 3	0	0	0	0	0	0	393

Model-Exclusive Entities

Entities found by only one model (not extracted by any other):

Model	Exclusive Count	Examples
Amazon Nova Lite	51	11 major organs, 5,676 adults, adaptive immunity, age-related diseases, anti-inflammatory agents
Amazon Nova Micro	27	adaptive immune response, adults, atlas of the plasma proteome in health and disease, brain aging, cc chemokines
Amazon Nova Pro	8	1,000 proteins with sex and age heterogeneity, 183 diseases with auc > 0.80, 26 promising targets with favorable safety profiles, 53,026 individuals, 650 proteins shared among at least 50 diseases
Claude Sonnet 4.6	3	circrna, extracellular matrix, type i interferon
Llama 3.1 8B	19	cardiovascular disease, castleman's disease, cognition, development, endocrinological disease
Llama 3.3 70B	11	ccs, endocrinological, gastrointestinal, gwas, immunological
Ministral 14B	21	45,334 lead associations in the gwas catalog, 938 genes encoding potential drug targets, chitinase-3-like protein 1 (chi3l1), circrna-derived peptides, diseased tissues
Mistral Large 3	9	cardiac biomarker, chitinase protein family 18, chitinase protein family 18, subfamily a, disease state, incident diseases
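
Exclusive entities are a set difference: one model's entities minus the union of every other model's. A minimal sketch, again assuming one set of normalized entity strings per model (the function name and toy data are hypothetical):

```python
def exclusive_entities(entities_by_model: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each model, return the entities that no other model extracted."""
    exclusive = {}
    for model, entities in entities_by_model.items():
        # Union of the entity sets of all other models.
        others = set().union(
            *(ents for name, ents in entities_by_model.items() if name != model)
        )
        exclusive[model] = entities - others
    return exclusive


# Toy data for illustration only.
toy = {
    "model_a": {"il-6", "gwas"},
    "model_b": {"il-6", "circrna"},
    "model_c": {"il-6"},
}
for model, ents in exclusive_entities(toy).items():
    print(model, len(ents), sorted(ents))
```

With exact string matching, near-duplicates such as `cardiovascular disease` versus `cardiovascular diseases` count as exclusive, so these counts partly reflect surface-form differences rather than genuinely different content.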

Top 20 Entities (across all models)

Entity	Mentions	Models
IL-6	62	8/8
Alzheimer's Disease	53	8/8
rheumatoid arthritis	42	8/8
cancer	41	8/8
cancers	39	8/8
cardiovascular diseases	38	8/8
BRCA1	33	3/8
diseases	30	6/8
heart failure	27	8/8
inflammatory diseases	27	6/8
plasma proteome	25	6/8
Cholesterol-embolization syndrome	25	8/8
MCP-1	24	8/8
COVID-19	24	8/8
IP-10	24	8/8
LGI1	24	8/8
C1q	24	8/8
IgG4-related disease	24	8/8
S100A8/A9	24	8/8
NLRP3	24	8/8
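
The two columns above can be read as a mention count over all runs and a count of distinct models that produced the entity at least once. The sketch below assumes the raw output is available as `(model, entity)` records, one per extracted entity per run, and that matching is case-insensitive. Both are assumptions, and `top_entities` is a hypothetical name.

```python
from collections import Counter, defaultdict


def top_entities(records: list[tuple[str, str]], n: int = 20) -> list[tuple[str, int, int]]:
    """Return (entity, mentions, model_count) for the n most-mentioned entities."""
    mentions: Counter[str] = Counter()
    models: defaultdict[str, set[str]] = defaultdict(set)
    for model, entity in records:
        key = entity.strip().lower()  # case-insensitive matching (assumed)
        mentions[key] += 1
        models[key].add(model)
    return [(entity, count, len(models[entity])) for entity, count in mentions.most_common(n)]


# Toy records for illustration only.
records = [
    ("model_a", "IL-6"),
    ("model_a", "il-6"),
    ("model_b", "IL-6"),
    ("model_b", "BRCA1"),
]
for entity, count, model_count in top_entities(records):
    print(entity, count, model_count)
```

Under this reading, an entity extracted once per run by all eight models on a single abstract would score 24 mentions (8 models × 3 runs), which matches the many rows at 24.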

Key Findings

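Llama 3.3 70B and Claude Sonnet 4.6 are the most consistent (91.0% and 90.6%) but extract the fewest entities (4.7 and 4.2 on average): conservative, precision-oriented extraction.
Llama 3.1 8B offers the best cost/quality tradeoff: 88.6% consistency with roughly two-thirds more entities than Llama 3.3 70B (7.9 vs 4.7 on average), at about 2 s average latency.
Ministral 14B extracts the most relationships (8.3 avg) and, among models with consistency above 75%, the most entities (9.1 avg). Only Nova Lite extracts more entities (10.8 avg), at 66.1% consistency. This makes Ministral 14B a good fit for recall-oriented tasks.
Amazon Nova models are the least consistent (61.8% to 66.7%) across all three sizes, which suggests that model family matters more than scale for run-to-run consistency.
Only 7.7% of unique entities (29 of 379) are found by all eight models, so model choice significantly affects KG content.
Highest pairwise agreement: Nova Pro ∩ Ministral 14B (48.3%), with Nova Pro ∩ Mistral Large 3 also at 48% in the rounded table, followed by Claude ∩ Llama 3.3 (46.5%).
Most models produced zero garbage entities, which suggests the structured prompt with anti-examples works well.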

Recommendations

Use Case	Recommended Model	Reason
Production ingestion (millions of papers)	Llama 3.1 8B	Best speed/cost/quality balance
High-precision extraction	Llama 3.3 70B	Highest consistency, zero garbage
Maximum recall	Ministral 14B	Most relationships, and most entities among models above 75% consistency
Ensemble (golden dataset)	Llama 3.1 8B + Claude Sonnet 4.6	Complementary strengths

Overlap Visualization

Figure: Entity Overlap Network

Edge thickness = Jaccard similarity. Node size = unique entity count.