Multimodal Processing
Vision model integration for extracting entities from figures, tables, and diagrams in scientific PDFs.
Overview
Scientific papers carry critical information in non-text elements such as pathway diagrams, Western blots, protein structure figures, and data tables. The multimodal processor extracts structured entities from these elements, using a vision model for figures and PyMuPDF plus an LLM for tables.
Capabilities
| Element | Model | Extracts |
|---|---|---|
| Pathway diagrams | Llama-3.2-11B-Vision (Bedrock) | Proteins, interactions, pathways |
| Protein structures | Llama-3.2-11B-Vision (Bedrock) | Domains, binding sites, modifications |
| Western blots | Llama-3.2-11B-Vision (Bedrock) | Proteins, expression levels |
| Microscopy | Llama-3.2-11B-Vision (Bedrock) | Cell types, markers, localization |
| Tables | PyMuPDF + LLM | Proteins, diseases, measurements |
Quick Start
# Process PDFs with multimodal extraction
uv run python -m pipeline.processors.multimodal_processor \
    --pdf-dir ./papers/ \
    --database olink1 \
    --service bedrock

# With checkpointing (resume on failure)
uv run python -m pipeline.processors.multimodal_processor \
    --pdf-dir ./papers/ \
    --database olink1 \
    --service bedrock \
    --checkpoint-db ./checkpoints.sqlite
Processing Pipeline
flowchart TD
    PDF["PDF Document"]
    subgraph extract["Element Extraction"]
        IMG["Extract images<br/>(PyMuPDF)"]
        TBL["Extract tables<br/>(PyMuPDF table detection)"]
    end
    subgraph vision["Vision Analysis"]
        V1["Classify image type"]
        V2["Extract entities<br/>from figures"]
        V3["Generate captions"]
    end
    subgraph table["Table Analysis"]
        T1["Column classification<br/>(LLM)"]
        T2["Entity mapping"]
        T3["Relationship inference"]
    end
    subgraph merge["Integration"]
        M1["Merge with text entities"]
        M2["Deduplicate"]
        M3["Store in graph"]
    end
    PDF --> extract
    IMG --> vision
    TBL --> table
    vision --> merge
    table --> merge
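The integration stage (merge, then deduplicate) can be illustrated with a minimal sketch. The `Entity` record, its fields, and the deduplication key (case-insensitive name plus entity type) are assumptions made for illustration; the processor's actual data model and matching rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity record; the real schema may differ."""
    name: str
    entity_type: str  # e.g. "protein", "disease"
    source: str       # "text", "figure", or "table"


def merge_entities(*groups: list[Entity]) -> list[Entity]:
    """Merge entity lists, keeping the first occurrence of each (name, type) pair."""
    seen: set[tuple[str, str]] = set()
    merged: list[Entity] = []
    for group in groups:
        for entity in group:
            # Assumed normalization: trim whitespace and ignore case.
            key = (entity.name.strip().lower(), entity.entity_type)
            if key not in seen:
                seen.add(key)
                merged.append(entity)
    return merged


text_entities = [Entity("TP53", "protein", "text")]
figure_entities = [Entity("tp53", "protein", "figure"), Entity("MDM2", "protein", "figure")]
table_entities = [Entity("MDM2", "protein", "table"), Entity("glioma", "disease", "table")]

merged = merge_entities(text_entities, figure_entities, table_entities)
print([(e.name, e.source) for e in merged])
```

Passing the text entities first means a text-derived entity wins over a figure or table duplicate; a different priority order is equally plausible.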
Configuration
# Parallel processing with checkpointing
config = {
    "workers": 4,                   # Parallel document processing
    "checkpoint_db": "cp.sqlite",   # Resume on failure
    "max_images_per_doc": 20,       # Limit vision API calls
    "min_image_size": (100, 100),   # Skip tiny images
}
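As a rough illustration of how the two image limits could interact, the sketch below filters by size first and then caps the count. It assumes `min_image_size` is `(width, height)` in pixels and that the cap is applied in document order; `select_images` is a hypothetical helper, not a function from the pipeline.

```python
def select_images(
    sizes: list[tuple[int, int]],
    min_image_size: tuple[int, int] = (100, 100),
    max_images_per_doc: int = 20,
) -> list[int]:
    """Return the indices of images that would be sent to the vision model."""
    min_w, min_h = min_image_size
    # Drop images smaller than the minimum in either dimension.
    kept = [i for i, (w, h) in enumerate(sizes) if w >= min_w and h >= min_h]
    # Cap the number of vision calls per document.
    return kept[:max_images_per_doc]


sizes = [(640, 480), (32, 32), (100, 100), (1200, 90), (800, 600)]
selected = select_images(sizes, max_images_per_doc=2)
print(selected)
```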
Batch Processing
The processor supports batch processing with SQLite-based checkpointing:
- Status tracking per file: PENDING → IN_PROGRESS → COMPLETED or FAILED
- Resume capability: skip completed files, retry failed ones
- File change detection via MD5 hashing
- Statistics tracking per document
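The checkpointing behaviour listed above can be sketched with the standard library alone. The table name, column layout, and helper functions here are assumptions for illustration; the processor's real checkpoint schema is not shown in this document.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """MD5 of the file contents, used only for change detection."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def open_checkpoints(db_path: str) -> sqlite3.Connection:
    """Open the checkpoint database and create the (assumed) table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def needs_processing(conn: sqlite3.Connection, path: Path) -> bool:
    """Skip files that are COMPLETED and unchanged; process everything else."""
    row = conn.execute(
        "SELECT md5, status FROM checkpoints WHERE path = ?", (str(path),)
    ).fetchone()
    if row is None:
        return True  # never seen before
    stored_md5, status = row
    return status != "COMPLETED" or stored_md5 != file_md5(path)


def set_status(conn: sqlite3.Connection, path: Path, status: str) -> None:
    """Record the current hash and status (PENDING, IN_PROGRESS, COMPLETED, FAILED)."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (path, md5, status) VALUES (?, ?, ?)",
        (str(path), file_md5(path), status),
    )
    conn.commit()


with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "paper.pdf"
    pdf.write_bytes(b"%PDF-1.7 example")
    conn = open_checkpoints(":memory:")

    first = needs_processing(conn, pdf)    # new file
    set_status(conn, pdf, "IN_PROGRESS")
    set_status(conn, pdf, "COMPLETED")
    second = needs_processing(conn, pdf)   # completed and unchanged

    pdf.write_bytes(b"%PDF-1.7 revised")
    third = needs_processing(conn, pdf)    # contents changed since completion
    conn.close()

print(first, second, third)
```

In this sketch a FAILED or IN_PROGRESS file is always retried, and a COMPLETED file is reprocessed only when its hash changes.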
Key Files
| File | Role |
|---|---|
| pipeline/processors/multimodal_processor.py | Main orchestrator |
| pipeline/processors/multimodal_chunker.py | Element-aware chunking |
| pipeline/processors/vision_processor.py | Vision model integration |
| pipeline/processors/table_processor.py | Table extraction and analysis |