212
Few-Shot Multimodal Medical Imaging: A Theoretical Framework
arXiv:2511.01140v2 Announce Type: replace-cross
Abstract: Medical imaging often operates under limited labeled data, especially in rare disease and low resource clinical environments. Existing multimodal and meta learning approaches improve performance in these settings but lack a theoretical explanation of why or when they succeed. This paper presents a unified theoretical framework for few shot multimodal medical imaging that jointly characterizes sample complexity, uncertainty quantification, and interpretability. Using PAC learning, VC theory, and PAC Bayesian analysis, we derive bounds that describe the minimum number of labeled samples required for reliable performance and show how complementary modalities reduce effective capacity through an information gain term. We further introduce a formal metric for explanation stability, proving that explanation variance decreases at an inverse n rate. A sequential Bayesian interpretation of Chain of Thought reasoning is also developed to show stepwise posterior contraction. To illustrate these ideas, we implement a controlled multimodal dataset and evaluate an additive CNN MLP fusion model under few shot regimes, confirming predicted multimodal gains, modality interference at larger sample sizes, and shrinking predictive uncertainty. Together, the framework provides a principled foundation for designing data efficient, uncertainty aware, and interpretable diagnostic models in low resource settings.
Abstract: Medical imaging often operates under limited labeled data, especially in rare disease and low resource clinical environments. Existing multimodal and meta learning approaches improve performance in these settings but lack a theoretical explanation of why or when they succeed. This paper presents a unified theoretical framework for few shot multimodal medical imaging that jointly characterizes sample complexity, uncertainty quantification, and interpretability. Using PAC learning, VC theory, and PAC Bayesian analysis, we derive bounds that describe the minimum number of labeled samples required for reliable performance and show how complementary modalities reduce effective capacity through an information gain term. We further introduce a formal metric for explanation stability, proving that explanation variance decreases at an inverse n rate. A sequential Bayesian interpretation of Chain of Thought reasoning is also developed to show stepwise posterior contraction. To illustrate these ideas, we implement a controlled multimodal dataset and evaluate an additive CNN MLP fusion model under few shot regimes, confirming predicted multimodal gains, modality interference at larger sample sizes, and shrinking predictive uncertainty. Together, the framework provides a principled foundation for designing data efficient, uncertainty aware, and interpretable diagnostic models in low resource settings.
No comments yet.