Abstract
This work proposes a multimodal contrastive learning framework, inspired by CLIP, that aligns seven heterogeneous clinical modalities related to obesity: fat composition, muscle composition, biochemistry, anthropometry, demographics, metabolic profile, and cardiovascular physiology. Each modality is encoded by a dedicated neural network and projected into a shared latent space, with fat composition serving as the anchor modality during training. A control parameter balances the intra-modality contribution against fat–modality alignment, improving cluster separability and semantic coherence. Furthermore, the framework is evaluated on its capacity to reconstruct missing modalities in the latent space through similarity-based imputation. Benchmarked against traditional imputation approaches such as KNN and MICE, the method shows competitive or superior performance across multiple clinical targets. These results demonstrate the framework's potential to align complex clinical data effectively and to enable meaningful patient stratification, contributing to more personalized and accurate obesity phenotyping in clinical and research settings.
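
As an illustration of the training objective described above, the sketch below combines a CLIP-style symmetric InfoNCE loss over fat–modality pairs with an intra-modality term, weighted by a control parameter. The function names, the `alpha` parameter, the perturbed-view intra-modality term, and all tensor shapes are assumptions made for illustration, not the repository's actual API.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: matching rows of z_a and z_b are positive pairs."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature           # (B, B) cosine-similarity logits
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def alignment_loss(embeddings: dict, alpha: float = 0.5) -> torch.Tensor:
    """Weighted combination of fat-anchored alignment and an intra-modality term.

    embeddings maps modality name -> (B, d) projected embeddings, with "fat"
    acting as the anchor; alpha plays the role of the control parameter.
    """
    fat = embeddings["fat"]
    others = {k: v for k, v in embeddings.items() if k != "fat"}
    # Fat–modality alignment: contrast each modality against the fat anchor.
    cross = sum(info_nce(fat, z) for z in others.values()) / len(others)
    # Intra-modality contribution: approximated here by contrasting two lightly
    # perturbed views of each modality's embeddings (an assumed stand-in).
    intra = sum(info_nce(z + 0.01 * torch.randn_like(z),
                         z + 0.01 * torch.randn_like(z))
                for z in others.values()) / len(others)
    return alpha * cross + (1.0 - alpha) * intra
```

Sweeping `alpha` between 0 and 1 trades pure fat-anchored alignment against within-modality structure, which is how a control parameter of this kind would shape cluster separability in the shared latent space.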

Image 1. Visual representation of the latent space before and after training.

Image 2. Cosine similarity matrix between modalities after training.
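
The similarity-based imputation can be pictured as a nearest-neighbor lookup in the shared latent space: a patient's embedding for an available modality is compared against reference embeddings, and the missing modality's embedding is reconstructed as a similarity-weighted average over the closest matches. The neighbor count `k`, the softmax weighting, and the use of cosine similarity in the sketch below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def impute_missing(query: torch.Tensor,
                   ref_available: torch.Tensor,
                   ref_missing: torch.Tensor,
                   k: int = 5) -> torch.Tensor:
    """Estimate embeddings of a missing modality from latent-space neighbors.

    query:         (Q, d) embeddings of a modality the patient does have
    ref_available: (N, d) reference embeddings of that same modality
    ref_missing:   (N, d) reference embeddings of the modality to impute
    """
    sims = F.normalize(query, dim=-1) @ F.normalize(ref_available, dim=-1).t()
    weights, idx = sims.topk(k, dim=-1)           # (Q, k) most similar patients
    weights = torch.softmax(weights, dim=-1)      # similarity-weighted mixture
    return (weights.unsqueeze(-1) * ref_missing[idx]).sum(dim=1)  # (Q, d)
```

Under this reading, a patient missing one modality would query with an available modality's embedding against a reference cohort that has both, and the imputed embedding could then be compared against KNN- or MICE-imputed baselines on downstream clinical targets.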
BibTeX
@article{Bernabe2025,
  title={Multimodal Contrastive Learning for Clinical Data Alignment via Fat Composition Representations},
  author={},
  journal={Ucami2025},
  year={2025},
  url={https://github.com/Bernabe19/multimodal-contrastive-learning-for-clinical-data-alignment-via-fat-composition-representations}
}