VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation
Published in MICCAI, 2025
As the appearance of medical images is influenced by multiple underlying factors, generative models require rich attribute information beyond labels to produce realistic and diverse images. For instance, generating an image of a skin lesion with specific patterns demands descriptions that go beyond the diagnosis, such as shape, size, texture, and color. However, such detailed descriptions are not always accessible. To address this, we explore a framework, termed Visual Attribute Prompts (VAP)-Diffusion, that leverages external knowledge from pre-trained Multi-modal Large Language Models (MLLMs) to improve the quality and diversity of medical image generation. First, to derive descriptions from MLLMs without hallucination, we design a series of prompts following Chain-of-Thoughts for common medical imaging tasks, including dermatologic, colorectal, and chest X-ray images. Generated descriptions are used during training and stored by category. During testing, descriptions are randomly retrieved from the corresponding category for inference. Moreover, to make the generator robust to unseen combinations of descriptions at test time, we propose a Prototype Condition Mechanism that constrains test-time embeddings to be similar to those seen during training. Experiments on three common types of medical imaging across four datasets verify the effectiveness of VAP-Diffusion.
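The abstract's description-bank idea (store MLLM-generated attribute descriptions per category during training, then sample one at random for that category at inference) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the class names and example descriptions are hypothetical placeholders.

```python
import random

class DescriptionBank:
    """Hypothetical sketch: per-category store of MLLM-generated
    visual-attribute descriptions, sampled at random at test time."""

    def __init__(self):
        self.bank = {}  # category label -> list of descriptions

    def add(self, category, description):
        # During training: store each generated description under its category.
        self.bank.setdefault(category, []).append(description)

    def sample(self, category, rng=random):
        # During testing: randomly retrieve a stored description
        # from the corresponding category to condition generation.
        if not self.bank.get(category):
            raise KeyError(f"no descriptions stored for {category!r}")
        return rng.choice(self.bank[category])


bank = DescriptionBank()
bank.add("melanoma", "asymmetric dark-brown lesion with irregular border")
bank.add("melanoma", "large multicolored lesion with uneven texture")
prompt = bank.sample("melanoma")  # one stored description, chosen at random
```

In the paper's setting, the sampled description would serve as the text condition for the diffusion generator; the Prototype Condition Mechanism additionally keeps the resulting test-time embeddings close to those observed in training.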
Recommended citation: updating...