International Journal of Scientific Engineering and Research (IJSER)
Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed | ISSN: 2347-3878



China | Communication Science | Volume 14 Issue 3, March 2026 | Pages: 95 - 100


Multimodal Image Fusion Network Based on Class Activation and Multiscale Edge Gradients

Shenglin Yang

Abstract: Image fusion has attracted significant attention for its ability to compensate for the limited information content of single-modality images. However, existing methods often overemphasize visual quality while neglecting the requirements of downstream high-level vision tasks. To address this issue, this paper proposes a Semantic Information Driven Multimodal Image Fusion network (SIDM-Fusion). First, a semantic-driven fusion framework is constructed that dynamically guides the fusion process by establishing a correspondence between modal features and class activation weights. Second, a Multiscale Edge Gradient Block (MEGB) is designed to adaptively extract multiscale local features while reinforcing edge information via the Sobel operator. Finally, a Semantic Prior Classification Network (SPC-Net) is proposed to extract multi-level semantic information by modeling long-range dependencies. Introducing class activation weights into the feature fusion process yields an adaptive, fully learnable fusion rule, eliminating the need for manually designed fusion rules. Experimental results demonstrate that, compared with current state-of-the-art methods, the proposed network achieves superior fusion performance across multiple datasets, with improvements of 22% in Average Gradient (AG) and 14% in Mutual Information (MI).
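The Sobel-based edge reinforcement mentioned for the MEGB can be illustrated with a minimal sketch: apply horizontal and vertical Sobel kernels to a feature map and combine the responses into an edge-magnitude map. This is an assumption-laden illustration only — the kernels, the gradient-magnitude combination, and all function names here are standard textbook choices, not the authors' exact block design, which also includes multiscale and learnable components not reproduced here.

```python
import numpy as np

# Standard 3x3 Sobel kernels (assumed; the paper does not specify variants).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T  # vertical-gradient kernel is the transpose

def conv2d_same(img, kernel):
    """Naive 'same'-padded 2-D cross-correlation over a 2-D array."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def sobel_edge_magnitude(feature_map):
    """Edge-magnitude map sqrt(Gx^2 + Gy^2) from the two Sobel responses."""
    gx = conv2d_same(feature_map, SOBEL_X)
    gy = conv2d_same(feature_map, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)
```

On a flat region the magnitude is zero, while at an intensity step it is large, which is the property a fusion network can exploit to reinforce edge structure in the fused features.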

Keywords: Multiscale features, Infrared and visible images, Multimodal image fusion, Deep learning

