International Journal of Scientific Engineering and Research (IJSER)
Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed | ISSN: 2347-3878

China | Computer and Mathematical Sciences | Volume 13 Issue 6, June 2025 | Pages: 81 - 87


Multimodal Semantic Interaction for Text Image Super-Resolution

Xin Jiang

Abstract: Existing text image super-resolution methods suffer from feature representations that lack scale adaptability and from insufficient resolution, which makes it difficult for the recognizer to extract correct textual information to guide the reconstruction network. To address these issues, we propose a text image super-resolution reconstruction method based on multimodal semantic interaction. Using the attention mask in the semantic reasoning module, we correct the textual content information, obtain semantic prior knowledge, and constrain and guide the network to reconstruct semantically accurate super-resolution text images. To enhance the network's representational capability and adapt to text images of different shapes and lengths, we design a multimodal semantic interaction block whose basic components are a visual dual-stream integration block, a cross-modal adaptive fusion block, and an orthogonal bidirectional gated recurrent unit. Experimental results on the TextZoom test set show that the proposed method outperforms other mainstream methods on the PSNR and SSIM quantitative metrics, with average recognition accuracy improvements of 2.9%, 3.6%, and 3.7% on three recognizers (ASTER, MORAN, and CRNN) compared with the TPGSR model. These results demonstrate that the proposed multimodal semantic interaction-based method can effectively improve text recognition accuracy.

Keywords: super-resolution reconstruction; text image; feature semantic prior; multimodal
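As context for the quantitative metrics cited in the abstract: PSNR is derived from the mean squared error (MSE) between the reconstructed image and the ground-truth high-resolution image. The following is a minimal illustrative sketch of that relationship, not the paper's evaluation code; the function names and the flat pixel-list representation are my own simplifications.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length sequences of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def psnr(mse_value, max_val=255.0):
    """Peak signal-to-noise ratio in dB for a given MSE and peak pixel value."""
    if mse_value == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * math.log10((max_val ** 2) / mse_value)

# Example: an MSE of 650.25 on 8-bit images corresponds to exactly 20 dB,
# since 255^2 / 650.25 = 100 and 10 * log10(100) = 20.
print(psnr(650.25))
```

Higher PSNR indicates a reconstruction closer to the ground truth, which is why it is reported alongside SSIM (a perceptual structure-based metric) when comparing super-resolution methods.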
