Investigating Text-Guided Cross-Region Feature Alignment for Multimodal Disease Localization in Chest X-Ray Images
Publication Date : Nov-01-2025
Abstract :
Deep learning object detection techniques have been widely applied to lung- and chest-related healthcare applications. Recent advances in text-guided object detection have yielded substantial performance improvements over purely image-based detection. While models employing traditional region–text similarity have been explored for detecting abnormalities in chest X-rays, the efficacy of models leveraging region–region similarity in this domain remains largely unexamined. Although such architectures have proven effective in natural scene contexts, their applicability to chest X-rays has been limited by the inherent challenges of medical object detection. This gap raises the question of whether chest X-ray disease detection can be performed by training cross-region feature alignment architectures. This study addresses the question by systematically investigating a text-guided, region–region similarity-based object detection architecture, dubbed CXR-CoDet. To this end, the work examines multiple training hyperparameter configurations (varying learning rate, batch size, and number of training iterations), the number of support images needed for co-occurrence computation, different pretrained weights, different granularities of disease descriptions, and the incorporation of medical information through the text encoder. The work also underscores the limitations of region–region similarity-based detection architectures, particularly when applied to medical imaging, and provides recommendations for improvement. Code is available at: https://github.com/souryatech/TGCRFA-CXR.git
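The abstract contrasts region–text similarity with region–region similarity but does not spell out the scoring mechanism. As a rough illustrative sketch of the general region–region idea (not CXR-CoDet's actual formulation; the function name, feature dimensions, and max-pooling aggregation here are assumptions), one can score each region of a query image by its cosine similarity to region features pooled from a set of support images:

```python
import numpy as np

def region_region_similarity(query_regions, support_regions):
    """Illustrative region-region scoring (assumed formulation, not the
    paper's): L2-normalize region feature vectors, compute pairwise
    cosine similarity, and take the max over support regions as each
    query region's alignment score."""
    q = query_regions / np.linalg.norm(query_regions, axis=1, keepdims=True)
    s = support_regions / np.linalg.norm(support_regions, axis=1, keepdims=True)
    sim = q @ s.T  # shape: (num_query_regions, num_support_regions)
    return sim.max(axis=1)

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 256))     # hypothetical region features from a query X-ray
support = rng.normal(size=(10, 256))  # hypothetical region features from support images
scores = region_region_similarity(query, support)
print(scores.shape)  # (4,)
```

In a detection pipeline, such per-region scores would typically be combined with class text embeddings or co-occurrence statistics before producing final box labels; the abstract indicates the number of support images is one of the factors the study varies.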
