Abstract
In recent years, crowd counting has attracted widespread attention owing to its potential in public safety applications, and vision-based measurement is a fundamental yet challenging component of crowd analysis and smart city systems. With the popularization of infrared sensors, RGB-T crowd counting methods that jointly exploit RGB and thermal information to estimate crowd counts have been developed. However, existing methods fail to effectively explore the complementary information between RGB and thermal modality features, which hinders the further development of RGB-T crowd counting: although various fusion strategies have been proposed, most ignore the inherent differences between the two modalities, resulting in inaccurate crowd counting. To address this, we propose a bidirectional gated and dynamic fusion network (BGDFNet) for RGB-T crowd counting. Specifically, we propose a novel bidirectional gated module (BGM) that makes full use of complementary cross-modal information to bridge the modality gap. In addition, we propose a multiscale dynamic fusion (MSDF) module that achieves effective feature fusion by dynamically selecting the weights of multiscale convolutions. Finally, the crowd density map is produced by a decoder and a fully connected layer. On the available RGB-T crowd counting dataset, the proposed BGDFNet improves on existing state-of-the-art (SOTA) methods by 12.35%, 8.23%, 2.02%, and 12.01% on GAME (0), GAME (1), GAME (2), and root-mean-square error (RMSE), respectively. The code and models are available at https://github.com/ZhengxuanXie/BGDFNet.
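The core idea of the MSDF module, dynamically weighting the outputs of convolutions at several scales, can be illustrated with a minimal sketch. This is an assumption-level illustration, not the authors' implementation: the softmax gate, the toy feature maps, and the function names are all hypothetical, and real multiscale branches would be learned convolutions rather than precomputed arrays.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array of gating scores
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_fusion(branch_outputs, gate_logits):
    # branch_outputs: feature maps from convolutions of different kernel
    # sizes; gate_logits: per-branch scores (in practice predicted from
    # the input). The fused map is a convex combination of the branches.
    weights = softmax(gate_logits)
    return sum(w * f for w, f in zip(weights, branch_outputs))

# toy example: three "multiscale" feature maps of shape (4, 4)
feats = [np.full((4, 4), v) for v in (1.0, 2.0, 3.0)]
logits = np.array([0.1, 0.5, 2.0])  # hypothetical gating scores
fused = dynamic_fusion(feats, logits)
```

Because the gate is a softmax, the fused output is a convex combination of the branch outputs, so each spatial location lies between the smallest and largest branch responses while the relative weights adapt per input.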