Title |
Efficiently adapting large pre-trained models for real-time violence recognition in smart city surveillance |
ID_Doc |
38939 |
Authors |
Ren, XH; Fan, WZ; Wang, YH |
Year |
2024 |
Published |
Journal of Real-Time Image Processing, 21, 4 |
DOI |
10.1007/s11554-024-01486-w |
Abstract |
Recently, the concept of smart cities has gained prominence, aiming to enhance urban efficiency, safety, and quality of life through advanced technologies. A critical component of this infrastructure is the extensive use of surveillance systems that monitor public spaces to detect violent behavior. As the scale of data and models grows, large-scale pre-trained models demonstrate remarkable capabilities across a wide range of applications. However, adapting these models for violence recognition in surveillance videos poses several challenges, including fine-tuning cost, lack of temporal modeling, and inference overhead. In this paper, we propose an efficient framework for adapting pre-trained models to violent behavior recognition. It consists of two paths, a spatial path and a motion path, supports real-time parameter updating and real-time inference, and is adaptable to various ViT-based pre-trained models. Both paths adopt a parameter-efficient fine-tuning pipeline so that model updating remains real-time. Within the motion path, multiple frames must be processed to capture temporal features, which makes real-time inference challenging. To address this, we compress multiple frames into the size of a single standard image, which keeps inference real-time. Experiments on five datasets demonstrate that our framework achieves state-of-the-art performance, efficiently transferring large pre-trained models to violent behavior recognition. |
Author Keywords |
Smart city; Surveillance video; Real-time violence recognition; Large pre-trained model; Parameter-efficient fine-tuning |
Document Type |
Other |
Open Access |
Open Access |
Source |
Science Citation Index Expanded (SCI-EXPANDED) |
EID |
WOS:001248059400002 |
WoS Category |
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology |
Research Area |
Computer Science; Engineering; Imaging Science & Photographic Technology |