Real-Time Fall Detection Using Compressed Video Analytics
Abstract
In video analytics, detecting human fall events is a critical application of video surveillance systems installed for the safety of children and senior citizens. This paper proposes a novel method for real-time human fall event detection in compressed video. The method, termed FMVCNN, combines motion vectors (MV) and the Transformer Prediction Heads variant of You Only Look Once version 5 (TPH-YOLOv5) with fuzzy logic based multi-object tracking. It is validated on two video compression formats, MPEG-4 and H.264, and can be adapted to any video codec format and any camera setting without prior setup. Numerous algorithms have been explored in the literature for detecting human fall events in compressed-domain video, but they are limited by (i) keyframes set at a constant interval, (ii) use of P frames only, and (iii) a setup tailored to one particular codec, which must be redone every time the codec changes. The proposed method addresses these limitations by using keyframe intervals of variable length, using both P and B frames, and supporting different codec variants. Further, the crucial step of event box prediction in video frames is performed using fuzzy logic, wherein the motion vectors that constitute an event box form a fuzzy set representing the uncertainty in the motion vectors related to an event. The experiments use the benchmark fall event datasets Le2i, UR, and Multiple Cameras. The experimental outcomes of the proposed FMVCNN approach are encouraging and compare well with those reported in recent literature for raw (uncompressed) video data. The proposed FMVCNN surpasses existing contemporary approaches executed in the compressed domain, markedly improving both the accuracy and the speed of event detection.
The ablation study considered various FMVCNN variants resulting from different video codecs, fuzzy representations, video datasets, and YOLO architectures.
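The idea of treating the motion vectors of an event box as a fuzzy set can be illustrated with a minimal sketch. It is not the FMVCNN implementation: the ramp membership function, its thresholds `low` and `high`, the cut level `alpha`, the 16-pixel macroblock size, and the function names are all assumptions chosen for illustration. Each macroblock's motion vector receives a membership degree based on its magnitude, and the event box is taken as the bounding box of the blocks whose membership reaches `alpha`.

```python
import math

def membership(magnitude, low=1.0, high=4.0):
    """Hypothetical ramp membership: 0 below `low`, 1 above `high`, linear between."""
    if magnitude <= low:
        return 0.0
    if magnitude >= high:
        return 1.0
    return (magnitude - low) / (high - low)

def fuzzy_event_box(motion_vectors, alpha=0.5, block=16):
    """Bounding box of macroblocks whose motion-vector membership is >= alpha.

    motion_vectors: iterable of (col, row, dx, dy), one entry per macroblock.
    Returns (x_min, y_min, x_max, y_max) in pixels, or None if no block qualifies.
    """
    selected = [
        (col, row)
        for col, row, dx, dy in motion_vectors
        if membership(math.hypot(dx, dy)) >= alpha
    ]
    if not selected:
        return None
    cols = [c for c, _ in selected]
    rows = [r for _, r in selected]
    return (min(cols) * block, min(rows) * block,
            (max(cols) + 1) * block, (max(rows) + 1) * block)

# Illustrative motion vectors (made-up values, not taken from any dataset).
mvs = [(0, 0, 0.0, 0.5), (2, 1, 3.0, 4.0), (3, 2, 0.0, 3.0), (5, 5, 1.0, 1.0)]
box = fuzzy_event_box(mvs)
print(box)
```

In this sketch, lowering `alpha` admits weaker motion into the event box and raising it keeps only strongly moving blocks. The resulting box could then be passed to a detector such as a YOLO model, which is one plausible reading of how the two stages fit together.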