Advancing Activity Recognition in Tennis: Employing Bag of Words Approach for Enhanced Video Analysis
Abstract
This study presents a novel video representation technique for activity recognition that models video dynamics in terms of activity attributes. A video sequence is divided into short-term segments, each characterized by its dynamic features. These segments are represented using a dictionary of attribute dynamics templates, where each template is an instance of a generative model known as the binary dynamic system (BDS). The dictionary of BDSs is learned from a training dataset, and the attribute sequences extracted from videos are quantized into BDS codewords; the resulting histogram is the bag-of-words for attribute dynamics (BoWAD). Extensive experimental evaluation shows that the BoWAD representation captures the temporal structure of complex activities in videos better than other state-of-the-art methods, making it a robust and effective way to model video dynamics and improve the accuracy of activity recognition systems. In the experiments, the proposed approach identified the tennis events Bounce, Net, and Hit with an accuracy of 87.92%, a recall of 92.08%, and a precision of 87.92%.
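The quantization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the BDS likelihood-based assignment is replaced here by a plain nearest-codeword rule under Hamming distance, the codebook is written by hand rather than learned from training data, and all names (`split_segments`, `quantize`, `bag_of_words`, `video`, `codebook`) are hypothetical.

```python
from collections import Counter


def split_segments(seq, length, stride):
    """Cut a sequence of per-frame binary attribute vectors into short-term segments."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, stride)]


def hamming(seg_a, seg_b):
    """Count the attribute entries that differ between two equal-length segments."""
    return sum(a != b for fa, fb in zip(seg_a, seg_b) for a, b in zip(fa, fb))


def quantize(segment, codebook):
    """Return the index of the nearest codeword.

    Assumption: Hamming distance stands in for the BDS-based assignment
    that the abstract refers to.
    """
    return min(range(len(codebook)), key=lambda k: hamming(segment, codebook[k]))


def bag_of_words(seq, codebook, length, stride):
    """Normalized histogram of codeword assignments over all segments."""
    segments = split_segments(seq, length, stride)
    if not segments:
        return [0.0] * len(codebook)
    counts = Counter(quantize(s, codebook) for s in segments)
    return [counts[k] / len(segments) for k in range(len(codebook))]


# Toy input: 6 frames, 2 binary attributes per frame (made-up values).
video = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 0), (0, 0)]
# Hand-written codebook of two 2-frame templates (a learned dictionary in practice).
codebook = [
    [(0, 0), (0, 0)],
    [(1, 1), (1, 1)],
]
hist = bag_of_words(video, codebook, length=2, stride=2)
print(hist)
```

In this toy case two of the three segments match the first template and one matches the second, so the histogram puts two thirds of its mass on the first codeword. A classifier would then be trained on such histograms, one per video.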