Deep Learning vs. Statistical Methods: Evaluating Anomaly Detection Techniques in IoT Telemetry Data
Abstract
The exponential growth in the deployment of Internet of Things (IoT) devices across sectors, from healthcare to smart cities, has underscored the importance of monitoring their health and functionality. While IoT devices bring efficiency and automation, they also introduce vulnerabilities. Anomalies in their telemetry data can indicate device malfunctions, security breaches, or other significant operational issues. Developing and evaluating robust anomaly detection techniques is therefore imperative for maintaining device integrity and the security of interconnected systems. In this study, we compare four anomaly detection methods: two statistical methods, Moving Average and Interquartile Range (IQR), and two machine learning techniques, Autoencoders and One-Class SVMs. Our primary dataset is synthetically generated telemetry data that mirrors real-world IoT device outputs such as temperature, battery level, and signal strength. Anomalies were systematically injected into the dataset to mimic typical outlier patterns observed in genuine telemetry data, providing a controlled environment for method evaluation. The deep learning-based Autoencoder emerged as the top performer, achieving an F1-score of 0.7279. This suggests that, with adequate training and data representation, deep learning models can discern intricate patterns in the data, which points to their suitability for complex IoT telemetry datasets. The Moving Average, a more straightforward statistical method, achieved the second-best result with an F1-score of 0.4388, reinforcing that simpler methods should not be overlooked, especially when interpretability is a priority. Conversely, the more advanced One-Class SVM and the conventional statistical method, IQR, trailed in performance.
This deviation from expected outcomes underlines that algorithm selection should depend on the specific nature and nuances of the dataset at hand. While One-Class SVMs have been effective in other domains, they might not be optimal for this synthetic IoT dataset. Similarly, the IQR's lower performance might indicate that the injected anomalies do not always deviate significantly from the interquartile range, emphasizing the importance of understanding the data distribution when applying statistical anomaly detection methods. The continued integration of IoT devices across industries necessitates robust and accurate anomaly detection techniques. Our study highlights the importance of evaluating methods in the context of specific datasets. The surprising superiority of the Autoencoder underscores the potential of deep learning in discerning complex patterns, while the solid performance of the Moving Average reiterates the relevance of traditional methods. As we venture further into an interconnected digital age, iterative evaluations like ours will be pivotal in guiding method selection and ensuring the reliability and security of IoT devices.
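
The two statistical baselines named in the abstract, and the F1-score used to compare methods, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: the generator `make_telemetry`, the temperature distribution, the spike magnitudes, the window size, and both thresholds are hypothetical choices made only for this example, and the abstract does not say how the authors configured either detector.

```python
import random
import statistics


def make_telemetry(n=500, anomaly_rate=0.05, seed=0):
    """Hypothetical generator: noisy temperature readings with injected spikes."""
    rng = random.Random(seed)
    values, labels = [], []
    for _ in range(n):
        v = rng.gauss(25.0, 0.5)  # assumed normal operating temperature
        is_anomaly = rng.random() < anomaly_rate
        if is_anomaly:
            # Assumed anomaly shape: a large positive or negative spike.
            v += rng.choice([-1, 1]) * rng.uniform(5.0, 10.0)
        values.append(v)
        labels.append(is_anomaly)
    return values, labels


def iqr_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def moving_average_flags(values, window=20, threshold=3.0):
    """Flag points far from the mean of the preceding window.

    The distance is measured in units of the window's standard deviation;
    this scaling is an assumption of the sketch.
    """
    flags = []
    for i, v in enumerate(values):
        past = values[max(0, i - window):i]
        if len(past) < 2:
            flags.append(False)  # not enough history to judge
            continue
        mean = statistics.fmean(past)
        std = statistics.stdev(past)
        flags.append(std > 0 and abs(v - mean) > threshold * std)
    return flags


def f1_score(truth, pred):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    values, labels = make_telemetry()
    print("IQR F1:", round(f1_score(labels, iqr_flags(values)), 4))
    print("Moving Average F1:", round(f1_score(labels, moving_average_flags(values)), 4))
```

Scores from this toy setup are not comparable to the figures reported above, since the data and parameters differ. The sketch only shows how each detector turns a telemetry series into per-point flags that can then be scored against the injected labels.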