Network anomaly detection using Machine Learning techniques

Over the last decade, researchers have put forward many proposals for defining outliers in a dataset, without reaching a clear agreement to date. Despite this ambiguity, outliers are something to be avoided in virtually any environment or scenario. Network anomalies are a topical case: they are data that fall outside the norm, and the wide variety of anomalies that can occur makes a strict definition impossible. With anomaly detection systems, network administrators can resolve anomalies in considerably less time than without them. Most companies and public institutions need early and effective detection of network anomalies in order to avoid service failures or breakdowns. For this reason, an adequate anomaly detection system is of vital importance in any field, but especially in communications networks.

To address this problem and contribute to the scientific community, this work develops several machine learning models to detect network anomalies objectively and accurately. To this end, a network topology suited to characterizing such anomalies was designed and implemented in an emulator to generate network traffic. A dataset was then generated by sending multimedia content in a controlled environment, collecting information about the network status and metrics obtained through a network probe.

Next, an exploratory analysis of the data was performed and several machine learning models were developed to implement an anomaly detection system. For the supervised models, a threshold for anomaly labeling was established and the models were trained to identify anomalous data. In addition, techniques for explaining model predictions were developed, along with unsupervised techniques, such as clustering, that do not require labeled data.

Finally, the results obtained with the supervised and unsupervised approaches were compared. Using algorithms such as Logistic Regression, Random Forest, and Support Vector Machine, accuracy values from 91% to 97% were achieved. This shows that, through preliminary analysis and training on the network data, it is possible to establish effective patterns for anomaly detection. The results indicate that the developed system provides objective, accurate, and automatic detections, contributing to the field of network monitoring and security.
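As a rough illustration of the supervised step described above (labeling samples with a threshold, then training a classifier on them), the following minimal sketch may help. It is not the authors' implementation: the metrics (`latency_ms`, `jitter_ms`), the 50 ms threshold, and the synthetic data are all assumptions made for the example, and a small hand-written logistic regression stands in for whatever library the original work used.

```python
import math
import random


def label_by_threshold(values, threshold):
    """Label a sample anomalous (1) when its metric exceeds the threshold."""
    return [1 if v > threshold else 0 for v in values]


def standardize(rows, means=None, stds=None):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    if means is None:
        means = [sum(c) / len(c) for c in cols]
        stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
                for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(rows, weights, bias):
    """Classify as anomalous when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0
            for x in rows]


def run_demo(seed=0):
    """Train on synthetic traffic and return accuracy on held-out samples."""
    rng = random.Random(seed)
    # Synthetic (latency_ms, jitter_ms) samples: 160 normal, 40 congested.
    # These distributions are invented for illustration only.
    samples = [(rng.gauss(20, 5), rng.gauss(2, 0.5)) for _ in range(160)]
    samples += [(rng.gauss(80, 10), rng.gauss(8, 2)) for _ in range(40)]
    rng.shuffle(samples)
    # Assumed labeling rule: latency above 50 ms counts as an anomaly.
    labels = label_by_threshold([s[0] for s in samples], threshold=50.0)
    # Scale with training-set statistics only, then reuse them for the test set.
    train_x, means, stds = standardize(samples[:150])
    test_x, _, _ = standardize(samples[150:], means, stds)
    weights, bias = train_logistic_regression(train_x, labels[:150])
    predicted = predict(test_x, weights, bias)
    correct = sum(p == y for p, y in zip(predicted, labels[150:]))
    return correct / len(predicted)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 91% to 97% figures reported above. In practice, a library such as scikit-learn would likely replace the hand-written training loop, and Random Forest or Support Vector Machine classifiers could be swapped in for comparison.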
