Introduction and Performance Comparison of Various Outlier Detection Models

1. Introduction

Anomaly detection, also called outlier detection, is the process of identifying data points, observations, or events that deviate from the normal behaviour or distribution of a dataset. Anomalous data can indicate critical incidents such as fraudulent transactions, network intrusions, or technical failures. In contrast to standard classification or prediction tasks, anomaly detection is often applied to unlabelled datasets, taking only the internal structure and correlations of the data into account.
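To make the idea of flagging points from the data's own structure concrete, here is a minimal sketch using the classic interquartile-range (IQR) rule. It is not one of the models compared in this article; the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample `readings` are assumptions chosen only for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    No labels are needed: the thresholds come from the data itself.
    The multiplier k=1.5 is a common convention, assumed here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one obviously unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
print(iqr_outliers(readings))
```

A rule like this only looks at one variable at a time, so it would likely miss anomalies that show up only in the relationship between several features.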

Photo by Will Myers on Unsplash

Numerous machine learning models are suitable for outlier detection. However, supervised models are more restrictive than unsupervised ones because they require labelled datasets. This requirement is particularly expensive when…

A Python library to achieve that with only one line of code

Photo by David Becker on Unsplash

Pandas Library

Pandas is the most popular library for data wrangling and processing in Python. It offers many functions that make data manipulation and transformation simple and flexible. However, Pandas is known to have scalability and efficiency issues.

By default, Pandas executes its functions as a single process on one CPU core, so it does not natively take advantage of all the cores and computing power available on your system. When handling large datasets or extensive calculations, Pandas can become very slow.
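The general idea behind spreading such work over several cores can be sketched with the Python standard library alone. This is not how Pandas or any particular library does it; the helper names (`split_into_chunks`, `chunk_sum_of_squares`, `parallel_sum_of_squares`) and the sum-of-squares workload are hypothetical, chosen only to show the split, process, and combine pattern.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal chunks."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def chunk_sum_of_squares(chunk):
    """Work done independently on one chunk (runs in a worker process)."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, n_workers=None):
    """Process chunks in separate processes, then combine the partial results."""
    n_workers = n_workers or os.cpu_count() or 1
    chunks = split_into_chunks(values, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(chunk_sum_of_squares, chunks))
```

Calling `parallel_sum_of_squares(list(range(1_000_000)))` from a script's `if __name__ == "__main__":` block would distribute the chunks across worker processes. For small inputs the cost of starting processes and copying data usually outweighs any gain, so parallelism tends to pay off only on large workloads.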

Modin Library

Modin is a new lightweight library designed to parallelize and…

Nate Dong, Ph.D.

A full stack data scientist
