A Simple Way to Find Outliers in an Array with Python

Using a basic definition of an outlier, we can write a simple Python function to detect such values and highlight them on a plot.

In statistics, an outlier is an observation point that is distant from other observations. An outlier may be due to variability in the measurement or it may indicate experimental error.

The definition of an outlier is somewhat vague and subjective, so it helps to have a rule for deciding whether a data point truly is an outlier. One very simple rule uses the interquartile range (IQR): IQR = Q_3 - Q_1, where Q_1 and Q_3 are the lower and upper quartiles.
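As a quick sketch of the IQR computation, NumPy's `np.percentile` gives the quartiles directly (the array here is a made-up example, not data from the article):

```python
import numpy as np

# Hypothetical sample data with one visibly distant value.
data = np.array([2, 3, 5, 6, 7, 8, 9, 12, 15, 40])

# Q1 and Q3 are the 25th and 75th percentiles; IQR is their difference.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
print(q1, q3, iqr)
```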

The rule of thumb is to define as a suspected outlier any data point outside the interval [Q_1 - 1.5 * IQR, Q_3 + 1.5 * IQR].
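A minimal implementation of this rule of thumb might look as follows; the function name and the sample array are illustrative choices, not fixed by the article:

```python
import numpy as np

def find_outliers_iqr(values, k=1.5):
    """Return a boolean mask marking suspected outliers by the IQR rule.

    A point is flagged if it falls outside [Q1 - k*IQR, Q3 + k*IQR];
    k=1.5 is the conventional rule-of-thumb multiplier.
    """
    values = np.asarray(values)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical sample data: only 40 lies outside the interval.
data = np.array([2, 3, 5, 6, 7, 8, 9, 12, 15, 40])
mask = find_outliers_iqr(data)
print(data[mask])
```

The boolean mask makes highlighting on a plot straightforward, e.g. with matplotlib one could draw `plt.scatter(np.arange(len(data))[mask], data[mask], color='red')` over the ordinary points.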

Any potential outlier flagged by this method should be examined in the context of the entire data set. Outliers are not necessarily erroneous; sometimes they indicate something interesting in the data and provide knowledge you can act on.