Statistics: A Concise Tutorial

Statistics - Outlier Function

An outlier is a data value that lies more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile. Specifically, a value is an outlier if it is less than ${Q_1 - 1.5 \times IQR}$ or greater than ${Q_3 + 1.5 \times IQR}$.

Outliers are identified by the following rule:

Formula

${x < Q_1 - 1.5 \times IQR \quad \text{or} \quad x > Q_3 + 1.5 \times IQR}$

Where:

  1. ${Q_1}$ = First Quartile

  2. ${Q_3}$ = Third Quartile

  3. ${IQR}$ = Interquartile Range, ${Q_3 - Q_1}$
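As a minimal sketch, the rule above can be written as a small Python helper. The function names `outlier_fences` and `is_outlier` and the parameter `k` are illustrative assumptions, not part of the tutorial.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """Return True if x lies strictly outside the two limits."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper
```

For instance, with hypothetical quartiles 10 and 20, `outlier_fences(10, 20)` returns `(-5.0, 35.0)`, so any value below -5 or above 35 would be flagged.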

Example

Problem Statement:

Consider a data set that records the periodic task counts of 8 different students: 11, 13, 15, 3, 16, 25, 12 and 14. Find the outliers among these task counts.

Solution:

The given data set is:

11, 13, 15, 3, 16, 25, 12, 14

Arranged in ascending order:

3, 11, 12, 13, 14, 15, 16, 25

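First Quartile Value (${Q_1}$)

The lower half of the ordered data is 3, 11, 12, 13, and its median is the first quartile:

${Q_1 = \frac{11 + 12}{2} = 11.5}$

Third Quartile Value (${Q_3}$)

The upper half of the ordered data is 14, 15, 16, 25, and its median is the third quartile:

${Q_3 = \frac{15 + 16}{2} = 15.5}$

Interquartile Range (${IQR}$)

${IQR = Q_3 - Q_1 = 15.5 - 11.5 = 4}$

Lower Outlier Range (L)

${L = Q_1 - 1.5 \times IQR = 11.5 - (1.5 \times 4) = 5.5}$

Upper Outlier Range (U)

${U = Q_3 + 1.5 \times IQR = 15.5 + (1.5 \times 4) = 21.5}$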
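The quartile and limit calculations can be checked with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which is the convention consistent with the limits 5.5 and 21.5 used in this example. Other quantile conventions (for instance, the defaults of `statistics.quantiles` or `numpy.percentile`) can give slightly different quartiles. The helper name `quartiles_by_halves` is hypothetical.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for an odd number of values, the middle value is
    left out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]   # smallest `half` values
    upper_half = ordered[-half:]  # largest `half` values
    return median(lower_half), median(upper_half)


data = [11, 13, 15, 3, 16, 25, 12, 14]
q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
print(q1, q3, iqr, lower_limit, upper_limit)  # 11.5 15.5 4.0 5.5 21.5
```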

Every value in the data set lies between 5.5 and 21.5 except 3 and 25: 3 is less than the lower limit 5.5, and 25 is greater than the upper limit 21.5.

Therefore, 3 and 25 are the outliers.
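The final comparison can also be sketched in Python, taking the limits 5.5 and 21.5 from this example as given. The variable names are illustrative.

```python
data = [11, 13, 15, 3, 16, 25, 12, 14]
lower_limit, upper_limit = 5.5, 21.5  # limits from the worked example

# A value is an outlier if it falls strictly outside the limits.
outliers = [x for x in data if x < lower_limit or x > upper_limit]
inliers = [x for x in data if lower_limit <= x <= upper_limit]

print(outliers)  # [3, 25]
print(inliers)   # [11, 13, 15, 16, 12, 14]
```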