Statistics: A Concise Tutorial

Statistics - Kolmogorov-Smirnov Test

This test is used when an observed sample distribution must be compared with a theoretical distribution.

K-S One Sample Test

This test is used as a goodness-of-fit test and is well suited to small samples. It compares the cumulative distribution function of a variable with that of a specified distribution. The null hypothesis assumes no difference between the observed and theoretical distributions, and the test statistic D is calculated as:

Formula

${D = \max |F_o(X) - F_r(X)|}$

Where −

${F_o(X)}$ = Observed cumulative frequency distribution of a random sample of n observations, given by ${F_o(X) = \frac{k}{n}}$, where k is the number of observations ≤ X and n is the total number of observations.
${F_r(X)}$ = The theoretical cumulative frequency distribution.
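As a sketch of the one-sample statistic for raw (ungrouped) observations, the Python function below computes ${D}$ against a continuous theoretical CDF. The function name `ks_one_sample_d` and the Uniform(0, 1) test distribution are illustrative assumptions, not part of the original text. Because ${F_o(X)}$ jumps by 1/n at each observation, the code checks the gap on both sides of every step.

```python
def ks_one_sample_d(data, cdf):
    """Return D = max |F_o(X) - F_r(X)| for raw observations.

    Assumption: `cdf` is a continuous theoretical CDF F_r. F_o is a step
    function, so the maximum gap occurs just after a step (i/n) or just
    before it ((i - 1)/n) at some observed value.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f_r = cdf(x)
        # Compare F_r with F_o just after (i/n) and just before ((i-1)/n) x.
        d = max(d, i / n - f_r, f_r - (i - 1) / n)
    return d


def uniform_cdf(x):
    """Hypothetical theoretical distribution: Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


# D is about 0.3 here, reached at the last point: 1 - 0.7.
print(ks_one_sample_d([0.1, 0.4, 0.7], uniform_cdf))
```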

The critical value of ${D}$ is found from the K-S table of values for the one-sample test.

Acceptance Criteria: If the calculated value is less than the critical value, accept the null hypothesis.

Rejection Criteria: If the calculated value is greater than the critical (table) value, reject the null hypothesis.
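The acceptance and rejection criteria can be sketched in Python as follows. The helper names are hypothetical. The 1.36/√n critical value is the common large-sample approximation at the 5% level and is an assumption of this sketch; for small samples the K-S table should be used, as stated above.

```python
import math


def ks_critical_value_05(n):
    """Approximate one-sample K-S critical value at the 5% level.

    Assumption: uses the large-sample formula 1.36 / sqrt(n); for small n
    (roughly 35 or fewer) read the value from the K-S table instead.
    """
    if n <= 35:
        raise ValueError("small sample: look up D in the K-S table")
    return 1.36 / math.sqrt(n)


def ks_decision(d, critical):
    """Apply the acceptance/rejection criteria to a calculated D."""
    # The text covers D < critical (accept) and D > critical (reject);
    # treating an exact tie as "accept" is an assumption of this sketch.
    return "reject H0" if d > critical else "accept H0"


print(ks_decision(0.10, ks_critical_value_05(100)))  # critical value is 0.136
```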

Example

Problem Statement:

In a study covering various streams of a college, 60 students, with an equal number drawn from each stream, were interviewed and their intention to join the college Drama Club was noted.

Stream	B.Sc.	B.A.	B.Com	M.A.	M.Com
No. interested in joining	5	9	11	16	19

It was expected that 12 students from each class would join the Drama Club. Use the K-S test to determine whether student classes differ in their intention of joining the Drama Club.

Solution:

${H_o}$: There is no difference among students of different streams with respect to their intention of joining the Drama Club.

We compute the cumulative frequencies for the observed and theoretical distributions.

Stream	No. interested, Observed (O)	No. interested, Theoretical (T)	${F_O(X)}$	${F_T(X)}$	${|F_O(X) - F_T(X)|}$
B.Sc.	5	12	5/60	12/60	7/60
B.A.	9	12	14/60	24/60	10/60
B.Com	11	12	25/60	36/60	11/60
M.A.	16	12	41/60	48/60	7/60
M.Com	19	12	60/60	60/60	0/60
Total	n = 60	60
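The cumulative frequencies in this table can be reproduced with a short Python sketch that uses exact fractions. The variable names are illustrative, and the data are the observed and expected counts from the problem statement.

```python
from fractions import Fraction

streams = ["B.Sc.", "B.A.", "B.Com", "M.A.", "M.Com"]
observed = [5, 9, 11, 16, 19]      # students intending to join, per stream
expected = [12, 12, 12, 12, 12]    # 12 expected from each stream

n = sum(observed)      # 60
n_t = sum(expected)    # 60
cum_o = cum_t = 0
d = Fraction(0)
for stream, o, t in zip(streams, observed, expected):
    cum_o += o   # running total behind F_O(X)
    cum_t += t   # running total behind F_T(X)
    d = max(d, abs(Fraction(cum_o, n) - Fraction(cum_t, n_t)))
    print(f"{stream:6} F_O={cum_o}/{n}  F_T={cum_t}/{n_t}  "
          f"|diff|={abs(cum_o - cum_t)}/{n}")

print("D =", d, "approx", round(float(d), 3))  # D = 11/60 approx 0.183
```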

The test statistic ${|D|}$ is calculated as:

${D = \max |F_O(X) - F_T(X)| = \frac{11}{60} \approx 0.183}$

The table value of D at the 5% significance level is given by the large-sample approximation

${D_{0.05} = \frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{60}} \approx 0.176}$

Since the calculated value is greater than the critical value, we reject the null hypothesis and conclude that students of different streams differ in their intention of joining the Drama Club.
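A quick numeric check of this decision, assuming the 1.36/√n large-sample critical value at the 5% level and the D value of 11/60 from the table:

```python
import math
from fractions import Fraction

n = 60
d_calculated = Fraction(11, 60)    # max |F_O(X) - F_T(X)| from the table
d_critical = 1.36 / math.sqrt(n)   # assumed 5% large-sample approximation

print(f"D = {float(d_calculated):.3f}, critical = {d_critical:.3f}")
print("reject H0" if d_calculated > d_critical else "accept H0")
```

This prints D = 0.183 against a critical value of 0.176, so the null hypothesis is rejected.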

K-S Two Sample Test

When there are two independent samples instead of one, the K-S two-sample test can be used to test the agreement between the two cumulative distributions. The null hypothesis states that there is no difference between the two distributions. The D statistic is calculated in the same manner as in the K-S one-sample test.

Formula

${D = \max |F_{n_1}(X) - F_{n_2}(X)|}$

Where −

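${F_{n_1}(X)}$ = Observed cumulative frequency distribution of the first sample, where ${n_1}$ is the number of observations in the first sample.
${F_{n_2}(X)}$ = Observed cumulative frequency distribution of the second sample, where ${n_2}$ is the number of observations in the second sample.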
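For raw data, the two-sample statistic can be sketched in Python as below. The function name is hypothetical. The function evaluates both empirical cumulative distributions at every pooled observation. This is enough to find the maximum because both are step functions that change only at observed values.

```python
from bisect import bisect_right
from fractions import Fraction


def ks_two_sample_d(sample1, sample2):
    """Return D = max |F_n1(X) - F_n2(X)| for two independent samples."""
    a, b = sorted(sample1), sorted(sample2)
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples need at least one observation")
    d = Fraction(0)
    for x in a + b:
        f1 = Fraction(bisect_right(a, x), n1)  # share of sample 1 that is <= x
        f2 = Fraction(bisect_right(b, x), n2)  # share of sample 2 that is <= x
        d = max(d, abs(f1 - f2))
    return d


print(ks_two_sample_d([1, 2, 3, 4], [3, 4, 5, 6]))  # 1/2
```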

A large maximum deviation ${|D|}$ between the two cumulative distributions indicates a difference between the two sample distributions.

When ${n_1 = n_2}$ and the sample size is ≤ 40, the critical value of D is taken from the K-S table for the two-sample case. When ${n_1}$ and/or ${n_2}$ > 40, the K-S table for large samples in the two-sample test should be used. The null hypothesis is accepted if the calculated value is less than the table value, and rejected if it is greater.
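For the large-sample case, the 5% critical value is commonly approximated by ${1.36\sqrt{(n_1 + n_2)/(n_1 n_2)}}$. This formula is an assumption added here, not something the text states, and the helper name is hypothetical. The sketch below computes this value and applies the same decision rule.

```python
import math


def ks_two_sample_critical_05(n1, n2):
    """Approximate two-sample K-S critical value at the 5% level.

    Assumption: uses the common large-sample formula
    1.36 * sqrt((n1 + n2) / (n1 * n2)), intended for n1 and/or n2 > 40.
    """
    return 1.36 * math.sqrt((n1 + n2) / (n1 * n2))


crit = ks_two_sample_critical_05(50, 50)
print(round(crit, 3))                            # 0.272
print("reject H0" if 0.30 > crit else "accept H0")  # reject H0
```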

Thus, any of these nonparametric tests helps a researcher test the significance of results when the characteristics of the target population are unknown or no assumptions have been made about them.