Statistics Concise Tutorial

Statistics - Goodness of Fit

The goodness of fit test checks whether sample data fits the distribution of a population, for example a normal distribution or a Weibull distribution. In simple terms, it indicates whether the sample data correctly represents the data we expect to find in the actual population. Statisticians generally use the following tests:

  1. Chi-square

  2. Kolmogorov-Smirnov

  3. Anderson-Darling

  4. Shapiro-Wilk

Chi-square Test

The chi-square test is the most commonly used goodness of fit test. It is used for discrete distributions such as the binomial distribution and the Poisson distribution, whereas the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests are used for continuous distributions.

Formula

$ {X^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}} $

Where:

  1. ${O_i}$ = observed value of the $i$-th level of the variable.

  2. ${E_i}$ = expected value of the $i$-th level of the variable.

  3. ${X^2}$ = chi-square random variable (the test statistic).
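
The statistic built from these terms, the sum over all levels of $(O_i - E_i)^2 / E_i$, can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `chi_square_statistic` and the two-level counts in the usage line are hypothetical and chosen only for the example.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all levels."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical two-level data: 40 and 60 observed against 50 and 50 expected.
# Each level contributes (10^2) / 50 = 2, so the statistic is 4.
print(chi_square_statistic([40, 60], [50, 50]))
```

Larger values of the statistic mean the observed counts sit further from the expected counts.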

Example

A toy company builds football player toys. It claims that 30% of the toys are mid-fielders, 60% are defenders, and 10% are forwards. A random sample of 100 toys contains 50 mid-fielders, 45 defenders, and 5 forwards. Given a 0.05 level of significance, can you justify the company's claim?

Solution:

Determine Hypotheses

  1. *Null hypothesis $H_0$* - The proportions of mid-fielders, defenders, and forwards are 30%, 60%, and 10%, respectively.

  2. *Alternative hypothesis $H_1$* - At least one of the proportions in the null hypothesis is false.

Determine Degrees of Freedom

The degrees of freedom (DF) equal the number of levels (k) of the categorical variable minus 1: DF = k - 1. Here there are 3 levels. Thus

$ {DF = k - 1 = 3 - 1 = 2} $

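Determine chi-square test statistic

Under the null hypothesis, the expected counts in a sample of 100 toys are $100 \times 0.30 = 30$ mid-fielders, $100 \times 0.60 = 60$ defenders, and $100 \times 0.10 = 10$ forwards. Thus

$ {X^2 = \frac{(50 - 30)^2}{30} + \frac{(45 - 60)^2}{60} + \frac{(5 - 10)^2}{10} = 13.33 + 3.75 + 2.50 = 19.58} $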
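
The arithmetic for this example can be checked with a short Python sketch. It takes the claimed proportions and the observed counts from the problem statement, derives the expected counts, and sums the per-category contributions. The variable names (`claimed`, `observed`, `contributions`, `chi_square`) are illustrative choices, not part of the original tutorial.

```python
# Claimed proportions and observed counts from the toy example.
claimed = {"mid-fielders": 0.30, "defenders": 0.60, "forwards": 0.10}
observed = {"mid-fielders": 50, "defenders": 45, "forwards": 5}

n = sum(observed.values())  # sample size, 100 toys

# Expected count for each category if the company's claim holds.
expected = {name: n * p for name, p in claimed.items()}

# Contribution of each category: (O_i - E_i)^2 / E_i.
contributions = {
    name: (observed[name] - expected[name]) ** 2 / expected[name]
    for name in claimed
}
chi_square = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name}: expected {expected[name]:.0f}, contribution {value:.2f}")
print(f"chi-square statistic: {chi_square:.2f}")
```

Most of the statistic comes from the mid-fielders, where the sample has 50 toys against 30 expected.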

Determine p-value

The P-value is the probability that a chi-square statistic $X^2$ with 2 degrees of freedom is more extreme than 19.58. Use the Chi-Square Distribution Calculator to find $ { P(X^2 \gt 19.58) \approx 0.0001 } $.
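
If no distribution calculator is at hand, the tail probability can be sketched with the Python standard library. This relies on a closed form that holds only when the degrees of freedom are even: $P(X^2 \gt x) = e^{-x/2} \sum_{j=0}^{DF/2 - 1} (x/2)^j / j!$, which reduces to $e^{-x/2}$ for DF = 2. The function name `chi_square_sf_even_df` is a hypothetical helper. For odd degrees of freedom a statistics library would be needed instead.

```python
import math


def chi_square_sf_even_df(x, df):
    """P(X^2 > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Finite sum with df/2 terms; for df = 2 it is just the constant 1.
    series = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * series


p_value = chi_square_sf_even_df(19.58, 2)
print(f"p-value rounded to four decimals: {p_value:.4f}")
```

At full precision the value is slightly below 0.0001. It rounds to the 0.0001 quoted above.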

Interpret results

As the P-value (0.0001) is much smaller than the significance level (0.05), the null hypothesis is rejected. Thus the sample does not support the company's claim.
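
The decision rule used in this step can be written as a small Python sketch. The helper name `decide` and its return strings are illustrative assumptions. The rule itself is simply a comparison of the P-value with the chosen significance level.

```python
def decide(p_value, alpha=0.05):
    """Compare a P-value with the significance level and report the decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


# P-value and significance level from the toy example.
print(decide(0.0001, alpha=0.05))
```

A P-value above the significance level would only mean that the sample gives no evidence against the claim, not that the claim has been proven.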