Python Data Science Concise Tutorial

Python - Chi-Square Test

The chi-square test is a statistical method for determining whether two categorical variables are significantly associated. Both variables should come from the same population and be categorical, such as Yes/No, Male/Female, or Red/Green. For example, we can build a data set of observations on people's ice-cream purchases and test whether a person's gender is associated with the flavour of ice-cream they prefer. If an association is found, we can plan the stock of each flavour from the expected gender mix of visitors.
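As a minimal sketch of the ice-cream example, the test of independence can be run with `scipy.stats.chi2_contingency` on a table of observed counts. The counts below are made up purely for illustration, and the flavour names are assumptions that do not come from any real data set.

```python
import numpy as np
from scipy import stats

# Hypothetical counts, invented for illustration only.
# Rows: gender; columns: preferred flavour (vanilla, chocolate, strawberry).
observed = np.array([
    [30, 45, 25],   # male
    [35, 30, 35],   # female
])

# chi2_contingency returns the test statistic, the p-value, the degrees of
# freedom, and the counts expected if the two variables were independent.
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"chi-square statistic: {chi2_stat:.3f}")
print(f"degrees of freedom:   {dof}")
print(f"p-value:              {p_value:.4f}")
print("expected counts under independence:")
print(expected)
```

A small p-value (compared with a chosen significance level such as 0.05) would suggest that gender and flavour preference are associated; a large one means the data do not show a significant association.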

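The test is based on the chi-square distribution. The code below uses the chi2 distribution from SciPy's scipy.stats module, together with NumPy and Matplotlib, to plot the chi-square probability density for several degrees of freedom.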

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

# Evaluate each density on a grid of 100 points between 0 and 10
x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(1, 1)

linestyles = [':', '--', '-.', '-']
deg_of_freedom = [1, 4, 7, 6]
for df, ls in zip(deg_of_freedom, linestyles):
    # Label each curve so that the legend has entries to display
    ax.plot(x, stats.chi2.pdf(x, df), linestyle=ls, label=f'df = {df}')

ax.set_xlim(0, 10)
ax.set_ylim(0, 0.4)

ax.set_xlabel('Value')
ax.set_ylabel('Probability density')
ax.set_title('Chi-Square Distribution')

ax.legend()
plt.show()

Its output is as follows:

[Figure: chi-square probability density curves for 1, 4, 7 and 6 degrees of freedom]
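The same `stats.chi2` distribution object can turn a test statistic into a decision. The sketch below assumes a significance level of 0.05 and uses a hypothetical statistic value chosen only for illustration; it computes the critical value with `ppf` and the p-value with `sf` (the survival function, 1 minus the CDF).

```python
from scipy import stats

df = 4            # degrees of freedom (one of the values plotted above)
alpha = 0.05      # assumed significance level
statistic = 9.2   # hypothetical test statistic, for illustration only

# Critical value: the point with probability alpha in the upper tail
critical_value = stats.chi2.ppf(1 - alpha, df)

# p-value: probability of a value at least as large as the statistic
p_value = stats.chi2.sf(statistic, df)

print(f"critical value: {critical_value:.4f}")
print(f"p-value:        {p_value:.4f}")
print("reject null hypothesis:", statistic > critical_value)
```

Comparing the statistic with the critical value and comparing the p-value with alpha are equivalent ways of making the same decision.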