Python Data Science 简明教程

Python - P-Value

p 值与假设的力度有关。我们基于某些统计模型构建假设,并使用 p 值比较模型的有效性。获得 p 值的一种方法是使用 T 检验。

The p-value is about the strength of a hypothesis. We build hypothesis based on some statistical model and compare the model’s validity using p-value. One way to get the p-value is by using T-test.

这是一种针对空假设的双边检验,即独立观测样本 “a” 的期望值(平均值)等于给定的总体平均值 “ popmean ”。让我们考虑以下示例。

This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations ‘a’ is equal to the given population mean, popmean. Let us consider the following example.

from scipy import stats
rvs = stats.norm.rvs(loc = 5, scale = 10, size = (50,2))
print stats.ttest_1samp(rvs,5.0)

上述程序将生成以下输出。

The above program will generate the following output.

Ttest_1sampResult(statistic = array([-1.40184894, 2.70158009]),
pvalue = array([ 0.16726344, 0.00945234]))

Comparing two samples

在以下示例中,有两个样本,它们可以来自相同或不同分布,我们希望测试这些样本是否具有相同的统计特性。

In the following examples, there are two samples, which can come either from the same or from different distribution, and we want to test whether these samples have the same statistical properties.

ttest_ind − 计算两个独立分数样本均值的 T 检验。这是一个双侧检验,用于检验两个独立样本具有相同的平均(预期)值的原假设。此检验默认情况下假设总体具有相同的方差。

ttest_ind − Calculates the T-test for the means of two independent samples of scores. This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.

如果我们观察到来自相同或不同总体中的两个独立样本,我们可以使用此检验。让我们考虑以下示例。

We can use this test, if we observe two independent samples from the same or different population. Let us consider the following example.

from scipy import stats
rvs1 = stats.norm.rvs(loc = 5,scale = 10,size = 500)
rvs2 = stats.norm.rvs(loc = 5,scale = 10,size = 500)
print stats.ttest_ind(rvs1,rvs2)

上述程序将生成以下输出。

The above program will generate the following output.

Ttest_indResult(statistic = -0.67406312233650278, pvalue = 0.50042727502272966)

您可以使用长度相同但具有不同均值的新数组来测试相同的内容。在 loc 中使用不同的值并测试相同的内容。

You can test the same with a new array of the same length, but with a varied mean. Use a different value in loc and test the same.