Python Data Science: A Concise Tutorial

Python - Measuring Variance

In statistics, variance measures how far the values in a data set lie, on average, from the mean. In other words, it indicates how dispersed the values are. In practice this spread is usually reported as the standard deviation, which is the square root of the variance. Another commonly used measure is skewness, which describes how asymmetric the distribution is rather than how spread out it is.

Both standard deviation and skewness can be calculated with functions available in the pandas library.

Measuring Standard Deviation

Standard deviation is the square root of the variance. Variance is the average of the squared differences between each value in a data set and the mean. In Python we calculate the standard deviation with the std() function from the pandas library. By default std() computes the sample standard deviation, dividing by n - 1 rather than n (ddof=1).

import pandas as pd

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','Chanchal','Gasper','Naviya','Andres']),
   'Age':pd.Series([25,26,25,23,30,25,23,34,40,30,25,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

#Create a DataFrame
df = pd.DataFrame(d)

# Calculate the standard deviation of the numeric columns
# (numeric_only=True skips the non-numeric 'Name' column)
print(df.std(numeric_only=True))

Its output is as follows:

Age       7.265527
Rating    0.661628
dtype: float64
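
To make the definition concrete, the Age figure can be reproduced by hand. The following is a minimal sketch, not part of the original listing: the variable names (ages, squared_diffs, sample_std, population_std) are chosen here for illustration, and it assumes the default pandas behaviour of dividing by n - 1.

```python
import math
import pandas as pd

# Same Age values as in the example above
ages = pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])

# Squared difference of every value from the mean
mean_age = ages.sum() / len(ages)
squared_diffs = [(x - mean_age) ** 2 for x in ages]

# Sample variance divides by n - 1, which is what pandas does by default (ddof=1)
sample_var = sum(squared_diffs) / (len(ages) - 1)
sample_std = math.sqrt(sample_var)

# Population variance divides by n, which corresponds to ddof=0 in pandas
population_std = math.sqrt(sum(squared_diffs) / len(ages))

print(round(sample_std, 6), round(ages.std(), 6))
print(round(population_std, 6), round(ages.std(ddof=0), 6))
```

The first pair of numbers should agree with the Age value of 7.265527 printed by df.std(). The second pair is slightly smaller, because dividing by n instead of n - 1 gives the population standard deviation.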

Measuring Skewness

Skewness is used to determine whether the data is symmetric or skewed. A value of 0 indicates a perfectly symmetric distribution. As a rule of thumb, if the skewness is between -1 and 1, the distribution is treated as approximately symmetric. If it is -1 or less, the distribution is skewed to the left, and if it is 1 or more, it is skewed to the right.
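
The rule of thumb above can be written as a small helper. This is only a sketch: classify_skew is a hypothetical function name introduced here, not a pandas function, and the sample values passed to it are made up for illustration.

```python
def classify_skew(skewness):
    """Label a skewness value using the -1 / 1 rule of thumb."""
    if skewness <= -1:
        return "skewed left"
    if skewness >= 1:
        return "skewed right"
    return "approximately symmetric"

# Illustrative values only, not taken from any data set
for value in (-1.2, 0.3, 1.0):
    print(value, classify_skew(value))
```

The boundaries are inclusive on both sides, so a value of exactly 1 or -1 counts as skewed.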

import pandas as pd

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack',
   'Lee','Chanchal','Gasper','Naviya','Andres']),
   'Age':pd.Series([25,26,25,23,30,25,23,34,40,30,25,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

#Create a DataFrame
df = pd.DataFrame(d)
# Calculate the skewness of the numeric columns
print(df.skew(numeric_only=True))

Its output is as follows:

Age       1.443490
Rating   -0.153629
dtype: float64

So the distribution of Rating is approximately symmetric (skewness of about -0.15), while the distribution of Age is skewed to the right (skewness of about 1.44).
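
For readers who want to see where the Age figure comes from, here is a minimal sketch that recomputes it by hand. It assumes pandas reports the bias-adjusted sample skewness (the adjusted Fisher-Pearson coefficient). The variable names m2, m3, g1 and adjusted_skew are chosen here for illustration.

```python
import pandas as pd

# Same Age values as in the example above
ages = pd.Series([25, 26, 25, 23, 30, 25, 23, 34, 40, 30, 25, 46])

n = len(ages)
deviations = ages - ages.mean()

# Second and third central moments (averages of squared and cubed deviations)
m2 = (deviations ** 2).mean()
m3 = (deviations ** 3).mean()

# Unadjusted skewness, then the small-sample correction factor
g1 = m3 / m2 ** 1.5
adjusted_skew = (n * (n - 1)) ** 0.5 / (n - 2) * g1

print(round(adjusted_skew, 6), round(ages.skew(), 6))
```

Both numbers should agree with the Age value of 1.443490 shown above. The positive third moment comes mostly from the few large ages (40 and 46), which pull the right tail out.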