Machine Learning: A Concise Tutorial

Machine Learning - Skewness and Kurtosis

Skewness and kurtosis are two important measures of the shape of a probability distribution in machine learning.

Skewness measures the asymmetry of a distribution about its mean. Skewness can be positive, indicating that the tail of the distribution is longer on the right-hand side, or negative, indicating that the tail is longer on the left-hand side. A perfectly symmetrical distribution has a skewness of zero, although a skewness of zero does not by itself guarantee symmetry.
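As a minimal sketch of the sign convention described above, the following computes skewness from the second and third central moments and compares it with SciPy's skew(). The helper name sample_skewness, the exponential test data, and the seed are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np
from scipy.stats import skew

def sample_skewness(x):
    """Moment-based sample skewness m3 / m2**1.5 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (variance)
    m3 = np.mean(centered ** 3)  # third central moment
    return m3 / m2 ** 1.5

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
right_tailed = rng.exponential(scale=1.0, size=10_000)  # long right tail
left_tailed = -right_tailed                             # mirror image, long left tail

print('Right-tailed:', sample_skewness(right_tailed), skew(right_tailed))
print('Left-tailed: ', sample_skewness(left_tailed), skew(left_tailed))
```

The right-tailed sample should give a positive value and its mirror image the same value with the sign flipped. With its default settings, skew() uses the same moment ratio, so the two columns should agree.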

Kurtosis is often described as the peakedness of a distribution, but it is primarily a measure of how heavy the tails are. A distribution with high kurtosis has heavier tails than a normal distribution and produces more extreme values, while a distribution with low kurtosis has lighter tails. A normal distribution has a kurtosis of 3, so it is common to report excess kurtosis (kurtosis minus 3) instead: positive excess kurtosis indicates heavier-than-normal tails, negative excess kurtosis indicates lighter-than-normal tails, and a normal distribution has an excess kurtosis of zero.
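To make the comparison with the normal distribution concrete, here is a small sketch that draws samples from a light-tailed, a normal, and a heavy-tailed distribution and reports kurtosis both ways. SciPy's kurtosis() returns excess kurtosis by default (fisher=True) and plain kurtosis with fisher=False. The choice of distributions, sample size, and seed are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n = 100_000

samples = {
    'uniform (light tails)': rng.uniform(-1, 1, n),
    'normal': rng.normal(0, 1, n),
    'laplace (heavy tails)': rng.laplace(0, 1, n),
}

results = {}
for name, x in samples.items():
    excess = kurtosis(x)                 # default fisher=True: normal is about 0
    pearson = kurtosis(x, fisher=False)  # normal is about 3
    results[name] = (excess, pearson)
    print(f'{name:22s} excess={excess:+.3f}  kurtosis={pearson:.3f}')
```

The uniform sample should show negative excess kurtosis, the normal sample a value near zero, and the Laplace sample a clearly positive value. The two columns always differ by exactly 3.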

Both skewness and kurtosis can have important implications for machine learning algorithms, as they can affect the assumptions of the models and the accuracy of the predictions. For example, a highly skewed distribution may require data transformation or the use of non-parametric methods, while a highly kurtotic distribution may require different statistical models or more robust estimation methods.
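One common data transformation for right-skewed, strictly positive data is the logarithm. The sketch below is only an illustration under that assumption: it uses synthetic log-normal data (chosen because its logarithm is exactly normal) and an arbitrary seed, and real data will rarely respond this cleanly.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Synthetic, strongly right-skewed data (all values are positive)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# A log transform compresses the long right tail
transformed = np.log(raw)

print('Skewness before transform:', skew(raw))
print('Skewness after transform: ', skew(transformed))
```

The skewness should drop from a large positive value to something close to zero. For data containing zeros or negative values, a plain logarithm is not defined and a different transformation would be needed.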

Example

In Python, the SciPy library provides functions for calculating the skewness and kurtosis of a dataset. For example, the following code calculates the skewness and kurtosis of a dataset using the skew() and kurtosis() functions:

import numpy as np
from scipy.stats import skew, kurtosis

# Generate a random dataset
data = np.random.normal(0, 1, 1000)

# Calculate the skewness and kurtosis of the dataset.
# Store the results under names that do not shadow the imported functions.
skewness_value = skew(data)
kurtosis_value = kurtosis(data)  # excess (Fisher) kurtosis by default

# Print the results
print('Skewness:', skewness_value)
print('Kurtosis:', kurtosis_value)

This code generates a random dataset of 1000 samples from a normal distribution with mean 0 and standard deviation 1. It then calculates the skewness and kurtosis of the dataset using the skew() and kurtosis() functions from the SciPy library. By default, kurtosis() returns excess kurtosis, for which a normal distribution scores zero. Finally, it prints the results to the console.

Output

On executing this code, you will get output similar to the following. The exact values differ from run to run because the dataset is random:

Skewness: -0.04119418903611285
Kurtosis: -0.1152250196054534

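Because the data is drawn from a normal distribution and kurtosis() reports excess kurtosis by default, both values should be close to zero. They are not exactly zero because they are estimated from a finite random sample.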
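How close to zero is "close"? As a rough guide, for samples of size n from a normal distribution the standard deviation of the sample skewness is approximately sqrt(6/n), and that of the sample excess kurtosis approximately sqrt(24/n). These are large-sample approximations. The sketch below checks them by simulation; the number of trials and the seed are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n, trials = 1000, 2000

# Each row is one independent dataset of n normal samples
draws = rng.normal(0, 1, size=(trials, n))

skews = skew(draws, axis=1)      # one skewness estimate per dataset
kurts = kurtosis(draws, axis=1)  # one excess-kurtosis estimate per dataset

print('Spread of skewness:       ', skews.std(), 'vs sqrt(6/n)  =', np.sqrt(6 / n))
print('Spread of excess kurtosis:', kurts.std(), 'vs sqrt(24/n) =', np.sqrt(24 / n))
```

For n = 1000 these approximations give about 0.08 for skewness and about 0.15 for excess kurtosis, so sample values of a few hundredths to about a tenth are consistent with normally distributed data.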