Machine Learning: A Concise Tutorial

Machine Learning - Percentiles

Percentiles are a statistical concept used in machine learning to describe the distribution of a dataset. A percentile is the value below which a given percentage of the observations in a dataset fall.

For example, the 25th percentile (also known as the first quartile) is the value below which 25% of the observations fall, while the 75th percentile (also known as the third quartile) is the value below which 75% of the observations fall.
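
As a quick check of this definition, the sketch below computes the 25th percentile of the integers 1 to 100 and then measures what fraction of the values lie below it. The dataset and the variable names are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

# Illustrative dataset (an assumption for this sketch): the integers 1..100
data = np.arange(1, 101)

# 25th percentile, using NumPy's default linear interpolation
p25 = np.percentile(data, 25)

# Fraction of observations that fall strictly below that value
fraction_below = np.mean(data < p25)

print('25th percentile:', p25)
print('Fraction of values below it:', fraction_below)
```

With 100 evenly spaced values, exactly a quarter of the observations fall below the computed percentile, which matches the definition.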

Percentiles can be used to summarize the distribution of a dataset and to identify outliers. In machine learning, they are often used during data preprocessing and exploratory data analysis to understand the data before modeling.
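
One common way to use percentiles for outlier detection is the interquartile range (IQR) rule. The sketch below is a minimal illustration, assuming the conventional 1.5 x IQR cutoff and a made-up dataset; neither the cutoff nor the data comes from the original tutorial.

```python
import numpy as np

# Made-up sample with one obviously extreme value (assumption for this sketch)
data = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])

# First and third quartiles
q1 = np.percentile(data, 25)
q3 = np.percentile(data, 75)
iqr = q3 - q1

# Conventional 1.5 * IQR fences; the multiplier is a rule of thumb, not a fixed law
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = data[(data < lower) | (data > upper)]
print('Potential outliers:', outliers)
```

Here only the value 102 falls outside the fences. Whether a flagged value is truly an error or a legitimate extreme still needs to be judged in context.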

Several Python libraries can calculate percentiles, including NumPy and pandas.

Calculating Percentiles using NumPy

Below is an example of how to calculate percentiles using NumPy:

Example

import numpy as np

# Sample dataset
data = np.array([1, 2, 3, 4, 5])

# np.percentile takes the percentile on a 0-100 scale
p25 = np.percentile(data, 25)
p75 = np.percentile(data, 75)
print('25th percentile:', p25)
print('75th percentile:', p75)

In this example, we create a sample dataset as a NumPy array and then calculate the 25th and 75th percentiles with the np.percentile() function.

Running the code prints the two percentile values:

25th percentile: 2.0
75th percentile: 4.0
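
np.percentile() also accepts a sequence of percentiles, and by default it interpolates linearly when a percentile falls between two data points. The sketch below illustrates both behaviors on a small four-element array chosen for this illustration (an assumption, not data from the original example).

```python
import numpy as np

# Four values, so the quartiles fall between data points (illustrative data)
data = np.array([1, 2, 3, 4])

# Passing a list returns one value per requested percentile
quartiles = np.percentile(data, [25, 50, 75])

print('Quartiles:', quartiles)
```

The 25th percentile here is 1.75 rather than one of the original values, because it sits three quarters of the way between 1 and 2 under linear interpolation.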

Calculating Percentiles using Pandas

Below is an example of how to calculate percentiles using pandas:

Example

import pandas as pd

# Sample dataset
data = pd.Series([1, 2, 3, 4, 5])

# quantile() takes a fraction on a 0-1 scale, not a percentage
p25 = data.quantile(0.25)
p75 = data.quantile(0.75)

print('25th percentile:', p25)
print('75th percentile:', p75)

In this example, we create a pandas Series and then calculate the 25th and 75th percentiles with its quantile() method. Note that quantile() expects a fraction between 0 and 1 (0.25), whereas np.percentile() expects a percentage between 0 and 100 (25).

Running the code prints the two percentile values:

25th percentile: 2.0
75th percentile: 4.0
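
For exploratory analysis it is often convenient to get several percentiles at once. The sketch below, which reuses the same sample values, shows two options that pandas offers: passing a list to quantile(), and calling describe(), whose summary includes the 25%, 50%, and 75% rows by default. The variable names are illustrative.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

# A list of fractions returns a Series indexed by those fractions
quartiles = data.quantile([0.25, 0.5, 0.75])
print(quartiles)

# describe() reports count, mean, std, min, max and the quartiles
summary = data.describe()
print(summary)
```

The quantile() result can be looked up by fraction (for example quartiles.loc[0.25]), while the describe() result uses string labels such as '25%'.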