Matplotlib Concise Tutorial

Matplotlib - Box Plots

A box plot summarizes the distribution of a dataset in a single graphic. It displays the five-number summary of the dataset: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box spans the interquartile range (IQR) from Q1 to Q3, and whiskers extend from the box toward the minimum and maximum values. Outliers, if present, are drawn as individual points beyond the whiskers, in which case each whisker stops at the most extreme value that is not an outlier.

Imagine you have the exam scores of students from three classes. A box plot shows how these scores are spread out −

  1. Minimum and Maximum − The smallest and largest scores are shown as the ends of the plot.

  2. Quartiles (Q1, Q2, Q3) − Three values split the sorted scores into four equal parts. The median (Q2) is the middle score. The first quartile (Q1) is the value below which about a quarter of the scores fall, and the third quartile (Q3) is the value below which about three quarters fall. Together they show where most of the scores lie.

  3. Interquartile Range (IQR) − The range between Q1 and Q3 is called the interquartile range.

  4. Box − The box in the middle represents the interquartile range, so it shows where the middle half of the scores lie.

  5. Whiskers − Lines (whiskers) extend from the box toward the smallest and largest scores, helping you see how spread out the scores are. When outliers are drawn separately, each whisker stops at the most extreme score that is not an outlier.

  6. Outliers − If any scores lie far above or below the rest, they are shown as individual points beyond the whiskers. These are the standout scores.
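
The quantities in the list above can be computed directly. The following is a minimal sketch using NumPy. The scores are hypothetical, and the 1.5 × IQR rule used to flag outliers is an assumption chosen to match Matplotlib's default whisker setting (whis=1.5), not something specific to these data.

```python
import numpy as np

# Hypothetical exam scores for one class (illustrative data only)
scores = np.array([55, 62, 68, 70, 71, 73, 75, 78, 80, 84, 99])

# Quartiles: the 25th, 50th, and 75th percentiles
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Assumed outlier rule: points more than 1.5 * IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
is_outlier = (scores < lower_fence) | (scores > upper_fence)
outliers = scores[is_outlier]

# Whiskers end at the most extreme scores that are not outliers
whisker_low = scores[~is_outlier].min()
whisker_high = scores[~is_outlier].max()

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)
```

For these scores, the single value of 99 lies above the upper fence, so it would be drawn as an individual point while the upper whisker stops at 84.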

Box Plot in Matplotlib

We can create a box plot in Matplotlib using the boxplot() function. It lets us customize the appearance of the plot, for example by changing the whisker length, adding notches, and controlling how outliers are displayed.

The boxplot() Function

The boxplot() function in Matplotlib takes one or more datasets as input and generates a box plot for each dataset.

The following is the syntax of the boxplot() function in Matplotlib −

plt.boxplot(x, notch=None, patch_artist=None, widths=None, labels=None, ...)

Where,

  1. x is the dataset or a list of datasets for which the box plot is to be created.

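  2. If notch (optional) is True, it draws a notched box, where the notch marks a confidence interval around the median; if False, it draws a plain rectangular box. The orientation is controlled separately (the horizontal example below uses vert=False).

  3. If patch_artist (optional) is True, the boxes are drawn as patches, so they can be filled with color.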

  4. widths (optional) represents the width of the boxes.

  5. labels (optional) sets labels for each dataset, useful when plotting multiple box plots. In Matplotlib 3.9 and later, this parameter is named tick_labels and labels is deprecated.

These are just a few of the parameters; more optional parameters are available for customization.
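
As a brief illustration of these parameters, the sketch below combines notch, patch_artist, and widths in one call and then inspects the dictionary that boxplot() returns. The data values are made up for illustration.

```python
import matplotlib.pyplot as plt

# Illustrative data: two small datasets
data = [[1, 2, 3, 4, 5], [3, 6, 8, 10, 12]]

fig, ax = plt.subplots()

# notch=True draws notched boxes, patch_artist=True makes the boxes fillable,
# and widths sets the width of every box
result = ax.boxplot(data, notch=True, patch_artist=True, widths=0.5)

# boxplot() returns a dictionary mapping each component name to a list of artists
print(sorted(result.keys()))
print(len(result['boxes']), "boxes,", len(result['whiskers']), "whiskers")
```

Each dataset contributes one box and two whiskers, so the returned lists can be used to restyle individual components after the plot is drawn.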

Horizontal Box Plot with Notches

We can create a horizontal box plot with notches to display the distribution of a dataset in a horizontal orientation. The notches around each median line give a visual estimate of the uncertainty in the median.

Example

In the following example, we create a horizontal box plot with notches around the medians for three datasets. Each box occupies one category on the y-axis, and the values run along the x-axis −

import matplotlib.pyplot as plt

# Data
data = [[1, 2, 3, 4, 5], [3, 6, 8, 10, 12], [5, 10, 15, 20, 25]]

# Creating a horizontal box plot with notches
plt.boxplot(data, vert=False, notch=True)
plt.title('Horizontal Box Plot with Notches')
plt.xlabel('Values')
plt.ylabel('Categories')
plt.show()

After executing the above code, we get a figure with three horizontal, notched boxes, one for each dataset.
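
To make the meaning of the notches concrete, the sketch below computes the notch limits for the same three datasets. It assumes Matplotlib's default (non-bootstrapped) notch rule, median ± 1.57 × IQR / √n, and uses matplotlib.cbook.boxplot_stats to read back the values Matplotlib computes. Treat the formula as a stated assumption about the default behavior rather than a general definition.

```python
import math
import numpy as np
from matplotlib import cbook

data = [[1, 2, 3, 4, 5], [3, 6, 8, 10, 12], [5, 10, 15, 20, 25]]

# boxplot_stats returns one dictionary of statistics per dataset
for values, stats in zip(data, cbook.boxplot_stats(data)):
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    # Assumed default rule for the half-width of the notch
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    print(f"median={med}, "
          f"notch by formula=({med - half_width:.2f}, {med + half_width:.2f}), "
          f"notch from Matplotlib=({stats['cilo']:.2f}, {stats['cihi']:.2f})")
```

With only five values per dataset, these notch limits extend past the quartiles (for the first dataset the notch spans roughly 1.60 to 4.40 while the box spans 2 to 4), which is why notched boxes drawn from very small samples can look folded back on themselves.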

Box Plot with Custom Colors

We can create a box plot with custom colors by filling the boxes with colors of our choice. Each box represents the distribution of values within a category, and giving the boxes custom colors makes the categories easier to tell apart.

Example

Here, we enhance the box plot by filling the boxes with a custom color, skyblue −

import matplotlib.pyplot as plt
data = [[1, 2, 3, 4, 5], [3, 6, 8, 10, 12], [5, 10, 15, 20, 25]]

# Creating a box plot with custom colors
plt.boxplot(data, patch_artist=True, boxprops=dict(facecolor='skyblue'))
plt.title('Box Plot with Custom Colors')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()

The output of the above code is a figure with three vertical boxes, each filled with sky blue.
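
The example above fills every box with the same color. To give each box its own color, one possible approach is to set the face color of each returned box patch individually, as in the sketch below. The color names are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

data = [[1, 2, 3, 4, 5], [3, 6, 8, 10, 12], [5, 10, 15, 20, 25]]

# Colors chosen for illustration; any valid Matplotlib color names work
colors = ['skyblue', 'lightgreen', 'salmon']

fig, ax = plt.subplots()

# patch_artist=True is required so that each box can be filled
bp = ax.boxplot(data, patch_artist=True)

# Each entry of bp['boxes'] is the patch for one box
for box, color in zip(bp['boxes'], colors):
    box.set_facecolor(color)

ax.set_title('Box Plot with a Different Color per Box')
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
plt.show()
```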

Grouped Box Plot

We can create a grouped box plot to compare the distributions of multiple groups side by side. Each group has its own set of boxes, where each box represents the distribution of values within that group.

Example

Now, we are creating a grouped box plot to compare the exam scores of students from three different classes (A, B, and C). Each box represents the distribution of scores within a class, allowing us to easily observe and compare the central tendencies, spreads, and potential outliers across the three classes −

import matplotlib.pyplot as plt

# Exam scores for each class
class_A_scores = [75, 80, 85, 90, 95]
class_B_scores = [70, 75, 80, 85, 90]
class_C_scores = [65, 70, 75, 80, 85]

# Creating a grouped box plot
# Note: Matplotlib 3.9 and later name this parameter tick_labels; labels is deprecated there
plt.boxplot([class_A_scores, class_B_scores, class_C_scores], labels=['Class A', 'Class B', 'Class C'])
plt.title('Exam Scores by Class')
plt.xlabel('Classes')
plt.ylabel('Scores')
plt.show()

On executing the above code, we get a figure with three boxes labeled Class A, Class B, and Class C along the x-axis.
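
When each group should contain more than one box, the positions parameter of boxplot() can place the boxes side by side within each group. The sketch below assumes two exams per class. All scores and the exam names (Midterm, Final) are hypothetical and serve only to illustrate the layout.

```python
import matplotlib.pyplot as plt

# Hypothetical scores: one list per class (A, B, C) for each exam
midterm = [[75, 80, 85, 90, 95], [70, 75, 80, 85, 90], [65, 70, 75, 80, 85]]
final = [[78, 82, 88, 91, 97], [68, 77, 79, 86, 92], [60, 72, 74, 83, 88]]

centers = [1, 2, 3]   # x-position of each class group
offset = 0.2          # shift of each box away from the group center

fig, ax = plt.subplots()

# Draw the two exams slightly left and right of each group center
bp_mid = ax.boxplot(midterm, positions=[c - offset for c in centers],
                    widths=0.3, patch_artist=True,
                    boxprops=dict(facecolor='skyblue'))
bp_fin = ax.boxplot(final, positions=[c + offset for c in centers],
                    widths=0.3, patch_artist=True,
                    boxprops=dict(facecolor='lightgreen'))

# Put one tick per class at the group centers
ax.set_xticks(centers)
ax.set_xticklabels(['Class A', 'Class B', 'Class C'])

# Use one box from each exam as the legend handle
ax.legend([bp_mid['boxes'][0], bp_fin['boxes'][0]], ['Midterm', 'Final'])
ax.set_title('Exam Scores by Class and Exam')
ax.set_ylabel('Scores')
plt.show()
```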

Box Plot with Outliers

A box plot with outliers includes additional information about extreme values in the dataset. In a standard box plot, outliers (data points that differ significantly from the majority) are drawn as individual points beyond the whiskers that extend from the box.

This plot helps in identifying exceptional values that may have a significant impact on the overall distribution of the data.

Example

In the example below, we are creating a box plot that provides a visual representation of the sales distribution for each product, and the outliers highlight months with exceptionally high or low sales −

import matplotlib.pyplot as plt

# Data for monthly sales
product_A_sales = [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200]
product_B_sales = [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250]
product_C_sales = [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95]

# Introducing outliers
product_A_sales.extend([300, 80])
product_B_sales.extend([50, 300])
product_C_sales.extend([70, 250])

# Creating a box plot with outliers
plt.boxplot([product_A_sales, product_B_sales, product_C_sales], sym='o')
plt.title('Monthly Sales Performance by Product with Outliers')
plt.xlabel('Products')
plt.ylabel('Sales')
plt.show()

On executing the above code, we get a figure with one box per product, with the outlying sales values drawn as circles beyond the whiskers.
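
To see exactly which values are drawn as outliers, we can read them back from the dictionary returned by boxplot(). The sketch below reuses the sales data from the example above, with the appended extreme values written inline, and relies on the default whisker setting.

```python
import matplotlib.pyplot as plt

# Same sales data as the example above, including the appended extreme values
sales = {
    'Product A': [100, 110, 95, 105, 115, 90, 120, 130, 80, 125, 150, 200, 300, 80],
    'Product B': [90, 105, 100, 98, 102, 105, 110, 95, 112, 88, 115, 250, 50, 300],
    'Product C': [80, 85, 90, 78, 82, 85, 88, 92, 75, 85, 200, 95, 70, 250],
}

fig, ax = plt.subplots()
bp = ax.boxplot(list(sales.values()), sym='o')

# bp['fliers'] holds one line object per box; its y-data are the outlier values
outliers_by_product = {
    name: sorted(float(v) for v in flier.get_ydata())
    for name, flier in zip(sales, bp['fliers'])
}

for name, values in outliers_by_product.items():
    print(name, values)
```

Passing a different whis value to boxplot() changes how far the whiskers reach, and therefore which points are treated as outliers.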