Python Data Science: A Concise Tutorial

Python - Box Plots

A box plot is a graphical summary of how the values in a data set are distributed. Three quartiles divide the data into four groups of roughly equal size. The plot shows the minimum, maximum, median, first quartile, and third quartile of the data set. Drawing a box plot for each of several data sets also makes it easy to compare their distributions.
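The five values a box plot summarizes can also be computed directly. The following is a minimal sketch using NumPy; the sample values are hypothetical and chosen only for illustration, and the quartiles use NumPy's default linear interpolation, which is one of several common conventions.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
values = np.array([2, 4, 4, 5, 7, 9, 10, 12, 15])

# First quartile, median, and third quartile
# (NumPy's default linear interpolation between data points).
q1, median, q3 = np.percentile(values, [25, 50, 75])

# The five-number summary that a box plot visualizes.
summary = {
    "min": float(values.min()),
    "q1": float(q1),
    "median": float(median),
    "q3": float(q3),
    "max": float(values.max()),
}

# Interquartile range: the height of the box itself.
iqr = summary["q3"] - summary["q1"]

print(summary)
print("IQR:", iqr)
```

With Matplotlib's default settings (the plotting backend pandas uses by default), the whiskers extend to the most extreme points within 1.5 times the interquartile range of the box, and points beyond that are drawn individually. The whisker ends therefore coincide with the minimum and maximum only when no such outlying points exist.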

Drawing a Box Plot

A box plot can be drawn by calling Series.plot.box() or DataFrame.plot.box(), or DataFrame.boxplot(), to visualize the distribution of values within each column.

For instance, here is a box plot representing five trials of 10 observations of a uniform random variable on [0, 1).

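import numpy as np
import pandas as pd

# Five columns (trials), each holding 10 draws from a uniform distribution on [0, 1)
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
# Draw one box per column; grid expects a boolean, not the string 'True'
df.plot.box(grid=True)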

Its output is as follows:

[Figure: boxplot]
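The other two entry points mentioned earlier, DataFrame.boxplot() and Series.plot.box(), can be sketched side by side as follows. The seed, the figure size, the output file name boxplots.png, and the choice of the non-interactive Agg backend are all assumptions made so the example runs without a display; they are not part of the original tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as above: 10 observations for each of five trials.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
df = pd.DataFrame(rng.random((10, 5)), columns=list("ABCDE"))

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# DataFrame.boxplot() draws one box per numeric column and shows a grid by default.
df.boxplot(ax=ax_left)
ax_left.set_title("DataFrame.boxplot()")

# Series.plot.box() draws a single box for one column.
df["A"].plot.box(ax=ax_right, grid=True)
ax_right.set_title("Series.plot.box()")

# Hypothetical output file name.
fig.savefig("boxplots.png")
```

Passing an explicit ax to each call is a design choice that keeps both plots in one figure; without it, each call creates or reuses the current Matplotlib axes.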