Plotly Concise Tutorial

Plotly - Histogram

Introduced by Karl Pearson, a histogram represents the distribution of numerical data and serves as an estimate of the probability distribution of a continuous variable. It looks similar to a bar graph, but a bar graph relates two variables, whereas a histogram relates only one.

A histogram requires bins (or buckets): the entire range of values is divided into a series of intervals, and the number of values falling into each interval is counted. The bins are usually specified as consecutive, non-overlapping intervals of a variable. They must be adjacent and are often of equal size. A rectangle is erected over each bin with height proportional to the frequency, that is, the number of cases in that bin.
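The binning step described above can be sketched in plain Python. This is only an illustration of the idea, not how Plotly computes bins internally; the helper name bin_counts, the sample marks and the bin range 0 to 100 with width 20 are all assumptions chosen for the example.

```python
def bin_counts(values, start, end, size):
    """Count values falling in consecutive half-open bins [lo, lo + size)."""
    n_bins = int((end - start) / size)
    counts = [0] * n_bins
    for v in values:
        if start <= v < end:
            # index of the bin that contains v
            counts[int((v - start) // size)] += 1
    return counts

# illustrative sample: marks of 15 students
marks = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
counts = bin_counts(marks, start=0, end=100, size=20)

for i, c in enumerate(counts):
    lo = i * 20
    print(f"[{lo}, {lo + 20}): {c}")
```

Each count is the height of the rectangle drawn over the corresponding bin, and the counts add up to the number of values.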

The go.Histogram() function returns a Histogram trace object. It is customized through various arguments or attributes. One essential argument is x or y, set to a list, NumPy array or pandas Series (such as a DataFrame column) holding the values to be distributed into bins.

By default, Plotly distributes the data points into automatically sized bins. However, you can define a custom bin size. To do so, set the xbins attribute (ybins for a horizontal histogram) with its start, end and size values, or set nbinsx (nbinsy) to cap the number of bins Plotly may choose. Older Plotly versions also relied on setting autobinx to False; that attribute is now obsolete.

The following code generates a simple histogram showing the distribution of marks of students in a class, in automatically sized bins:

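import numpy as np
import plotly.graph_objects as go
from plotly.offline import iplot

# marks obtained by the students in a class
x1 = np.array([22,87,5,43,56,73,55,54,11,20,51,5,79,31,27])
data = [go.Histogram(x = x1)]  # bins are sized automatically
fig = go.Figure(data)
iplot(fig)  # renders inline in a Jupyter notebook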

The output is as shown below:

[Figure: histogram of the marks in x1; image labeled "histnorm"]

The go.Histogram() function accepts histnorm, which specifies the type of normalization used for this histogram trace. The default is "", in which case the height of each bar corresponds to the number of occurrences (i.e. the number of data points lying inside the bin). If set to "percent" or "probability", the height of each bar corresponds to the percentage or fraction of occurrences relative to the total number of sample points. If set to "density", the height of each bar corresponds to the number of occurrences in a bin divided by the size of the bin interval.
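The arithmetic behind these normalization modes can be sketched in plain Python. The bin counts and the bin size of 20 below are assumed values for illustration; the code follows the definitions given above rather than Plotly's internal implementation.

```python
# assumed occurrences per bin, i.e. what histnorm = "" would display
counts = [3, 4, 5, 2, 1]
bin_size = 20          # assumed width of every bin
total = sum(counts)    # total number of sample points

# histnorm = "percent": share of all points, in percent
percent = [100 * c / total for c in counts]

# histnorm = "probability": share of all points, as a fraction
probability = [c / total for c in counts]

# histnorm = "density": occurrences divided by the bin width
density = [c / bin_size for c in counts]

print(percent)
print(probability)
print(density)
```

In Plotly itself the same effect is requested by passing, for example, histnorm = "percent" to go.Histogram().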

There is also a histfunc parameter, whose default value is count, so the height of the rectangle over a bin corresponds to the count of data points in it. It can also be set to sum, avg, min or max, in which case the height is computed from the values falling into each bin (the y values when binning along x).

The go.Histogram() function can also display the cumulative distribution of values across successive bins. To do so, enable the trace's cumulative property by passing cumulative_enabled = True, as shown below:

data=[go.Histogram(x = x1, cumulative_enabled = True)]
fig = go.Figure(data)
iplot(fig)

The output is as shown below:

[Figure: cumulative histogram of the marks in x1]
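What the cumulative option computes can be thought of as a running total of the bin counts. The plain-Python sketch below uses assumed counts for illustration and is not Plotly's own code.

```python
from itertools import accumulate

# assumed occurrences per bin in a non-cumulative histogram
counts = [3, 4, 5, 2, 1]

# each cumulative bar adds its own bin to all the bins before it
cumulative = list(accumulate(counts))
print(cumulative)  # [3, 7, 12, 14, 15]
```

The last cumulative bar therefore equals the total number of data points.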