Matplotlib Concise Tutorial
Matplotlib - Histogram
A histogram is a visual summary that shows how often different values appear in a set of data. Imagine you have a collection of numbers, such as people's ages. A histogram divides the range of these numbers into intervals called "bins" and draws one bar per bin to show how many numbers fall into it. The taller the bar, the more numbers are in that bin.
Histogram in Matplotlib
We can create a histogram in Matplotlib using the hist() function. This function lets us customize various aspects of the histogram, such as the number of bins, the color, and the transparency. Histograms in Matplotlib represent the distribution of numerical data, which helps you identify patterns such as where values cluster and how widely they spread.
The hist() Function
The hist() function in Matplotlib takes a dataset as input and divides it into intervals (bins). It then draws the frequency (count) of data points falling within each bin as a bar.
Following is the syntax of the hist() function in Matplotlib:
plt.hist(x, bins=None, range=None, density=False, cumulative=False, color=None, edgecolor=None, ...)
Where,
- x is the input data from which the histogram is computed.
- bins (optional) is either the number of equal-width bins or a sequence of bin edges. By default, Matplotlib uses 10 bins.
- range (optional) is the lower and upper limit of the bins, given as a (min, max) pair. The default is the minimum and maximum of x.
- density (optional), if True, normalizes the histogram so that it represents a probability density. The default is False.
- cumulative (optional), if True, makes each bar include the counts of all preceding bins. The default is False.
These are only a few of the parameters; hist() accepts many more optional parameters for customization.
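As a rough illustration of the bins and range parameters, the sketch below passes bins first as a count and then as explicit edges, and prints the counts and bin edges that hist() returns. The sample values and variable names are illustrative assumptions, not part of the hist() API.

```python
import matplotlib.pyplot as plt

# Illustrative sample data (an assumption for this sketch)
x = [1, 2, 3, 1, 2, 3, 4, 1, 3, 4, 5]

fig, (ax1, ax2) = plt.subplots(1, 2)

# bins as an integer: 4 equal-width bins spanning the given range
counts_a, edges_a, _ = ax1.hist(x, bins=4, range=(1, 5))

# bins as a sequence: explicit bin edges, one more edge than bins
counts_b, edges_b, _ = ax2.hist(x, bins=[1, 2, 3, 4, 5, 6])

print(counts_a, edges_a)
print(counts_b, edges_b)
```

Every bin includes its left edge and excludes its right edge, except the last bin, which includes both. With bins=4 and range=(1, 5), the values 4 and 5 therefore land in the same final bin, while the explicit edges in the second call give 5 a bin of its own.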
Creating a Vertical Histogram
In Matplotlib, a vertical histogram plots the frequency distribution of a dataset with the bins laid out along the x-axis and the bars rising along the y-axis. The height of each bar is the frequency or count of data points falling within that bin.
Example
In the following example, we create a vertical histogram by setting the "orientation" parameter to "vertical" within the hist() function. This is also the default orientation, so omitting the parameter gives the same result:
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
x = [1, 2, 3, 1, 2, 3, 4, 1, 3, 4, 5]
plt.hist(x, orientation="vertical")
plt.show()
Running this code displays a histogram with vertical bars.
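For comparison, here is a minimal sketch that draws the same data with both orientations and reads the bar sizes back from the returned patches. The choice of bins=5 and the variable names are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 1, 2, 3, 4, 1, 3, 4, 5]

fig, (ax_v, ax_h) = plt.subplots(1, 2)

# Default orientation: bins along the x-axis, counts as bar heights
counts_v, _, bars_v = ax_v.hist(x, bins=5, orientation="vertical")

# Horizontal orientation: bins along the y-axis, counts as bar widths
counts_h, _, bars_h = ax_h.hist(x, bins=5, orientation="horizontal")

heights = [bar.get_height() for bar in bars_v]
widths = [bar.get_width() for bar in bars_h]
print(heights)
print(widths)
```

The counts are the same in both cases; only the axis along which the bars extend changes.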
Customized Histogram with Density
A histogram with density shows how the data is distributed in relative rather than absolute terms. Instead of raw counts, each bar's height is the count divided by the total number of data points and by the bin width, so the total area under the histogram is normalized to one. This makes it easy to see how likely values are to fall in each bin and to compare datasets of different sizes.
Example
In the following example, we visualize random data as a histogram with 30 bins, drawn in green with black edges. We use the density=True parameter to represent the probability density:
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
data = np.random.randn(1000)
# Create a histogram with density and custom color
plt.hist(data, bins=30, density=True, color='green', edgecolor='black', alpha=0.7)
plt.xlabel('Values')
plt.ylabel('Probability Density')
plt.title('Customized Histogram with Density')
plt.show()
Running this code displays a green histogram with black bar edges and probability density on the y-axis.
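The normalization can be checked numerically. The sketch below, which assumes a seeded NumPy generator so the run is repeatable, sums the bar areas and compares the density values with counts computed separately by np.histogram.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded so the run is repeatable
data = rng.standard_normal(1000)

density, edges, _ = plt.hist(data, bins=30, density=True)

# Area of each bar = height * width; the areas should add up to 1
widths = np.diff(edges)
area = np.sum(density * widths)
print(area)

# The same heights derived by hand from raw counts on the same bin edges
counts, _ = np.histogram(data, bins=edges)
manual_density = counts / (counts.sum() * widths)
print(np.allclose(density, manual_density))
```

Note that the bar heights themselves are not probabilities and can exceed one when the bins are narrow; only the total area is constrained.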
Cumulative Histogram
A cumulative histogram shows the running total of occurrences up to each point. Each bar shows how many data points are less than or equal to the upper edge of its bin, so the bars never decrease from left to right and the last bar equals the total number of data points.
Example
Here, each bar covers a range of exam scores, and its height tells us how many students scored within or below that range. Setting cumulative=True in the hist() function makes each bar add its own count to the counts of all the bars before it:
import matplotlib.pyplot as plt
import numpy as np
# Generate random exam scores from 0 to 100 (randint excludes the upper bound, hence 101)
exam_scores = np.random.randint(0, 101, 150)
# Create a cumulative histogram
plt.hist(exam_scores, bins=20, cumulative=True, color='orange', edgecolor='black', alpha=0.7)
plt.xlabel('Exam Scores')
plt.ylabel('Cumulative Number of Students')
plt.title('Cumulative Histogram of Exam Scores')
plt.show()
Running this code displays an orange histogram whose bars rise from left to right until they reach the total number of students.
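To make the relationship with an ordinary histogram concrete, the following sketch draws both versions of the same data and checks that the cumulative bars are the running sum of the plain counts. The seeded generator and the hypothetical scores are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # seeded so the run is repeatable
exam_scores = rng.integers(0, 101, size=150)  # hypothetical scores, 0 to 100 inclusive

fig, (ax1, ax2) = plt.subplots(1, 2)

# Plain histogram: one count per bin
counts, edges, _ = ax1.hist(exam_scores, bins=20)

# Cumulative histogram: each bar adds the counts of all earlier bins
running, _, _ = ax2.hist(exam_scores, bins=20, cumulative=True)

print(np.array_equal(running, np.cumsum(counts)))
print(running[-1])
```

The last cumulative bar always equals the number of data points, which is a quick sanity check when reading such a plot.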
Histogram with Different Color and Edge Color
When creating a histogram, we can customize the fill color with the color parameter and the outline of each bar with the edgecolor parameter. Contrasting edges make individual bars easier to tell apart and give the plot a distinctive appearance.
Example
Now, we generate a histogram for random data with 25 bins and draw it in purple with blue edges:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
# Creating a histogram with different color and edge color
plt.hist(data, bins=25, color='purple', edgecolor='blue')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram with Different Color and Edge Color')
plt.show()
Running this code displays a purple histogram with blue bar edges.
Example
To color the bars individually, we can also take colors from a colormap (the cm object below) and apply one to each bar with the setp() function:
import numpy as np
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = [7.00, 3.50]
plt.rcParams["figure.autolayout"] = True
data = np.random.random(1000)
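n, bins, patches = plt.hist(data, bins=25, density=True, color='red', rwidth=0.75)
# Scale the bar heights to the 0-1 range expected by the colormap
col = (n - n.min()) / (n.max() - n.min())
cm = plt.get_cmap('RdYlBu')
# Color each bar according to its scaled height
for c, p in zip(col, patches):
    plt.setp(p, 'facecolor', cm(c))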
plt.show()
Running this code displays a histogram whose bars are colored according to their height, from red for the shortest bars through yellow to blue for the tallest.
Example
Here, we give each bar in a Matplotlib histogram its own color by iterating over the bins and setting a random facecolor for each bar:
import numpy as np
import matplotlib.pyplot as plt
import random
import string
# Set the figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True
# Figure and set of subplots
fig, ax = plt.subplots()
# Random data
data = np.random.rand(100)
# Plot a histogram with random data
N, bins, patches = ax.hist(data, edgecolor='black', linewidth=1)
# Random facecolor for each bar, built as a random six-digit hex color
for i in range(len(N)):
    patches[i].set_facecolor("#" + ''.join(random.choices("ABCDEF" + string.digits, k=6)))
# Display the plot
plt.show()
Running this code displays a histogram in which every bar has a different, randomly chosen color.
Stacked Histogram with Multiple Datasets
A stacked histogram with multiple datasets combines the distributions of two or more sets of data in one plot. All datasets share the same bins, and within each bin the bars are stacked on top of each other, so we can see how each dataset contributes to the overall distribution.
Example
In the example below, we plot two datasets, "data1" and "data2", with fixed values and show their distributions in different colors (skyblue and salmon):
import matplotlib.pyplot as plt
import numpy as np
# Sample data for two datasets
data1 = np.array([2, 4, 5, 7, 9, 10, 11, 13, 14, 15])
data2 = np.array([6, 7, 8, 10, 11, 12, 13, 14, 15, 16])
# Creating a stacked histogram with different colors
plt.hist([data1, data2], bins=10, stacked=True, color=['skyblue', 'salmon'], edgecolor='black')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Stacked Histogram with Multiple Datasets')
plt.legend(['Dataset 1', 'Dataset 2'])
plt.show()
Running this code displays a histogram in which the salmon bars of the second dataset are stacked on top of the skyblue bars of the first.
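When stacked=True is used with several datasets, the values that hist() returns are the tops of the stacked bars, not the separate counts of each dataset. The sketch below checks this by recomputing the per-dataset counts with np.histogram on the same bin edges; the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

data1 = np.array([2, 4, 5, 7, 9, 10, 11, 13, 14, 15])
data2 = np.array([6, 7, 8, 10, 11, 12, 13, 14, 15, 16])

# tops[0] holds the tops of the first dataset's bars,
# tops[1] the tops of the second dataset's bars stacked on the first
tops, edges, _ = plt.hist([data1, data2], bins=10, stacked=True)

# Per-dataset counts on the same shared bin edges
c1, _ = np.histogram(data1, bins=edges)
c2, _ = np.histogram(data2, bins=edges)

print(np.array_equal(tops[0], c1))
print(np.array_equal(tops[1], c1 + c2))
```

To recover the second dataset's own counts from the returned values, subtract the first row from the second.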