Machine Learning Concise Tutorial
Machine Learning - Scatter Matrix Plot
A Scatter Matrix Plot is a grid of scatter plots that shows the pairwise relationships among several variables. In machine learning, it is a useful tool for visualizing how the features of a dataset relate to one another, for example to spot pairs of features that are strongly correlated. It is also known as a Pair Plot.
A Scatter Matrix Plot displays a scatter plot for each pair of features in a dataset. Each scatter plot represents the relationship between two variables. The cells on the diagonal of the grid, where a variable would be plotted against itself, usually show the distribution of that variable instead, for example as a histogram or a density curve.
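To make the grid layout concrete, here is a minimal sketch that uses pandas' own `scatter_matrix` helper rather than Seaborn. The data is synthetic and the column names `x`, `y`, and `z` are made up for illustration: `y` is built to depend on `x`, while `z` is independent noise.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data (illustrative only): y depends on x, z is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),
    "z": rng.normal(size=200),
})

# One cell per pair of columns; histograms on the diagonal.
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6))
print(axes.shape)  # one row and one column of cells per variable

# Numeric counterpart of the picture: pairwise Pearson correlations.
corr = df.corr()
print(corr.round(2))

# Call plt.show() to display the figure in an interactive session.
```

In the resulting figure, the `x` versus `y` cell should show a tight diagonal band, while the cells involving `z` should look like shapeless clouds. The correlation table gives the same information as numbers.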
Python Implementation of Scatter Matrix Plot
Here, we will implement the Scatter Matrix Plot in Python. The example given below uses the Iris dataset, loaded from Seaborn's built-in sample datasets.
The Iris dataset is a classic dataset in machine learning. It contains four features: Sepal Length, Sepal Width, Petal Length, and Petal Width. The dataset has 150 samples, and each sample is labeled as one of three species: Setosa, Versicolor, or Virginica.
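The same dataset also ships with scikit-learn, which is handy when working offline. The sketch below assumes scikit-learn is installed. It builds a DataFrame with a readable `species` column and checks the figures quoted above. The variable names are illustrative.

```python
from sklearn.datasets import load_iris

# as_frame=True returns the data as pandas objects instead of NumPy arrays.
iris_bunch = load_iris(as_frame=True)
iris_df = iris_bunch.frame  # four feature columns plus an integer 'target' column

# Map the integer class codes to species names, then drop the codes.
code_to_name = dict(enumerate(iris_bunch.target_names))
iris_df["species"] = iris_df["target"].map(code_to_name)
iris_df = iris_df.drop(columns="target")

print(iris_df.shape)                      # rows, columns
print(iris_df["species"].value_counts())  # samples per species
```

Note that scikit-learn names the feature columns with units, such as `sepal length (cm)`, whereas Seaborn's copy uses names such as `sepal_length`.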
We will use the Seaborn library to implement the Scatter Matrix Plot. Seaborn is a Python data visualization library that is built on top of the Matplotlib library.
Example
Below is the Python code to implement the Scatter Matrix Plot:
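import matplotlib.pyplot as plt  # needed for plt.show() below
import pandas as pd  # not called directly; sns.load_dataset() returns a pandas DataFrame
import seaborn as sns

# load iris dataset
iris = sns.load_dataset('iris')

# create scatter matrix plot, coloring the points by species
sns.pairplot(iris, hue='species')

# show plot
plt.show()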
This code needs Seaborn and Pandas, plus Matplotlib's pyplot module (imported as plt) for the plt.show() call at the end. We load the Iris dataset using the sns.load_dataset() function, which returns it as a Pandas DataFrame. Seaborn fetches its sample datasets from an online repository and caches them locally, so the first call needs an internet connection.
Next, we create the Scatter Matrix Plot using the sns.pairplot() function. The hue parameter is used to specify the column in the dataset that should be used for color encoding. In this case, we use the species column to color the points according to the species of each sample.
Finally, we use the plt.show() function to display the plot.
The output of this code is a 4 × 4 grid of plots. The off-diagonal cells show the scatter plot of each pair of features in the Iris dataset, and the diagonal cells show the distribution of each feature.
Notice that each scatter plot is color-coded according to the species of each sample.