Machine Learning: A Concise Tutorial

Machine Learning - Confusion Matrix

A confusion matrix is one of the simplest ways to measure the performance of a classification model whose output can be two or more classes. It is a table with two dimensions, "Actual" and "Predicted", in which each cell counts the samples that have a given actual class and a given predicted class. For a binary problem, the four cells hold the counts of "True Positives (TP)", "True Negatives (TN)", "False Positives (FP)", and "False Negatives (FN)".

The terms associated with the confusion matrix are explained as follows:

True Positives (TP) − The actual class and the predicted class of the data point are both 1.
True Negatives (TN) − The actual class and the predicted class of the data point are both 0.
False Positives (FP) − The actual class of the data point is 0 and the predicted class is 1.
False Negatives (FN) − The actual class of the data point is 1 and the predicted class is 0.
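The four definitions above can be turned into a small counting routine. The following is a minimal sketch in plain Python, assuming the labels are encoded as 0 (negative) and 1 (positive); the function name count_outcomes and the sample labels are hypothetical and serve only as an illustration.

```python
# Count the four outcomes by hand for binary labels (1 = positive, 0 = negative).
def count_outcomes(y_actual, y_pred):
    """Return (tp, tn, fp, fn) for two equal-length lists of 0/1 labels."""
    if len(y_actual) != len(y_pred):
        raise ValueError("y_actual and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_actual, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # correctly predicted positive
        elif actual == 0 and predicted == 0:
            tn += 1  # correctly predicted negative
        elif actual == 0 and predicted == 1:
            fp += 1  # negative sample predicted as positive
        else:
            fn += 1  # positive sample predicted as negative (assumes 0/1 labels)
    return tp, tn, fp, fn

# Hypothetical labels, chosen only for illustration
tp, tn, fp, fn = count_outcomes([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(tp, tn, fp, fn)  # prints: 2 2 1 1
```

Every sample falls into exactly one of the four cases, so the four counts always add up to the number of samples.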

How to Implement a Confusion Matrix in Python?

To implement the confusion matrix in Python, we can use the confusion_matrix() function from the sklearn.metrics module of the scikit-learn library. Here is a simple example of how to use the confusion_matrix() function:

from sklearn.metrics import confusion_matrix

# Actual values
y_actual = [0, 1, 0, 1, 1, 0, 0, 1, 1, 1]

# Predicted values
y_pred = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]

# Confusion matrix
cm = confusion_matrix(y_actual, y_pred)
print(cm)

In this example, we have two lists: y_actual contains the actual values of the target variable, and y_pred contains the predicted values of the target variable. We then call the confusion_matrix() function, passing in y_actual and y_pred as arguments. The function returns a 2D array that represents the confusion matrix, with one row per actual class and one column per predicted class.

The output of the code above will look like this:

[[3 1]
 [2 4]]
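To read this output, recall that scikit-learn puts the actual classes on the rows and the predicted classes on the columns, with the labels in sorted order. For 0/1 labels the layout is therefore [[TN, FP], [FN, TP]]. The sketch below copies the printed matrix into a plain nested list (an assumption made so the snippet runs without scikit-learn; the name cm_values is hypothetical) and unpacks the four counts.

```python
# The matrix printed above, copied as a plain nested list
cm_values = [[3, 1],
             [2, 4]]

# Row 0 = actual class 0, row 1 = actual class 1;
# column 0 = predicted class 0, column 1 = predicted class 1.
(tn, fp), (fn, tp) = cm_values

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # prints: TN=3 FP=1 FN=2 TP=4

# The four cells together account for every sample
total = tn + fp + fn + tp
print("total samples:", total)  # prints: total samples: 10
```

With the NumPy array returned by confusion_matrix(), the same four values can be obtained in this order with tn, fp, fn, tp = cm.ravel().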

We can also visualize the confusion matrix as a heatmap. Below is how we can do that using the heatmap() function from the seaborn library:

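import matplotlib.pyplot as plt
import seaborn as sns

# Plot confusion matrix as heatmap, with each cell annotated by its count
sns.heatmap(cm, annot=True, cmap='summer')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()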

This will produce a heatmap that shows the confusion matrix.

In this heatmap, the x-axis represents the predicted values, and the y-axis represents the actual values. The color of each square indicates the number of samples that fall into that combination of actual and predicted class, and because annot=True is set, the count is also printed inside the square.
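The same layout, actual classes down the side and predicted classes across the top, extends to more than two classes. The following is a minimal sketch in plain Python rather than the scikit-learn implementation; the function name build_confusion_matrix and the three animal labels are hypothetical, chosen only for illustration.

```python
def build_confusion_matrix(y_actual, y_pred, labels):
    """Return a nested list where entry [i][j] counts the samples whose
    actual class is labels[i] and whose predicted class is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_actual, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

# Hypothetical three-class data, for illustration only
labels = ["cat", "dog", "bird"]
actual_animals = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
predicted_animals = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]

matrix = build_confusion_matrix(actual_animals, predicted_animals, labels)

# Rows are actual classes, columns are predicted classes
print("columns (predicted):", labels)
for label, row in zip(labels, matrix):
    print(f"actual {label}:", row)
```

The diagonal entries count the correct predictions for each class, and every off-diagonal entry counts one specific kind of confusion between two classes.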