Machine Learning - Cost Function

In machine learning, a cost function is a measure of how well a machine learning model is performing. It is a mathematical function that takes in the model’s predicted values and the true values of the data and outputs a single scalar value that represents the cost or error of the model’s predictions. The goal of training a machine learning model is to minimize the cost function.

The choice of cost function depends on the specific problem being solved. For example, in binary classification tasks, where the goal is to predict whether a data point belongs to one of two classes, the most commonly used cost function is the binary cross-entropy function. In regression tasks, where the goal is to predict a continuous value, the mean squared error function is commonly used.
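
For instance, a minimal NumPy sketch of the mean squared error used in regression might look like this (the binary cross-entropy counterpart is implemented later in this chapter) −

import numpy as np

def mean_squared_error(y_pred, y_true):
   # Average of the squared differences between predictions and true values.
   return ((y_pred - y_true) ** 2).mean()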

Let’s take a closer look at the binary cross-entropy function. Consider a binary classification problem with two classes, class 0 and class 1, and let "p(y=1|x)" denote the model’s predicted probability that a data point belongs to class 1. The true label of each data point is either 0 or 1. We can define the binary cross-entropy cost function as follows −

J=-\frac{1}{m}\sum_{i=1}^{m}\left [ y_{i}\log\left ( p_{i} \right )+\left ( 1-y_{i} \right )\log\left ( 1-p_{i} \right ) \right ]

where "m" is the number of data points, "y" is the true label of each data point, and "p" is the predicted probability of class 1.

The binary cross-entropy function has several desirable properties. First, it is convex in the predicted probability (and, for models such as logistic regression, in the model parameters), which means that any minimum found by an optimization technique is a global minimum. Second, it is non-negative and grows without bound as a prediction becomes confidently wrong, which means that it penalizes incorrect predictions heavily. Third, it is a differentiable function, which means that it can be used with gradient-based optimization algorithms.
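
As an illustration of the third property, differentiating the per-example loss with respect to the predicted probability gives −

\frac{\partial J}{\partial p}=-\left ( \frac{y}{p}-\frac{1-y}{1-p} \right )

and when the probability is produced by a sigmoid output p = σ(z), this simplifies to the well-known gradient ∂J/∂z = p − y, which is one reason the sigmoid and cross-entropy pairing is so common.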

Implementation in Python

Now let’s see how to implement the binary cross-entropy function in Python using NumPy −

import numpy as np

def binary_cross_entropy(y_pred, y_true):
   # Clip predictions away from exactly 0 and 1 so that np.log never
   # receives 0, which would produce -inf.
   eps = 1e-15
   y_pred = np.clip(y_pred, eps, 1 - eps)
   # Per-example loss -(y*log(p) + (1-y)*log(1-p)), averaged over all points.
   return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()
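
As a quick sanity check, we can call the function on a couple of hand-made arrays in the same session (the values below are purely illustrative) −

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.3])
print(binary_cross_entropy(y_pred, y_true))   # prints roughly 0.409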

In this implementation, we first clip the predicted probabilities to avoid numerical issues with logarithms. We then compute the binary cross-entropy loss using NumPy functions and return the mean over all data points.

Once we have defined a cost function, we can use it to train a machine learning model using optimization techniques such as gradient descent. The goal of optimization is to find the set of model parameters that minimizes the cost function.
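
To make this concrete, here is a minimal sketch of batch gradient descent for logistic regression that minimizes the binary cross-entropy defined above. This is an illustration rather than a production implementation, and the learning rate and iteration count are arbitrary choices −

import numpy as np

def sigmoid(z):
   return 1 / (1 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, n_iters=1000):
   # Start from zero weights and bias.
   w = np.zeros(X.shape[1])
   b = 0.0
   for _ in range(n_iters):
      p = sigmoid(X @ w + b)          # predicted probabilities of class 1
      # Gradient of the binary cross-entropy with respect to w and b.
      grad_w = X.T @ (p - y) / len(y)
      grad_b = (p - y).mean()
      # Step in the direction that decreases the cost.
      w -= lr * grad_w
      b -= lr * grad_b
   return w, b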

Example

Here is an example of training a logistic regression model on the Iris dataset using scikit-learn and then evaluating it with the binary cross-entropy function defined above −

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Train a logistic regression model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = logreg.predict(X_test)

# Compute the binary cross-entropy loss
loss = binary_cross_entropy(logreg.predict_proba(X_test)[:, 1], y_test)
print('Loss:', loss)

In the above example, we first load the Iris dataset using the load_iris function from scikit-learn. We then split the data into training and testing sets using the train_test_split function. We train a logistic regression model on the training set using the LogisticRegression class from scikit-learn. We then make predictions on the testing set using the predict method of the trained model.

To compute the binary cross-entropy loss, we use the predict_proba method of the logistic regression model to get the predicted probability of class 1 for each data point in the testing set. We then extract the probabilities for class 1 using indexing and pass them, along with the true labels of the testing set, to our binary_cross_entropy function, which computes the loss, and we print the result to the terminal. Note that Iris is actually a three-class dataset (its labels are 0, 1, and 2), so applying a binary loss to its raw labels is only illustrative; for a genuine binary problem you would first restrict the data to two of the classes.
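
For reference, predict_proba returns one column of probabilities per class, ordered as in logreg.classes_. Here is a minimal illustration of the indexing used above (the shapes assume the 30% test split of the 150-sample Iris dataset) −

probs = logreg.predict_proba(X_test)
print(probs.shape)        # (45, 3): one probability column per Iris class
p_class1 = probs[:, 1]    # column 1 holds the predicted probability of class 1
print(p_class1.shape)     # (45,)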

When you execute this code, it will produce the following output −

Loss: 1.6312339784720309

The binary cross-entropy loss is a measure of how well the logistic regression model is able to predict the class of each data point in the testing set. A lower loss indicates better performance, and a loss of 0 would indicate perfect performance. The relatively high value here is largely an artifact of applying a binary loss to three-class labels, as noted above.