Gen-ai Concise Tutorial

Generative AI Models - Maximum Likelihood Estimation

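Maximum likelihood estimation (MLE) is a statistical method that provides a principled way to estimate the parameters of the probability distribution that best describes a given dataset. MLE assumes that the data were generated by a specified distribution. Put simply, MLE finds the most plausible values for a model's unknown parameters, such as the mean or spread of a set of data points. It is a bit like guessing the missing number in a sequence so that it fits the pattern of the numbers we already know.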

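In generative AI, MLE is widely used, particularly in connection with generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs). For example, when generating images of handwritten digits (0-9), we want our model to produce images that resemble those in our dataset (for example, MNIST). We can achieve this by maximizing the likelihood of observing our training data given the model parameters −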

\mathrm{\max_{\theta} \: \sum_{i} \log P(x_{i} | \theta)}

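We will cover this in more detail later, when we create our first GAN model in Python. Read this chapter to understand the concept of maximum likelihood estimation, its role in generative modeling, its applications in generative modeling, and its Python implementation.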

Understanding Maximum Likelihood Estimation (MLE)

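Maximum likelihood estimation (MLE) is a powerful statistical method for estimating the parameters of a probability distribution from observed data. Let's look at it in more detail through its mathematical foundation −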

Mathematical Foundation of MLE

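At the core of MLE is the likelihood function $\mathrm{L(\theta | x)}$, where $\mathrm{\theta}$ denotes the parameters of the distribution and $\mathrm{x}$ denotes the observed data.

The likelihood function quantifies how probable the observed data are for a given set of parameter values. Mathematically, it is the joint probability density function (PDF) or probability mass function (PMF) of the observed data, viewed as a function of $\mathrm{\theta}$ −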

\mathrm{L(\theta | x) \: = \: f(x | \theta)}

To keep the computation simple, we usually work with the log-likelihood function $\mathrm{l(\theta | x)}$, which is the natural logarithm of the likelihood function −

\mathrm{l(\theta | x) \: = \: \log L(\theta | x)}
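Beyond simpler algebra, there is a practical numerical reason to prefer the log-likelihood. The following is a minimal sketch, assuming a synthetic Gaussian sample and a hand-written `gaussian_pdf` helper (both are illustrative and not part of the original text). It shows that the raw likelihood of a few thousand points underflows in floating point, while the sum of log-densities stays finite.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption for illustration): 2000 draws from N(2, 1).
x = rng.normal(loc=2.0, scale=1.0, size=2000)

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

densities = gaussian_pdf(x, mu=2.0, sigma=1.0)

# Likelihood: a product of 2000 numbers below 1, which underflows to 0.0.
likelihood = np.prod(densities)

# Log-likelihood: a sum of logs, which remains a finite negative number.
log_likelihood = np.sum(np.log(densities))

print("likelihood     =", likelihood)
print("log-likelihood =", log_likelihood)
```

Because the logarithm is strictly increasing, both quantities are maximized by the same parameter values, so nothing is lost by switching to the sum.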

The goal of MLE is to find the parameter values $\mathrm{\hat{\theta}}$ that maximize the likelihood function $\mathrm{L(\theta | x)}$ or, equivalently (because the logarithm is strictly increasing), the log-likelihood function $\mathrm{l(\theta | x)}$ −

\mathrm{\hat{\theta} \: = \: argmax_{\theta} L(\theta | x)}

Or,

\mathrm{\hat{\theta} \: = \: argmax_{\theta} l(\theta | x)}

To obtain the maximum likelihood estimates $\mathrm{\hat{\theta}}$, we differentiate the log-likelihood function $\mathrm{l(\theta | x)}$ with respect to the parameters $\mathrm{\theta}$ and set the derivatives equal to zero −

\mathrm{\frac{\partial \: l(\theta | x)}{\partial \: \theta} \: = \: 0}

Solving the above equation gives the MLE $\mathrm{\hat{\theta}}$.
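As a worked illustration of this recipe, consider the case where the data are assumed to be $\mathrm{n}$ independent draws $\mathrm{x_{1}, \dots, x_{n}}$ from a Gaussian with mean $\mathrm{\mu}$ and variance $\mathrm{\sigma^{2}}$. The Gaussian model and this notation are assumptions chosen for the example. Setting each partial derivative of the log-likelihood to zero gives closed-form estimates:

```latex
\begin{aligned}
l(\mu, \sigma^2 \mid x) &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
\frac{\partial l}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{\partial l}{\partial \sigma^2} &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \hat{\mu})^2 = 0
  \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2
\end{aligned}
```

In this case the estimates are the sample mean and the sample variance with divisor $\mathrm{n}$. For most models no closed form exists, and the maximization is carried out numerically.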

MLE in Generative Modeling

Generative modeling, as we discussed earlier, involves capturing the underlying distribution of the data and generating new data comparable to the original training data. When training generative models, MLE plays a crucial role by estimating the parameters of that underlying probability distribution.

Let’s see how MLE is applied in generative modeling −

Model Selection

We first need to choose a probabilistic model that captures the underlying data distribution. Some of the common models are Gaussian distributions, mixture models, neural networks, etc.

Likelihood Function

Next, we need to define the likelihood function, which measures the probability of observing the given data. For example, for a dataset $\mathrm{D \: = \: \lbrace x_{1},x_{2},x_{3},\: \dots \: x_{n} \rbrace}$ whose data points are assumed to be independent and identically distributed, the likelihood function $\mathrm{L(\theta | D)}$ depends on the model parameter $\mathrm{\theta}$ and is given by the product of the probabilities of observing each data point −

\mathrm{L(\theta | D) \: = \: \prod_{i=1}^{n} p(x_{i} | \theta)}
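To make the product concrete, here is a minimal sketch that assumes a hypothetical dataset of eight coin flips modeled with a Bernoulli distribution. The dataset, the model choice, and the `likelihood` helper are illustrative assumptions. The code evaluates the likelihood for two candidate values of $\mathrm{\theta}$.

```python
import numpy as np

# Hypothetical dataset D: 8 coin flips, 1 = heads, 0 = tails (illustrative values).
D = np.array([1, 0, 1, 1, 0, 1, 1, 1])

def likelihood(theta, data):
    """L(theta | D): product over i of p(x_i | theta) for a Bernoulli model."""
    # p(x_i | theta) is theta for heads and (1 - theta) for tails.
    per_point = np.where(data == 1, theta, 1.0 - theta)
    return np.prod(per_point)

for theta in (0.5, 0.75):
    print(f"theta = {theta}: L(theta | D) = {likelihood(theta, D):.6f}")
```

The value 0.75 matches the observed fraction of heads (6 out of 8) and yields the larger likelihood of the two candidates.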

Maximization

Now we need to maximize the likelihood function with respect to the model parameters $\mathrm{\theta}$. Maximization involves finding the values of $\mathrm{\theta}$ that make the observed data most likely under the model.
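When no closed-form solution is available, the maximization is usually done iteratively. The sketch below is one possible approach, not a prescribed one: plain gradient ascent on the average Gaussian log-likelihood. The synthetic data, the learning rate, the step count, and the choice to optimize $\mathrm{\log \sigma}$ (which keeps $\mathrm{\sigma}$ positive) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption for illustration): 2000 draws from N(2, 1).
x = rng.normal(loc=2.0, scale=1.0, size=2000)

# Parameters theta = (mu, log_sigma), started from an arbitrary guess.
mu, log_sigma = 0.0, 0.0
learning_rate = 0.1  # illustrative value

for _ in range(2000):
    sigma2 = np.exp(2.0 * log_sigma)
    # Gradients of the average log-likelihood with respect to mu and log_sigma.
    grad_mu = np.mean(x - mu) / sigma2
    grad_log_sigma = -1.0 + np.mean((x - mu) ** 2) / sigma2
    # Gradient ascent: move the parameters uphill on the log-likelihood.
    mu += learning_rate * grad_mu
    log_sigma += learning_rate * grad_log_sigma

sigma = np.exp(log_sigma)
print("gradient ascent:", mu, sigma)
print("closed form    :", x.mean(), x.std())
```

For a Gaussian the iterative result should agree with the sample mean and the divisor-n standard deviation, which makes this a convenient sanity check before moving to models that lack a closed form.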

Parameter Estimation

Finally, when the likelihood function is maximized, the resulting parameter values are used as the estimates for the parameters of the generative model. These estimated parameters define the learned distribution, which can then be used to generate new data points comparable to the observed data.
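As a small sketch of this last step, assume a Gaussian model fitted to synthetic observations (the data and variable names are illustrative). Once the parameters are estimated, generating new data simply means sampling from the fitted distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "observed" data (assumption for illustration).
observed = rng.normal(loc=2.0, scale=1.0, size=2000)

# Maximum likelihood estimates for a Gaussian model.
mu_hat = observed.mean()
sigma_hat = observed.std()  # divisor n (ddof=0)

# The learned distribution N(mu_hat, sigma_hat^2) is now the generative model:
# draw new data points from it.
new_samples = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
print(new_samples)
```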

Applications of MLE in Generative Modeling

MLE has a wide range of applications across various domains of generative modeling. Given below are some of the significant applications −

  1. Gaussian Mixture Models (GMMs) − MLE is used to estimate the parameters of Gaussian components in GMMs. These parameters enable the modeling of complex data distributions with multiple modes.

  2. Variational Autoencoders (VAEs) − In VAEs, the model is trained by maximizing a lower bound on the log-likelihood (the evidence lower bound), which includes learning the parameters of the latent variable distribution. The model can then generate new data samples by sampling from this learned distribution.

  3. Generative Adversarial Networks (GANs) − GANs do not optimize the likelihood function directly, but MLE is used in GAN training to guide the learning process and improve sample quality.
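To illustrate the first application, the sketch below fits a two-component, one-dimensional Gaussian mixture. The GMM likelihood has no closed-form maximizer, so the sketch uses the expectation-maximization (EM) algorithm, a standard technique that the text above does not name. The synthetic data, the initial guesses, and the iteration count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic bimodal data (illustrative): two Gaussians centered at -2 and 3.
x = np.concatenate([rng.normal(-2.0, 0.7, 600), rng.normal(3.0, 1.0, 400)])

def normal_pdf(x, mu, sigma):
    """Gaussian density, broadcast over data points and components."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Rough initial guesses for mixture weights, means, and standard deviations.
weights = np.array([0.5, 0.5])
means = np.array([x.min(), x.max()])
sigmas = np.array([x.std(), x.std()])

log_likelihoods = []
for _ in range(100):
    # E-step: responsibility of each component for each data point.
    joint = weights * normal_pdf(x[:, None], means, sigmas)  # shape (n, 2)
    log_likelihoods.append(np.log(joint.sum(axis=1)).sum())
    resp = joint / joint.sum(axis=1, keepdims=True)

    # M-step: weighted maximum likelihood updates of the parameters.
    nk = resp.sum(axis=0)
    weights = nk / x.size
    means = (resp * x[:, None]).sum(axis=0) / nk
    sigmas = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)

print("weights:", weights)
print("means  :", means)
print("sigmas :", sigmas)
```

Each EM iteration does not decrease the log-likelihood, which is why the recorded values in `log_likelihoods` can be used to monitor convergence.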

Implementing Maximum Likelihood Estimation using Python

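We can implement MLE in Python and visualize the result with libraries such as Matplotlib. Below is a simple example that uses MLE to estimate the parameters of a Gaussian distribution from a given dataset −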

Example

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Sample dataset (you can replace this with your own data)
data = np.random.normal(loc=2, scale=1, size=2000)

# Maximum Likelihood Estimation for a Gaussian distribution
def maximum_likelihood_estimation(data):
    # For a Gaussian, the MLE has a closed form: the sample mean and the
    # standard deviation with divisor n (np.std's default, ddof=0)
    mu = np.mean(data)
    sigma = np.std(data)
    return mu, sigma

# Perform Maximum Likelihood Estimation
estimated_mu, estimated_sigma = maximum_likelihood_estimation(data)

# Generate x values for plotting
x = np.linspace(min(data), max(data), 1000)

# Plot histogram of the data
plt.figure(figsize=(7.2, 5.5))
plt.hist(data, bins=30, density=True, alpha=0.6, color='blue', label='Data Histogram')

# Plot the true Gaussian distribution
plt.plot(x, norm.pdf(x, loc=2, scale=1), color='red', linestyle='--', label='True Gaussian Distribution')

# Plot the estimated Gaussian distribution using MLE
plt.plot(x, norm.pdf(x, loc=estimated_mu, scale=estimated_sigma), color='green', linestyle='-', label='Estimated Gaussian Distribution (MLE)')

plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.title('Maximum Likelihood Estimation for Gaussian Distribution')
plt.legend()
plt.grid(True)

plt.show()

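The code above produces a plot showing the histogram of the data, the true Gaussian distribution, and the estimated Gaussian distribution obtained with maximum likelihood estimation (MLE).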

[Figure: maximum likelihood estimation]
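One detail of the example is worth spelling out: `np.std` divides by n by default (`ddof=0`), which is exactly the Gaussian maximum likelihood estimate, whereas `ddof=1` divides by n - 1. The short sketch below uses a deliberately small synthetic sample (an assumption for illustration) so that the difference between the two is visible.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small synthetic sample (illustrative) so the two estimates differ noticeably.
data = rng.normal(loc=2.0, scale=1.0, size=10)

# MLE of sigma: square root of the mean squared deviation (divisor n).
sigma_mle = np.std(data)  # ddof=0 is the default
manual = np.sqrt(np.mean((data - data.mean()) ** 2))

# Square root of the unbiased variance estimate (divisor n - 1).
sigma_ddof1 = np.std(data, ddof=1)

print("MLE (ddof=0):", sigma_mle)
print("manual      :", manual)
print("ddof=1      :", sigma_ddof1)
```

With 2000 points, as in the example above, the two values are nearly identical; the distinction matters mainly for small samples.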

Conclusion

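In this chapter, we highlighted the importance of MLE in generative modeling, where it serves as the backbone for learning data distributions and generating new samples.

Model selection, the likelihood function, maximization, and parameter estimation are the steps through which we can apply MLE in generative modeling.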