Gen-AI Concise Tutorial

The Role of Probability Distribution in Generative Models

Advances in Machine Learning (ML) and Deep Learning (DL) enable machines to learn from past data and make predictions even on unseen data. One such advance is generative models, which capture the underlying distribution of data and generate new data comparable to the original training data. But how do they do it?

The answer is probability distributions, which allow generative models to manage the uncertainty and variation in data. Read this chapter to understand what a probability distribution is, its types, its uses in generative modeling, and its applications.

What is Probability Distribution?

A probability distribution is a mathematical function that gives the probabilities of the different possible values of a random variable within a given range. It can be depicted using either graphs or probability tables.

For example, when you flip a coin, a probability distribution tells us the chances of getting heads or tails. The following probability table describes it for a fair coin −

Outcome | Probability
Heads | 0.5
Tails | 0.5
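To make the table concrete, here is a minimal Python sketch (an illustration, not part of the original tutorial) that stores the coin-flip distribution as an outcome-to-probability mapping, checks the two basic rules every probability distribution must satisfy, and draws samples from it. The names `coin`, `is_valid_distribution`, and `sample` are hypothetical helpers; only the standard `random` module is used.

```python
import random

# The fair-coin distribution from the table: outcome -> probability.
coin = {"Heads": 0.5, "Tails": 0.5}

def is_valid_distribution(dist, tol=1e-9):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    probs = dist.values()
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) < tol

def sample(dist, k, seed=0):
    """Draw k outcomes, each chosen with its assigned probability."""
    rng = random.Random(seed)
    return rng.choices(list(dist), weights=list(dist.values()), k=k)

print(is_valid_distribution(coin))  # True
print(sample(coin, 5))              # five simulated flips
```

Sampling with weights is exactly what "generating new data from a distribution" means in the simplest possible case; the same `choices` call works for any finite outcome set.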

A probability distribution is the theoretical counterpart of a frequency distribution (FD). In statistics, an FD describes how many times each value of a variable occurs in a dataset. A probability distribution, on the other hand, assigns a probability to each value, which corresponds to the relative frequency with which that value is expected to occur.

A probability, which measures how likely something is to occur, is a number between 0 (impossible) and 1 (certain). That is why a value with a higher probability tends to appear more frequently in a sample.
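The link between frequencies and probabilities can be seen with a quick simulation. This sketch assumes a fair coin simulated with Python's `random` module; `frequency_distribution` and `relative_frequencies` are illustrative helper names. As the number of flips grows, the relative frequency of each outcome should settle close to its probability of 0.5.

```python
import random
from collections import Counter

def frequency_distribution(samples):
    """Frequency distribution: how many times each value occurs in the data."""
    return Counter(samples)

def relative_frequencies(samples):
    """Divide each count by the sample size to get empirical probabilities."""
    counts = frequency_distribution(samples)
    n = len(samples)
    return {value: count / n for value, count in counts.items()}

rng = random.Random(42)
for n in (10, 1_000, 100_000):
    # Simulate n flips of a fair coin.
    flips = [rng.choice(["Heads", "Tails"]) for _ in range(n)]
    print(n, relative_frequencies(flips))
```

With only 10 flips the proportions can be far from 0.5; with 100,000 flips they are typically within a fraction of a percent.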

Types of Probability Distributions

There are two types of probability distributions −

Discrete Probability Distributions
Continuous Probability Distributions

Let’s take a closer look at these two types of probability distributions.

Discrete Probability Distributions

Discrete probability distributions are mathematical functions that describe the probabilities of the different outcomes of a discrete or categorical random variable.

A discrete probability distribution includes only values that can actually occur; in other words, it does not include any value with zero probability. For example, 5.5 is not a possible outcome of rolling a die, so it is not part of the probability distribution of die rolls.

The total of the probabilities of all possible values in a discrete probability distribution is always one.
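A minimal sketch of the die example, assuming a fair six-sided die: the probability mass function (PMF) below returns 1/6 for each face, returns 0 for values such as 5.5 that are not part of the distribution, and the probabilities add up to exactly 1. `die_pmf` and `pmf` are hypothetical names; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Fair six-sided die: each face 1..6 has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def pmf(x):
    """Probability of outcome x; values outside the support (e.g. 5.5) get 0."""
    return die_pmf.get(x, Fraction(0))

print(pmf(3))                 # 1/6
print(pmf(5.5))               # 0
print(sum(die_pmf.values()))  # 1
```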

Let’s see some common discrete probability distributions −

Discrete Probability Distribution | Explanation | Example
Bernoulli Distribution | It describes the probability of success (1) or failure (0) in a single trial. | The outcome of a single coin flip.
Binomial Distribution | It models the number of successes in a fixed number n of independent trials, each with success probability p. | The number of heads when you toss a coin 10 times.
Poisson Distribution | It gives the probability of k events occurring in a fixed interval of time or space, when events occur independently at a constant average rate. | The number of email messages received per day.
Geometric Distribution | It represents the number of trials needed to achieve the first success in a sequence of independent trials. | The number of times a coin is flipped until it lands on heads.
Hypergeometric Distribution | It gives the probability of drawing a specific number of successes, without replacement, from a finite population. | The number of red balls drawn from a bag of mixed-colored balls.
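Each distribution in the table can be written as a short probability mass function. The sketch below uses only Python's `math` module (`comb`, `exp`, `factorial`); the function names and the example parameter values (such as an average of 4 emails per day, or a bag of 4 red and 6 other balls) are illustrative assumptions, not values from the text.

```python
from math import comb, exp, factorial

def bernoulli_pmf(x, p):
    """P(X = x) for one trial: x = 1 (success) or 0 (failure); 0 otherwise."""
    return {1: p, 0: 1 - p}.get(x, 0.0)

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k events in an interval whose average number of events is lam)."""
    return lam**k * exp(-lam) / factorial(k)

def geometric_pmf(k, p):
    """P(the first success happens on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def hypergeometric_pmf(k, N, K, n):
    """P(k successes in n draws without replacement from N items, K of them
    successes). Valid for 0 <= k <= n and n - k <= N - K."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

print(binomial_pmf(5, 10, 0.5))         # exactly 5 heads in 10 fair tosses
print(poisson_pmf(3, 4.0))              # 3 emails on a day that averages 4
print(geometric_pmf(2, 0.5))            # first heads on the 2nd flip
print(hypergeometric_pmf(2, 10, 4, 3))  # 2 red when drawing 3 from 4 red + 6 other
```

Summing any of these PMFs over all possible values of k gives 1, as required of a discrete probability distribution.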

Continuous Probability Distributions

As the name implies, continuous probability distributions are mathematical functions that describe the probabilities of different occurrences within a continuous range of values.

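A continuous probability distribution covers an infinite number of possible values. For example, the interval [4, 5] contains infinitely many values between 4 and 5. Because of this, the probability of any single exact value is zero; instead, probabilities are assigned to intervals, as the area under a probability density function (PDF).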
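Continuing the [4, 5] example, this sketch assumes a continuous uniform distribution on that interval and approximates interval probabilities as areas under the PDF with a simple midpoint-rule sum. `uniform_pdf` and `probability_between` are hypothetical helpers written for illustration.

```python
def uniform_pdf(x, a=4.0, b=5.0):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def probability_between(pdf, lo, hi, steps=100_000):
    """Approximate P(lo <= X <= hi) as the area under the PDF (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

print(probability_between(uniform_pdf, 4.0, 4.5))  # about 0.5
print(probability_between(uniform_pdf, 4.2, 4.2))  # 0.0: a single point has no width
```

The density value itself (here 1.0) is not a probability; only areas under the curve are.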

Let’s see some common continuous probability distributions −

Continuous Probability Distribution | Explanation | Example
Continuous Uniform Distribution | It assigns equal probability to all equal-length subintervals of its range; in other words, its density is constant. | The waiting time for a bus that arrives every 10 minutes, if you arrive at a random moment.
Normal (Gaussian) Distribution | It forms a symmetric, bell-shaped curve: values cluster around the mean, and the tails fall off equally on both sides. | IQ scores.
Exponential Distribution | It models the time between events in a Poisson process, where events occur at a constant average rate. | The time until the next customer arrives.
Log-normal Distribution | It describes positive, right-skewed data whose logarithm is normally distributed. | Stock prices, income distributions, etc.
Beta Distribution | It describes random variables constrained to the interval [0, 1] (or, after rescaling, any finite interval), such as proportions and probabilities. It is often used in Bayesian statistics. | The probability of success in a binomial trial.
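Each continuous distribution in the table is defined by its probability density function. The following sketch writes those densities out with Python's `math` module and checks numerically that each one integrates to approximately 1. The parameter values (for example, mean 100 and standard deviation 15 for IQ-like scores, or rate 0.5 for the exponential) are illustrative assumptions, and the function names are hypothetical.

```python
import math

def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def lognormal_pdf(x, mu, sigma):
    # X is log-normal when log(X) is normal with mean mu and std sigma.
    if x <= 0:
        return 0.0
    return normal_pdf(math.log(x), mu, sigma) / x

def beta_pdf(x, alpha, beta):
    if not 0 < x < 1:
        return 0.0
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the area under f on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Integrate each density over a range that holds essentially all its mass.
checks = {
    "uniform(0, 10)": integrate(lambda x: uniform_pdf(x, 0, 10), -1, 11),
    "normal(100, 15)": integrate(lambda x: normal_pdf(x, 100, 15), 0, 200),
    "exponential(0.5)": integrate(lambda x: exponential_pdf(x, 0.5), 0, 60),
    "lognormal(0, 0.5)": integrate(lambda x: lognormal_pdf(x, 0, 0.5), 0, 50),
    "beta(2, 5)": integrate(lambda x: beta_pdf(x, 2, 5), 0, 1),
}
for name, area in checks.items():
    print(f"{name}: total probability = {area:.4f}")
```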

Use of Probability Distributions in Generative Modeling

Probability distributions play a crucial role in generative modeling. Let’s check out some of the important ways in which probability distributions are used in generative modeling −

Data Distribution − Generative models aim to capture the underlying probability distribution from which the training samples were drawn.
Generating New Samples − Once a generative model has learned the data distribution, it can sample from it to generate new data comparable to the original dataset.
Evaluation and Training − Probability distributions are used to train and evaluate generative models; for example, many models are trained by maximizing the likelihood of the training data. Evaluation metrics such as likelihood, perplexity, and Wasserstein distance measure how well the model and its generated samples match the original dataset.
Variability and Uncertainty − Probability distributions capture the variability and uncertainty present in the data. Generative models can use this information to generate diverse yet realistic samples.
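The ideas above can be illustrated with a deliberately tiny "generative model". This sketch assumes one-dimensional data that is roughly Gaussian: it learns the distribution by maximum-likelihood estimates of the mean and standard deviation, generates new samples from it, and evaluates the result with the average log-likelihood and a 1-D Wasserstein distance. It is a sketch under those assumptions only; models such as GANs and VAEs learn far more complex distributions with neural networks, and all function names and data here are hypothetical.

```python
import math
import random

def fit_gaussian(data):
    """Learn the data distribution: maximum-likelihood mean and std."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    return mu, sigma

def generate(mu, sigma, n, rng):
    """Generate n new samples from the learned normal distribution."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def avg_log_likelihood(data, mu, sigma):
    """Average log-density of the data under N(mu, sigma^2); higher is better."""
    log_norm = math.log(sigma * math.sqrt(2 * math.pi))
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - log_norm for x in data) / len(data)

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples:
    the mean absolute difference between their sorted values."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

rng = random.Random(0)
# Hypothetical "training data": 5,000 measurements drawn from N(170, 8^2).
train = [rng.gauss(170, 8) for _ in range(5_000)]

mu, sigma = fit_gaussian(train)
new_samples = generate(mu, sigma, 5_000, rng)

print(f"learned mu={mu:.2f}, sigma={sigma:.2f}")
print("avg log-likelihood:", avg_log_likelihood(train, mu, sigma))
print("Wasserstein-1 (train vs generated):", wasserstein_1d(train, new_samples))
```

A small Wasserstein distance means the generated samples are distributed much like the training data; a larger one would signal that the learned distribution is off.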

Applications of Probability Distribution

There is a wide range of generative modeling tasks across various domains that use probability distributions, some of which are listed below −

Image Generation − Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) use probability distributions to generate realistic images from scratch. This has applications in computer graphics, creative design, and content generation.
Text Synthesis − Language models, such as OpenAI’s ChatGPT, predict a probability distribution over possible next tokens and sample from it to generate relevant text based on a given prompt or input. This has applications in chatbots, virtual assistants, and automated content generation systems.
Anomaly Detection − By learning the underlying probability distribution of normal data, a generative model can flag data points that are unlikely under that distribution as anomalies or outliers. This has applications in fraud detection, network security, and medical diagnostics.
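The anomaly-detection idea can be sketched with a simple assumption: fit a normal distribution to "normal" data and flag new points whose density falls below that of a point 3 standard deviations from the mean. The transaction amounts, the 3-sigma cut-off, and the helper names are all hypothetical choices for illustration; a real system would learn a much richer distribution.

```python
import math
import statistics

def fit_normal(data):
    """Learn the distribution of normal data: mean and population std."""
    return statistics.fmean(data), statistics.pstdev(data)

def log_density(x, mu, sigma):
    """Log of the normal PDF at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def find_anomalies(normal_data, new_points, max_z=3.0):
    """Flag points that are less likely than a point max_z stds from the mean."""
    mu, sigma = fit_normal(normal_data)
    threshold = log_density(mu + max_z * sigma, mu, sigma)
    return [x for x in new_points if log_density(x, mu, sigma) < threshold]

# Hypothetical transaction amounts considered normal, and some new ones.
normal_amounts = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]
new_amounts = [50, 51, 95, 5]
print(find_anomalies(normal_amounts, new_amounts))  # [95, 5]
```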

Conclusion

In this chapter, we explained the critical role of probability distributions in generative modeling. We first covered what probability distributions are, along with their two types: discrete and continuous probability distributions.

Discrete probability distributions describe the probabilities of the different outcomes of discrete or categorical random variables, whereas continuous probability distributions describe the probabilities of outcomes within a continuous range of values. We also highlighted some common discrete and continuous probability distributions.

We then discussed the important ways in which generative models use probability distributions: capturing the data distribution, generating new samples, evaluation and training, and handling variability and uncertainty. Finally, we highlighted the diverse applications of probability distributions in generative modeling tasks such as image generation, text synthesis, and anomaly detection.