Gen-AI Concise Tutorial
Generative Adversarial Network - Architecture and Types
A Generative Adversarial Network (GAN) typically utilizes architectures such as convolutional neural networks (CNNs). The GAN framework is composed of two neural networks: a Generator and a Discriminator. These networks play complementary roles: the generator focuses on creating new data, while the discriminator evaluates it. Read this chapter to learn about the architecture of GANs, their components, their types, and the mechanisms that make them so powerful.
The Role of Generator in GAN Architecture
The first part of the GAN architecture is the Generator. Let's look at its function and structure −
Generator: Function and Structure
The primary goal of the generator is to produce new data samples that resemble real data from the dataset. It starts with a random noise vector and transforms it through fully connected (dense) and/or convolutional layers to generate a synthetic data sample.
Generator: Layers and Components
Listed below are the layers and components of the generator neural network (a minimal code sketch follows the list) −
- Input Layer − The generator receives a low-dimensional random noise vector (or other input data) as input.
- Fully Connected Layers − Fully connected layers are used to increase the dimensionality of the input noise vector.
- Transposed Convolutional Layers − These layers, also known as deconvolutional layers, are used for upsampling, i.e., to generate an output feature map with greater spatial dimensions than the input feature map.
- Activation Functions − Two commonly used activation functions are Leaky ReLU and Tanh. The Leaky ReLU activation function helps mitigate the dying ReLU problem, while the Tanh activation function ensures that the output stays within a specific range.
- Output Layer − It produces the final data output, such as an image of a certain resolution.
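As a concrete illustration of how these components fit together, here is a minimal sketch of a generator for 28×28 grayscale images, assuming PyTorch. The latent dimension, layer widths, and output shape are illustrative assumptions, not values prescribed by this chapter.

```python
# A minimal sketch of a GAN generator (illustrative sizes, assuming PyTorch).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 7 * 7)   # expand the noise vector
        self.net = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 7x7 -> 14x14
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),    # 14x14 -> 28x28
            nn.Tanh(),                                   # keeps the output in [-1, 1]
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 128, 7, 7)               # reshape to a feature map
        return self.net(x)

z = torch.randn(16, 100)                                 # batch of random noise vectors
fake_images = Generator()(z)
print(fake_images.shape)                                 # (16, 1, 28, 28)
```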
Generator: Objective Function
The goal of the generator neural network is to create data that the discriminator cannot distinguish from real data. This is achieved by minimizing the generator's loss function −
\mathrm{L_{G} \: = \: \log(1 \: - \: D(G(z)))}
Here, z is a random noise vector, G(z) is the generated data, and D(⋅) represents the discriminator's output.
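This loss can be computed directly from the two networks. The sketch below assumes `generator` and `discriminator` modules like the ones sketched in this chapter, with the discriminator outputting a probability in (0, 1).

```python
# A hedged sketch of the generator loss L_G = log(1 - D(G(z))).
import torch

def generator_loss(discriminator, generator, z):
    fake = generator(z)
    d_fake = discriminator(fake)              # D(G(z))
    # Note: many practical implementations minimize -log D(G(z)) instead
    # (the non-saturating variant), which gives stronger early gradients.
    return torch.log(1.0 - d_fake).mean()     # L_G, minimized by the generator
```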
The Role of Discriminator in GAN Architecture
The second part of the GAN architecture is the Discriminator. Let's look at its function and structure −
Discriminator: Function and Structure
The primary goal of the discriminator is to classify input data as either real or generated by the generator. It takes a data sample as input and outputs a probability indicating whether the sample is real or fake.
Discriminator: Layers and Components
Listed below are the layers and components of the discriminator neural network (a minimal code sketch follows the list) −
- Input Layer − The discriminator receives a data sample from either the real dataset or the generator as input.
- Convolutional Layers − These are used to downsample the input data and extract relevant features.
- Fully Connected Layers − Fully connected layers are used to process the extracted features and make the final classification.
- Activation Functions − It uses the Leaky ReLU activation function to address the vanishing gradient problem and introduce non-linearity.
- Output Layer − As the name implies, it outputs a single probability value between 0 and 1 that indicates whether the sample is real or fake.
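Mirroring the generator sketch above, here is a minimal PyTorch discriminator for 28×28 grayscale images; the layer sizes are again illustrative assumptions.

```python
# A minimal sketch of a GAN discriminator (illustrative sizes, assuming PyTorch).
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1),    # 28x28 -> 14x14
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 14x14 -> 7x7
            nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 1),    # final classification score
            nn.Sigmoid(),                 # probability that the input is real
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(16, 1, 28, 28)            # a batch of (real or fake) samples
print(Discriminator()(x).shape)           # (16, 1), one probability per sample
```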
Discriminator: Objective Function
The goal of the discriminator neural network is to maximize its ability to correctly distinguish real data from generated data. This is achieved by minimizing the discriminator's loss function −
\mathrm{L_{D} \: = \: -(\log D(x) \: + \: \log(1 \: - \: D(G(z))))}
Here, x is a real data sample.
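A hedged sketch of this loss in code, using the same placeholder `generator` and `discriminator` modules; `real_batch` and `z` are hypothetical names for a batch of real samples and a batch of noise vectors.

```python
# A sketch of the discriminator loss L_D = -(log D(x) + log(1 - D(G(z)))).
import torch

def discriminator_loss(discriminator, generator, real_batch, z):
    d_real = discriminator(real_batch)              # D(x) for real samples
    d_fake = discriminator(generator(z).detach())   # D(G(z)); detach freezes the generator
    return -(torch.log(d_real) + torch.log(1.0 - d_fake)).mean()
```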
Types of Generative Adversarial Networks
Different types of GAN models can be built based on the way the generator and discriminator networks interact with each other. Here are some notable variations −
Vanilla GAN
Vanilla GAN represents the simplest form of generative adversarial networks (GANs). It provides a fundamental understanding of how GANs work. The term "Vanilla" implies that this is the simplest form, without any advanced modifications or enhancements.
Deep Convolutional GANs (DCGANs)
DCGANs are one of the most popular implementations of GANs. They use convolutional networks (ConvNets) in place of multi-layer perceptrons to stabilize GAN training. The DCGAN architectural guidelines have significantly stabilized GAN training, particularly for image generation tasks.
Some of the key features of DCGANs include the use of the following (see the sketch after this list):
- Strided Convolutions
- Batch Normalization
- The removal of fully connected hidden layers
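The following sketch shows a DCGAN-style generator that follows these guidelines: strided transposed convolutions for upsampling, batch normalization after each upsampling step, and no fully connected hidden layers (the noise vector enters as a 1×1 feature map). The channel counts and output resolution are illustrative assumptions.

```python
# A sketch of a DCGAN-style generator, assuming PyTorch.
import torch
import torch.nn as nn

dcgan_generator = nn.Sequential(
    nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(256),
    nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(128),
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 8x8 -> 16x16
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 16x16 -> 32x32
    nn.Tanh(),
)

z = torch.randn(8, 100, 1, 1)              # noise provided as a 1x1 spatial map
print(dcgan_generator(z).shape)            # (8, 3, 32, 32)
```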
Conditional GANs (cGANs)
Conditional GANs (cGANs) include additional conditioning information, such as class labels, attributes, or even other data samples, in both the generator and the discriminator. With the help of this conditioning information, conditional GANs give us control over the characteristics of the generated output.
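For illustration, the sketch below shows one common way to condition a generator on a class label: embed the label and concatenate it with the noise vector. The embedding size, layer widths, and flattened output size are illustrative assumptions.

```python
# A sketch of label conditioning in a cGAN generator, assuming PyTorch.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, num_classes=10, out_dim=784):
        super().__init__()
        self.label_embedding = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256),   # noise and label concatenated
            nn.LeakyReLU(0.2),
            nn.Linear(256, out_dim),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        cond = self.label_embedding(labels)             # conditioning information
        return self.net(torch.cat([z, cond], dim=1))    # output depends on the label

z = torch.randn(4, 100)
labels = torch.tensor([0, 1, 2, 3])                     # e.g., requested classes
print(ConditionalGenerator()(z, labels).shape)          # (4, 784)
```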
CycleGANs
CycleGANs are designed for unpaired image-to-image translation tasks, where there are no paired examples linking the input and output images. A cycle consistency loss is used to ensure that translating from one domain to another and back again produces consistent results.
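A minimal sketch of the cycle consistency loss, assuming `G` maps domain A to domain B and `F` maps B back to A; both are placeholder generator modules, and the loss is the usual L1 reconstruction error.

```python
# A sketch of the cycle consistency loss for unpaired translation.
import torch

def cycle_consistency_loss(G, F, real_a, real_b):
    # Translating A -> B -> A (and B -> A -> B) should reproduce the input.
    recon_a = F(G(real_a))
    recon_b = G(F(real_b))
    return (torch.mean(torch.abs(recon_a - real_a))
            + torch.mean(torch.abs(recon_b - real_b)))
```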
Progressive GANs (ProGANs)
ProGANs generate high-resolution images by progressively increasing the resolution of both the generator and the discriminator during training. This approach allows the creation of more detailed and higher-quality images.
StyleGANs
StyleGANs, developed by NVIDIA, are specifically designed for generating photo-realistic, high-quality images. They introduced several innovative techniques for improved image synthesis and offer finer control over specific attributes of the generated images.
Laplacian Pyramid GAN (LAPGAN)
Laplacian Pyramid GAN (LAPGAN) is a type of generative adversarial network that uses a multi-resolution approach to generate high-quality images. It uses a Laplacian pyramid framework in which images are generated at multiple scales.
LAPGANs are particularly effective at creating detailed and realistic images compared to standard GANs.