
Positional Encoding in Transformer Models

With the help of input embeddings, the Transformer obtains vector representations of discrete tokens such as words, sub-words, or characters. However, these vector representations carry no information about the position of the tokens within the sequence. That is why a critical component called "positional encoding" is used in the Transformer architecture immediately after the input embedding sub-layer.

Positional encoding enables the model to understand the sequence order by providing each token in the input sequence with information about its position. In this chapter, we will look at what positional encoding is, why we need it, how it works, and how to implement it in Python.

What is Positional Encoding?

Positional encoding is a mechanism used in the Transformer to provide information about the order of tokens within an input sequence. In the Transformer architecture, the positional encoding component is added right after the input embedding sub-layer.

Take a look at the following diagram; it is a part of the original Transformer architecture, representing the structure of the positional encoding component −

(Figure: structure of the positional encoding component)

Why Do We Need Positional Encoding in the Transformer Model?

Despite its powerful self-attention mechanism, the Transformer lacks an inherent sense of order. Unlike Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, which process a sequence in a specific order, the Transformer's parallel processing provides no information about the position of tokens within the input sequence. As a result, the model cannot fully understand the context, particularly in tasks where word order matters.

To overcome this limitation, positional encoding is introduced, which gives each token in the input sequence information about its position. These encodings are then added to the input embeddings, ensuring that the Transformer processes the tokens along with their positional context.

How Does Positional Encoding Work?

As discussed in the previous chapter, the Transformer expects a fixed-size dimensional space (it may be dmodel = 512 or any other constant value) for each vector representation output by the positional encoding function.

As an example, let's look at the sentence given below −

I am playing with the brown ball and my brother is playing with the red ball.

The words "brown" and "red" may be semantically similar, but in this sentence they are far apart. The word "brown" is at position 6 (pos = 6), while the word "red" is at position 15 (pos = 15).

Here, the problem is that we need a way to add a value to the word embedding of each word in the input sentence so that it carries information about its position in the sequence. Moreover, for each word embedding, this information has to be provided within the dimensions of the embedding itself, i.e., in the range (0, 512).

Positional encoding can be achieved in many ways, but Vaswani et al. (2017), in the original Transformer model, used a specific method based on sinusoidal functions to generate a unique positional encoding for each position in the sequence.

The equations below show how the positional encoding for a given position pos and dimension i is defined −

\mathrm{PE_{(pos,\: 2i)} \: = \: \sin\left(\frac{pos}{10000^{\frac{2i}{d_{model}}}}\right)}

\mathrm{PE_{(pos,\: 2i+1)} \: = \: \cos\left(\frac{pos}{10000^{\frac{2i}{d_{model}}}}\right)}

Here, dmodel is the dimension of the embeddings.
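To make the formulas concrete, here is a minimal sketch (assuming dmodel = 512, as in the text; the helper name pe_pair is hypothetical) that evaluates the sine/cosine pair for the first dimension pair (i = 0) at the positions of "brown" (pos = 6) and "red" (pos = 15) from the example sentence. The two positions clearly receive different values −

import numpy as np

d_model = 512

def pe_pair(pos, i):
   # Dimension 2i gets the sine value, dimension 2i+1 gets the cosine value
   angle = pos / (10000 ** (2 * i / d_model))
   return np.sin(angle), np.cos(angle)

print(pe_pair(6, 0))    # approximately (-0.2794, 0.9602) for "brown"
print(pe_pair(15, 0))   # approximately (0.6503, -0.7597) for "red"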

Creating Positional Encodings Using Sinusoidal Functions

Given below is a Python function that creates positional encodings using sinusoidal functions −

import numpy as np

def positional_encoding(max_len, d_model):
   # Matrix of shape (max_len, d_model), one encoding row per position
   pe = np.zeros((max_len, d_model))
   # Column vector of positions: 0, 1, ..., max_len - 1
   position = np.arange(0, max_len).reshape(-1, 1)
   # Frequencies 1 / 10000^(2i / d_model) for each dimension pair
   div_term = np.exp(np.arange(0, d_model, 2) * -(np.log(10000.0) / d_model))
   # Even dimensions get sine, odd dimensions get cosine
   pe[:, 0::2] = np.sin(position * div_term)
   pe[:, 1::2] = np.cos(position * div_term)
   return pe

Now, let's see how we can add these positional encodings to the input embeddings we implemented in the previous chapter −

import numpy as np

# Example text and tokenization
text = "Transformers revolutionized the field of NLP"
tokens = text.split()

# Creating a vocabulary
vocab = {word: idx for idx, word in enumerate(tokens)}

# Example input (sequence of token indices)
input_indices = np.array([vocab[word] for word in tokens])

print("Vocabulary:", vocab)
print("Input Indices:", input_indices)

# Parameters
vocab_size = len(vocab)
embed_dim = 512		# Dimension of the embeddings

# Initialize the embedding matrix with random values
embedding_matrix = np.random.rand(vocab_size, embed_dim)

# Get the embeddings for the input indices
input_embeddings = embedding_matrix[input_indices]

print("Embedding Matrix:\n", embedding_matrix)
print("Input Embeddings:\n", input_embeddings)

def positional_encoding(max_len, d_model):
   pe = np.zeros((max_len, d_model))
   position = np.arange(0, max_len).reshape(-1, 1)
   div_term = np.exp(np.arange(0, d_model, 2) * -(np.log(10000.0) / d_model))
   pe[:, 0::2] = np.sin(position * div_term)
   pe[:, 1::2] = np.cos(position * div_term)
   return pe

# Parameters
max_len = len(tokens)

# Generate positional encodings
pos_encodings = positional_encoding(max_len, embed_dim)

# Adjust the length of the positional encodings to match the input
input_embeddings_with_pos = input_embeddings + pos_encodings[:len(tokens)]

print("Positional Encodings:\n", pos_encodings)
print("Input Embeddings with Positional Encoding:\n", input_embeddings_with_pos)

Output

After running the above script, we will get output similar to the following (the embedding matrix is initialized randomly, so the exact numbers will vary between runs) −

Vocabulary: {'Transformers': 0, 'revolutionized': 1, 'the': 2, 'field': 3, 'of': 4, 'NLP': 5}
Input Indices: [0 1 2 3 4 5]
Embedding Matrix:
 [[0.71034683 0.08027048 0.89859858 ... 0.48071898 0.76495253 0.53869711]
 [0.71247114 0.33418585 0.15329225 ... 0.61768814 0.32710687 0.89633072]
 [0.11731439 0.97467007 0.66899319 ... 0.76157481 0.41975638 0.90980636]
 [0.42299987 0.51534082 0.6459627  ... 0.58178494 0.13362482 0.13826352]
 [0.2734792  0.80146145 0.75947837 ... 0.15180679 0.93250566 0.43946461]
 [0.5750698  0.49106984 0.56273384 ... 0.77180581 0.18834177 0.6658962 ]]
Input Embeddings:
 [[0.71034683 0.08027048 0.89859858 ... 0.48071898 0.76495253 0.53869711]
 [0.71247114 0.33418585 0.15329225 ... 0.61768814 0.32710687 0.89633072]
 [0.11731439 0.97467007 0.66899319 ... 0.76157481 0.41975638 0.90980636]
 [0.42299987 0.51534082 0.6459627  ... 0.58178494 0.13362482 0.13826352]
 [0.2734792  0.80146145 0.75947837 ... 0.15180679 0.93250566 0.43946461]
 [0.5750698  0.49106984 0.56273384 ... 0.77180581 0.18834177 0.6658962 ]]

Positional Encodings:
[[ 0.00000000e+00  1.00000000e+00  0.00000000e+00 ...  1.00000000e+00
   0.00000000e+00  1.00000000e+00]
 [ 8.41470985e-01  5.40302306e-01  8.21856190e-01 ...  9.99999994e-01
   1.03663293e-04  9.99999995e-01]
 [ 9.09297427e-01 -4.16146837e-01  9.36414739e-01 ...  9.99999977e-01
   2.07326584e-04  9.99999979e-01]
 [ 1.41120008e-01 -9.89992497e-01  2.45085415e-01 ...  9.99999948e-01
   3.10989874e-04  9.99999952e-01]
 [-7.56802495e-01 -6.53643621e-01 -6.57166863e-01 ...  9.99999908e-01
   4.14653159e-04  9.99999914e-01]
 [-9.58924275e-01  2.83662185e-01 -9.93854779e-01 ...  9.99999856e-01
   5.18316441e-04  9.99999866e-01]]

Input Embeddings with Positional Encoding:
[[0.71034683  1.08027048  0.89859858 ...  1.48071898  0.76495253  1.53869711]
 [1.55394213  0.87448815  0.97514844 ...  1.61768813  0.32721053  1.89633072]
 [1.02661182  0.55852323  1.60540793 ...  1.76157479  0.4199637   1.90980634]
 [0.56411987 -0.47465167  0.89104811 ...  1.58178489  0.13393581  1.13826347]
 [-0.4833233   0.14781783  0.1023115  ... 1.15180669  0.93292031  1.43946452]
 [-0.38385447  0.77473203 -0.43112094 ... 1.77180567  0.18886009  1.66589607]]

Conclusion

In this chapter, we covered the basics of positional encoding: why it is needed, how it works, how to implement it in Python, and how it is integrated within the Transformer model. Positional encoding is a fundamental component of the Transformer architecture that enables the model to capture the order of tokens in a sequence.

Understanding and implementing positional encoding is important for realizing the full potential of Transformer models and applying them effectively to complex NLP problems.