ChatGPT: A Concise Tutorial
How Does ChatGPT Work?
Artificial Intelligence (AI) has become an integral part of how we live, work, and interact with the world around us. Within AI, there are several verticals such as natural language processing (NLP), computer vision, machine learning, robotics, etc. Among them, NLP has emerged as a critical area of research and development. ChatGPT, developed by OpenAI, is one of the best examples of the advancements made in NLP.
Read this chapter to learn how ChatGPT works, the rigorous training process it undergoes, and the mechanism behind its generation of responses.
What is GPT?
At the core of ChatGPT lies a powerful technology called GPT, which stands for "Generative Pre-trained Transformer". It is a type of AI language model developed by OpenAI. GPT models are designed to understand and generate natural language text almost the way humans do.
The image given below summarizes the major points of GPT −
Components of GPT
Let’s break down each component of GPT −
Generative
Generative, in simple terms, refers to the model’s ability to generate new content, such as text, images, or music, based on the patterns it has learned from training data. In the context of GPT, it generates original text that sounds like it was written by a human being.
Pre-trained
Pre-training involves training a model on a large dataset. In this phase, the model essentially learns the relationships within the data. In the case of GPT, the model is pre-trained on vast amounts of text from books, articles, websites, and more using unsupervised learning. This helps GPT learn to predict the next word in a sequence.
Transformer
The transformer is the deep learning architecture used in the GPT model. The transformer uses a mechanism called self-attention to weigh the importance of different words in a sequence. It enables GPT to understand the relationships between words and allows it to produce more human-like outputs.
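The weighting idea behind self-attention can be illustrated with a toy, pure-Python sketch. This is a deliberate simplification, not the transformer's actual implementation: it uses a single head and treats the raw embeddings as queries, keys, and values alike, omitting the learned projection matrices a real model applies.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Minimal single-head self-attention: each token's output is a
    weighted average of all token vectors, where the weights come from
    scaled dot-product similarity."""
    d = len(embeddings[0])
    scale = math.sqrt(d)
    outputs = []
    for query in embeddings:
        # Score this token (query) against every token (key) in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in embeddings]
        weights = softmax(scores)  # importance of each token to this one
        # Blend all token vectors (values) by their attention weights.
        outputs.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                        for i in range(d)])
    return outputs

# Three toy 2-d "embeddings" standing in for a 3-token sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualized = self_attention(tokens)
```

Each output vector mixes information from the whole sequence, which is how attention lets every word's representation depend on its context.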
How Was ChatGPT Trained?
ChatGPT was trained using a variant of the GPT architecture. Below are the stages involved in training ChatGPT −
Language Modelling
ChatGPT was pre-trained on a large collection of text data from the internet, such as books, articles, websites, and social media. This phase involves training the model to predict the next word in a sequence of text given all the preceding words in the sequence.
This pre-training step helps the model learn the statistical properties of natural language and develop a general understanding of human language.
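The language-modelling objective above — predict the next word given the preceding words — can be sketched with a toy bigram model that only looks one word back. This is an illustration of the objective, not of GPT's actual neural approach, and the tiny corpus is invented for the example.

```python
from collections import Counter, defaultdict

# A tiny "corpus" standing in for the web-scale text GPT is trained on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which; these counts are the "statistical
# properties" the model learns from its training text.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often here
```

A real language model conditions on the entire preceding context rather than one word, but the prediction target is the same.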
Fine Tuning
After pre-training, ChatGPT was fine-tuned for conversational AI tasks. This phase involves further training the model on a smaller dataset containing data such as dialogue transcripts or chat logs.
During fine-tuning, the model uses techniques such as transfer learning to learn to generate contextually relevant responses to user queries.
How Does ChatGPT Generate Responses?
ChatGPT’s response generation process uses components like a neural network architecture, an attention mechanism, and probabilistic modeling. With the help of these components, ChatGPT can generate quick, contextually relevant responses to users.
Let’s understand the steps involved in ChatGPT’s response generation process −
Encoding
The response generation process of ChatGPT starts with encoding the input text into a numerical format so that the model can process it. This step converts the words or subwords into embeddings, which the model uses to capture semantic information about the user input.
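The encoding step can be sketched as a two-stage lookup: map tokens to integer IDs, then map IDs to dense vectors. The vocabulary and 4-dimensional embeddings below are hypothetical stand-ins; real models use learned subword vocabularies (e.g. byte-pair encoding) with tens of thousands of entries and much larger vectors.

```python
import random

random.seed(0)

# Hypothetical toy vocabulary; "<unk>" catches out-of-vocabulary words.
vocab = {"<unk>": 0, "how": 1, "does": 2, "chatgpt": 3, "work": 4}

# Each token ID maps to a small dense vector. Here they are random;
# in a trained model these embeddings carry semantic information.
embedding_table = {i: [random.uniform(-1, 1) for _ in range(4)]
                   for i in vocab.values()}

def encode(text):
    """Lowercase, split on whitespace, and map each token to its ID."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

ids = encode("How does ChatGPT work")
vectors = [embedding_table[i] for i in ids]
print(ids)  # [1, 2, 3, 4]
```

The list of vectors, not the raw text, is what the model's transformer layers actually operate on.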
Language Understanding
The encoded input text is now fed into the pre-trained ChatGPT model, which further processes the text through multiple layers of transformer blocks. As discussed earlier, the transformer blocks use a self-attention mechanism to weigh the importance of each token in relation to the others. This helps the model understand the input in context.
Probability Distribution
After processing the input text, ChatGPT generates a probability distribution over the vocabulary for the next word in the sequence. This distribution assigns each word in the vocabulary a probability of being the next word, given all the preceding words.
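Concretely, the model's final layer produces one raw score ("logit") per vocabulary word, and a softmax turns those scores into probabilities. The four-word vocabulary and the logit values below are invented for illustration; a real vocabulary has tens of thousands of entries.

```python
import math

def softmax(logits):
    """Turn raw model scores into a probability distribution
    (all values positive, summing to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for each candidate next word after
# "The cat sat on the".
vocab = ["mat", "dog", "moon", "chair"]
logits = [3.2, 0.5, 0.1, 1.8]

probs = softmax(logits)
distribution = dict(zip(vocab, probs))
# "mat" has the highest logit, so it gets the largest probability.
```

Higher logits translate into exponentially larger shares of the probability mass, which is why one or two candidates usually dominate the distribution.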
Sampling
Finally, ChatGPT uses this probability distribution to select the next word, which is then added to the generated response. This process repeats until a predefined stopping condition is met or the model generates an end-of-sequence token.
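The generation loop can be sketched as repeated sampling from a next-word distribution until an end token or a length cap is hit. The hard-coded probability tables below are a hypothetical stand-in; a real model recomputes the full distribution from the entire context at every step.

```python
import random

random.seed(42)

# Hypothetical next-word distributions; "<eos>" is the
# end-of-sequence token that stops generation.
next_word_probs = {
    "<start>": {"the": 1.0},
    "the":     {"cat": 0.6, "mat": 0.4},
    "cat":     {"sat": 0.7, "<eos>": 0.3},
    "mat":     {"<eos>": 1.0},
    "sat":     {"<eos>": 1.0},
}

def generate(max_tokens=10):
    """Sample words until an end-of-sequence token or a length cap."""
    word, response = "<start>", []
    while len(response) < max_tokens:
        dist = next_word_probs[word]
        # Draw the next word in proportion to its probability.
        word = random.choices(list(dist), weights=list(dist.values()))[0]
        if word == "<eos>":  # stopping condition reached
            break
        response.append(word)
    return " ".join(response)

print(generate())
```

Sampling (rather than always taking the most probable word) is one reason ChatGPT can give different responses to the same prompt.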
Conclusion
In this chapter, we first explained the foundation of ChatGPT, which is an AI language model called the Generative Pre-trained Transformer (GPT).
We then explained the training process of ChatGPT, which involves the stages of language modelling and fine-tuning.
We also discussed how ChatGPT generates quick, contextually relevant responses. The process involves encoding, language understanding, probability distribution, and sampling, each of which we covered in turn.
ChatGPT, through its integration of the GPT architecture, rigorous training process, and advanced response generation mechanisms, represents a significant advancement in AI-driven conversational agents.