Gen-AI: A Concise Tutorial
Evolution of Generative AI
Many years ago, the predictive text feature on our smartphones amazed us. Then Gmail made life easier with Smart Reply, which uses machine learning algorithms to suggest short, one-sentence responses. These basic innovations are early forms of generative AI, and the field has come a long way since then.
The evolution of generative AI is a fascinating story. Let’s see how GenAI has evolved and transformed the AI landscape:
Early AI Exploration (1950s - 1980)
The earliest ideas of AI emerged in the 1950s. In 1950, Alan Turing, in his paper “Computing Machinery and Intelligence”, explored the possibility of machine intelligence and posed the question of whether machines can think.
Then in 1952, Christopher Strachey, a British computer scientist, wrote a program for the Manchester Mark 1 computer that generated simulated love letters. It is often cited as the first text-generating software.
In 1966, Joseph Weizenbaum, an MIT professor, created ELIZA, widely regarded as the first chatbot. It was an early natural language processing program that simulated a conversation with a psychotherapist.
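To give a feel for how a program of this kind can appear to converse, here is a minimal sketch of keyword-and-template matching in Python. The rules, the `respond` function, and the fallback line are all hypothetical and written for illustration only; they are not taken from Weizenbaum's original script.

```python
import re

# Hypothetical rules for illustration; not ELIZA's original script.
# Each rule pairs a pattern with a reply template that reuses the captured text.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply for the first rule whose pattern matches the input."""
    cleaned = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(match.group(1))
    # No keyword matched, so fall back to a neutral prompt.
    return FALLBACK


if __name__ == "__main__":
    for line in ["I feel sad.", "My mother called", "Hello"]:
        print(line, "->", respond(line))
```

The program has no understanding of the text. It only reflects fragments of the user's input back as questions, which is why such simple rules can still feel conversational.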
In 1968, Terry Winograd, then a graduate student at MIT and later a professor at Stanford University, began developing a natural language processing program named SHRDLU. It demonstrated a system that could understand and respond to commands in a restricted “blocks world” environment.
In 1980, Michael Toy and Glenn Wichman developed Rogue, a Unix-based video game. It was one of the first games to use procedural generation to create new game levels dynamically.
Neural Network Resurgence (1980s - 2010)
In 1985, Judea Pearl, a renowned computer scientist and philosopher, introduced Bayesian networks, also known as belief networks or causal networks. Bayesian networks helped establish the probabilistic modeling concepts used in generative AI.
In 1986, Michael Irwin Jordan's publication “Serial order: A parallel distributed processing approach” laid the foundation for the use of recurrent neural networks (RNNs).
In 1989, Yann LeCun and Yoshua Bengio demonstrated the use of convolutional neural networks (CNNs) for image recognition.
In 2003, researchers from the University of Montreal published the paper “A Neural Probabilistic Language Model”, which proposed a technique for language modeling using feed-forward neural networks.
In 2006, Fei-Fei Li, now a professor at Stanford University, began work on the ImageNet database, which provided a foundation for visual object recognition.
Deep Learning Dominance & Transformer Revolution (2010s - 2020)
In 2011, Apple released Siri, a voice assistant that answers spoken requests using speech recognition and text-to-speech output.
In 2012, Alex Krizhevsky and his colleagues introduced the AlexNet CNN architecture. It was an innovative approach to training deep neural networks that took advantage of recent advances in GPUs.
In 2014, Ian Goodfellow and his colleagues developed Generative Adversarial Networks (GANs). In the same year, Max Welling and Diederik Kingma developed Variational Autoencoders (VAEs), which can be used to generate text, images, and videos.
In 2015, a group of researchers from Stanford University published the paper "Deep Unsupervised Learning using Nonequilibrium Thermodynamics". They introduced a diffusion-model technique that learns to reverse the process of gradually adding noise to an image.
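The noise-adding half of that process is simple enough to sketch. The Python below is an illustration under stated assumptions, not the paper's implementation: the linear schedule, its default values, and the names `linear_betas`, `signal_fraction`, and `noisy_sample` are all hypothetical choices made for this example. A diffusion model is then trained to undo these steps, which is the hard part and is not shown here.

```python
import math
import random


def linear_betas(steps: int, start: float = 1e-4, end: float = 0.02) -> list[float]:
    """Noise variance added at each step (a linear schedule, assumed for illustration)."""
    if steps == 1:
        return [start]
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def signal_fraction(betas: list[float]) -> list[float]:
    """Running product of (1 - beta): the share of the original signal variance left."""
    kept, out = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        out.append(kept)
    return out


def noisy_sample(pixels: list[float], kept: float, rng: random.Random) -> list[float]:
    """Jump straight to a noisy version: sqrt(kept) * x + sqrt(1 - kept) * noise."""
    return [
        math.sqrt(kept) * x + math.sqrt(1.0 - kept) * rng.gauss(0.0, 1.0)
        for x in pixels
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    kept = signal_fraction(linear_betas(1000))
    image = [0.2, 0.5, 0.9]  # toy "image" of three pixel values
    print(noisy_sample(image, kept[9], rng))    # early step: close to the original
    print(noisy_sample(image, kept[-1], rng))   # final step: almost pure noise
```

As the step index grows, the retained signal fraction shrinks toward zero, so the last samples are nearly indistinguishable from Gaussian noise.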
In 2017, Google researchers introduced the transformer architecture. Built around an attention mechanism, it made it practical to train large language models (LLMs) on vast amounts of unlabeled text.
In 2018, Google used transformers to build BERT (Bidirectional Encoder Representations from Transformers). In the same year, OpenAI introduced GPT-1, a transformer-based language model.
Specialized Generative Models (2020s - Present)
In 2020, OpenAI released GPT-3, the third iteration of its Generative Pre-trained Transformer. At the time, it was one of the largest language models and was capable of generating human-like text.
The following year, in 2021, OpenAI launched DALL-E, which can generate images from text prompts. On November 30, 2022, OpenAI unveiled the web preview of ChatGPT.
OpenAI unveiled GPT-4 in 2023. The company claims that GPT-4 can solve challenging problems with greater accuracy, thanks to its broader general knowledge and advanced reasoning capabilities. In September 2023, OpenAI announced DALL-E 3.
In March 2023, Google released the Bard chat service, based on its LaMDA engine. On February 8, 2024, Google rebranded the Bard chatbot as Gemini.