AI Concepts
Spring AI builds on a number of core concepts:

- Models: algorithms that process and generate information, accepting varied inputs and producing predictions or other outputs.
- Prompts: language-based inputs that guide a model toward specific outputs; crafting them involves establishing request context and filling templates.
- Embeddings: numeric vector representations of text that let models process language data.
- Tokens: the basic units in which a model consumes words and generates output; token counts determine cost.
- Output parsing: converting a model's output string into data structures for application integration.
- Data integration: incorporating domain-specific knowledge into a model through fine-tuning, prompt stuffing, and function calling.
- Retrieval Augmented Generation (RAG): a technique that merges relevant data into the prompt to improve response accuracy.
- Function calling: connecting models to external systems that supply real-time data and perform data-processing actions.
- Evaluating responses: using a pre-trained model itself to analyze and assess output quality, ensuring accuracy and usefulness.
This section describes core concepts that Spring AI uses. We recommend reading it closely to understand the ideas behind how Spring AI is implemented.
Models
AI models are algorithms designed to process and generate information, often mimicking human cognitive functions. By learning patterns and insights from large datasets, these models can produce predictions, text, images, or other outputs, enhancing various applications across industries.
There are many different types of AI models, each suited for a specific use case. While ChatGPT and its generative AI capabilities have captivated users through text input and output, many models and companies offer diverse inputs and outputs. Before ChatGPT, many people were fascinated by text-to-image generation models such as Midjourney and Stable Diffusion.
The following table categorizes several models based on their input and output types:
| Input | Output | Examples |
|---|---|---|
| Language/Code/Images (Multi-Modal) | Language/Code | GPT4 - OpenAI, Google Gemini |
| Language/Code | Language/Code | GPT 3.5 - OpenAI-Azure OpenAI, Google Bard, Meta Llama |
| Language | Image | Dall-E - OpenAI + Azure, Deep AI |
| Language/Image | Image | Midjourney, Stable Diffusion, RunwayML |
| Text | Numbers | Many (AKA embeddings) |
The initial focus of Spring AI is on models that process language input and provide language output, initially OpenAI + Azure OpenAI. The last row in the previous table, which accepts text as input and outputs numbers, is more commonly known as embedding text and represents the internal data structures used in an AI model. Spring AI has support for embeddings to support more advanced use cases.
What sets models like GPT apart is their pre-trained nature, as indicated by the "P" in GPT: Chat Generative Pre-trained Transformer. This pre-training feature transforms AI into a general developer tool that does not require an extensive machine learning or model training background.
Prompts
Prompts serve as the foundation for the language-based inputs that guide an AI model to produce specific outputs. For those familiar with ChatGPT, a prompt might seem like merely the text entered into a dialog box that is sent to the API. However, it encompasses much more than that. In many AI Models, the text for the prompt is not just a simple string.
ChatGPT’s API has multiple text inputs within a prompt, with each text input being assigned a role. For example, there is the system role, which tells the model how to behave and sets the context for the interaction. There is also the user role, which is typically the input from the user.
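As a sketch of that structure, a chat prompt can be modeled as a list of role-tagged messages. The `Message` record and the role strings below are illustrative stand-ins, not any specific vendor's API:

```java
import java.util.List;

public class ChatRolesSketch {
    // Minimal role-tagged message, mirroring the system/user roles described above.
    record Message(String role, String content) {}

    // Builds a prompt with a system message setting the context, followed by the user's input.
    static List<Message> buildPrompt(String userQuestion) {
        return List.of(
            new Message("system", "You are a helpful assistant that answers concisely."),
            new Message("user", userQuestion));
    }

    public static void main(String[] args) {
        buildPrompt("What is Spring AI?").forEach(m ->
            System.out.println(m.role() + ": " + m.content()));
    }
}
```

The system message shapes the model's behavior for the whole exchange, while the user message carries the actual request.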
Crafting effective prompts is both an art and a science. ChatGPT was designed for human conversations. This is quite a departure from using something like SQL to "ask a question." One must communicate with the AI model akin to conversing with another person.
Such is the importance of this interaction style that the term "Prompt Engineering" has emerged as its own discipline. There is a burgeoning collection of techniques that improve the effectiveness of prompts. Investing time in crafting a prompt can drastically improve the resulting output.
Sharing prompts has become a communal practice, and there is active academic research being done on this subject. As an example of how counter-intuitive it can be to create an effective prompt (for example, contrasting with SQL), a recent research paper found that one of the most effective prompts you can use starts with the phrase, “Take a deep breath and work on this step by step.” That should give you an indication of why language is so important. We do not yet fully understand how to make the most effective use of previous iterations of this technology, such as ChatGPT 3.5, let alone new versions that are being developed.
Prompt Templates
Creating effective prompts involves establishing the context of the request and substituting parts of the request with values specific to the user’s input.
This process uses traditional text-based template engines for prompt creation and management. Spring AI employs the OSS library StringTemplate for this purpose.
For instance, consider the simple prompt template:
```
Tell me a {adjective} joke about {content}.
```
In Spring AI, prompt templates can be likened to the "View" in Spring MVC architecture. A model object, typically a `java.util.Map`, is provided to populate placeholders within the template. The "rendered" string becomes the content of the prompt supplied to the AI model.
There is considerable variability in the specific data format of the prompt sent to the model. Initially starting as simple strings, prompts have evolved to include multiple messages, where each string in each message represents a distinct role for the model.
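To illustrate what "rendering" means here, the following minimal sketch substitutes placeholders by hand. It is a stand-in for the template engine only; Spring AI itself delegates this work to the StringTemplate library:

```java
import java.util.Map;

public class TemplateSketch {
    // Naive placeholder substitution illustrating what a template engine does.
    // Each {key} in the template is replaced by the corresponding model value.
    static String render(String template, Map<String, String> model) {
        String result = template;
        for (var entry : model.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String prompt = render("Tell me a {adjective} joke about {content}.",
                Map.of("adjective", "funny", "content", "cows"));
        System.out.println(prompt);
    }
}
```

The rendered string is what actually gets sent to the model as prompt content.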
Embeddings
Embeddings transform text into numerical arrays or vectors, enabling AI models to process and interpret language data. This transformation from text to numbers and back is a key element in how AI interacts with and understands human language. As a Java developer exploring AI, it’s not necessary to comprehend the intricate mathematical theories or the specific implementations behind these vector representations. A basic understanding of their role and function within AI systems suffices, particularly when you’re integrating AI functionalities into your applications.
Embeddings are particularly relevant in practical applications like the Retrieval Augmented Generation (RAG) pattern. They enable the representation of data as points in a semantic space, which is akin to the 2-D space of Euclidean geometry, but in higher dimensions. This means just like how points on a plane in Euclidean geometry can be close or far based on their coordinates, in a semantic space, the proximity of points reflects the similarity in meaning. Sentences about similar topics are positioned closer in this multi-dimensional space, much like points lying close to each other on a graph. This proximity aids in tasks like text classification, semantic search, and even product recommendations, as it allows the AI to discern and group related concepts based on their 'location' in this expanded semantic landscape.
You can think of this semantic space as a vector.
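As an illustration of "proximity reflects similarity," cosine similarity is one common measure over such vectors. A minimal sketch with made-up 3-dimensional embeddings (real embeddings have hundreds or thousands of dimensions):

```java
public class SimilaritySketch {
    // Cosine similarity: 1.0 means identical direction (same region of meaning),
    // values near 0 mean unrelated. Vector databases use measures like this.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] catEmbedding = {0.8, 0.1, 0.2};    // hypothetical embeddings
        double[] kittenEmbedding = {0.7, 0.2, 0.2};
        double[] carEmbedding = {0.1, 0.9, 0.1};
        System.out.printf("cat~kitten: %.3f%n", cosine(catEmbedding, kittenEmbedding));
        System.out.printf("cat~car:    %.3f%n", cosine(catEmbedding, carEmbedding));
    }
}
```

Semantically close texts ("cat", "kitten") score higher than unrelated ones ("cat", "car").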
Tokens
Tokens serve as the building blocks of how an AI model works. On input, models convert words to tokens. On output, they convert tokens back to words.
In English, one token roughly corresponds to 75% of a word. For reference, Shakespeare's complete works, totaling around 900,000 words, translate to approximately 1.2 million tokens.
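That ratio gives a quick back-of-the-envelope estimate. A minimal sketch, using the rough 0.75 words-per-token heuristic above (real tokenizers vary by model):

```java
public class TokenEstimateSketch {
    // One token is about 75% of a word, so tokens ≈ words / 0.75.
    // This is a coarse heuristic only; actual tokenizers (e.g. BPE) differ per model.
    static long estimateTokens(long wordCount) {
        return Math.round(wordCount / 0.75);
    }

    public static void main(String[] args) {
        // Shakespeare's ~900,000 words -> roughly 1.2 million tokens.
        System.out.println(estimateTokens(900_000));
    }
}
```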
Perhaps more important is that Tokens = $.
In the context of hosted AI models, your charges are determined by the number of tokens used. Both input and output contribute to the overall token count.
Also, models are subject to token limits, which restrict the amount of text processed in a single API call. This threshold is often referred to as the 'context window'. The model does not process any text that exceeds this limit.
For instance, ChatGPT3 has a 4K token limit, while GPT4 offers varying options, such as 8K, 16K, and 32K. Anthropic’s Claude AI model features a 100K token limit, and Meta’s recent research yielded a 1M token limit model.
To summarize the collected works of Shakespeare with GPT4, you need to devise software engineering strategies to chop up the data and present the data within the model’s context window limits. The Spring AI project helps you with this task.
Output Parsing
The output of AI models traditionally arrives as a `java.lang.String`, even if you ask for the reply to be in JSON. It may be the correct JSON, but it is not a JSON data structure. It is just a string. Also, asking "for JSON" as part of the prompt is not 100% accurate.
This intricacy has led to the emergence of a specialized field involving the creation of prompts to yield the intended output, followed by parsing the resulting simple string into a usable data structure for application integration.
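The following sketch illustrates the problem: the model's reply is only a `String`, and the application must parse it into a structure. The `Joke` record and the regex extraction are illustrative only; a real application would use a JSON library such as Jackson, or a dedicated output parser:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutputParsingSketch {
    // The structured result the application actually wants (hypothetical shape).
    record Joke(String setup, String punchline) {}

    // Deliberately naive extraction for illustration; brittle against any
    // deviation in the model's output, which is exactly the parsing challenge.
    static Joke parse(String modelOutput) {
        Matcher m = Pattern
            .compile("\"setup\"\\s*:\\s*\"([^\"]*)\".*\"punchline\"\\s*:\\s*\"([^\"]*)\"", Pattern.DOTALL)
            .matcher(modelOutput);
        if (!m.find()) throw new IllegalArgumentException("Model did not return the expected JSON shape");
        return new Joke(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        // The model's reply is just a String, even though we asked for JSON.
        String reply = "{ \"setup\": \"Why did the cow cross the road?\", "
                     + "\"punchline\": \"To get to the udder side.\" }";
        Joke joke = parse(reply);
        System.out.println(joke.setup() + " " + joke.punchline());
    }
}
```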
Output parsing employs meticulously crafted prompts, often necessitating multiple interactions with the model to achieve the desired formatting.
This challenge has prompted OpenAI to introduce 'OpenAI Functions' as a means to specify the desired output format from the model precisely.
Bringing Your Data to the AI model
How can you equip the AI model with information on which it has not been trained?
Note that the GPT 3.5/4.0 dataset extends only until September 2021. Consequently, the model says that it does not know the answer to questions that require knowledge beyond that date. An interesting bit of trivia is that this dataset is around 650GB.
Three techniques exist for customizing the AI model to incorporate your data:
- Fine Tuning: This traditional machine learning technique involves tailoring the model and changing its internal weighting. However, it is a challenging process for machine learning experts and extremely resource-intensive for models like GPT due to their size. Additionally, some models might not offer this option.
- Prompt Stuffing: A more practical alternative involves embedding your data within the prompt provided to the model. Given a model's token limits, techniques are required to present relevant data within the model's context window. This approach is colloquially referred to as "stuffing the prompt." The Spring AI library helps you implement solutions based on the "stuffing the prompt" technique, otherwise known as Retrieval Augmented Generation (RAG).
- Function Calling: This technique allows registering custom user functions that connect the large language models to the APIs of external systems. Spring AI greatly simplifies the code you need to write to support function calling.
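A minimal sketch of the "stuffing the prompt" idea. The instruction wording, question, and snippet are hypothetical; this only illustrates the shape of the technique, not a Spring AI API:

```java
import java.util.List;

public class PromptStuffingSketch {
    // "Stuffing the prompt": prepend retrieved, relevant text to the user's
    // question so the model can answer from data it was never trained on.
    static String stuff(String question, List<String> retrievedSnippets) {
        StringBuilder sb = new StringBuilder(
            "Answer the question using only the context below.\n\nContext:\n");
        for (String snippet : retrievedSnippets) {
            sb.append("- ").append(snippet).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }

    public static void main(String[] args) {
        String prompt = stuff("What is our refund policy?",
                List.of("Refunds are issued within 30 days of purchase."));
        System.out.println(prompt);
    }
}
```

The retrieved snippets must fit within the model's context window, which is why splitting and retrieval (covered next) matter.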
Retrieval Augmented Generation
A technique termed Retrieval Augmented Generation (RAG) has emerged to address the challenge of incorporating relevant data into prompts for accurate AI model responses.
The approach involves a batch processing style programming model, where the job reads unstructured data from your documents, transforms it, and then writes it into a vector database. At a high level, this is an ETL (Extract, Transform and Load) pipeline. The vector database is used in the retrieval part of the RAG technique.
As part of loading the unstructured data into the vector database, one of the most important transformations is to split the original document into smaller pieces. The procedure of splitting the original document into smaller pieces has two important steps:
- Split the document into parts while preserving the semantic boundaries of the content. For example, for a document with paragraphs and tables, one should avoid splitting the document in the middle of a paragraph or table. For code, avoid splitting the code in the middle of a method's implementation.
- Split the document's parts further into parts whose size is a small percentage of the AI Model's token limit.
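The two steps above can be sketched as follows. Chunk size is measured in characters for simplicity; a real splitter would count tokens, and the size budget here is arbitrary:

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentSplitSketch {
    // Step 1: split on blank lines so paragraph boundaries are preserved.
    // Step 2: pack whole paragraphs into chunks under a size budget that is
    // a small fraction of the model's context window.
    static List<String> split(String document, int maxChunkChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            if (current.length() > 0 && current.length() + paragraph.length() > maxChunkChars) {
                chunks.add(current.toString().strip());
                current.setLength(0);
            }
            current.append(paragraph).append("\n\n");
        }
        if (current.length() > 0) chunks.add(current.toString().strip());
        return chunks;
    }

    public static void main(String[] args) {
        String doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
        split(doc, 40).forEach(c -> System.out.println("--- " + c));
    }
}
```

Note that no paragraph is ever cut in the middle; a chunk boundary only falls between paragraphs.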
The next phase in RAG is processing user input. When a user’s question is to be answered by an AI model, the question and all the “similar” document pieces are placed into the prompt that is sent to the AI model. This is the reason to use a vector database. It is very good at finding similar content.
There are several concepts that are used in implementing RAG. The concepts map onto classes in Spring AI:
- `DocumentReader`: A Java functional interface that is responsible for loading a `List<Document>` from a data source. Common data sources are PDF, Markdown, and JSON.
- `Document`: A text-based representation of your data source that also contains metadata to describe the contents.
- `DocumentTransformer`: Responsible for processing the data in various ways (for example, splitting documents into smaller pieces or adding additional metadata to the `Document`).
- `DocumentWriter`: Lets you persist the `Document`s into a database (most commonly in the AI stack, a vector database).
- `Embedding`: A representation of your data as a `List<Double>` that is used by the vector database to compute the "similarity" of a user's query to relevant documents.
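The way these pieces compose can be sketched with simplified stand-ins. These are not the real Spring AI types, only minimal conceptual equivalents showing the Extract, Transform, Load flow:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class EtlPipelineSketch {
    // Minimal stand-ins for the concepts above (not the actual Spring AI types).
    record Document(String text) {}
    interface DocumentReader extends Supplier<List<Document>> {}
    interface DocumentTransformer { List<Document> transform(List<Document> docs); }
    interface DocumentWriter { void write(List<Document> docs); }

    static List<Document> runPipeline() {
        // Extract: read raw documents from a source.
        DocumentReader reader = () -> List.of(new Document("chapter one ... chapter two ..."));
        // Transform: split each document into smaller pieces.
        DocumentTransformer splitter = docs -> {
            List<Document> pieces = new ArrayList<>();
            for (Document d : docs)
                for (String part : d.text().split("\\.\\.\\."))
                    if (!part.isBlank()) pieces.add(new Document(part.strip()));
            return pieces;
        };
        // Load: persist the pieces (an in-memory list stands in for a vector database).
        List<Document> sink = new ArrayList<>();
        DocumentWriter writer = sink::addAll;
        writer.write(splitter.transform(reader.get()));
        return sink;
    }

    public static void main(String[] args) {
        runPipeline().forEach(d -> System.out.println(d.text()));
    }
}
```

In the real stack the writer would also compute an embedding for each piece before persisting it, so that similarity search works at retrieval time.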
Function Calling
Large Language Models (LLMs) are frozen after training, leading to stale knowledge, and they are unable to access or modify external data.
The Function Calling mechanism addresses these shortcomings. It allows you to register custom user functions that connect the large language models to the APIs of external systems. These systems can provide LLMs with real-time data and perform data processing actions on their behalf.
Spring AI greatly simplifies the code you need to write to support function invocation. It brokers the function invocation conversation for you. You can provide your function as a `@Bean` and then provide the bean name of the function in your prompt options to activate that function. You can also define and reference multiple functions in a single prompt.
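A sketch of the shape such a function takes. In Spring AI the function would be exposed as a `@Bean` and referenced by bean name in the prompt options; here it is a plain `java.util.function.Function`, and the weather lookup is a canned map standing in for a real external API:

```java
import java.util.Map;
import java.util.function.Function;

public class FunctionCallingSketch {
    // Hypothetical request/response types the model would fill in and read back.
    record WeatherRequest(String city) {}
    record WeatherResponse(double temperatureCelsius) {}

    // The function the model can ask the application to invoke on its behalf.
    static final Function<WeatherRequest, WeatherResponse> currentWeather =
        request -> new WeatherResponse(
            Map.of("Paris", 18.0, "Oslo", 7.5).getOrDefault(request.city(), Double.NaN));

    public static void main(String[] args) {
        // When the model emits a function-call request such as {"city": "Paris"},
        // the framework invokes the function and feeds the result back to the model.
        WeatherResponse response = currentWeather.apply(new WeatherRequest("Paris"));
        System.out.println(response.temperatureCelsius());
    }
}
```

The framework brokers this exchange: the model decides *when* to call the function; the application supplies *what* the function does.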
Evaluating AI responses
Effectively evaluating the output of an AI system in response to user requests is very important to ensuring the accuracy and usefulness of the final application. Several emerging techniques enable the use of the pre-trained model itself for this purpose.
This evaluation process involves analyzing whether the generated response aligns with the user’s intent and the context of the query. Metrics such as relevance, coherence, and factual correctness are used to gauge the quality of the AI-generated response.
One approach involves presenting both the user’s request and the AI model’s response to the model, querying whether the response aligns with the provided data.
Furthermore, leveraging the information stored in the vector database as supplementary data can enhance the evaluation process, aiding in the determination of response relevance.
The Spring AI project currently provides some very basic examples of how you can evaluate the responses in the form of prompts to include in a JUnit test.
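One way such an evaluation prompt might be constructed. The prompt wording is illustrative only, not a Spring AI API; in a JUnit test, the rendered prompt would be sent to the model and the reply asserted:

```java
public class EvaluationSketch {
    // One evaluation approach from the text: show the model the user's request,
    // the generated answer, and the supporting data, and ask whether they align.
    static String evaluationPrompt(String userRequest, String modelResponse, String supportingData) {
        return """
               You are an evaluator. Given the request, the response, and the
               supporting data, answer YES if the response is relevant and
               consistent with the data, otherwise answer NO.

               Request: %s
               Response: %s
               Supporting data: %s
               """.formatted(userRequest, modelResponse, supportingData);
    }

    public static void main(String[] args) {
        String prompt = evaluationPrompt(
                "When was Spring Framework first released?",
                "Spring Framework 1.0 was released in 2004.",
                "Spring 1.0 final shipped in March 2004.");
        System.out.println(prompt);
    }
}
```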