Understanding Vectors

(Figure: a two-dimensional vector in Cartesian coordinates)

Vectors have dimensionality and a direction. For example, the following image depicts a two-dimensional vector \vec{a} in the Cartesian coordinate system, drawn as an arrow.

The head of the vector \vec{a} is at the point (a_1, a_2). The x coordinate value is a_1 and the y coordinate value is a_2. These coordinates are also referred to as the components of the vector.

Similarity

Several mathematical formulas can be used to determine whether two vectors are similar. One of the most intuitive to visualize and understand is cosine similarity. Consider the following images, which show three sets of graphs:

(Figure: vector similarity)

The vectors \vec{A} and \vec{B} are considered similar when they point close to each other, as in the first diagram. The vectors are considered unrelated when they are perpendicular to each other, and opposite when they point away from each other.

The angle between them, \theta, is a good measure of their similarity. How can the angle \theta be computed?

We are all familiar with the Pythagorean Theorem, a^2 + b^2 = c^2, which holds when the angle between sides a and b is 90 degrees.

What about when the angle between a and b is not 90 degrees?

Enter the Law of Cosines.

Law of Cosines

a^2 + b^2 - 2ab\cos\theta = c^2
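As a quick numeric check, setting \theta to 90 degrees makes the cosine term vanish, reducing the Law of Cosines to the Pythagorean Theorem. A minimal Python sketch (the function name is illustrative):

```python
import math

# Law of Cosines: c^2 = a^2 + b^2 - 2*a*b*cos(theta)
# With theta = 90 degrees, cos(theta) = 0 and this reduces
# to the Pythagorean Theorem: c^2 = a^2 + b^2.
def third_side(a: float, b: float, theta: float) -> float:
    """Length of the side opposite the angle theta (in radians)."""
    return math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(theta))

print(third_side(3.0, 4.0, math.radians(90)))  # ~5.0, the classic 3-4-5 triangle
```

For an equilateral triangle (all angles 60 degrees), `third_side(1.0, 1.0, math.radians(60))` comes out to 1.0, as expected.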

The following image shows this approach as a vector diagram:

(Figure: Law of Cosines as a vector diagram)

The magnitude of a vector \vec{A} is defined in terms of its components as:

Magnitude

\vec{A} \cdot \vec{A} = ||\vec{A}||^2 = A_1^2 + A_2^2

The dot product between two vectors \vec{A} and \vec{B} is defined in terms of their components as:

Dot Product

\vec{A} \cdot \vec{B} = A_1B_1 + A_2B_2
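These component definitions translate directly into code. A minimal sketch in plain Python (the function names are illustrative, not from any particular library):

```python
import math

def magnitude(v: list[float]) -> float:
    # ||v|| = sqrt(v . v), the square root of the sum of squared components
    return math.sqrt(sum(c * c for c in v))

def dot(a: list[float], b: list[float]) -> float:
    # a . b = A_1*B_1 + A_2*B_2: multiply componentwise, then sum
    return sum(x * y for x, y in zip(a, b))

a = [3.0, 4.0]
b = [4.0, 3.0]
print(magnitude(a))  # 5.0
print(dot(a, b))     # 24.0
```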

Rewriting the Law of Cosines with vector magnitudes and dot products gives the following:

Law of Cosines in Vector form

||\vec{A}||^2 + ||\vec{B}||^2 - 2||\vec{A}||||\vec{B}||\cos\theta = ||\vec{C}||^2

Replacing ||\vec{C}||^2 with ||\vec{B} - \vec{A}||^2 gives the following:

Law of Cosines in Vector form only in terms of \vec{A} and \vec{B}

||\vec{A}||^2 + ||\vec{B}||^2 - 2||\vec{A}||||\vec{B}||\cos\theta = ||\vec{B} - \vec{A}||^2

Expanding this out gives us the formula for Cosine Similarity.
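The expansion itself is standard dot-product algebra, spelled out here for clarity:

||\vec{B} - \vec{A}||^2 = (\vec{B} - \vec{A}) \cdot (\vec{B} - \vec{A}) = ||\vec{B}||^2 - 2\,\vec{A}\cdot\vec{B} + ||\vec{A}||^2

Substituting this on the right-hand side, the ||\vec{A}||^2 and ||\vec{B}||^2 terms cancel, leaving:

-2||\vec{A}||\,||\vec{B}||\cos\theta = -2\,\vec{A}\cdot\vec{B}

Dividing both sides by -2||\vec{A}||\,||\vec{B}|| gives the result.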

Cosine Similarity

similarity(\vec{A},\vec{B}) = \cos(\theta) = \frac{\vec{A}\cdot\vec{B}}{||\vec{A}||\cdot||\vec{B}||}

This formula works for dimensions higher than 2 or 3, though it is hard to visualize. However, it can be visualized to some extent. It is common for vectors in AI/ML applications to have hundreds or even thousands of dimensions.

The similarity function in higher dimensions, expressed using the components of the vectors, is shown below. It extends the two-dimensional definitions of Magnitude and Dot Product given previously to N dimensions by using summation notation.

Cosine Similarity with vector components

similarity(\vec{A},\vec{B}) = \cos(\theta) = \frac{ \sum_{i=1}^{n} A_i B_i }{ \sqrt{ \sum_{i=1}^{n} A_i^2 \cdot \sum_{i=1}^{n} B_i^2 } }
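This summation translates almost line-for-line into code. A minimal sketch in plain Python (function name illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Numerator: the dot product, sum of A_i * B_i
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: square root of the product of the summed squares
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norms

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (perpendicular: unrelated)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # 1.0  (same direction: similar)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (pointing away: opposite)
```

The same function works unchanged for vectors of any dimensionality, including the hundreds or thousands of dimensions common in AI/ML embeddings.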

This is the key formula used in the simple implementation of a vector store, and can be found in the `InMemoryVectorStore` implementation.
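To illustrate how such a store might use the formula (this is a hypothetical sketch, not the actual `InMemoryVectorStore` code), a brute-force in-memory store can score every stored embedding against a query and return the most similar documents:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norms

class TinyVectorStore:
    """Hypothetical sketch of a brute-force in-memory vector store."""

    def __init__(self):
        self._entries = []  # list of (document, embedding) pairs

    def add(self, document: str, embedding: list[float]) -> None:
        self._entries.append((document, embedding))

    def search(self, query_embedding: list[float], top_k: int = 3) -> list[str]:
        # Score every stored embedding against the query,
        # then return the top_k most similar documents.
        scored = [(cosine_similarity(query_embedding, emb), doc)
                  for doc, emb in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]

store = TinyVectorStore()
store.add("cats", [1.0, 0.1])
store.add("dogs", [0.9, 0.2])
store.add("stocks", [0.0, 1.0])
print(store.search([1.0, 0.0], top_k=2))  # ['cats', 'dogs']
```

Real implementations typically add persistence and approximate nearest-neighbor indexing, but the similarity calculation at the core is the formula derived above.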