Data Structures Algorithms 简明教程

Interpolation Search Algorithm

插值搜索是二分搜索的改进变体。此搜索算法基于所需值的探查位置来工作。为了使此算法正常工作，数据集应以排序形式且均匀分布。

与线性搜索相比，二分搜索在时间复杂度方面具有巨大优势。线性搜索的最坏情况复杂度为 Ο(n)，而二分搜索为 Ο(log n)。

在某些情况下，可能预先知道目标数据的位置。例如，在电话簿的情况下，如果我们想搜索“Morpheus”的电话号码。在此，线性搜索甚至二分搜索看起来会很慢，因为我们可以直接跳转到存储以“M”开头的名字的内存空间。

Positioning in Binary Search

在二分搜索中，如果找不到所需数据，则将列表的其余部分分成两部分：较低和较高。搜索在其中任何一个部分中进行。

即使对数据进行了排序，二分搜索也不会利用这一点来探测所需数据的位置。

Position Probing in Interpolation Search

插值搜索通过计算探测位置来查找特定项。最初，探测位置是集合中的中间项的位置。

如果发生匹配，则返回该项的索引。为了将列表分成两部分，我们使用以下方法：

mid\, = \,Lo\, + \,\\frac{\left ( Hi\, - \,Lo \right )\ast \left ( X\, - \,A\left [ Lo \right ] \right )}{A\left [ Hi \right ]\, - \,A\left [ Lo \right ]}

其中：

A = list
Lo = Lowest index of the list
Hi = Highest index of the list
A[n] = Value stored at index n in the list

如果中间项大于该项，则再次计算探测位置在中间项右侧的子数组中。否则，在中间项左侧的子数组中搜索该项。此过程也在子数组上继续，直到子数组的大小减小为零。

Interpolation Search Algorithm

由于它是现有 BST 算法的改进，我们提到了使用位置探测搜索“目标”数据值索引的步骤：

1. Start searching data from middle of the list.
2. If it is a match, return the index of the item, and exit.
3. If it is not a match, probe position.
4. Divide the list using probing formula and find the new middle.
5. If data is greater than middle, search in higher sub-list.
6. If data is smaller than middle, search in lower sub-list.
7. Repeat until match.

Pseudocode

A → Array list
N → Size of A
X → Target Value

Procedure Interpolation_Search()

   Set Lo → 0
   Set Mid → -1
   Set Hi → N-1

   While X does not match
      if Lo equals to Hi OR A[Lo] equals to A[Hi]
         EXIT: Failure, Target not found
      end if

      Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])

      if A[Mid] = X
         EXIT: Success, Target found at Mid
      else
         if A[Mid] < X
            Set Lo to Mid+1
         else if A[Mid] > X
            Set Hi to Mid-1
         end if
      end if
   End While
End Procedure

Analysis

在有利的情况下，插值搜索算法的运行时复杂度是 Ο(log (log n)) ，而 BST 的运行时复杂度是 Ο(log n) 。

Example

要理解插值搜索涉及的分步过程，让我们看一个例子并围绕其展开。

考虑一个排好序的元素数组，如下所示 -

让我们搜索元素 19。

Solution

与二分搜索不同，这种方法中的中间点是使用以下公式选择的 -

mid\, = \,Lo\, + \,\\frac{\left ( Hi\, - \,Lo \right )\ast \left ( X\, - \,A\left [ Lo \right ] \right )}{A\left [ Hi \right ]\, - \,A\left [ Lo \right ]}

所以在给定的这个数组输入中，

Lo = 0, A[Lo] = 10
Hi = 9, A[Hi] = 44
X = 19

应用公式以查找列表中的中间点，我们得到

mid = 0 + 9 - 0)*(19 - 10/(44 - 10)

mid = (9*9)/34

mid = 81/34 = 2.38

因为 mid 是一个索引值，我们仅考虑小数的整数部分。也就是说，mid = 2。

将给定的键元素（即 19）与 mid 索引中存在的元素进行比较，发现两个元素相匹配。

因此，在索引 2 处找到了这个元素。

Implementation

插值搜索是二分搜索的改进变种。此搜索算法对所需值的探查位置起作用。为了使此算法正常工作，数据集合应以排序且分布均匀的形式出现。