Data Structures Algorithms 简明教程

Set Cover Problem

*集合覆盖算法为许多现实世界的资源分配问题提供了解决方案。举例来说，考虑一下航空公司为每一架飞机分配机组人员，以确保他们有足够的人员来满足旅程的要求。他们考虑的是飞行时间、持续时间、短途停留以及机组人员的可用性，以便将他们分配到航班上。集合覆盖算法就是这样产生的。

The set cover algorithm provides solution to many real-world resource allocating problems. For instance, consider an airline assigning crew members to each of their airplanes such that they have enough people to fulfill the requirements for the journey. They take into account the flight timings, the duration, the pit-stops, availability of the crew to assign them to the flights. This is where set cover algorithm comes into picture.

给定一个包含一些元素的全集 U，这些元素都被划分为子集。考虑到这些子集的集合为 S = {S1、S2、S3、S4…Sn}，集合覆盖算法找到最少的子集数，以便它们覆盖全集中的所有元素。

Given a universal set U, containing few elements which are all divided into subsets. Considering the collection of these subsets as S = {S1, S2, S3, S4… Sn}, the set cover algorithm finds the minimum number of subsets such that they cover all the elements present in the universal set.

如上图所示，点表示存在于全集 U 中的元素，这些元素被划分为不同的集合，S = {S1、S2、S3、S4、S5、S6}。需要选择以覆盖所有元素的最少数目集合将是最佳输出 = {S1、S2、S3}。

As shown in the above diagram, the dots represent the elements present in the universal set U that are divided into different sets, S = {S1, S2, S3, S4, S5, S6}. The minimum number of sets that need to be selected to cover all the elements will be the optimal output = {S1, S2, S3}.

Set Cover Algorithm

集合覆盖将集合的集合作为输入，并返回包含所有全集元素所需的最小集合数量。

The set cover takes the collection of sets as an input and and returns the minimum number of sets required to include all the universal elements.

*集合覆盖算法是一个 NP-Hard 问题和一个 2-逼近贪心算法。

The set cover algorithm is an NP-Hard problem and a 2-approximation greedy algorithm.

Algorithm

步骤 1- 初始化 Output = {}，其中 Output 表示元素的输出集合。

*Step 1 *− Initialize Output = {} where Output represents the output set of elements.

步骤 2− 虽然输出集未包括泛型集合中的所有元素，但请执行以下操作 −

*Step 2 *− While the Output set does not include all the elements in the universal set, do the following −

Find the cost-effectiveness of every subset present in the universal set using the formula, $\frac{Cost\left ( S_{i} \right )}{S_{i}-Output}$
Find the subset with minimum cost effectiveness for each iteration performed. Add the subset to the Output set.

步骤 3− 重复步骤 2，直到泛型集中没有剩余元素。获得的输出为最终的输出集。

*Step 3 *− Repeat Step 2 until there is no elements left in the universe. The output achieved is the final Output set.

Pseudocode

APPROX-GREEDY-SET_COVER(X, S)
   U = X
   OUTPUT = ф
   while U ≠ ф
      select Si Є S which has maximum |Si∩U|
   U = U – S
   OUTPUT = OUTPUT∪ {Si}
return OUTPUT

Analysis

假设元素的总体数量等于集合的总体数量（| X | = | S |），则代码运行时间为 O(| X | 3)

assuming the overall number of elements equals the overall number of sets (|X| = |S|), the code runs in time O(|X|3)

Example

让我们详细了解一个示例，该示例描述了集合覆盖问题的近似算法

Let us look at an example that describes the approximation algorithm for the set covering problem in more detail

S1 = {1, 2, 3, 4}                cost(S1) = 5
S2 = {2, 4, 5, 8, 10}            cost(S2) = 10
S3 = {1, 3, 5, 7, 9, 11, 13}     cost(S3) = 20
S4 = {4, 8, 12, 16, 20}          cost(S4) = 12
S5 = {5, 6, 7, 8, 9}             cost(S5) = 15

Step 1

输出集，输出 = ∅

The output set, Output = ф

为输出集中不存在元素的每个集合找到成本效益，

Find the cost effectiveness of each set for no elements in the output set,

S1 = cost(S1) / (S1 – Output) = 5 / (4 – 0)
S2 = cost(S2) / (S2 – Output) = 10 / (5 – 0)
S3 = cost(S3) / (S3 – Output) = 20 / (7 – 0)
S4 = cost(S4) / (S4 – Output) = 12 / (5 – 0)
S5 = cost(S5) / (S5 – Output) = 15 / (5 – 0)

此迭代中的最低成本效益在 S1 上实现，因此，添加到输出集的子集，输出 = {S1}，包含元素 {1、2、3、4}

The minimum cost effectiveness in this iteration is achieved at S1, therefore, the subset added to the output set, Output = {S1} with elements {1, 2, 3, 4}

Step 2

为输出集中新元素找到每个集合的成本效益，

Find the cost effectiveness of each set for the new elements in the output set,

S2 = cost(S2) / (S2 – Output) = 10 / (5 – 4)
S3 = cost(S3) / (S3 – Output) = 20 / (7 – 4)
S4 = cost(S4) / (S4 – Output) = 12 / (5 – 4)
S5 = cost(S5) / (S5 – Output) = 15 / (5 – 4)

此迭代中的最低成本效益在 S3 上实现，因此，添加到输出集的子集，输出 = {S1、S3}，包含元素 {1、2、3、4、5、7、9、11、13}。

The minimum cost effectiveness in this iteration is achieved at S3, therefore, the subset added to the output set, Output = {S1, S3} with elements {1, 2, 3, 4, 5, 7, 9, 11, 13}.

Step 3

为输出集中新元素找到每个集合的成本效益，

Find the cost effectiveness of each set for the new elements in the output set,

S2 = cost(S2) / (S2 – Output) = 10 / |(5 – 9)|
S4 = cost(S4) / (S4 – Output) = 12 / |(5 – 9)|
S5 = cost(S5) / (S5 – Output) = 15 / |(5 – 9)|

此迭代中的最低成本效益在 S2 上实现，因此，添加到输出集的子集，输出 = {S1、S3、S2}，包含元素 {1、2、3、4、5、7、8、9、10、11、13}

The minimum cost effectiveness in this iteration is achieved at S2, therefore, the subset added to the output set, Output = {S1, S3, S2} with elements {1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13}

Step 4

为输出集中新元素找到每个集合的成本效益，

Find the cost effectiveness of each set for the new elements in the output set,

S4 = cost(S4) / (S4 – Output) = 12 / |(5 – 11)|
S5 = cost(S5) / (S5 – Output) = 15 / |(5 – 11)|

此迭代中的最低成本效益在 S4 上实现，因此，添加到输出集的子集，输出 = {S1、S3、S2、S4}，包含元素 {1、2、3、4、5、7、8、9、10、11、12、13、16、20}

The minimum cost effectiveness in this iteration is achieved at S4, therefore, the subset added to the output set, Output = {S1, S3, S2, S4} with elements {1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 16, 20}

Step 5

为输出集中新元素找到每个集合的成本效益，

Find the cost effectiveness of each set for the new elements in the output set,

S5 = cost(S5) / (S5 – Output) = 15 / |(5 – 14)|

此迭代中的最低成本效益在 S5 上实现，因此，添加到输出集的子集，输出 = {S1、S3、S2、S4、S5}，包含元素 {1、2、3、4、5、6、7、8、9、10、11、12、13、16、20}

The minimum cost effectiveness in this iteration is achieved at S5, therefore, the subset added to the output set, Output = {S1, S3, S2, S4, S5} with elements {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 20}

覆盖泛有限集合中所有元素的最终输出为，输出 = {S1、S3、S2、S4、S5}。

The final output that covers all the elements present in the universal finite set is, Output = {S1, S3, S2, S4, S5}.

Implementation

以下是以上方法在各种编程语言中的实现 −

Following are the implementations of the above approach in various programming langauges −