Artificial Intelligence With Python 简明教程

AI with Python – Genetic Algorithms

本章详细讨论了人工智能的遗传算法。

This chapter discusses Genetic Algorithms of AI in detail.

What are Genetic Algorithms?

遗传算法 (GA) 是基于自然选择和遗传学概念的基于搜索的算法。GA 是计算的一个更大分支的子集,称为进化计算。

Genetic Algorithms (GAs) are search based algorithms based on the concepts of natural selection and genetics. GAs are a subset of a much larger branch of computation known as Evolutionary Computation.

GA 由 John Holland 及密歇根大学的学生和同事(最著名的是 David E. Goldberg)开发的,此后已在各种优化问题上进行尝试,并获得了高度成功。

GAs were developed by John Holland and his students and colleagues at the University of Michigan, most notably David E. Goldberg. It has since been tried on various optimization problems with a high degree of success.

在 GA 中,针对给定的问题有一组可能的解决方案。然后,这些解决方案经过重新组合和变异(就像自然遗传中一样),产生新的子代,并且该过程会重复进行多代。每个个体(或候选解决方案)被分配一个适应度值(基于其目标函数值),并且适应度较高的个体更有可能配对并产生子个体。这与达尔文的进化论相符。

In GAs, we have a pool of possible solutions to the given problem. These solutions then undergo recombination and mutation (like in natural genetics), produces new children, and the process is repeated for various generations. Each individual (or candidate solution) is assigned a fitness value (based on its objective function value) and the fitter individuals are given a higher chance to mate and yield fitter individuals. This is in line with the Darwinian Theory of Survival of the Fittest.

因此,它会在多代中保留适应度更好的个体或解决方案,直到达到停止条件。

Thus, it keeps evolving better individuals or solutions over generations, till it reaches a stopping criterion.

遗传算法在本质上是足够随机的,但它们的性能比随机局部搜索(我们只尝试随机解决方案,跟踪到目前为止的最佳解决方案)好得多,因为它们也利用了历史信息。

Genetic Algorithms are sufficiently randomized in nature, but they perform much better than random local search (where we just try random solutions, keeping track of the best so far), as they exploit historical information as well.

How to Use GA for Optimization Problems?

优化就是对设计、情境、资源和系统进行操作,使其尽可能有效。下图块图显示了优化流程 -

Optimization is an action of making design, situation, resource and system, as effective as possible. The following block diagram shows the optimization process −

optimization problems

Stages of GA mechanism for optimization process

当用于问题优化时,以下是 GA 机制的步骤序列。

The following is a sequence of steps of GA mechanism when used for optimization of problems.

  1. Step 1 − Generate the initial population randomly.

  2. Step 2 − Select the initial solution with best fitness values.

  3. Step 3 − Recombine the selected solutions using mutation and crossover operators.

  4. Step 4 − Insert an offspring into the population.

  5. Step 5 − Now, if the stop condition is met, return the solution with their best fitness value. Else go to step 2.

Installing Necessary Packages

为了使用 Python 中的遗传算法解决问题,我们将使用一个名为 deap 的强大 GA 软件包。它是一个新颖进化计算框架的库,用于快速原型制作和测试创意。我们可以借助命令提示符上的以下命令安装此软件包 -

For solving the problem by using Genetic Algorithms in Python, we are going to use a powerful package for GA called DEAP. It is a library of novel evolutionary computation framework for rapid prototyping and testing of ideas. We can install this package with the help of the following command on command prompt −

pip install deap

如果您使用的是 iPython 环境,则可以使用以下命令安装 deap -

If you are using anaconda environment, then following command can be used to install deap −

conda install -c conda-forge deap

Implementing Solutions using Genetic Algorithms

本节将向您解释使用遗传算法实现解决方案的方法。

This section explains you the implementation of solutions using Genetic Algorithms.

Generating bit patterns

以下示例向您展示如何生成一个位字符串,该字符串将基于 0-1 背包问题包含 15 个 1。

The following example shows you how to generate a bit string that would contain 15 ones, based on the One Max problem.

导入必要的包,如下所示

Import the necessary packages as shown −

import random
from deap import base, creator, tools

定义评估函数。这是创建遗传算法的第一步。

Define the evaluation function. It is the first step to create a genetic algorithm.

def eval_func(individual):
   target_sum = 15
   return len(individual) - abs(sum(individual) - target_sum),

现在,使用正确参数创建工具箱 -

Now, create the toolbox with the right parameters −

def create_toolbox(num_bits):
   creator.create("FitnessMax", base.Fitness, weights=(1.0,))
   creator.create("Individual", list, fitness=creator.FitnessMax)

初始化工具箱

Initialize the toolbox

   toolbox = base.Toolbox()
toolbox.register("attr_bool", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
   toolbox.attr_bool, num_bits)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

注册评估运算符 −

Register the evaluation operator −

toolbox.register("evaluate", eval_func)

现在,注册交叉运算符 −

Now, register the crossover operator −

toolbox.register("mate", tools.cxTwoPoint)

注册变异运算符 −

Register a mutation operator −

toolbox.register("mutate", tools.mutFlipBit, indpb = 0.05)

定义育种运算符 −

Define the operator for breeding −

toolbox.register("select", tools.selTournament, tournsize = 3)
return toolbox
if __name__ == "__main__":
   num_bits = 45
   toolbox = create_toolbox(num_bits)
   random.seed(7)
   population = toolbox.population(n = 500)
   probab_crossing, probab_mutating = 0.5, 0.2
   num_generations = 10
   print('\nEvolution process starts')

评估整个种群 −

Evaluate the entire population −

fitnesses = list(map(toolbox.evaluate, population))
for ind, fit in zip(population, fitnesses):
   ind.fitness.values = fit
print('\nEvaluated', len(population), 'individuals')

创建和遍历世代 −

Create and iterate through generations −

for g in range(num_generations):
   print("\n- Generation", g)

选择下一代个体 −

Selecting the next generation individuals −

offspring = toolbox.select(population, len(population))

现在,克隆选定的个体 −

Now, clone the selected individuals −

offspring = list(map(toolbox.clone, offspring))

在后代上应用交叉和变异 −

Apply crossover and mutation on the offspring −

for child1, child2 in zip(offspring[::2], offspring[1::2]):
   if random.random() < probab_crossing:
   toolbox.mate(child1, child2)

删除子代的适应度值

Delete the fitness value of child

del child1.fitness.values
del child2.fitness.values

现在,应用变异 −

Now, apply mutation −

for mutant in offspring:
   if random.random() < probab_mutating:
   toolbox.mutate(mutant)
   del mutant.fitness.values

评估具有无效适应度的个体 −

Evaluate the individuals with an invalid fitness −

invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
fitnesses = map(toolbox.evaluate, invalid_ind)
for ind, fit in zip(invalid_ind, fitnesses):
   ind.fitness.values = fit
print('Evaluated', len(invalid_ind), 'individuals')

现在,用下一代个体替换种群 −

Now, replace population with next generation individual −

population[:] = offspring

打印当前世代的统计数据 −

Print the statistics for the current generations −

fits = [ind.fitness.values[0] for ind in population]
length = len(population)
mean = sum(fits) / length
sum2 = sum(x*x for x in fits)
std = abs(sum2 / length - mean**2)**0.5
print('Min =', min(fits), ', Max =', max(fits))
print('Average =', round(mean, 2), ', Standard deviation =',
round(std, 2))
print("\n- Evolution ends")

打印最终输出 −

Print the final output −

   best_ind = tools.selBest(population, 1)[0]
   print('\nBest individual:\n', best_ind)
   print('\nNumber of ones:', sum(best_ind))
Following would be the output:
Evolution process starts
Evaluated 500 individuals
- Generation 0
Evaluated 295 individuals
Min = 32.0 , Max = 45.0
Average = 40.29 , Standard deviation = 2.61
- Generation 1
Evaluated 292 individuals
Min = 34.0 , Max = 45.0
Average = 42.35 , Standard deviation = 1.91
- Generation 2
Evaluated 277 individuals
Min = 37.0 , Max = 45.0
Average = 43.39 , Standard deviation = 1.46
… … … …
- Generation 9
Evaluated 299 individuals
Min = 40.0 , Max = 45.0
Average = 44.12 , Standard deviation = 1.11
- Evolution ends
Best individual:
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1,
 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,
 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
Number of ones: 15

Symbol Regression Problem

它是遗传规划中最著名的问题之一。所有符号回归问题都使用任意数据分布,并尝试用符号公式拟合最准确的数据。通常,像均方根误差 (RMSE) 这样的度量用于衡量个体的适应度。这是一个经典的回归器问题,我们这里使用方程式 5x3-6x2+8x=1 。我们需要按照上述示例中的所有步骤进行操作,但主要部分是创建基元集,因为它们是构成个体的基本要素,因此评估可以开始。在这里,我们将使用经典基元集。

It is one of the best known problems in genetic programming. All symbolic regression problems use an arbitrary data distribution, and try to fit the most accurate data with a symbolic formula. Usually, a measure like the RMSE (Root Mean Square Error) is used to measure an individual’s fitness. It is a classic regressor problem and here we are using the equation 5x3-6x2+8x=1. We need to follow all the steps as followed in the above example, but the main part would be to create the primitive sets because they are the building blocks for the individuals so the evaluation can start. Here we will be using the classic set of primitives.

以下 Python 代码对此进行了详细解释 −

The following Python code explains this in detail −

import operator
import math
import random
import numpy as np
from deap import algorithms, base, creator, tools, gp
def division_operator(numerator, denominator):
   if denominator == 0:
      return 1
   return numerator / denominator
def eval_func(individual, points):
   func = toolbox.compile(expr=individual)
   return math.fsum(mse) / len(points),
def create_toolbox():
   pset = gp.PrimitiveSet("MAIN", 1)
   pset.addPrimitive(operator.add, 2)
   pset.addPrimitive(operator.sub, 2)
   pset.addPrimitive(operator.mul, 2)
   pset.addPrimitive(division_operator, 2)
   pset.addPrimitive(operator.neg, 1)
   pset.addPrimitive(math.cos, 1)
   pset.addPrimitive(math.sin, 1)
   pset.addEphemeralConstant("rand101", lambda: random.randint(-1,1))
   pset.renameArguments(ARG0 = 'x')
   creator.create("FitnessMin", base.Fitness, weights = (-1.0,))
   creator.create("Individual",gp.PrimitiveTree,fitness=creator.FitnessMin)
   toolbox = base.Toolbox()
   toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
   toolbox.expr)
   toolbox.register("population",tools.initRepeat,list, toolbox.individual)
   toolbox.register("compile", gp.compile, pset = pset)
   toolbox.register("evaluate", eval_func, points = [x/10. for x in range(-10,10)])
   toolbox.register("select", tools.selTournament, tournsize = 3)
   toolbox.register("mate", gp.cxOnePoint)
   toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
   toolbox.register("mutate", gp.mutUniform, expr = toolbox.expr_mut, pset = pset)
   toolbox.decorate("mate", gp.staticLimit(key = operator.attrgetter("height"), max_value = 17))
   toolbox.decorate("mutate", gp.staticLimit(key = operator.attrgetter("height"), max_value = 17))
   return toolbox
if __name__ == "__main__":
   random.seed(7)
   toolbox = create_toolbox()
   population = toolbox.population(n = 450)
   hall_of_fame = tools.HallOfFame(1)
   stats_fit = tools.Statistics(lambda x: x.fitness.values)
   stats_size = tools.Statistics(len)
   mstats = tools.MultiStatistics(fitness=stats_fit, size = stats_size)
   mstats.register("avg", np.mean)
   mstats.register("std", np.std)
   mstats.register("min", np.min)
   mstats.register("max", np.max)
   probab_crossover = 0.4
   probab_mutate = 0.2
   number_gen = 10
   population, log = algorithms.eaSimple(population, toolbox,
      probab_crossover, probab_mutate, number_gen,
      stats = mstats, halloffame = hall_of_fame, verbose = True)

请注意,所有基本步骤与生成位模式时使用的相同。该程序将在 10 代之后给我们最小值、最大值和标准差 (std) 作为输出。

Note that all the basic steps are same as used while generating bit patterns. This program will give us the output as min, max, std (standard deviation) after 10 number of generations.