Biopython: A Concise Tutorial

Biopython - Population Genetics

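Population genetics plays an important role in evolutionary theory. It analyzes genetic differences between species as well as between two or more individuals of the same species.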

Biopython provides the Bio.PopGen module for population genetics. The module mainly supports GenePop, a popular population genetics package developed by Michel Raymond and Francois Rousset.

A simple parser

Let us write a simple application that parses the GenePop format, to understand the concept.

Download the GenePop file provided by the Biopython team from the following link: https://raw.githubusercontent.com/biopython/biopython/master/Tests/PopGen/c3line.gen
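For orientation, a GenePop file is plain text: a title line, then the locus names, then one block per population introduced by a line reading Pop, with each individual written as a name, a comma, and one genotype code per locus. The sketch below parses a tiny made-up sample by hand to show how that layout maps onto nested Python lists. The sample data and the parse_genepop helper are illustrative assumptions, not part of Biopython; the helper covers only the simple diploid case in which each code splits evenly into two alleles and zero marks missing data.

```python
# Made-up sample in GenePop layout (not the contents of c3line.gen).
SAMPLE = """Made-up example data
locusA
locusB
Pop
ind1 , 0303 0404
ind2 , 0304 0000
Pop
ind3 , 0202 0104
"""


def parse_genepop(text):
    """Parse a small GenePop-style string (hypothetical helper, diploid only)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    title = lines[0]
    loci, populations = [], []
    in_pops = False
    for line in lines[1:]:
        if line.lower() == "pop":
            # A "Pop" line starts a new population block.
            in_pops = True
            populations.append([])
        elif not in_pops:
            # Before the first "Pop": locus names, one per line or comma-separated.
            loci.extend(n.strip() for n in line.split(",") if n.strip())
        else:
            name, genotypes = line.split(",", 1)
            alleles = []
            for code in genotypes.split():
                half = len(code) // 2
                # An allele coded as zero means missing data, stored as None.
                pair = tuple(int(a) or None for a in (code[:half], code[half:]))
                alleles.append(pair)
            populations[-1].append((name.strip(), alleles))
    return title, loci, populations


title, loci, populations = parse_genepop(SAMPLE)
print(loci)
print(populations)
```

For real files, GenePop.read remains the better choice, since it handles the format variants this sketch ignores.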

Load the GenePop module with the following code snippet:

from Bio.PopGen import GenePop

Parse the file with the GenePop.read method, as shown below:

with open("c3line.gen") as handle:
    record = GenePop.read(handle)

Display the loci and population information, as shown below:

>>> record.loci_list
['136255903', '136257048', '136257636']
>>> record.pop_list
['4', 'b3', '5']
>>> record.populations
[[('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (3, 4), (2, 2)]),
   ('3', [(3, 3), (4, 4), (2, 2)]), ('4', [(3, 3), (4, 3), (None, None)])],
[('b1', [(None, None), (4, 4), (2, 2)]), ('b2', [(None, None), (4, 4), (2, 2)]),
   ('b3', [(None, None), (4, 4), (2, 2)])],
[('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (1, 4), (2, 2)]),
   ('3', [(3, 2), (1, 1), (2, 2)]), ('4',
   [(None, None), (4, 4), (2, 2)]), ('5', [(3, 3), (4, 4), (2, 2)])]]
>>>

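Here, the file contains three loci and three population sets: the first set has 4 records, the second has 3, and the third has 5. record.populations shows every population set, with the allele data for each locus listed per individual; (None, None) marks a missing genotype. Note that each entry of record.pop_list matches the name of the last individual in the corresponding population ('4', 'b3', '5').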
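The nested structure can be walked with ordinary loops. The sketch below copies the record.populations output shown above into a plain list, so it runs without Biopython, and reports the size of each population along with the number of missing genotypes per locus. The summarize helper is a hypothetical name used only for illustration.

```python
# Data copied from the record.populations and record.loci_list output above.
loci_list = ["136255903", "136257048", "136257636"]
populations = [
    [("1", [(3, 3), (4, 4), (2, 2)]), ("2", [(3, 3), (3, 4), (2, 2)]),
     ("3", [(3, 3), (4, 4), (2, 2)]), ("4", [(3, 3), (4, 3), (None, None)])],
    [("b1", [(None, None), (4, 4), (2, 2)]), ("b2", [(None, None), (4, 4), (2, 2)]),
     ("b3", [(None, None), (4, 4), (2, 2)])],
    [("1", [(3, 3), (4, 4), (2, 2)]), ("2", [(3, 3), (1, 4), (2, 2)]),
     ("3", [(3, 2), (1, 1), (2, 2)]), ("4", [(None, None), (4, 4), (2, 2)]),
     ("5", [(3, 3), (4, 4), (2, 2)])],
]


def summarize(populations, loci_list):
    """Return (size, missing-genotype count per locus) for each population."""
    summary = []
    for individuals in populations:
        missing = {locus: 0 for locus in loci_list}
        for _name, genotypes in individuals:
            # Genotypes are listed in the same order as loci_list.
            for locus, (a1, a2) in zip(loci_list, genotypes):
                if a1 is None or a2 is None:
                    missing[locus] += 1
        summary.append((len(individuals), missing))
    return summary


for size, missing in summarize(populations, loci_list):
    print(size, missing)
```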

Manipulate the GenePop file

Biopython provides options to remove loci and population data from a parsed record.

Remove a population set by position:

>>> record.remove_population(0)
>>> record.populations
[[('b1', [(None, None), (4, 4), (2, 2)]),
   ('b2', [(None, None), (4, 4), (2, 2)]),
   ('b3', [(None, None), (4, 4), (2, 2)])],
   [('1', [(3, 3), (4, 4), (2, 2)]),
   ('2', [(3, 3), (1, 4), (2, 2)]),
   ('3', [(3, 2), (1, 1), (2, 2)]),
   ('4', [(None, None), (4, 4), (2, 2)]),
   ('5', [(3, 3), (4, 4), (2, 2)])]]
>>>

Remove a locus by position:

>>> record.remove_locus_by_position(0)
>>> record.loci_list
['136257048', '136257636']
>>> record.populations
[[('b1', [(4, 4), (2, 2)]), ('b2', [(4, 4), (2, 2)]), ('b3', [(4, 4), (2, 2)])],
   [('1', [(4, 4), (2, 2)]), ('2', [(1, 4), (2, 2)]),
   ('3', [(1, 1), (2, 2)]), ('4', [(4, 4), (2, 2)]), ('5', [(4, 4), (2, 2)])]]
>>>

Remove a locus by name:

>>> record.remove_locus_by_name('136257636')
>>> record.loci_list
['136257048']
>>> record.populations
[[('b1', [(4, 4)]), ('b2', [(4, 4)]), ('b3', [(4, 4)])],
   [('1', [(4, 4)]), ('2', [(1, 4)]),
   ('3', [(1, 1)]), ('4', [(4, 4)]), ('5', [(4, 4)])]]
>>>
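As the sessions above show, these methods change the record in place: loci_list and populations differ after each call. When the original data is still needed, one option is to work on a copy. The sketch below mimics removal of a locus by name on plain lists, using the two-locus state shown earlier as input; without_locus is a hypothetical helper, not Biopython's implementation.

```python
import copy

# State of the record after the population and the first locus were removed.
loci_list = ["136257048", "136257636"]
populations = [
    [("b1", [(4, 4), (2, 2)]), ("b2", [(4, 4), (2, 2)]), ("b3", [(4, 4), (2, 2)])],
    [("1", [(4, 4), (2, 2)]), ("2", [(1, 4), (2, 2)]), ("3", [(1, 1), (2, 2)]),
     ("4", [(4, 4), (2, 2)]), ("5", [(4, 4), (2, 2)])],
]


def without_locus(loci_list, populations, name):
    """Return copies of the inputs with the named locus dropped."""
    pos = loci_list.index(name)  # raises ValueError for an unknown name
    new_loci = loci_list[:pos] + loci_list[pos + 1:]
    new_pops = copy.deepcopy(populations)  # leave the caller's data untouched
    for individuals in new_pops:
        for _ind_name, genotypes in individuals:
            del genotypes[pos]
    return new_loci, new_pops


new_loci, new_pops = without_locus(loci_list, populations, "136257636")
print(new_loci)
print(new_pops)
print(loci_list)  # the original list is unchanged
```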

Interface with GenePop Software

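Biopython provides an interface to the GenePop software, exposing much of its functionality. The Bio.PopGen.GenePop module is used for this purpose, and EasyController is its easy-to-use interface. Let us see how to parse a GenePop file and run some analyses with EasyController.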

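First, install the GenePop software and add its installation folder to the system path. To get basic information about a GenePop file, create an EasyController object and call its get_basic_info method, as shown below: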

>>> from Bio.PopGen.GenePop.EasyController import EasyController
>>> ec = EasyController('c3line.gen')
>>> print(ec.get_basic_info())
(['4', 'b3', '5'], ['136255903', '136257048', '136257636'])
>>>

Here, the first item is the list of populations and the second is the list of loci.
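Because EasyController depends on the external program, a quick check that the executable is reachable can save debugging time. The sketch below uses only the standard library. The executable name Genepop is an assumption and may differ depending on how the software was installed; find_executable is a hypothetical helper.

```python
import shutil


def find_executable(name="Genepop"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    # "Genepop" is an assumed executable name; adjust it to your installation.
    return shutil.which(name)


path = find_executable()
print(path if path else "Genepop executable not found on PATH")
```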

To get the list of all alleles at a particular locus, call the get_alleles_all_pops method with the locus name, as shown below:

>>> allele_list = ec.get_alleles_all_pops("136255903")
>>> print(allele_list)
[2, 3]
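For comparison, the same list can be derived by scanning the parsed genotypes directly. The sketch below does this on a plain copy of the record.populations data shown earlier, so it runs without Biopython or GenePop. The alleles_all_pops function is a hypothetical helper; it counts straight from the parsed data and need not match what the GenePop program reports in every case.

```python
# Data copied from the record.populations and record.loci_list output above.
loci_list = ["136255903", "136257048", "136257636"]
populations = [
    [("1", [(3, 3), (4, 4), (2, 2)]), ("2", [(3, 3), (3, 4), (2, 2)]),
     ("3", [(3, 3), (4, 4), (2, 2)]), ("4", [(3, 3), (4, 3), (None, None)])],
    [("b1", [(None, None), (4, 4), (2, 2)]), ("b2", [(None, None), (4, 4), (2, 2)]),
     ("b3", [(None, None), (4, 4), (2, 2)])],
    [("1", [(3, 3), (4, 4), (2, 2)]), ("2", [(3, 3), (1, 4), (2, 2)]),
     ("3", [(3, 2), (1, 1), (2, 2)]), ("4", [(None, None), (4, 4), (2, 2)]),
     ("5", [(3, 3), (4, 4), (2, 2)])],
]


def alleles_all_pops(populations, loci_list, locus_name):
    """Return the sorted alleles seen at one locus across all populations."""
    pos = loci_list.index(locus_name)
    seen = set()
    for individuals in populations:
        for _name, genotypes in individuals:
            # Skip None, which marks missing data.
            seen.update(a for a in genotypes[pos] if a is not None)
    return sorted(seen)


print(alleles_all_pops(populations, loci_list, "136255903"))  # [2, 3]
```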

To get the list of alleles for a particular population and locus, call get_alleles with the population position and the locus name, as shown below:

>>> allele_list = ec.get_alleles(0, "136255903")
>>> print(allele_list)
[]
>>> allele_list = ec.get_alleles(1, "136255903")
>>> print(allele_list)
[]
>>> allele_list = ec.get_alleles(2, "136255903")
>>> print(allele_list)
[2, 3]
>>>

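Similarly, EasyController exposes many other functions: allele frequencies, genotype frequencies, multilocus F statistics, Hardy-Weinberg equilibrium, linkage disequilibrium, and so on.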
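To make the first of these concrete, allele frequencies can be obtained by simple counting. The sketch below uses the genotypes at locus 136255903, copied from the record.populations output shown earlier, and ignores missing data. The allele_frequencies function is a hypothetical helper that illustrates the calculation; it is not the EasyController API.

```python
from collections import Counter

# Genotypes at locus 136255903, one inner list per population.
locus_genotypes = [
    [(3, 3), (3, 3), (3, 3), (3, 3)],
    [(None, None), (None, None), (None, None)],
    [(3, 3), (3, 3), (3, 2), (None, None), (3, 3)],
]


def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of diploid genotypes."""
    # Each genotype contributes two allele copies; None means missing data.
    counts = Counter(a for pair in genotypes for a in pair if a is not None)
    total = sum(counts.values())
    if total == 0:
        return {}  # no genotyped individuals
    return {allele: n / total for allele, n in counts.items()}


for index, genotypes in enumerate(locus_genotypes):
    print(index, allele_frequencies(genotypes))

pooled = [pair for genotypes in locus_genotypes for pair in genotypes]
print("pooled", allele_frequencies(pooled))
```

An empty dictionary for the second population reflects that none of its individuals were genotyped at this locus.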