Beautiful Soup Concise Tutorial
Beautiful Soup - descendants Property
Method Description
In the Beautiful Soup API, the descendants property of a Tag object (or of the BeautifulSoup object itself) lets you iterate over everything nested beneath it: children, grandchildren, and so on, including text nodes. The property returns a generator that yields these elements in document order, that is, depth-first rather than breadth-first.
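In a depth-first (pre-order) traversal of a tree, each node is visited before its children, and the entire subtree of a child is explored before the traversal moves on to that child's next sibling. A breadth-first traversal, by contrast, would visit every node at the current depth before descending to the next level. That is not the order descendants uses, as the output of Example 1 shows: each <li> is followed immediately by its own text.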
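One way to check the traversal order is to compare what descendants yields with a hand-written breadth-first walk. The sketch below is illustrative only: the doc string and the breadth_first_names helper are made up for this comparison and are not part of Beautiful Soup.

```python
from collections import deque

from bs4 import BeautifulSoup
from bs4.element import Tag

# Small illustrative document (not from the tutorial's examples)
doc = "<div><p><b>one</b></p><span>two</span></div>"
soup = BeautifulSoup(doc, "html.parser")

# Order produced by .descendants, keeping only tags for readability
dfs_order = [el.name for el in soup.descendants if isinstance(el, Tag)]


def breadth_first_names(root):
    """Hypothetical helper: walk the tree level by level, for comparison."""
    names = []
    queue = deque(root.children)
    while queue:
        node = queue.popleft()
        if isinstance(node, Tag):
            names.append(node.name)
            queue.extend(node.children)
    return names


bfs_order = breadth_first_names(soup)

print(dfs_order)  # ['div', 'p', 'b', 'span']
print(bfs_order)  # ['div', 'p', 'span', 'b']
```

The <b> tag appears before <span> in the descendants order because the whole <p> subtree is finished before its sibling is visited.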
Example 1
In the code below, we have an HTML document with nested unordered list tags. We iterate over all descendants of the parsed document and print each one.
html = '''
<ul id='outer'>
<li class="mainmenu">Accounts</li>
<ul>
<li class="submenu">Anand</li>
<li class="submenu">Mahesh</li>
</ul>
<li class="mainmenu">HR</li>
<ul>
<li class="submenu">Anil</li>
<li class="submenu">Milind</li>
</ul>
</ul>
'''
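from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Iterating from the soup object itself, so the outer <ul> is yielded first.
# Whitespace-only strings between tags are also yielded and print as blank lines.
tags = soup.descendants
for desc in tags:
    print(desc)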
Output
<ul id="outer">
<li class="mainmenu">Accounts</li>
<ul>
<li class="submenu">Anand</li>
<li class="submenu">Mahesh</li>
</ul>
<li class="mainmenu">HR</li>
<ul>
<li class="submenu">Anil</li>
<li class="submenu">Milind</li>
</ul>
</ul>
<li class="mainmenu">Accounts</li>
Accounts
<ul>
<li class="submenu">Anand</li>
<li class="submenu">Mahesh</li>
</ul>
<li class="submenu">Anand</li>
Anand
<li class="submenu">Mahesh</li>
Mahesh
<li class="mainmenu">HR</li>
HR
<ul>
<li class="submenu">Anil</li>
<li class="submenu">Milind</li>
</ul>
<li class="submenu">Anil</li>
Anil
<li class="submenu">Milind</li>
Milind
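Because descendants yields text nodes as well as tags, it is often convenient to separate the two and to skip whitespace-only strings. The following is a minimal sketch under that assumption; the shortened snippet and the variable names are chosen here for illustration. It also calls descendants on the outer <ul> tag rather than on the soup object, so the outer <ul> itself is not part of the result.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Shortened version of the nested list, for illustration
snippet = """
<ul id="outer">
<li class="mainmenu">Accounts</li>
<ul>
<li class="submenu">Anand</li>
</ul>
</ul>
"""
soup = BeautifulSoup(snippet, "html.parser")
outer = soup.find("ul", {"id": "outer"})

tag_names = []
texts = []
for node in outer.descendants:
    if isinstance(node, Tag):
        tag_names.append(node.name)
    elif isinstance(node, NavigableString) and node.strip():
        # Keep only strings that contain something other than whitespace
        texts.append(str(node))

print(tag_names)  # ['li', 'ul', 'li']
print(texts)      # ['Accounts', 'Anand']
```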
Example 2
In the following example, we list the descendants of the <head> tag.
html = """
<html><head><title>TutorialsPoint</title></head>
<body>
<p>Hello World</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag = soup.head
for element in tag.descendants:
    print(element)