Beautiful Soup Concise Tutorial

Beautiful Soup - descendants Property

Property Description

With the descendants property of a Tag object (or of the BeautifulSoup object itself) you can traverse every element nested under it: its direct children, their children, and so on. The property returns a generator object that yields these elements in depth-first (document) order, not breadth-first.

A depth-first traversal starts at the given element and follows each branch down to its deepest node before moving on to the next sibling. As a result, every tag is followed immediately by its own contents, which is the order visible in the output of Example 1.
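A quick way to check the order in which descendants are yielded is to print only the tag names. The snippet below is a minimal sketch; the markup and the variable names are made up for illustration. The nested span comes out before its "uncle" em, which is document (depth-first) order.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: <p> contains <span>; <em> is a sibling of <p>.
html = "<div><p><span>x</span></p><em>y</em></div>"
soup = BeautifulSoup(html, "html.parser")

# Keep only Tag objects and record their names in the order they are yielded.
order = [el.name for el in soup.div.descendants if isinstance(el, Tag)]
print(order)  # ['p', 'span', 'em']
```

A breadth-first walk would have listed p and em before span.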

Syntax

tag.descendants

Return value

The descendants property returns a generator object. Elements are produced lazily as you iterate; wrap the generator in list() if you need all of them at once.
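Because the return value is a generator, it can be consumed only once, and each access to the property gives a fresh one. The following is a small sketch of that behaviour; the markup and variable names are illustrative assumptions.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>World</b></p>", "html.parser")

gen = soup.p.descendants   # nothing has been traversed yet
first = next(gen)          # pull a single element on demand
rest = list(gen)           # consume whatever is left
again = list(gen)          # the same generator is now exhausted

print(repr(first))                   # 'Hello '
print(len(rest))                     # 2  (the <b> tag and its string)
print(again)                         # []
print(len(list(soup.p.descendants))) # 3  (a new generator starts over)
```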

Example 1

In the code below, we have an HTML document with nested unordered lists. We walk through all of its descendant elements, which are yielded depth-first (in document order).

html = '''
   <ul id='outer'>
   <li class="mainmenu">Accounts</li>
      <ul>
      <li class="submenu">Anand</li>
      <li class="submenu">Mahesh</li>
      </ul>
   <li class="mainmenu">HR</li>
      <ul>
      <li class="submenu">Anil</li>
      <li class="submenu">Milind</li>
      </ul>
   </ul>
'''
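from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Iterate over the whole document, so the outer <ul> is the first tag printed.
# Whitespace-only strings between tags are also yielded and print as blank lines.
for desc in soup.descendants:
    print(desc)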

Output

<ul id="outer">
<li class="mainmenu">Accounts</li>
<ul>
<li class="submenu">Anand</li>
<li class="submenu">Mahesh</li>
</ul>
<li class="mainmenu">HR</li>
<ul>
<li class="submenu">Anil</li>
<li class="submenu">Milind</li>
</ul>
</ul>

<li class="mainmenu">Accounts</li>
Accounts
<ul>
<li class="submenu">Anand</li>
<li class="submenu">Mahesh</li>
</ul>

<li class="submenu">Anand</li>
Anand
<li class="submenu">Mahesh</li>
Mahesh

<li class="mainmenu">HR</li>
HR
<ul>
<li class="submenu">Anil</li>
<li class="submenu">Milind</li>
</ul>

<li class="submenu">Anil</li>
Anil
<li class="submenu">Milind</li>
Milind
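The output above mixes tags with the text strings inside them. If only one kind is wanted, the generator can be filtered by type. This is a minimal sketch on a shortened version of the menu; the variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

html = '''
<ul id="outer">
<li class="mainmenu">Accounts</li>
<li class="mainmenu">HR</li>
</ul>
'''
soup = BeautifulSoup(html, 'html.parser')
outer = soup.find('ul', {'id': 'outer'})

# Tags only: skips every text node, including whitespace between tags.
tags_only = [el for el in outer.descendants if isinstance(el, Tag)]

# Text only: keep strings that are not just whitespace.
texts = [el.strip() for el in outer.descendants
         if isinstance(el, NavigableString) and el.strip()]

print([t.name for t in tags_only])  # ['li', 'li']
print(texts)                        # ['Accounts', 'HR']
```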

Example 2

In the following example, we list the descendants of the <head> tag.

html = """
<html><head><title>TutorialsPoint</title></head>
<body>
<p>Hello World</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.head
for element in tag.descendants:
    print(element)

Output

<title>TutorialsPoint</title>
TutorialsPoint
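The two lines of output show that the title string counts as a descendant of <head> even though it is not a direct child. The sketch below compares the children and descendants properties on the same tag; the variable names are chosen for illustration.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>TutorialsPoint</title></head><body><p>Hello World</p></body></html>"
soup = BeautifulSoup(html, 'html.parser')
head = soup.head

children = list(head.children)        # direct children only: the <title> tag
descendants = list(head.descendants)  # the <title> tag and the string inside it

print(len(children))     # 1
print(len(descendants))  # 2
```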