Beautiful Soup: A Concise Tutorial

Beautiful Soup - clear() Method

Method Description

The clear() method in the Beautiful Soup library removes the inner content of a tag while leaving the tag itself in place. By default, extract() is called on each child element. If the decompose argument is set to True, decompose() is called instead of extract().
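The difference between the two modes shows up when a reference to a child is held before its parent is cleared. The following is a minimal sketch rather than part of the original examples: the markup and variable names are invented for illustration, and the decomposed attribute assumes Beautiful Soup 4.9.1 or later.

```python
from bs4 import BeautifulSoup

# Default mode: children are detached with extract() and remain usable.
soup = BeautifulSoup("<div><b>kept</b> text</div>", "html.parser")
bold = soup.b                      # keep a reference to the child tag
soup.div.clear()
print(soup)                        # <div></div>
print(bold, bold.parent)           # <b>kept</b> None

# decompose=True: child tags are destroyed rather than merely detached.
soup2 = BeautifulSoup("<div><b>gone</b> text</div>", "html.parser")
bold2 = soup2.b
soup2.div.clear(decompose=True)
print(soup2)                       # <div></div>
print(bold2.decomposed)            # True
```

In the default mode the detached child can still be inspected or reinserted elsewhere; with decompose=True it should be treated as unusable.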

Syntax

clear(decompose=False)

Parameters

  1. decompose − If True, decompose() (a more destructive method) is called instead of extract(). Defaults to False.

Return Value

The clear() method does not return a value.
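Since nothing is returned, the call modifies the tag in place and cannot be chained. A short sketch with made-up markup illustrates this:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <i>world</i></p>", "html.parser")
result = soup.p.clear()   # the tag is emptied in place

print(result)   # None
print(soup.p)   # <p></p>
```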

Example 1

Because clear() is called on the soup object, which represents the entire document, all content is removed and the document is left blank.

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
soup.clear()
print(soup)

Output

The document is now empty, so print(soup) displays only a blank line.
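An empty result is easy to overlook in a terminal, so it can help to confirm it programmatically. The sketch below uses a shortened, illustrative document rather than the one above and checks the cleared soup object in three ways.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p>Some text.</p></body></html>", "html.parser")
soup.clear()

print(len(soup.contents))   # 0
print(repr(str(soup)))      # ''
print(soup.find("p"))       # None
```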

Example 2

In the following example, we find all the <p> tags and call the clear() method on each of them.

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all('p')
for tag in tags:
   tag.clear()

print(soup)

Output

The contents of each <p> ... </p> element are removed, while the tags themselves are retained.

<html>
   <body>
      <p></p>
      <p></p>
      <p></p>
      <p></p>
   </body>
</html>
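Only the children of a tag are removed; its attributes are left untouched. The following sketch, using hypothetical markup with class and id attributes, shows a cleared tag keeping both:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="intro" id="first">Hello <b>world</b></p>', "html.parser")
tag = soup.p
tag.clear()

print(tag)         # <p class="intro" id="first"></p>
print(tag["id"])   # first
```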

Example 3

Here we clear the contents of the <body> tag with the decompose argument set to True.

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
body = soup.find('body')
body.clear(decompose=True)  # clear() returns None, so there is nothing to capture

print(soup)

Output

<html>
   <body></body>
</html>