Beautiful Soup: A Concise Tutorial
Beautiful Soup - clear() Method
Method Description
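The clear() method in the Beautiful Soup library removes everything inside a tag while leaving the tag itself in place. Each child element is removed by calling its extract() method. If the decompose argument is set to True, decompose() is called on the children instead of extract().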
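To make the default behaviour concrete, here is a minimal sketch. The small `<div>` document and the `children` variable are made up for illustration. It shows that the cleared tag survives and that children removed with extract() remain usable as detached objects.

```python
from bs4 import BeautifulSoup

# A tiny illustrative document (not part of the tutorial's examples).
soup = BeautifulSoup("<div><p>first</p><p>second</p></div>", "html.parser")
div = soup.div

# Keep references to the children before clearing the tag.
children = list(div.contents)

div.clear()  # default: each child is extracted, not destroyed

print(div)                 # <div></div>
print(children[0])         # <p>first</p>
print(children[0].parent)  # None
```

Because the children are only detached from the tree, they can be inserted elsewhere later, for example with append().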
Example 1
Here clear() is called on the soup object, which represents the whole document, so all content is removed and the document is left empty.
html = '''
<html>
<body>
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
soup.clear()
print(soup)
Example 2
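In the following example, we find all <p> tags and call clear() on each of them. The tags themselves remain in the document, but their contents are removed.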
html = '''
<html>
<body>
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all('p')
for tag in tags:
    tag.clear()
print(soup)
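As a quick check of what this loop leaves behind, the sketch below runs the same pattern on a shorter, made-up document. The `paragraphs` variable is introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Shortened illustrative input; the second <p> contains a nested tag.
html = "<body><p>one</p><p>two <b>bold</b></p></body>"
soup = BeautifulSoup(html, "html.parser")

# Clear every <p> tag: text and nested tags are both removed.
for tag in soup.find_all("p"):
    tag.clear()

paragraphs = soup.find_all("p")
print(len(paragraphs))  # 2
print(soup)             # <body><p></p><p></p></body>
```

The <p> tags are still found by find_all() after the loop, which confirms that clear() empties a tag rather than removing it.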
Example 3
Here we clear the contents of the <body> tag with the decompose argument set to True.
html = '''
<html>
<body>
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
body = soup.find('body')
# clear() returns None, so there is no return value to keep
body.clear(decompose=True)
print(soup)
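The visible result of decompose=True is the same empty tag. The difference lies in what happens to the removed children. The following sketch uses a shortened, made-up document and assumes Beautiful Soup 4.9.1 or later, where the `decomposed` property is available. The names `first_p` and `result` are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<body><p>one</p><p>two</p></body>", "html.parser")
body = soup.find("body")

# Keep a reference to a child tag before clearing.
first_p = body.find("p")

result = body.clear(decompose=True)

print(result)              # None
print(soup)                # <body></body>
# Assumption: the decomposed property requires bs4 4.9.1 or later.
print(first_p.decomposed)  # True
```

A decomposed tag has been destroyed and should not be used again, so choose decompose=True only when the removed children are no longer needed.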