Beautiful Soup Concise Tutorial
Beautiful Soup - decompose() Method
Method Description
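The decompose() method destroys the current element together with its children: the element is removed from the tree, and it and everything beneath it are wiped out. You can check whether an element has been decomposed through its decomposed
attribute, which returns True if the element has been destroyed and False otherwise.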
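As a minimal sketch of the description above, the snippet below checks the decomposed attribute before and after destroying a single tag. The markup and the variable names (first, before, after, remaining) are made up for illustration, and it assumes a Beautiful Soup release recent enough to provide the decomposed attribute.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the examples in this tutorial
soup = BeautifulSoup("<div><p>first</p><p>second</p></div>", "html.parser")

first = soup.find("p")
before = first.decomposed   # the tag is still part of the tree
first.decompose()           # destroy the tag and its contents
after = first.decomposed    # the tag has now been destroyed

remaining = str(soup)       # the rest of the tree is unaffected
print("before:", before)
print("after:", after)
print("document:", remaining)
```

Only the decomposed tag is affected; its sibling and parent remain in the tree.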
Example 1
When we call the decompose() method on the BeautifulSoup object itself, the entire content is destroyed.
html = '''
<html>
<body>
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
soup.decompose()
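# decomposed reports whether the object has been destroyed
print("decomposed:", soup.decomposed)
# Printing the decomposed object fails, as the output below shows
print("document:", soup)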
Output
decomposed: True
document: Traceback (most recent call last):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str
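Because the soup object has been decomposed, the decomposed attribute returns True. Printing the object afterwards, however, fails with the TypeError shown above.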
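One way to avoid such an error is to test the decomposed attribute before rendering an object. The sketch below is only an illustration: safe_render is a hypothetical helper, not part of Beautiful Soup, and the placeholder string it returns is an arbitrary choice.

```python
from bs4 import BeautifulSoup

def safe_render(node):
    """Hypothetical helper: return the node's markup, or a placeholder
    string if the node has already been decomposed."""
    if node.decomposed:
        return "<decomposed>"
    return str(node)

soup = BeautifulSoup("<p>Hello</p>", "html.parser")
live = safe_render(soup)    # rendered normally while the tree exists
soup.decompose()
dead = safe_render(soup)    # placeholder instead of an exception

print(live)
print(dead)
```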
Example 2
The following code uses the decompose() method to remove every occurrence of the <p> tag from the HTML string.
html = '''
<html>
<body>
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
p_all = soup.find_all('p')
# Destroy each <p> tag along with its contents
for p in p_all:
    p.decompose()
print("document:", soup)
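The same pattern can be wrapped in a small function. The sketch below assumes a hypothetical helper named remove_tags and uses made-up markup. It skips tags that are already decomposed, which can happen when a matching tag is nested inside another matching tag.

```python
from bs4 import BeautifulSoup

def remove_tags(markup, name):
    """Hypothetical helper: parse markup and destroy every tag called name."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all(name):
        # A nested match may already be gone if its ancestor was decomposed
        if not tag.decomposed:
            tag.decompose()
    return soup

cleaned = remove_tags("<body><p>one</p><h1>keep</h1><p>two</p></body>", "p")
remaining = cleaned.find_all("p")   # no <p> tags should be left
print("document:", cleaned)
```

Tags with other names, such as the <h1> here, are left untouched.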
Example 3
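Here, we find the <body> tag in the HTML document tree and decompose the element that precedes it, which happens to be the <title> tag. The resulting document tree no longer contains the <title> tag.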
html = '''
<html>
<head>
<title>TutorialsPoint</title>
</head>
<body>
Hello World
</body>
</html>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
tag = soup.body
tag.find_previous().decompose()
print("document:", soup)
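To see which element find_previous() actually returns, it can help to inspect its name before destroying it. The following sketch uses a compact, whitespace-free version of the markup and illustrative variable names (previous, previous_name, title_after).

```python
from bs4 import BeautifulSoup

markup = ("<html><head><title>TutorialsPoint</title></head>"
          "<body>Hello World</body></html>")
soup = BeautifulSoup(markup, "html.parser")

# With no arguments, find_previous() returns the nearest preceding tag
previous = soup.body.find_previous()
previous_name = previous.name   # expected to be the <title> tag

previous.decompose()
title_after = soup.title        # None once the tag has been destroyed

print("previous tag:", previous_name)
print("document:", soup)
```

The now-empty <head> tag stays in the tree, because only the <title> element was decomposed.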