Beautiful Soup: A Concise Tutorial

Beautiful Soup - decompose() Method

Method Description

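The decompose() method destroys the current element together with all of its children: the element is removed from the tree, and it and everything beneath it are wiped out. You can check whether an element has been decomposed through its decomposed attribute, which is True if the element has been destroyed and False otherwise.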
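As a minimal sketch of the behaviour described above (the sample markup and the variable name first are illustrative assumptions, not part of the original examples), the decomposed attribute can be read before and after the call:

```python
from bs4 import BeautifulSoup

# Illustrative markup, chosen only for this sketch.
soup = BeautifulSoup("<div><p>first</p><p>second</p></div>", "html.parser")
first = soup.p                # the first <p> tag in the tree

print(first.decomposed)       # False: the tag is still part of the tree
first.decompose()             # remove and destroy the tag and its children
print(first.decomposed)       # True
print(soup)                   # <div><p>second</p></div>
```

The variable first still refers to the destroyed tag after the call, which is why its decomposed attribute can be queried at all.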

Syntax

decompose()

Parameters

No parameters are defined for this method.

Return Type

The method does not return any object; the result of the call is None.
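The following sketch illustrates the return value; the markup and the name result are assumptions made for the illustration:

```python
from bs4 import BeautifulSoup

# Illustrative markup, chosen only for this sketch.
soup = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")

# decompose() works purely by side effect, so there is nothing to capture.
result = soup.b.decompose()
print(result)   # None
print(soup)     # <p>Hello </p>
```

If the removed element is still needed afterwards, Beautiful Soup's extract() method removes it from the tree and returns it instead of destroying it.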

Example 1

When decompose() is called on the BeautifulSoup object itself, the entire content is destroyed.

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''
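from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
# Destroy the whole document tree.
soup.decompose()
print("decomposed:", soup.decomposed)
# Rendering the destroyed object fails with the error shown in the output.
print("document:", soup)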

Output

decomposed: True
document: Traceback (most recent call last):
  ...
TypeError: can only concatenate str (not "NoneType") to str

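Because the soup object has been decomposed, its decomposed attribute returns True. Printing the destroyed object, however, raises the TypeError shown above.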
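One way to avoid this error is to check the decomposed attribute before rendering an element. The sketch below assumes a hypothetical helper named render and a placeholder string of our own choosing; neither is part of Beautiful Soup:

```python
from bs4 import BeautifulSoup

def render(element):
    """Hypothetical helper: return the markup, or a placeholder if destroyed."""
    if element.decomposed:
        return "<decomposed>"   # placeholder text chosen for this sketch
    return str(element)

soup = BeautifulSoup("<p>Hello</p>", "html.parser")
print(render(soup))   # <p>Hello</p>

soup.decompose()
print(render(soup))   # <decomposed>
```

The guard never tries to serialize a destroyed object, so the failing code path from Example 1 is not reached.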

Example 2

The following code uses the decompose() method to remove every occurrence of the <p> tag from the HTML string.

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''
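from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
p_all = soup.find_all('p')
# Destroy each <p> tag; a plain loop is clearer than a
# list comprehension that is used only for its side effects.
for p in p_all:
    p.decompose()

print("document:", soup)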

Output

After all <p> tags are removed, the remaining HTML document is printed.

document:
<html>
<body>

</body>
</html>
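The same loop can be made selective. As a sketch, assuming we only want to drop the paragraphs that mention "MTV" (a condition invented for this illustration), the tags can be filtered before they are decomposed:

```python
from bs4 import BeautifulSoup

html = """
<p>The quick, brown fox jumps over a lazy dog.</p>
<p>DJs flock by when MTV ax quiz prog.</p>
<p>Junk MTV quiz graced by fox whelps.</p>
<p>Bawds jog, flick quartz, vex nymphs.</p>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() collects its results up front, so destroying tags
# while looping over them does not disturb the iteration.
for p in soup.find_all("p"):
    if "MTV" in p.get_text():
        p.decompose()

remaining = [p.get_text() for p in soup.find_all("p")]
print(remaining)
# ['The quick, brown fox jumps over a lazy dog.', 'Bawds jog, flick quartz, vex nymphs.']
```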

Example 3

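Here we find the <body> tag in the HTML document tree and decompose the element that precedes it, which happens to be the <title> tag. The <title> tag is therefore absent from the resulting document tree.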

html = '''
<html>
   <head>
      <title>TutorialsPoint</title>
   </head>
   <body>
      Hello World
   </body>
</html>

'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
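tag = soup.body
# find_previous() with no arguments returns the closest preceding tag,
# which is <title> in this document.
tag.find_previous().decompose()

print("document:", soup)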

Output

document:
<html>
<head>

</head>
<body>
Hello World
</body>
</html>
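Calling find_previous() without arguments removes whichever tag happens to come before <body>. As a sketch of a more explicit variant, the tag name can be passed to find_previous(); the extra <meta> tag below is an assumption added to show the difference:

```python
from bs4 import BeautifulSoup

# Illustrative markup: a <meta> tag now sits between <title> and <body>.
html = ("<html><head><title>TutorialsPoint</title>"
        "<meta charset='utf-8'></head><body>Hello World</body></html>")
soup = BeautifulSoup(html, "html.parser")

# Naming the tag makes the target independent of what else precedes <body>.
target = soup.body.find_previous("title")
print(target.name)    # title
target.decompose()

print(soup.title)     # None: the <title> tag is gone
```

In this markup an argument-free find_previous() would pick the <meta> tag instead, so naming the target keeps the code from destroying the wrong element.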