Beautiful Soup: A Concise Tutorial

Beautiful Soup - decompose() Method

Method Description

The decompose() method destroys the current element together with all of its children: the element is removed from the tree, and it and everything beneath it are wiped out. To check whether an element has been decomposed, read its decomposed property, which is True if the element has been destroyed and False otherwise.
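As a minimal sketch of the behavior described above, the snippet below decomposes a single tag and inspects the decomposed property before and after. The markup and variable names are illustrative assumptions, and the decomposed property assumes Beautiful Soup 4.9.0 or later.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption, not taken from the tutorial).
soup = BeautifulSoup(
    "<div><p>Keep me</p><span>Remove <b>me</b></span></div>", "html.parser"
)

span = soup.span
b = span.b                    # keep a reference to a child of <span>
before = span.decomposed      # not yet destroyed
span.decompose()              # destroys <span> and everything beneath it

print("before:", before)            # before: False
print("after:", span.decomposed)    # after: True
print("child:", b.decomposed)       # child: True
print(soup)                         # <div><p>Keep me</p></div>
```

The child <b> tag reports True as well, because decompose() destroys the whole subtree and not only the element it was called on.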

Syntax

decompose()

Parameters

This method takes no parameters.

Return Type

The method does not return any object; the call evaluates to None.
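A short sketch, using assumed sample markup, showing that the call yields None. Because of this, keep your own reference to the element beforehand if you want to inspect it (for example its decomposed property) afterwards.

```python
from bs4 import BeautifulSoup

# Illustrative markup (an assumption, not taken from the tutorial).
soup = BeautifulSoup("<div><p>one</p><p>two</p></div>", "html.parser")

result = soup.p.decompose()   # removes the first <p>; nothing is returned

print(result)   # None
print(soup)     # <div><p>two</p></div>
```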

Example 1

When we call the decompose() method on the BeautifulSoup object itself, the entire content is destroyed.

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''
from bs4 import BeautifulSoup

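soup = BeautifulSoup(html, "html.parser")
# Destroy the whole parse tree, including the BeautifulSoup object itself.
soup.decompose()
print("decomposed:", soup.decomposed)
# Printing a decomposed object is not supported and raises an error here.
print("document:", soup)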

Output

decomposed: True
document: Traceback (most recent call last):
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str

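Because the soup object has been decomposed, its decomposed property is True. Trying to print the destroyed object then raises the TypeError shown above. The Beautiful Soup documentation states that the behavior of a decomposed element is not defined, so it should not be used for anything afterwards.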
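One way to avoid this kind of error is to check the decomposed property before touching an element. The sketch below is only an illustration: safe_text is a hypothetical helper name, not part of Beautiful Soup, and the markup is assumed.

```python
from bs4 import BeautifulSoup


def safe_text(element):
    """Hypothetical helper: return the element's text, or None if it was decomposed."""
    if element.decomposed:
        return None
    return element.get_text()


# Illustrative markup (an assumption, not taken from the tutorial).
soup = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")
p = soup.p

text_before = safe_text(p)
p.decompose()
text_after = safe_text(p)

print(text_before)   # Hello world
print(text_after)    # None
```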

Example 2

The code below uses the decompose() method to remove every <p> tag from the HTML string.

html = '''
<html>
   <body>
      <p>The quick, brown fox jumps over a lazy dog.</p>
      <p>DJs flock by when MTV ax quiz prog.</p>
      <p>Junk MTV quiz graced by fox whelps.</p>
      <p>Bawds jog, flick quartz, vex nymphs.</p>
   </body>
</html>
'''
from bs4 import BeautifulSoup

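soup = BeautifulSoup(html, "html.parser")
p_all = soup.find_all('p')
# Destroy each <p> tag; a plain loop is clearer than a list comprehension
# that is run only for its side effects.
for p in p_all:
    p.decompose()

print("document:", soup)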

Output

After all <p> tags have been removed, the rest of the HTML document is printed.

document:
<html>
<body>

</body>
</html>
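The same loop can be made selective. As a sketch under the assumption that only paragraphs mentioning "MTV" should go (the filter condition is ours, not the tutorial's), the code below decomposes the matching <p> tags and keeps the rest.

```python
from bs4 import BeautifulSoup

# Same sentences as in the example above, written on one line for brevity.
html = (
    "<body>"
    "<p>The quick, brown fox jumps over a lazy dog.</p>"
    "<p>DJs flock by when MTV ax quiz prog.</p>"
    "<p>Junk MTV quiz graced by fox whelps.</p>"
    "<p>Bawds jog, flick quartz, vex nymphs.</p>"
    "</body>"
)

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like result, so destroying tags while
# iterating over it does not disturb the loop.
for p in soup.find_all("p"):
    if "MTV" in p.get_text():
        p.decompose()

remaining = [p.get_text() for p in soup.find_all("p")]
print(remaining)
# ['The quick, brown fox jumps over a lazy dog.', 'Bawds jog, flick quartz, vex nymphs.']
```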

Example 3

Here, we locate the <body> tag in the HTML document tree and decompose the element returned by find_previous(), which happens to be the <title> tag. The resulting document tree no longer contains the <title> tag.

html = '''
<html>
   <head>
      <title>TutorialsPoint</title>
   </head>
   <body>
      Hello World
   </body>
</html>

'''
from bs4 import BeautifulSoup

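soup = BeautifulSoup(html, "html.parser")
tag = soup.body
# find_previous() returns the closest tag that precedes <body> in the
# document, which is <title> here; decompose() then destroys it.
tag.find_previous().decompose()

print("document:", soup)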

Output

document:
<html>
<head>

</head>
<body>
Hello World
</body>
</html>