Beautiful Soup: A Concise Tutorial

Beautiful Soup - stripped_strings Property

Property Description

The stripped_strings property of a Tag or BeautifulSoup object works like the strings property, except that leading and trailing whitespace (including line breaks) is stripped from each string, and strings that consist only of whitespace are skipped. It returns a generator that yields the text of every string nested inside the object, in document order.

Syntax

Tag.stripped_strings

Example 1

In the example below, the strings of every element in the document tree parsed into a BeautifulSoup object are displayed with their surrounding whitespace stripped.

from bs4 import BeautifulSoup

markup = '''
   <div id="Languages">
      <p>Java</p> <p>Python</p> <p>C++</p>
   </div>
'''
soup = BeautifulSoup(markup, 'html.parser')
# stripped_strings is a generator, so collect it into a list before printing
print(list(soup.stripped_strings))

Output

['Java', 'Python', 'C++']

Compared with the output of the strings property, the line breaks and spaces between the tags no longer appear.
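To make that comparison concrete, the sketch below lists both properties for the markup from Example 1. The variable names raw, stripped and manual are illustrative only, and the last step is simply a hand-written approximation of the stripping, not the library's own code.

```python
from bs4 import BeautifulSoup

markup = '''
   <div id="Languages">
      <p>Java</p> <p>Python</p> <p>C++</p>
   </div>
'''
soup = BeautifulSoup(markup, 'html.parser')

# strings keeps every text node, including the whitespace-only ones
raw = list(soup.strings)
# stripped_strings strips each string and skips those left empty
stripped = list(soup.stripped_strings)

print(raw)
print(stripped)

# Stripping the raw strings by hand and dropping the empty ones
# should give the same list as stripped_strings
manual = [s.strip() for s in raw if s.strip()]
print(manual == stripped)
```

The first list is longer because the indentation and the spaces between the <p> tags are themselves text nodes in the parsed tree.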

Example 2

Here we extract the stripped strings of each child element under the <div> tag.

# Reuses the soup object built in Example 1
tag = soup.div

for text in tag.stripped_strings:
    print(text)

Output

Java
Python
C++
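One detail the examples above do not exercise is whitespace inside a string. The sketch below uses made-up markup to suggest that only leading and trailing whitespace is removed, much as Python's str.strip() would do, while strings that contain nothing but whitespace are dropped. Treat it as an illustration under those assumptions rather than a specification of the library's behavior.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: inner runs of spaces, a whitespace-only <p>,
# and a <p> whose text is wrapped in line breaks
markup = '<div><p>  Hello   world  </p><p>   </p><p>\n Bye \n</p></div>'
soup = BeautifulSoup(markup, 'html.parser')

# Collect the generator into a list so it can be inspected
result = list(soup.div.stripped_strings)
print(result)
```

If the spaces inside a string also need collapsing, that has to be done separately, for example with " ".join(text.split()) on each value.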