Beautiful Soup Concise Tutorial

Beautiful Soup - strings Property

Property Description

For any PageElement that has more than one child, the text of each can be fetched with the strings property. Unlike the string property, which returns None when a tag has more than one child, strings handles that case. It returns a generator that yields, in document order, a NavigableString for every string found beneath the element, including whitespace-only strings between tags.
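The difference between string and strings can be seen in a minimal sketch. The markup and the variable names below are illustrative and are not taken from the examples in this tutorial.

```python
from bs4 import BeautifulSoup

# Illustrative markup: a <div> with two <p> children.
soup = BeautifulSoup("<div><p>Java</p><p>Python</p></div>", "html.parser")

single = soup.p      # <p>Java</p> has exactly one child, a string
multi = soup.div     # <div> has two <p> children

single_string = single.string        # the single child string
multi_string = multi.string          # None, because there is more than one child
multi_strings = list(multi.strings)  # every string beneath <div>

print(single_string)   # Java
print(multi_string)    # None
print(multi_strings)   # ['Java', 'Python']
```

Because strings is lazy, wrapping it in list() is a convenient way to inspect everything at once.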

Syntax

Tag.strings

Example 1

You can read the strings property on the soup object as well as on any Tag object. The following example reads the strings property of the soup object.

from bs4 import BeautifulSoup

markup = '''
   <div id="Languages">
      <p>Java</p> <p>Python</p> <p>C++</p>
   </div>
'''
soup = BeautifulSoup(markup, 'html.parser')
# strings is a generator, so collect it into a list before printing
print(list(soup.strings))

Output

['\n', '\n', 'Java', ' ', 'Python', ' ', 'C++', '\n', '\n']

Note the line breaks and spaces in the list. They can be removed by using the stripped_strings property instead.
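As a short sketch of that alternative, the same markup can be read through stripped_strings, which skips whitespace-only strings and strips the remaining ones. The variable name cleaned is only illustrative.

```python
from bs4 import BeautifulSoup

markup = '''
   <div id="Languages">
      <p>Java</p> <p>Python</p> <p>C++</p>
   </div>
'''
soup = BeautifulSoup(markup, 'html.parser')

# stripped_strings is also a generator; collect it into a list to inspect it
cleaned = list(soup.stripped_strings)
print(cleaned)   # ['Java', 'Python', 'C++']
```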

Example 2

We now obtain the generator object returned by the strings property of the <div> tag and print each string in a loop.

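# soup is the BeautifulSoup object created in Example 1
tag = soup.div

# strings returns a generator of NavigableString objects
navstrs = tag.strings
for navstr in navstrs:
   print(navstr)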

Output

Java

Python

C++

Note that the line breaks and whitespace appear in the output as blank lines. They can be removed by using the stripped_strings property instead.
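One consequence of strings returning a generator is that the object can be iterated only once. The sketch below uses shortened, illustrative markup. It prints each string with repr() so that whitespace-only strings are visible, then shows that a second pass over the same generator yields nothing.

```python
from bs4 import BeautifulSoup

# Illustrative markup, shorter than the one used in the examples above.
soup = BeautifulSoup("<div><p>Java</p> <p>Python</p></div>", "html.parser")
navstrs = soup.div.strings

# First pass: repr() makes the single-space string between the tags visible.
first_pass = [repr(s) for s in navstrs]
print(first_pass)

# Second pass over the same generator: it is already exhausted.
second_pass = list(navstrs)
print(second_pass)   # []

# Reading the property again gives a fresh generator.
fresh = list(soup.div.strings)
print(len(fresh))    # 3
```

If the strings are needed more than once, storing them in a list on the first pass avoids reading the property again.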