Beautiful Soup: A Concise Tutorial

Beautiful Soup - NavigableString Class

One of the main objects in the Beautiful Soup API is the NavigableString. It represents the string or text between the opening and closing counterparts of most HTML tags. For example, if <b>Hello</b> is the markup to be parsed, Hello is the NavigableString.

The NavigableString class is a subclass of both the PageElement class in the bs4 package and Python's built-in str class. Hence, it inherits PageElement methods such as the find_*() family (find_next(), find_parent() and so on), insert_before(), insert_after(), wrap() and replace_with(), as well as str methods such as upper, lower, find and isalpha.
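The dual inheritance can be checked directly. The following is a minimal sketch, assuming the bs4 package is installed; the variable name text is chosen here for illustration only.

```python
from bs4 import BeautifulSoup
from bs4.element import PageElement

soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
text = soup.b.string                  # the NavigableString inside <b>

print(isinstance(text, str))          # True: behaves like a regular string
print(isinstance(text, PageElement))  # True: also a node of the parse tree
print(text.upper())                   # HELLO (a str method)
print(text.parent.name)               # b (tree navigation from PageElement)
```

Note that str methods such as upper() return an ordinary str, not a new NavigableString, so the result is no longer attached to the tree.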

The constructor of this class takes a single argument, a str object.

Example

from bs4 import NavigableString
new_str = NavigableString('world')

You can now use this NavigableString object in all kinds of operations on the parsed tree, such as append, insert and find.

In the following example, we append the newly created NavigableString object to an existing Tag object.

Example

from bs4 import BeautifulSoup, NavigableString

markup = '<b>Hello</b>'
soup = BeautifulSoup(markup, 'html.parser')

tag = soup.b
new_str = NavigableString('world')
tag.append(new_str)
print(soup)

Output

<b>Helloworld</b>
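Although the printed markup shows one run of text, the appended string is stored as a separate child of the tag. The sketch below, which assumes the same markup as the example above, inspects the tag after the append; the variable name children is illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
tag = soup.b
tag.append(NavigableString('world'))

children = list(tag.contents)   # two separate strings: 'Hello' and 'world'
print(len(children))            # 2
print(tag.string)               # None, because <b> no longer has a single child
print(tag.get_text())           # Helloworld
```

This is why get_text() is the safer way to read the combined text of a tag after it has been modified.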

Note that a NavigableString is a PageElement, so it can also be appended to the BeautifulSoup object itself. Compare the result when we do so with a freshly parsed document.

Example

from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
new_str = NavigableString('world')
soup.append(new_str)
print(soup)

Output

<b>Hello</b>world

As we can see, the string appears after the <b> tag.

Beautiful Soup offers a new_string() method, which creates a new NavigableString associated with this BeautifulSoup object.

Let us use the new_string() method to create a NavigableString object, and append it first to a tag and then to the soup object.

Example

from bs4 import BeautifulSoup, NavigableString

markup = '<b>Hello</b>'
soup = BeautifulSoup(markup, 'html.parser')

tag = soup.b

ns = soup.new_string(' World')
tag.append(ns)
print(tag)
soup.append(ns)
print(soup)

Output

<b>Hello World</b>
<b>Hello</b> World

We find an interesting behaviour here. The same NavigableString object is appended first to a tag inside the tree and then to the soup object itself. The first print shows the string inside the tag. In the second, the text World follows the <b> tag and no longer appears inside it. This is because an element can occupy only one position in the tree: appending an object that already has a parent moves it to the new location instead of copying it. A string created with the NavigableString constructor would behave the same way.
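The behaviour comes from the string being moved rather than copied, which can be confirmed by checking its parent after each append. This is a minimal sketch under the same markup as above; the names in_tag, in_soup, left_in_tag and result are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b>Hello</b>', 'html.parser')
tag = soup.b
ns = soup.new_string(' World')

tag.append(ns)
in_tag = ns.parent is tag          # True: the string now lives inside <b>

soup.append(ns)                    # same object, so it is moved, not copied
in_soup = ns.parent is soup        # True: its parent is now the soup object
left_in_tag = list(tag.contents)   # only 'Hello' remains in <b>

# Appending a second, distinct string keeps the text in both places.
tag.append(soup.new_string(' World'))
result = str(soup)
print(result)                      # <b>Hello World</b> World
```

If the text is needed in two places, create a separate string object for each location, as the last step does.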