Beautiful Soup: A Concise Tutorial

Beautiful Soup - Pretty Printing

To display the entire parsed tree of an HTML document, or the contents of a specific tag, pass the object to the print() function or convert it with the str() function.

Example

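from bs4 import BeautifulSoup

# The lxml parser wraps the fragment in <html> and <body> tags
soup = BeautifulSoup("<h1>Hello World</h1>", "lxml")
print("Tree:", soup)              # whole parsed tree
print("h1 tag:", str(soup.h1))    # a single tag as a string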

Output

Tree: <html><body><h1>Hello World</h1></body></html>
h1 tag: <h1>Hello World</h1>

In Python 3, the str() function returns an ordinary Unicode string with no extra formatting. To get a bytestring encoded in UTF-8 instead, call the encode() method.
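The difference between the text form and the encoded form of a tag can be checked directly. The following is a minimal sketch, assuming Python 3 and the built-in "html.parser" parser (used here instead of lxml only so that no extra package is needed); the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h1>Hello World</h1>", "html.parser")

as_text = str(soup.h1)        # a Python str (Unicode text)
as_bytes = soup.h1.encode()   # a bytes object; encode() defaults to UTF-8

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__, as_bytes)
```

Use str() when the markup stays inside the program, and encode() when it has to be written to a binary file or sent over the network.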

To get a nicely formatted Unicode string, use Beautiful Soup's prettify() method. It formats the parse tree so that each tag sits on its own line, indented according to its depth. This makes the structure of the parse tree easy to see.

Consider the following HTML string.

<p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>

Using the prettify() method, we can better understand its structure:

from bs4 import BeautifulSoup

html = '''
   <p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>
'''

soup = BeautifulSoup(html, "lxml")
# One tag or text node per line, indented by nesting depth
print(soup.prettify())

Output

<html>
 <body>
  <p>
   The quick,
   <b>
    brown fox
   </b>
   jumps over a lazy dog.
  </p>
 </body>
</html>

You can call prettify() on any of the Tag objects in the document.

print(soup.b.prettify())

Output

<b>
 brown fox
</b>

The prettify() method is meant for inspecting the structure of a document. It should not be used to reformat one, because it adds whitespace (in the form of newlines), and whitespace inside text can change the meaning of an HTML document.
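The effect of the added whitespace can be seen by comparing the text of the original tree with the text of a tree parsed from the prettified output. This is a minimal sketch, assuming the built-in "html.parser" parser; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = "<p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>"
soup = BeautifulSoup(html, "html.parser")

# Text content of the document as originally parsed
original_text = soup.get_text()

# Round trip: prettify, parse the result again, and extract the text
reparsed = BeautifulSoup(soup.prettify(), "html.parser")
pretty_text = reparsed.get_text()

print(repr(original_text))
print(repr(pretty_text))   # contains newlines and indentation not in the source
```

The two strings differ, which is why prettified markup is best treated as a debugging view rather than as a replacement for the original document.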

The prettify() method optionally accepts a formatter argument that specifies the formatting to be used.
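As an illustration, the sketch below compares two of the formatter names that Beautiful Soup documents, "minimal" (the default) and "html". It assumes the built-in "html.parser" parser, and the sample markup is made up for the example.

```python
from bs4 import BeautifulSoup

# \u00e9 is the character "é"
soup = BeautifulSoup("<p>caf\u00e9 &amp; bar</p>", "html.parser")

# "minimal" escapes only what is needed for valid markup (e.g. & becomes &amp;)
minimal = soup.prettify(formatter="minimal")

# "html" also converts Unicode characters to named HTML entities where possible
as_entities = soup.prettify(formatter="html")

print(minimal)
print(as_entities)
```

The "html" formatter can be useful when the output must stay within ASCII, while "minimal" keeps the text readable as written.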