Beautiful Soup Concise Tutorial
Beautiful Soup - prettify() Method
Method Description
To get a nicely formatted Unicode string, use Beautiful Soup's prettify() method. It formats the parse tree so that each tag and each string is on its own line, indented according to its depth. This makes it easy to visualize the structure of the parse tree.
Parameters
encoding − The eventual encoding of the string. If this is None, a Unicode string will be returned.
formatter − A Formatter object, or a string naming one of the standard formatters.
Return Type
The prettify() method returns a Unicode string if encoding is None, and a bytestring otherwise.
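The two return types can be checked with a short sketch. The sample markup and the choice of the built-in "html.parser" are assumptions made for illustration, so that the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Illustrative input containing a non-ASCII character
soup = BeautifulSoup("<p>caf\u00e9</p>", "html.parser")

text = soup.prettify()                  # encoding=None (the default)
data = soup.prettify(encoding="utf-8")  # an explicit encoding

print(type(text).__name__)  # str
print(type(data).__name__)  # bytes
```

Passing an encoding is mainly useful when the result is written straight to a binary file or socket.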
Example 1
Consider the following HTML string.
<p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>
Using the prettify() method, we can better understand its structure −
from bs4 import BeautifulSoup

html = '''
<p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>
'''
# The "lxml" parser requires the lxml package to be installed
soup = BeautifulSoup(html, "lxml")
print(soup.prettify())
Example 2
You can call prettify() on any of the Tag objects in the document.
print(soup.b.prettify())
Output
<b>
 brown fox
</b>
The prettify() method is meant for understanding the structure of a document. It should not be used to reformat one: it adds whitespace (in the form of newlines and indentation), which can change the meaning of an HTML document.
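The effect of the added whitespace can be seen by comparing the plain serialization with the prettified one. This is a minimal sketch; the use of "html.parser" and the variable names are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup

html = "<p>The quick, <b>brown fox</b> jumps over a lazy dog.</p>"
soup = BeautifulSoup(html, "html.parser")

compact = str(soup)       # serialized without any added whitespace
pretty = soup.prettify()  # one tag or string per line, indented

# Re-parse the prettified markup and compare the text content
reparsed = BeautifulSoup(pretty, "html.parser")
print(repr(soup.get_text()))
print(repr(reparsed.get_text()))
```

The second repr contains newlines and indentation that were not in the original text, which is why str(soup) is the safer choice when the markup will be saved or processed further.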
The prettify() method optionally accepts a formatter argument that specifies how strings are processed on output.
The following values can be passed as formatter −
formatter="minimal" − This is the default. Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML.
formatter="html" − Beautiful Soup will convert Unicode characters to HTML entities whenever possible.
formatter="html5" − This is similar to formatter="html", but Beautiful Soup will omit the closing slash in HTML void tags like "br".
formatter=None − Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML.
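The differences between these values are easiest to see side by side. The sketch below uses an illustrative French sentence and a small "br" snippet as input, and assumes the built-in "html.parser"; the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Illustrative input with escaped angle brackets and an accented character
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, "html.parser")

minimal_out = soup.prettify(formatter="minimal")  # escapes only < > &
html_out = soup.prettify(formatter="html")        # also emits named entities
none_out = soup.prettify(formatter=None)          # strings left untouched

print(minimal_out)
print(html_out)
print(none_out)

# Void tags: "html" keeps the closing slash, "html5" drops it
br_soup = BeautifulSoup("<p>a<br>b</p>", "html.parser")
br_html = br_soup.prettify(formatter="html")
br_html5 = br_soup.prettify(formatter="html5")
print(br_html)
print(br_html5)
```

With formatter=None the unescaped "<<" and ">>" end up in the output, which is the invalid-markup risk mentioned above.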