Beautiful Soup Concise Tutorial

Beautiful Soup - encode() Method

Method Description

The encode() method in Beautiful Soup renders a bytestring representation of the given PageElement and its contents.

The prettify() method, which lets you easily visualize the structure of the Beautiful Soup parse tree, accepts an encoding argument. The encode() method plays the same role as that argument: it produces the output as a bytestring in the requested encoding.
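As a minimal sketch of that relationship, the snippet below encodes the same small document both ways. The markup and the variable names (compact, pretty) are made up for illustration; both calls are expected to return bytes, with prettify() additionally adding line breaks and indentation.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the tutorial.
soup = BeautifulSoup("<p>Hello <b>World</b></p>", "html.parser")

# encode() returns the markup as bytes without reformatting it.
compact = soup.encode("utf-8")

# prettify() returns bytes as well when an encoding is passed,
# but spreads the tree over several indented lines.
pretty = soup.prettify(encoding="utf-8")

print(compact)
print(pretty.decode("utf-8"))
```

Without the encoding argument, prettify() returns a str instead of bytes.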

Syntax

encode(encoding, indent_level, formatter, errors)

Parameters

  1. encoding − The destination encoding.

  2. indent_level − Each line of the rendering will be indented this many levels. Used internally in recursive calls while pretty-printing.

  3. formatter − A Formatter object, or a string naming one of the standard formatters.

  4. errors − An error handling strategy.
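The errors parameter matters when the destination encoding cannot represent a character. The sketch below assumes that errors is passed through to Python's own str.encode(), so the usual strategy names ("xmlcharrefreplace", "replace", "ignore", "strict") apply; the markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Sacré bleu!</p>", "html.parser")

# ASCII cannot represent "é", so the error strategy decides what happens.
as_refs = soup.p.encode("ascii", errors="xmlcharrefreplace")
as_replaced = soup.p.encode("ascii", errors="replace")
as_ignored = soup.p.encode("ascii", errors="ignore")

print(as_refs)      # numeric character reference
print(as_replaced)  # question mark placeholder
print(as_ignored)   # character dropped

# "strict" raises instead of substituting anything.
try:
    soup.p.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)
```

"xmlcharrefreplace" is usually the safest choice for HTML output, because a browser can still render the original character from the numeric reference.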

Return Value

The encode() method returns a bytestring representation of the tag and its contents.

Example 1

The encoding parameter is utf-8 by default. The following code shows the encoded bytestring representation of the soup object.

from bs4 import BeautifulSoup

# The curly quotes are non-ASCII, so they become multi-byte UTF-8 sequences.
soup = BeautifulSoup("Hello “World!”", 'html.parser')
print(soup.encode('utf-8'))

Output

b'Hello \xe2\x80\x9cWorld!\xe2\x80\x9d'

Example 2

The formatter parameter accepts the following predefined values −

formatter="minimal" − This is the default. Strings will only be processed enough to ensure that Beautiful Soup generates valid HTML/XML.

formatter="html" − Beautiful Soup will convert Unicode characters to HTML entities whenever possible.

formatter="html5" − It is similar to formatter="html", but Beautiful Soup will omit the closing slash in HTML void tags like "br".

formatter=None − Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML.
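The difference between "html" and "html5" is easiest to see on a void tag. The following short sketch encodes a single br tag with both formatters; the variable names are chosen for illustration.

```python
from bs4 import BeautifulSoup

br = BeautifulSoup("<br>", "html.parser").br

# "html" keeps the closing slash, "html5" omits it for void tags.
html_bytes = br.encode(formatter="html")
html5_bytes = br.encode(formatter="html5")

print(html_bytes)   # b'<br/>'
print(html5_bytes)  # b'<br>'
```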

In the following example, different formatter values are passed as arguments to the encode() method.

from bs4 import BeautifulSoup

# The markup writes the angle brackets and the accented "e" as HTML entities.
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'html.parser')
print("minimal: ")
print(soup.p.encode(formatter="minimal"))
print("html: ")
print(soup.p.encode(formatter="html"))
print("None: ")
print(soup.p.encode(formatter=None))

Output

minimal:
b'<p>Il a dit &lt;&lt;Sacr\xc3\xa9 bleu!&gt;&gt;</p>'
html:
b'<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>'
None:
b'<p>Il a dit <<Sacr\xc3\xa9 bleu!>></p>'
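Besides the predefined names, the formatter parameter accepts a Formatter object. The sketch below builds one from the HTMLFormatter class in bs4.formatter, available in recent Beautiful Soup 4 releases, with a substitution function that upper-cases every string. The uppercase helper is a hypothetical function written for this illustration, not part of the library.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter

def uppercase(text):
    # Hypothetical substitution function: called on each string
    # (and attribute value) as it is written to the output.
    return text.upper()

soup = BeautifulSoup("<p>Sacré bleu!</p>", "html.parser")
shouting = HTMLFormatter(entity_substitution=uppercase)

custom_bytes = soup.p.encode(formatter=shouting)
print(custom_bytes)
```

Because the custom function replaces the default substitution step, it is also responsible for any escaping of characters such as "&" or "<" that the output requires.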

Example 3

The following example uses Latin-1 as the encoding parameter.

from bs4 import BeautifulSoup

markup = '''
<html>
   <head>
      <meta content="text/html; charset=ISO-Latin-1" http-equiv="Content-type" />
   </head>
   <body>
      <p>Sacré bleu!</p>
   </body>
</html>
'''

# The lxml parser must be installed separately.
soup = BeautifulSoup(markup, 'lxml')
print(soup.p.encode("latin-1"))

Output

b'<p>Sacr\xe9 bleu!</p>'