Beautiful Soup Concise Tutorial

Beautiful Soup - decode() Method

Method Description

The decode() method in Beautiful Soup serializes the parse tree as an HTML or XML document and returns it as a Unicode string. It is the counterpart of encode(): call encode() to get a bytestring, and decode() to get a Unicode string (str in Python 3). It should not be confused with Python's built-in bytes.decode(), which converts an existing bytestring back to a string using a named codec. Let us study the decode() method with some examples.

Syntax

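decode(pretty_print, eventual_encoding, formatter)

Parameters

  1. pretty_print − If this is True, indentation will be used to make the document more readable. The default is False.

  2. eventual_encoding − The encoding the document is expected to be written in later. decode() returns a Unicode string regardless; this value only affects encoding declarations inside the output, such as the XML declaration or a meta charset attribute. The default is 'utf-8'.

  3. formatter − A Formatter object, or a string naming one of the standard formatters, such as 'minimal' (the default) or 'html'.

decode() does not take an errors argument. Error handling schemes such as 'strict', 'ignore' and 'replace' belong to Python's built-in bytes.decode(), and Beautiful Soup's encode() has its own errors argument. Argument names also vary somewhat between releases and between BeautifulSoup and Tag objects (Tag.decode() takes indent_level rather than pretty_print), so check the documentation for the installed version.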
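The effect of the formatter and encoding arguments can be sketched as follows. This is a minimal illustration, assuming the html.parser backend; the sample markup and variable names are made up, and eventual_encoding is the keyword name Beautiful Soup 4 uses for the encoding argument of decode().

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello “World!”</p>", "html.parser")

# The default formatter ("minimal") leaves the curly quotes as Unicode characters.
minimal = soup.decode()

# The "html" formatter replaces characters with named HTML entities where it can.
as_entities = soup.decode(formatter="html")

print(minimal)      # <p>Hello “World!”</p>
print(as_entities)  # <p>Hello &ldquo;World!&rdquo;</p>

# eventual_encoding does not change the return type (still str). It is used
# to rewrite encoding declarations found in the document, here a meta charset.
meta_soup = BeautifulSoup('<meta charset="ISO-8859-1">', "html.parser")
declared = meta_soup.decode(eventual_encoding="utf-8")
print(declared)
```

In both cases the result is a str; only the characters inside it differ.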

Return Value

The decode() method returns a Unicode string (str in Python 3).

Example

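from bs4 import BeautifulSoup

soup = BeautifulSoup("Hello “World!”", 'html.parser')

# encode() serializes the parse tree to a bytestring in the given encoding
enc = soup.encode('utf-8')
print(enc)

# decode() serializes the same parse tree to a Unicode string
dec = soup.decode()
print(dec)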

Output

b'Hello \xe2\x80\x9cWorld!\xe2\x80\x9d'
Hello “World!”
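The relationship between Beautiful Soup's decode() and Python's built-in bytes.decode() can be sketched like this. The snippet is illustrative only: the variable names are arbitrary, and the ASCII decoding at the end deliberately uses the wrong codec to show the error handling schemes of bytes.decode().

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("Hello “World!”", "html.parser")

text = soup.decode()               # Beautiful Soup: parse tree -> str
data = soup.encode("utf-8")        # Beautiful Soup: parse tree -> bytes
round_trip = data.decode("utf-8")  # Python built-in: bytes -> str

print(type(text).__name__, type(data).__name__)  # str bytes
print(round_trip == text)                        # True

# The errors argument belongs to bytes.decode(). Decoding UTF-8 bytes as
# ASCII fails on the curly quotes, and errors decides what happens next.
print(data.decode("ascii", errors="ignore"))   # Hello World!
print(data.decode("ascii", errors="replace"))  # each bad byte becomes U+FFFD
try:
    data.decode("ascii")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)
```

For this document, decoding the bytes from encode('utf-8') with the same codec gives the same string that decode() returns directly.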