Beautiful Soup: A Concise Tutorial

Beautiful Soup - parent Property

Description

The parent property in the Beautiful Soup library returns the immediate parent element of a PageElement. The value returned by the parent property is normally a Tag object. The parent of a top-level tag is the BeautifulSoup object, which represents the document as a whole and reports its name as [document]. The parent of the BeautifulSoup object itself is None.

Syntax

Element.parent

Return value

The parent property returns a Tag object. For a top-level tag such as <html>, it returns the BeautifulSoup object, which represents the whole document. For the BeautifulSoup object itself, it returns None.
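The return values can be checked directly. The following is a minimal sketch, assuming the built-in html.parser and a short illustrative HTML string that is not part of the examples in this tutorial. The variable names are chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative document (an assumption, not taken from the tutorial examples).
soup = BeautifulSoup("<html><body><p>Hi</p></body></html>", "html.parser")

# A nested tag's parent is the enclosing Tag.
p_parent_name = soup.p.parent.name

# A string's parent is the tag that contains it.
string_parent_name = soup.p.string.parent.name

# The top-level tag's parent is the BeautifulSoup object itself,
# which reports the special name "[document]".
top_parent_is_soup = soup.html.parent is soup
soup_name = soup.name

# The BeautifulSoup object has no parent of its own.
soup_parent = soup.parent

print(p_parent_name)       # body
print(string_parent_name)  # p
print(top_parent_is_soup)  # True
print(soup_name)           # [document]
print(soup_parent)         # None
```

Checking for None is therefore a simple way to tell that the top of the tree has been reached.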

Example 1

This example uses the .parent property to find the immediate parent element of the first <p> tag in the example HTML string.

html = """
<html>
   <head>
      <title>TutorialsPoint</title>
   </head>
   <body>
      <p>Hello World</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.p
print(tag.parent.name)

Output

body

Example 2

In the following example, the <title> tag is enclosed in a <head> tag. Hence, the parent property of the <title> tag returns the <head> tag.

html = """
<html>
   <head>
      <title>TutorialsPoint</title>
   </head>
   <body>
      <p>Hello World</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
tag = soup.title
print(tag.parent)

Output

<head>
      <title>TutorialsPoint</title>
   </head>

Example 3

Python's built-in HTML parser behaves a little differently from the html5lib and lxml parsers. The built-in parser does not try to build a complete document out of the string provided: it does not add parent tags such as <body> or <html> if they are absent from the string. The html5lib and lxml parsers, on the other hand, add these tags so that the result is a complete HTML document.

html = """
<p><b>Hello World</b></p>
"""
from bs4 import BeautifulSoup

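# Built-in parser: no wrapper tags are added
soup = BeautifulSoup(html, 'html.parser')
print(soup.p.parent.name)

# html5lib must be installed separately (pip install html5lib)
soup = BeautifulSoup(html, 'html5lib')
print(soup.p.parent.name)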

Output

[document]
body

As the built-in HTML parser does not add extra tags, the parent of the <p> tag is the BeautifulSoup object itself, whose name is [document]. With html5lib, the <p> tag is wrapped in an inserted <body> tag, so the name of its parent is body.
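To see the whole chain of parents rather than only the immediate one, .parent can be followed repeatedly until it returns None. The following is a minimal sketch using the built-in parser and the same HTML string as Example 3. The helper ancestor_names is a hypothetical name introduced here for illustration and is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def ancestor_names(element):
    """Hypothetical helper: collect the names of all ancestors of element
    by following .parent until it returns None."""
    names = []
    node = element.parent
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


html = """
<p><b>Hello World</b></p>
"""

soup = BeautifulSoup(html, 'html.parser')
names = ancestor_names(soup.b)
print(names)  # ['p', '[document]']
```

With html.parser the chain ends at the BeautifulSoup object right after <p>, because no <body> or <html> tags are inserted.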