Beautiful Soup Tutorial

Beautiful Soup - find_all_previous() Method

Method Description

The find_all_previous() method in Beautiful Soup looks backward in the document from the current PageElement and finds all the PageElements that match the given criteria and appear before it. It returns a ResultSet of those PageElements in reverse document order, so the nearest preceding element comes first. Like all other find methods, this method has the following syntax −

Syntax

find_all_previous(name, attrs, string, limit, **kwargs)

Parameters

  1. name − A filter on tag name.

  2. attrs − A dictionary of filters on attribute values.

  3. string − A filter for a NavigableString with specific text.

  4. limit − Stop looking after finding this many results.

  5. kwargs − Keyword arguments that act as filters on attribute values.
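To see how each parameter narrows the backward search, here is a minimal sketch. The HTML string is hypothetical, written only for this illustration; the search starts from the last <input> tag and looks at what comes before it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration.
html = """<html><head><title>Demo</title></head>
<body>
<h1>Registration</h1>
<form>
<input type="text" id="nm" name="name">
<input type="text" id="age" name="age">
<input type="text" id="marks" name="marks">
</form>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
start = soup.find("input", {"name": "marks"})  # search backward from here

# name: keep only <input> tags that come before the starting tag
inputs = start.find_all_previous("input")

# attrs: a dictionary of attribute filters
by_attrs = start.find_all_previous("input", {"id": "nm"})

# **kwargs: the same attribute filter written as a keyword argument
by_kwargs = start.find_all_previous(id="nm")

# limit: stop after the first match (the nearest preceding <input>)
first_only = start.find_all_previous("input", limit=1)

# string: match NavigableString objects by their exact text
texts = start.find_all_previous(string="Registration")

print([t["id"] for t in inputs])      # ['age', 'nm']
print([t["id"] for t in by_attrs])    # ['nm']
print([t["id"] for t in by_kwargs])   # ['nm']
print([t["id"] for t in first_only])  # ['age']
print([str(s) for s in texts])        # ['Registration']
```

Because the search runs backward, the "age" input is found before the "nm" input even though "nm" appears first in the markup.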

Return Value

The find_all_previous() method returns a ResultSet of Tag or NavigableString objects. If the limit parameter is 1, the method finds the same element as the find_previous() method; the difference is that find_previous() returns that element itself (or None if nothing matches), while find_all_previous() returns it inside a ResultSet.
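The relationship between limit=1 and find_previous() can be checked directly. The short markup below is hypothetical; the sketch compares both calls when a match exists and when it does not.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two paragraphs followed by a <b> tag.
soup = BeautifulSoup("<p>one</p><p>two</p><b>start</b>", "html.parser")
start = soup.b

as_list = start.find_all_previous("p", limit=1)  # ResultSet holding at most one Tag
single = start.find_previous("p")                # the Tag itself, or None

print(type(as_list).__name__, len(as_list))      # ResultSet 1
print(as_list[0] is single)                      # True
print(single.string)                             # two

# With no match, one returns an empty ResultSet and the other returns None.
no_list = start.find_all_previous("table", limit=1)
no_single = start.find_previous("table")
print(len(no_list), no_single)                   # 0 None
```

Both calls land on the nearest preceding <p> (the one containing "two"), which is why the identity check prints True.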

Example 1

In this example, the name property of each tag that appears before the first input tag is displayed.

from bs4 import BeautifulSoup

# parse the file; the with block closes it afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

tag = soup.find('input')          # first <input> in the document
for t in tag.find_all_previous():
    print(t.name)

Output

form
h1
body
title
head
html
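Since the contents of index.html are not shown, the sketch below assumes a hypothetical stand-in whose structure is reconstructed from the outputs above (a title, an h1 heading and a form with three inputs). It also illustrates how the search works: called with no arguments, find_all_previous() walks the .previous_elements generator in reverse document order and keeps only the Tag objects.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for index.html, inferred from the example outputs.
html = """<html>
<head><title>TutorialsPoint</title></head>
<body>
<h1>TutorialsPoint</h1>
<form>
<input type="text" id="nm" name="name">
<input type="text" id="age" name="age">
<input type="text" id="marks" name="marks">
</form>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input")

names = [t.name for t in tag.find_all_previous()]
print(names)  # ['form', 'h1', 'body', 'title', 'head', 'html']

# .previous_elements also yields NavigableStrings (text and whitespace);
# filtering them out gives the same list as the unfiltered search above.
walked = [e.name for e in tag.previous_elements if isinstance(e, Tag)]
print(walked == names)  # True
```

Note that the result mixes the tag's ancestors (form, body, html) with elements that appear earlier in the document (h1, title, head); both kinds start before the input tag.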

Example 2

In the HTML document under consideration (index.html), there are three input elements. With the following code, we print the tag names of all tags that precede the <input> tag whose name attribute is marks. To differentiate between the two input tags before it, we also print the attrs property. Note that the other tags don’t have any attributes.

from bs4 import BeautifulSoup

with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# locate the <input> whose name attribute is "marks"
tag = soup.find('input', {'name': 'marks'})
pretags = tag.find_all_previous()
for pretag in pretags:
    print(pretag.name, pretag.attrs)

Output

input {'type': 'text', 'id': 'age', 'name': 'age'}
input {'type': 'text', 'id': 'nm', 'name': 'name'}
form {}
h1 {}
body {}
title {}
head {}
html {}
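Because only the input tags carry attributes here, the same two inputs can be picked out with a filter instead of by reading the printed attrs. Beautiful Soup accepts a function as the name filter; the function receives each candidate Tag and returns True to keep it. The markup below is a hypothetical stand-in for index.html, mirroring the form described above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the form described above.
html = """<html><head><title>TutorialsPoint</title></head>
<body>
<h1>TutorialsPoint</h1>
<form>
<input type="text" id="nm" name="name">
<input type="text" id="age" name="age">
<input type="text" id="marks" name="marks">
</form>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input", {"name": "marks"})

# Keep only preceding tags that have at least one attribute.
with_attrs = tag.find_all_previous(lambda t: bool(t.attrs))
for t in with_attrs:
    print(t.name, t.attrs)
```

This prints the two input lines from the output above and skips the attribute-less form, h1, body, title, head and html tags.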

Example 3

The BeautifulSoup object stores the entire document’s tree. Because it is the root of that tree, it doesn’t have any previous element, so find_all_previous() called on it returns an empty list, as the example below shows −

from bs4 import BeautifulSoup

with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# the root BeautifulSoup object has nothing before it
tags = soup.find_all_previous()
print(tags)

Output

[]
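For contrast, the sketch below uses a tiny hypothetical document to compare searching from the root with searching from the innermost tag. The root's previous_element is None, while the innermost <p> tag reaches every tag that opens before it.

```python
from bs4 import BeautifulSoup

# Hypothetical one-paragraph document.
soup = BeautifulSoup("<html><body><p>text</p></body></html>", "html.parser")

# The BeautifulSoup object is the root of the tree, so nothing precedes it.
print(soup.previous_element)          # None
print(len(soup.find_all_previous()))  # 0

# Searching backward from the innermost tag reaches every earlier tag.
earlier = [t.name for t in soup.p.find_all_previous()]
print(earlier)                        # ['body', 'html']
```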