Beautiful Soup Concise Tutorial
Beautiful Soup - find_previous_siblings() Method
Method Description
The find_previous_siblings() method in the Beautiful Soup package returns all siblings of this PageElement that appear earlier in the document and match the given criteria.
Parameters
- name − A filter on tag name.
- attrs − A dictionary of filters on attribute values.
- string − A filter for a NavigableString with specific text.
- limit − Stop looking after finding this many results.
- kwargs − A dictionary of filters on attribute values.
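The parameters above can be combined in a single call. The following is a minimal sketch; the HTML string, the class name lang and the variable names are made up for illustration and do not come from the examples on this page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: four sibling tags inside one <p>
html = '<p><b>Excellent</b><i class="lang">Python</i><b>Web</b><u>Tutorial</u></p>'
soup = BeautifulSoup(html, "html.parser")
start = soup.find("u")

# name: keep only the preceding siblings that are <b> tags
bold = start.find_previous_siblings("b")

# name + limit: stop after the first match, i.e. the closest preceding <b>
nearest_bold = start.find_previous_siblings("b", limit=1)

# attrs: filter on attribute values
by_class = start.find_previous_siblings(attrs={"class": "lang"})

# name + string: keep <b> tags whose text is exactly "Web"
by_text = start.find_previous_siblings("b", string="Web")

print(bold)
print(nearest_bold)
print(by_class)
print(by_text)
```

Each call starts at the <u> tag and walks backwards through its siblings, so any filter only ever sees elements that come before <u> under the same parent.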
Return Value
The find_previous_siblings() method returns a ResultSet of PageElements.
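As a small illustration of the return value, the sketch below checks the type of the result, the order in which siblings come back, and what happens when there is nothing to return. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet

soup = BeautifulSoup("<p><b>Excellent</b><i>Python</i><u>Tutorial</u></p>", "html.parser")

# Siblings are returned walking backwards from the starting tag,
# so the closest one comes first.
siblings = soup.find("u").find_previous_siblings()
print(isinstance(siblings, ResultSet))
print([t.name for t in siblings])

# The first child has no earlier siblings, so the result is empty.
none_before = soup.find("b").find_previous_siblings()
print(len(none_before))
```

Because a ResultSet behaves like a list, it can be indexed, iterated over and tested for emptiness in the usual way.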
Example 1
Let us use the following HTML snippet for this purpose −
<p>
<b>
Excellent
</b>
<i>
Python
</i>
<u>
Tutorial
</u>
</p>
In the code below, we find all the siblings that come before the <u> tag. There are two more tags at the same level in the HTML string used for scraping.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>Excellent</b><i>Python</i><u>Tutorial</u></p>", 'html.parser')

# Start from the <u> tag and collect every sibling that precedes it
tag1 = soup.find('u')
print("previous siblings:")
for tag in tag1.find_previous_siblings():
    print(tag)
Example 2
The web page (index.html) has an HTML form with three input elements. We locate the one whose id attribute is marks and then find its previous siblings.
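from bs4 import BeautifulSoup

# index.html is expected to contain the form described above
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

tag = soup.find('input', {'id': 'marks'})
# find_previous_siblings() (plural) returns every matching earlier sibling
sibs = tag.find_previous_siblings()
print(sibs)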
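The contents of index.html are not shown here, so the sketch below uses an inline form as a stand-in to make the example runnable on its own. The ids name and age, and the form layout, are assumptions made for illustration; only the id marks comes from the description above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for index.html: a form with three input elements
html = """
<form>
  <input type="text" id="name" name="name">
  <input type="text" id="age" name="age">
  <input type="text" id="marks" name="marks">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

tag = soup.find("input", {"id": "marks"})

# Called without filters, the method returns only tags, so the
# whitespace between the inputs does not appear in the result.
ids = [sib["id"] for sib in tag.find_previous_siblings()]
print(ids)
```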
Example 3
The HTML string has two <p> tags nested inside an outer <p> tag, after a <b> tag. We find the siblings that precede the one whose id attribute is id1.
from bs4 import BeautifulSoup

html = '''
<p><b>Excellent</b><p>Python</p><p id='id1'>Tutorial</p></p>
'''
soup = BeautifulSoup(html, 'html.parser')

# Locate the <p> with id="id1", then walk back through its siblings
tag = soup.find('p', id='id1')
ptags = tag.find_previous_siblings()
for ptag in ptags:
    print("Tag: {}, Text: {}".format(ptag.name, ptag.text))
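To restrict the result to a particular kind of tag, a tag name can be passed as the first argument. The sketch below reuses the same HTML string and compares the unfiltered call with one limited to <p> tags. It assumes the html.parser backend, which keeps the inner <p> tags nested inside the outer one; other parsers may build a different tree.

```python
from bs4 import BeautifulSoup

html = '''
<p><b>Excellent</b><p>Python</p><p id='id1'>Tutorial</p></p>
'''
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("p", id="id1")

# No filter: every tag that precedes the starting <p>
all_before = tag.find_previous_siblings()

# Name filter: only the preceding <p> tags
p_before = tag.find_previous_siblings("p")

print([t.name for t in all_before])
print([t.text for t in p_before])
```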