Beautiful Soup Concise Tutorial

Beautiful Soup - find_all() Method

Method Description

The find_all() method in Beautiful Soup searches the descendants of this PageElement for elements that match the given criteria and returns all the matches as a list.

Syntax

soup.find_all(name, attrs, recursive, string, limit, **kwargs)

Parameters

name − A filter on tag name.

attrs − A dictionary of filters on attribute values.

recursive − If this is True (the default), find_all() searches all descendants. Otherwise, only the direct children are considered.

string − A filter on text content; it matches strings instead of tags.

limit − Stop searching after the specified number of matches has been found.

kwargs − Additional keyword arguments, each treated as a filter on the attribute of the same name.
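
The parameters above can be sketched as follows. The markup here is hypothetical and exists only to show how attrs, keyword arguments and recursive change the result; it is not part of the tutorial's own examples.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to illustrate the parameters
html = """
<div id="main">
  <p class="intro">First</p>
  <section>
    <p class="note">Second</p>
  </section>
  <p class="note">Third</p>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')
main = soup.find('div')

# name only: every <p> descendant, at any depth
all_p = main.find_all('p')

# attrs: a dictionary of attribute filters
notes = main.find_all('p', attrs={'class': 'note'})

# kwargs: the same kind of filter written as a keyword argument
# (class is a reserved word in Python, so bs4 spells it class_)
notes_kw = main.find_all('p', class_='note')

# recursive=False: only the direct children of <div id="main">
direct = main.find_all('p', recursive=False)

print([p.get_text() for p in all_p])
print([p.get_text() for p in notes])
print([p.get_text() for p in notes_kw])
print([p.get_text() for p in direct])
```

With recursive=False the <p> nested inside <section> is skipped, because it is a grandchild of the <div> rather than a direct child.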

Return type

The find_all() method returns a ResultSet object, which is a subclass of Python's list.
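
A minimal sketch of what that means in practice: the returned object supports len(), indexing, slicing and iteration like any list. The one-line markup is a hypothetical stand-in for a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
soup = BeautifulSoup('<p>one</p><p>two</p><p>three</p>', 'html.parser')

result = soup.find_all('p')

# The usual list operations work on the returned object
print(type(result).__name__)
print(isinstance(result, list))
print(len(result))
print(result[0].get_text())
print([p.get_text() for p in result[1:]])

# When nothing matches, the result is empty rather than None
missing = soup.find_all('table')
print(len(missing))
```

Because an empty result is still a list, a plain `for` loop over it is safe without a None check.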

Example 1

When we pass a value for name, Beautiful Soup considers only tags with that name. Text strings are ignored, as are tags whose names don't match. In this example, we pass input to the find_all() method.

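from bs4 import BeautifulSoup

# Parse index.html; the with block closes the file afterwards
with open('index.html') as html:
    soup = BeautifulSoup(html, 'html.parser')

# Collect every <input> tag in the document
obj = soup.find_all('input')
print(obj)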

Output

[<input id="nm" name="name" type="text"/>, <input id="age" name="age" type="text"/>, <input id="marks" name="marks" type="text"/>]
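
The contents of index.html are not shown here. As a self-contained sketch, the snippet below parses a hypothetical form from a string instead of a file; the form markup is an assumption, written only so that it contains the three input tags listed in the output.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for index.html (the real file is not shown)
html = """
<form>
  <label>Name</label> <input type="text" id="nm" name="name">
  <label>Age</label> <input type="text" id="age" name="age">
  <label>Marks</label> <input type="text" id="marks" name="marks">
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

# Only <input> tags are returned; <form>, <label> and text are ignored
obj = soup.find_all('input')
print([tag['id'] for tag in obj])
print([tag['name'] for tag in obj])
```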

Example 2

We shall use the following HTML document in this example −

<html>
   <body>
      <h2>Departmentwise Employees</h2>
      <ul id="dept">
      <li>Accounts</li>
         <ul id='acc'>
         <li>Anand</li>
         <li>Mahesh</li>
         </ul>
      <li>HR</li>
         <ol id="HR">
         <li>Rani</li>
         <li>Ankita</li>
         </ol>
      </ul>
   </body>
</html>

We can pass a filter to the string argument of the find_all() method. With string, you search for strings instead of tags. Like name, it accepts a string, a regular expression, a list, a function, or the value True.

In this example, a function is passed to the string argument. The find_all() method returns all the strings starting with 'A'.

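from bs4 import BeautifulSoup

# Filter function: receives each string and keeps those starting with 'A'
def startingwith(ch):
    return ch.startswith('A')

# html is assumed to hold the HTML document shown above, as a string
soup = BeautifulSoup(html, 'html.parser')

# Passing the function as string= matches text nodes rather than tags
lst = soup.find_all(string=startingwith)

print(lst)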

Output

['Accounts', 'Anand', 'Ankita']
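
The other filter types (regular expression, list, True and function) can be sketched in the same way. The snippet below is an illustration rather than part of the original example: it embeds a condensed copy of the Example 2 document as a string, and has_id is a hypothetical helper name.

```python
import re
from bs4 import BeautifulSoup

# Condensed copy of the Example 2 document
html = """
<html><body>
<h2>Departmentwise Employees</h2>
<ul id="dept">
<li>Accounts</li>
<ul id="acc"><li>Anand</li><li>Mahesh</li></ul>
<li>HR</li>
<ol id="HR"><li>Rani</li><li>Ankita</li></ol>
</ul>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# Regular expression: tag names matching ^[uo]l$ (ul and ol)
lists = soup.find_all(re.compile(r'^[uo]l$'))
print([tag.get('id') for tag in lists])

# List: tags named either h2 or ol
picked = soup.find_all(['h2', 'ol'])
print([tag.name for tag in picked])

# True: every tag in the document, but no text strings
every = soup.find_all(True)
print(len(every))

# Function: called with each tag; a tag is kept when it returns True
def has_id(tag):
    return tag.has_attr('id')

with_id = soup.find_all(has_id)
print([tag.name for tag in with_id])

# The string argument accepts the same filter types, here a regex
a_strings = soup.find_all(string=re.compile('^A'))
print(a_strings)
```

A function passed as name receives a tag, while a function passed as string receives the text itself, which is why startingwith() above can call startswith() directly on its argument.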

Example 3

In this example, we pass the limit=2 argument to the find_all() method. The method returns the first two occurrences of the <li> tag.

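from bs4 import BeautifulSoup

# html is assumed to hold the HTML document from Example 2, as a string
soup = BeautifulSoup(html, 'html.parser')
# limit=2 stops the search after the first two <li> tags
lst = soup.find_all('li', limit=2)

print(lst)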

Output

[<li>Accounts</li>, <li>Anand</li>]
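
As a further sketch of how limit behaves, the snippet below compares limit=1 with find() and shows a limit larger than the number of matches. The short list markup is hypothetical and stands in for a real document.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html = '<ul><li>Accounts</li><li>HR</li><li>Sales</li></ul>'
soup = BeautifulSoup(html, 'html.parser')

# limit=2: the search stops after two matches
first_two = soup.find_all('li', limit=2)
print([li.get_text() for li in first_two])

# limit=1 still returns a list, holding a single tag
only_one = soup.find_all('li', limit=1)
print(len(only_one))

# find() returns that first tag itself rather than a list
single = soup.find('li')
print(only_one[0] == single)

# A limit above the number of matches simply returns them all
up_to_ten = soup.find_all('li', limit=10)
print(len(up_to_ten))
```

When only the first match is needed, find() avoids the extra indexing step and returns None instead of an empty list if nothing matches.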