Beautiful Soup Concise Tutorial

Beautiful Soup - Find Elements by ID

In an HTML document, an element can be assigned an id attribute whose value is expected to be unique within the document. This lets front-end code, such as a JavaScript function, locate the element and read its value.

With BeautifulSoup, you can find an element by its id. Several methods can do this: find(), find_all(), select(), and select_one().

Using find() method

The find() method of a BeautifulSoup object returns the first element that satisfies the criteria passed as arguments.

Let us use the following HTML document (saved as index.html) for this purpose:

<html>
   <head>
      <title>TutorialsPoint</title>
   </head>
   <body>
      <form>
         <input type = 'text' id = 'nm' name = 'name'>
         <input type = 'text' id = 'age' name = 'age'>
         <input type = 'text' id = 'marks' name = 'marks'>
      </form>
   </body>
</html>

The following Python code finds the element whose id is nm:

Example

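from bs4 import BeautifulSoup

# Parse index.html; the with block closes the file once parsing is done
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find() returns the first matching Tag, or None if nothing matches
obj = soup.find(id='nm')
print(obj)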

Output

<input id="nm" name="name" type="text"/>
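
Because find() returns None when nothing matches, it can be worth checking the result before reading attributes from it. The following is a minimal sketch, assuming the same form as index.html but inlined as a string so that it runs on its own; the id 'missing' is a hypothetical value used only to show the no-match case.

```python
from bs4 import BeautifulSoup

# Same form as index.html, inlined so the snippet is self-contained
html = """
<form>
   <input type='text' id='nm' name='name'>
   <input type='text' id='age' name='age'>
   <input type='text' id='marks' name='marks'>
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

found = soup.find(id='nm')
missing = soup.find(id='missing')   # hypothetical id, not present in the markup

# A Tag supports dictionary-style access to its attributes
name_attr = found['name'] if found is not None else None
print(name_attr)    # name
print(missing)      # None
```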

Using find_all()

The find_all() method accepts the same filter arguments. It returns a list of all the elements with the given id. Because an id normally identifies a single element in a document, find() is usually preferable to find_all() when searching by id.

Example

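from bs4 import BeautifulSoup

# Parse index.html; the with block closes the file once parsing is done
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find_all() returns a list of every matching Tag (empty if none match)
obj = soup.find_all(id='nm')
print(obj)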

Output

[<input id="nm" name="name" type="text"/>]

Note that the find_all() method returns a list. It also has a limit parameter. Calling find_all() with limit=1 stops at the first match, as find() does, but it still returns a list (empty if nothing matches), whereas find() returns the element itself or None.

obj = soup.find_all(id='nm', limit=1)
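
The difference in return types between the two calls can be seen in a short sketch. It assumes a one-field version of the form, inlined as a string; the id 'missing' is hypothetical and stands for any id that is absent from the document.

```python
from bs4 import BeautifulSoup

html = "<form><input type='text' id='nm' name='name'></form>"
soup = BeautifulSoup(html, 'html.parser')

as_list = soup.find_all(id='nm', limit=1)   # list with at most one item
as_tag = soup.find(id='nm')                 # the Tag itself, or None

print(len(as_list))             # 1
print(as_list[0] == as_tag)     # True

# With no match, find_all() gives an empty list and find() gives None
no_list = soup.find_all(id='missing', limit=1)
no_tag = soup.find(id='missing')
print(len(no_list), no_tag)     # 0 None
```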

Using select() method

The select() method of the BeautifulSoup class accepts a CSS selector as its argument. In CSS, the # symbol selects by id, so the selector is # followed by the required id value, for example #nm. Like find_all(), select() returns a list of all matching elements.

Example

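from bs4 import BeautifulSoup

# Parse index.html; the with block closes the file once parsing is done
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# "#nm" is the CSS selector for the element whose id is nm
obj = soup.select("#nm")
print(obj)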

Output

[<input id="nm" name="name" type="text"/>]
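
Since select() takes an ordinary CSS selector, the id can also be combined with a tag name or written as an attribute selector. The sketch below assumes the same form inlined as a string, and that the installed Beautiful Soup ships with its usual CSS selector support; the three selectors are illustrative alternatives that match the same element here.

```python
from bs4 import BeautifulSoup

html = """
<form>
   <input type='text' id='nm' name='name'>
   <input type='text' id='age' name='age'>
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

by_hash = soup.select('#nm')             # id only
by_tag_and_id = soup.select('input#nm')  # tag name plus id
by_attr = soup.select('input[id="nm"]')  # attribute selector on id

# Each call returns a list holding the single matching element
print(len(by_hash), len(by_tag_and_id), len(by_attr))   # 1 1 1
print(by_hash[0]['name'])                                # name
```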

Using select_one()

Like find_all(), the select() method returns a list. There is also a select_one() method, which returns only the first element matching the selector.

Example

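from bs4 import BeautifulSoup

# Parse index.html; the with block closes the file once parsing is done
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# select_one() returns the first Tag matching the selector, or None
obj = soup.select_one("#nm")
print(obj)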

Output

<input id="nm" name="name" type="text"/>
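
For an id lookup, select_one() and find() should give the same result, and both return None when the id is absent. A minimal sketch under the same assumptions as before (the form inlined as a string, and 'missing' as a hypothetical id that does not occur in it):

```python
from bs4 import BeautifulSoup

html = """
<form>
   <input type='text' id='nm' name='name'>
   <input type='text' id='age' name='age'>
</form>
"""
soup = BeautifulSoup(html, 'html.parser')

first = soup.select_one('#age')
same_as_find = first == soup.find(id='age')

print(first['name'])     # age
print(same_as_find)      # True

# No element carries this id, so there is nothing to return
absent = soup.select_one('#missing')
print(absent)            # None
```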