Beautiful Soup Concise Tutorial

Beautiful Soup - select() Method

Method Description

In the Beautiful Soup library, select() is an important tool for scraping HTML and XML documents. Like find() and the find_*() methods, it locates elements that satisfy a given criterion. Elements in the document tree are selected by the CSS selector passed as the argument.

Beautiful Soup also has a select_one() method. The difference is that select() returns a ResultSet of all elements beneath the PageElement it is called on that match the CSS selector, whereas select_one() returns only the first matching element.
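
The difference between the two methods can be seen in a minimal sketch. The markup and variable names here are illustrative only and are not part of the original tutorial.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one div containing three paragraphs.
markup = '<div id="Languages"><p>Java</p><p>Python</p><p>C++</p></div>'
soup = BeautifulSoup(markup, 'html.parser')

all_p = soup.select('p')           # list-like ResultSet of every matching Tag
first_p = soup.select_one('p')     # only the first matching Tag
missing = soup.select_one('span')  # nothing matches this selector

print(len(all_p))          # 3
print(first_p.get_text())  # Java
print(missing)             # None
```

Because select_one() can return None, it is usually worth checking the result before accessing attributes on it.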

Before Beautiful Soup 4.7, select() supported only the most common CSS selectors. Starting with version 4.7, Beautiful Soup is integrated with the Soup Sieve CSS selector library, so many more selectors can now be used. Version 4.12 added a .css property alongside the existing convenience methods select() and select_one().

Syntax

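select(selector, namespaces=None, limit=None, **kwargs)

Parameters

  1. selector − A string containing a CSS selector.

  2. namespaces − An optional dictionary mapping the namespace prefixes used in the selector to namespace URIs.

  3. limit − After finding this number of results, stop looking.

  4. kwargs − Keyword arguments to be passed on to Soup Sieve.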

With limit set to 1, select() finds the same element as select_one(), but it still returns that element inside a ResultSet rather than as a bare Tag.
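
A small sketch of the limit parameter, using made-up markup. It passes limit as a keyword argument and compares the limit=1 result with select_one().

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the tutorial.
markup = '<ul><li>Anand</li><li>Mahesh</li><li>Rani</li></ul>'
soup = BeautifulSoup(markup, 'html.parser')

two = soup.select('li', limit=2)   # stops after two matches
one = soup.select('li', limit=1)   # a ResultSet holding a single Tag
first = soup.select_one('li')      # the Tag itself

print([tag.get_text() for tag in two])  # ['Anand', 'Mahesh']
print(len(one))                         # 1
print(one[0] == first)                  # True
```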

Return Value

The select() method returns a ResultSet of Tag objects, which is empty if nothing matches. The select_one() method returns a single Tag object, or None if nothing matches.

The Soup Sieve library supports many types of CSS selectors. The basic ones are −

  1. Type selectors match elements by node name. For example −

tags = soup.select('div')

  2. The universal selector (*) matches elements of any type. Example −

tags = soup.select('*')

  3. The ID selector matches an element based on its id attribute. The symbol # denotes the ID selector. Example −

tags = soup.select("#nm")

  4. The class selector matches an element based on the values contained in its class attribute. A . prefixed to the class name denotes the class selector. Example −

tags = soup.select(".submenu")

Example: Type Selector

from bs4 import BeautifulSoup

markup = '''
   <div id="Languages">
      <p>Java</p> <p>Python</p> <p>C++</p>
   </div>
'''
soup = BeautifulSoup(markup, 'html.parser')

# A type selector matches every element with the given tag name.
tags = soup.select('div')
print(tags)

Output

[<div id="Languages">
      <p>Java</p> <p>Python</p> <p>C++</p>
   </div>]

Example: ID Selector

from bs4 import BeautifulSoup

html = '''
   <form>
      <input type = 'text' id = 'nm' name = 'name'>
      <input type = 'text' id = 'age' name = 'age'>
      <input type = 'text' id = 'marks' name = 'marks'>
   </form>
'''
soup = BeautifulSoup(html, 'html.parser')

# "#nm" matches the element whose id attribute is "nm".
obj = soup.select("#nm")
print(obj)

Output

[<input id="nm" name="name" type="text"/>]

Example: Class Selector

from bs4 import BeautifulSoup

html = '''
   <ul>
      <li class="mainmenu">Accounts</li>
      <ul>
         <li class="submenu">Anand</li>
         <li class="submenu">Mahesh</li>
      </ul>
      <li class="mainmenu">HR</li>
      <ul>
         <li class="submenu">Rani</li>
         <li class="submenu">Ankita</li>
      </ul>
   </ul>
'''
soup = BeautifulSoup(html, 'html.parser')

# ".mainmenu" matches every element whose class attribute contains "mainmenu".
tags = soup.select(".mainmenu")
print(tags)

Output

[<li class="mainmenu">Accounts</li>, <li class="mainmenu">HR</li>]
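
Example: Universal Selector

The universal selector is listed among the basic selectors but has no worked example, so here is a minimal sketch in the same style. The markup is illustrative; the second query combines * with an ID selector and the child combinator to restrict the match to direct children.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from the tutorial.
markup = '<div id="Languages"><p>Java</p><p>Python</p></div>'
soup = BeautifulSoup(markup, 'html.parser')

# "*" matches every element in the tree, in document order.
tags = soup.select('*')
names = [tag.name for tag in tags]
print(names)  # ['div', 'p', 'p']

# "#Languages > *" matches only the direct children of the div.
children = [tag.get_text() for tag in soup.select('#Languages > *')]
print(children)  # ['Java', 'Python']
```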