Scrapy Concise Tutorial
Scrapy - Items
Description
Scrapy uses spiders to extract data from sources such as web pages. It uses the Item class to define the output; instances of that class collect the scraped data.
Declaring Items
You can declare items using class definition syntax together with Field objects, as follows:
import scrapy

class MyProducts(scrapy.Item):
    # Each attribute declares one field of the scraped product.
    productName = scrapy.Field()
    productLink = scrapy.Field()
    imageURL = scrapy.Field()
    price = scrapy.Field()
    size = scrapy.Field()
Item Fields
Field objects specify the metadata for each field. Because there is no restriction on the values a Field object accepts, there is also no reference list of all available metadata keys: you can define any key your project requires, and each key is meaningful only to the component that reads it. The Field objects used to declare an item can be accessed through the Item.fields attribute.
Extending Items
Items can be extended by declaring a subclass of the original item. For instance:
class MyProductDetails(MyProducts):
    # Adds two fields to those inherited from MyProducts.
    original_rate = scrapy.Field(serializer=str)
    discount_rate = scrapy.Field()
You can also extend the metadata of an existing field, adding new values or changing existing ones, by passing the previous field metadata to Field, as shown in the following code:
class MyProductPackage(MyProducts):
    # serializer_demo is assumed to be a serializer function defined elsewhere.
    productName = scrapy.Field(MyProducts.fields['productName'], serializer=serializer_demo)
Item Objects
Item objects are created with the following class, which returns a new item, optionally initialized from the given argument:
class scrapy.item.Item([arg])
Item replicates the standard dict API, including its constructor, and provides one additional attribute, fields, which is a dictionary of all fields declared for the item, not only those that have been populated.