Scrapy Concise Tutorial

Scrapy - Define an Item

Description

Items are the containers used to collect the data that is scraped from websites. Before writing a spider, you define your Item. To define items, edit the items.py file found in the first_scrapy directory (the name you chose for your project). The items.py file looks like the following:

import scrapy

class First_scrapyItem(scrapy.Item):
   # define the fields for your item here like:
   # name = scrapy.Field()
   pass

The First_scrapyItem class inherits from scrapy.Item, a base class that Scrapy provides with dictionary-like behavior already built in. For instance, if you want to extract the name, URL, and description from the sites, you need to define a field for each of these three attributes.

Hence, let's add the fields that we want to collect:

import scrapy

class First_scrapyItem(scrapy.Item):
   name = scrapy.Field()
   url = scrapy.Field()
   desc = scrapy.Field()