Scrapy: A Concise Tutorial

Scrapy - Feed exports

Description

Feed exports are a way of storing the data scraped from sites by generating an "export file".

Serialization Formats

Feed exports use Item exporters to generate a feed from the scraped items, and they support multiple serialization formats and storage backends.

The following formats are supported:

  1. JSON: FEED_FORMAT is json; the exporter used is scrapy.exporters.JsonItemExporter.

  2. JSON lines: FEED_FORMAT is jsonlines; the exporter used is scrapy.exporters.JsonLinesItemExporter.

  3. CSV: FEED_FORMAT is csv; the exporter used is scrapy.exporters.CsvItemExporter.

  4. XML: FEED_FORMAT is xml; the exporter used is scrapy.exporters.XmlItemExporter.
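To make the difference between the json and jsonlines formats concrete, here is a minimal sketch that uses only Python's standard json module. It does not call Scrapy's exporters; the item data and the two helper functions are hypothetical and only illustrate the shape of each feed.

```python
import json

# Hypothetical scraped items; in Scrapy these would be yielded by a spider.
items = [
    {"title": "First post", "views": 10},
    {"title": "Second post", "views": 25},
]


def to_json_feed(items):
    """Serialize all items as a single JSON array (the shape of a json feed)."""
    return json.dumps(items)


def to_jsonlines_feed(items):
    """Serialize each item as its own JSON object, one object per line."""
    return "".join(json.dumps(item) + "\n" for item in items)


json_feed = to_json_feed(items)
jsonlines_feed = to_jsonlines_feed(items)

print(json_feed)
print(jsonlines_feed, end="")
```

A JSON lines feed can be appended to and read back one line at a time, whereas a JSON array has to be parsed as a whole, which matters for large feeds.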

The supported formats can also be extended through the FEED_EXPORTERS setting. Further formats include:

  1. Pickle: FEED_FORMAT is pickle; the exporter used is scrapy.exporters.PickleItemExporter.

  2. Marshal: FEED_FORMAT is marshal; the exporter used is scrapy.exporters.MarshalItemExporter.

Storage Backends

A storage backend defines where the feed is stored; the location is given as a URI.

The following storage backends are supported:

  1. Local filesystem: the URI scheme is file; feeds are stored on the local filesystem.

  2. FTP: the URI scheme is ftp; feeds are stored on an FTP server.

  3. S3: the URI scheme is s3; feeds are stored on Amazon S3. The external library botocore or boto is required.

  4. Standard output: the URI scheme is stdout; feeds are written to the standard output.
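Since the backend is chosen by the scheme of the feed URI, the selection can be sketched with the standard library's URL parser. This is only an illustration of the idea, not Scrapy's implementation: the BACKENDS mapping, the backend_for helper, and the example URIs (host, bucket, and paths) are all hypothetical.

```python
from urllib.parse import urlparse

# Illustrative mapping from URI scheme to a backend label.
# The labels are for this sketch only; they are not Scrapy class names.
BACKENDS = {
    "file": "local filesystem",
    "ftp": "FTP",
    "s3": "Amazon S3",
    "stdout": "standard output",
}


def backend_for(uri):
    """Return the backend label for a feed URI, based on its scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported feed URI scheme: {scheme!r}")
    return BACKENDS[scheme]


# Hypothetical example URIs, one per scheme.
example_uris = [
    "file:///tmp/export.csv",
    "ftp://user:pass@ftp.example.com/path/to/export.csv",
    "s3://mybucket/path/to/export.csv",
    "stdout:",
]

for uri in example_uris:
    print(uri, "->", backend_for(uri))
```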

Storage URI Parameters

The storage URI can contain the following parameters, which are replaced when the feed is created:

  1. %(time)s: This parameter is replaced by a timestamp.

  2. %(name)s: This parameter is replaced by the spider name.
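The two parameters use Python's named %-formatting syntax, so the substitution can be sketched with the % operator. The expand_feed_uri helper, the FTP host, the spider name, and the exact timestamp format below are assumptions made for this illustration; Scrapy performs the replacement itself when it creates the feed.

```python
from datetime import datetime


def expand_feed_uri(template, spider_name, now=None):
    """Fill %(time)s and %(name)s in a feed URI template.

    The timestamp format (ISO 8601 with ':' replaced by '-') is an
    assumption of this sketch, chosen so the value is safe in file names.
    """
    now = now or datetime.now()
    params = {
        "time": now.replace(microsecond=0).isoformat().replace(":", "-"),
        "name": spider_name,
    }
    return template % params


# Hypothetical template that stores one feed per spider and per run.
template = "ftp://user:password@ftp.example.com/scraping/feeds/%(name)s/%(time)s.json"
print(expand_feed_uri(template, "quotes", datetime(2024, 1, 2, 3, 4, 5)))
```

Putting the spider name in the directory and the timestamp in the file name keeps successive runs from overwriting each other.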

Settings

Feed exports can be configured with the following settings:

  1. FEED_URI: the URI of the export feed; setting it enables feed exports.

  2. FEED_FORMAT: the serialization format used for the feed.

  3. FEED_EXPORT_FIELDS: the fields to be exported.

  4. FEED_STORE_EMPTY: whether to export feeds that contain no items.

  5. FEED_STORAGES: a dictionary of additional feed storage backends.

  6. FEED_STORAGES_BASE: a dictionary of the built-in feed storage backends.

  7. FEED_EXPORTERS: a dictionary of additional feed exporters.

  8. FEED_EXPORTERS_BASE: a dictionary of the built-in feed exporters.
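As a rough sketch, a project's settings.py might combine these settings as follows. All values are examples only: the output path, the field names, and the myproject.* class paths are hypothetical and would have to exist in a real project. Which of these settings are honored depends on the installed Scrapy version (newer releases may configure the feed location and format through a different setting), so check the documentation for the version in use.

```python
# settings.py (sketch): every value here is an example, not a recommendation.

# Where to store the feed; %(name)s and %(time)s are filled in at export time.
FEED_URI = "file:///tmp/%(name)s-%(time)s.json"

# Serialization format of the feed.
FEED_FORMAT = "json"

# Export only these fields (hypothetical item field names).
FEED_EXPORT_FIELDS = ["title", "url"]

# Do not write a feed when no items were scraped.
FEED_STORE_EMPTY = False

# Register an extra format; the exporter class path is hypothetical.
FEED_EXPORTERS = {
    "yaml": "myproject.exporters.YamlItemExporter",
}

# Register an extra storage backend; the class path is hypothetical.
FEED_STORAGES = {
    "sftp": "myproject.storages.SftpFeedStorage",
}
```

FEED_STORAGES_BASE and FEED_EXPORTERS_BASE are left out because they describe the built-in backends and exporters rather than project-specific additions.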