📜  Scrapy - Feed Exports

📅  Last modified: 2020-10-31 14:35:02             🧑  Author: Mango


Description

Feed exports are a way of storing the data scraped from sites by generating an "export file".

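Serialization Formats

Feed exports support multiple serialization formats and storage backends. They use Item exporters to generate a feed containing the scraped items.

The following table shows the supported formats: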

Sr.No  Format & Description
1      JSON: FEED_FORMAT is json. The exporter used is the class scrapy.exporters.JsonItemExporter.
2      JSON lines: FEED_FORMAT is jsonlines. The exporter used is the class scrapy.exporters.JsonLinesItemExporter.
3      CSV: FEED_FORMAT is csv. The exporter used is the class scrapy.exporters.CsvItemExporter.
4      XML: FEED_FORMAT is xml. The exporter used is the class scrapy.exporters.XmlItemExporter.
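The difference between the json and jsonlines formats is easiest to see side by side. The sketch below uses only Python's standard library to imitate the shape of each feed; it does not call Scrapy's exporters, and the items are invented for illustration.

```python
import json

# Hypothetical scraped items, invented for this illustration.
items = [
    {"name": "Widget", "price": 9.99},
    {"name": "Gadget", "price": 19.5},
]

# "json" style: the whole feed is a single JSON array.
json_feed = json.dumps(items)

# "jsonlines" style: one JSON object per line, no enclosing array.
jsonlines_feed = "\n".join(json.dumps(item) for item in items)

print(json_feed)
print(jsonlines_feed)
```

Because every line of a JSON lines feed is a complete object, such a feed can be read or appended to one record at a time, whereas a JSON array has to be parsed as a whole.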

The supported formats can also be extended using the FEED_EXPORTERS setting. The following formats are available as well:

Sr.No  Format & Description
1      Pickle: FEED_FORMAT is pickle. The exporter used is the class scrapy.exporters.PickleItemExporter.
2      Marshal: FEED_FORMAT is marshal. The exporter used is the class scrapy.exporters.MarshalItemExporter.
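As a rough sketch of how a format might be added through FEED_EXPORTERS, the settings fragment below maps a format name to the import path of an exporter class. The format name tsv and the class myproject.exporters.TsvItemExporter are hypothetical; they stand in for an exporter you would write yourself.

```python
# settings.py (fragment)

# Maps a format name to the import path of the exporter class that handles it.
# "tsv" and "myproject.exporters.TsvItemExporter" are hypothetical names used
# only for illustration; they are not part of Scrapy.
FEED_EXPORTERS = {
    "tsv": "myproject.exporters.TsvItemExporter",
}

# Select the newly registered format for the feed.
FEED_FORMAT = "tsv"
```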

Storage Backends

Storage backends define where the feed is stored, using a URI.

The following table shows the supported storage backends:

Sr.No  Storage Backend & Description
1      Local filesystem: The URI scheme is file; feeds are stored on the local filesystem.
2      FTP: The URI scheme is ftp; feeds are stored on an FTP server.
3      S3: The URI scheme is s3; feeds are stored on Amazon S3. The external library botocore or boto is required.
4      Standard output: The URI scheme is stdout; feeds are written to the standard output.
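The scheme at the start of the feed URI is what identifies the backend. The following sketch lists one example URI per backend and extracts the scheme with the standard library; the hosts, credentials, bucket and paths are placeholders, and the parsing is only an illustration, not Scrapy's own lookup code.

```python
from urllib.parse import urlparse

# One example URI per backend; hosts, credentials, bucket names and paths
# are placeholders chosen for illustration.
feed_uris = [
    "file:///tmp/export.csv",                              # local filesystem
    "ftp://user:pass@ftp.example.com/path/to/export.csv",  # FTP
    "s3://mybucket/path/to/export.csv",                    # Amazon S3
    "stdout:",                                             # standard output
]

# The scheme is the part of the URI before the first colon.
schemes = [urlparse(uri).scheme for uri in feed_uris]

for uri, scheme in zip(feed_uris, schemes):
    print(f"{scheme:<7} {uri}")
```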

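Storage URI Parameters

The following parameters in the storage URI are replaced when the feed is created:

  • %(time)s: This parameter is replaced by a timestamp.
  • %(name)s: This parameter is replaced by the spider name.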
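These placeholders follow Python's named %-formatting syntax, so the substitution can be imitated with a plain dictionary. In the sketch below the URI template, the spider name and the timestamp are all made up, and the timestamp layout (ISO 8601 with colons replaced by dashes) is an assumption about how the value is rendered.

```python
from datetime import datetime

# Hypothetical URI template containing both placeholders.
uri_template = "ftp://user:password@ftp.example.com/feeds/%(name)s/%(time)s.json"

# Values that would be filled in when the feed is created.
# The spider name is invented; the timestamp layout is an assumption.
params = {
    "name": "example_spider",
    "time": datetime(2020, 10, 31, 14, 35, 2).isoformat().replace(":", "-"),
}

# Named %-formatting replaces each %(key)s with the matching dictionary value.
feed_uri = uri_template % params
print(feed_uri)
```

Putting the spider name and timestamp in the path gives each run of each spider its own file, so earlier exports are not overwritten.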

Settings

The following table shows the settings that can be used to configure feed exports:

Sr.No  Setting & Description
1      FEED_URI: The URI of the export feed; setting it enables feed exports.
2      FEED_FORMAT: The serialization format used for the feed.
3      FEED_EXPORT_FIELDS: Defines the fields that need to be exported.
4      FEED_STORE_EMPTY: Defines whether to export feeds that contain no items.
5      FEED_STORAGES: A dictionary with additional feed storage backends.
6      FEED_STORAGES_BASE: A dictionary with the built-in feed storage backends.
7      FEED_EXPORTERS: A dictionary with additional feed exporters.
8      FEED_EXPORTERS_BASE: A dictionary with the built-in feed exporters.
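A minimal sketch of how the first four settings might look together in a project's settings.py follows. All values are illustrative, and the field names name and price are hypothetical item fields.

```python
# settings.py (fragment); every value here is illustrative.

# Where the feed is written; the placeholders are filled in per spider run.
FEED_URI = "file:///tmp/%(name)s-%(time)s.csv"

# Serialization format of the feed.
FEED_FORMAT = "csv"

# Fields to export; "name" and "price" are hypothetical item fields.
FEED_EXPORT_FIELDS = ["name", "price"]

# Do not write a feed when no items were scraped.
FEED_STORE_EMPTY = False
```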