Scrapy Concise Tutorial

Scrapy - Telnet Console

Description

The telnet console is a Python shell that runs inside the Scrapy process and is used to inspect and control a running Scrapy process.

Access Telnet Console

The telnet console can be accessed with the following command:

telnet localhost 6023

The telnet console listens on the TCP port defined by the TELNETCONSOLE_PORT setting.

Variables

The following default variables are available in the console as shortcuts:

Sr.No  Shortcut    Description
1      crawler     The Scrapy Crawler (scrapy.crawler.Crawler) object.
2      engine      The Crawler.engine attribute.
3      spider      The active spider.
4      slot        The engine slot.
5      extensions  The Extension Manager (Crawler.extensions attribute).
6      stats       The Stats Collector (Crawler.stats attribute).
7      settings    The Scrapy settings object (Crawler.settings attribute).
8      est         Prints a report of the engine status.
9      prefs       Used for memory debugging.
10     p           A shortcut to the pprint.pprint function.
11     hpy         Used for memory debugging.

Examples

The following examples illustrate tasks you can perform with the telnet console.

Pause, Resume and Stop the Scrapy Engine

To pause the Scrapy engine, use the following command:

telnet localhost 6023
>>> engine.pause()
>>>

To resume the Scrapy engine, use the following command:

telnet localhost 6023
>>> engine.unpause()
>>>

To stop the Scrapy engine, use the following command:

telnet localhost 6023
>>> engine.stop()
Connection closed by foreign host.

View Engine Status

Use the est() shortcut in the telnet console to check the status of the Scrapy engine, as shown in the following session:

telnet localhost 6023
>>> est()
Execution engine status

time()-engine.start_time                        : 8.62972998619
engine.has_capacity()                           : False
len(engine.downloader.active)                   : 16
engine.scraper.is_idle()                        : False
engine.spider.name                              : followall
engine.spider_is_idle(engine.spider)            : False
engine.slot.closing                             : False
len(engine.slot.inprogress)                     : 16
len(engine.slot.scheduler.dqs or [])            : 0
len(engine.slot.scheduler.mqs)                  : 92
len(engine.scraper.slot.queue)                  : 0
len(engine.scraper.slot.active)                 : 0
engine.scraper.slot.active_size                 : 0
engine.scraper.slot.itemproc_size               : 0
engine.scraper.slot.needs_backout()             : False
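
If you copy a report like the one above out of the console, it can be handy to turn it into a dictionary for logging or comparison. The following is a minimal sketch, not part of Scrapy: `parse_engine_status` is a hypothetical helper that assumes every status line has the form `<expression> : <value>`, as in the sample output.

```python
def parse_engine_status(report: str) -> dict[str, str]:
    """Parse text printed by est() into {expression: value} (hypothetical helper)."""
    status = {}
    for line in report.splitlines():
        # Split on the last " : " so the expression itself may contain spaces.
        key, sep, value = line.rpartition(" : ")
        if sep:  # skip the title line and blank lines, which have no separator
            status[key.strip()] = value.strip()
    return status


sample = """Execution engine status

time()-engine.start_time                        : 8.62972998619
engine.has_capacity()                           : False
len(engine.downloader.active)                   : 16
"""

status = parse_engine_status(sample)
print(status["len(engine.downloader.active)"])  # prints 16
```

Values are kept as strings; convert them yourself if you need numbers or booleans.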

Telnet Console Signals

You can use the telnet console's update_telnet_vars signal to add, update, or delete variables in the telnet local namespace. To do so, modify the telnet_vars dict in your signal handler.

scrapy.extensions.telnet.update_telnet_vars(telnet_vars)

Parameters:

telnet_vars (dict): the dictionary of telnet variables.
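
A handler for this signal could be sketched as a small extension. This is an illustrative sketch only: the class name `TelnetVarsExtension`, the method name `add_telnet_vars`, and the `answer` variable are made up for the example. It assumes the extension is enabled through the project's `EXTENSIONS` setting.

```python
class TelnetVarsExtension:
    """Hypothetical extension that adjusts the telnet console namespace."""

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the handler below can be exercised without Scrapy installed.
        from scrapy.extensions.telnet import update_telnet_vars

        ext = cls()
        # Ask Scrapy to call our handler whenever the console namespace is built.
        crawler.signals.connect(ext.add_telnet_vars, signal=update_telnet_vars)
        return ext

    def add_telnet_vars(self, telnet_vars):
        # telnet_vars is the console's local namespace; change it in place.
        telnet_vars["answer"] = 42      # add a new variable (illustrative name)
        telnet_vars.pop("hpy", None)    # delete a default shortcut if present
```

With the extension enabled, typing `answer` at the telnet prompt should show the value set by the handler.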

Telnet Settings

The following table shows the settings that control the behavior of the telnet console:

Sr.No  Setting             Default Value  Description
1      TELNETCONSOLE_PORT  [6023, 6073]   The port range to use for the telnet console. If set to None or 0, a dynamically assigned port is used.
2      TELNETCONSOLE_HOST  '127.0.0.1'    The interface on which the telnet console listens.
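
As a rough illustration, a project's settings.py could set both options explicitly. The values below simply repeat the defaults listed in the table; the comments describe the expected behavior under that assumption.

```python
# settings.py (fragment)

# Port range for the telnet console: the first free port from 6023 to 6073 is used.
TELNETCONSOLE_PORT = [6023, 6073]

# Listen only on the loopback interface, so the console is reachable
# from the local machine only.
TELNETCONSOLE_HOST = "127.0.0.1"
```

Keeping the host on the loopback interface is a cautious choice, since anyone who can reach the console can run Python code inside the crawler process.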