Scrapy: A Concise Tutorial

Scrapy - Logging

Description

Logging means tracking events using Python's built-in logging system, which defines the functions and classes that applications and libraries use to emit log messages. It works out of the box and can be tuned with the Scrapy settings listed under Logging Settings.

When running commands, Scrapy sets some defaults and applies them by calling scrapy.utils.log.configure_logging().

Log Levels

In Python, a log message has one of five standard severity levels. The following list shows them in ascending order −

logging.DEBUG − for debugging messages (lowest severity)
logging.INFO − for informational messages
logging.WARNING − for warning messages
logging.ERROR − for regular errors
logging.CRITICAL − for critical errors (highest severity)
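
As a quick check of the ordering above, the following sketch prints the numeric value behind each level constant. It uses only the standard library; the variable names are chosen for illustration.

```python
import logging

# The five standard levels, in the order listed above.
levels = [logging.DEBUG, logging.INFO, logging.WARNING,
          logging.ERROR, logging.CRITICAL]

for level in levels:
    # getLevelName maps a numeric level to its textual name.
    print(logging.getLevelName(level), level)

# The numeric values increase with severity.
is_ascending = levels == sorted(levels)
```

Because levels are plain integers, a threshold such as LOG_LEVEL can be applied with a simple comparison.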

How to Log Messages

The following code logs a message at the logging.INFO level.

import logging
logging.info("This is an information")

The same message can be logged with logging.log, which takes the level as its first argument, as follows −

import logging
logging.log(logging.INFO, "This is an information")

You can also log through a logger object. The module-level helpers used above are shortcuts that call the root logger, which can be obtained explicitly as follows −

import logging
logger = logging.getLogger()
logger.info("This is an information")

There can be multiple loggers, and each one can be retrieved by name with the logging.getLogger function, as follows −

import logging
logger = logging.getLogger('mycustomlogger')
logger.info("This is an information")

A custom logger can be created for any module using the __name__ variable, which holds the module path, as follows −

import logging
logger = logging.getLogger(__name__)
logger.info("This is an information")
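
Logger names form a dot-separated hierarchy, and asking for the same name twice returns the same object. The sketch below illustrates both points; the names 'myproject' and 'myproject.spiders' are made up for this example.

```python
import logging

# Hypothetical names, chosen only for this example.
parent = logging.getLogger('myproject')
child = logging.getLogger('myproject.spiders')

# Asking for the same name again returns the very same logger object.
same_logger = logging.getLogger('myproject') is parent

# A dotted name makes the logger a child of the logger named by its prefix.
child_parent_name = child.parent.name

print(same_logger, child_parent_name)
```

This is why logging.getLogger(__name__) is convenient: the package layout gives each module's logger a place in the hierarchy without extra setup.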

Logging from Spiders

Every spider instance has a logger attribute, which can be used as follows −

import scrapy

class LogSpider(scrapy.Spider):
   name = 'logspider'
   start_urls = ['http://dmoz.com']
   def parse(self, response):
      self.logger.info('Parse function called on %s', response.url)

In the above code, the logger is created using the spider's name, but you can use any custom Python logger, as shown in the following code −

import logging
import scrapy

logger = logging.getLogger('customizedlogger')
class LogSpider(scrapy.Spider):
   name = 'logspider'
   start_urls = ['http://dmoz.com']

   def parse(self, response):
      logger.info('Parse function called on %s', response.url)

Logging Configuration

Loggers cannot display the messages they receive on their own. They need handlers, which direct those messages to their destinations, such as files, emails, and standard output.
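
To make the role of a handler concrete, here is a minimal standard-library sketch that attaches one handler to a logger and captures what it writes. The logger name 'handlerdemo' and the in-memory stream are assumptions made for the example; this is not how Scrapy wires its own handler.

```python
import io
import logging

# An in-memory stream stands in for a file or the console.
stream = io.StringIO()

# The handler decides where records go and how they are formatted.
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))

# 'handlerdemo' is an arbitrary logger name used only in this example.
logger = logging.getLogger('handlerdemo')
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the records away from any root handlers

logger.debug('Not shown: below the INFO threshold')
logger.info('Delivered to the handler')

output = stream.getvalue()
print(output)
```

Only the INFO record reaches the stream, because the logger's level filters out the DEBUG message before any handler sees it.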

Scrapy configures a handler for the root logger based on the settings described below.

Logging Settings

The following settings are used to configure logging −

LOG_FILE and LOG_ENABLED decide the destination for log messages. If LOG_FILE is set, messages are written to that file using the encoding given by LOG_ENCODING; otherwise, they are written to standard error.
When LOG_ENABLED is set to False, no log output is displayed.
LOG_LEVEL sets the minimum severity to log; messages with lower severity are filtered out.
LOG_FORMAT and LOG_DATEFORMAT specify the layout of all messages.
When LOG_STDOUT is set to True, all standard output and error messages of your process are redirected to the log.
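
As an illustration, these settings might appear together in a project's settings.py as sketched below. The values are examples only, and the file name scrapy.log is a placeholder.

```python
# settings.py (illustrative values only)

LOG_ENABLED = True        # set to False to suppress all log output
LOG_FILE = 'scrapy.log'   # placeholder file name; leave unset to log to standard error
LOG_ENCODING = 'utf-8'    # encoding used when writing LOG_FILE
LOG_LEVEL = 'INFO'        # drop messages less severe than INFO
LOG_FORMAT = '%(asctime)s [%(name)s] %(levelname)s: %(message)s'
LOG_DATEFORMAT = '%Y-%m-%d %H:%M:%S'
LOG_STDOUT = False        # True would redirect the process's stdout into the log
```

The format strings use the standard placeholders understood by Python's logging.Formatter, such as %(levelname)s and %(message)s.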

Command-line Options

Scrapy settings can be overridden by passing command-line arguments, as listed below −

--logfile FILE − overrides LOG_FILE
--loglevel/-L LEVEL − overrides LOG_LEVEL
--nolog − sets LOG_ENABLED to False

scrapy.utils.log module

The scrapy.utils.log module provides the configure_logging function, which can be used to initialize logging defaults for Scrapy.

scrapy.utils.log.configure_logging(settings = None, install_root_handler = True)

settings (dict, None) − the settings used to create and configure a handler for the root logger. By default, it is None.
install_root_handler (bool) − whether to install a root logging handler. By default, it is True.

The above function −

Routes warnings and Twisted logging through Python's standard logging.
Assigns the DEBUG level to Scrapy loggers and the ERROR level to Twisted loggers.
Routes stdout to the log if the LOG_STDOUT setting is True.

Default options can be overridden using the settings argument. When settings are not specified, the defaults are used. When install_root_handler is set to True, the function also creates a handler for the root logger. If it is set to False, no log output is set up by default. configure_logging is called automatically when using Scrapy commands, and it can be called explicitly when running custom scripts.

To configure logging output manually, you can use logging.basicConfig(), as follows −

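import logging
from scrapy.utils.log import configure_logging

# Skip Scrapy's root handler so that basicConfig() can install its own.
configure_logging(install_root_handler=False)
logging.basicConfig(
   filename='logging.txt',
   # %(message)s is the standard placeholder for the log message text.
   format='%(levelname)s: %(message)s',
   level=logging.INFO
)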