custom_settings in Scrapy - Python (1)

📅 Last modified: 2023-12-03 15:20:02.039000 🧑 Author: Mango

About custom_settings in Scrapy

Scrapy is a powerful Python crawling framework whose features make scraping websites much easier. One particularly useful feature is custom_settings.

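What is custom_settings?

Scrapy's Spider class has a custom_settings class attribute. It is a dictionary whose entries override the project-wide (global) settings while that spider runs. In other words, it lets each spider redefine Scrapy settings for itself. It defaults to None, and it must be defined as a class attribute because Scrapy applies it before the spider is instantiated.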

How do you use custom_settings?

To use custom_settings, define it as a class attribute on your spider, as shown here:

import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    custom_settings = {
        'DOWNLOAD_DELAY': 1,
        'CONCURRENT_REQUESTS': 4,
    }

    # start_requests, parse, and other methods go here

In this example, custom_settings is a dictionary with two key-value pairs, DOWNLOAD_DELAY and CONCURRENT_REQUESTS. For this spider, these values override both the project-wide settings and Scrapy's built-in defaults.

Common settings in custom_settings

These are some of the settings most often placed in custom_settings:

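DOWNLOAD_DELAY: the delay, in seconds, between consecutive requests to the same website. The default is 0.
CONCURRENT_REQUESTS: the maximum number of requests the downloader performs concurrently. The default is 16.
ROBOTSTXT_OBEY: if set to True, Scrapy respects robots.txt policies. The built-in default is False, but the settings.py generated by scrapy startproject sets it to True.
USER_AGENT: the default User-Agent header sent with requests. If you do not set it, Scrapy identifies itself with a string of the form "Scrapy/VERSION (+https://scrapy.org)".
COOKIES_ENABLED: whether the cookies middleware is enabled. The default is True.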

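Priority of custom_settings

When you define custom_settings on a spider, its values override the corresponding values in the project-wide (global) settings. However, a value passed on the command line (for example, with -s) overrides custom_settings.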

Conclusion

custom_settings lets you tailor the settings of each spider so that it better fits your needs. It is worth understanding this feature when you write spiders.