Scrapy - Stats Collection - 芒果文档

Last modified: 2020-10-31 14:40:30 | Author: Mango

Description

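The Stats Collector is a facility provided by Scrapy for collecting statistics as key/value pairs. It is accessed through the Crawler API (the Crawler gives access to all Scrapy core components). The stats collector keeps one stats table per spider: the table is opened automatically when the spider is opened and closed when the spider is closed.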

Common Stats Collector Uses

The following code accesses the stats collector through the crawler's stats attribute.

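class ExtensionThatAccessStats:
    """Extension that keeps a reference to the crawler's stats collector."""

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy passes in the Crawler; its stats attribute is the stats collector.
        return cls(crawler.stats)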
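The same from_crawler pattern can be used by any component that needs to record a statistic. The following is a minimal sketch, assuming an item pipeline is a suitable place for the counter; the class name CountingPipeline and the key 'customized_count' are hypothetical names chosen for illustration, and the pipeline would still have to be enabled in the project's ITEM_PIPELINES setting.

```python
class CountingPipeline:
    """Hypothetical item pipeline that counts the items passing through it."""

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # Same hook as in the extension above: take the collector from the Crawler.
        return cls(crawler.stats)

    def process_item(self, item, spider):
        # One increment per item; the key name is an arbitrary example.
        self.stats.inc_value("customized_count")
        # Return the item unchanged so later pipelines still receive it.
        return item
```

Because the collector is injected through the constructor, the class can be exercised in a unit test with a small fake stats object instead of a running crawler.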

The following table lists the calls that can be made on the stats collector:



No.	Call	Description
1	stats.set_value('hostname', socket.gethostname())	Sets the given stat to the given value.
2	stats.inc_value('customized_count')	Increments the stat value (by 1 unless another count is given).
3	stats.max_value('max_items_scraped', value)	Sets the stat value only if value is greater than the current value.
4	stats.min_value('min_free_memory_percent', value)	Sets the stat value only if value is lower than the current value.
5	stats.get_value('customized_count')	Returns the value of the given stat.
6	stats.get_stats()	Returns all stats as a dict, for example {'customized_count': 1, 'start_time': datetime.datetime(2009, 7, 14, 21, 47, 28, 977139)}.
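To make the "only if greater" and "only if lower" rules in the table concrete, here is a small stand-in written in plain Python. It is a sketch for illustration only, not Scrapy's own implementation: the class name SketchStats is hypothetical, and the count, start and default parameters are assumptions modelled on the calls listed above.

```python
import socket


class SketchStats:
    """Hypothetical in-memory stand-in that mimics the calls in the table."""

    def __init__(self):
        self._stats = {}

    def set_value(self, key, value):
        # Overwrite whatever was stored under the key.
        self._stats[key] = value

    def inc_value(self, key, count=1, start=0):
        # Begin at `start` if the key is new, then add `count`.
        self._stats[key] = self._stats.get(key, start) + count

    def max_value(self, key, value):
        # Keep the larger of the stored value and the new one.
        self._stats[key] = max(self._stats.get(key, value), value)

    def min_value(self, key, value):
        # Keep the smaller of the stored value and the new one.
        self._stats[key] = min(self._stats.get(key, value), value)

    def get_value(self, key, default=None):
        return self._stats.get(key, default)

    def get_stats(self):
        return self._stats


stats = SketchStats()
stats.set_value("hostname", socket.gethostname())
stats.inc_value("customized_count")
stats.inc_value("customized_count")
stats.max_value("max_items_scraped", 10)
stats.max_value("max_items_scraped", 7)         # ignored: 7 is not greater than 10
stats.min_value("min_free_memory_percent", 40)
stats.min_value("min_free_memory_percent", 25)  # stored: 25 is lower than 40
```

After these calls, 'customized_count' holds 2, 'max_items_scraped' holds 10 and 'min_free_memory_percent' holds 25.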

Available Stats Collectors

Scrapy provides different types of stats collectors, which are selected through the STATS_CLASS setting.

MemoryStatsCollector

This is the default stats collector. It maintains the stats of each spider used for scraping and stores the data in memory.

class scrapy.statscollectors.MemoryStatsCollector

DummyStatsCollector

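This stats collector does nothing, which makes it very efficient. It is selected through the STATS_CLASS setting and can be used to disable stats collection in order to improve performance.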

class scrapy.statscollectors.DummyStatsCollector
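Switching collectors is a one-line change in the project settings. The fragment below is a sketch that assumes the project has a settings.py module and does not rely on any stats (some components read values from the collector, so this is worth checking before disabling it).

```python
# settings.py (project settings module)

# Assumption: no component in this project depends on collected stats,
# so collection is switched off by selecting the no-op collector.
STATS_CLASS = "scrapy.statscollectors.DummyStatsCollector"

# To go back to the default in-memory collector, remove the line above or use:
# STATS_CLASS = "scrapy.statscollectors.MemoryStatsCollector"
```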