Is there a way to trigger a method in a Spider class just before it terminates?
I can terminate the spider myself, like this:
    from scrapy.exceptions import CloseSpider
    from scrapy.spiders import CrawlSpider


    class MySpider(CrawlSpider):
        # Config stuff goes here...

        def quit(self):
            # Do some stuff...
            raise CloseSpider('MySpider is quitting now.')

        def my_parser(self, response):
            if termination_condition:
                self.quit()
            # Parsing stuff goes here...
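But I can't find any information on how to determine when the spider is about to quit naturally.
It looks like you can register a signal listener through dispatcher.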
I would try something like:
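    from scrapy import signals
    from scrapy.spiders import CrawlSpider
    from scrapy.xlib.pydispatch import dispatcher  # available only in older Scrapy releases


    class MySpider(CrawlSpider):
        def __init__(self, *args, **kwargs):
            # Let CrawlSpider run its own setup before connecting the listener.
            super().__init__(*args, **kwargs)
            dispatcher.connect(self.spider_closed, signals.spider_closed)

        def spider_closed(self, spider):
            # The second parameter is the instance of the spider about to be closed.
            pass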