Crawler (18): Scrapy Framework (5) - Scrapy Generic Spiders (6)

middlewares.py (continued):

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn't have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)


class qdDownloaderMiddleware(object):
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either:
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response

    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
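
The generated downloader middleware above is only a stub: every hook either passes the object through unchanged or does nothing. As a minimal sketch (not part of the original project; the class name and the User-Agent value are assumptions), a downloader middleware that fills in process_request() to attach a default User-Agent header could look like this:

class DefaultUserAgentMiddleware(object):
    # Sketch only: attaches a fixed User-Agent to every outgoing request.
    def __init__(self, user_agent='Mozilla/5.0'):
        self.user_agent = user_agent

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT is a standard Scrapy setting; fall back to a placeholder.
        return cls(user_agent=crawler.settings.get('USER_AGENT', 'Mozilla/5.0'))

    def process_request(self, request, spider):
        # Set the header, then return None so the request continues through
        # the remaining downloader middlewares and on to the downloader.
        request.headers.setdefault('User-Agent', self.user_agent)
        return None

Such a middleware only takes effect once it is registered in DOWNLOADER_MIDDLEWARES in settings.py with a priority number.
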
pipelines.py:

    
    	
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


class QdPipeline(object):
    def process_item(self, item, spider):
        return item
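
By default this process_item() simply passes every item through unchanged. As a hedged sketch (the 'title' field and the class name are assumptions, not something defined in the original project), a pipeline that discards incomplete items could be written like this:

from scrapy.exceptions import DropItem


class RequiredFieldPipeline(object):
    # Sketch only: drops any item that lacks a 'title' value.
    def process_item(self, item, spider):
        if not item.get('title'):
            raise DropItem('Missing title in %s' % item)
        return item

As the template comment notes, a pipeline only runs after it is added to the ITEM_PIPELINES setting in settings.py, for example (the dotted path assumes the project is named qd, and 300 is just a conventional priority value):

ITEM_PIPELINES = {
    'qd.pipelines.QdPipeline': 300,
}
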

    