Crawlers (18): The Scrapy Framework (5), Scrapy Generic Spiders (4)

NULL'
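        return author

    def get_state(self, response):
        # extract_first() returns the default instead of raising IndexError
        # when the XPath matches nothing.
        state = response.xpath('//p[@class="tag"]/span/text()').extract_first(default='')
        if len(state) > 0:
            state = state.strip()
        else:
            state = 'NULL'
        return state
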
    def get_type(self, response):
        # A book can carry several genre tags; join them into one string.
        types = response.xpath('//p[@class="tag"]/a/text()').extract()
        if len(types) > 0:
            book_type = ' '.join(types)
        else:
            book_type = 'NULL'
        return book_type

    def get_about(self, response):
        about = response.xpath('//p[@class="intro"]/text()').extract_first(default='')
        if len(about) > 0:
            about = about.strip()
        else:
            about = 'NULL'
        return about

    def get_score(self, response):

        def get_sc(id):
            # The rating comes from a separate AJAX endpoint, fetched with
            # requests (which must be imported at module level).
            urll = 'https://book.qidian.com/ajax/comment/index?_csrfToken=ziKrBzt4NggZbkfyUMDwZvGH0X0wtrO5RdEGbI9w&bookId=' + id + '&pageSize=15'
            rr = requests.get(urll)
            # print(rr)
            score = rr.text[16:19]
            return score

        bid = response.xpath('//a[@id="bookImg"]/@data-bid').extract_first(default='')  # book id
        if len(bid) > 0:
            score = get_sc(bid)  # fetch the rating; an integer rating may come back as 9,"
            if score[1:2] == ',':
                score = score.replace(',"', ".0")
        else:
            score = 'NULL'
        return score

    def get_story(self, response):
        story = response.xpath('//div[@class="book-intro"]/p/text()').extract_first(default='')
        if len(story) > 0:
            story = story.strip()
        else:
            story = 'NULL'
        return story

    def get_news(self, response):
        news = response.xpath('//div[@class="detail"]/p[@class="cf"]/a/text()').extract_first(default='')
        if len(news) > 0:
            news = news.strip()
        else:
            news = 'NULL'
        return news
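
The fixed slice `rr.text[16:19]` in `get_score` only works while the response keeps exactly the same character layout, which is why an integer rating shows up as `9,"`. A less fragile option is to parse the body as JSON. The sketch below assumes the endpoint returns JSON shaped like `{"data": {"rate": 9.1, ...}}`. That shape is inferred from the slice offsets and is not confirmed by the article, and `parse_score` is a hypothetical helper name.

```python
import json


def parse_score(body, default='NULL'):
    """Return the rating as a one-decimal string, or `default` on failure.

    Assumption: `body` is JSON text shaped like {"data": {"rate": 9.1}}.
    This layout is inferred, not documented by the site.
    """
    try:
        rate = json.loads(body)['data']['rate']
        # Format integers and floats alike, so 9 becomes "9.0".
        return '{:.1f}'.format(float(rate))
    except (ValueError, KeyError, TypeError):
        # Covers invalid JSON, a missing key, or an unexpected value type.
        return default
```

Inside `get_sc` this would replace the slice, for example `return parse_score(rr.text)`, and the `,"` to `.0` replacement would then no longer be needed.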
items.py:

    
    	
    # -*- coding: utf-8 -*-

    # Define here the models for your scraped items
    #
    # See documentation in:
    # https://doc.scrapy.org/en/latest/topics/items.html

    from scrapy import Field, Item


    class QdItem(Item):
        # define the fields for your item here like:

        book_name = Field()  # book title
        author = Field()  # author
        state = Field()  # status
        type = Field()  # genre
        about = Field()  # short introduction
        score = Field()  # rating
        story = Field()  # story synopsis
        news = Field()  # latest chapter
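
A Scrapy `Item` raises `KeyError` when a key that was not declared as a `Field` is assigned, so it can help to normalise scraped values before building the item. The following is a minimal sketch, assuming the spider first collects values in a plain dict. `FIELD_NAMES` and `normalise_record` are hypothetical names, and the field list is copied by hand from `QdItem` so the snippet runs without Scrapy installed.

```python
# Field names copied from QdItem; kept as a plain tuple (an assumption made
# for this sketch) so the code does not depend on Scrapy.
FIELD_NAMES = ('book_name', 'author', 'state', 'type',
               'about', 'score', 'story', 'news')


def normalise_record(raw, default='NULL'):
    """Keep only declared fields; fill missing or blank values with `default`."""
    record = {}
    for name in FIELD_NAMES:
        value = raw.get(name)
        if value is None or not str(value).strip():
            record[name] = default
        else:
            record[name] = str(value).strip()
    return record
```

The result can then be passed to the item constructor, for example `QdItem(**normalise_record(raw))`, which keeps the `'NULL'` placeholder convention used by the getter methods in one place.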

settings.py:

    
          
    
    
    
      
    