Python Meizitu Image Spider: A Hands-On Project for Beginners (2)
        driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
        for girl_url in self.girl_urls:
            driver.get(girl_url)
            time.sleep(3)
            # Click the "all" control so the full gallery loads on one page
            driver.find_element_by_xpath('//em[@class="ch all"]').click()
            time.sleep(3)
            html = driver.page_source
            selector = etree.HTML(html)
            self.girl_name = selector.xpath('//div[@class="article"]/h2/text()')[0]
            self.pic_urls = selector.xpath('//div[@id="content"]/img/@data-img')
            try:
                self.download_pic()
            except Exception as e:
                print("Failed to save {}: {}".format(self.girl_name, e))
    # Download all images of the current gallery
    def download_pic(self):
        try:
            os.mkdir(PICTURES_PATH)
        except FileExistsError:
            pass
        girl_path = PICTURES_PATH + self.girl_name
        try:
            os.mkdir(girl_path)
        except FileExistsError:
            print("{} already exists".format(self.girl_name))
        img_name = 0
        for pic_url in self.pic_urls:
            img_name += 1
            img_data = requests.get(pic_url, headers=headers)
            pic_path = girl_path + '/' + str(img_name) + '.jpg'
            if os.path.isfile(pic_path):
                print("Image {} of {} already exists, skipping".format(img_name, self.girl_name))
            else:
                # The with block closes the file automatically, no explicit close needed
                with open(pic_path, 'wb') as f:
                    f.write(img_data.content)
                    print("Saving image {} of {}".format(img_name, self.girl_name))
    # Start the spider: call each step of the crawl in order
    def start(self):
        self.get_page_urls()
        self.get_girl_urls()
        self.get_pic_urls()
# main entry point
if __name__ == '__main__':
    page_num = input("Enter the page number: ")
    mmjpg_spider = Spider(page_num)
    mmjpg_spider.start()
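As a side note, the try/`os.mkdir`/except pattern used in `download_pic` can be expressed more compactly with `os.makedirs(..., exist_ok=True)`, which creates intermediate directories and never raises when the path already exists. A minimal stdlib-only sketch (the `pictures/` base directory here is a hypothetical stand-in for `PICTURES_PATH`):

```python
import os
import tempfile

# Hypothetical base directory standing in for PICTURES_PATH
base = os.path.join(tempfile.mkdtemp(), 'pictures/')
girl_path = base + 'sample-gallery'

# makedirs with exist_ok=True creates intermediate directories and
# does not raise if the path already exists, so no try/except is needed
os.makedirs(girl_path, exist_ok=True)
os.makedirs(girl_path, exist_ok=True)  # second call is a harmless no-op

print(os.path.isdir(girl_path))  # True
```

This keeps the "create if missing, ignore if present" intent in a single call instead of two try/except blocks.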
Now you can crawl the galleries at a leisurely pace. It's best to call time.sleep() for a few seconds between requests: requests that come too frequently have some chance of being flagged as a crawler. I haven't tested this myself, but I still recommend it, because overly frequent requests put real strain on the server. Given how generously the site serves its images, crawl slowly...
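The throttling advice above can be wrapped in a small helper that sleeps a random interval between requests, so the access pattern looks less mechanical. `polite_sleep` and its default 2-5 second range are illustrative choices, not part of the original spider:

```python
import random
import time

def polite_sleep(low=2.0, high=5.0):
    """Sleep for a random number of seconds between `low` and `high`.

    A randomized delay between requests is gentler on the server and
    less bot-like than a fixed interval. The 2-5 second default range
    is an arbitrary illustrative choice. Returns the delay used.
    """
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay

# Example: pause between two hypothetical page requests
# (a tiny range is used here purely so the demo finishes quickly)
d = polite_sleep(0.01, 0.02)
print(round(d, 2))
```

In the spider above, a call like `polite_sleep()` could replace the fixed `time.sleep(3)` between page loads.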