Web Scraping Study Notes (5): Scraping Weather Data in Practice (4)

The main method was modified slightly so that it fetches data for the whole country.

4. Sort the data to find the ten cities with the highest temperatures nationwide:

		
    # Sort by maximum temperature, ascending
    data.sort(key=lambda x: int(x['最高气温']))
    # The last ten entries are the ten hottest cities
    data_2 = data[-10:]

Note that the temperature must be converted to int before sorting; otherwise the values are compared as strings rather than as numbers.
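To see why the conversion matters, here is a minimal sketch with made-up sample rows (the city labels and temperatures are illustrative, not scraped data). Strings are compared character by character, so '9' ranks above '35'; sorting with int gives the numeric order the top-ten slice relies on.

```python
# Hypothetical sample rows, in the same shape that parse_data() appends to `data`.
rows = [
    {'城市': 'A', '最高气温': '9'},
    {'城市': 'B', '最高气温': '35'},
    {'城市': 'C', '最高气温': '-2'},
    {'城市': 'D', '最高气温': '28'},
]

# Sorting the raw strings compares them character by character.
by_string = sorted(rows, key=lambda x: x['最高气温'])
# Converting to int first gives numeric order.
by_number = sorted(rows, key=lambda x: int(x['最高气温']))

string_order = [r['最高气温'] for r in by_string]
number_order = [r['最高气温'] for r in by_number]
# With an ascending numeric sort, the hottest entries sit at the end of the list.
top_two = [r['城市'] for r in by_number[-2:]]

print(string_order)
print(number_order)
print(top_two)
```

Only the numeric order puts the hottest cities at the end of the list, which is what a slice such as data[-10:] assumes.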

5. Data visualization:

		
    citys = list(map(lambda x: x['城市'], data_2))      # x-axis: city names
    wendu = list(map(lambda x: x['最高气温'], data_2))  # y-axis: maximum temperatures
    charts = Bar('中国十大最高温度城市')  # chart title: "China's ten hottest cities"
    charts.add('', citys, wendu)
    charts.render('天气网.html')

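Using the Bar class:

  Bar(...) creates the chart; its argument is the chart title.

  add(...) adds a data series: (series name shown next to the series color, x-axis values, y-axis values).

  render(...) saves the chart to a local HTML file.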
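The calls described above follow the pyecharts 0.5.x interface. As a rough sketch only, and assuming pyecharts 1.0 or later is installed, the same chart could be built as shown below. The helper names chart_axes and render_bar are hypothetical, and the sample rows are made up.

```python
def chart_axes(rows):
    """Split rows into x-axis labels (cities) and y-axis values (temperatures as int)."""
    cities = [row['城市'] for row in rows]
    temps = [int(row['最高气温']) for row in rows]
    return cities, temps


def render_bar(rows, path='天气网.html'):
    """Render a bar chart with the pyecharts 1.0+ API (assumed to be installed)."""
    # Imported lazily so chart_axes() stays usable without pyecharts.
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    cities, temps = chart_axes(rows)
    chart = (
        Bar()
        .add_xaxis(cities)      # x-axis: city names
        .add_yaxis('', temps)   # one unnamed series of temperatures
        .set_global_opts(title_opts=opts.TitleOpts(title='中国十大最高温度城市'))
    )
    return chart.render(path)   # writes the HTML file and returns its path


# Hypothetical sample rows in the same shape as the scraped data.
sample = [{'城市': 'A', '最高气温': '30'}, {'城市': 'B', '最高气温': '33'}]
print(chart_axes(sample))
```

Calling render_bar(data_2) would then play the role of the Bar / add / render sequence described above.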

Result: opening 天气网.html in a browser shows the bar chart of the ten cities.

Complete code:

		
    # Data visualization (pyecharts 0.5.x import; from 1.0 on, Bar lives in pyecharts.charts)
    from pyecharts import Bar
    # HTTP requests
    import requests
    # HTML parsing
    from bs4 import BeautifulSoup

    # Holds the scraped rows
    data = []


    def parse_data(url):
        headers = {
            'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3741.400 QQBrowser/10.5.3863.400"
        }
        rest = requests.get(url=url, headers=headers)  # fetch the page
        # rest.text is the obvious choice, but it comes out garbled here,
        # so decode the raw bytes as UTF-8 explicitly before parsing
        text = rest.content.decode('utf-8')
        # BeautifulSoup needs the markup and the name of the parser to use
        soup = BeautifulSoup(text, 'html5lib')
        # Extract the data
        cons = soup.find('div', attrs={'class': 'conMidtab'})
        tables = cons.find_all('table')
        for table in tables:
            trs = table.find_all('tr')[2:]  # skip the two header rows
            for index, tr in enumerate(trs):
                # The first data row is read one column further to the right
                if index == 0:
                    tds = tr.find_all('td')[1]
                    qiwen = tr.find_all('td')[4]
                else:
                    tds = tr.find_all('td')[0]
                    qiwen = tr.find_all('td')[3]
                city = list(tds.stripped_strings)[0]
                wendu = list(qiwen.stripped_strings)[0]
                data.append({'城市': city, '最高气温': wendu})


    def main():
        urls = [
            "http://www.weather.com.cn/textFC/hb.shtml",
            "http://www.weather.com.cn/textFC/db.shtml",
            "http://www.weather.com.cn/textFC/hd.shtml",
            "http://www.weather.com.cn/textFC/hz.shtml",
            "http://www.weather.com.cn/textFC/hn.shtml",
            "http://www.weather.com.cn/textFC/xb.shtml",
            "http://www.weather.com.cn/textFC/xn.shtml",
            "http://www.weather.com.cn/textFC/gat.shtml"
        ]
        for url in urls:
            parse_data(url)
        # Sort by maximum temperature, ascending
        data.sort(key=lambda x: int(x['最高气温']))
        # The last ten entries are the ten hottest cities
        data_2 = data[-10:]
        # Data visualization
        citys = list(map(lambda x: x['城市'], data_2))      # x-axis: city names
        wendu = list(map(lambda x: x['最高气温'], data_2))  # y-axis: maximum temperatures
        charts = Bar('中国十大最高温度城市')
        charts.add('', citys, wendu)
        charts.render('天气网.html')


    if __name__ == '__main__':
        main()
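The index == 0 branch in parse_data shifts both column indices by one. A plausible reading, which the code itself does not state, is that the first data row of each table carries one extra leading cell (for example the province name). The sketch below reproduces that offset logic on plain lists of cell text; pick_city_and_high and the sample cells are hypothetical and only illustrate the indexing.

```python
def pick_city_and_high(cells, is_first_row):
    """Return (city, high_temperature) from one row's cell texts.

    Mirrors parse_data(): the first row is read at indices 1 and 4,
    every later row at indices 0 and 3.
    """
    offset = 1 if is_first_row else 0
    return cells[0 + offset], cells[3 + offset]


# Made-up rows: the first one has an extra leading cell, the later one does not.
first_row = ['Province', 'City1', 'Sunny', 'NW 3', '31', 'Cloudy', 'N 2', '20']
later_row = ['City2', 'Rain', 'S 2', '27', 'Rain', 'S 1', '19']

first_result = pick_city_and_high(first_row, is_first_row=True)
later_result = pick_city_and_high(later_row, is_first_row=False)
print(first_result)
print(later_result)
```

Expressing the shift as a single offset keeps the two branches from drifting apart if the column layout ever changes.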
