My First Crawler and Tests (4)


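To save the scraped data as a CSV file, you only need to replace the printUni() function with one that writes the rows to disk.

The modified code is as follows: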

import requests
from bs4 import BeautifulSoup
import csv
import os
 
ALL = []
def getHTMLtext(url):
    # Fetch the page and return its text; return "" if the request fails.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        return ""
 
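def fillUni(soup):
    # Collect the cell text of every table row into the global ALL list.
    data = soup.find_all('tr')
    for tr in data:
        td1 = tr.find_all('td')
        # Rows without <td> cells (for example, header rows) are skipped.
        if len(td1) == 0:
            continue
        Single = []
        for td in td1:
            # td.string is None when a cell contains more than one child node.
            Single.append(td.string)
        ALL.append(Single)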
        
 
def writercsv(save_road, num, title):
    # If the file already exists, append rows; otherwise create it and
    # write the header row first. Only the first num rows of ALL are saved.
    if os.path.isfile(save_road):
        with open(save_road, 'a', newline='') as f:
            csv_write = csv.writer(f, dialect='excel')
            for i in range(num):
                u = ALL[i]
                csv_write.writerow(u)
    else:
        with open(save_road, 'w', newline='') as f:
            csv_write = csv.writer(f, dialect='excel')
            csv_write.writerow(title)
            for i in range(num):
                u = ALL[i]
                csv_write.writerow(u)
 
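# Column headers for the CSV file, kept in Chinese to match the source table.
title=["排名","学校名称","省市","总分","生源质量","培养结果","科研规模","科研质量","顶尖成果","顶尖人才","科技服务","产学研究合作","成果转化"]
# Output path; change this to a folder that exists on your own machine.
save_road="C:\\Users\\邓若言\\Desktop\\html.csv"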
 
def main(num):
    url = "http://www.zuihaodaxue.com/zuihaodaxuepaiming2016.html"
    html = getHTMLtext(url)
    soup = BeautifulSoup(html,"html.parser")
    fillUni(soup)
    writercsv(save_road,num,title)
 
main(10)
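
The "write the header only once" branch in writercsv() can be tried without any network access. The following is a minimal sketch under stated assumptions: append_rows and read_rows are hypothetical helper names, the sample rows are made up, and the file goes to a temporary directory instead of the Desktop path above. The explicit encoding="utf-8" is also an assumption and is not part of the original code.

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only when the file is new.

    Hypothetical helper for illustration; it mirrors the if/else in writercsv().
    """
    is_new = not os.path.isfile(path)
    # Mode 'a' creates the file when it is missing, so one open() covers both cases.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, dialect="excel")
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read the whole CSV file back as a list of rows (lists of strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Made-up sample data, not real ranking results.
demo_header = ["rank", "name", "score"]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# First call creates the file and writes the header; second call only appends.
append_rows(demo_path, demo_header, [["1", "University A", "95.0"]])
append_rows(demo_path, demo_header, [["2", "University B", "90.5"]])

result = read_rows(demo_path)
print(result)
```

Checking whether the file exists before opening it keeps the header from being repeated on each run, which is the same reason writercsv() branches on os.path.isfile().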
