First Crawler and Tests (2)


The results are as follows:

The tests therefore show that the code is correct.

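2. Use the get() function of the requests library to visit the Bing/Sogou homepage (the code below uses Sogou) 20 times. Print the returned status and the text content, and compute the length of the page content returned by the text attribute and by the content attribute.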

For more on the requests library, see the following link:

https://www.cnblogs.com/deng11/p/12863994.html
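Why the two lengths usually differ: r.content holds the raw response bytes, while r.text is those bytes decoded into a str using r.encoding. For a UTF-8 page, each Chinese character counts as one character in text but three bytes in content, so content is normally the longer of the two. The offline sketch below reproduces that relationship with plain str/bytes operations instead of a live request; the helper name text_and_content_lengths and the sample fragments are hypothetical.

```python
from typing import Tuple


def text_and_content_lengths(body: str, encoding: str = "utf-8") -> Tuple[int, int]:
    """Mimic (len(r.text), len(r.content)) for a page body (hypothetical helper).

    raw plays the role of Response.content (bytes);
    decoded plays the role of Response.text (str decoded with r.encoding).
    """
    raw = body.encode(encoding)
    decoded = raw.decode(encoding)
    return len(decoded), len(raw)


# Hypothetical fragments: each Chinese character takes 3 bytes in UTF-8.
print(text_and_content_lengths("<title>搜狗搜索引擎</title>"))  # (21, 33)
print(text_and_content_lengths("<p>hello</p>"))                # (12, 12)
```

If r.encoding is set to a codec that does not match the page, len(r.text) may change (and the characters may come out garbled), while len(r.content) stays the same, because content never depends on decoding.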

			
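    import requests

    for i in range(20):
        r = requests.get("https://www.sogou.com", timeout=30)  # the URL can be swapped for another site
        r.raise_for_status()   # raise an HTTPError if the status code is 4xx/5xx
        r.encoding = 'utf-8'   # decode r.text as UTF-8
        print('status={}'.format(r.status_code))
        print(r.text)
        # len(r.text): number of decoded characters; len(r.content): number of raw bytes
        print('text length: {}, content length: {}'.format(len(r.text), len(r.content)))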

The output is as follows (only one of the 20 runs is shown; the text content is too long to display, so it is omitted):

3. Save the given HTML page as a string and complete the following tasks:

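(1) Print the content of the head tag and the last two digits of your student ID.

(2) Get the content of the body tag.

(3) Get the tag object whose id is first.

(4) Extract and print the Chinese characters in the HTML page.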

			
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8">
    <title>菜鸟教程(runoob.com)</title>
    </head>
    <body>
    <h1>我的第一个标题</h1>
    <p id="first">我的第一个段落。</p>
    </body>
        <table border="1">
            <tr>
                <td>row 1, cell 1</td>
                <td>row 1, cell 2</td>
            </tr>
        </table>
    </html>
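A minimal standard-library-only sketch of the four tasks, assuming the page is held in a Python string named HTML (a hypothetical name) and that regular expressions are acceptable for this one small, fixed page. A general-purpose solution would normally use an HTML parser such as BeautifulSoup (soup.head, soup.body, soup.find(id="first")) instead. STUDENT_ID_LAST_TWO is a hypothetical placeholder for your own digits. For task (4), "Chinese characters" is taken to mean the CJK Unified Ideographs block U+4E00 to U+9FFF, so the full-width full stop 。 is not counted.

```python
import re

# The exercise page saved as a string (copied verbatim; note that the
# <table> sits after </body> in the given markup).
HTML = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>菜鸟教程(runoob.com)</title>
</head>
<body>
<h1>我的第一个标题</h1>
<p id="first">我的第一个段落。</p>
</body>
    <table border="1">
        <tr>
            <td>row 1, cell 1</td>
            <td>row 1, cell 2</td>
        </tr>
    </table>
</html>"""

# Hypothetical placeholder: replace with the last two digits of your student ID.
STUDENT_ID_LAST_TWO = "00"


def inner(tag: str, html: str) -> str:
    """Return the markup between the first <tag ...> and its </tag>, or ''."""
    m = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", html, re.S)
    return m.group(1) if m else ""


def element_by_id(element_id: str, html: str) -> str:
    """Return the whole element (open tag to close tag) with the given id, or ''."""
    pattern = rf'<(\w+)\b[^>]*\bid="{re.escape(element_id)}"[^>]*>.*?</\1>'
    m = re.search(pattern, html, re.S)
    return m.group(0) if m else ""


def chinese_chars(html: str) -> str:
    """Concatenate every CJK Unified Ideograph (U+4E00..U+9FFF) in the text."""
    return "".join(re.findall(r"[\u4e00-\u9fff]", html))


if __name__ == "__main__":
    print(inner("head", HTML), STUDENT_ID_LAST_TWO)  # task (1)
    print(inner("body", HTML))                       # task (2)
    print(element_by_id("first", HTML))              # task (3)
    print(chinese_chars(HTML))                       # task (4)
```

The regular expressions are only dependable here because the markup is fixed and simple (no nested tags with the same name, attribute values in double quotes); for arbitrary pages, a real HTML parser is the safer choice.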
