How Does a Python Crawler Traverse the Document Tree? A Quick Guide (4)

Output

		
								<generator object descendants at 0x00519AB0>

								<title>The Dormouse's story</title>

								The Dormouse's story

3. Node content: the .string attribute

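If a Tag has exactly one child and that child is a NavigableString, .string on the Tag returns that child. If a Tag has exactly one child and that child is itself a Tag, .string still works, and the result is the same as calling .string on that only child.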

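In plain terms: if a tag contains no other tags, .string returns the text inside it. If a tag contains exactly one other tag, .string returns the content of that inner tag. For example: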

		
								#!/usr/bin/python3

								# -*- coding:utf-8 -*-

								from bs4 import BeautifulSoup

								html = """

								<html><head><title>The Dormouse's story</title></head>

								<body>

								<p class="title" name="dromouse"><b>The Dormouse's story</b></p>

								<p class="story">Once upon a time there were three little sisters; and their names were

								<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,

								<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and

								<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;

								and they lived at the bottom of a well.</p>

								<p class="story">...</p>

								"""

								# Create the Beautiful Soup object, using the lxml parser

								soup = BeautifulSoup(html, "lxml")

								print(soup.head.string)

								print(soup.head.title.string)
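
The rule above can also be checked on a smaller document. The following is a minimal sketch, not part of the original example: the markup and the id values are made up for illustration, and it assumes the built-in "html.parser" instead of lxml so that no extra parser is needed. It also shows a third case, a tag with several children, where .string returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the ids are not from the article.
markup = (
    '<p id="text">plain text</p>'
    '<p id="nested"><b>bold text</b></p>'
    '<p id="mixed">some <b>bold</b> text</p>'
)

# Assumption: the standard-library parser is used here instead of lxml.
soup = BeautifulSoup(markup, "html.parser")

# Case 1: the only child is a NavigableString, so .string returns it.
text_only = soup.find(id="text").string

# Case 2: the only child is a Tag, so .string falls through to <b>.string.
nested = soup.find(id="nested").string

# Case 3: several children, so .string cannot pick one and returns None.
mixed = soup.find(id="mixed").string

print(text_only)  # plain text
print(nested)     # bold text
print(mixed)      # None
```

By the same rule, both print calls in the article's example should give The Dormouse's story, because <head> has <title> as its only child.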
