How Does a Python Web Scraper Traverse the Document Tree? A Quick Guide (2)


.children

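`.children` does not return a list. It returns an iterator over the tag's direct child nodes, so we loop over it to get all of them.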
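
To make the "not a list" point concrete, here is a minimal sketch. It assumes `bs4` is installed and uses Python's built-in `html.parser` instead of lxml so it runs without extra packages; the tiny HTML string is a made-up example. It shows that `.children` has no length or indexing until you turn it into a list, that `.contents` already holds the same children as a real list, and that an iterator is used up after one pass.

```python
from bs4 import BeautifulSoup

# A small made-up document, parsed with Python's built-in html.parser
html = "<html><head><title>The Dormouse's story</title><meta charset='utf-8'></head></html>"
soup = BeautifulSoup(html, "html.parser")

# .children is an iterator; the exact type printed depends on the bs4 version
children = soup.head.children
print(type(children))

# Materialize it with list() when you need len() or indexing
child_list = list(soup.head.children)
print(len(child_list))         # 2
print(child_list[0].name)      # title

# .contents already holds the same direct children as a real list
print(soup.head.contents == child_list)  # True

# An iterator can only be consumed once
it = soup.head.children
first_pass = [c.name for c in it]
second_pass = [c.name for c in it]
print(first_pass, second_pass)  # ['title', 'meta'] []
```

If you need to walk the children more than once, either use `.contents` or access `.children` again to get a fresh iterator.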

```python
#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Create a BeautifulSoup object using the lxml parser (the lxml package must be installed)
soup = BeautifulSoup(html, "lxml")

# Printing .children shows an iterator object, not a list
print(soup.head.children)

# Iterate over it to get every direct child node of <head>
for child in soup.head.children:
    print(child)
```
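
Besides tags, `.children` also yields the text between tags as `NavigableString` objects, which matters as soon as you iterate over a tag with mixed content such as the `story` paragraph above. The sketch below is a minimal illustration, assuming `bs4` is installed; it reuses the paragraph from the example but parses it with the built-in `html.parser`, and keeps only element children by checking `isinstance(child, Tag)`.

```python
from bs4 import BeautifulSoup, Tag

html = """
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""

soup = BeautifulSoup(html, "html.parser")
story = soup.find("p", class_="story")

# Every direct child of <p>: text nodes between the links appear as NavigableString
for child in story.children:
    print(type(child).__name__, repr(child))

# Keep only element children (the three <a> tags)
links = [child for child in story.children if isinstance(child, Tag)]
print([a["id"] for a in links])  # ['link1', 'link2', 'link3']
```

Note that the `<!-- Elsie -->` comment is a child of the first `<a>` tag, not of `<p>`, so it does not show up in this loop; `.children` only goes one level deep.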
