Python 3 Standard Library: xml.etree.ElementTree XML Manipulation API (4)


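# target is the PodcastListToCSV instance created earlier in this example.
parser = XMLParser(target=target)

# Feed the file to the parser one line at a time.
with open('podcasts.opml', 'rt') as f:
    for line in f:
        parser.feed(line)
parser.close()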

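PodcastListToCSV implements the TreeBuilder protocol. Each time a new XML tag is encountered, start() is called with the tag name and attributes. When a closing tag is seen, end() is called with that tag name. In between, data() is called whenever a node has content (the tree builder is expected to keep track of the "current" node). When all of the input has been processed, close() is called. It can return a value, which XMLParser.close() passes back to the caller.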
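The protocol described above can be sketched with a much smaller target. This is only an illustration: the TagCounter class and the inline XML are hypothetical and are not part of the podcast example. It shows that XMLParser calls start(), data(), and end() as input arrives, and that the value returned by the target's close() comes back from parser.close().

```python
from xml.etree.ElementTree import XMLParser


class TagCounter:
    """Hypothetical parser target: counts tags and collects text."""

    def __init__(self):
        self.counts = {}
        self.text = []

    def start(self, tag, attrib):
        # Called for every opening tag.
        self.counts[tag] = self.counts.get(tag, 0) + 1

    def end(self, tag):
        # Called for every closing tag; nothing to do here.
        pass

    def data(self, data):
        # Called with the text content found between tags.
        if data.strip():
            self.text.append(data.strip())

    def close(self):
        # The return value is handed back by XMLParser.close().
        return self.counts, self.text


parser = XMLParser(target=TagCounter())
# Input can be fed incrementally, in arbitrary chunks.
for chunk in ['<root><item>a</item>', '<item>b</item></root>']:
    parser.feed(chunk)
counts, text = parser.close()
print(counts)
print(text)
```

Because the target never builds a tree, this style of parsing suits large inputs where only a summary of the document is needed.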

1.7 Building Documents With Element Nodes

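In addition to its parsing capabilities, xml.etree.ElementTree supports creating well-formed XML documents from Element objects constructed in an application. The Element class used when a document is parsed also knows how to generate a serialized form of its contents, which can then be written to a file or other data stream.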
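As a minimal sketch of writing serialized content to a data stream, the snippet below wraps a small tree in an ElementTree and writes it to an in-memory io.BytesIO buffer. The element names (feed, entry) are made up for illustration; a file opened in binary mode, or a filename, could be passed to write() in place of the buffer.

```python
import io
from xml.etree.ElementTree import Element, SubElement, ElementTree

# Hypothetical two-node document.
root = Element('feed')
entry = SubElement(root, 'entry', {'id': '1'})
entry.text = 'first'

# write() accepts a filename or a file-like object opened for writing.
buffer = io.BytesIO()
ElementTree(root).write(buffer)

print(buffer.getvalue())
```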

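Three helper functions are useful when creating a hierarchy of Element nodes. Element() creates a standard node, SubElement() attaches a new node to a parent, and Comment() creates a node that is serialized using XML comment syntax.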

from xml.etree.ElementTree import Element, SubElement, Comment, tostring

top = Element('top')

# A comment node is appended like any other child.
comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

# tail is the text that follows the element's closing tag.
child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has text.'
child_with_tail.tail = 'And "tail" text.'

# The ampersand is escaped as an entity reference on output.
child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

print(tostring(top))

The output contains only the XML nodes in the tree, not the XML declaration with version and encoding.
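When a declaration is wanted, one option is to serialize through ElementTree.write() with xml_declaration=True. The sketch below uses a hypothetical single-node tree and an in-memory buffer to compare the two forms.

```python
import io
from xml.etree.ElementTree import Element, ElementTree, tostring

top = Element('top')

# Default serialization: nodes only, no declaration.
plain = tostring(top)

# Ask write() explicitly for the declaration.
buffer = io.BytesIO()
ElementTree(top).write(buffer, encoding='utf-8', xml_declaration=True)
with_declaration = buffer.getvalue()

print(plain)
print(with_declaration)
```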

1.8 Pretty-Printing XML

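ElementTree makes no effort to format the output of tostring() for readability, because adding extra whitespace changes the contents of the document. To make the output easier to read, the remaining examples parse the XML with xml.dom.minidom and then use its toprettyxml() method.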
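The claim that extra whitespace changes the document can be checked directly. In this sketch (the tree is a hypothetical two-node example), the pretty-printed text is parsed back with fromstring(), and the root element that originally had no text now carries the indentation whitespace as its text.

```python
from xml.dom import minidom
from xml.etree.ElementTree import Element, SubElement, fromstring, tostring

top = Element('top')
SubElement(top, 'child').text = 'text'

# Pretty-print through minidom, then parse the result again.
pretty = minidom.parseString(tostring(top)).toprettyxml(indent='  ')
reparsed = fromstring(pretty)

print(repr(top.text))       # the original root has no text
print(repr(reparsed.text))  # the round-tripped root holds whitespace
```

For applications where whitespace is significant, pretty-printed output is therefore best treated as a display format rather than as the stored document.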

from xml.etree import ElementTree
from xml.dom import minidom
from xml.etree.ElementTree import Element, SubElement, Comment


def prettify(elem):
    """
    Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=" ")


top = Element('top')

comment = Comment('Generated for PyMOTW')
top.append(comment)

child = SubElement(top, 'child')
child.text = 'This child contains text.'

child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has text.'
child_with_tail.tail = 'And "tail" text.'

child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'

print(prettify(top))

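The output becomes easier to read.

In addition to the extra whitespace for formatting, the xml.dom.minidom pretty-printer also adds an XML declaration to the output.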
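If the declaration is not wanted, one possible workaround is to call toprettyxml() on the document's root element instead of on the document itself. This is a sketch with a hypothetical one-child tree, not part of the original example.

```python
from xml.dom import minidom
from xml.etree.ElementTree import Element, SubElement, tostring

top = Element('top')
SubElement(top, 'child').text = 'This child contains text.'

document = minidom.parseString(tostring(top, 'utf-8'))

# Pretty-printing the Document node emits the XML declaration first.
with_declaration = document.toprettyxml(indent='  ')

# Pretty-printing only the root element leaves the declaration out.
without_declaration = document.documentElement.toprettyxml(indent='  ')

print(with_declaration)
print(without_declaration)
```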
