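Suppose we have the following XML file:
Below are a few ways to parse an XML file in Python, each using a module from the standard library.
Method 1: a Python module (xml.sax) that automatically walks through every node: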
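#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 syntax (print statements).
from xml.sax.handler import ContentHandler
from xml.sax import parse

class TestHandle(ContentHandler):
    """SAX handler that prints every event and collects character data."""
    def __init__(self, inlist):
        ContentHandler.__init__(self)
        self.inlist = inlist

    def startElement(self, name, attrs):
        # Called at every opening tag.
        print 'name:', name, 'attrs:', attrs.keys()

    def endElement(self, name):
        # Called at every closing tag.
        print 'endname', name

    def characters(self, chars):
        # Called for text, including whitespace between tags.
        print 'chars', chars
        self.inlist.append(chars)

if __name__ == '__main__':
    lt = []
    parse('test.xml', TestHandle(lt))
    print lt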
Output:
name: root attrs: []
chars
name: childs attrs: []
chars
name: child attrs: [u'name']
chars 1
endname child
chars
name: child attrs: [u'value']
chars 2
endname child
chars
endname childs
chars
endname root
[u'\n', u'\n', u'1', u'\n', u'2', u'\n', u'\n']
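As the output above shows, characters() also fires for the line breaks between tags, so the collected list is mostly whitespace. Here is a minimal sketch of one way to keep only the real text of each element, written for Python 3. The input string is hypothetical: it mirrors the element names seen in the output, but the attribute values "a" and "b" and the class name TextCollector are made up for illustration.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Hypothetical input shaped like the element names in the output above;
# the attribute values "a" and "b" are invented for this sketch.
SAMPLE = b"""<root>
<childs>
<child name="a">1</child>
<child value="b">2</child>
</childs>
</root>"""


class TextCollector(ContentHandler):
    """Collect (element name, text) pairs, skipping whitespace-only text."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.texts = []     # (element name, stripped text) pairs
        self._buffer = []   # text chunks seen since the last tag

    def startElement(self, name, attrs):
        # Discard whitespace that preceded this opening tag.
        self._buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        self._buffer.append(content)

    def endElement(self, name):
        text = "".join(self._buffer).strip()
        if text:
            self.texts.append((name, text))
        self._buffer = []


handler = TextCollector()
parseString(SAMPLE, handler)
print(handler.texts)
```

Buffering the chunks and joining them at endElement avoids relying on characters() being called exactly once per text node, which SAX does not guarantee.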
Method 2: a Python module that fetches the root node and then looks up specific nodes on demand:
Output:
Dom:
<?xml version="1.0" ?><hash>
<request name="first">/2/photos/square/type.xml</request>
<error_code>21301</error_code>
<error>auth faild!</error>
</hash>
root:
<hash>
<request name="first">/2/photos/square/type.xml</request>
<error_code>21301</error_code>
<error>auth faild!</error>
</hash>
<request name="first">/2/photos/square/type.xml</request>
child node attribute name: first
child node name: request
child node len: 1
child data: /2/photos/square/type.xml
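A lookup that prints this kind of information can be sketched with xml.dom.minidom. This is an assumption about the approach, not the original listing: the XML text is copied from the "Dom:" output above, the variable names are made up, and the code is written for Python 3.

```python
from xml.dom import minidom

# XML text copied from the "Dom:" output above.
XML_TEXT = """<?xml version="1.0" ?><hash>
<request name="first">/2/photos/square/type.xml</request>
<error_code>21301</error_code>
<error>auth faild!</error>
</hash>"""

dom = minidom.parseString(XML_TEXT)
root = dom.documentElement  # the <hash> element

# Look up only the node we need instead of walking the whole tree.
request = root.getElementsByTagName("request")[0]

print("Dom:")
print(dom.toxml())
print("root:")
print(root.toxml())
print(request.toxml())
print("child node attribute name: " + request.getAttribute("name"))
print("child node name: " + request.nodeName)
print("child node len: %d" % len(request.childNodes))
print("child data: " + request.firstChild.data)
```

Unlike the SAX handler in method 1, minidom builds the whole tree in memory first, which is what makes this kind of on-demand lookup possible.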
=======================================
For more help, see:
Each of the two approaches has its own advantages. Python has a great many XML-processing modules; so far I have only used these two.
=====Addendum================
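In real work I found that Python's minidom could not parse XML in other encodings. It only handled UTF-8 content, and the encoding declared in the XML header also had to be utf-8; any other encoding raised an error.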
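The workarounds found online all do the same thing: replace the encoding declaration in the XML header, convert the content to UTF-8, and then parse it with minidom. I tested this and it works, although it feels a bit clumsy.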
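The workaround described above can be sketched as follows, written for Python 3. The helper name parse_with_encoding, the sample document, and the choice of GBK as the source encoding are all assumptions made for illustration. The regular expression only handles a declaration at the very start of the text.

```python
import re
from xml.dom import minidom


def parse_with_encoding(raw_bytes, source_encoding):
    """Re-encode XML bytes as UTF-8, fix the header, then parse with minidom.

    Hypothetical helper: the caller must already know the source encoding.
    """
    text = raw_bytes.decode(source_encoding)
    # Rewrite only the encoding attribute inside the XML declaration.
    text = re.sub(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
                  r'\1\2utf-8\2', text, count=1)
    return minidom.parseString(text.encode("utf-8"))


# Made-up sample: a GBK-encoded document whose header declares gbk.
raw = (u'<?xml version="1.0" encoding="gbk"?>'
       u'<root><name>\u4e2d\u6587</name></root>').encode("gbk")

doc = parse_with_encoding(raw, "gbk")
name_text = doc.getElementsByTagName("name")[0].firstChild.data
print(name_text)
```

Decoding first and rewriting the declaration afterwards keeps the header and the actual bytes consistent, which is what the parser checks.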