Parsing XML with Python

2014-12-01 14:11:14 / Category: python


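Python offers three ways to parse XML: SAX, DOM, and ElementTree.
###1.SAX (Simple API for XML)
       The Python standard library includes a SAX parser. SAX is typically very fast and does not use much memory while parsing XML.
However, it is based on callbacks: as it reads the data, it calls handler methods to pass each piece along. This means you must write handlers for the data
and keep track of the parsing state yourself, which can be quite difficult.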
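
The callback style described above can be sketched roughly as follows. This is only a minimal illustration using the standard xml.sax module; the EmailHandler class and the inline XML bytes are made up for this example, and the in_email flag is the kind of state the handler has to maintain by itself.

```python
import xml.sax

# Illustrative sample data (a shortened version of the users document)
XML = b"""<users>
    <user id="1000001"><email>admin@live.cn</email></user>
    <user id="1000002"><email>admin2@live.cn</email></user>
</users>"""


class EmailHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <email> element."""

    def __init__(self):
        super().__init__()
        self.in_email = False  # state: are we currently inside <email>?
        self.buffer = []       # text chunks of the current <email>
        self.emails = []       # finished results

    def startElement(self, name, attrs):
        if name == "email":
            self.in_email = True
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks, so accumulate
        if self.in_email:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "email":
            self.emails.append("".join(self.buffer))
            self.in_email = False


handler = EmailHandler()
xml.sax.parseString(XML, handler)
print(handler.emails)
```

Nothing is kept in memory except what the handler chooses to store, which is where the low memory use comes from.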


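###2.DOM (Document Object Model)
       Compared with SAX, DOM's typical drawbacks are that it is slower and uses more memory, because DOM reads the whole XML tree into memory and creates an object for
every node in the tree. The advantage of DOM is that you do not need to track state, because every node knows which node is its
parent and which nodes are its children. The DOM API, however, is somewhat cumbersome to use.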
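
For comparison, here is a small sketch of the DOM approach using xml.dom.minidom from the standard library. The inline XML string is an illustrative sample, not the file used later in this post. Note how the text has to be fetched from a separate child text node, which is part of what makes the API feel verbose.

```python
from xml.dom import minidom

# Illustrative sample data
XML = """<users>
    <user id="1000001"><username>Admin</username></user>
    <user id="1000002"><username>Admin2</username></user>
</users>"""

doc = minidom.parseString(XML)  # the whole tree is now in memory

names = []
for user in doc.getElementsByTagName("user"):
    username = user.getElementsByTagName("username")[0]
    text = username.firstChild.data           # text lives in a child text node
    parent_tag = username.parentNode.tagName  # every node knows its parent
    names.append((user.getAttribute("id"), text, parent_tag))

print(names)
```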


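###3.ElementTree
     ElementTree is like a lightweight DOM with a convenient, friendly API. The code is easy to work with, and it is fast and uses little memory. The rest of this post
focuses on ElementTree.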


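1. Loading XML

There are two ways to load XML: from a string, or from a file
  • ElementTree.fromstring(text) parses a string and returns the root Element
  • ElementTree.parse(source) parses a file and returns an ElementTree object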
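2. Ways to get elements

  a) getiterator (deprecated since 2.7; use iter() instead)

  b) getchildren (deprecated since 2.7; use list(elem) or iterate over the element)

  c) find

  d) findall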
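
A brief sketch of these lookups on an in-memory tree. The inline XML is a shortened sample, and the non-deprecated iter() and plain iteration are used in place of getiterator() and getchildren().

```python
from xml.etree import ElementTree as ET

# Illustrative sample data
root = ET.fromstring("""<users>
    <user id="1000001"><username>Admin</username><age>23</age></user>
    <user id="1000002"><username>Admin2</username><age>22</age></user>
</users>""")

first_user = root.find("user")                  # first matching direct child
all_users = root.findall("user")                # list of matching direct children
all_tags = [elem.tag for elem in root.iter()]   # root and every descendant
children = [child.tag for child in first_user]  # direct children only

print(first_user.get("id"))
print(len(all_users))
print(all_tags)
print(children)
```

find and findall only look at direct children unless a path is given, while iter walks the whole subtree.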


XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<users>
    <user id="1000001">
        <username>Admin</username>
        <email>admin@live.cn</email>
        <age>23</age>
        <sex>男</sex>
    </user>
    <user id="1000002">
        <username>Admin2</username>
        <email>admin2@live.cn</email>
        <age>22</age>
        <sex>男</sex>
    </user>

</users>


Example:

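# coding=utf-8
from xml.etree import ElementTree as ET


if __name__ == '__main__':
    # Raw string, so the backslash in the Windows path is never read as an escape
    tree = ET.parse(r"D:\consumer.xml")
    # Use findall/find to get the value of a specific tag
    users = tree.findall('user')
    for user in users:
        print('email', user.find('email').text)
    # Print every tag and its value under each <user>.
    # iter() and plain iteration replace getiterator()/getchildren(),
    # which are deprecated since 2.7 and removed in Python 3.9.
    for user in tree.iter('user'):
        for child in user:
            print(child.tag, ':', child.text)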




Attributes and methods of Element:
tag
A string identifying what kind of data this element represents (the element type, in other words).
text
The text attribute can be used to hold additional data associated with the element. As the name implies this attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found between the element tags.
tail
The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found after the element’s end tag and before the next tag.
attrib
A dictionary containing the element’s attributes. Note that while the attrib value is always a real mutable Python dictionary, an ElementTree implementation may choose to use another internal representation, and create the dictionary only if someone asks for it. To take advantage of such implementations, use the dictionary methods below whenever possible.

The following dictionary-like methods work on the element attributes.

clear()
Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.
get(key,default=None)

Gets the element attribute named key.

Returns the attribute value, or default if the attribute was not found.

items()
Returns the element attributes as a sequence of (name, value) pairs. The attributes are returned in an arbitrary order.
keys()
Returns the element’s attribute names as a list. The names are returned in an arbitrary order.
set(key,value)
Set the attribute key on the element to value.
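
A small sketch of the attributes and dictionary-like methods listed above. The one-line XML string and the role attribute are illustrative assumptions chosen so that text, tail and attrib are all non-empty.

```python
from xml.etree import ElementTree as ET

# Illustrative sample: the newline after </user> becomes the element's tail
root = ET.fromstring('<users><user id="1000001">Admin</user>\n</users>')
user = root.find("user")

print(user.tag)     # element type
print(user.text)    # text between the start and end tags
print(user.attrib)  # attributes as a dictionary

missing = user.get("role", "guest")  # default is returned when the key is absent
user.set("role", "admin")            # add or overwrite an attribute
names = sorted(user.keys())          # order is arbitrary, so sort for display
pairs = sorted(user.items())

tail_before_clear = user.tail
user.clear()  # drops children and attributes, resets text and tail to None
```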

The following methods work on the element’s children (subelements).

append(subelement)
Adds the element subelement to the end of this element’s internal list of subelements.
extend(subelements)

Appends subelements from a sequence object with zero or more elements. Raises AssertionError if a subelement is not a valid object.

New in version 2.7.

find(match)
Finds the first subelement matching match. match may be a tag name or path. Returns an element instance or None.
findall(match)
Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.
findtext(match,default=None)
Finds text for the first subelement matching match. match may be a tag name or path. Returns the text content of the first matching element, or default if no element was found. Note that if the matching element has no text content an empty string is returned.
getchildren()

Deprecated since version 2.7: Use list(elem) or iteration.

getiterator(tag=None)

Deprecated since version 2.7: Use method Element.iter() instead.

insert(index,element)
Inserts a subelement at the given position in this element.
iter(tag=None)
Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not None or '*', only elements whose tag equals tag are returned from the iterator. If the tree structure is modified during iteration, the result is undefined.
iterfind(match)

Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order.

New in version 2.7.

itertext()

Creates a text iterator. The iterator loops over this element and all subelements, in document order, and returns all inner text.

New in version 2.7.

makeelement(tag,attrib)
Creates a new element object of the same type as this element. Do not call this method, use the SubElement() factory function instead.
remove(subelement)
Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.
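
The child-related methods above can be tried on a tree built by hand. This is only a sketch; the element names and ids are invented for the example.

```python
from xml.etree import ElementTree as ET

root = ET.Element("users")

# SubElement() creates a child and appends it in one step
a = ET.SubElement(root, "user", {"id": "1"})
a.text = "Admin"

# append() adds an existing element at the end
b = ET.Element("user", {"id": "2"})
b.text = "Admin2"
root.append(b)

# insert() places an element at a given position
c = ET.Element("user", {"id": "0"})
c.text = "Root"
root.insert(0, c)

# extend() appends several elements at once
root.extend([ET.Element("group"), ET.Element("group")])

ids = [u.get("id") for u in root.iterfind("user")]
first_text = root.findtext("user")                   # text of first <user>
fallback = root.findtext("missing", default="n/a")   # no such element
empty = root.findtext("group")                       # found, but has no text
all_text = "".join(root.itertext())

# remove() works on the exact instance, not on a lookalike
root.remove(c)
remaining = [child.tag for child in root]
```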
Attributes and methods of ElementTree:
_setroot(element)
Replaces the root element for this tree. This discards the current contents of the tree, and replaces it with the given element. Use with care. element is an element instance.
find(match)
Finds the first toplevel element matching match. match may be a tag name or path. Same as getroot().find(match). Returns the first matching element, or None if no element was found.
findall(match)
Finds all matching subelements, by tag name or path. Same as getroot().findall(match). match may be a tag name or path. Returns a list containing all matching elements, in document order.
findtext(match,default=None)
Finds the element text for the first toplevel element with given tag. Same as getroot().findtext(match). match may be a tag name or path. default is the value to return if the element was not found. Returns the text content of the first matching element, or the default value if no element was found. Note that if the element is found, but has no text content, this method returns an empty string.
getiterator(tag=None)

Deprecated since version 2.7: Use method ElementTree.iter() instead.

getroot()
Returns the root element for this tree.
iter(tag=None)
Creates and returns a tree iterator for the root element. The iterator loops over all elements in this tree, in section order. tag is the tag to look for (default is to return all elements).
iterfind(match)

Finds all matching subelements, by tag name or path. Same as getroot().iterfind(match). Returns an iterable yielding all matching elements in document order.

New in version 2.7.

parse(source,parser=None)
Loads an external XML section into this element tree. source is a file name or file object. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns the section root element.
write(file,encoding="us-ascii",xml_declaration=None,method="xml")
Writes the element tree to a file, as XML. file is a file name, or a file object opened for writing. encoding is the output encoding (default is US-ASCII). xml_declaration controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 (default is None). method is either "xml", "html" or "text" (default is "xml"). Returns an encoded string.
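
As a rough sketch of the ElementTree wrapper in use, the following builds a small tree, writes it out, and reads it back. An in-memory io.BytesIO buffer is used instead of a real file purely to keep the example self-contained; a file name would work the same way.

```python
import io
from xml.etree import ElementTree as ET

# Build a small tree by hand (illustrative data)
root = ET.Element("users")
user = ET.SubElement(root, "user", {"id": "1000001"})
ET.SubElement(user, "username").text = "Admin"

# Wrap the root element and serialize it
tree = ET.ElementTree(root)
buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
data = buf.getvalue()

# Load it back: ElementTree.parse() returns the root element
reloaded = ET.ElementTree()
new_root = reloaded.parse(io.BytesIO(data))

print(new_root.tag)
print(reloaded.findtext("user/username"))
```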

