Parsing XML with the DOM API in Python
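The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It defines the logical structure of a document and the ways in which the document can be accessed and manipulated.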
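To make the idea of a "logical structure" concrete, here is a minimal sketch that parses a small, hypothetical XML string with minidom.parseString and prints the resulting tree of element and text nodes, one line per node and indented by depth. The walk helper and the XML string are illustrative only and are not part of the article's example.

```python
from xml.dom import minidom

# A tiny, hypothetical XML string used only for this illustration.
XML = "<company><name>Example Co</name><staff id='7'><name>Jane</name></staff></company>"

def walk(node, depth=0, lines=None):
    """Recursively record each node's type and tag/value, indented by depth."""
    if lines is None:
        lines = []
    if node.nodeType == node.ELEMENT_NODE:
        # attributes is a NamedNodeMap; items() yields (name, value) pairs
        attrs = " ".join(f'{k}="{v}"' for k, v in node.attributes.items())
        lines.append("  " * depth + f"element <{node.tagName}> {attrs}".rstrip())
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + f"text {node.data!r}")
    for child in node.childNodes:
        walk(child, depth + 1, lines)
    return lines

doc = minidom.parseString(XML)
for line in walk(doc.documentElement):
    print(line)
```

Every element and every run of text becomes its own node, and the nesting of the XML becomes parent/child links in the tree.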
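Python's standard library includes a minimal DOM implementation, xml.dom.minidom, which makes parsing XML with the DOM API straightforward. For demonstration, we will create a sample XML document (sample.xml) as follows: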
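```xml
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>
```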
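If you would rather generate sample.xml than type it by hand, you can also build the document with the DOM API. The sketch below uses minidom.Document, createElement, createTextNode, setAttribute and appendChild, then writes the result with toprettyxml. The root tag company and the helper names build_sample and text_element are assumptions for illustration. The parsing code in this article never reads the root tag, so any name works.

```python
from xml.dom import minidom

# Staff records from the sample document (id, name, salary).
STAFF = [
    ("1", "Amar Pandey", "8.5 LPA"),
    ("2", "Akbhar Khan", "6.5 LPA"),
    ("3", "Anthony Walter", "3.2 LPA"),
]

def build_sample(company="GeeksForGeeks Company", staff=STAFF):
    """Build the sample document in memory using DOM creation methods."""
    doc = minidom.Document()
    root = doc.createElement("company")  # assumed root tag name
    doc.appendChild(root)

    def text_element(tag, text):
        # Create <tag>text</tag> as an element with one text child
        el = doc.createElement(tag)
        el.appendChild(doc.createTextNode(text))
        return el

    root.appendChild(text_element("name", company))
    for staff_id, name, salary in staff:
        member = doc.createElement("staff")
        member.setAttribute("id", staff_id)
        member.appendChild(text_element("name", name))
        member.appendChild(text_element("salary", salary))
        root.appendChild(member)
    return doc

doc = build_sample()
# toprettyxml keeps an element whose only child is text on one line
with open("sample.xml", "w", encoding="utf-8") as f:
    f.write(doc.toprettyxml(indent="    "))
```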
Now let's parse the XML above with Python. The following code demonstrates the process:
```python
from xml.dom import minidom

# Parse the file into a DOM Document object
doc = minidom.parse("sample.xml")

# doc.getElementsByTagName returns a NodeList of all matching elements in
# document order, so [0] is the company's <name>, not a staff member's
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    # Search only inside this <staff> element
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, name.firstChild.data, salary.firstChild.data))
```
Output:
```
GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
```
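Two details of the loop above are worth knowing. First, getAttribute returns an empty string when the attribute is missing. Second, firstChild is None for an empty element such as <name/>, so name.firstChild.data would raise AttributeError. The following minimal sketch guards against the empty case and collects the records into dictionaries instead of printing them. The helper names first_text and read_staff and the inline XML are hypothetical, and first_text assumes that the first child, if present, is a text node.

```python
from xml.dom import minidom

def first_text(parent, tag, default=""):
    """Return the text of the first <tag> under parent, or default if absent/empty."""
    elements = parent.getElementsByTagName(tag)
    if not elements or elements[0].firstChild is None:
        return default
    return elements[0].firstChild.data

def read_staff(doc):
    """Collect each <staff> element into a dict instead of printing it."""
    return [
        {
            "id": staff.getAttribute("id"),  # "" if the attribute is missing
            "name": first_text(staff, "name"),
            "salary": first_text(staff, "salary"),
        }
        for staff in doc.getElementsByTagName("staff")
    ]

# Hypothetical inline data; with the article's file use minidom.parse("sample.xml").
doc = minidom.parseString(
    '<company><name>Demo</name>'
    '<staff id="1"><name>Amar Pandey</name><salary>8.5 LPA</salary></staff>'
    '<staff id="2"><name/><salary>6.5 LPA</salary></staff>'
    '</company>'
)
for record in read_staff(doc):
    print(record)
```

Returning plain data keeps parsing separate from presentation. The caller can then print, filter or serialize the list as needed.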
You can get the same result with a user-defined function, as shown in the following code:
```python
from xml.dom import minidom

doc = minidom.parse("sample.xml")

# user-defined function: join the data of the node's direct text children
def getNodeText(node):
    result = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)

name = doc.getElementsByTagName("name")[0]
print("Company Name : %s" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, getNodeText(name), getNodeText(salary)))
```
Output:
```
Company Name : GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
```
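The getNodeText function joins only an element's direct text children, so it skips any text inside nested elements. Below is a sketch of a recursive variant that also descends into child elements, similar in spirit to the DOM textContent property. The names get_all_text and get_direct_text and the nested <b> markup are hypothetical and used only for illustration.

```python
from xml.dom import minidom

def get_all_text(node):
    """Concatenate text from all descendants of node, in document order."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(get_all_text(child))  # recurse into nested elements
    return "".join(parts)

def get_direct_text(node):
    """Same idea as getNodeText: direct text children only."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)

# Hypothetical element with nested markup inside <name>.
doc = minidom.parseString("<name>Amar <b>K.</b> Pandey</name>")
name = doc.documentElement
print(repr(get_direct_text(name)))  # 'Amar  Pandey' (the <b> text is lost)
print(repr(get_all_text(name)))     # 'Amar K. Pandey'
```

For flat elements like the ones in sample.xml, both functions return the same string. The difference only appears when text is split across nested tags.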