XPath Overview (1) - 芒果文档 (Mango Docs)

Last modified: 2023-12-03 15:06:05.140000 | Author: Mango

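XPath Overview

What is XPath

XPath (XML Path Language) is a language for addressing nodes in an XML document. It is a W3C Recommendation in its own right and serves as the node-selection language of XSLT; XQuery and XPointer build on it as well. Implementations exist for many programming languages, so XPath is widely used in web development, web scraping, and related areas.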

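What XPath is used for

XPath is mainly used to locate elements, attributes, and text nodes in an XML document so that further operations can be performed on them. XPath itself only selects and computes values; any change to the document is made by the host language or API on the nodes XPath has located. Typical uses:

Getting the content of a specific element in an XML document
Getting the elements that satisfy a given condition
Locating the elements, attributes, or text nodes that the host API will then modify
Locating the parent node under which the host API will create new elements, attributes, or text nodes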
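Because XPath only selects nodes, the modify and create cases in the list above always involve a second step in the host API. The following is a minimal sketch of that split, assuming Python's standard-library xml.etree.ElementTree (which supports only a limited subset of XPath) rather than the lxml module used later in this document. The sample XML and the new values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
root = ET.fromstring(
    "<bookstore>"
    "<book category='WEB'><title>Learning XML</title><price>39.95</price></book>"
    "</bookstore>"
)

# The path expression only locates the node; the edit is an ElementTree call.
price = root.find("./book[@category='WEB']/price")
price.text = "29.95"

# Creating a node is likewise an API call on the element the path located.
book = root.find("./book")
year = ET.SubElement(book, "year")
year.text = "2003"

print(ET.tostring(root, encoding="unicode"))
```

The path expression does the locating in both cases; assigning to `.text` and calling `ET.SubElement` are what actually change the tree.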

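Basic XPath syntax

XPath locates nodes by path. A location path is a sequence of steps separated by slashes (/); each step selects nodes relative to the nodes selected by the previous step, and a leading / makes the path absolute, starting at the document root. For example: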

/bookstore/book/price

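This path selects every price element that is a child of a book element, which in turn is a child of the root bookstore element.

The wildcard * matches an element of any name: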

/bookstore/*/price

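This path selects the price children of every child element of bookstore, whatever that child is named.

A predicate in square brackets filters a step, for example by attribute name and value. A leading // selects matching nodes at any depth in the document: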
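To see how the explicit path and the wildcard path differ, here is a small sketch using lxml. The sample document, including its magazine element, is hypothetical and exists only so that bookstore has a child that is not a book.

```python
from lxml import etree

# Hypothetical sample: a bookstore with one book and one magazine.
root = etree.fromstring(
    "<bookstore>"
    "<book><title>Learning XML</title><price>39.95</price></book>"
    "<magazine><title>XML Monthly</title><price>5.00</price></magazine>"
    "</bookstore>"
)

# Explicit step names: only price elements under book are selected.
book_prices = [p.text for p in root.xpath("/bookstore/book/price")]

# Wildcard step: price elements under any child of bookstore are selected.
all_prices = [p.text for p in root.xpath("/bookstore/*/price")]

print(book_prices)  # ['39.95']
print(all_prices)   # ['39.95', '5.00']
```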

//book[@category="WEB"]/title

This path selects the title children of every book element, anywhere in the document, whose category attribute equals WEB.
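The attribute predicate can be tried out with a short lxml sketch. The two-book sample document below is hypothetical. The second query is an extra illustration of a predicate that tests only whether the attribute exists.

```python
from lxml import etree

# Hypothetical sample with two different category values.
root = etree.fromstring(
    "<bookstore>"
    "<book category='WEB'><title>Learning XML</title></book>"
    "<book category='COOKING'><title>Everyday Italian</title></book>"
    "</bookstore>"
)

# Predicate on attribute value: only the WEB book passes the filter.
web_titles = [t.text for t in root.xpath('//book[@category="WEB"]/title')]

# Predicate on attribute presence: any book that has a category attribute.
with_category = root.xpath("//book[@category]")

print(web_titles)          # ['Learning XML']
print(len(with_category))  # 2
```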

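XPath functions

XPath also provides a library of functions for comparisons, calculations, and similar operations inside paths and predicates. Some commonly used ones:

contains(str1, str2): true if the string str1 contains the string str2
starts-with(str1, str2): true if the string str1 starts with the string str2
not(expr): the boolean negation of expr
count(node-set): the number of nodes in the given node-set
text(): often listed with the functions, but actually a node test; it selects the text node children of the current node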
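The sketch below exercises each of the items listed above through lxml. The sample document is hypothetical, and the results noted in the comments reflect how lxml maps XPath results to Python values (strings for text nodes, a float for count()).

```python
from lxml import etree

# Hypothetical sample: the second book has no category attribute.
root = etree.fromstring(
    "<bookstore>"
    "<book category='WEB'><title>Learning XML</title>"
    "<author>Erik T. Ray</author></book>"
    "<book><title>XQuery Kick Start</title>"
    "<author>James McGovern</author><author>Per Bothner</author></book>"
    "</bookstore>"
)

# text() selects text nodes; lxml returns them as strings.
all_titles = root.xpath("//title/text()")

# contains() and starts-with() used inside predicates ('.' is the current node).
xml_titles = root.xpath("//title[contains(., 'XML')]/text()")
xquery_titles = root.xpath("//title[starts-with(., 'XQuery')]/text()")

# not() negates a condition: books that have no category attribute.
uncategorized = root.xpath("//book[not(@category)]/title/text()")

# count() returns a number, which lxml exposes as a Python float.
author_count = root.xpath("count(//author)")

print(all_titles)     # ['Learning XML', 'XQuery Kick Start']
print(xml_titles)     # ['Learning XML']
print(xquery_titles)  # ['XQuery Kick Start']
print(uncategorized)  # ['XQuery Kick Start']
print(author_count)   # 3.0
```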

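XPath example

The following example parses an XML document with Python's lxml module, then uses XPath to locate elements, filter them by attribute value, and count nodes.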

from lxml import etree

xml_string = """
<bookstore>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
</bookstore>
"""

root = etree.fromstring(xml_string)

# Select the price child of every book element
prices = root.xpath("//book/price")
for price in prices:
    print(price.text)

# Select the title child of every book element whose category attribute is WEB
titles = root.xpath("//book[@category='WEB']/title")
for title in titles:
    print(title.text)

# Count the author elements under all book elements (count() returns a float)
count = root.xpath("count(//book/author)")
print(count)

Output:

39.95
49.99
Learning XML
XQuery Kick Start
6.0

The example above shows how to locate elements with XPath, filter them by attribute value, and apply a function to the result.
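The example filters on an attribute but never selects one directly. As a complementary sketch, assuming the same lxml module and a shortened, hypothetical version of the sample document, an attribute step (@name) returns the attribute values themselves:

```python
from lxml import etree

# Shortened, hypothetical version of the bookstore sample.
root = etree.fromstring(
    "<bookstore>"
    "<book category='WEB'><title lang='en'>Learning XML</title></book>"
    "</bookstore>"
)

# An attribute step selects attribute nodes; lxml returns their values as strings.
langs = root.xpath("//title/@lang")
categories = root.xpath("/bookstore/book/@category")

# The same value is also available from the element through the lxml API.
title = root.xpath("//title")[0]

print(langs)              # ['en']
print(categories)         # ['WEB']
print(title.get("lang"))  # en
```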