beautifulsoup find by class - Python (1)

Last modified: 2023-12-03 15:29:36.562000 | Author: Mango

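Finding Elements by Class with Beautiful Soup

Beautiful Soup is a Python library for parsing HTML and XML that makes it easy to extract data from HTML or XML documents. Looking up specific elements by class or by tag name is one of its most commonly used features.

Installation

Beautiful Soup can be installed with pip: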

pip install beautifulsoup4

Import

from bs4 import BeautifulSoup

Constructor

To use Beautiful Soup, first parse the HTML or XML document into a BeautifulSoup object by calling the constructor:

soup = BeautifulSoup(html, 'html.parser')

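Here html is the markup string to parse and 'html.parser' names the parser to use, in this case Python's built-in HTML parser. XML documents generally need an XML-capable parser instead, such as the 'xml' parser provided by lxml.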
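As a minimal sketch of the constructor call, the following parses a one-line HTML string and reads back the first tag. The snippet and the variable name first_p are made up here for illustration and are not part of the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line HTML snippet, used only for illustration.
html = '<p class="intro">Beautiful Soup example</p>'

# Parse the string with Python's built-in HTML parser.
soup = BeautifulSoup(html, 'html.parser')

# soup.p returns the first <p> tag in the parsed tree.
first_p = soup.p
print(first_p.name)  # p
print(first_p.text)  # Beautiful Soup example
```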

Finding elements by class

Use the find_all method to find elements by class name:

soup.find_all('tag', class_='class_name')

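Here tag is the tag name and class_name is the class name. The keyword argument is spelled class_ because class is a reserved word in Python.

Consider the following HTML example: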

<html>
  <body>
    <div class="content">
      <p class="intro">Beautiful Soup example</p>
      <a class="link" href="http://www.example.com">Link</a>
    </div>
    <div class="content">
      <p class="intro">Another paragraph</p>
      <img class="pic" src="http://www.example.com/pic.jpg"/>
      <a class="link" href="http://www.example.com/another_link">Another link</a>
    </div>
  </body>
</html>

To find all div elements whose class is "content":

soup.find_all('div', class_='content')

The result is:

[<div class="content">
   <p class="intro">Beautiful Soup example</p>
   <a class="link" href="http://www.example.com">Link</a>
 </div>,
 <div class="content">
   <p class="intro">Another paragraph</p>
   <img class="pic" src="http://www.example.com/pic.jpg"/>
   <a class="link" href="http://www.example.com/another_link">Another link</a>
 </div>]
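The lookup above can be put together into one self-contained script, sketched below under the assumption that the example HTML is held in a string. The variable names divs and intros are illustrative choices, not from the original article.

```python
from bs4 import BeautifulSoup

# The example HTML from the article, held in a string.
html = """
<html>
  <body>
    <div class="content">
      <p class="intro">Beautiful Soup example</p>
      <a class="link" href="http://www.example.com">Link</a>
    </div>
    <div class="content">
      <p class="intro">Another paragraph</p>
      <img class="pic" src="http://www.example.com/pic.jpg"/>
      <a class="link" href="http://www.example.com/another_link">Another link</a>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list-like collection of every matching tag.
divs = soup.find_all('div', class_='content')
print(len(divs))  # 2

# The same call works for any tag/class pair.
intros = soup.find_all('p', class_='intro')
for p in intros:
    print(p.text)
```

Since find_all returns every match, it gives back an empty result rather than raising an error when no element has the requested class.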

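Getting an element's text and attributes

Beautiful Soup also provides the text attribute and the get method for retrieving an element's text and attribute values.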

element.text
element.get('attribute_name')

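For example, to get the text and the class attribute of the first div element:

div = soup.find('div', class_='content')
print(div.text)
print(div.get('class'))

The result is as follows (the blank lines and indentation that div.text carries over from the source HTML are omitted). Note that get('class') returns a list, because an element can have several classes:

Beautiful Soup example
Link
['content']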
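A short sketch of the same idea applied to a link, assuming a trimmed-down copy of the example HTML. The variable names (link_text, href, classes, missing, compact) are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the first div from the example HTML.
html = """
<div class="content">
  <p class="intro">Beautiful Soup example</p>
  <a class="link" href="http://www.example.com">Link</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find returns the first matching tag.
link = soup.find('a', class_='link')
link_text = link.text          # text inside the tag
href = link.get('href')        # value of an ordinary attribute
classes = link.get('class')    # class is multi-valued, so a list comes back
missing = link.get('title')    # get returns None for an absent attribute

print(link_text, href, classes, missing)

# div.text keeps the whitespace of the source markup;
# splitting and re-joining collapses it to single spaces.
div = soup.find('div', class_='content')
compact = ' '.join(div.text.split())
print(compact)  # Beautiful Soup example Link
```

Using get rather than indexing with link['title'] avoids a KeyError when the attribute may be absent.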

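Summary

Beautiful Soup is a convenient HTML/XML parsing library. Its find and find_all methods locate elements by tag name and class through the class_ keyword argument, and the text attribute and the get method return an element's text and attribute values.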