Filter classes in BeautifulSoup (1)

📅 Last modified: 2023-12-03 15:26:34.180000 🧑 Author: Mango

Introduction to filters in BeautifulSoup

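In BeautifulSoup, a filter is a matching criterion, passed to search methods such as find_all(), that selects specific parts of an HTML document. BeautifulSoup supports several kinds of filters, which can match on tag name, attributes, text content, and so on.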
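
Criteria of different kinds can also be combined in a single find_all() call. The following is a minimal sketch, assuming a small made-up document; the class names "note" and "other" are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for this sketch.
html_doc = """
<div class="note">First note</div>
<p class="note">Second note</p>
<div class="other">Something else</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Tag name and attribute criteria combined:
# only <div> tags whose class is "note" are kept.
note_divs = soup.find_all("div", attrs={"class": "note"})
note_div_texts = [tag.get_text() for tag in note_divs]
print(note_div_texts)
```

Here the <p> tag is excluded by the tag name and the last <div> by the attribute, so only the first <div> remains.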

Element filter

The element filter selects elements by tag name: pass the name to soup.find_all(). Example code:

from bs4 import BeautifulSoup

html_doc = """
<html>
<head>
<title>BeautifulSoup</title>
</head>
<body>
<div>Some text</div>
<p>Another text</p>
<div>More text</div>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
divs = soup.find_all('div')
for div in divs:
    print(div.text)

Output:

Some text
More text
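
Besides a single tag name, find_all() also accepts a list of names, a compiled regular expression, or a function as its first argument. The sketch below reuses the body of the document above; the helper name is_more_div is hypothetical and exists only for this example.

```python
import re

from bs4 import BeautifulSoup

html_doc = """
<body>
<div>Some text</div>
<p>Another text</p>
<div>More text</div>
</body>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# A list matches any tag whose name is in the list.
div_or_p = [tag.name for tag in soup.find_all(["div", "p"])]

# A compiled regular expression is matched against each tag name.
starts_with_d = [tag.name for tag in soup.find_all(re.compile("^d"))]

# A function receives each tag and returns True to keep it.
# is_more_div is a hypothetical helper written for this sketch.
def is_more_div(tag):
    return tag.name == "div" and "More" in tag.get_text()

more_divs = [tag.get_text() for tag in soup.find_all(is_more_div)]

print(div_or_p)
print(starts_with_d)
print(more_divs)
```

Results come back in document order, so the list filter yields div, p, div rather than grouping the tags by name.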

Attribute filter

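The attribute filter selects elements by their HTML attributes via soup.find_all(attrs={...}), where attrs is a dictionary mapping attribute names to the values to match. Example code: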

from bs4 import BeautifulSoup

html_doc = """
<html>
<head>
<title>BeautifulSoup</title>
</head>
<body>
<div class="class1">Some text1</div>
<div class="class2">Some text2</div>
<div class="class1">More text1</div>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
divs = soup.find_all(attrs={"class": "class1"})
for div in divs:
    print(div.text)

Output:

Some text1
More text1
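
Attributes can also be passed as keyword arguments instead of an attrs dictionary. Because class is a reserved word in Python, BeautifulSoup uses the keyword class_ for it. The following is a minimal sketch; the id "first" and the extra class "highlight" are hypothetical values added only to illustrate the behavior.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, made up for this sketch.
html_doc = """
<div class="class1" id="first">Some text1</div>
<p class="class1 highlight">Some text2</p>
<div class="class2">More text1</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# class_ matches any tag that carries the given CSS class,
# including tags with several classes.
by_class = [tag.get_text() for tag in soup.find_all(class_="class1")]

# A tag name narrows the match to <div> elements only.
div_by_class = [tag.get_text() for tag in soup.find_all("div", class_="class1")]

# Other attributes work as plain keyword arguments.
by_id = [tag.get_text() for tag in soup.find_all(id="first")]

# True matches any tag that has the attribute at all.
with_id = [tag.get_text() for tag in soup.find_all(id=True)]

print(by_class)
print(div_by_class)
print(by_id)
print(with_id)
```

The attrs dictionary form remains useful for attribute names that are not valid Python keyword names, such as those containing a hyphen.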

Text content filter

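The text content filter selects by text content via soup.find_all(string=...), where string is the text to match. A plain string must equal the whole text node, and the results are the matching strings (NavigableString objects), not the tags that contain them. Example code: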

from bs4 import BeautifulSoup

html_doc = """
<html>
<head>
<title>BeautifulSoup</title>
</head>
<body>
<p>Some text</p>
<span>More text</span>
<p>Some other text</p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# find_all(string=...) returns the matching text nodes, not the <p> tags
strings = soup.find_all(string="Some text")
for s in strings:
    print(s)

Output:

Some text
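
Only one string is printed because "Some other text" is not exactly equal to "Some text". For partial matches, a compiled regular expression can be passed as string, and the enclosing tag of each match is available through .parent. The following is a minimal sketch using the same document body as above.

```python
import re

from bs4 import BeautifulSoup

html_doc = """
<body>
<p>Some text</p>
<span>More text</span>
<p>Some other text</p>
</body>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# The regular expression is searched within each text node,
# so it matches every string that contains "Some".
strings = soup.find_all(string=re.compile("Some"))
matched = [str(s) for s in strings]

# Each result is a text node; .parent gives the tag that contains it.
parent_names = [s.parent.name for s in strings]

print(matched)
print(parent_names)
```

Going through .parent is one way to get back from matched text to the elements, for example to read their attributes.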

That concludes this introduction to filters in BeautifulSoup. Hopefully it helps!