How to find an HTML tag that contains certain text using BeautifulSoup?
In this article, we will see how to find an HTML tag that contains certain text using BeautifulSoup.
Methods used:
open(filename, mode): Opens the given file in the specified mode.
find_all(): Returns a list of all tags in the parsed document that match the filters passed to it, such as a tag name and the text the tag must contain.
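In the code given below, we search several different tags for a piece of text that the program stores as a pattern. The code returns every tag of the requested type whose text matches that pattern.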
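As a minimal sketch of this idea, the snippet below parses a small inline HTML string instead of a file. The markup is a hypothetical stand-in for the demonstration file and may not match it. It assumes Beautiful Soup 4.4.0 or later, where the text filter is passed as `string=`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the article's gfg.html may be structured differently.
html = """
<html>
  <body>
    <a href="#">Geeks For Geeks</a>
    <a href="#">Dummy Text</a>
    <span>Geeks For Geeks</span>
    <h1>Python Program</h1>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string passed as string= must equal the tag's text exactly.
anchors = soup.find_all("a", string="Geeks For Geeks")
spans = soup.find_all("span", string="Geeks For Geeks")
headings = soup.find_all("h1", string="Python Program")

print(anchors)
print(spans)
print(headings)
```

Each call returns a list of matching tags, so the same pattern can be checked against as many tag types as needed.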
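Approach:
We first import the regular expression module (re) and the BeautifulSoup library. We then open the HTML file we want to parse with open() and read its contents. Next, we call find_all(), passing the tag name to search for and the text that the tag must contain. Every tag of that type whose text matches is added to the returned list.
All tags containing the given text are thus stored in a list, which is then printed. An empty list means that no tag of that type contains the text we are checking for.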
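The steps above can be sketched as a small helper. The function name `find_tags_with_text`, the file name `sample.html`, and the sample markup are all hypothetical and are not part of the article's code. The sketch also shows one possible use of the `re` import: passing a compiled regular expression as the pattern gives a partial match instead of an exact one.

```python
import re
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def find_tags_with_text(path, tag_name, pattern):
    """Return all <tag_name> tags in the file whose text matches pattern.

    pattern may be a plain string (exact match) or a compiled regex.
    """
    with open(path, "r", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    return soup.find_all(tag_name, string=pattern)


# Hypothetical sample markup, written to a temporary file so the sketch is self-contained.
sample = "<ul><li>Python Program</li><li>Python Code</li></ul><p>Hello</p>"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.html"
    path.write_text(sample, encoding="utf-8")

    exact = find_tags_with_text(path, "li", "Python Program")        # exact text
    partial = find_tags_with_text(path, "li", re.compile(r"^Python"))  # regex match
    missing = find_tags_with_text(path, "p", "Geeks For Geeks")       # no such text

print(exact)
print(partial)
print(missing)  # an empty list: no <p> tag contains the text
```

Returning the list rather than printing it inside the helper lets the caller test for the empty-list case directly, for example with `if not missing:`.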
Below is the HTML file used for demonstration:
HTML
GFG
Geeks For Geeks
Geeks For Geeks
Dummy Text
Hello
Python Program
Geeks For Geeks
Geeks For Geeks
Python Program
Python Code
GFG Website
Below is the implementation:
Python3
# Python program to find an HTML tag
# that contains certain text using BeautifulSoup
# Importing libraries
from bs4 import BeautifulSoup
import re  # only needed if a compiled regex is passed as the pattern
# Opening and reading the HTML file; the with block closes it afterwards
with open("gfg.html", "r") as file:
    contents = file.read()
soup = BeautifulSoup(contents, 'html.parser')
# Note: string= replaces the older text= argument (Beautiful Soup 4.4.0+)
# Finding a pattern (certain text)
pattern = 'Geeks For Geeks'
# Anchor tag
text1 = soup.find_all('a', string=pattern)
print(text1)
# Span tag
text2 = soup.find_all('span', string=pattern)
print(text2)
# Finding a pattern (certain text)
pattern2 = 'Python Program'
# Heading tag
text3 = soup.find_all('h1', string=pattern2)
print(text3)
# List item tag
text4 = soup.find_all('li', string=pattern2)
print(text4)
# Finding a pattern (certain text)
pattern3 = 'GFG Website'
# Table row tag
text5 = soup.find_all('tr', string=pattern3)
print(text5)
Output:
[Geeks For Geeks, Geeks For Geeks]
[Geeks For Geeks, Geeks For Geeks]
[
Python Program
][
[