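BeautifulSoup – Search by text inside a tag
Prerequisites: Beautifulsoup
Beautifulsoup is a powerful Python module used for web scraping. This article discusses how to search for specific text inside a given tag.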
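Approach
- Import the modules
- Pass the URL
- Request the page
- Specify the tag to search
- To search by the text inside a tag, check a condition against the tag's string attribute.
- The string attribute returns the text inside the tag.
- While navigating the tags, compare that text with the search text.
- Return the text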
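The steps above can be sketched without a network request. The HTML snippet below is a made-up stand-in for a downloaded page (an assumption for illustration, not the content of any real URL); it only shows how the string attribute exposes the text of each tag.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (assumption for this sketch).
html = """
<p>The <strong>page table base register (PTBR)</strong> holds the base address.</p>
<p>See also the <strong>CS Theory Course</strong>.</p>
"""

# Build the soup with the built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Specify the tag to search, then read the text inside each match.
strong_texts = [tag.string for tag in soup.find_all("strong")]
print(strong_texts)
```

With a live page, the html string would be replaced by the content returned from requests.get().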
We will look at two ways to search for text inside a tag.
Method 1: Iteration
This method uses a for loop to search for the text.
Example
Python3
from bs4 import BeautifulSoup
import requests
# sample web page
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
# call get method to request that page
page = requests.get(sample_web_page)
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find_all('strong')
text = 'page table base register (PTBR)'
# print every <strong> tag whose text is exactly the given text
for i in child_soup:
    if i.string == text:
        print(i)
Output
<strong>page table base register (PTBR)</strong>
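One caveat with the loop approach: the string attribute is None when a tag contains nested tags, so an equality check on it silently skips such tags. The sketch below, run on a made-up HTML snippet, compares get_text() instead; the helper name find_by_exact_text is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML (assumption): the second <strong> contains a nested <em>.
html = (
    "<p><strong>page table base register (PTBR)</strong> and "
    "<strong>translation <em>lookaside</em> buffer</strong></p>"
)
soup = BeautifulSoup(html, "html.parser")


def find_by_exact_text(soup, tag_name, text):
    """Return the tags named tag_name whose full text equals text."""
    return [
        tag
        for tag in soup.find_all(tag_name)
        if tag.get_text().strip() == text
    ]


# .string is None for the tag with a nested child, so i.string == text would miss it.
nested = soup.find_all("strong")[1]
print(nested.string)

# get_text() joins the text of all descendants, so the match succeeds.
matches = find_by_exact_text(soup, "strong", "translation lookaside buffer")
print(matches)
```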
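Method 2: Using lambda
This is a one-line alternative to the loop above: the lambda passed to find_all() checks both the tag name and whether the search text occurs in the tag's text.
Example
Python3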
from bs4 import BeautifulSoup
import requests
# sample web page
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'
# call get method to request that page
page = requests.get(sample_web_page)
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
text = 'CS Theory Course'
# Search by text with the help of lambda function
gfg = soup.find_all(lambda tag: tag.name == "strong" and text in tag.text)
print(gfg)
Output
[<strong>CS Theory Course</strong>]
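As a self-contained illustration, the lambda filter can be tried on a small inline document. The HTML below is made up for this sketch (an assumption, not the real page). For comparison, it also uses the string argument of find_all(), which matches a tag whose string attribute equals the given text exactly, whereas the lambda performs a substring test.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page (assumption).
html = """
<p>Enroll in the <strong>CS Theory Course</strong> today.</p>
<p>Different text: <strong>Operating Systems</strong>.</p>
<p>Different tag: <em>CS Theory Course</em>.</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Substring match: the lambda receives every tag and returns True for a hit.
text = "CS Theory"
by_lambda = soup.find_all(lambda tag: tag.name == "strong" and text in tag.text)

# Exact match: keep <strong> tags whose .string equals the given text.
by_string = soup.find_all("strong", string="CS Theory Course")

print(by_lambda)
print(by_string)
```

Both calls return a list containing only the first strong tag; the em tag is excluded by the name check and the second strong tag by the text check.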