BeautifulSoup – Search by text inside a tag

Prerequisites: BeautifulSoup

BeautifulSoup is a powerful Python module used for web scraping. This article discusses how to search for specific text inside a given tag.

Approach

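Import the modules
Pass the URL
Request the page
Specify the tag to search
To search by the text inside a tag, check a condition with the help of the string attribute (tag.string).
The string attribute returns the text inside a tag.
While navigating the tags, compare that text against the search text.
Return the matching text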
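
The steps above can be tried without a network request. The following is a minimal sketch, assuming a small hand-written HTML snippet (html_doc is hypothetical and not taken from the sample page), that shows what the string attribute returns for a simple tag and for a tag with nested markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration (not from the sample page)
html_doc = """
<p><strong>page table</strong> and <strong>TLB <em>cache</em></strong></p>
"""

soup = BeautifulSoup(html_doc, "html.parser")
strong_tags = soup.find_all("strong")

# .string returns the text when the tag has exactly one child string
first_string = strong_tags[0].string

# .string is None when the tag has several children (here: text plus <em>)
second_string = strong_tags[1].string

# get_text() joins all nested text, so it also works for mixed content
second_text = strong_tags[1].get_text()

print(first_string)
print(second_string)
print(second_text)
```

Because the string attribute can be None for tags with nested markup, an equality check against it silently skips such tags; get_text() is the safer choice when the tag may contain child elements.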

We will look at two methods for searching text inside a tag.

Method 1: Iteration

This method uses a for loop to search for the text.

Example

Python3

from bs4 import BeautifulSoup
import requests

# Sample web page
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'

# Request the page with a GET call
page = requests.get(sample_web_page)

# Parse the response with BeautifulSoup and the built-in HTML parser
soup = BeautifulSoup(page.content, "html.parser")

# Collect every <strong> tag on the page
child_soup = soup.find_all('strong')

text = 'page table base register (PTBR)'

# Print each tag whose text is exactly the given text
for i in child_soup:
    if i.string == text:
        print(i)

Output

<strong>page table base register (PTBR)</strong>
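
The iteration can also be wrapped in a small reusable function. This is a sketch only: the function name find_tags_with_exact_text and the inline HTML are hypothetical, and the comparison uses get_text().strip() instead of the string attribute so that surrounding whitespace does not prevent a match.

```python
from bs4 import BeautifulSoup


def find_tags_with_exact_text(html, tag_name, text):
    """Return tags named tag_name whose text equals text, ignoring outer whitespace."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for tag in soup.find_all(tag_name):
        # Strip leading/trailing whitespace before comparing
        if tag.get_text().strip() == text:
            matches.append(tag)
    return matches


# Hypothetical HTML used only for illustration
html_doc = "<p><strong> PTBR </strong><strong>TLB</strong><b>PTBR</b></p>"

matches = find_tags_with_exact_text(html_doc, "strong", "PTBR")
print(matches)
```

Here the first strong tag matches even though its text is padded with spaces, while the b tag is ignored because only strong tags are inspected.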

Method 2: Using lambda

It is a one-liner alternative to the example above.

Example

Python3

from bs4 import BeautifulSoup
import requests

# Sample web page
sample_web_page = 'https://www.geeksforgeeks.org/caching-page-tables/'

# Request the page with a GET call
page = requests.get(sample_web_page)

# Parse the response with BeautifulSoup and the built-in HTML parser
soup = BeautifulSoup(page.content, "html.parser")

text = 'CS Theory Course'

# Keep every <strong> tag whose text contains the given text
gfg = soup.find_all(lambda tag: tag.name == "strong" and text in tag.text)

print(gfg)

Output

[<strong>CS Theory Course</strong>]
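
As a point of comparison, find_all also accepts a string keyword that filters on the tag's text, which may read more directly than a lambda. The sketch below assumes a Beautiful Soup version that supports the string argument (4.4.0 or later; older releases call it text) and uses a hypothetical inline HTML snippet rather than the sample page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
html_doc = "<p><strong>CS Theory Course</strong> <strong>Other</strong></p>"
soup = BeautifulSoup(html_doc, "html.parser")

# Exact match against the tag's .string
exact = soup.find_all("strong", string="CS Theory Course")

# Substring match: a compiled regex is searched within the tag's .string
partial = soup.find_all("strong", string=re.compile("Theory"))

# The lambda form from Method 2, shown for comparison
via_lambda = soup.find_all(
    lambda tag: tag.name == "strong" and "Theory" in tag.text
)

print(exact)
print(partial)
print(via_lambda)
```

The string keyword compares against the string attribute, so it skips tags with nested markup; the lambda form checks tag.text and therefore also matches text spread across child elements.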