Find the Text of a Given Tag Using BeautifulSoup

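Web scraping is the process of extracting information from the HTML or XML content of web pages using software bots called web scrapers. Beautiful Soup is a Python library for pulling data out of such content. It works with a parser to provide ways of iterating over, searching, and modifying the parse tree that the parser produces. With Beautiful Soup, scraping a web page and finding the text of a given tag is fairly easy.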

In this article, we discuss how to find the text of a given tag.
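The iterating, searching, and modifying mentioned in the introduction can be sketched on a small inline document. The HTML string and variable names below are made up for illustration and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for this illustration
html = "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Iterate: walk over the direct children of <body>
child_names = [child.name for child in soup.body.children]

# Search: find the first <p> tag
first_paragraph = soup.find("p")

# Modify: replace the text inside the <h1> tag
soup.h1.string = "New title"

print(child_names)
print(first_paragraph.text)
print(soup.h1)
```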

Step-by-step approach:

First, import the libraries.

Python3

from bs4 import BeautifulSoup
import requests

Now assign the URL.

Python3

# assign URL
url = "https://www.geeksforgeeks.org/"

Fetch the raw HTML content from the URL.

Python3

html_content = requests.get(url).text
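A bare `requests.get(url).text` can wait indefinitely on an unresponsive server and will happily return the body of an error page. As a hedged sketch, the fetch step could be wrapped in a small helper; the name `fetch_html` and the 10-second default timeout are assumptions made for this example, not part of the original program.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML of `url` as text, raising on network or HTTP errors."""
    # timeout stops the call from waiting forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, the fetch step would read `html_content = fetch_html(url)`.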

Now parse the content.

Python3

# Now that the content is ready, parse it
# with BeautifulSoup
soup = BeautifulSoup(html_content, "html.parser")

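After parsing the content, we search for a specific tag. Printing the result of find() displays the whole tag, markup included; the complete program below uses .text to print only its text.

Python3

# Prints the whole <title> tag, including its markup
print(soup.find('title'))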
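Because `find()` returns a tag object rather than a string, printing the tag and printing its text give different results. A minimal sketch on a made-up inline document (the HTML string is hypothetical) shows the difference, along with `get_text(strip=True)` for trimming surrounding whitespace:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for this illustration
html = "<html><head><title> Example Page </title></head></html>"
soup = BeautifulSoup(html, "html.parser")

title_tag = soup.find("title")

print(title_tag)                       # the whole tag, markup included
print(title_tag.text)                  # only the text, whitespace kept
print(title_tag.get_text(strip=True))  # only the text, whitespace trimmed
```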

Below is the complete program.

Python3

from bs4 import BeautifulSoup
import requests
  
  
# Assign URL
url = "https://www.geeksforgeeks.org/"
  
# Fetch raw HTML content
html_content = requests.get(url).text
  
# Now that the content is ready, parse it
# with BeautifulSoup:
soup = BeautifulSoup(html_content, "html.parser")

# Find the first <title> tag and print only its text
print(soup.find('title').text)

Output: the text of the page's title tag, without the surrounding markup.
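One caveat with the program above: `find()` returns `None` when the tag is not present, so accessing `.text` on the result would raise an `AttributeError`. A defensive variant might look like the following sketch; the helper name `tag_text` and its `default` parameter are hypothetical, and the HTML string is made up for the example.

```python
from bs4 import BeautifulSoup


def tag_text(soup, name, default=""):
    """Return the stripped text of the first `name` tag, or `default` if absent."""
    tag = soup.find(name)
    if tag is None:
        return default
    return tag.get_text(strip=True)


# Hypothetical HTML snippet with a <title> but no <h1>
soup = BeautifulSoup("<html><head><title>Demo</title></head></html>", "html.parser")
print(tag_text(soup, "title"))
print(tag_text(soup, "h1", default="(no h1 found)"))
```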

Similarly, to get all occurrences of a given tag:

Python3

from bs4 import BeautifulSoup
import requests
  
# Assign URL
url = "https://www.geeksforgeeks.org/"
  
# Fetch raw HTML content
html_content = requests.get(url).text
  
# Now that the content is ready, parse it
# with BeautifulSoup:
soup = BeautifulSoup(html_content, "html.parser")

# Get all occurrences of a given tag and print the text of each
texts = soup.find_all('p')
for text in texts:
    print(text.get_text())

Output: the text of every p tag on the page, printed one tag at a time.
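When the texts are needed for further processing rather than printing, they can be collected into a list. The sketch below runs on a made-up inline document instead of a live page. Passing a separator to `get_text()` is a choice made here so that text from nested tags does not run together, and skipping empty paragraphs is likewise an assumption about what a caller would want.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet used only for this illustration
html = """
<div>
  <p>First paragraph.</p>
  <p>   </p>
  <p>Second <b>bold</b> paragraph.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every <p> tag, joining nested pieces with a space
paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
# Drop paragraphs that contained only whitespace
paragraphs = [text for text in paragraphs if text]

print(paragraphs)
```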