Find the Text of a Given Tag Using BeautifulSoup

Last modified: 2022-05-13 01:55:28.252000 | Author: Mango

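Web scraping is the process of extracting information from the HTML or XML content of web pages using software bots called web scrapers. Beautiful Soup is a Python library for scraping data. It works together with a parser and provides ways to iterate over, search, and modify the parse tree that the parser produces. With Beautiful Soup, scraping a web page and finding the text of a given tag is fairly straightforward.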

In this article, we discuss how to find the text of a given tag.
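
To make the parse-tree idea concrete before touching a live site, here is a minimal sketch that runs on an inline HTML string. The sample markup and the variable names are assumptions made for illustration; only `BeautifulSoup`, `find`, `find_all`, `get_text`, and the `.string` attribute come from the library.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used here instead of a live page.
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>First</p><p>Second</p></body></html>"
)

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Search: find() returns the first matching tag.
title_text = soup.find("title").get_text()

# Iterate: find_all() returns every matching tag in document order.
paragraphs = [p.get_text() for p in soup.find_all("p")]

# Modify: assigning to .string replaces the text of a tag in the tree.
soup.find("p").string = "Changed"
first_after_edit = soup.find("p").get_text()

print(title_text)
print(paragraphs)
print(first_after_edit)
```

Working on a string like this keeps the example repeatable, since the content of a live page can change between runs.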

Step-by-step approach:

  • First, import the libraries.
Python3
from bs4 import BeautifulSoup
import requests


  • Next, assign the URL.

Python3

# assign URL
url = "https://www.geeksforgeeks.org/"

  • Fetch the raw HTML content from the URL.

Python3

html_content = requests.get(url).text

  • Now parse the content.

Python3

# Parse the fetched HTML into a BeautifulSoup tree
# using Python's built-in html.parser
soup = BeautifulSoup(html_content, "html.parser")
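
  • After parsing the content, search for a specific tag and print its text.

Python3

# find() returns the first matching tag; .text gives the text inside it
print(soup.find('title').text)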
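
Note that `find` returns a tag object, or `None` when nothing matches, so reading the text directly can raise an `AttributeError` on pages that lack the tag. A minimal sketch of a guarded lookup follows; the helper name `tag_text` and the sample markup are hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def tag_text(soup, name):
    """Return the stripped text of the first <name> tag, or None if absent.

    Hypothetical helper for illustration; not a Beautiful Soup API.
    """
    tag = soup.find(name)
    if tag is None:
        return None
    # strip=True removes leading and trailing whitespace from the text.
    return tag.get_text(strip=True)


# Hypothetical sample document with a title but no <h1>.
sample = "<html><head><title>  Demo page </title></head><body></body></html>"
sample_soup = BeautifulSoup(sample, "html.parser")

print(tag_text(sample_soup, "title"))
print(tag_text(sample_soup, "h1"))
```

Returning `None` instead of raising is one possible design; a caller that treats a missing tag as an error could raise instead.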

The complete program is shown below.

Python3

from bs4 import BeautifulSoup
import requests
  
  
# Assign URL
url = "https://www.geeksforgeeks.org/"
  
# Fetch raw HTML content
html_content = requests.get(url).text
  
# Parse the fetched HTML with Python's built-in html.parser
soup = BeautifulSoup(html_content, "html.parser")
  
# Find the first <title> tag and print its text
print(soup.find('title').text)

Output: the text of the page's <title> element. The exact string depends on the live page.
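
The program above assumes the request always succeeds. As a hedged variation, the sketch below adds a timeout and a status check using `requests.get(..., timeout=...)` and `Response.raise_for_status()`. The function names `extract_title` and `fetch_title` and the 10-second default are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup
import requests


def extract_title(html):
    """Return the stripped <title> text of an HTML string, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    return title.get_text(strip=True) if title is not None else None


def fetch_title(url, timeout=10.0):
    """Fetch a page and return its title text (hypothetical helper).

    Raises requests.RequestException on connection problems, timeouts,
    or HTTP error status codes.
    """
    response = requests.get(url, timeout=timeout)
    # Turn 4xx and 5xx responses into exceptions instead of parsing error pages.
    response.raise_for_status()
    return extract_title(response.text)


# Offline check of the parsing half, using a hypothetical HTML snippet.
print(extract_title("<html><head><title>Hi</title></head></html>"))
```

Calling `fetch_title("https://www.geeksforgeeks.org/")` performs the network request; wrapping that call in `try`/`except requests.RequestException` lets a script report a failed fetch instead of crashing.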

Similarly, to get every occurrence of a given tag:

Python3

from bs4 import BeautifulSoup
import requests
  
# Assign URL
url = "https://www.geeksforgeeks.org/"
  
# Fetch raw HTML content
html_content = requests.get(url).text
  
# Parse the fetched HTML with Python's built-in html.parser
soup = BeautifulSoup(html_content, "html.parser")
  
# Find every <p> tag and print the text of each one
texts = soup.find_all('p')
for text in texts:
    print(text.get_text())

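Output: the text of each <p> element on the page, printed in document order. The exact content depends on the live page.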
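
When the texts are needed for further processing rather than printing, they can be collected into a list. The sketch below is one way to do that, assuming empty paragraphs should be skipped and nested tags should be joined with a single space; the function name `all_tag_texts` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def all_tag_texts(html, name):
    """Return the non-empty text of every <name> tag in document order.

    Hypothetical helper for illustration; not a Beautiful Soup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for tag in soup.find_all(name):
        # Join the text pieces of nested tags with a space and strip each piece.
        text = tag.get_text(separator=" ", strip=True)
        if text:
            texts.append(text)
    return texts


# Hypothetical sample with a nested tag and a whitespace-only paragraph.
sample_html = "<div><p>One <b>bold</b> word</p><p>   </p><p>Two</p></div>"
result = all_tag_texts(sample_html, "p")
print(result)
```

Passing `separator=" "` keeps words from adjacent nested tags from running together once each piece is stripped.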