📜  How to search the parse tree using BeautifulSoup?

📅  Last modified: 2022-05-13 01:55:27.963000             🧑  Author: Mango


Searching the parse tree means finding the tags and contents of an HTML tree. This can be done in several ways, but the most commonly used methods are find() and find_all(). With their help, we can search an HTML tree parsed by BeautifulSoup.

To search the parse tree, follow the steps below.

Step 1: For scraping, import the BeautifulSoup class from the bs4 module and the requests library, which is used to request the web page.

from bs4 import BeautifulSoup
import requests

Step 2: Create a soup of the website's HTML page by passing the page content and an HTML parser to the BeautifulSoup constructor.

soup = BeautifulSoup(page.content, 'html.parser')

Step 3: Search the parse tree on the soup using one of two methods. The first is find(), which returns the first tag in the HTML tree that matches the condition; the second is find_all(), which returns a list of all matching tags.
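The difference between the two methods can be seen on a small self-contained snippet. The HTML below is made up for illustration; it mimics the comparison table scraped in the examples that follow:

```python
from bs4 import BeautifulSoup

# a small, made-up HTML fragment for illustration
html = """
<table>
  <tr><th>S.No.</th><th>ARTICLE</th><th>BLOG</th></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns only the first matching tag (or None if nothing matches)
print(soup.find('th'))      # <th>S.No.</th>

# find_all() returns a list of every matching tag
print(soup.find_all('th'))  # [<th>S.No.</th>, <th>ARTICLE</th>, <th>BLOG</th>]

# .text extracts just the string inside a tag
print([th.text for th in soup.find_all('th')])  # ['S.No.', 'ARTICLE', 'BLOG']
```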



Example 1: Using the find() method

Python3
from bs4 import BeautifulSoup
import requests
  
  
# sample website
sample_website = 'https://www.geeksforgeeks.org/difference-between-article-and-blog/'
  
# call get method to request the page
page = requests.get(sample_website)
  
# create the soup using the BeautifulSoup
# constructor and the built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
  
# use find() to search the parse tree;
# it returns the first matching tag
print(soup.find('th'))



Output:

<th>S.No.</th>

Example 2: Using the find_all() method

Python3

from bs4 import BeautifulSoup
import requests
  
  
# sample website
sample_website = 'https://www.geeksforgeeks.org/difference-between-article-and-blog/'
  
# call get method to request the page
page = requests.get(sample_website)
  
# create the soup using the BeautifulSoup
# constructor and the built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
  
# use find_all() to search the parse tree;
# it returns a list of all matching tags
print(soup.find_all('th'))

    

Output:

[<th>S.No.</th>, <th>ARTICLE</th>, <th>BLOG</th>]
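Beyond plain tag names, both find() and find_all() accept additional filters. A minimal sketch on made-up HTML (the class name and id below are invented for illustration):

```python
from bs4 import BeautifulSoup

# made-up HTML fragment for illustration
html = '<div class="post"><p id="intro">Hello</p><p>World</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# match on an attribute value with a keyword argument
print(soup.find('p', id='intro').text)                # Hello

# match on CSS class via class_ (class is a reserved word in Python)
print(soup.find('div', class_='post')['class'])       # ['post']

# cap the number of results returned by find_all()
print([p.text for p in soup.find_all('p', limit=1)])  # ['Hello']
```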