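How to search the parse tree using BeautifulSoup?
Searching the parse tree means locating tags and their content within the HTML tree. There are several ways to do this, but the most commonly used are the find() and find_all() methods. With them, we can search an HTML tree that has been parsed by BeautifulSoup.
To search the parse tree, follow the steps below.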
Step 1: For scraping, import the BeautifulSoup class from the bs4 module and import the requests module to request the website page.
from bs4 import BeautifulSoup
import requests
Step 2: Create the soup of the website or HTML page by passing the page content and an HTML parser to the BeautifulSoup constructor.
page = requests.get(sample_website)
soup = BeautifulSoup(page.content, 'html.parser')
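As a minimal sketch of this step, the snippet below builds a soup from a small inline HTML string rather than a downloaded page, so it runs without network access. The markup in sample_html is made up for illustration and is not taken from the sample website.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for illustration
sample_html = """
<html>
  <head><title>Article vs Blog</title></head>
  <body><p>Hello, soup!</p></body>
</html>
"""

# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(sample_html, 'html.parser')

# The soup exposes the parsed tree; here we read the <title> text
title_text = soup.title.string
print(title_text)
```

BeautifulSoup expects markup (a string or bytes), not a URL, which is why a real page has to be fetched first and its content passed in.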
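Step 3: We can search the parse tree in the soup with two methods: find() and find_all(). find() returns the first element that matches the condition (or None if nothing matches), while find_all() returns a list of all elements that match.
Example 1: Using the find() method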
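The difference between the two methods can be sketched on a small inline table. The markup below is hypothetical (the header names mirror the sample output, the row values are invented) and is used so the snippet runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical table markup, for illustration only
sample_html = """
<table>
  <tr><th>S.No.</th><th>ARTICLE</th><th>BLOG</th></tr>
  <tr><td>1</td><td>Formal</td><td>Informal</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

first_header = soup.find('th')      # a single Tag: the first match
all_headers = soup.find_all('th')   # a list-like ResultSet of every match
missing = soup.find('h1')           # None, because no <h1> exists

print(first_header)
print([th.get_text() for th in all_headers])
print(missing)
```

Because find() can return None, it is usually worth checking the result before accessing attributes or text on it.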
Python3
from bs4 import BeautifulSoup
import requests
# Sample website
sample_website = 'https://www.geeksforgeeks.org/difference-between-article-and-blog/'
# Request the page with requests.get()
page = requests.get(sample_website)
# Create the soup from the page content using
# the built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
# Search the parse tree with find(), which returns
# the first matching <th> tag
print(soup.find('th'))
Output:
S.No.
Example 2: Using the find_all() method
Python3
from bs4 import BeautifulSoup
import requests
# Sample website
sample_website = 'https://www.geeksforgeeks.org/difference-between-article-and-blog/'
# Request the page with requests.get()
page = requests.get(sample_website)
# Create the soup from the page content using
# the built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')
# Search the parse tree with find_all(), which returns
# every matching <th> tag
print(soup.find_all('th'))
Output:
[S.No. , ARTICLE , BLOG ]
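find_all() returns Tag objects rather than plain strings, and the search condition can be narrowed further. The sketch below, again on hypothetical inline markup (the class names col and num are invented for illustration), shows extracting text with get_text(), capping the number of results with limit, and filtering by CSS class with class_.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are assumptions for this example
sample_html = """
<table>
  <tr><th class="col"> S.No. </th><th class="col">ARTICLE</th><th class="col">BLOG</th></tr>
  <tr><td class="num">1</td><td>Formal</td><td>Informal</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Turn each matching Tag into a plain string, trimming whitespace
header_texts = [th.get_text(strip=True) for th in soup.find_all('th')]

# Stop searching after the first two matches
first_two = soup.find_all('th', limit=2)

# Match only <td> tags that carry the CSS class "num"
numbered_cells = soup.find_all('td', class_='num')

print(header_texts)
print(len(first_two))
print(numbered_cells[0].get_text())
```

The keyword is spelled class_ with a trailing underscore because class is a reserved word in Python.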