How to Get the Next Page on BeautifulSoup?
In this article, we will see how to get the next page with BeautifulSoup.
Modules needed
- BeautifulSoup: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. To install this module, type the following command in the terminal.
pip install bs4
- requests: This library lets you send HTTP/1.1 requests very easily. To install this module, type the following command in the terminal.
pip install requests
Approach:
Getting the next page with BeautifulSoup means that we first scrape the content of one page, and if that page contains links we are interested in, we scrape those pages as well. After scraping the sample website and finding a link we want to follow, we call the requests get() method again for that URL and create a soup of that page. In this way we can move on to the next page with BeautifulSoup.
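As a related sketch: many sites mark their pagination link explicitly, for example with rel="next" on an anchor tag. Under that assumption, which will not hold for every site (the attribute or CSS class used for the "next" link varies), the same idea can be written as a loop that keeps following the next-page link until none is left. The starting URL below is hypothetical:
Python3
from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

# hypothetical paginated starting URL; replace with a real one
url = 'https://example.com/articles?page=1'

while url:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

    # scrape whatever is needed from the current page;
    # here we just print its title
    print(soup.find('title').string)

    # assumes the site marks its pagination link with rel="next";
    # adjust the selector to the actual markup of the target site
    next_link = soup.select_one('a[rel="next"]')

    # urljoin resolves a relative href against the current URL
    url = urljoin(url, next_link['href']) if next_link else None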
Let's go through the script step by step:
Step 1: Import all dependencies.
from bs4 import BeautifulSoup
import requests
Step 2: Request the page URL using requests.
page = requests.get(sample_website)
Step 3: With the BeautifulSoup constructor and the HTML parser, create a soup of the page.
soup = BeautifulSoup(page.content, 'html.parser')
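Before parsing, it can also help to verify that the request actually succeeded. requests provides raise_for_status(), which raises an HTTPError for 4xx/5xx responses; a small defensive sketch, not part of the original walkthrough:
Python3
page = requests.get(sample_website)

# raise requests.exceptions.HTTPError if the server
# returned a 4xx/5xx status instead of the page
page.raise_for_status()

soup = BeautifulSoup(page.content, 'html.parser')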
Step 4: Search the parse tree and find the links. For any URL we want to follow, we use the requests module and BeautifulSoup again to create a soup of the next page, and in this way we get the next page with BeautifulSoup.
Python3
for i in soup.find_all('a', href=True):

    # check each link that contains
    # the "www.geeksforgeeks.org" string
    if "www.geeksforgeeks.org" in i['href']:

        # call the get method to request the next URL
        nextpage = requests.get(i['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # we can scrape anything from the next page;
        # here we scrape the title string of the next page
        print("next url title : ", nextsoup.find('title').string)
Below is the complete implementation:
Python3
from bs4 import BeautifulSoup
import requests

# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'

# call the get method to request the page
page = requests.get(sample_website)

# with the BeautifulSoup constructor
# and the HTML parser, create the soup
soup = BeautifulSoup(page.content, 'html.parser')

# with the find_all method,
# search the parse tree
for i in soup.find_all('a', href=True):

    # check each link that contains
    # the "www.geeksforgeeks.org" string
    if "www.geeksforgeeks.org" in i['href']:

        # call the get method to request the next URL
        nextpage = requests.get(i['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # we can scrape anything from the next page;
        # here we scrape the title string of the next page
        print("next url title : ", nextsoup.find('title').string)
Output:
next url title : GeeksforGeeks | A computer science portal for geeks
next url title : Analysis of Algorithms | Set 1 (Asymptotic Analysis) - GeeksforGeeks
next url title : Analysis of Algorithms | Set 2 (Worst, Average and Best Cases) - GeeksforGeeks
next url title : Analysis of Algorithms | Set 3 (Asymptotic Notations) - GeeksforGeeks
next url title : Analysis of algorithms | little o and little omega notations - GeeksforGeeks
next url title : Lower and Upper Bound Theory - GeeksforGeeks
next url title : Analysis of Algorithms | Set 4 (Analysis of Loops) - GeeksforGeeks
next url title : Analysis of Algorithm | Set 4 (Solving Recurrences) - GeeksforGeeks
next url title : Analysis of Algorithm | Set 5 (Amortized Analysis Introduction) - GeeksforGeeks
next url title : What does 'Space Complexity' mean? - GeeksforGeeks
next url title : Pseudo-polynomial Algorithms - GeeksforGeeks
next url title : Polynomial Time Approximation Scheme - GeeksforGeeks
next url title : A Time Complexity Question - GeeksforGeeks
.................................................................