How to Get the Next Page on BeautifulSoup?
In this article, we will see how to get the next page with BeautifulSoup.
Modules needed
- BeautifulSoup: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. To install this module, type the following command in the terminal.
pip install bs4
- requests: This library lets you send HTTP/1.1 requests very easily. To install this module, type the following command in the terminal.
pip install requests
Approach:
Getting the next page with BeautifulSoup means that we first scrape the content of one page, and if that page contains links we are interested in, we scrape those pages as well. After scraping the sample website and finding a link we want to follow, we call the requests get() method again for that URL and create a soup of that page. In this way we can move on to the next page with BeautifulSoup.
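As a related sketch: many sites mark their pagination link explicitly, for example with rel="next" on an anchor tag. Under that assumption, which will not hold for every site (the attribute or CSS class used for the "next" link varies), the same idea can be written as a loop that keeps following the next-page link until none is left. The starting URL below is hypothetical:
Python3
from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

# hypothetical paginated starting URL; replace with a real one
url = 'https://example.com/articles?page=1'

while url:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

    # scrape whatever is needed from the current page;
    # here we just print its title
    print(soup.find('title').string)

    # assumes the site marks its pagination link with rel="next";
    # adjust the selector to the actual markup of the target site
    next_link = soup.select_one('a[rel="next"]')

    # urljoin resolves a relative href against the current URL
    url = urljoin(url, next_link['href']) if next_link else None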
Let's go through the script step by step:
Step 1: Import all dependencies.
from bs4 import BeautifulSoup
import requests
Step 2: Request the page URL using requests.
page = requests.get(sample_website)
Step 3: With the BeautifulSoup constructor and the HTML parser, create a soup of the page.
soup = BeautifulSoup(page.content, 'html.parser')
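Before parsing, it can also help to verify that the request actually succeeded. requests provides raise_for_status(), which raises an HTTPError for 4xx/5xx responses; a small defensive sketch, not part of the original walkthrough:
Python3
page = requests.get(sample_website)

# raise requests.exceptions.HTTPError if the server
# returned a 4xx/5xx status instead of the page
page.raise_for_status()

soup = BeautifulSoup(page.content, 'html.parser')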
Step 4: Search the parse tree and find the links. For any URL we want to follow, we use the requests module and BeautifulSoup again to create a soup of the next page, and in this way we get the next page with BeautifulSoup.
Python3
for i in soup.find_all('a', href=True):

    # check each link that contains
    # the "www.geeksforgeeks.org" string
    if "www.geeksforgeeks.org" in i['href']:

        # call the get method to request the next URL
        nextpage = requests.get(i['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # we can scrape anything from the next page;
        # here we scrape the title string of the next page
        print("next url title : ", nextsoup.find('title').string)
Below is the complete implementation:
Python3
from bs4 import BeautifulSoup
import requests

# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'

# call the get method to request the page
page = requests.get(sample_website)

# with the BeautifulSoup constructor
# and the HTML parser, create the soup
soup = BeautifulSoup(page.content, 'html.parser')

# with the find_all method,
# search the parse tree
for i in soup.find_all('a', href=True):

    # check each link that contains
    # the "www.geeksforgeeks.org" string
    if "www.geeksforgeeks.org" in i['href']:

        # call the get method to request the next URL
        nextpage = requests.get(i['href'])

        # create a soup for the next URL
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # we can scrape anything from the next page;
        # here we scrape the title string of the next page
        print("next url title : ", nextsoup.find('title').string)
Output:
next url title : GeeksforGeeks | A computer science portal for geeks
next url title : Analysis of Algorithms | Set 1 (Asymptotic Analysis) - GeeksforGeeks
next url title : Analysis of Algorithms | Set 2 (Worst, Average and Best Cases) - GeeksforGeeks
next url title : Analysis of Algorithms | Set 3 (Asymptotic Notations) - GeeksforGeeks
next url title : Analysis of algorithms | little o and little omega notations - GeeksforGeeks
next url title : Lower and Upper Bound Theory - GeeksforGeeks
next url title : Analysis of Algorithms | Set 4 (Analysis of Loops) - GeeksforGeeks
next url title : Analysis of Algorithm | Set 4 (Solving Recurrences) - GeeksforGeeks
next url title : Analysis of Algorithm | Set 5 (Amortized Analysis Introduction) - GeeksforGeeks
next url title : What does 'Space Complexity' mean? - GeeksforGeeks
next url title : Pseudo-polynomial Algorithms - GeeksforGeeks
next url title : Polynomial Time Approximation Scheme - GeeksforGeeks
next url title : A Time Complexity Question - GeeksforGeeks
.................................................................