📜  How to get the next page with BeautifulSoup?

📅  Last modified: 2022-05-13 01:55:24.411000             🧑  Author: Mango

In this article, we will see how to get the next page when scraping with BeautifulSoup.

Modules needed

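  • BeautifulSoup: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. To install it, run the following command in a terminal.
pip install bs4
  • requests: This library lets you send HTTP/1.1 requests very easily. To install it, run the following command in a terminal.
pip install requests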

Approach:

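Getting the next page with BeautifulSoup means that we first scrape the content of one page and, if that page contains links to other pages, scrape those pages as well. After finding a link on the sample website, we call the requests.get method again on that URL and create a soup of the page it returns. In this way we move on to the next page.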

Let's walk through the script step by step:

Step 1: Import all dependencies.

from bs4 import BeautifulSoup
import requests

Step 2: Request the page URL using requests.

page=requests.get(sample_website)

Step 3: Using the BeautifulSoup constructor and the HTML parser, we create a soup of the page.

soup = BeautifulSoup(page.content, 'html.parser')
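
This step can be tried offline. The sketch below uses a hypothetical HTML snippet (not from the sample website) in place of the bytes that page.content would hold, and reads the title and the first link from the resulting soup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for page.content (bytes, as requests returns them)
html = (
    b"<html><head><title>Example page</title></head>"
    b"<body><a href='/next/'>Next</a></body></html>"
)

# Build the soup with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                       # text inside <title>
first_href = soup.find("a", href=True)["href"]  # href of the first link
print(title, first_href)
```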

Step 4:

We search the parse tree and find the links. For each URL we want, we use the requests and BeautifulSoup modules again to create a soup of that page, which gives us the next page.

Python3
for link in soup.find_all('a', href=True):

    # keep only links whose URL contains
    # the string "www.geeksforgeeks.org"
    if "www.geeksforgeeks.org" in link['href']:

        # call get method to request the next URL
        nextpage = requests.get(link['href'])

        # create a soup for the next page
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # anything on the next page can be scraped here;
        # this example prints the page title
        print("next url title : ", nextsoup.find('title').string)
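
The substring check in the loop skips relative links such as "/about/" and would also accept any URL that merely contains the string somewhere, for example in a query parameter. A possible alternative, sketched here with only the standard library, resolves each href against the page URL and compares the host name. The function name same_site_links and the sample hrefs are illustrative assumptions, not part of the original script.

```python
from urllib.parse import urljoin, urlparse


def same_site_links(base_url, hrefs, host="www.geeksforgeeks.org"):
    """Resolve each href against base_url and keep those that point at host."""
    kept = []
    seen = set()
    for href in hrefs:
        absolute = urljoin(base_url, href)  # turns "/about/" into a full URL
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.hostname == host:
            if absolute not in seen:  # skip links repeated on the page
                seen.add(absolute)
                kept.append(absolute)
    return kept


# Hypothetical href values, as they might come from link['href']
base = "https://www.geeksforgeeks.org/some-article/"
hrefs = [
    "/about/",
    "https://www.geeksforgeeks.org/python/",
    "https://example.com/?ref=www.geeksforgeeks.org",
    "mailto:someone@example.com",
    "/about/",
]
links = same_site_links(base, hrefs)
print(links)
```

Each URL in the returned list could then be passed to requests.get in the same way as in the loop above.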


Below is the complete implementation:

Python3

from bs4 import BeautifulSoup
import requests

# sample website
sample_website = 'https://www.geeksforgeeks.org/different-ways-to-remove-all-the-digits-from-string-in-java/'

# call get method to request the page
page = requests.get(sample_website)

# create a soup of the page with
# BeautifulSoup and the HTML parser
soup = BeautifulSoup(page.content, 'html.parser')

# search the parse tree for all
# <a> tags that have an href attribute
for link in soup.find_all('a', href=True):

    # keep only links whose URL contains
    # the string "www.geeksforgeeks.org"
    if "www.geeksforgeeks.org" in link['href']:

        # call get method to request the next URL
        nextpage = requests.get(link['href'])

        # create a soup for the next page
        nextsoup = BeautifulSoup(nextpage.content, 'html.parser')

        # anything on the next page can be scraped here;
        # this example prints the page title
        print("next url title : ", nextsoup.find('title').string)

Output:

next url title :  GeeksforGeeks | A computer science portal for geeks
next url title :  Analysis of Algorithms | Set 1 (Asymptotic Analysis) - GeeksforGeeks
next url title :  Analysis of Algorithms | Set 2 (Worst, Average and Best Cases) - GeeksforGeeks
next url title :  Analysis of Algorithms | Set 3 (Asymptotic Notations) - GeeksforGeeks
next url title :  Analysis of algorithms | little o and little omega notations - GeeksforGeeks
next url title :  Lower and Upper Bound Theory - GeeksforGeeks
next url title :  Analysis of Algorithms | Set 4 (Analysis of Loops) - GeeksforGeeks
next url title :  Analysis of Algorithm | Set 4 (Solving Recurrences) - GeeksforGeeks
next url title :  Analysis of Algorithm | Set 5 (Amortized Analysis Introduction) - GeeksforGeeks
next url title :  What does 'Space Complexity' mean? - GeeksforGeeks
next url title :  Pseudo-polynomial Algorithms - GeeksforGeeks
next url title :  Polynomial Time Approximation Scheme - GeeksforGeeks
next url title :  A Time Complexity Question - GeeksforGeeks
.................................................................
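
The script above requests every matching link with no timeout, no error handling, and no limit, and it fails if a linked page has no title tag. The following is a minimal, more defensive sketch of the same idea, not the author's implementation. The names fetch_html and linked_page_titles, the 10-second timeout, and the default limit of 5 are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup
import requests


def fetch_html(url):
    """Download a page and raise on HTTP errors instead of parsing an error page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content


def linked_page_titles(start_url, fetch=fetch_html,
                       needle="www.geeksforgeeks.org", limit=5):
    """Return (url, title) pairs for up to `limit` distinct links containing `needle`."""
    soup = BeautifulSoup(fetch(start_url), "html.parser")
    results = []
    seen = set()
    for link in soup.find_all("a", href=True):
        url = link["href"]
        if needle not in url or url in seen:
            continue  # not a wanted link, or already visited
        seen.add(url)
        try:
            next_soup = BeautifulSoup(fetch(url), "html.parser")
        except requests.RequestException:
            continue  # skip pages that fail to load
        # a page without a <title> tag yields None instead of an error
        title = next_soup.title.string if next_soup.title else None
        results.append((url, title))
        if len(results) >= limit:
            break
    return results
```

Calling linked_page_titles(sample_website) would fetch the sample page and at most five linked pages. The fetch parameter exists so that the link-following logic can be exercised without network access by passing a function that returns stored HTML.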