Scraping Google Search Results Using Python BeautifulSoup

In this article, we will see how to scrape Google search results using Python BeautifulSoup.

Modules needed:

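bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python. To install it, type the following command in the terminal.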

pip install beautifulsoup4

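requests: Requests lets you send HTTP/1.1 requests with very little code. This module is also not built in with Python. To install it, type the following command in the terminal.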

pip install requests

Approach:

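Import the beautifulsoup and requests libraries.
Create two strings: the default Google search URL 'https://google.com/search?q=' and our custom search keyword.
Concatenate the two strings to get the search URL.
Fetch the URL data with requests.get(url) and store it in a variable, request_result.
Read request_result.text to get the body of the fetched response as a string.
Now use BeautifulSoup to parse the fetched page. We could parse it by hand, but BeautifulSoup comes with many built-in helpers for scraping the web. First, create a soup object from the response text.
Then call soup.find_all('h3') to grab all the major headings of the search results, iterate through the returned object, and print each heading as a string.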
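
The second and third steps above join the two strings directly, which works for a single-word keyword like the one used in this article. As a minimal sketch, assuming you also want keywords that contain spaces or special characters, the standard library's urllib.parse.urlencode can build the query string instead. The helper name build_search_url is hypothetical and introduced here only for illustration.

```python
from urllib.parse import urlencode

# Base URL taken from the article.
GOOGLE_SEARCH_URL = "https://google.com/search"


def build_search_url(keyword):
    """Return the search URL for keyword, encoding it as needed.

    build_search_url is a hypothetical helper, not part of requests or bs4.
    """
    # urlencode turns {"q": "a b"} into "q=a+b" and escapes special characters.
    return GOOGLE_SEARCH_URL + "?" + urlencode({"q": keyword})


if __name__ == "__main__":
    print(build_search_url("geeksforgeeks"))
    print(build_search_url("weather in Imphal"))
```

For a plain keyword the result is the same URL that string concatenation produces, so the rest of the approach is unchanged.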

Example 1: Below is the implementation of the above approach.

Python3

# Import the requests and Beautiful Soup libraries.
import requests
import bs4

# Build the search URL from the default Google search URL
# 'https://google.com/search?q=' and our search keyword.
text = "geeksforgeeks"
url = 'https://google.com/search?q=' + text

# Fetch the URL data using requests.get(url) and
# store the response in a variable, request_result.
request_result = requests.get(url)

# Create a soup object from the text of the fetched response.
soup = bs4.BeautifulSoup(request_result.text,
                         "html.parser")
print(soup)

Output: print(soup) displays the parsed HTML of the page that Google returns.
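
The fetch step in Example 1 calls requests.get(url) with no further arguments. By default, requests does not raise an exception for HTTP error statuses and does not time out unless a timeout is given. The sketch below wraps the fetch in a hypothetical helper, fetch_html, that sets a timeout, sends a browser-like User-Agent header, and calls raise_for_status(). The header value is a placeholder, and whether Google serves different markup for it is an assumption this article does not verify.

```python
import requests

# Placeholder header value; any browser-like string could be used here.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, session=None, timeout=10):
    """Fetch url and return the response body as text.

    fetch_html is a hypothetical helper. `session` may be a
    requests.Session or any object with a compatible get method.
    """
    client = session if session is not None else requests
    response = client.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses
    # instead of silently parsing an error page.
    response.raise_for_status()
    return response.text
```

The returned string can be passed to bs4.BeautifulSoup in place of request_result.text.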

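Now we can call soup.find_all('h3') to grab all the major headings of the search results, iterate through the returned object, and print each heading as a string.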

Python3

# Continuing with the soup object created in Example 1:
# soup.find_all('h3') grabs all the major
# headings of our search results.
heading_object = soup.find_all('h3')

# Iterate through the headings
# and print each one as a string.
for info in heading_object:
    print(info.get_text())
    print("------")

Output: the text of each h3 heading is printed, followed by a line of dashes.
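
Because the live results page can change, it can help to try the heading extraction on a fixed HTML string first. The sketch below does that. The sample markup and the helper name extract_headings are made up for illustration and do not reflect Google's actual page structure.

```python
import bs4

# Made-up sample markup, not Google's real page structure.
SAMPLE_HTML = """
<html><body>
  <div><h3>First result</h3><p>snippet</p></div>
  <div><h3> Second <b>result</b> </h3></div>
  <h2>Not a result heading</h2>
</body></html>
"""


def extract_headings(html):
    """Return the text of every h3 element in html, in document order."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # strip=True trims each text fragment; " " joins the fragments of one heading.
    return [h3.get_text(" ", strip=True) for h3 in soup.find_all("h3")]


if __name__ == "__main__":
    for heading in extract_headings(SAMPLE_HTML):
        print(heading)
        print("------")
```

Keeping the parsing in a function that takes a string makes it possible to test it without sending any request.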

Example 2: Below is an implementation that extracts a city's temperature using Google search:

Python

# Import the modules.
import requests
import bs4

# City name (hardcoded here; it could also be read from the user).
city = "Imphal"

# Generate the search URL.
url = "https://google.com/search?q=weather+in+" + city

# Send the HTTP request.
request_result = requests.get(url)

# Parse the fetched HTML.
soup = bs4.BeautifulSoup(request_result.text,
                         "html.parser")

# Find the temperature. This example expects it in the first
# div with class "BNeawe"; find() returns None if there is no such div.
temp_div = soup.find("div", class_='BNeawe')
temp = temp_div.text if temp_div is not None else "temperature not found"

print(temp)

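Output: print(temp) displays the text of the first div with class "BNeawe", which this example expects to be the temperature.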
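
The lookup in Example 2 depends on a class name in Google's markup, which this article does not guarantee to be stable. As a sketch of a more defensive version, the hypothetical helper find_temperature below returns None when no matching div is found and uses a regular expression to pull out a value such as 31°C. The sample HTML is invented for the example.

```python
import re

import bs4

# Matches strings such as "31°C" or "-4 °F".
TEMPERATURE_PATTERN = re.compile(r"-?\d+\s*°\s*[CF]")


def find_temperature(html, css_class="BNeawe"):
    """Return the first temperature-like string in a div with css_class, or None."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    for div in soup.find_all("div", class_=css_class):
        match = TEMPERATURE_PATTERN.search(div.get_text())
        if match:
            return match.group(0)
    return None


if __name__ == "__main__":
    # Invented sample markup for illustration only.
    sample = '<div class="BNeawe">Imphal</div><div class="BNeawe">31°C</div>'
    print(find_temperature(sample))
```

Returning None leaves it to the caller to decide how to report a page whose layout no longer matches.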