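Scrape Google Search Results using Python BeautifulSoup
In this article, we will see how to scrape Google search results using Python BeautifulSoup.
Modules required:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module is not part of the Python standard library. To install it, type the following command in the terminal.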
pip install bs4
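- requests: Requests lets you send HTTP/1.1 requests very easily. This module is not part of the Python standard library either. To install it, type the following command in the terminal.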
pip install requests
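Approach:
- Import the bs4 (Beautiful Soup) and requests libraries.
- Create two strings: the default Google search URL 'https://google.com/search?q=' and our custom search keyword.
- Concatenate the two strings to get the search URL.
- Fetch the URL data with requests.get(url) and store the response in a variable, request_result.
- Use request_result.text to get the body of the response as a string.
- Now use BeautifulSoup to parse the fetched page. We could process the string by hand, but BeautifulSoup comes with many built-in features for scraping the web. First, create a soup object from the response text.
- Call soup.find_all('h3') to get all the main headings of the search results, then iterate over the returned object and print each heading as a string.
Example 1: Below is the implementation of the above approach.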
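Plain concatenation, as in the steps above, works for a single-word keyword, but a keyword containing spaces or characters such as & needs to be escaped first. The following is a minimal sketch of one way to do that with the standard library; the helper name build_search_url is hypothetical and not part of the original approach.

```python
from urllib.parse import urlencode

# Base search URL used in the steps above, without the query string.
GOOGLE_SEARCH_URL = "https://google.com/search"


def build_search_url(keyword):
    """Return the search URL for `keyword` (hypothetical helper)."""
    # urlencode escapes spaces and reserved characters in the keyword,
    # which plain string concatenation would leave untouched.
    return GOOGLE_SEARCH_URL + "?" + urlencode({"q": keyword})


print(build_search_url("geeksforgeeks"))
print(build_search_url("weather in Imphal"))
```

For a one-word keyword the result is identical to the concatenated string, so this can be dropped in without changing the rest of the approach.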
Python3
# Import the requests and Beautiful Soup (bs4) libraries.
import requests
import bs4

# Build the search URL by concatenating the default Google search URL
# 'https://google.com/search?q=' with our customized search keyword.
text = "geeksforgeeks"
url = 'https://google.com/search?q=' + text

# Fetch the URL data using requests.get(url) and
# store the response in a variable, request_result.
request_result = requests.get(url)

# Create the soup object from the fetched response text.
soup = bs4.BeautifulSoup(request_result.text, "html.parser")
print(soup)
Output: print(soup) prints the full HTML of the page that Google returned.
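Example 1 sends the request with default settings. In practice, a site may answer the default requests User-Agent with a different or blocked page, and a request without a timeout can hang. The sketch below shows one possible way to add a header, a timeout, and a status check; the header value and the function name fetch_search_page are assumptions made for illustration, not part of the original example.

```python
import requests

# Hypothetical header value: any descriptive User-Agent string could be used.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}


def fetch_search_page(keyword, timeout=10):
    """Fetch the results page for `keyword` and return its HTML text."""
    response = requests.get(
        "https://google.com/search",
        params={"q": keyword},  # requests URL-encodes the query for us
        headers=HEADERS,
        timeout=timeout,        # seconds; avoids waiting forever
    )
    # Raise requests.HTTPError on a 4xx/5xx status instead of
    # silently parsing an error page.
    response.raise_for_status()
    return response.text


# Build (but do not send) the same request to inspect the final URL offline.
prepared = requests.Request(
    "GET",
    "https://google.com/search",
    params={"q": "geeksforgeeks"},
    headers=HEADERS,
).prepare()
print(prepared.url)
```

The text returned by fetch_search_page can be passed to bs4.BeautifulSoup exactly as request_result.text is in Example 1.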
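Next, we can call soup.find_all('h3') to get all the main headings of the search results, then iterate over the returned object and print each heading as a string.
Python3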
# Use soup.find_all('h3') to grab
# all major headings of our search result.
heading_object = soup.find_all('h3')

# Iterate through the object
# and print each heading as a string.
for info in heading_object:
    print(info.getText())
    print("------")
Output: the text of each h3 heading found on the page, each followed by a line of dashes.
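Because the live results page changes over time, it can help to try find_all on a small fixed document first. The sketch below uses made-up HTML as a stand-in for a downloaded page; only the bs4 calls match the example above.

```python
import bs4

# Made-up HTML standing in for a downloaded results page.
sample_html = """
<html><body>
  <div><h3>First result title</h3><p>snippet</p></div>
  <div><h3>Second <b>result</b> title</h3></div>
</body></html>
"""

soup = bs4.BeautifulSoup(sample_html, "html.parser")

# find_all returns every matching tag; get_text() flattens nested markup
# such as the <b> element inside the second heading.
headings = [h3.get_text() for h3 in soup.find_all("h3")]

for title in headings:
    print(title)
    print("------")
```

If find_all returns an empty list on a real page, the loop simply prints nothing, which usually means the page did not contain h3 headings in the form expected.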
Example 2: Below is an implementation that extracts the temperature of a city using Google search:
Python3
# Import the modules.
import requests
import bs4

# The city name (hard-coded here; it could also be read from the user).
city = "Imphal"

# Generate the search URL.
url = "https://google.com/search?q=weather+in+" + city

# Send the HTTP request.
request_result = requests.get(url)

# Parse the HTML returned by the request.
soup = bs4.BeautifulSoup(request_result.text, "html.parser")

# Find the temperature in Celsius.
# The temperature is stored inside a div with the class "BNeawe".
# soup.find returns None when no such div exists, so check before reading .text.
temp = soup.find("div", class_='BNeawe')
if temp is not None:
    print(temp.text)
else:
    print("Temperature element not found")
Output: the text of the first div with the class "BNeawe", which is expected to hold the temperature.
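The class name "BNeawe" depends on the markup Google happens to serve, so the lookup can come back empty. The following sketch, run against made-up HTML, shows how soup.find behaves with class_ when the element is present and when it is missing; the helper name first_text_by_class and the sample markup are assumptions for illustration.

```python
import bs4

# Made-up HTML; the second class name "extra" is only there to show that
# class_ matches a tag carrying several classes.
sample_html = """
<div class="BNeawe extra">27°C</div>
<div class="other">unrelated</div>
"""


def first_text_by_class(html, css_class):
    """Return the text of the first div with `css_class`, or None."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    tag = soup.find("div", class_=css_class)
    # soup.find returns None when nothing matches, so guard before
    # reading the text.
    return tag.get_text(strip=True) if tag is not None else None


print(first_text_by_class(sample_html, "BNeawe"))
print(first_text_by_class(sample_html, "missing"))
```

Returning None instead of raising lets the caller decide whether a missing element is an error or just a page layout that needs a different selector.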