Image Scraping with Python

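Scraping is a useful skill for collecting data from websites. In this article, we will see how to scrape images from a website using Python, trying two different approaches.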

Method 1: Using BeautifulSoup and Requests

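bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It is not part of the Python standard library. To install it, run the following command in a terminal:

pip install beautifulsoup4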

requests: Requests lets you send HTTP/1.1 requests very easily. This module is not part of the standard library either. To install it, run the following command in a terminal:

pip install requests

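Approach:

Import the modules
Send a request to the URL with requests.get()
Pass the returned HTML to the BeautifulSoup() constructor
Find all 'img' tags and print their 'src' attribute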

Implementation:

Python3

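import requests
from bs4 import BeautifulSoup

def getdata(url):
    # Fetch the page and return its HTML as text
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # stop on 4xx/5xx responses
    return r.text

htmldata = getdata("https://www.geeksforgeeks.org/")
soup = BeautifulSoup(htmldata, 'html.parser')

# Print the src attribute of every <img> tag, skipping tags without one
for item in soup.find_all('img'):
    src = item.get('src')
    if src:
        print(src)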

Output:

https://media.geeksforgeeks.org/wp-content/cdn-uploads/20201018234700/GFG-RT-DSA-Creative.png
https://media.geeksforgeeks.org/wp-content/cdn-uploads/logo-new-2.svg
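
The src values printed above are copied verbatim from the HTML, so on other pages some may be relative paths (/images/a.png), protocol-relative URLs (//cdn.example.org/b.jpg), or inline data: URIs that cannot be downloaded. A minimal sketch, assuming you want absolute, de-duplicated URLs before downloading anything: it resolves each value against the page URL with urllib.parse.urljoin. The function name normalize_image_urls is hypothetical.

```python
from urllib.parse import urljoin

def normalize_image_urls(page_url, src_values):
    """Turn raw <img> src values into absolute URLs (hypothetical helper).

    Skips missing values and inline data: URIs, and removes duplicates
    while keeping the original order.
    """
    seen = set()
    result = []
    for src in src_values:
        if not src:
            continue  # tag had no src attribute
        src = src.strip()
        if not src or src.startswith("data:"):
            continue  # nothing to download for inline data URIs
        # urljoin handles relative paths and protocol-relative URLs
        absolute = urljoin(page_url, src)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

With the soup object from the code above, it could be called as normalize_image_urls("https://www.geeksforgeeks.org/", [item.get('src') for item in soup.find_all('img')]).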

Method 2: Using urllib and BeautifulSoup

urllib: urllib is a Python package for opening URLs and interacting with websites. It ships with Python's standard library, so there is nothing to install with pip.
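
By default, urlopen() identifies itself with a "Python-urllib/3.x" User-Agent header, and some sites reject such requests (for example with HTTP 403). A minimal sketch, assuming the target site accepts a custom User-Agent: wrap the URL in a urllib.request.Request that carries the header. The header value and the helper names build_request and fetch_html are illustrative assumptions, not part of the original article.

```python
from urllib.request import Request, urlopen

# Assumption: an illustrative User-Agent string; any descriptive value works.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; image-scraper-example/0.1)"}

def build_request(url):
    """Wrap a URL in a Request that carries the custom User-Agent header."""
    return Request(url, headers=HEADERS)

def fetch_html(url, timeout=10):
    """Download a page and return its body decoded as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, else UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can be passed straight to BeautifulSoup(html, 'html.parser').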

Approach:

Import the modules
Open and read the URL with urlopen()
Pass the response to the BeautifulSoup() constructor
Find all 'img' tags and print their 'src' attribute

Implementation:

Python3

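from urllib.request import urlopen
from bs4 import BeautifulSoup

# Open the URL; the with-block closes the connection afterwards
with urlopen('https://www.geeksforgeeks.org/') as htmldata:
    soup = BeautifulSoup(htmldata, 'html.parser')

images = soup.find_all('img')

# Print each image's src attribute, skipping tags without one
for item in images:
    src = item.get('src')
    if src:
        print(src)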

Output:

https://media.geeksforgeeks.org/wp-content/cdn-uploads/20201018234700/GFG-RT-DSA-Creative.png
https://media.geeksforgeeks.org/wp-content/cdn-uploads/logo-new-2.svg
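
Both methods only print the image URLs. To actually save the images, here is a minimal sketch using only the standard library, assuming the URLs are already absolute (relative src values can be resolved first with urllib.parse.urljoin). The helper names filename_from_url and download_images, the "images" folder, and the User-Agent value are hypothetical choices for illustration.

```python
import os
from urllib.parse import unquote, urlparse
from urllib.request import Request, urlopen

def filename_from_url(url, index):
    """Derive a local file name from the URL path (hypothetical helper).

    The index prefix keeps two images with the same base name from
    overwriting each other; URLs without a file name get a numbered name.
    """
    name = os.path.basename(unquote(urlparse(url).path))
    return f"{index}_{name}" if name else f"image_{index}"

def download_images(urls, folder="images", timeout=10):
    """Save each image URL into `folder` and return the written paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        path = os.path.join(folder, filename_from_url(url, index))
        # Assumption: a custom User-Agent, since some sites block Python-urllib
        request = Request(url, headers={"User-Agent": "image-scraper-example/0.1"})
        try:
            with urlopen(request, timeout=timeout) as response, open(path, "wb") as f:
                f.write(response.read())
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            print(f"skipped {url}: {exc}")
            continue
        saved.append(path)
    return saved
```

For example, download_images(["https://media.geeksforgeeks.org/wp-content/cdn-uploads/logo-new-2.svg"]) would try to save the logo as images/0_logo-new-2.svg.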
