📜  How to Scrape Videos Using Python?

📅  Last modified: 2022-05-13 01:54:25.889000             🧑  Author: Mango

Prerequisites:

  • Requests
  • Beautiful Soup

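In this article, we discuss how to scrape videos from web pages using Python. For the scraping itself, we use Python's requests and BeautifulSoup modules. The requests library is a widely used third-party package (it is not part of the standard library) for making HTTP requests to a specified URL. Whether you are working with REST APIs or web scraping, learning requests is a prerequisite for going further with these technologies. When you make a request to a URI, it returns a response, and requests provides built-in functionality for managing both the request and the response.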

pip install requests
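
To make the request/response cycle concrete, here is a minimal sketch of a fetch helper. The function name `fetch_html` and the 10-second default timeout are assumptions made for illustration and are not part of the original article; `requests.get`, `raise_for_status`, and `response.text` are standard requests calls.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text (hypothetical helper for illustration)."""
    # Send the HTTP GET request; the timeout avoids hanging forever
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into requests.exceptions.HTTPError
    response.raise_for_status()
    # The decoded response body
    return response.text
```

It could be called as `html = fetch_html("https://www.geeksforgeeks.org/make-notepad-using-tkinter/")`. Any network or HTTP failure surfaces as a subclass of `requests.exceptions.RequestException`, so a caller can catch that single exception type.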

Beautiful Soup is a Python library designed for quick-turnaround projects such as screen scraping.

pip install bs4
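
As a quick illustration of what Beautiful Soup does, the sketch below parses a small inline HTML string instead of a live page. The sample markup, the `/media/intro.webm` path, and the variable names are made up for this example; it uses the `html.parser` backend that ships with Python, so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page used only for this illustration
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1>Clips</h1>
    <video src="/media/intro.webm" controls></video>
    <p>Some text</p>
  </body>
</html>
"""

# Build a parse tree from the HTML string
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Navigate the tree: read the page title and the first <video> tag
page_title = soup.title.string
first_video = soup.find("video")
video_src = first_video.get("src")

print(page_title)
print(video_src)
```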

Let's go through the implementation step by step:

  • Import the required modules
Python3
# Import Required Module
import requests 
from bs4 import BeautifulSoup



  • Parse the HTML content

Python3

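# Web URL (replace the placeholder with the page you want to scrape)
web_url = "Enter WEB URL"

# Get URL Content
r = requests.get(web_url)

# Parse HTML Code with the built-in parser (no extra install needed)
soup = BeautifulSoup(r.content, 'html.parser')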
  • Count how many videos are on the web page. In HTML, videos are displayed with the video tag.

Python3

# List of all <video> tags on the page
video_tags = soup.find_all('video')
print("Total ", len(video_tags), "videos found")
  • Iterate over all the video tags and get each video URL

Python3

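for video_tag in video_tags:
    # Read the URL from the fallback <a> link inside <video>, if there is one
    link = video_tag.find("a")
    if link is not None and link.has_attr("href"):
        print(link["href"])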
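
The loop above only looks at an `<a>` link nested inside each video tag. Many pages instead put the address in the `src` attribute of `<video>` or in child `<source>` tags, and often as a relative path. The following is a sketch under those assumptions, not the article's own method: the helper name `extract_video_urls`, the sample markup, and the `example.com` addresses are all hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_video_urls(html, base_url):
    """Collect absolute video URLs from <video> tags (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for video_tag in soup.find_all("video"):
        # Possible locations of the URL: <video src>, <source src>, <a href>
        candidates = [video_tag.get("src")]
        candidates += [s.get("src") for s in video_tag.find_all("source")]
        candidates += [a.get("href") for a in video_tag.find_all("a")]
        for candidate in candidates:
            if candidate:
                # Resolve relative paths against the page URL
                absolute = urljoin(base_url, candidate)
                if absolute not in urls:
                    urls.append(absolute)
    return urls


# Made-up markup covering the three cases
SAMPLE_HTML = """
<video src="/media/a.webm"></video>
<video controls>
  <source src="b.mp4" type="video/mp4">
  <a href="https://cdn.example.com/b.webm">Download</a>
</video>
"""

found = extract_video_urls(SAMPLE_HTML, "https://example.com/videos/page.html")
for url in found:
    print(url)
```

Note that this only sees video tags present in the HTML returned by the server; players that are injected by JavaScript after the page loads will not appear in the parsed tree.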

Below is the complete implementation:

Python3

# Import Required Module
import requests
from bs4 import BeautifulSoup

# Web URL
web_url = "https://www.geeksforgeeks.org/make-notepad-using-tkinter/"

# Get URL Content
r = requests.get(web_url)

# Parse HTML Code
soup = BeautifulSoup(r.content, 'html.parser')

# List of all <video> tags on the page
video_tags = soup.find_all('video')
print("Total ", len(video_tags), "videos found")

if video_tags:
    for video_tag in video_tags:
        # Read the URL from the fallback <a> link inside <video>, if there is one
        link = video_tag.find("a")
        if link is not None and link.has_attr("href"):
            print(link["href"])
else:
    print("no videos found")

Output:

Total  1 videos found
https://media.geeksforgeeks.org/wp-content/uploads/15.webm