📅  Last modified: 2023-12-03 15:27:22.666000             🧑  Author: Mango
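Instagram is a social media app built around sharing photos and videos, with more than 1 billion monthly active users. Python is a popular programming language with a rich set of ready-made modules and libraries that suit many kinds of applications. This article shows how to use Python to fetch and process Instagram data, both through Instagram's API and through web scraping.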
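Instagram provides an API that lets developers access platform data programmatically. Because the API is reached over HTTP, it can be called from many programming languages, including Python. Through it you can retrieve a user's profile information, published content, comments, like counts, and similar data. Note that the examples in this section query the undocumented ?__a=1 web endpoint rather than the official Graph API, so the endpoint may require a logged-in session or return a different structure than shown. They cover the following tasks: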
Get a user's information by username, such as name, ID, and follower/following counts
import requests

username = "instagram"
# Undocumented web endpoint; it may require login or stop returning JSON.
url = f"https://www.instagram.com/{username}/?__a=1"
response = requests.get(url, timeout=10)

if response.ok:
    # Profile fields sit under graphql -> user in the returned JSON.
    user_data = response.json()["graphql"]["user"]
    name = user_data["full_name"]
    followers = user_data["edge_followed_by"]["count"]
    following = user_data["edge_follow"]["count"]
    print(f"User: {username}\nName: {name}\nFollowers: {followers}\nFollowing: {following}")
else:
    print(f"Failed to get user data for {username}.")
Get a user's latest posts by username
import requests

username = "instagram"
url = f"https://www.instagram.com/{username}/?__a=1"
response = requests.get(url, timeout=10)

if response.ok:
    user_data = response.json()["graphql"]["user"]
    # Each edge wraps one post node from the user's timeline.
    edges = user_data["edge_owner_to_timeline_media"]["edges"]
    for edge in edges:
        node = edge["node"]
        post_url = f"https://www.instagram.com/p/{node['shortcode']}/"
        # A post without a caption has an empty edge list, so check before indexing.
        caption_edges = node.get("edge_media_to_caption", {}).get("edges", [])
        caption = caption_edges[0]["node"]["text"] if caption_edges else ""
        likes = node.get("edge_media_preview_like", {}).get("count", 0)
        print(f"Post URL: {post_url}\nCaption: {caption}\nLikes: {likes}\n")
else:
    print(f"Failed to get user data for {username}.")
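The nested lookups in the two examples above can be pulled into a pure function, which makes them testable without touching the network. The following is a minimal sketch under that assumption: `extract_posts` and the `sample` payload are hypothetical names introduced here for illustration, and the sample is hand-written to mirror the field names used above, not captured from a real response.

```python
def extract_posts(payload: dict) -> list:
    """Return a list of {url, caption, likes} dicts from a profile payload."""
    user = payload.get("graphql", {}).get("user", {})
    edges = user.get("edge_owner_to_timeline_media", {}).get("edges", [])
    posts = []
    for edge in edges:
        node = edge.get("node", {})
        shortcode = node.get("shortcode")
        if not shortcode:
            continue  # skip entries that cannot be turned into a post URL
        caption_edges = node.get("edge_media_to_caption", {}).get("edges", [])
        caption = caption_edges[0].get("node", {}).get("text", "") if caption_edges else ""
        posts.append({
            "url": f"https://www.instagram.com/p/{shortcode}/",
            "caption": caption,
            "likes": node.get("edge_media_preview_like", {}).get("count", 0),
        })
    return posts


# Hypothetical sample payload: one post with a caption, one without.
sample = {
    "graphql": {
        "user": {
            "edge_owner_to_timeline_media": {
                "edges": [
                    {"node": {
                        "shortcode": "ABC123",
                        "edge_media_to_caption": {"edges": [{"node": {"text": "Hello"}}]},
                        "edge_media_preview_like": {"count": 7},
                    }},
                    {"node": {
                        "shortcode": "XYZ789",
                        "edge_media_to_caption": {"edges": []},
                    }},
                ]
            }
        }
    }
}

for post in extract_posts(sample):
    print(post)
```

With this split, the request code only has to pass `response.json()` to `extract_posts`, and a missing key yields an empty list or a default value instead of an exception.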
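Besides the Instagram API, you can scrape user data from the Instagram website with a Python crawler. A crawler can collect large amounts of data, but it must comply with Instagram's data usage policy. The following examples use a crawler to fetch Instagram data: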
Get a user's information by username, such as name, ID, and follower/following counts
import json

import requests
from bs4 import BeautifulSoup

username = "instagram"
url = f"https://www.instagram.com/{username}/"
response = requests.get(url, timeout=10)

if response.ok:
    soup = BeautifulSoup(response.text, "html.parser")
    # The profile page is expected to embed structured data in a JSON-LD <script> tag.
    script_tag = soup.select_one("script[type='application/ld+json']")
    if script_tag is None or not script_tag.string:
        print(f"No JSON-LD data found for {username}.")
    else:
        # Key names are kept as in the original example; the page layout may differ.
        user_data = json.loads(script_tag.string)["mainEntityofPage"]
        name = user_data["name"]
        followers = user_data["interactionStatistic"][0]["userInteractionCount"]
        following = user_data["interactionStatistic"][1]["userInteractionCount"]
        print(f"User: {username}\nName: {name}\nFollowers: {followers}\nFollowing: {following}")
else:
    print(f"Failed to get user data for {username}.")
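The JSON-LD extraction step above can also be tried offline against a saved HTML string. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra dependencies; `JsonLdExtractor`, `extract_json_ld`, and `sample_html` are hypothetical names, and the sample markup is hand-written for illustration and is not a copy of a real Instagram page.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks instead of crashing


def extract_json_ld(html: str) -> list:
    """Return all JSON-LD documents found in the given HTML text."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


# Hypothetical page fragment with one JSON-LD block and one ordinary script.
sample_html = """
<html><head>
<script type="application/ld+json">{"name": "Example", "mainEntityofPage": {"interactionStatistic": [{"userInteractionCount": "42"}]}}</script>
<script type="text/javascript">var x = 1;</script>
</head><body></body></html>
"""

docs = extract_json_ld(sample_html)
print(docs[0]["name"])
```

Returning a list rather than a single document means a page with no JSON-LD block gives an empty list, which the caller can check before indexing.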
Get a user's latest posts by username
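import requests
from bs4 import BeautifulSoup

username = "instagram"
url = f"https://www.instagram.com/{username}/"
response = requests.get(url, timeout=10)

if response.ok:
    soup = BeautifulSoup(response.text, "html.parser")
    # Class names such as v1Nh3 are generated and may change, or be absent
    # when the page content is rendered client-side.
    posts = soup.select("div.v1Nh3 a")
    for post in posts:
        post_url = f"https://www.instagram.com{post['href']}"
        caption_tag = post.select_one("div > span")
        likes_tag = post.select_one("div.Nm9Fw > button > span")
        # select_one returns None when nothing matches, so fall back to empty text.
        caption = caption_tag.text if caption_tag is not None else ""
        likes = likes_tag.text if likes_tag is not None else ""
        print(f"Post URL: {post_url}\nCaption: {caption}\nLikes: {likes}\n")
else:
    print(f"Failed to get user data for {username}.")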
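Python can fetch and process Instagram platform data in two ways: through the API or through a crawler. Either way, you need to comply with Instagram's data usage policy to keep the data secure and its use lawful. If you want to work with Instagram data in Python, choose the API or a crawler according to your needs.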