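Create a GitHub API to Fetch a User's Profile Image and Number of Repositories Using Python and Flask
GitHub is where developers shape the future of software together, contribute to the open source community, manage Git repositories, and more. It is one of the most widely used tools among developers, and profiles are shared to showcase projects or to invite others to contribute to them. Web scraping with Python is a common way to collect data from pages like these.
In this article, we will create an API that fetches a user's profile picture and their number of repositories. This blog walks through the following steps to create the API: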
- Set up the app directory.
- Scrape data from GitHub.
  - We will use Beautiful Soup in Python.
- Create an API.
  - We will use Flask.
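Setting up the app directory
Step 1: Create a folder (e.g. GitHubGFG).
Step 2: Set up a virtual environment. Here we create an environment named .env:
python -m venv .env
Step 3: Activate the environment. The command below is for Windows; on macOS or Linux, use source .env/bin/activate instead.
.env\Scripts\activate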
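To confirm that the environment is really active before installing packages, a small check like the one below can help. This is a minimal sketch; the helper name in_virtualenv is made up for this illustration and is not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when Python is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
```

Run it with the same python command you will use for the rest of the tutorial; it prints True only when the interpreter belongs to an activated environment.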
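Scraping the data
Step 1: Python has Beautiful Soup, a library for extracting data from HTML files. To install Beautiful Soup, run:
pip install beautifulsoup4
Step 2: Install the Requests module for Python. Requests makes it very easy to send HTTP/1.1 requests.
pip install requests
Create a Python file (e.g. github.py).
Step 3: The following steps scrape the data from the web page. First, fetch the HTML text of the page:
github_html = requests.get(f'https://github.com/{username}').text
Here {username} is replaced with the GitHub username of the desired user. To represent the parsed document as a whole, we use a BeautifulSoup object:
soup = BeautifulSoup(github_html, "html.parser")
Example: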
Python3
from bs4 import BeautifulSoup
import requests
username = "kothawleprem"
github_html = requests.get(f'https://github.com/{username}').text
soup = BeautifulSoup(github_html, "html.parser")
print(soup)
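Output: the parsed HTML of the user's profile page.
Now find the avatar class in the HTML document, since it holds the URL of the profile image.
find_all(): The find_all() method looks through a tag's descendants and retrieves all descendants that match the filter. Here the filter is an img tag with the class avatar.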
Python3
avatar_block = soup.find_all('img',class_='avatar')
print(avatar_block)
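The output of avatar_block is a list of all matching img tags.
The image URL is in the src attribute; use .get() to read it. Index 4 is where the profile picture appears among the matches in this example; the position depends on GitHub's page layout and may change.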
Python3
img_url = avatar_block[4].get('src')
print(img_url)
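The output of img_url is the avatar URL, shown in the output of the complete code further below.
Next, find the first element with the class Counter in the HTML document, since it holds the number of repositories.
find(): The find() method looks through a tag's descendants and retrieves the first descendant that matches the filter. Here the filter is a span tag with the class Counter.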
repos = soup.find('span',class_="Counter").text
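To see the difference between find_all(), find(), and .get() without making a network request, the sketch below runs them on a small hand-written HTML fragment. The markup and the example.com URLs are invented for illustration and do not reflect GitHub's actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written markup; GitHub's real page is far larger.
sample_html = """
<div>
  <img class="avatar" src="https://example.com/a.png">
  <img class="avatar avatar-user" src="https://example.com/b.png">
  <img class="logo" src="https://example.com/logo.png">
  <span class="Counter">33</span>
  <span class="Counter">7</span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all() returns every matching tag; class_ matches any one of a tag's classes.
avatars = soup.find_all("img", class_="avatar")
avatar_urls = [img.get("src") for img in avatars]

# find() returns only the first matching tag (or None when nothing matches).
first_counter = soup.find("span", class_="Counter").text

print(avatar_urls)
print(first_counter)
```

Because find() stops at the first match, the second Counter span is ignored, which is why the order of elements on the page matters for this scraper.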
The complete code is as follows:
Python3
from bs4 import BeautifulSoup
import requests
username = "kothawleprem"
github_html = requests.get(f'https://github.com/{username}').text
soup = BeautifulSoup(github_html, "html.parser")
avatar_block = soup.find_all('img',class_='avatar')
img_url = avatar_block[4].get('src')
repos = soup.find('span',class_="Counter").text
print(img_url)
print(repos)
Output:
https://avatars.githubusercontent.com/u/59017652?v=4
33
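The script above assumes that the fifth avatar image and the first Counter span are always present. As a hedged sketch of a more defensive variant, the parsing can be moved into a function that fails with a clear error when the page does not look as expected. The function name parse_profile, its avatar_index parameter, and the test fragment are assumptions made for this example.

```python
from bs4 import BeautifulSoup


def parse_profile(html: str, avatar_index: int = 4) -> dict:
    """Extract the avatar URL and repository count from profile-page HTML."""
    soup = BeautifulSoup(html, "html.parser")
    avatars = soup.find_all("img", class_="avatar")
    counter = soup.find("span", class_="Counter")
    # Raise a descriptive error instead of an IndexError or AttributeError.
    if len(avatars) <= avatar_index or counter is None:
        raise ValueError("profile page did not have the expected layout")
    return {
        "imgUrl": avatars[avatar_index].get("src"),
        "numRepos": counter.text.strip(),
    }


# Hypothetical fragment used only to exercise the function offline.
fake_html = (
    '<img class="avatar" src="https://example.com/me.png">'
    '<span class="Counter"> 12 </span>'
)
print(parse_profile(fake_html, avatar_index=0))
```

Keeping the parsing separate from the HTTP request also makes it possible to test the scraper against saved HTML rather than the live site.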
Creating the API
We will use Flask, a micro web framework written in Python.
pip install Flask
Below is the starter code for our Flask application.
Python3
# We import the Flask class; an instance of
# this class will be our WSGI application.
from flask import Flask

# We create an instance of this class. The first
# argument is the name of the application's module
# or package.
# __name__ is a convenient shortcut for
# this that is appropriate for most cases. This is
# needed so that Flask knows where to look for resources
# such as templates and static files.
app = Flask(__name__)

# We use the route() decorator to tell Flask what URL
# should trigger our function.
@app.route('/')
def github():
    return "Welcome to GitHubGFG!"

# main driver function
if __name__ == "__main__":
    # run() method of Flask class runs the
    # application on the local development server.
    app.run(debug=True)
Open localhost in the browser. By default, Flask's development server listens on http://127.0.0.1:5000/.
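As an alternative to opening a browser, the route can also be exercised with Flask's built-in test client, which sends requests to the app in-process. The following is a minimal sketch of that approach using the same starter route.

```python
from flask import Flask

app = Flask(__name__)


@app.route('/')
def github():
    return "Welcome to GitHubGFG!"


# The test client calls the app directly, so no server has to be running.
client = app.test_client()
response = client.get('/')
status = response.status_code
body = response.get_data(as_text=True)

print(status, body)
```

This is handy for quick checks while developing, since it avoids starting and stopping the development server.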
Get the GitHub username from the URL:
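Python3
from flask import Flask

app = Flask(__name__)

# <username> is a variable part of the URL; Flask passes
# its value to the function as the username argument.
@app.route('/<username>')
def github(username):
    return f"Username: {username}"

if __name__ == "__main__":
    app.run(debug=True)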
Output: the page displays the username taken from the URL.
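The behaviour of a variable URL part can be checked offline with Flask's test client. In this sketch, octocat is only a placeholder username; the second request shows that a URL without a username does not match the route at all.

```python
from flask import Flask

app = Flask(__name__)


# <username> captures one non-empty path segment from the URL.
@app.route('/<username>')
def github(username):
    return f"Username: {username}"


client = app.test_client()
with_name = client.get('/octocat')   # matches the route
without_name = client.get('/')       # no segment to capture, so no route matches

print(with_name.get_data(as_text=True))
print(without_name.status_code)
```

The request to / returns a 404 response because the app defines no route for the bare root path.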
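We will now add our web scraping code and return the data as JSON. jsonify is a Flask function that serializes data to JavaScript Object Notation (JSON) format. Since Flask 1.1, a dictionary returned from a view function is serialized in the same way automatically, which is what the code below relies on. Consider the following code: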
Python3
import requests
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)

@app.route('/<username>')
def github(username):
    github_html = requests.get(f'https://github.com/{username}').text
    soup = BeautifulSoup(github_html, "html.parser")
    avatar_block = soup.find_all('img', class_='avatar')
    img_url = avatar_block[4].get('src')
    repos = soup.find('span', class_="Counter").text
    # Creating a dictionary for our data
    result = {
        'imgUrl': img_url,
        'numRepos': repos,
    }
    # Flask converts the returned dictionary to a JSON response
    return result

if __name__ == "__main__":
    app.run(debug=True)
Output: a JSON object with the keys imgUrl and numRepos.
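To illustrate how the dictionary becomes a JSON response, the sketch below compares an explicit jsonify() call with returning the dictionary directly. The two route names and the sample values are placeholders standing in for the scraped data, and returning a plain dictionary assumes Flask 1.1 or newer.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder data standing in for the scraped values.
sample = {'imgUrl': 'https://example.com/avatar.png', 'numRepos': '33'}


@app.route('/explicit')
def explicit():
    # jsonify() builds a response with a JSON body and JSON content type.
    return jsonify(sample)


@app.route('/implicit')
def implicit():
    # A returned dict is passed through the same JSON serialization.
    return sample


client = app.test_client()
explicit_resp = client.get('/explicit')
implicit_resp = client.get('/implicit')

print(explicit_resp.mimetype, explicit_resp.get_json())
print(implicit_resp.mimetype, implicit_resp.get_json())
```

Both routes produce the same JSON payload, so the choice between them is mostly a matter of style and of which Flask versions need to be supported.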
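The request can fail if the username is incorrect or for any other reason, so we wrap our code in try and except blocks to handle exceptions. The final code is as follows: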
Python3
import requests
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)

@app.route('/<username>')
def github(username):
    try:
        github_html = requests.get(f'https://github.com/{username}').text
        soup = BeautifulSoup(github_html, "html.parser")
        avatar_block = soup.find_all('img', class_='avatar')
        img_url = avatar_block[4].get('src')
        repos = soup.find('span', class_="Counter").text
        # Creating a dictionary for our data
        result = {
            'imgUrl': img_url,
            'numRepos': repos,
        }
    except Exception:
        # Any failure while fetching or parsing the page is
        # reported as an invalid username with HTTP status 400.
        result = {
            "message": "Invalid Username!"
        }, 400
    return result

if __name__ == "__main__":
    app.run(debug=True)
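The final code treats every failure as an invalid username. As a hedged sketch of one possible refinement, the variant below separates three cases: a network failure, an unknown user, and a page whose layout no longer matches. It assumes GitHub answers with HTTP 404 for an unknown username; the 404 and 502 status codes, the 10-second timeout, and the error messages are choices made for this illustration, not part of the original article. The stubbed request at the end keeps the demonstration offline.

```python
from unittest.mock import Mock, patch

import requests
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)


@app.route('/<username>')
def github(username):
    try:
        response = requests.get(f'https://github.com/{username}', timeout=10)
    except requests.RequestException:
        # Network-level failure: DNS error, timeout, refused connection, etc.
        return {'message': 'Could not reach GitHub.'}, 502
    if response.status_code == 404:
        # Assumption: an unknown username yields a 404 page.
        return {'message': 'Invalid Username!'}, 404
    soup = BeautifulSoup(response.text, 'html.parser')
    avatar_block = soup.find_all('img', class_='avatar')
    counter = soup.find('span', class_='Counter')
    if len(avatar_block) <= 4 or counter is None:
        # The page loaded, but the elements we scrape were not found.
        return {'message': 'Unexpected page layout.'}, 502
    return {'imgUrl': avatar_block[4].get('src'),
            'numRepos': counter.text.strip()}


# Offline check: replace requests.get with a stub that reports "not found".
with patch('requests.get', return_value=Mock(status_code=404, text='')):
    missing = app.test_client().get('/no-such-user')

print(missing.status_code, missing.get_json())
```

Catching specific conditions instead of every exception makes it easier for API clients to tell a mistyped username apart from a temporary outage or a change in GitHub's markup.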