Scraping Covid-19 Statistics with BeautifulSoup
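The coronavirus outbreak is one of the biggest pandemics and has put the whole world in danger. It is also one of the most-followed news topics these days. In this article, we will scrape Covid-19 statistics and print them in a human-readable form. The data will be scraped from the Worldometers website (the URL used in the code below).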
Prerequisites:
- The libraries requests, bs4 (Beautiful Soup, published on PyPI as beautifulsoup4) and texttable must be installed:
pip install beautifulsoup4
pip install requests
pip install texttable
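Before running the script, you can optionally confirm that all three modules are importable. The following is a small sketch that uses only the standard library (importlib.util.find_spec); the helper name missing_modules is hypothetical and not part of any of these libraries.

```python
import importlib.util

# Module names as they are imported in run.py
# (the PyPI package that provides bs4 is beautifulsoup4)
REQUIRED_MODULES = ("requests", "bs4", "texttable")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites are installed.")
```

Any name it reports can be installed with the matching pip command above.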
Project: let's start writing the code. Create a file named run.py.
Python3
# importing modules
import requests
from bs4 import BeautifulSoup

# URL for scraping data
url = 'https://www.worldometers.info/coronavirus/countries-where-coronavirus-has-spread/'

# get the URL's HTML and fail early on HTTP errors
page = requests.get(url, timeout=30)
page.raise_for_status()
soup = BeautifulSoup(page.text, 'html.parser')

data = []

# soup.find_all('td') collects every
# cell of the table on the page
data_iterator = iter(soup.find_all('td'))

# data_iterator walks the table cell by cell.
# Each row has four cells, so this loop reads
# four at a time until the iterator runs out
while True:
    try:
        country = next(data_iterator).text
        confirmed = next(data_iterator).text
        deaths = next(data_iterator).text
        continent = next(data_iterator).text

        # For 'confirmed' and 'deaths',
        # remove the thousands separators (commas)
        # and convert to int
        data.append((
            country,
            int(confirmed.replace(',', '')),
            int(deaths.replace(',', '')),
            continent
        ))

    # StopIteration is raised when
    # there are no more elements left to
    # iterate through
    except StopIteration:
        break

# Sort the data by the number of confirmed cases
data.sort(key=lambda row: row[1], reverse=True)
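The loop above assumes that every table row holds exactly four td cells in the order country, cases, deaths, continent; if the page layout changes, the flat iterator silently pairs cells from different rows. A more defensive variant, sketched below under the same four-column assumption, groups cells by their tr row and skips rows that do not have four cells. It parses a small hard-coded HTML fragment (made up for illustration, reusing figures from the sample output below) so it runs offline. The helper names parse_count and parse_rows, and treating a blank cell as 0, are assumptions rather than anything the page or Beautiful Soup prescribes.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the assumed column order:
# Country, Cases, Deaths, Continent. The live page's markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Cases</th><th>Deaths</th><th>Region</th></tr>
  <tr><td>Spain</td><td>180,659</td><td>18,812</td><td>Europe</td></tr>
  <tr><td>United States</td><td>644,348</td><td>28,554</td><td>North America</td></tr>
</table>
"""


def parse_count(text):
    """Turn a cell such as '644,348' into 644348; a blank cell becomes 0 (assumption)."""
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned else 0


def parse_rows(html):
    """Return (country, cases, deaths, continent) tuples, one per complete table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # The header row uses <th>, so it has no <td> cells and is skipped here
        if len(cells) != 4:
            continue
        country, confirmed, deaths, continent = cells
        rows.append((country, parse_count(confirmed), parse_count(deaths), continent))
    return rows


data = parse_rows(SAMPLE_HTML)
data.sort(key=lambda row: row[1], reverse=True)
for row in data:
    print(row)
```

To use it on the live page, you would pass page.text instead of SAMPLE_HTML.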
To print the data in a human-readable format, we will use the texttable library.
Python3
# import the library and create a texttable object
import texttable as tt
table = tt.Texttable()
# Set the header first, then add the data rows;
# header=False stops add_rows() from using data[0] as the header
table.header((' Country ', ' Number of cases ', ' Deaths ', ' Continent '))
table.add_rows(data, header=False)
# 'l' denotes left, 'c' denotes center,
# and 'r' denotes right
table.set_cols_align(('c', 'c', 'c', 'c'))
print(table.draw())
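For readers curious about what texttable is doing, the core idea is to make each column as wide as its longest cell (texttable also wraps cells once the table would exceed its maximum width) and then pad every cell to that width. The sketch below produces a similar bordered, centered layout with only the standard library (str.center). It is a dependency-free illustration, not texttable's implementation, and its padding may differ from texttable's output by a space here and there. The function name render_table is hypothetical.

```python
def render_table(header, rows):
    """Render rows as a bordered, center-aligned text table using only str methods."""
    cells = [[str(value) for value in row] for row in [header, *rows]]
    # Each column is as wide as its longest cell (header included)
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    header_border = border.replace("-", "=")

    def fmt(row):
        # Pad each cell to its column width, centered, with one space on each side
        return "| " + " | ".join(cell.center(w) for cell, w in zip(row, widths)) + " |"

    lines = [border, fmt(cells[0]), header_border]
    for row in cells[1:]:
        lines += [fmt(row), border]
    return "\n".join(lines)


sample = [("Spain", 180659, 18812, "Europe"),
          ("Italy", 165155, 21645, "Europe")]
print(render_table(("Country", "Number of cases", "Deaths", "Continent"), sample))
```

In run.py itself, texttable remains the simpler choice; this sketch only shows the width-then-pad idea behind the output.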
Output:
+---------------------------+-------------------+----------+-------------------+
|          Country          |  Number of cases  |  Deaths  |     Continent     |
+===========================+===================+==========+===================+
|       United States       |      644348       |  28554   |   North America   |
+---------------------------+-------------------+----------+-------------------+
|           Spain           |      180659       |  18812   |      Europe       |
+---------------------------+-------------------+----------+-------------------+
|           Italy           |      165155       |  21645   |      Europe       |
+---------------------------+-------------------+----------+-------------------+
...
Note: the output depends on the current statistics, so the numbers you get will differ from those shown above.
Stay home, stay safe!