📅  Last modified: 2023-12-03 15:37:56.391000             🧑  Author: Mango
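Data analysis and machine learning tasks often require data from websites. Copying and pasting it by hand from the page source is tedious and error-prone. This article shows how to use Python's BeautifulSoup library to extract the paragraphs from a web page and save them to a text file.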
!pip install beautifulsoup4
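from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page and decode the response body as UTF-8
url = "https://www.example.com"
page = urlopen(url)
html = page.read().decode("utf-8")

# Parse the HTML and collect every <p> element
soup = BeautifulSoup(html, "html.parser")
paragraphs = soup.find_all("p")

# Write one paragraph per line; set the encoding explicitly so that
# non-ASCII text is saved correctly on every platform
with open("paragraphs.txt", "w", encoding="utf-8") as file:
    for p in paragraphs:
        file.write(p.get_text() + "\n")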
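The same steps can also be wrapped in small functions so the parsing logic can be tried on an HTML string without touching the network. This is only a minimal sketch: the names `extract_paragraphs`, `save_paragraphs`, and `sample_html` are hypothetical and not part of the original script, and dropping empty paragraphs is an added assumption rather than something the script above does.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def extract_paragraphs(html: str) -> list[str]:
    """Return the stripped text of every non-empty <p> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (p.get_text().strip() for p in soup.find_all("p"))
    # Assumption: whitespace-only paragraphs are not worth saving
    return [text for text in texts if text]


def save_paragraphs(paragraphs: list[str], path: str) -> None:
    """Write one paragraph per line, encoded as UTF-8."""
    content = "".join(text + "\n" for text in paragraphs)
    Path(path).write_text(content, encoding="utf-8")


# Hypothetical sample input standing in for a downloaded page
sample_html = (
    "<html><body>"
    "<p>First paragraph.</p>"
    "<p>  </p>"
    "<p>Second <b>one</b>.</p>"
    "</body></html>"
)
print(extract_paragraphs(sample_html))
# prints ['First paragraph.', 'Second one.']
```

To use it with a real page, pass the decoded `html` string from `urlopen` to `extract_paragraphs`, then hand the result to `save_paragraphs` together with the output file name.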
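With BeautifulSoup, extracting the paragraphs from a web page and saving them to a text file takes only a few lines of code. Compared with copying the text by hand, this is faster and less likely to introduce errors.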