How to Write the Output to an HTML File with Python BeautifulSoup?

Last modified: 2022-05-13 01:54:49.550000             Author: Mango

In this article, we will write output to an HTML file using Python BeautifulSoup. BeautifulSoup is a Python library used mainly for web scraping; here we focus on saving the parsed page to an HTML file.

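Required modules and installation (the code below also imports requests, so it is installed alongside bs4):

pip install bs4 requests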

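Approach:

  • First, import the required libraries.
  • Send a GET request to the desired URL and extract the page content.
  • Use Python's built-in file object to write the output to a new file.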

Steps to follow:

Step 1: Import the required libraries.



Python3
# Import libraries
from bs4 import BeautifulSoup
import requests


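Step 2: Send a GET request to the target page (a GeeksforGeeks article in this example) and extract its content. Pass that content to BeautifulSoup to create a soup object, with the parser set to html.parser.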

Note: To parse an XML page, set the parser to "xml" instead; this requires the lxml package to be installed.

Python3

# set the url to perform the get request
URL = 'https://www.geeksforgeeks.org/how-to-scrape-all-pdf-files-in-a-website/'
page = requests.get(URL)
  
# load the page content
text = page.content
  
# create a soup object, using html.parser
# as the parser
soup = BeautifulSoup(text, "html.parser")
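As a small offline illustration of this step, the sketch below feeds BeautifulSoup a hard-coded byte string in place of page.content. The sample markup and the variable name raw are assumptions made for illustration, not part of the original example. It shows that the parser accepts raw bytes and decodes them itself.

```python
from bs4 import BeautifulSoup

# Stand-in for page.content: requests returns the response body as bytes.
# This sample markup is made up for illustration.
raw = (
    "<html><head><meta charset='utf-8'><title>Café menu</title></head>"
    "<body><p>Hello</p></body></html>"
).encode("utf-8")

# BeautifulSoup accepts bytes and works out the encoding while parsing.
soup = BeautifulSoup(raw, "html.parser")

# Access parsed elements to confirm the markup was decoded correctly.
print(soup.title.string)
print(soup.p.get_text())
```

Working from a fixed string like this can be handy for checking the parsing step without making a network request.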

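Step 3: Use Python's built-in file object to write the soup object to the output file, with the encoding set to UTF-8. Calling .prettify() on the soup object indents the markup so it is easier to read, and it returns the result as a string, which is what file.write() expects.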

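The output file is saved as output.html in the current working directory (the same directory as the script, if you run it from there).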

Python3

# open the file in write mode with UTF-8 encoding
with open("output.html", "w", encoding="utf-8") as file:

    # prettify() already returns a string, so no str() call is needed
    file.write(soup.prettify())
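To check the write step in isolation, here is a minimal sketch that saves a soup built from a hard-coded string and reads the file back. The helper name write_soup, the sample markup, and the use of a temporary directory are assumptions for illustration; the original example writes directly to output.html.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Made-up markup standing in for a downloaded page.
html = "<html><body><h1>Title</h1><p>Some text</p></body></html>"
soup = BeautifulSoup(html, "html.parser")


def write_soup(soup, path):
    """Write the prettified markup of soup to path as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as file:
        file.write(soup.prettify())


# Use a temporary directory so the sketch does not overwrite a real output.html.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "output.html")
    write_soup(soup, out_path)

    # Read the file back with the same encoding to inspect what was saved.
    with open(out_path, encoding="utf-8") as file:
        saved = file.read()

print(saved)
```

Reading the file back with the same encoding is a quick way to confirm that the saved markup matches what prettify() produced.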

Below is the complete implementation:

Python3

# Import libraries
from bs4 import BeautifulSoup
import requests
  
# set the url to perform the get request
URL = 'https://www.geeksforgeeks.org/how-to-scrape-all-pdf-files-in-a-website/'
page = requests.get(URL)
  
# load the page content
text = page.content
  
# create a soup object, using html.parser
# as the parser
soup = BeautifulSoup(text, "html.parser")
  
# open the file in write mode with UTF-8 encoding
with open("output.html", "w", encoding="utf-8") as file:

    # prettify() already returns a string, so no str() call is needed
    file.write(soup.prettify())

Output: the script creates output.html, which contains the prettified HTML of the requested page.
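The implementation above writes whatever the server returns, including error pages. A hedged variant that checks the request first might look like the sketch below. The function name save_page, its parameters, and the 10-second timeout are assumptions for illustration and are not part of the original article.

```python
import requests
from bs4 import BeautifulSoup


def save_page(url, path="output.html", timeout=10):
    """Fetch url and write its prettified HTML to path.

    Returns True on success and False if the request fails.
    (Hypothetical helper; the name and defaults are illustrative.)
    """
    try:
        page = requests.get(url, timeout=timeout)
        # Raise an exception for 4xx/5xx responses instead of saving an error page.
        page.raise_for_status()
    except requests.exceptions.RequestException as err:
        print(f"Request failed: {err}")
        return False

    # Parse the response body and write the formatted markup as UTF-8.
    soup = BeautifulSoup(page.content, "html.parser")
    with open(path, "w", encoding="utf-8") as file:
        file.write(soup.prettify())
    return True
```

Calling save_page(URL) with the URL used in the article would then create output.html only when the request succeeds.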