How to Write the Output to an HTML File with Python BeautifulSoup?
In this article, we will write the output to an HTML file using Python BeautifulSoup. BeautifulSoup is a Python library used primarily for web scraping, but here we will focus on how to save the parsed output to an HTML file.
Modules needed and installation:
pip install bs4
The code below also uses the requests library to fetch the page, so install it as well:
pip install requests
Approach:
- First, import all the required libraries.
- Make a GET request to the desired URL and extract its page content.
- Write the output to a new file using Python's built-in file handling.
Steps to be followed:
Step 1: Import the required libraries.
Python3
# Import libraries
from bs4 import BeautifulSoup
import requests
Step 2: Make a GET request to the target URL (here, a GeeksforGeeks article page), extract its page content, and create a soup object by passing the content to BeautifulSoup with the markup parser set to html.parser.
Note: If you are extracting an XML page, set the parser feature to "xml" instead (this requires the lxml package); see the sketch after the code block below.
Python3
# set the url to perform the get request
URL = 'https://www.geeksforgeeks.org/how-to-scrape-all-pdf-files-in-a-website/'
page = requests.get(URL)
# load the page content
text = page.content
# make a soup object by using beautiful
# soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")
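As mentioned in the note above, parsing an XML page only changes the parser passed to BeautifulSoup. A minimal sketch, assuming the lxml package is installed (pip install lxml) and using a placeholder feed URL:
Python3
# Import libraries
from bs4 import BeautifulSoup
import requests

# placeholder XML/RSS feed URL - replace with the feed you actually want
XML_URL = 'https://www.geeksforgeeks.org/feed/'
page = requests.get(XML_URL)

# parse the response as XML by setting the feature to "xml"
# (the "xml" feature requires the lxml package)
soup = BeautifulSoup(page.content, "xml")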
Step 3: Using Python file handling, write the soup object to the output file. We open the file in write mode with the encoding set to UTF-8, and call .prettify() on the soup object, which returns the markup as an indented, easier-to-read string that can be written directly to the file.
The output file is saved as output.html in the same directory.
Python3
# open the file in write mode
# and set encoding to UTF-8
with open("output.html", "w", encoding='utf-8') as file:
    # prettify() returns the markup as an indented string
    file.write(soup.prettify())
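If only part of the page is needed, a single tag can be prettified and written instead of the whole soup. A minimal sketch, assuming the soup object built in Step 2 and writing just the page's <title> tag to a hypothetical title.html file:
Python3
# write only the <title> tag instead of the whole document
with open("title.html", "w", encoding='utf-8') as file:
    # Tag objects also provide prettify()
    file.write(soup.title.prettify())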
Below is the complete implementation:
Python3
# Import libraries
from bs4 import BeautifulSoup
import requests
# set the url to perform the get request
URL = 'https://www.geeksforgeeks.org/how-to-scrape-all-pdf-files-in-a-website/'
page = requests.get(URL)
# load the page content
text = page.content
# make a soup object by using
# beautiful soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")
# open the file in write mode
# and set encoding to UTF-8
with open("output.html", "w", encoding='utf-8') as file:
    # prettify() returns the markup as an indented string
    file.write(soup.prettify())
Output:
The prettified HTML of the requested page is saved to output.html in the working directory.
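To quickly confirm that the file was written correctly, you can re-open output.html and parse it again. A minimal sketch, assuming the complete implementation above has already been run in the same directory:
Python3
# re-open the saved file and parse it to confirm it contains valid HTML
with open("output.html", "r", encoding='utf-8') as file:
    saved = BeautifulSoup(file.read(), "html.parser")

# print the page title as a quick sanity check
print(saved.title.string if saved.title else "no <title> tag found")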