Convert HTML Source Code to a JSON Object Using Python

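In this article, we will see how to convert HTML source code into a JSON object. JSON objects are easy to transfer, and most modern programming languages support them. JavaScript can read JSON and parse it directly into a JavaScript object, and JavaScript can in turn be used to build the HTML of your web page.

We will use the xmltojson module in this article. Its parse function takes HTML as input and returns the parsed result as a JSON string.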

Syntax: xmltojson.parse(xml_input, xml_attribs=True, item_depth=0, item_callback)

Parameters:

xml_input can be either a file or a string.
xml_attribs includes element attributes in the output when set to True and ignores them when set to False.
item_depth is the depth of children at which the item_callback function is called.
item_callback is a callback function.
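
To make the effect of xml_attribs more concrete, here is a minimal sketch of the kind of mapping involved, written with the standard library's xml.etree.ElementTree instead of xmltojson. The helper name element_to_dict and the sample markup are assumptions made for illustration; this is not the module's actual implementation, only an approximation of the convention where attributes become keys prefixed with "@" and repeated tags are collected into a list.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element, xml_attribs=True):
    """Convert one element into a plain Python value (hypothetical helper)."""
    node = {}
    if xml_attribs:
        # Attributes become keys prefixed with "@"
        for name, value in element.attrib.items():
            node["@" + name] = value
    for child in element:
        value = element_to_dict(child, xml_attribs)
        if child.tag in node:
            # Repeated tags are collected into a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (element.text or "").strip()
    if not node:
        # Leaf element without attributes or children: just its text (or None)
        return text or None
    if text:
        node["#text"] = text
    return node


# Assumed sample markup, chosen to show attributes and a repeated tag
sample = '<div><h1>Title</h1><input type="text"/><input type="button"/></div>'
root = ET.fromstring(sample)

with_attribs = json.dumps({root.tag: element_to_dict(root)})
without_attribs = json.dumps({root.tag: element_to_dict(root, xml_attribs=False)})

print(with_attribs)
print(without_attribs)
```

With attributes enabled, each input element keeps its "@type" key; with them disabled, the two input elements carry no information at all, which is why attributes are usually worth keeping when converting HTML.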


Environment setup:

Install the required modules:

pip install xmltojson
pip install requests

Steps:

Import the libraries.

Python3

import xmltojson
import json
import requests

Fetch the HTML and save it to a file.

Python3

# Sample URL to fetch the html page
url = "https://geeksforgeeks-example.surge.sh"

# Headers to mimic the browser. Adjacent string literals are joined,
# so no stray indentation ends up inside the User-Agent value.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

# Get the page through get() method
html_response = requests.get(url=url, headers=headers)

# Save the page content as sample.html
with open("sample.html", "w") as html_file:
    html_file.write(html_response.text)

Convert this HTML to JSON with the parse function: open the HTML file and pass its contents to the parse function of the xmltojson module.

Python3

with open("sample.html", "r") as html_file:
    html = html_file.read()
    json_ = xmltojson.parse(html)
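
Since xmltojson is an XML-to-JSON converter, it seems reasonable to assume that it expects well-formed markup, and real-world HTML often is not (unclosed tags such as a bare br are common). The sketch below shows one way to check this beforehand with the standard library; the helper name is_well_formed and the two sample strings are assumptions for illustration, and the check only approximates what xmltojson itself accepts.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


good = "<html><body><p>Hello</p></body></html>"
bad = "<html><body><p>Hello<br></p></body></html>"  # <br> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

If the check fails for a page you fetched, the markup likely needs to be cleaned up before it is passed to the parse function.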

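The json_ variable holds a JSON string, which we can print or write to a file.

Python3

# json_ is already a JSON string, so write it out directly;
# json.dump() would encode it a second time as one quoted string.
with open("data.json", "w") as file:
    file.write(json_)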
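
Because json_ is a string rather than a dictionary, how it is serialized matters. The following sketch uses a short stand-in string (an assumption, not the real page output) to show that passing an existing JSON string through the json module's encoder wraps it in a second layer of quoting, whereas decoding it first with json.loads gives a dictionary that can be re-serialized, for example with indentation.

```python
import json

# Stand-in for the json_ string returned by the parse step (assumed value)
json_text = '{"html": {"head": {"title": "Document"}}}'

# Encoding a string that is already JSON produces a quoted, escaped string
double_encoded = json.dumps(json_text)

# Decoding first yields a dictionary that can be pretty-printed
data = json.loads(json_text)
pretty = json.dumps(data, indent=2)

print(double_encoded)
print(pretty)
```

Writing the string as-is keeps data.json directly usable by other tools; decoding and re-encoding is only needed when you want to change the formatting or modify the data first.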

Print the output.

Python3

print(json_)

Complete code:

Python3

import xmltojson
import requests


# Sample URL to fetch the html page
url = "https://geeksforgeeks-example.surge.sh"

# Headers to mimic the browser. Adjacent string literals are joined,
# so no stray indentation ends up inside the User-Agent value.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

# Get the page through get() method
html_response = requests.get(url=url, headers=headers)

# Save the page content as sample.html
with open("sample.html", "w") as html_file:
    html_file.write(html_response.text)

# Read the saved page back and convert it to a JSON string
with open("sample.html", "r") as html_file:
    html = html_file.read()
    json_ = xmltojson.parse(html)

# json_ is already a JSON string, so write it out directly
with open("data.json", "w") as file:
    file.write(json_)

print(json_)

Output:

{"html": {"@lang": "en", "head": {"title": "Document"}, "body": {"div": {"h1": "Geeks For Geeks", "p": "Welcome to the world of programming geeks!", "input": [{"@type": "text", "@placeholder": "Enter your name"}, {"@type": "button", "@value": "submit"}]}}}}
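
Once the page is in this form, individual values can be read with ordinary dictionary and list access. As a small sketch, the snippet below decodes the output shown above (copied into a string literal) and pulls out the heading and the types of the input elements; the variable names are chosen for illustration only.

```python
import json

# The output shown above, split across adjacent literals for readability
output = (
    '{"html": {"@lang": "en", "head": {"title": "Document"}, '
    '"body": {"div": {"h1": "Geeks For Geeks", '
    '"p": "Welcome to the world of programming geeks!", '
    '"input": [{"@type": "text", "@placeholder": "Enter your name"}, '
    '{"@type": "button", "@value": "submit"}]}}}}'
)

page = json.loads(output)

# Nested elements become nested dictionaries
div = page["html"]["body"]["div"]
heading = div["h1"]

# Repeated elements become a list; attributes use the "@" prefix
input_types = [field["@type"] for field in div["input"]]

print(heading)
print(input_types)
```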
