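How to modify HTML using BeautifulSoup?
Beautiful Soup is a Python library for pulling information out of web pages written in HTML or XML. Besides scraping data, it also supports searching, modifying, and iterating over the parse tree. In this article, we will discuss how to modify the content of an HTML page directly using BeautifulSoup.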
Syntax:
old_text = soup.find("#Widget", {"id": "#Id name of widget in which you want to edit"})
new_text = old_text.find(text=re.compile('#Text which you want to edit')).replace_with('#New text which you want to replace with')
Terms used:
- Widget: the HTML element (tag name, such as p or div) that currently holds the text you want to replace.
- Id Name: the value of the id attribute of that element.
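The syntax above can be tried without any file on disk. The following is a minimal sketch, assuming the HTML is available as a string. The helper name replace_text_in_element and its parameters are hypothetical and not part of Beautiful Soup. It uses string=, the current name of the text= argument, and returns the HTML unchanged when the element or the text is not found.

```python
import re
from bs4 import BeautifulSoup


def replace_text_in_element(html, tag_name, element_id, pattern, new_text):
    """Replace the first text node matching `pattern` inside the element
    <tag_name id=element_id>. Hypothetical helper, for illustration only."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(tag_name, {"id": element_id})
    if element is None:
        return str(soup)  # no such element: nothing to change
    node = element.find(string=re.compile(pattern))
    if node is None:
        return str(soup)  # element found, but no matching text
    node.replace_with(new_text)
    return str(soup)


page = '<h1>My First Heading</h1><p id="para">Geeks For Geeks</p>'
result = replace_text_in_element(page, "p", "para", "Geeks For Geeks", "Vinayak Rai")
print(result)
```

Checking for None before calling replace_with() avoids the AttributeError that the one-line form raises when nothing matches.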
Example:
For example, consider this simple page source.
HTML
<html>
<body>
<h1>My First Heading</h1>
<p id="para">Geeks For Geeks</p>
</body>
</html>
Once the page has been parsed into a soup object, you can replace the text "Geeks For Geeks" with "Vinayak Rai" using the following commands:
old_text = soup.find("p", {"id": "para"})
new_text = old_text.find(text=re.compile('Geeks For Geeks')).replace_with('Vinayak Rai')
Step-by-step approach:
Step 1: First, import the Beautiful Soup, os, and re libraries.
from bs4 import BeautifulSoup as bs
import os
import re
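Step 2: Now, get the directory that contains the script by removing the last segment (the file name) from its absolute path.
base = os.path.dirname(os.path.abspath(__file__))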
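To see what Step 2 computes, the sketch below applies the same calls to an arbitrary path instead of __file__. The function names and the example path are hypothetical; the pathlib variant is an alternative spelling, not something the original script uses.

```python
import os
from pathlib import Path


def containing_dir(script_path):
    """Same computation as Step 2, for an arbitrary path."""
    return os.path.dirname(os.path.abspath(script_path))


def containing_dir_pathlib(script_path):
    """pathlib spelling of the same idea."""
    return Path(os.path.abspath(script_path)).parent


# Hypothetical relative path; it does not need to exist on disk.
example = os.path.join("project", "scripts", "modify.py")
print(containing_dir(example))
print(containing_dir_pathlib(example))
```

Building file paths from this directory makes the script independent of the directory it is launched from.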
Step 3: Then, open the HTML file in which you want to make changes.
html = open(os.path.join(base, '#Name of HTML file in which you want to edit'))
Step 4: Next, parse the HTML file with Beautiful Soup.
soup = bs(html, 'html.parser')
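Steps 3 and 4 can be combined so that the file is closed as soon as it has been parsed. This is a minimal sketch, assuming the file is UTF-8 encoded; load_soup is a hypothetical helper name, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile
from bs4 import BeautifulSoup


def load_soup(path):
    """Open an HTML file (assumed to be UTF-8) and parse it.
    The with block closes the file once parsing is done."""
    with open(path, encoding="utf-8") as handle:
        return BeautifulSoup(handle, "html.parser")


# Create a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.html")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write('<p id="para">Geeks For Geeks</p>')
    soup = load_soup(sample_path)

# The parse tree stays usable after the file is closed and deleted.
print(soup.p.get_text())
```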
Step 5: Then, locate the element that holds the text you want to replace.
old_text = soup.find("#Widget Name", {"id": "#Id name of widget in which you want to edit"})
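Beautiful Soup offers more than one way to express the lookup in Step 5. The sketch below, which assumes a page like the example above, shows three spellings that return the same element, and what find() returns when nothing matches.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<h1>My First Heading</h1><p id="para">Geeks For Geeks</p>',
    "html.parser",
)

# Tag name plus an attribute dictionary, as in Step 5.
by_attrs = soup.find("p", {"id": "para"})
# Keyword argument; matches any tag with this id.
by_keyword = soup.find(id="para")
# CSS selector.
by_css = soup.select_one("p#para")
# find() returns None when no element matches.
missing = soup.find("p", {"id": "no-such-id"})

print(by_attrs)
print(missing)
```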
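Step 6: Next, replace the stored text with the new text you want to assign. The text argument of find() is the older name of string, which Beautiful Soup 4.4.0 and later prefer. Note that replace_with() returns the string that was removed from the tree.
new_text = old_text.find(text=re.compile('#Text which you want to edit')).replace_with('#New Text which you want to replace with')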
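One detail of Step 6 is easy to miss: the regular expression only selects the text node, and replace_with() then swaps out the whole node, not just the matched part. The sketch below uses a hypothetical paragraph with extra words around the match to show the difference, and one way to change only the matched substring.

```python
import re
from bs4 import BeautifulSoup

html = '<p id="para">Welcome to Geeks For Geeks today</p>'
pattern = re.compile("Geeks For Geeks")

# Whole-node replacement, as in Step 6: the surrounding words are lost.
whole = BeautifulSoup(html, "html.parser")
whole.p.find(string=pattern).replace_with("Vinayak Rai")

# Substring replacement: build the new text with re.sub first.
partial = BeautifulSoup(html, "html.parser")
node = partial.p.find(string=pattern)
removed = node.replace_with(pattern.sub("Vinayak Rai", node))

print(whole.p.get_text())
print(partial.p.get_text())
print(removed)  # the string that was taken out of the tree
```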
Step 7: Finally, write the modified tree back to an HTML file to save the changes made in the previous step.
with open("#Name of HTML file in which you want to store the edited text", "wb") as f_output:
    f_output.write(soup.prettify("utf-8"))
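Step 7 writes the result of prettify(), which re-indents the document and therefore adds line breaks and spaces around the text. If the original layout matters, encoding the tree without pretty-printing is an alternative. This is a small sketch comparing the two on a one-line document.

```python
from bs4 import BeautifulSoup

source = '<p id="para">Vinayak Rai</p>'
soup = BeautifulSoup(source, "html.parser")

# Both calls return bytes, suitable for a file opened in "wb" mode.
pretty_bytes = soup.prettify("utf-8")   # indented, one node per line
compact_bytes = soup.encode("utf-8")    # markup as parsed, no re-indentation

print(pretty_bytes.decode("utf-8"))
print(compact_bytes.decode("utf-8"))
```

Either value can be passed to f_output.write() in Step 7.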
Implementation:
Python
# Python program to modify HTML
# with the help of Beautiful Soup

# Import the libraries
from bs4 import BeautifulSoup as bs
import os
import re

# Directory that contains this script
base = os.path.dirname(os.path.abspath(__file__))
html_path = os.path.join(base, 'gfg.html')

# Open the HTML in which you want to make changes and parse it
# in Beautiful Soup; the with block closes the file afterwards.
# The file is assumed to be UTF-8, matching the encoding used on write.
with open(html_path, encoding='utf-8') as html:
    soup = bs(html, 'html.parser')

# Give location where text is
# stored which you wish to alter
old_text = soup.find("p", {"id": "para"})

# Replace the already stored text with
# the new text which you wish to assign
# (replace_with() returns the string that was removed)
new_text = old_text.find(text=re.compile(
    'Geeks For Geeks')).replace_with('Vinayak Rai')

# Write the modified tree back to the same file
# that was read, to see the changes done
with open(html_path, "wb") as f_output:
    f_output.write(soup.prettify("utf-8"))
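Output:
After the script runs, the paragraph with id "para" in gfg.html contains "Vinayak Rai" instead of "Geeks For Geeks".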