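BeautifulSoup – Modifying the Tree
Prerequisite: BeautifulSoup
BeautifulSoup is a Python library for web scraping. It can also be used to modify HTML pages. This article describes how to use BeautifulSoup to search the parse tree and modify it: you can rename tags, change their attribute values, and add or remove attributes.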
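Modifying a tag's name and attributes
You can change a tag's name and modify its attributes by adding, changing, or deleting them.
- To change the tag name:
Syntax: tag.name = "new_tag"
- To modify an attribute or add a new one:
Syntax: tag["attribute"] = "value"
- To delete an attribute:
Syntax: del tag["attribute"]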
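The attribute operations listed above can be sketched on a small tag. The markup, the attribute names, and the variable names below are made up for illustration. The sketch also shows that html.parser treats class as a multi-valued attribute, so it is read back as a list and can be assigned a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, chosen only to illustrate attribute handling.
soup = BeautifulSoup('<p class="intro note" id="first">text</p>', "html.parser")
tag = soup.p

# class is multi-valued, so it is read back as a list of names.
classes_before = tag["class"]

# Assigning a list writes the values back space-separated.
tag["class"] = ["summary", "highlight"]

# Assigning to a missing key adds a new attribute.
tag["data-role"] = "lead"

# del removes an attribute; get() then returns None instead of raising.
del tag["id"]
missing_id = tag.get("id")

attrs_html = str(tag)
print(attrs_html)
```

Using tag.get("attribute") rather than tag["attribute"] is a convenient way to check for an attribute that may have been deleted, since indexing a missing attribute raises KeyError.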
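You can also modify the tree by inserting new elements at a chosen position.
- insert() inserts a new element at a given index in the tag's contents.
Syntax: tag.insert(position, element)
- insert_after() inserts an element immediately after a tag or string in the parse tree.
Syntax: element.insert_after(new_element)
- insert_before() inserts an element immediately before a tag or string in the parse tree.
Syntax: element.insert_before(new_element)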
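As a minimal sketch of the three insertion functions above, the snippet below builds a short paragraph and inserts elements at different positions. The markup and the variable names are illustrative assumptions, not part of the original examples.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this sketch.
soup = BeautifulSoup("<p><b>one</b><b>three</b></p>", "html.parser")
p = soup.p

# insert(position, element): position is an index into p.contents.
two = soup.new_tag("b")
two.string = "two"
p.insert(1, two)

# insert_before() is called on the node that the new content should precede.
first = p.find("b")
first.insert_before("zero ")

# insert_after() is called on the node that the new content should follow.
last = p.find_all("b")[-1]
last.insert_after(" four")

result = str(p)
print(result)
```

Note that insert() is called on the parent tag, while insert_before() and insert_after() are called on a sibling that is already in the tree.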
Approach:
- Import the module
- Scrape the data from the web page
- Parse the scraped string as HTML
- Select the tag to be modified
- Make the required changes
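The steps above can be sketched end to end as follows. To keep the sketch self-contained, an inline string stands in for the scraped page; in practice that string would be the body of an HTTP response. The markup, the class name "title", and the id "main-title" are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for a downloaded page (assumption: a real script would
# obtain this string from an HTTP response instead).
html = '<html><body><h1 class="title">Old title</h1><p>Body text</p></body></html>'

# Parse the string as HTML.
soup = BeautifulSoup(html, "html.parser")

# Select the tag to be modified.
heading = soup.find("h1", class_="title")

# Make the required changes.
heading.name = "h2"             # rename the tag
heading["id"] = "main-title"    # add an attribute
heading.string = "New title"    # replace its text

# Serialize the modified tree back to a string.
modified_html = str(soup)
print(modified_html)
```

Converting the soup with str() gives the modified document, which can then be written to a file or processed further.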
Example 1:
Python3
# import the module
from bs4 import BeautifulSoup
# sample markup: a <p> tag with a class attribute
# (the class name "para" is illustrative)
markup = """<p class="para">gfg</p>"""
# parse the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
# select the tag
tag = soup.p
print("Before modifying the tag name: ")
print(tag)
print()
# modify the tag name
tag.name = "div"
print("After modifying the tag name: ")
print(tag)
print()
# modify the class attribute
tag['class'] = "div_class"
# add a new attribute
tag['id'] = "div_id"
print("After modifying and adding attributes: ")
print(tag)
print()
# delete an attribute
del tag["class"]
print("After deleting class attribute: ")
print(tag)
print()
# replace the tag's content
tag.string = "Geeks"
print("After modifying tag string: ")
print(tag)
print()
# use the insert function
tag = soup.div
print("Before inserting: ")
print(tag)
print()
# insert new content at index 1, after the existing string
tag.insert(1, " for Geeks")
print("After inserting: ")
print(tag)
print()
Example 2:
Python3
# import the module
from bs4 import BeautifulSoup
# sample markup: a <b> tag containing a single string
soup = BeautifulSoup("<b>| A Computer Science portal</b>", 'html.parser')
# create a new <p> tag
tag = soup.new_tag("p")
tag.string = "Geeks"
# insert the new tag before the string inside <b>
soup.b.string.insert_before(tag)
print(soup.b)
print()
# insert a new string after the <p> tag
soup.b.p.insert_after(soup.new_string(" for Geeks"))
print(soup.b)
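Adding new tags and wrapping elements
The tree can be modified by adding a new tag at any required position. An element can also be wrapped in another tag.
- new_tag() creates a new tag with the given name.
Syntax: soup.new_tag("tag_name")
- wrap() wraps an element in the tag you specify and returns the new wrapper.
Syntax: element.wrap(wrapper_tag)
- unwrap() removes a wrapping tag and leaves its contents in place.
Syntax: tag.unwrap()
Example:
Python3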
# import the module
from bs4 import BeautifulSoup
# sample markup: a <p> tag containing a single string
markup = '<p>Geeks for Geeks</p>'
# parse the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
print(soup)
# wrap the string in an <i> tag
soup.p.string.wrap(soup.new_tag("i"))
print(soup)
# wrap the <p> tag in a <div> tag
soup.p.wrap(soup.new_tag("div"))
print(soup)
# unwrap the <i> tag
soup.p.i.unwrap()
print(soup)
old_tag = soup.div
# create a new tag
new_tag = soup.new_tag('div')
new_tag.string = "| A Computer Science portal for geeks"
# append the new tag to the outer <div>
old_tag.append(new_tag)
print(soup)
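As a small complement to the example above, this sketch shows two details of the same functions: new_tag() accepts attributes as keyword arguments, and wrap() returns the wrapper tag itself. The example.com URL and the variable names are placeholders chosen for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Geeks for Geeks</p>", "html.parser")

# new_tag() accepts attributes as keyword arguments
# (the URL is a placeholder).
link = soup.new_tag("a", href="https://example.com/")

# wrap() returns the wrapper, which now contains the wrapped string.
wrapper = soup.p.string.wrap(link)
wrapped_html = str(soup)
print(wrapped_html)

# unwrap() removes the <a> tag but keeps its contents in place.
soup.p.a.unwrap()
unwrapped_html = str(soup)
print(unwrapped_html)
```

Because wrap() returns the wrapper, the result can be kept in a variable and modified further without searching the tree again.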
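Replacing elements
replace_with() removes a tag or string from the parse tree and puts a new tag or string in its place.
Syntax: element.replace_with(new_element)
Example:
Python3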
# import the module
from bs4 import BeautifulSoup
# sample markup: an <a> tag that contains an <i> tag
markup = '<a>Geeks for Geeks <i>gfg.com</i></a>'
# parse the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
# tag that contains the element to be replaced
old_tag = soup.a
# create the replacement tag
new_tag = soup.new_tag("p")
# set its text
new_tag.string = "gfg.in"
# replace_with() removes a tag or string from the tree
# and replaces it with the tag or string of your choice
old_tag.i.replace_with(new_tag)
print(old_tag)
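A short sketch of a related detail: replace_with() also accepts a plain string, and it returns the element that was removed, which is then detached from the tree. The markup and variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
soup = BeautifulSoup("<a>Geeks for Geeks <i>gfg.com</i></a>", "html.parser")

# Replace the <i> tag with a plain string and keep the removed tag.
removed = soup.a.i.replace_with("gfg.in")

after_html = str(soup.a)
removed_html = str(removed)
print(after_html)
print(removed_html)
```

Keeping the return value is useful when the replaced element needs to be inspected or inserted somewhere else later.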
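Adding new content to an existing tag
New content can be added to an existing tag with append(), which accepts either a plain string or an object created with the NavigableString() constructor.
Syntax: tag.append("content")
Example:
Python3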
# import the module
from bs4 import BeautifulSoup
from bs4 import NavigableString
# sample markup: an <a> tag containing a single string
markup = """<a>Geeks for Geeks</a>"""
# parse the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
# select the tag
tag = soup.a
# append a plain string
tag.append(" | A Computer Science portal")
print(tag)
# append a string created with the NavigableString constructor
new_str = NavigableString(" for geeks")
tag.append(new_str)
print(tag)
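The following sketch illustrates what append() does with its argument: a plain Python string is stored as a NavigableString, and a tag created with new_tag() can be appended in the same way. The markup and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup for illustration.
soup = BeautifulSoup("<a>Geeks for Geeks</a>", "html.parser")
tag = soup.a

# A plain str is converted to a NavigableString when appended.
tag.append(" | A Computer Science portal")
is_navigable = isinstance(tag.contents[-1], NavigableString)

# append() also accepts a tag; it becomes the last child.
bold = soup.new_tag("b")
bold.string = "!"
tag.append(bold)

appended_html = str(tag)
child_count = len(tag.contents)
print(appended_html)
```

The appended string is kept as a separate child rather than merged into the existing text, so the tag ends up with three children here.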
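Removing content and elements
The tree can be modified by removing the contents of a tag or by removing an element entirely.
- clear() removes the contents of a tag.
Syntax: tag.clear()
- extract() removes a tag or string from the tree and returns it.
Syntax: element.extract()
- decompose() removes a tag from the tree and destroys it along with its contents.
Syntax: tag.decompose()
Example:
Python3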
# import the module
from bs4 import BeautifulSoup
# sample markup: an <a> tag that contains an <i> tag
markup = '<a>Geeks for Geeks <i>| A Computer Science portal</i></a>'
# parse the string as HTML
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.a
print(tag)
print()
# clear all of the tag's contents
tag.clear()
print(tag)
print()
# extract the <i> tag from a fresh copy of the tree
soup2 = BeautifulSoup(markup, 'html.parser')
a_tag = soup2.a
print(a_tag)
print()
# extract() returns the removed tag
i_tag = soup2.i.extract()
print(a_tag)
print()
# decompose the <i> tag in another fresh copy of the tree
soup3 = BeautifulSoup(markup, 'html.parser')
a_tag = soup3.a
print(a_tag)
print()
# decompose() returns None, so there is nothing to assign
soup3.i.decompose()
print(a_tag)
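The practical difference between extract() and decompose() can be sketched as follows: extract() hands back the removed element so that it can be reused, while decompose() returns None. The markup and the second document are made up for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
markup = "<a>Geeks for Geeks <i>| A Computer Science portal</i></a>"

# extract(): the removed tag is returned and can be placed elsewhere.
soup = BeautifulSoup(markup, "html.parser")
extracted = soup.i.extract()
other = BeautifulSoup("<div></div>", "html.parser")
other.div.append(extracted)

source_html = str(soup)
target_html = str(other)
print(source_html)
print(target_html)

# decompose(): the tag is destroyed and nothing is returned.
soup2 = BeautifulSoup(markup, "html.parser")
decompose_result = soup2.i.decompose()
decomposed_html = str(soup2)
print(decomposed_html)
```

A reasonable rule of thumb under these assumptions: use extract() when the removed element is still needed, and decompose() when it can be discarded.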