BeautifulSoup – Modifying the tree

Last modified: 2022-05-13 01:55:11.021000    Author: Mango


Prerequisite: BeautifulSoup

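BeautifulSoup is a Python library for web scraping. It can also be used to modify HTML pages. This article describes how to use BeautifulSoup to modify the parse tree. BeautifulSoup lets you search the parse tree and then change it: you can rename tags, change attribute values, and add or remove attributes.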

Modifying tag names and attributes

You can change a tag's name, and you can modify its attributes by adding or deleting them.

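  • To change the tag name: tag.name = "div"
  • To modify an attribute or add a new one: tag['class'] = "div_class"
  • To delete an attribute: del tag['class']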
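The three operations above can be tried on a single tag. This is a minimal sketch with illustrative markup; the class="para" value is an assumption and does not come from the article. It also shows that tag.attrs is an ordinary dictionary and that has_attr() tests whether an attribute is present.

```python
from bs4 import BeautifulSoup

# Illustrative markup: a <p> tag that already has a class attribute
soup = BeautifulSoup('<p class="para">gfg</p>', "html.parser")
tag = soup.p

# Rename the tag; its attributes and children are kept
tag.name = "div"

# Assigning to a key overwrites an existing attribute or adds a new one
tag["class"] = "div_class"
tag["id"] = "div_id"

# del removes the attribute from the tag
del tag["class"]

print(tag)                    # <div id="div_id">gfg</div>
print(tag.attrs)              # {'id': 'div_id'}
print(tag.has_attr("class"))  # False
```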

You can also modify the tree by inserting new elements at the desired position.

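  • insert() inserts a new element at a given index in a tag's contents.
  • insert_after() inserts an element immediately after another element in the parse tree.
  • insert_before() inserts an element immediately before another element in the parse tree.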
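This is a short sketch of how the three insertion methods differ. It assumes a small illustrative <ul> list and a hypothetical helper, make_item(). insert() takes an index into the parent's contents. insert_before() and insert_after() are called on an element that is already in the tree, and they add a sibling next to it.

```python
from bs4 import BeautifulSoup

# Illustrative list with gaps to fill in
soup = BeautifulSoup("<ul><li>b</li><li>d</li></ul>", "html.parser")
ul = soup.ul

def make_item(text):
    """Hypothetical helper: create a detached <li> tag holding the given text."""
    li = soup.new_tag("li")
    li.string = text
    return li

# insert(index, element): index 0 puts the new item first in ul.contents
ul.insert(0, make_item("a"))

# insert_before() is called on an element already in the tree: <li>d</li>
ul.contents[2].insert_before(make_item("c"))

# insert_after() adds the new element right after the last item
ul.contents[-1].insert_after(make_item("e"))

print([li.get_text() for li in ul.find_all("li")])  # ['a', 'b', 'c', 'd', 'e']
```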

Approach:

  • Import the module
  • Scrape the data from a web page
  • Parse the scraped HTML string
  • Select the tag to be modified
  • Make the necessary changes
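The steps above can be sketched as a single function. The name modify_page and the choice to modify the first <h1> are hypothetical. The download step appears only in a comment (for example, using the third-party requests package), so a literal HTML string stands in for the scraped page.

```python
from bs4 import BeautifulSoup

def modify_page(html: str) -> str:
    """Hypothetical helper: parse an HTML string, modify one tag, return the markup."""
    # Parse the scraped string into a tree
    soup = BeautifulSoup(html, "html.parser")

    # Select the tag to modify (the first <h1> is an arbitrary choice)
    heading = soup.find("h1")
    if heading is None:
        return str(soup)  # nothing to change

    # Make the changes
    heading.name = "h2"
    heading["class"] = "title"
    return str(soup)

# In a real scraper the HTML would be downloaded first, e.g. with
# requests.get(url).text; a literal string stands in for it here.
page = "<html><body><h1>Geeks for Geeks</h1></body></html>"
print(modify_page(page))  # <html><body><h2 class="title">Geeks for Geeks</h2></body></html>
```

Keeping the parsing and modification in one function that takes a string makes the logic easy to test without network access.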

Example 1:

Python3
# importing module
from bs4 import BeautifulSoup

markup = """<p>gfg</p>"""

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')

# extracting a tag
tag = soup.p

print("Before modifying the tag name: ")
print(tag)
print()

# modifying tag name
tag.name = "div"

print("After modifying the tag name: ")
print(tag)
print()

# setting its class attribute
tag['class'] = "div_class"

# adding new attribute
tag['id'] = "div_id"

print("After modifying and adding attributes: ")
print(tag)
print()

# to delete any attribute
del tag["class"]

print("After deleting class attribute: ")
print(tag)
print()

# modifying the tag's content
tag.string = "Geeks"

print("After modifying tag string: ")
print(tag)
print()

# using the insert function
tag = soup.div
print("Before inserting: ")
print(tag)
print()

# inserting content at index 1, after the existing string
tag.insert(1, " for Geeks")
print("After inserting: ")
print(tag)
print()


Output:

Example 2:

Python3

# importing module
from bs4 import BeautifulSoup
 
soup = BeautifulSoup("<b>| A Computer Science portal</b>", 'html.parser')
 
tag = soup.new_tag("p")
tag.string = "Geeks"
 
 
# insert before
soup.b.string.insert_before(tag)
print(soup.b)
print()
 
# insert after
soup.b.p.insert_after(soup.new_string(" for Geeks"))
print(soup.b)

Output:


Adding new tags and wrapping elements

You can modify the tree by adding new tags wherever they are needed. You can also modify it by wrapping elements in another tag.

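  • new_tag() creates a new tag. The tag is not part of the tree until you insert it, for example with insert() or append().
  • wrap() wraps an element in the tag you specify and returns the new wrapper.
  • unwrap() does the opposite of wrap(): it replaces a tag with that tag's contents.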
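This is a minimal sketch of the three functions. It assumes an illustrative <p> fragment and a placeholder URL (https://example.com/). Keyword arguments to new_tag() become attributes of the new tag, and the value returned by wrap() is the wrapper itself.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Geeks for Geeks</p>", "html.parser")

# new_tag() creates a detached tag; keyword arguments become attributes
link = soup.new_tag("a", href="https://example.com/")
print(link.parent)  # None: the tag is not in the tree yet

# wrap() places the string inside the given tag and returns that tag
wrapper = soup.p.string.wrap(link)
print(wrapper is link)  # True
print(soup)  # <p><a href="https://example.com/">Geeks for Geeks</a></p>

# unwrap() replaces the <a> tag with its own contents
soup.p.a.unwrap()
print(soup)  # <p>Geeks for Geeks</p>
```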

Example:

Python3

# importing module
from bs4 import BeautifulSoup

markup = '<p>Geeks for Geeks</p>'

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
print(soup)

# wrapping around the string
soup.p.string.wrap(soup.new_tag("i"))
print(soup)

# wrapping around the tag
soup.p.wrap(soup.new_tag("div"))
print(soup)

# unwrapping the i tag
soup.p.i.unwrap()
print(soup)

old_tag = soup.div

# new tag
new_tag = soup.new_tag('div')
new_tag.string = "| A Computer Science portal for geeks"

# adding the new tag at the end of the old one
old_tag.append(new_tag)
print(soup)

Output:

Replacing elements

The replace_with() function replaces an existing tag or string in the parse tree with a new tag or string.
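This sketch uses illustrative markup. It shows that replace_with() accepts a plain string as well as a tag, and that it returns the element that was removed. That element is then detached from the tree.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<a>Geeks for Geeks <i>gfg.com</i></a>", "html.parser")

# A string in the tree can be replaced with a plain Python string
soup.a.contents[0].replace_with("GfG ")

# A tag can be replaced with a new tag; the removed tag is returned
new_b = soup.new_tag("b")
new_b.string = "gfg.in"
old = soup.i.replace_with(new_b)

print(old)         # <i>gfg.com</i>
print(old.parent)  # None: the old tag is no longer in the tree
print(soup)        # <a>GfG <b>gfg.in</b></a>
```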

Example:

Python3

# importing BeautifulSoup Module
from bs4 import BeautifulSoup
 
markup = '<a>Geeks for Geeks <i>gfg.com</i></a>'

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# tag to be replaced
old_tag = soup.a
 
# new tag
new_tag = soup.new_tag("p")
 
# input string
new_tag.string = "gfg.in"
 
# page_element.replace_with("string") removes a tag or string from the tree
# and replaces it with the tag or string of your choice
 
old_tag.i.replace_with(new_tag)
 
print(old_tag)

Output:

Adding new content to an existing tag

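You can add new content to an existing tag with the append() method. Pass it either a plain Python string or a string created with the NavigableString() constructor.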
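This small sketch uses illustrative markup to show what append() does to a tag's children. Each appended string becomes a separate child. As a result, .string stops returning a value, while get_text() still joins all of the text.

```python
from bs4 import BeautifulSoup, NavigableString

soup = BeautifulSoup("<a>Geeks for Geeks</a>", "html.parser")
tag = soup.a

# Each call adds one new child at the end of tag.contents
tag.append(" | A Computer Science portal")  # a plain str is converted for you
tag.append(NavigableString(" for geeks"))   # an explicit NavigableString

print(len(tag.contents))  # 3: the strings are not merged
print(tag.string)         # None: .string needs exactly one child
print(tag.get_text())     # Geeks for Geeks | A Computer Science portal for geeks
```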

Example:

Python3

# importing module
from bs4 import BeautifulSoup
from bs4 import NavigableString
 
markup = """<a>Geeks for Geeks</a>"""

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
 
# extracting a tag
tag = soup.a
 
# appending content
tag.append("| A Computer Science portal")
print(tag)
 
# appending content using the NavigableString constructor
new_str = NavigableString(" for geeks")
tag.append(new_str)
print(tag)

Output:

Removing content and elements

You can also modify the tree by removing content or whole elements from it.

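  • clear() removes the contents of a tag and leaves the empty tag in place.
  • extract() removes a tag or string from the tree and returns it.
  • decompose() removes a tag from the tree and destroys it together with all of its contents.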
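The three methods differ in what survives. In this sketch (illustrative markup), the element returned by extract() is reused elsewhere, clear() leaves an empty tag behind, and decompose() returns None because the removed tag is destroyed.

```python
from bs4 import BeautifulSoup

html = "<div><a>Geeks for Geeks <i>portal</i></a><p></p></div>"
soup = BeautifulSoup(html, "html.parser")

# extract() detaches the element and returns it, so it can be reused
i_tag = soup.i.extract()
soup.p.append(i_tag)
print(soup)  # <div><a>Geeks for Geeks </a><p><i>portal</i></p></div>

# clear() empties a tag but keeps the tag itself in the tree
soup.a.clear()
print(soup)  # <div><a></a><p><i>portal</i></p></div>

# decompose() removes the tag, destroys it and its contents, and returns None
result = soup.p.decompose()
print(result)  # None
print(soup)    # <div><a></a></div>
```

Use extract() when you still need the removed element. decompose() is meant for elements you want to discard.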

Example:

Python3

# importing module
from bs4 import BeautifulSoup
 
markup = '<a>Geeks for Geeks <i>| A Computer Science portal</i></a>'

# parsing string to HTML
soup = BeautifulSoup(markup, 'html.parser')
 
tag = soup.a
print(tag)
print()
 
# clearing all of its content
tag.clear()
print(tag)
print()
 
# extracting the i tag
# parsing the markup again to get a fresh tree
soup2 = BeautifulSoup(markup, 'html.parser')
 
a_tag = soup2.a
 
print(a_tag)
print()
i_tag = soup2.i.extract()
 
print(a_tag)
print()
 
# decomposing the i tag
# parsing the markup again to get a fresh tree
soup2 = BeautifulSoup(markup, 'html.parser')
 
a_tag = soup2.a
 
print(a_tag)
print()
# decompose() returns None, so there is no result to keep
soup2.i.decompose()
 
print(a_tag)

Output:
