BeautifulSoup - Kinds of Objects
Prerequisites: BeautifulSoup
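In this article, we will discuss the different kinds of objects in BeautifulSoup. When a string or an HTML document is passed to the BeautifulSoup constructor, the constructor converts the document into a tree of Python objects.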
The four major and most important objects are:
- BeautifulSoup
- Tag
- NavigableString
- Comment
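As a quick illustration, the sketch below parses a small snippet (the markup is made up for this example, not taken from the original) and collects the class name of the soup and of every node inside it, so all four kinds of objects show up at once.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one tag containing some text and one HTML comment
html = '<b class="gfg">Geeks for Geeks<!--a comment--></b>'
soup = BeautifulSoup(html, "html.parser")

# The soup itself is the BeautifulSoup object
kinds = [type(soup).__name__]

# .descendants walks every node below the soup in document order
for node in soup.descendants:
    kinds.append(type(node).__name__)

print(kinds)  # ['BeautifulSoup', 'Tag', 'NavigableString', 'Comment']
```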
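1. BeautifulSoup object: The BeautifulSoup object represents the parsed document as a whole, i.e. the complete document we are trying to scrape. Most of the time, you can treat it as a Tag object.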
Python3
# importing the module
from bs4 import BeautifulSoup
# parsing the document
soup = BeautifulSoup('''
<b class="gfg">Geeks for Geeks</b>
''', "html.parser")
print(type(soup))
Output:
<class 'bs4.BeautifulSoup'>
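Because BeautifulSoup is itself a subclass of Tag, searching methods such as find() and find_all() work directly on the soup. A small sketch, using made-up markup:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup('<h1>Geeks for Geeks</h1><p>Hello</p>', "html.parser")

print(isinstance(soup, Tag))            # True: BeautifulSoup inherits from Tag
print(soup.name)                        # [document], the special name of the root
print(soup.find("p").text)              # Hello
print(len(soup.find_all(["h1", "p"])))  # 2
```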
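2. Tag object: A Tag object corresponds to an XML or HTML tag in the original document. This object is usually used to extract a tag from the whole HTML document. Note that Beautiful Soup is not an HTTP client: to scrape an online website, you first have to download the page (for example with the requests module) and then pass it to Beautiful Soup. Also, if your document has multiple tags with the same name, accessing a tag by name (e.g. soup.b) returns the first one found.
Python3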
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<b class="gfg">Geeks for Geeks</b>
''', "html.parser")
# Get the tag
tag = soup.b
# Print the output
print(type(tag))
Output:
<class 'bs4.element.Tag'>
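The "first found tag" behaviour can be seen with a document that contains two tags of the same name (the markup below is hypothetical). Attribute-style access is shorthand for find(), while find_all() collects every match.

```python
from bs4 import BeautifulSoup

# Hypothetical document with two <b> tags
soup = BeautifulSoup('<b>first</b><b>second</b>', "html.parser")

# Attribute-style access returns only the first matching tag
print(soup.b)                    # <b>first</b>

# It is shorthand for find(), which also stops at the first match
print(soup.find("b") is soup.b)  # True

# find_all() collects every matching tag
print([b.text for b in soup.find_all("b")])  # ['first', 'second']
```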
A Tag has many methods and attributes. Its two most important features are its name and its attributes:
- Name
- Attributes
# Name:
The name of a tag can be accessed with the .name suffix.
Syntax: tag.name
Return: the name of the tag as a string (for example, "b").
We can also change the name of a tag.
Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<b class="gfg">Geeks for Geeks</b>
''', "html.parser")
# Get the tag
tag = soup.b
# Print the output
print(tag.name)
# changing the tag
tag.name = "Strong"
print(tag)
Output:
b
<Strong class="gfg">Geeks for Geeks</Strong>
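Renaming a tag changes the parse tree in place, so the rest of the soup sees the new name. A minimal sketch (a lowercase name is used here, which is the usual HTML convention):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="gfg">Geeks for Geeks</b>', "html.parser")
tag = soup.b
tag.name = "strong"

# The whole document now contains a <strong> tag instead of <b>
print(soup)    # <strong class="gfg">Geeks for Geeks</strong>

# Looking up the old name finds nothing any more
print(soup.b)  # None
```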
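# Attributes:
Example 1: A tag can have any number of attributes, such as class in <b class="gfg">, each of which holds a value. Attributes can be accessed individually, by using the attribute name as a key as with a dictionary (tag["class"]), or all at once through the tag.attrs dictionary. We can also modify or delete attributes and their values.
Python3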
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<b class="gfg">Geeks for Geeks</b>
''', "html.parser")
# Get the tag
tag = soup.b
print(tag["class"])
# modifying class
tag["class"] = "geeks"
print(tag)
# delete the class attributes
del tag["class"]
print(tag)
Output:
['gfg']
<b class="geeks">Geeks for Geeks</b>
<b>Geeks for Geeks</b>
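Indexing a tag with a missing attribute name raises KeyError, just like a dictionary. When an attribute may be absent, tag.get() is the safer choice. A short sketch, with an id attribute added to the markup for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="gfg" id="title">Geeks for Geeks</b>', "html.parser")
tag = soup.b

# All attributes at once, as a dictionary
print(tag.attrs)

# Indexing a missing attribute raises KeyError ...
try:
    tag["href"]
except KeyError:
    print("no href attribute")

# ... while get() returns None, or the default you pass in
print(tag.get("href"))       # None
print(tag.get("href", "#"))  # #
```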
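Example 2: A document may contain multi-valued attributes, such as a class attribute with several space-separated values. They are accessed with the same key syntax, and Beautiful Soup returns their values as a list.
Python3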
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
# soup for multi_valued attributes
soup = BeautifulSoup('''
<b class="gfg geeks">Geeks for Geeks</b>
''', "html.parser")
# Get the tag
tag = soup.b
print(tag["class"])
Output:
['gfg', 'geeks']
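Only certain attributes, such as class, are treated as multi-valued. Others, such as id, keep their value as a single string even if it contains spaces. The splitting can also be switched off with the multi_valued_attributes argument of the constructor. A sketch with hypothetical markup:

```python
from bs4 import BeautifulSoup

html = '<b class="gfg geeks" id="main title">Geeks for Geeks</b>'

# Default: class is split into a list, id is not
soup = BeautifulSoup(html, "html.parser")
print(soup.b["class"])  # ['gfg', 'geeks']
print(soup.b["id"])     # main title

# multi_valued_attributes=None keeps every attribute as a plain string
raw = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
print(raw.b["class"])   # gfg geeks
```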
3. NavigableString object: A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to hold these bits of text.
Syntax: tag.string
Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<b class="gfg">Geeks for Geeks</b>
''', "html.parser")
# Get the tag
tag = soup.b
# Get the string inside the tag
string = tag.string
# Print the output
print(type(string))
Output:
<class 'bs4.element.NavigableString'>
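A NavigableString behaves like a regular Python str, but it stays linked to the parse tree. The sketch below converts it to a plain string, replaces it in the tree (it cannot be edited in place), and shows that .string is None when a tag has more than one child. The second snippet of markup is made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="gfg">Geeks for Geeks</b>', "html.parser")
tag = soup.b

text = tag.string
print(isinstance(text, str))  # True: NavigableString subclasses str

# str() gives a plain Python string with no link back to the tree
plain = str(text)
print(type(plain))            # <class 'str'>

# Replace the text node with new text
text.replace_with("GFG")
print(tag)                    # <b class="gfg">GFG</b>

# .string is None when a tag has more than one child
mixed = BeautifulSoup("<p>Hi <b>there</b></p>", "html.parser")
print(mixed.p.string)         # None
```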
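4. Comment object: A Comment object is a special type of NavigableString that represents an HTML comment (<!-- ... -->), the kind of note authors add to make markup more readable.
Python3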
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Create the document: a <b> tag containing only an HTML comment
markup = "<b><!--This is a comment--></b>"
# Initialize the object with the document
soup = BeautifulSoup(markup, "html.parser")
# Get the whole comment inside b tag
comment = soup.b.string
# Print the type of the comment
print(type(comment))
Output:
<class 'bs4.element.Comment'>
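Because Comment is a subclass of NavigableString, a plain text search returns comments along with ordinary text. Checking the type separates the two. A minimal sketch with hypothetical markup, using a function as the string filter of find_all():

```python
from bs4 import BeautifulSoup, Comment

html = "<p>Visible text<!--hidden note--></p><!--another note-->"
soup = BeautifulSoup(html, "html.parser")

# Keep only the strings that are Comment objects
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
print([str(c) for c in comments])  # ['hidden note', 'another note']

# All strings, with the comments filtered out
texts = [s for s in soup.find_all(string=True) if not isinstance(s, Comment)]
print([str(t) for t in texts])     # ['Visible text']
```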