BeautifulSoup – Kinds of Objects

Prerequisite: BeautifulSoup

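In this article, we discuss the different kinds of objects in BeautifulSoup. When a string or an HTML document is passed to the BeautifulSoup constructor, the constructor converts that document into a tree of Python objects.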

The four main objects are:

BeautifulSoup
Tag
NavigableString
Comment

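1. BeautifulSoup object: The BeautifulSoup object represents the parsed document as a whole, that is, the complete document we are trying to scrape. For most purposes you can treat it as a Tag object.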

Python3

# importing the module
from bs4 import BeautifulSoup
 
# parsing the document
soup = BeautifulSoup('''Geeks for Geeks''',
                     "html.parser")
 
print(type(soup))

Output:

<class 'bs4.BeautifulSoup'>
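The claim that the BeautifulSoup object can be treated like a Tag can be checked directly. The sketch below is illustrative only: the `<h1>` markup and the variable names are assumptions, not part of the original example.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical one-line document, used only for illustration
soup = BeautifulSoup("<h1>Geeks for Geeks</h1>", "html.parser")

# BeautifulSoup is a subclass of Tag, so both share the same
# navigation and search methods
is_tag = isinstance(soup, Tag)
print(is_tag)

# The document object matches no real tag, so it gets a special name
print(soup.name)

# Tag-style navigation and searching work directly on the soup
heading_text = str(soup.h1.string)
heading_count = len(soup.find_all("h1"))
print(heading_text, heading_count)
```

The one visible difference is the name: because the object stands for the whole document rather than a real element, `soup.name` is the placeholder `[document]`.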

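2. Tag object: A Tag object corresponds to an XML or HTML tag in the original document, and it is the object you typically use to pull individual elements out of a parsed page. Note that Beautiful Soup is not an HTTP client: to scrape an online page you must first download it, for example with the requests module, and then pass the HTML to Beautiful Soup. If the document contains several tags with the same name, attribute-style access such as soup.b returns only the first one.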

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <b>Geeks for Geeks</b>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Print the type of the tag object
print(type(tag))

Output:

<class 'bs4.element.Tag'>
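The "first tag found" behaviour described above can be seen with a document that contains two tags of the same name. This is a minimal sketch; the markup is made up for the illustration and is kept inline so that no download is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <b> tags
markup = "<p><b>first</b> and <b>second</b></p>"
soup = BeautifulSoup(markup, "html.parser")

# Attribute-style access returns only the first match,
# the same result as soup.find("b")
first = soup.b

# find_all() returns every match, in document order
all_bold = soup.find_all("b")
bold_texts = [str(t.string) for t in all_bold]

# A tag name that does not occur in the document gives None
missing = soup.i

print(first)
print(bold_texts)
print(missing)
```

Since a missing tag yields `None` rather than an exception, it is worth checking the result before reading attributes such as `.name` from it.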

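A Tag has many methods and attributes. Its two most important features are its name and its attributes.

Name
Attributes

# Name:

The name of a tag can be accessed by appending ".name" to the tag object.

Syntax: tag.name

Returns: the name of the tag as a string (for example "b").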

We can also change the name of a tag. The change is reflected in any markup generated from the tag afterwards.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <b>Geeks for Geeks</b>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Print the name of the tag
print(tag.name)

# Change the name of the tag and print the resulting markup
tag.name = "Strong"
print(tag)

Output:

b
<Strong>Geeks for Geeks</Strong>

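# Attributes:

Example 1: A tag can carry any number of attributes, each made up of a name and a value. Attributes are read by treating the tag like a dictionary (tag["class"]), and the full set is available as tag.attrs. We can also modify attribute values and delete attributes.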

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <b class="gfg">Geeks for Geeks</b>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Read the class attribute
print(tag["class"])

# Modify the class attribute
tag["class"] = "geeks"
print(tag)

# Delete the class attribute
del tag["class"]
print(tag)

Output:

['gfg']
<b class="geeks">Geeks for Geeks</b>
<b>Geeks for Geeks</b>
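Besides key access, a tag exposes its attributes as a dictionary and offers a safe lookup for attributes that may be absent. The sketch below is illustrative; the `id`, `href` and `lang` attributes are assumptions added for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two attributes
soup = BeautifulSoup('<b class="gfg" id="title">Geeks for Geeks</b>',
                     "html.parser")
tag = soup.b

# All attributes at once, as a plain dictionary copy
attrs = dict(tag.attrs)
print(attrs)

# tag["href"] would raise KeyError; get() returns None or a default
has_id = tag.has_attr("id")
missing = tag.get("href")
fallback = tag.get("href", "#")
print(has_id, missing, fallback)

# Assigning to a new key adds an attribute
tag["lang"] = "en"
print(tag["lang"])
```

Using `get()` is the safer choice when scraping real pages, where a given attribute is often present on some tags and absent on others.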

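Example 2: Some attributes, such as class, can hold several values at once. Beautiful Soup returns the value of such a multi-valued attribute as a list, accessed by key in the same way.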

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
# that has a multi-valued class attribute
soup = BeautifulSoup('''
    <b class="gfg geeks">Geeks for Geeks</b>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Print the list of class values
print(tag["class"])

Output:

['gfg', 'geeks']
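Not every attribute containing a space is split this way. The following sketch contrasts `class` with `id`, which HTML does not define as multi-valued; the markup and the extra class value are assumptions made for the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: both attributes contain a space
soup = BeautifulSoup(
    '<b class="gfg geeks" id="main title">Geeks for Geeks</b>',
    "html.parser")
tag = soup.b

# class is multi-valued, so it is split into a list
classes = tag["class"]
print(classes)

# id is not multi-valued, so it stays a single string
tag_id = tag["id"]
print(tag_id)

# A list assigned to class is joined with spaces when the tag is printed
tag["class"] = ["gfg", "geeks", "bold"]
rendered = str(tag)
print(rendered)
```

This means code that reads `tag["class"]` should expect a list even when the tag has only one class.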

3. NavigableString object: A string corresponds to a piece of text inside a tag. Beautiful Soup uses the NavigableString class to hold these pieces of text.

Syntax: <tag> String here </tag>

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <b>Geeks for Geeks</b>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Get the string inside the tag
string = tag.string

# Print the type of the string object
print(type(string))

Output:

<class 'bs4.element.NavigableString'>
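A NavigableString behaves like an ordinary Python string that also knows where it sits in the tree. The sketch below illustrates this under assumed markup; the replacement text "GfG" is made up for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>Geeks for Geeks</b>", "html.parser")
tag = soup.b
string = tag.string

# NavigableString is a subclass of str, so string methods work on it
is_str = isinstance(string, str)
upper = string.upper()
print(is_str, upper)

# Unlike a plain str, it keeps a link back to the enclosing tag
parent_name = string.parent.name
print(parent_name)

# str() gives a plain copy that no longer references the parse tree
plain = str(string)

# A string cannot be edited in place, but it can be replaced
tag.string.replace_with("GfG")
rendered = str(tag)
print(rendered)
```

Converting to a plain `str` is useful when the text is kept after parsing, since a NavigableString holds a reference to the whole tree.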

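4. Comment object: The Comment object is a special type of NavigableString. It represents a comment in the document, that is, the notes authors leave in markup to make it more readable.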

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Create the document: a <b> tag that contains only a comment
markup = "<b><!-- COMMENT --></b>"

# Initialize the object with the document
soup = BeautifulSoup(markup, "html.parser")

# Get the whole comment inside the b tag
comment = soup.b.string

# Print the type of the comment
print(type(comment))

Output:

<class 'bs4.element.Comment'>
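Because Comment is a kind of NavigableString, comments turn up alongside ordinary text when you search for strings, and they usually need to be filtered out. The following is a minimal sketch with made-up markup, not a definitive recipe.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical markup mixing visible text and a comment
markup = "<p>Visible text<!-- hidden note --></p>"
soup = BeautifulSoup(markup, "html.parser")

# Collect only the comments by testing each string's type
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
comment_texts = [str(c) for c in comments]
print(comment_texts)

# Every Comment is also a NavigableString
is_navigable = isinstance(comments[0], NavigableString)
print(is_navigable)

# Keep the visible text by skipping Comment instances
visible = [str(s) for s in soup.find_all(string=True)
           if not isinstance(s, Comment)]
print(visible)
```

The comment text is returned without the surrounding `<!--` and `-->` delimiters, although any spaces inside them are kept.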