BeautifulSoup – Kinds of Objects

Last modified: 2022-05-13 01:54:36.164000             Author: Mango

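Prerequisite: BeautifulSoup

In this article, we will discuss the different kinds of objects in BeautifulSoup. When a string or an HTML document is passed to the BeautifulSoup constructor, the constructor converts the document into a tree of Python objects.

The four main objects are:

  1. BeautifulSoup
  2. Tag
  3. NavigableString
  4. Comment

1. BeautifulSoup object: The BeautifulSoup object represents the parsed document as a whole, that is, the complete document we are trying to scrape. For most purposes, you can treat it as a Tag object.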

Python3
# importing the module
from bs4 import BeautifulSoup

# parsing the document
soup = BeautifulSoup('''
<h1>Geeks for Geeks</h1>
''', "html.parser")

# print the type of the parsed document
print(type(soup))


Output:

<class 'bs4.BeautifulSoup'>
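To make the "treat it as a Tag" remark concrete, here is a minimal sketch. It assumes the same one-heading document used above; the variable names `is_tag` and `heading` are only illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<h1>Geeks for Geeks</h1>", "html.parser")

# BeautifulSoup is a subclass of Tag, so tag-style navigation works on it
is_tag = isinstance(soup, Tag)
print(is_tag)

# The object has no real HTML tag behind it, so it carries a special name
print(soup.name)

# Searching from the soup object works like searching from any other tag
heading = soup.find("h1")
print(heading.get_text())
```

The special name is the main visible difference: the soup object stands for the document itself rather than for any element inside it.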

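2. Tag object: A Tag object corresponds to an XML or HTML tag in the original document, and is typically used to extract a tag from the whole HTML document. Beautiful Soup is not an HTTP client, so you must first download a web page with the requests module and then pass it to Beautiful Soup for scraping. If the document contains several tags with the same name, accessing a tag by name as an attribute (for example, soup.b) returns the first one found.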

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b>Geeks for Geeks</b>
    </html>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Print the output
print(type(tag))

Output:

<class 'bs4.element.Tag'>
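Two points from the Tag description can be sketched together: the download step happens outside Beautiful Soup, and attribute-style access returns only the first matching tag. This is a hedged sketch, not the article's own code. It swaps in the standard-library `urllib.request` for the requests module so that it has no extra dependency. The helper name `fetch_html` and the sample markup are assumptions, and the helper is defined but not called so the example runs offline.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page's raw bytes; Beautiful Soup never touches the network."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# Stand-in for downloaded markup so the sketch runs without a connection
html = "<p><b>first</b> and <b>second</b></p>"
soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns only the first <b> tag
first = soup.b.get_text()
print(first)

# find_all() returns every matching tag
all_bold = [b.get_text() for b in soup.find_all("b")]
print(all_bold)
```

In real use, the bytes returned by `fetch_html(some_url)` would be passed to the constructor in place of `html`.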

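A tag has many methods and attributes. Its two most important features are its name and its attributes.

  • Name
  • Attributes

# Name:

The name of a tag can be accessed with the ".name" suffix.

We can also change the name of a tag.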

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b>Geeks for Geeks</b>
    </html>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Print the tag's name
print(tag.name)

# Change the tag's name; the rendered markup changes with it
tag.name = "Strong"
print(tag)



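Output:

b
<Strong>Geeks for Geeks</Strong>

# Attributes:

Example 1: An attribute is a name-value pair inside a tag's opening markup, such as class="gfg". A tag object can have many attributes, which can be accessed by key, as with a dictionary, or all at once through tag.attrs. We can also modify or delete attributes and their values.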

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b class="gfg">Geeks for Geeks</b>
    </html>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Read the class attribute by key
print(tag["class"])

# Modify the class attribute
tag["class"] = "geeks"
print(tag)

# Delete the class attribute
del tag["class"]
print(tag)



Output:

['gfg']
<b class="geeks">Geeks for Geeks</b>
<b>Geeks for Geeks</b>
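Besides key access, a tag exposes its attributes as a dictionary and offers a safe lookup for attributes that may be absent. The sketch below illustrates both. The `id` and `lang` attributes and the variable names are assumptions added for illustration, not part of the article's example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="gfg" id="title">Geeks for Geeks</b>',
                     "html.parser")
tag = soup.b

# All attributes at once, as a dictionary
print(tag.attrs)

# tag["href"] would raise KeyError here; .get() returns None or a default
missing = tag.get("href")
fallback = tag.get("href", "#")
print(missing, fallback)

# Adding a new attribute works like assigning a dictionary key
tag["lang"] = "en"
print(tag)
```

Using `.get()` is a reasonable default when scraping pages where an attribute is not guaranteed to be present on every tag.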

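Example 2: A document may contain multi-valued attributes, such as class. These are accessed by key in the same way, and the value is returned as a list.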

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
# whose <b> tag has a multi-valued class attribute
soup = BeautifulSoup('''
    <html>
        <b class="gfg geeks">Geeks for Geeks</b>
    </html>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# The class values come back as a list
print(tag["class"])

Output:

['gfg', 'geeks']
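Not every attribute that contains a space is treated as multi-valued. As a hedged illustration, with markup invented for this sketch, the example below contrasts `class` with `id` and shows that a list assigned to `class` is joined back into one string when the tag is rendered.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<b class="gfg geeks" id="main title">Geeks for Geeks</b>',
    "html.parser")
tag = soup.b

# class is multi-valued in HTML, so it is split into a list
classes = tag["class"]
print(classes)

# id is not multi-valued, so it stays a single string
tag_id = tag["id"]
print(tag_id)

# A list assigned to class is consolidated when the tag is rendered
tag["class"] = ["gfg", "portal"]
rendered = str(tag)
print(rendered)
```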

3. NavigableString object: A string corresponds to a piece of text inside a tag. Beautiful Soup uses the NavigableString class to hold these pieces of text.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <b>Geeks for Geeks</b>
    </html>
    ''', "html.parser")

# Get the first <b> tag
tag = soup.b

# Get the string inside the tag
string = tag.string

# Print the output
print(type(string))

Output:

<class 'bs4.element.NavigableString'>
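A NavigableString behaves like an ordinary Python string, with extra links into the parse tree. The following sketch, with illustrative variable names, shows the conversion to a plain `str` and one way to replace the text inside a tag.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>Geeks for Geeks</b>", "html.parser")
tag = soup.b
string = tag.string

# NavigableString is a subclass of str
is_str = isinstance(string, str)
print(is_str)

# str() yields a plain string that no longer references the parse tree
plain = str(string)
print(type(plain))

# The text cannot be edited in place, but it can be swapped for another
tag.string.replace_with("GFG")
print(tag)
```

Converting to a plain `str` is worthwhile when the text is kept after parsing, since a NavigableString holds a reference to the whole tree.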

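4. Comment object: The Comment object is just a special type of NavigableString that holds a comment from the document. Comments are used in markup to make the codebase more readable.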

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Create the document: a <b> tag whose only child is a comment
# (the comment text here is a placeholder)
markup = "<b><!-- COMMENT --></b>"

# Initialize the object with the document
soup = BeautifulSoup(markup, "html.parser")

# Get the whole comment inside the b tag
comment = soup.b.string

# Print the type of the comment
print(type(comment))

Output:

<class 'bs4.element.Comment'>
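Because Comment is a kind of NavigableString, it can be told apart from ordinary text with an `isinstance` check. The sketch below uses a made-up comment to show that check, the comment's text without its delimiters, and one possible way to collect all comments in a document.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical markup, invented for this example
markup = "<b><!-- a hypothetical comment --></b>"
soup = BeautifulSoup(markup, "html.parser")
comment = soup.b.string

# A Comment is also a NavigableString
is_navigable = isinstance(comment, NavigableString)
print(is_navigable)

# As a string, the comment holds only the text between the delimiters
text = str(comment)
print(repr(text))

# Rendering the tag puts the <!-- --> delimiters back
rendered = str(soup.b)
print(rendered)

# Collect every comment by filtering strings on their type
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
print(len(comments))
```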