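Extracting attribute values with BeautifulSoup in Python

Prerequisites: Beautiful Soup installation. The examples below pass "lxml" as the parser, so the lxml package must be installed as well.

Beautiful Soup, a Python library for web scraping, exposes the attributes of every HTML tag it parses. Web scraping is the process of extracting data from websites with automated tools, which is much faster than collecting it by hand. A tag can have any number of attributes. For example, the tag <b class="active"> has an attribute "class" whose value is "active". A tag's attributes can be accessed by treating the tag like a dictionary.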

Syntax:

tag.attrs

Implementation:
Example 1: Program to extract all attributes of a tag using the attrs property.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get all attributes of the tag as a dictionary
attribute = tag.attrs
  
# Print the output
print(attribute)

Output:

{'class': ['hello']}
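
Because tag.attrs is an ordinary Python dictionary, it can be iterated like one. The following is a minimal sketch rather than part of the original example: the <a> tag, its attribute values, and the variable names are invented for illustration, and it assumes the built-in "html.parser" so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration
html = '<a id="home" class="nav" href="/index.html">Home</a>'
soup = BeautifulSoup(html, "html.parser")

# Get the first <a> tag
tag = soup.a

# attrs maps each attribute name to its value
attributes = tag.attrs

# Walk over every attribute name and value on the tag
for name, value in attributes.items():
    print(name, "=", value)
```

Note that "class" comes back as a list while "id" and "href" come back as plain strings, because Beautiful Soup treats "class" as a multi-valued attribute.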

Example 2: Program to extract the value of one attribute using the dictionary approach.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get the attribute
attribute = tag['class']
  
# Print the output
print(attribute)

Output:

['hello']
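
Indexing a tag with an attribute it does not have raises a KeyError, just as with a dictionary. A hedged sketch of the safer alternatives is shown here, assuming the same kind of <h2> tag as above and the built-in "html.parser"; the "no-id" fallback string is a made-up placeholder.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<h2 class="hello"> Heading 1 </h2>', "html.parser")
tag = soup.h2

# get() returns the value if the attribute exists
present = tag.get("class")

# get() returns None when the attribute is missing
missing = tag.get("id")

# A default can be supplied, as with dict.get()
fallback = tag.get("id", "no-id")

# has_attr() checks for an attribute without reading it
has_class = tag.has_attr("class")

# Direct indexing raises KeyError for a missing attribute
try:
    tag["id"]
    raised = False
except KeyError:
    raised = True

print(present, missing, fallback, has_class, raised)
```

Using get() is usually the more convenient choice when scraping real pages, where not every tag is guaranteed to carry the attribute of interest.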

Example 3: Program to extract multiple values of one attribute using the dictionary approach.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
  
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="first second third"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
  
# Get the whole h2 tag
tag = soup.h2
  
# Get the attribute
attribute = tag['class']
  
# Print the output
print(attribute)

Output:

['first', 'second', 'third']
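
The list in the output above appears because Beautiful Soup treats "class" as a multi-valued attribute and splits its value on whitespace, while single-valued attributes such as "id" stay plain strings. The sketch below illustrates the difference under a few assumptions: the id="title" attribute is added purely for illustration, and the built-in "html.parser" is used instead of lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the id attribute is added only for this illustration
html = '<h2 id="title" class="first second third"> Heading 1 </h2>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.h2

# Multi-valued attribute: returned as a list of strings
classes = tag["class"]

# Join the list to recover the value as it was written in the HTML
class_string = " ".join(classes)

# Single-valued attribute: returned as a plain string
tag_id = tag["id"]

# get_attribute_list() always returns a list, whatever the attribute
id_list = tag.get_attribute_list("id")

print(classes, class_string, tag_id, id_list)
```

When the calling code should not care whether an attribute is multi-valued, get_attribute_list() gives a uniform list result for both cases.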