
📅  Last modified: 2022-05-13 01:54:19.935000    🧑  Author: Mango

Extracting attribute values with Beautiful Soup in Python

Prerequisite: Beautiful Soup installation

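Beautiful Soup is a Python library for parsing HTML and XML, and it is commonly used for web scraping. Web scraping is the process of extracting data from websites with automated tools, which is much faster than collecting it by hand. A tag can have any number of attributes. For example, a tag such as <b class="active"> has one attribute, "class", whose value is "active". Beautiful Soup lets us access a tag's attributes by treating the tag like a dictionary.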
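As a minimal sketch of dictionary-style access, the snippet below reads attributes from a single tag. The markup, the "id" attribute, and the variable names are made up for illustration, and it assumes the built-in "html.parser" so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document, parsed with Python's built-in parser
soup = BeautifulSoup('<b class="active" id="intro">Bold text</b>', "html.parser")
tag = soup.b

# Square brackets work like a dictionary lookup
class_value = tag["class"]   # "class" can hold several values, so this is a list
id_value = tag["id"]         # "id" is returned as a plain string

# get() returns None instead of raising when the attribute is absent
missing = tag.get("title")

# has_attr() checks for an attribute without reading it
has_id = tag.has_attr("id")

# A missing attribute raises KeyError with square-bracket access
try:
    tag["title"]
    raised = False
except KeyError:
    raised = True

print(class_value, id_value, missing, has_id, raised)
```

Using get() is usually the safer choice when scraping pages where an attribute may or may not be present.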

Syntax:

tag.attrs

Implementation:
Example 1: A program that extracts all attributes of a tag using the attrs property.

Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")

# Get the whole h2 tag
tag = soup.h2

# Get all attributes of the tag as a dictionary
attribute = tag.attrs

# Print the output
print(attribute)


Output:

{'class': ['hello']}
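Because attrs is an ordinary Python dictionary, it can be looped over like any other. The sketch below assumes a hypothetical link tag with three attributes and uses the built-in "html.parser"; none of these names come from the original examples.

```python
from bs4 import BeautifulSoup

# Hypothetical tag carrying several attributes
soup = BeautifulSoup(
    '<a href="/docs" id="link1" class="nav item">Docs</a>', "html.parser"
)
tag = soup.a

# attrs maps each attribute name to its value
attributes = tag.attrs

# Walk over every name/value pair
for name, value in attributes.items():
    print(name, "->", value)

# Collect only the attribute names
attr_names = sorted(attributes)
```

This is handy when the set of attributes is not known in advance, for example when inspecting unfamiliar markup.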

Example 2: A program that extracts a single attribute value using dictionary-style access.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")

# Get the whole h2 tag
tag = soup.h2

# Get the value of the class attribute
attribute = tag['class']

# Print the output
print(attribute)

Output:

['hello']

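Example 3: A program that extracts an attribute holding multiple values using dictionary-style access.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup

# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="first second third"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")

# Get the whole h2 tag
tag = soup.h2

# Get the class attribute; each space-separated value becomes a list item
attribute = tag['class']

# Print the output
print(attribute)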

Output:

['first', 'second', 'third']
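The list in the output above appears because Beautiful Soup treats "class" as a multi-valued attribute. Most other attributes, such as "id", come back as a single string even when they contain spaces. The sketch below illustrates the difference under assumed markup (the "id" value is invented) and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical heading with a multi-valued class and an id containing a space
html = '<h2 class="first second third" id="title one">Heading 1</h2>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.h2

# "class" is split on whitespace into a list
classes = tag["class"]

# Join the list when the original space-separated string is needed
class_string = " ".join(classes)

# "id" is not multi-valued, so it stays a single string
tag_id = tag["id"]

# get_attribute_list() always returns a list, whatever the attribute
id_list = tag.get_attribute_list("id")

print(classes, class_string, tag_id, id_list)
```

Calling get_attribute_list() can simplify code that processes many attributes uniformly, since the result is a list in every case.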