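Extracting attribute values with Beautiful Soup in Python
Prerequisite: Beautiful Soup installation
Beautiful Soup, a web scraping library for Python, exposes the attributes of every tag it parses. Web scraping is the process of extracting data from websites with automated tools, which is faster than collecting it by hand. A tag can have any number of attributes. For example, a tag such as <b class="active"> has an attribute "class" whose value is "active". A tag's attributes can be accessed by treating the tag like a dictionary.
Syntax: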
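The dictionary-style access described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the one-tag document and its "id" attribute are made up, and the standard-library "html.parser" is used here only so that the sketch does not depend on lxml being installed.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document (the "id" attribute is invented for illustration)
soup = BeautifulSoup('<b class="active" id="intro">Hello</b>', "html.parser")
tag = soup.b

# Dictionary-style lookup of an attribute that exists
class_value = tag["class"]

# get() returns a default instead of raising KeyError when the attribute is missing
missing_value = tag.get("title", "no title")

# has_attr() checks for an attribute without reading its value
has_id = tag.has_attr("id")

print(class_value)
print(missing_value)
print(has_id)
```

Using get() with a default is usually the safer choice when scraping real pages, where a given attribute may not be present on every tag.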
tag.attrs
Implementation:
Example 1: Program to extract all attributes of a tag using attrs.
Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
# Get the whole h2 tag
tag = soup.h2
# Get all attributes of the tag as a dictionary
attribute = tag.attrs
# Print the output
print(attribute)
Output:
{'class': ['hello']}
Example 2: Program to extract an attribute value using dictionary-style access.
Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
# Get the whole h2 tag
tag = soup.h2
# Get the attribute
attribute = tag['class']
# Print the output
print(attribute)
Output:
['hello']
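The output above is a list rather than a plain string because Beautiful Soup treats "class" as a multi-valued attribute. The sketch below contrasts it with a single-valued attribute; the "id" attribute is an assumption added for illustration, and "html.parser" is used so that the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical tag carrying both a multi-valued and a single-valued attribute
soup = BeautifulSoup('<h2 class="hello" id="title">Heading 1</h2>', "html.parser")
tag = soup.h2

# "class" may hold several values, so it is returned as a list
class_value = tag["class"]

# "id" holds a single value, so it is returned as a string
id_value = tag["id"]

print(class_value)
print(id_value)
```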
Example 3: Program to extract multiple attribute values using dictionary-style access.
Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="first second third"> Heading 1 </h2>
        <h1> Heading 2 </h1>
    </html>
    ''', "lxml")
# Get the whole h2 tag
tag = soup.h2
# Get the attribute
attribute = tag['class']
# Print the output
print(attribute)
Output:
['first', 'second', 'third']
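When a page contains several matching tags, the same lookup can be applied to each of them. The following is a rough sketch under stated assumptions: the three-heading document is invented, find_all() is used to collect the tags, and get() supplies an empty list for a tag that has no "class" attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical page with three h2 tags, one of them without a class
html = '''
<html>
    <h2 class="first second third">Heading 1</h2>
    <h2 class="fourth">Heading 2</h2>
    <h2>Heading 3</h2>
</html>
'''
soup = BeautifulSoup(html, "html.parser")

# Collect the class list of every h2, using [] when the attribute is absent
classes_per_tag = [tag.get("class", []) for tag in soup.find_all("h2")]

# Join the values of the first h2 back into one space-separated string
joined = " ".join(soup.h2["class"])

print(classes_per_tag)
print(joined)
```

Joining the list is only needed when the original space-separated string is wanted, for example when writing the value back out.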