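BeautifulSoup Object – Python Beautifulsoup
The BeautifulSoup object is provided by Beautiful Soup, a web scraping library for Python. Web scraping is the process of extracting data from websites with automated tools instead of collecting it by hand. The BeautifulSoup object represents the parsed document as a whole. For most purposes, you can treat it as a Tag object.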
Syntax: BeautifulSoup(document, parser)
Parameters: This function accepts two parameters as explained below:
- document: This parameter contains the XML or HTML document.
- parser: This parameter contains the name of the parser to be used to parse the document.
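As a minimal sketch of the constructor call described above, the snippet below parses a short HTML string. It assumes Python's built-in "html.parser" instead of "lxml" so that it runs without an extra dependency; the variable name html_doc and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup, invented for this illustration
html_doc = "<html><body><p>Hello, world</p></body></html>"

# First argument: the document; second argument: the parser name
soup = BeautifulSoup(html_doc, "html.parser")

# Access the first <p> element and read its text
first_paragraph = soup.p
print(first_paragraph.get_text())
```

Swapping "html.parser" for "lxml" should give the same result here, provided the lxml package is installed.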
The examples below illustrate the BeautifulSoup object in Beautiful Soup.
Example 1: In this example, we create a document with a BeautifulSoup object and print a tag.
Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2>Heading 1</h2>
        <h1>Heading 2</h1>
    </html>
    ''', "lxml")
# Get the whole h2 tag
tag = soup.h2
# Print the tag
print(tag)
Output:
<h2>Heading 1</h2>
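The statement that a BeautifulSoup object can mostly be treated as a Tag can be checked directly. The sketch below is an illustration under assumptions: the markup is invented, and "html.parser" is used in place of "lxml" to avoid an extra install.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<h1>Heading 1</h1><h2>Heading 2</h2>", "html.parser")

# BeautifulSoup is a subclass of Tag, so Tag navigation and search work on it
is_tag = isinstance(soup, Tag)
print(is_tag)

# The object stands for the whole document rather than a real element,
# so it carries a special placeholder name
print(soup.name)

# Search from the document root exactly as you would from any Tag
headings = [t.name for t in soup.find_all(["h1", "h2"])]
print(headings)
```

One difference worth keeping in mind is that the BeautifulSoup object does not correspond to an actual HTML or XML element, which is why its name is a placeholder.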
Example 2: In this example, we create a document with a BeautifulSoup object and then extract a tag's attributes using the attrs attribute.
Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
    <html>
        <h2 class="hello">Heading 1</h2>
        <h1>Heading 2</h1>
    </html>
    ''', "lxml")
# Get the whole h2 tag
tag = soup.h2
# Get the attribute
attribute = tag.attrs
# Print the output
print(attribute)
Output:
{'class': ['hello']}
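Besides reading the whole attrs dictionary, individual attributes can be looked up on the tag itself. The following is a small sketch with invented markup (the id and the second class value are not from the examples above), again assuming "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<h2 class="hello world" id="sub">Heading</h2>', "html.parser")
tag = soup.h2

# class is a multi-valued attribute, so it comes back as a list
class_values = tag["class"]
print(class_values)

# Single-valued attributes such as id come back as plain strings
tag_id = tag["id"]
print(tag_id)

# .get() returns None for a missing attribute instead of raising KeyError
missing = tag.get("title")
print(missing)
```

Using tag.get() is usually the safer choice when scraping pages whose markup may not always include the attribute you expect.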