How to scrape all the text from the body tag using Beautiful Soup in Python?
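The .strings generator is provided by Beautiful Soup, a Python library for parsing HTML and XML that is widely used for web scraping. Web scraping is the process of extracting data from websites with automated tools, which is much faster than collecting it by hand. One drawback of the .string attribute is that it only works on a tag that contains a single string; for a tag that contains several child tags it returns None. To work around this, the .strings generator recursively yields every string found inside a tag.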
Syntax:
tag.strings
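The difference between .string and .strings described above can be seen in a minimal sketch. The sample markup and the variable names here are assumptions chosen only for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
html = "<body><p>First</p><p>Second</p></body>"
soup = BeautifulSoup(html, "html.parser")

# .string is None here because <body> has more than one child
single = soup.body.string

# .strings walks the whole subtree and yields each text node
all_strings = list(soup.body.strings)

print(single)
print(all_strings)
```

Because .strings is a generator, wrapping it in list() is a convenient way to inspect everything it yields at once.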
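The examples below illustrate the .strings generator in Beautiful Soup.
Example 1: In this example, we get the strings from an HTML document defined in the code.
Python3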
# Import Beautiful Soup
from bs4 import BeautifulSoup
# Create the document
doc = "<body><b> Hello world </b><h1> New heading </h1></body>"
# Initialize the object with the document
soup = BeautifulSoup(doc, "html.parser")
# Get the whole body tag
tag = soup.body
# Print each string recursively
for string in tag.strings:
    print(string)
Output:
Hello world
New heading
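Note that .strings yields text exactly as it appears in the markup, including surrounding spaces and whitespace-only strings between tags. As a hedged sketch, Beautiful Soup's .stripped_strings generator can be used when only the trimmed, non-empty strings are wanted. The markup below is an assumed variation of Example 1 with line breaks added between the tags:

```python
from bs4 import BeautifulSoup

# Assumed sample markup: Example 1 with newlines between the tags
html = "<body>\n<b> Hello world </b>\n<h1> New heading </h1>\n</body>"
soup = BeautifulSoup(html, "html.parser")

# .strings keeps every text node, including whitespace-only ones
raw = list(soup.body.strings)

# .stripped_strings trims each string and skips the empty ones
clean = list(soup.body.stripped_strings)

print(raw)
print(clean)
```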
Example 2: In this example, we fetch a live web page and print every string inside its body tag.
Python3
import requests
from bs4 import BeautifulSoup
# URL of the website
url = "https://www.geeksforgeeks.org"
# Get the response object
res = requests.get(url)
# Initialize the object with the downloaded page
soup = BeautifulSoup(res.content, "html.parser")
# Get the whole body tag
tag = soup.body
# Print each string recursively
for string in tag.strings:
    print(string)
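Output: every text string inside the page's body tag, one per line. The exact text depends on the site's content at the time of the request.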
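If the goal is a single block of text rather than a stream of separate strings, one possible alternative is the get_text() method with its separator and strip arguments. This is a sketch on assumed inline markup rather than a live page, so that it runs without network access:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page
html = "<body><h1>Title</h1><p>Some <b>bold</b> text.</p></body>"
soup = BeautifulSoup(html, "html.parser")

# Join every string in <body> with a space, trimming each piece first
body_text = soup.body.get_text(separator=" ", strip=True)

print(body_text)
```

Iterating over .strings is the better fit when each string needs to be processed individually; get_text() is convenient when only the combined text matters.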