How to scrape all the text from the body tag using BeautifulSoup in Python?

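The strings generator is provided by Beautiful Soup, a web-scraping library for Python. Web scraping is the process of extracting data from websites with automated tools, which is faster than collecting it by hand. A drawback of the string attribute is that it only works for a tag with a single child; for a tag that contains several children (for example, more tags) it returns None. To get around this, the strings generator is used to retrieve all the strings inside a tag recursively.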
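
The difference between the string attribute and the strings generator can be seen in a minimal sketch. The HTML snippet and the variable names below are made up for illustration and are not part of the original examples.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: <p> has one child string, <div> has two child tags
html = "<body><p>Only text</p><div><b>First</b><i>Second</i></div></body>"
soup = BeautifulSoup(html, "html.parser")

# .string works because <p> has exactly one child, a string
single = soup.p.string

# .string is None because <div> has more than one child
multiple = soup.div.string

# .strings walks every descendant of <div> and yields each string it finds
all_strings = list(soup.div.strings)

print(single)
print(multiple)
print(all_strings)
```

Here single holds the text of the paragraph, multiple is None, and all_strings collects the text of both nested tags.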

Syntax:

tag.strings

The examples below illustrate the strings generator in Beautiful Soup.
Example 1: In this example, we get the strings from a small HTML document.

Python3

# Import Beautiful Soup
from bs4 import BeautifulSoup
 
# Create the document
doc = "<body><b> Hello world </b><h1> New heading </h1></body>"
 
# Initialize the object with the document
soup = BeautifulSoup(doc, "html.parser")
 
# Get the whole body tag
tag = soup.body
 
# Print each string recursively
for string in tag.strings:
    print(string)

Output:

Hello world 
 New heading
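
The output above keeps the surrounding spaces of each string. If that whitespace is not wanted, Beautiful Soup also offers the stripped_strings generator and the get_text() method. The sketch below is an illustration only; the newline between the two tags is an assumption added to show how whitespace-only strings behave.

```python
from bs4 import BeautifulSoup

# Same idea as Example 1, with a newline added between the tags (assumed for illustration)
doc = "<body><b> Hello world </b>\n<h1> New heading </h1></body>"
soup = BeautifulSoup(doc, "html.parser")

# .strings yields every string as-is, including the whitespace-only "\n"
raw = list(soup.body.strings)

# .stripped_strings trims each string and skips the ones that end up empty
clean = list(soup.body.stripped_strings)

# get_text() joins the strings into one value; strip=True trims each piece first
joined = soup.body.get_text(separator=" ", strip=True)

print(raw)
print(clean)
print(joined)
```

Whether to iterate over strings, iterate over stripped_strings, or call get_text() depends on whether the text is needed piece by piece or as a single string.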

Example 2: In this example, we get the strings from the body of a live web page.

Python3

import requests
from bs4 import BeautifulSoup
 
# url of the website
doc = "https://www.geeksforgeeks.org"
 
# getting response object
res = requests.get(doc)
 
# Initialize the object with the document
soup = BeautifulSoup(res.content, "html.parser")
 
# Get the whole body tag
tag = soup.body
 
# Print each string recursively
for string in tag.strings:
    print(string)

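Output: every string inside the page's body tag is printed, one per line. The exact text depends on the page's current content.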
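
When the markup comes from an arbitrary page, soup.body can be None: the html.parser parser does not add a body tag when the document lacks one, and accessing strings on None raises an AttributeError. A minimal sketch of a guard follows. The helper name body_strings is hypothetical, and using stripped_strings rather than strings is a choice made here for readability.

```python
from bs4 import BeautifulSoup


def body_strings(html):
    """Return the stripped strings inside <body>, or an empty list if there is no body tag."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body
    if body is None:
        # No body tag in the markup, so there is nothing to iterate over
        return []
    return list(body.stripped_strings)


with_body = body_strings("<html><body><h1>Title</h1><p>Some text</p></body></html>")
without_body = body_strings("<h1>Title</h1>")

print(with_body)
print(without_body)
```

The same helper could be given res.content from the requests example in place of the literal strings used here.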