How to Scrape All the Text from the body Tag Using BeautifulSoup in Python?

Last modified: 2022-05-13 01:54:34.463000    Author: Mango


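The strings generator is provided by Beautiful Soup, a Python library for parsing HTML and XML that is widely used for web scraping. Web scraping is the process of extracting data from websites with automated tools so that the job goes faster. One drawback of the .string attribute is that it only works when a tag has a single child: if the tag contains several children (for example, several nested tags), .string returns None. To get around this, use the strings generator, which recursively yields every string inside a tag.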
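The difference between .string and .strings can be seen in a minimal sketch. The markup and variable names below are illustrative assumptions, not taken from the article:

```python
from bs4 import BeautifulSoup

# Illustrative markup: a <div> with two child tags, each holding one string
html = "<div><p>Hello</p><p>World</p></div>"
soup = BeautifulSoup(html, "html.parser")

div = soup.div
first_p = soup.p  # the first <p> in the document

# .string works when a tag has exactly one child string
print(first_p.string)  # Hello

# .string gives up and returns None when the tag has several children
print(div.string)  # None

# .strings walks all descendants and yields every string, however deeply nested
print([str(s) for s in div.strings])  # ['Hello', 'World']
```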

Syntax:

tag.strings 
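Because tag.strings is a generator, it produces strings lazily and can be consumed only once; wrap it in list() if you need to reuse the result. The sketch below uses illustrative markup and also checks that joining the strings gives the same text as get_text() called with its default arguments:

```python
from bs4 import BeautifulSoup

# Illustrative markup, not from the article
soup = BeautifulSoup("<body><b>bold</b> and <i>italic</i></body>", "html.parser")
tag = soup.body

gen = tag.strings  # a generator: nothing has been traversed yet
print(next(gen))  # bold  (first string in document order)
print([str(s) for s in gen])  # [' and ', 'italic']  (the rest; gen is now exhausted)
print([str(s) for s in gen])  # []  (a second pass over the same generator yields nothing)

# Concatenating every string matches get_text() with no separator
print("".join(tag.strings) == tag.get_text())  # True
```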

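The examples below show how the strings generator works in Beautiful Soup.
Example 1: In this example, we get every string inside the body tag of a small HTML document.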

Python3
# Import Beautiful Soup
from bs4 import BeautifulSoup

# Create the document
doc = "<body><h1> Hello world </h1><p> New heading </p></body>"

# Initialize the object with the document
soup = BeautifulSoup(doc, "html.parser")

# Get the whole body tag
tag = soup.body

# Print each string recursively
for string in tag.strings:
    print(string)

Output:

Hello world 
 New heading 
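The printed strings keep their surrounding spaces, and any whitespace between tags would show up as separate strings. Beautiful Soup's stripped_strings generator trims each string and skips whitespace-only ones. A minimal sketch, assuming markup shaped like Example 1 with a line break added between the tags for illustration:

```python
from bs4 import BeautifulSoup

# Same shape of document as Example 1, plus a "\n" between the tags (illustrative)
doc = "<body><h1> Hello world </h1>\n<p> New heading </p></body>"
soup = BeautifulSoup(doc, "html.parser")

# .strings keeps the surrounding spaces and the "\n" between the tags
print([str(s) for s in soup.body.strings])
# [' Hello world ', '\n', ' New heading ']

# .stripped_strings trims each string and skips whitespace-only ones
print(list(soup.body.stripped_strings))
# ['Hello world', 'New heading']
```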

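Example 2: In this example, we download a live web page with requests and print every string inside its body tag.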

Python3

import requests
from bs4 import BeautifulSoup
 
# url of the website
doc = "https://www.geeksforgeeks.org"
 
# getting response object
res = requests.get(doc)
 
# Initialize the object with the document
soup = BeautifulSoup(res.content, "html.parser")
 
# Get the whole body tag
tag = soup.body
 
# Print each string recursively
for string in tag.strings:
    print(string)

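Printing .strings for a full page, as in Example 2, typically yields many whitespace-only strings (the line breaks and indentation between tags), and depending on the Beautiful Soup version it may also include the contents of script or style elements. A minimal sketch of a cleanup step, assuming you want only the visible body text, one string per line; body_text is a hypothetical helper name, not part of Beautiful Soup, and the sample page is illustrative:

```python
from bs4 import BeautifulSoup


def body_text(html: str) -> str:
    """Hypothetical helper: return the text of <body>, one string per line."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.body
    if body is None:  # html.parser does not add a <body> to fragments that lack one
        return ""
    # Remove <script> and <style> elements so their code is not treated as text
    for element in body(["script", "style"]):
        element.decompose()
    # stripped_strings trims whitespace and skips blank strings
    return "\n".join(body.stripped_strings)


page = """
<html><body>
  <h1>Title</h1>
  <script>var x = 1;</script>
  <p>First paragraph</p>
</body></html>
"""
print(body_text(page))
# Title
# First paragraph
```

With the page from Example 2, you could pass the downloaded HTML as body_text(res.text).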