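Count the number of paragraph tags using BeautifulSoup
When extracting data from an HTML page, you may want to know how many paragraph tags a given HTML document contains. This article shows how to count them.
Syntax: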
print(len(soup.find_all("p")))
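The expression works because `find_all("p")` returns a list-like collection of every `<p>` tag in the parsed document, however deeply nested, so `len` gives the count. A minimal sketch, assuming a small made-up HTML string (`sample_html` is hypothetical and not part of the article):

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only
sample_html = """
<html>
  <body>
    <p>First paragraph</p>
    <div><p>Nested paragraph</p></div>
    <pre>Not a paragraph tag</pre>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all searches the whole tree, so the nested <p> is included,
# while <pre> is a different tag name and is not matched
paragraphs = soup.find_all("p")
paragraph_count = len(paragraphs)
print(paragraph_count)  # 2
```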
Approach:
Step 1: First, import the libraries BeautifulSoup and os.
from bs4 import BeautifulSoup as bs
import os
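Step 2: Next, find the directory to read from by passing the name of the Python file you are working in to os.path.abspath and removing the last path segment with os.path.dirname. Note that os.path.abspath resolves a bare file name against the current working directory.
base = os.path.dirname(os.path.abspath('#Name of Python file in which you are currently working'))
Step 3: Then, open the HTML file from which you want to read the value.
html = open(os.path.join(base, '#Name of HTML file from which you wish to read value'))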
Step 4: Next, parse the HTML file with BeautifulSoup.
soup=bs(html, 'html.parser')
Step 5: Optionally, print a label line for the result.
print("Number of paragraph tags:")
Step 6: Finally, count and print the number of paragraph tags in the HTML document.
print(len(soup.find_all("p")))
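Put together, the six steps could look like the sketch below. The function name `count_paragraph_tags`, the file name `sample.html`, and its contents are assumptions made for illustration; a temporary directory stands in for `base` so the example runs anywhere.

```python
import os
import tempfile

from bs4 import BeautifulSoup as bs


def count_paragraph_tags(html_path):
    """Return the number of <p> tags in the HTML file at html_path."""
    # The with block closes the file once parsing is done
    with open(html_path, encoding="utf-8") as html:
        soup = bs(html, "html.parser")
    return len(soup.find_all("p"))


# Demo: write a small hypothetical page into a temporary directory,
# then count its paragraph tags
with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "sample.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<html><body><p>one</p><p>two</p></body></html>")
    demo_count = count_paragraph_tags(path)

print("Number of paragraph tags:")
print(demo_count)  # 2
```

Wrapping the steps in a function keeps the file handling in one place and makes the count reusable for other files.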
Implementation:
Example 1
Let us consider a simple HTML page that has several paragraph tags.
HTML
Geeks For Geeks
King
Prince
Queen
Princess
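The page above is listed without its tags, so the following is only a sketch. It assumes that "Geeks For Geeks" is the page title and that each of the four names sits in its own `<p>` tag; the real markup may differ. The variable `assumed_html` is hypothetical.

```python
from bs4 import BeautifulSoup

# Assumed reconstruction of the sample page: the real markup may differ
assumed_html = """
<!DOCTYPE html>
<html>
  <head>
    <title>Geeks For Geeks</title>
  </head>
  <body>
    <p>King</p>
    <p>Prince</p>
    <p>Queen</p>
    <p>Princess</p>
  </body>
</html>
"""

soup = BeautifulSoup(assumed_html, "html.parser")
paragraphs = soup.find_all("p")

# Text of each paragraph tag, in document order
paragraph_texts = [p.get_text() for p in paragraphs]
print(paragraph_texts)  # ['King', 'Prince', 'Queen', 'Princess']
print(len(paragraphs))  # 4
```

Under that assumption the title is not counted, since only `<p>` tags match.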
To find the number of paragraph tags in the HTML page above, run the following code.
Python
# Python program to get the number of paragraph tags
# in a given HTML document using BeautifulSoup
# Import the BeautifulSoup library
from bs4 import BeautifulSoup as bs
# Open the HTML file and parse it in Beautiful Soup;
# the with block closes the file afterwards
with open('gfg.html') as html:
    soup = bs(html, 'html.parser')
# Print a label line
print("Number of paragraph tags:")
# Count and print the number of paragraph tags
print(len(soup.find_all("p")))
Output:
Example 2
In the program below, we find the number of paragraph tags on a particular website.
Python
# Python program to get the number of paragraph tags
# on a given website using BeautifulSoup
# Import the BeautifulSoup and requests libraries
from bs4 import BeautifulSoup as bs
import requests
# Assign URL
URL = 'https://www.geeksforgeeks.org/'
# Fetch the page content from the website URL
page = requests.get(URL)
# Parse the page content in Beautiful Soup
soup = bs(page.content, 'html.parser')
# Print a label line
print("Number of paragraph tags:")
# Count and print the number of paragraph tags
print(len(soup.find_all("p")))
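A network request can fail or hang, so a slightly more defensive variant may be useful. The sketch below is one possible arrangement, not the article's method: the function names `count_paragraphs_in_html` and `count_paragraphs_at_url` and the 10-second timeout are assumptions chosen for illustration. Counting is kept separate from fetching so it can be tried without network access.

```python
import requests
from bs4 import BeautifulSoup as bs


def count_paragraphs_in_html(markup):
    """Return the number of <p> tags in an HTML string or bytes object."""
    soup = bs(markup, "html.parser")
    return len(soup.find_all("p"))


def count_paragraphs_at_url(url, timeout=10):
    """Fetch url and return the number of <p> tags in the response."""
    # timeout (an assumed value) stops the call from waiting indefinitely
    page = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    page.raise_for_status()
    return count_paragraphs_in_html(page.content)


# Offline check of the counting step with hypothetical markup
print(count_paragraphs_in_html("<p>a</p><div><p>b</p></div>"))  # 2
```

Calling `count_paragraphs_at_url(URL)` inside a `try` block that catches `requests.RequestException` would then report connection problems instead of ending with a traceback.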
Output: