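Python program to extract strings between HTML tags
Given a string and an HTML tag, extract all the strings enclosed between the opening and closing forms of that tag.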
Input : '<b>Gfg</b> is <b>Best</b>. I love <b>Reading CS</b> from it.' , tag = "b"
Output : ['Gfg', 'Best', 'Reading CS']
Explanation : All strings between "b" tags are extracted.
Input : '<h1>Gfg</h1> is <h1>Best</h1> I love <h1>Reading CS</h1>' , tag = "h1"
Output : ['Gfg', 'Best', 'Reading CS']
Explanation : All strings between "h1" tags are extracted.
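This task can be done with the re module. Here, findall() extracts all the strings by matching a regular expression built from the tag name and the surrounding symbols: the opening tag, a lazy capture group (.*?) for the content, and the closing tag.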
Python3
# importing re module
import re
# initializing string
test_str = '<b>Gfg</b> is <b>Best</b>. I love <b>Reading CS</b> from it.'
# printing original string
print("The original string is : " + str(test_str))
# initializing tag
tag = "b"
# regex to extract required strings: opening tag, lazy capture, closing tag
reg_str = "<" + tag + ">(.*?)</" + tag + ">"
res = re.findall(reg_str, test_str)
# printing result
print("The Strings extracted : " + str(res))
Output:
The original string is : <b>Gfg</b> is <b>Best</b>. I love <b>Reading CS</b> from it.
The Strings extracted : ['Gfg', 'Best', 'Reading CS']
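The same findall() idea can be wrapped in a reusable function. The sketch below is an illustration, not part of the original program: the function name extract_between_tags and the sample string are hypothetical. It assumes the input may contain attributes inside the opening tag, mixed-case tag names, or content that spans several lines. It does not handle nested tags of the same name; for arbitrary HTML a real parser is usually the safer choice.

```python
import re


def extract_between_tags(text, tag):
    """Return the text found between each <tag ...> and </tag> pair.

    Hypothetical helper for illustration; nested same-name tags are not handled.
    """
    # re.escape guards against tag names containing regex metacharacters.
    name = re.escape(tag)
    # (?:\s[^>]*)? optionally allows attributes inside the opening tag;
    # (.*?) lazily captures the content up to the nearest closing tag.
    pattern = r"<" + name + r"(?:\s[^>]*)?>(.*?)</" + name + r">"
    # DOTALL lets "." match newlines; IGNORECASE accepts <B> as well as <b>.
    return re.findall(pattern, text, flags=re.DOTALL | re.IGNORECASE)


# Assumed sample input with an attribute, an uppercase tag, and a line break.
sample = '<b class="x">Gfg</b> is <B>Best</B>. I love <b>Reading\nCS</b> from it.'
print(extract_between_tags(sample, "b"))
```

Because the opening-tag part of the pattern requires either ">" or whitespace right after the tag name, searching for "b" does not accidentally match a different tag such as "br".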