Create a GUI to Web Scrape Articles in Python
Prerequisite: GUI applications using Tkinter
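In this article, we will write a script that extracts information from the article at a given URL, such as its title, meta information, and article description.
We will use the Goose module, which is installed from PyPI as goose3.
The Goose module helps extract the following information:
- The main text of the article.
- The main image of the article.
- Any YouTube/Vimeo movies embedded in the article.
- The meta description.
- The meta tags.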
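The fields listed above can be gathered into one dictionary, roughly as sketched here. `describe_article` is a hypothetical helper, not part of goose3, and the attribute names `top_image`, `movies`, and `meta_keywords` are assumed to match the article object of your installed goose3 version; the `getattr` defaults keep the sketch from failing if one is missing. A stand-in object is used so the example runs without network access.

```python
from types import SimpleNamespace


def describe_article(article):
    """Collect the fields listed above from an article-like object.

    Hypothetical helper: it only assumes goose3-style attribute names
    and falls back to empty values when an attribute is missing.
    """
    top_image = getattr(article, "top_image", None)
    movies = getattr(article, "movies", None) or []
    return {
        "text": getattr(article, "cleaned_text", "") or "",
        "image": getattr(top_image, "src", None),
        "movies": [getattr(movie, "src", None) for movie in movies],
        "meta_description": getattr(article, "meta_description", "") or "",
        "meta_keywords": getattr(article, "meta_keywords", "") or "",
    }


# Stand-in for the object returned by Goose().extract(url) (made-up values).
sample = SimpleNamespace(
    cleaned_text="Python is a high-level language.",
    top_image=SimpleNamespace(src="https://example.com/cover.png"),
    movies=[],
    meta_description="Intro to Python",
    meta_keywords="python, tutorial",
)
summary = describe_article(sample)
print(summary["image"])
```

With goose3 installed, the same helper could be called on the result of Goose().extract(url) instead of the stand-in object.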
First, install the required module using the following command.
pip install goose3
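To confirm that the installation worked before running the rest of the script, a small check like the following may help. `is_installed` is a hypothetical helper name; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Print a hint instead of failing later with an ImportError.
if not is_installed("goose3"):
    print("goose3 is missing; run: pip install goose3")
```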
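Approach
- Import the module.
- Create an article object with the Goose().extract(URL) function.
- Get the title with the obj.title attribute.
- Get the meta description with the obj.meta_description attribute.
- Get the text with the obj.cleaned_text attribute.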
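The steps above can be wrapped in a single function, sketched below. `fetch_fields` and its `extract` parameter are hypothetical names introduced for illustration: `extract` lets you pass in any callable that returns an article-like object, which makes the function usable without network access. When it is omitted, goose3 is imported lazily and used as a context manager, on the assumption that your goose3 version supports `with Goose() as goose:`.

```python
from types import SimpleNamespace


def fetch_fields(url, extract=None):
    """Return (title, meta description, text) for the article at `url`.

    `extract` is a hypothetical hook: any callable that takes a URL and
    returns an object with title, meta_description and cleaned_text.
    """
    if extract is None:
        from goose3 import Goose  # requires: pip install goose3

        with Goose() as goose:  # assumed to release goose3 resources on exit
            article = goose.extract(url=url)
            return article.title, article.meta_description, article.cleaned_text
    article = extract(url)
    return article.title, article.meta_description, article.cleaned_text


def fake_extract(url):
    """Stand-in extractor with made-up values, used instead of a real download."""
    return SimpleNamespace(title="T", meta_description="M", cleaned_text="Body")


fields = fetch_fields("https://example.com/post", extract=fake_extract)
print(fields)
```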
Implementation
Step 1: Initialize the requirements.
Python3
# import module
from goose3 import Goose
# URL of the article to scrape
url = "https://www.geeksforgeeks.org/python-programming-language/?ref=leftbar"
# download the page and extract the article
article = Goose().extract(url)
Step 2: Extract the title.
Python3
print("Title of the article :\n", article.title)
Step 3: Extract the meta information.
Python3
print("Meta information :\n", article.meta_description)
Step 4: Extract the article text. Only the first 300 characters are printed.
Python3
print("Article Text :\n", article.cleaned_text[:300])
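Slicing at a fixed character count can cut a word in half. As an alternative sketch, the standard library's `textwrap.shorten` collapses whitespace and truncates at a word boundary; `preview` is a hypothetical helper name, and the 300-character width simply mirrors the slice used in this step.

```python
import textwrap


def preview(text, width=300):
    """Collapse whitespace and cut `text` to at most `width` characters."""
    return textwrap.shorten(text, width=width, placeholder=" ...")


# Made-up text standing in for article.cleaned_text.
sample_text = "First paragraph of the article.\n\nSecond paragraph follows here."
print(preview(sample_text, width=40))
```

With a real article, `preview(article.cleaned_text)` could replace `article.cleaned_text[:300]`.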
Step 5: Visualize with Tkinter.
Python3
# import modules
from tkinter import Button, Entry, Label, StringVar, Tk, W
from goose3 import Goose


# fetch the article and fill in the labels
def info():
    article = Goose().extract(e1.get())
    title.set(article.title)
    meta.set(article.meta_description)
    # show the first 150 characters, with line breaks
    # collapsed into single spaces
    string = article.cleaned_text[:150]
    art_dec.set(" ".join(string.split()))


# main window with a light grey background
master = Tk()
master.configure(bg='light grey')

# variable classes that hold the text shown in the labels
title = StringVar()
meta = StringVar()
art_dec = StringVar()

# captions for each piece of information
Label(master, text="Website URL : ",
      bg="light grey").grid(row=0, sticky=W)
Label(master, text="Title :",
      bg="light grey").grid(row=3, sticky=W)
Label(master, text="Meta information :",
      bg="light grey").grid(row=4, sticky=W)
Label(master, text="Article description :",
      bg="light grey").grid(row=5, sticky=W)

# labels bound to the variables above,
# updated whenever info() runs
Label(master, text="", textvariable=title,
      bg="light grey").grid(row=3, column=1, sticky=W)
Label(master, text="", textvariable=meta,
      bg="light grey").grid(row=4, column=1, sticky=W)
Label(master, text="", textvariable=art_dec,
      bg="light grey").grid(row=5, column=1, sticky=W)

# entry box for the URL
e1 = Entry(master, width=100)
e1.grid(row=0, column=1)

# button that calls info() when clicked
b = Button(master, text="Show", command=info, bg="Blue")
b.grid(row=0, column=2, columnspan=2, rowspan=2, padx=5, pady=5)

master.mainloop()
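As written, the button callback raises an exception if the entry box is empty or the page cannot be downloaded. One possible way to guard it is sketched below. `is_valid_url` and `safe_extract` are hypothetical helpers, not part of goose3 or Tkinter, and the error messages are placeholders.

```python
from urllib.parse import urlparse


def is_valid_url(text):
    """Return True when `text` looks like an absolute http(s) URL."""
    parts = urlparse(text.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def safe_extract(url, extract):
    """Run `extract(url)` and return (article, error_message).

    Exactly one of the two returned values is None.
    """
    if not is_valid_url(url):
        return None, "Please enter a full http(s) URL."
    try:
        return extract(url), None
    except Exception as exc:  # broad on purpose: network and parse errors vary
        return None, f"Could not fetch article: {exc}"
```

Inside info(), this could be called as `article, error = safe_extract(e1.get(), Goose().extract)`, setting the title label to `error` when it is not None.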