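Extracting the Title from a Web Page Using Python

Prerequisites: Implementing Web Scraping in Python, the Python urllib module, tools for web scraping in Python

In this article, we will write Python scripts that extract the title of a web page from a given URL.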

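Method 1: bs4. Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the following command in a terminal.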

pip install bs4

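The requests module lets you send HTTP/1.1 requests with very little code. It is also not part of the Python standard library. To install it, run the following command in a terminal.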

pip install requests

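Approach:

Import the modules
Send a request with requests.get() and pass in the URL
Pass the response text to the BeautifulSoup() function
Use find_all('title') to find every 'title' tag

Code:

Python3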

# importing the modules
import requests
from bs4 import BeautifulSoup
  
# target url
url = 'https://www.geeksforgeeks.org/'
  
# sending a GET request to the URL
reqs = requests.get(url)
  
# parsing the response HTML with BeautifulSoup
soup = BeautifulSoup(reqs.text, 'html.parser')
  
# displaying the title
print("Title of the website is : ")
for title in soup.find_all('title'):
    print(title.get_text())

Output:

Title of the website is : 
GeeksforGeeks | A computer science portal for geeks
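
A page may have no title tag at all. In that case soup.title is None, and calling get_text() on it raises an AttributeError. The sketch below shows one way to guard against this. It parses inline HTML strings so that it runs without a network connection. The helper name extract_title and the sample documents are illustrative assumptions, not part of the script above.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the first <title> tag, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> tag
    if soup.title is None:
        return None
    # strip=True trims surrounding whitespace and newlines
    return soup.title.get_text(strip=True)


# hypothetical sample documents, used instead of a live request
with_title = "<html><head><title> Example Page </title></head><body></body></html>"
without_title = "<html><body><p>No title here</p></body></html>"

print(extract_title(with_title))
print(extract_title(without_title))
```

The same helper could be given reqs.text from the script above in place of the sample strings.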

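Method 2: In this method, we will use the urllib and Beautiful Soup modules to extract the title of the website. urllib is a package that lets you access web pages from a program.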

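Installation:

urllib is part of the Python standard library, so it does not need to be installed. Only Beautiful Soup is required:

pip install bs4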

Approach:

Import the modules
Read the URL with request.urlopen(URL)
Find the title in the HTML document with soup.title

Implementation:

Python3

# importing the modules
from urllib.request import urlopen
from bs4 import BeautifulSoup
  
# target url
url = 'https://www.geeksforgeeks.org/'
  
# parsing the page with BeautifulSoup; naming the parser avoids a warning
soup = BeautifulSoup(urlopen(url), 'html.parser')
  
# displaying the title
print("Title of the website is : ")
print(soup.title.get_text())

Output:

Title of the website is : 
GeeksforGeeks | A computer science portal for geeks
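
If installing Beautiful Soup is not an option, the same idea can be sketched with the standard library alone. Note that this swaps BeautifulSoup for html.parser.HTMLParser, so it is a variant of the method above rather than the method itself. The names TitleParser, title_from_html and fetch_title, the User-Agent string, and the 10-second timeout are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # only the first <title> counts; later ones are ignored
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def title_from_html(html):
    """Return the title text of an HTML string, or None if it has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def fetch_title(url, timeout=10):
    """Download a page with urllib and return its title (needs network access)."""
    # assumed header value; some servers reject requests without a User-Agent
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        # fall back to UTF-8 when the server does not declare a charset
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return title_from_html(html)


# offline demonstration on a hypothetical document
sample = "<html><head><title> Example Page </title></head><body><h1>x</h1></body></html>"
print(title_from_html(sample))
```

Calling fetch_title('https://www.geeksforgeeks.org/') would apply the same parsing to the live page. It is left out of the demonstration so that the sketch runs without a network connection.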

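Method 3: In this method, we will use the mechanize module. It provides stateful programmatic web browsing in Python: you can browse pages from a program, with easy HTML form filling and link clicking.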

Installation:

pip install mechanize

Approach:

Import the module
Initialize a Browser() instance
Retrieve the web page content with Browser.open()
Display the title with Browser.title()

Implementation:

Python3

# importing the module
from mechanize import Browser
  
# target url
url = 'https://www.geeksforgeeks.org/'
  
# creating a Browser instance
br = Browser()
br.open(url)
  
# displaying the title
print("Title of the website is : ")
print(br.title())

Output:

Title of the website is : 
GeeksforGeeks | A computer science portal for geeks