Navigation with BeautifulSoup

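BeautifulSoup is a Python package for parsing HTML and XML documents. It builds a parse tree for the parsed page, which makes it useful for web scraping: you can extract data from HTML and XML files, and it works with your preferred parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.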

Installation

This module is not built into Python. To install it, type the following command in the terminal.

pip install bs4

Navigating with BeautifulSoup

The snippet below is the HTML document we will use as a reference for navigating with BeautifulSoup.

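Python3

from bs4 import BeautifulSoup

# Minimal tag structure, inferred from the outputs shown later;
# attributes such as href, class, or id are omitted.
ht_doc = """
<html><head><title>Geeks For Geeks</title></head>
<body>
<p><b>most viewed courses in GFG,its all free</b></p>
<p>Top 5 Popular Programming Languages</p>
<a>Java</a>
<a>c/c++</a>
<a>Python</a>
<a>Javascript</a>
<a>Ruby</a>
<p>according to an online survey. </p>
<p> Programming Languages</p>
</body></html>"""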

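Now let us navigate in every possible way by applying BeautifulSoup to the snippet above. The most important component of an HTML document is the tag, which may contain other tags or strings (the tag's children). BeautifulSoup provides several ways to iterate over these children; let us look at each case.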

Navigating Down

Navigating using tag names:

Example 1: Getting the head tag.

Use .head on the BeautifulSoup object to get the head tag of the HTML document.

Syntax : (BeautifulSoup Variable).head

Example 2: Getting the title tag.

Use .title to retrieve the title tag of the HTML document held in the BeautifulSoup variable.

Syntax : (BeautifulSoup Variable).title

Code:

Python3

from bs4 import BeautifulSoup

soup = BeautifulSoup(ht_doc, 'html.parser')
print(soup.head)
print(soup.title)

Output:

<head><title>Geeks For Geeks</title></head>
<title>Geeks For Geeks</title>
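
One thing worth keeping in mind when navigating by tag name: if no tag with that name exists, the attribute evaluates to None instead of raising an error, so chaining further attributes onto it fails. The following is a minimal sketch of that behaviour; small_doc is a made-up document used only for illustration, not the article's ht_doc.

```python
from bs4 import BeautifulSoup

# small_doc is a hypothetical document used only for this sketch
small_doc = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"
soup = BeautifulSoup(small_doc, "html.parser")

# a tag that exists is returned as a Tag object
title_tag = soup.title
print(title_tag)

# a tag that does not occur in the document yields None
missing_tag = soup.table
print(missing_tag)

# guard before chaining: body exists here, but it contains no <b>
first_bold = soup.body.b if soup.body is not None else None
print(first_bold)
```

Checking for None before going one level deeper avoids an AttributeError on documents that lack the expected tag.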

Example 3: Getting specific tags.

We can retrieve specific tags, for example the first b tag inside the body tag.

Syntax : (BeautifulSoup Variable).body.b

Using a tag name as an attribute gives you only the first tag with that name.

Syntax: (BeautifulSoup Variable).(tag attribute)

With find_all, we can get every tag with that name.

Syntax: (BeautifulSoup Variable).find_all(tag value)

Code:

Python3

soup = BeautifulSoup(ht_doc, 'html.parser')

# retrieving the first b tag inside body
print(soup.body.b)

# retrieving the first a tag in the document
print(soup.a)

# retrieving all a tags in ht_doc
print(soup.find_all("a"))

Output:

<b>most viewed courses in GFG,its all free</b>
<a>Java</a>
[<a>Java</a>, <a>c/c++</a>, <a>Python</a>, <a>Javascript</a>, <a>Ruby</a>]
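
The difference between attribute access and find_all can be made explicit with a short sketch. The three-link document links_doc is an assumption made up for this illustration.

```python
from bs4 import BeautifulSoup

# links_doc is a hypothetical three-link document for illustration
links_doc = "<body><a>one</a><a>two</a><a>three</a></body>"
soup = BeautifulSoup(links_doc, "html.parser")

first_link = soup.a               # only the first <a>
all_links = soup.find_all("a")    # every <a>, in document order

print(first_link.string)
print([link.string for link in all_links])

# attribute access returns the same node that find_all lists first
print(first_link is all_links[0])
```

Attribute access is convenient when only the first match matters; find_all is the safer choice whenever a tag can occur more than once.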

Example 4: .contents and .children

We can get a tag's children as a list using .contents.

Syntax: (BeautifulSoup Variable).contents

Code:

Python3

soup = BeautifulSoup(ht_doc, 'html.parser')

# assigning the head tag to a variable
hTag = soup.head
print(hTag)

# retrieving the children of the head tag as a list
print(hTag.contents)

Output:

<head><title>Geeks For Geeks</title></head>
[<title>Geeks For Geeks</title>]
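
The heading also mentions .children, which the example above does not exercise. As a rough sketch, .children iterates over the same direct children that .contents returns as a list. The list markup in doc is hypothetical and chosen only to give the tag several children.

```python
from bs4 import BeautifulSoup

# doc is a hypothetical document for illustration
doc = "<ul><li>a</li><li>b</li><li>c</li></ul>"
soup = BeautifulSoup(doc, "html.parser")
ul_tag = soup.ul

# .contents is a list, so it supports len() and indexing
child_count = len(ul_tag.contents)
first_child = ul_tag.contents[0]
print(child_count)
print(first_child)

# .children iterates over the same direct children
child_texts = [child.string for child in ul_tag.children]
print(child_texts)
```

Use .contents when you need indexing or a length, and .children when a simple loop is enough.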
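
Example 5: .descendants

The .descendants attribute lets you iterate over all of a tag's children recursively: its direct children, the children of those children, and so on.
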
Syntax: (Variable assigned from BeautifulSoup Variable).descendants

Code:

Python3

# parsing the HTML document into a BeautifulSoup object
soup = BeautifulSoup(ht_doc, 'html.parser')

# assigning the head element to a variable
htag = soup.head

# iterating through every descendant of the head tag
for child in htag.descendants:
    print(child)

Output:

<title>Geeks For Geeks</title>
Geeks For Geeks
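
To see how .descendants differs from the direct children, the sketch below counts both on a small made-up document (doc is an assumption for illustration). The html tag has one direct child but three descendants: head, title, and the title's string.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# doc is a hypothetical document for illustration
doc = "<html><head><title>Demo</title></head></html>"
soup = BeautifulSoup(doc, "html.parser")
html_tag = soup.html

direct = list(html_tag.children)      # just the <head> tag
nested = list(html_tag.descendants)   # <head>, <title>, and the string 'Demo'

print(len(direct))
print(len(nested))

# separate the Tag nodes from the string nodes among the descendants
tag_names = [node.name for node in nested if isinstance(node, Tag)]
text_nodes = [str(node) for node in nested if not isinstance(node, Tag)]
print(tag_names)
print(text_nodes)
```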
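
Example 6: .string

If a tag has only one child and that child is a NavigableString, the child is available as .string. If a tag's only child is another tag, and that tag has a .string, the parent is considered to have the same .string.

However, if a tag contains more than one thing, it is unclear what .string should refer to, so .string is defined to be None. In the code below, head has a single child (title) whose only child is a string, so head.string returns the title text.

Python3

soup = BeautifulSoup(ht_doc, 'html.parser')
htag = soup.head
print(htag.string)

Output:

Geeks For Geeks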
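
The None case is easier to see on a tag with mixed content. This is a minimal sketch on a hypothetical document (doc is made up for illustration): the first paragraph holds a single string, the second holds a string, a tag, and another string.

```python
from bs4 import BeautifulSoup

# doc is a hypothetical document for illustration
doc = "<div><p>only text</p><p>text <b>and</b> a tag</p></div>"
soup = BeautifulSoup(doc, "html.parser")
first_p, second_p = soup.find_all("p")

# a single NavigableString child, so .string returns it
print(first_p.string)

# three children (string, <b>, string), so .string is None
print(second_p.string)

# the div has two <p> children, so .string is None as well
print(soup.div.string)
```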

Example 7: .strings and stripped_strings

If a tag contains more than one thing, you can still look at just the strings by using the .strings generator.

Python3

soup = BeautifulSoup(ht_doc, 'html.parser')
for string in soup.strings:
    print(repr(string))

Output:

'\n'
'Geeks For Geeks'
'\n'
'\n'
'most viewed courses in GFG,its all free'
'\n'
'Top 5 Popular Programming Languages'
'\n'
'Java'
'\n'
'c/c++'
'\n'
'Python'
'\n'
'Javascript'
'\n'
'Ruby'
'\n'
'according to an online survey. '
'\n'
' Programming Languages'
'\n'

To remove the extra whitespace, we use the .stripped_strings generator:

Python3

# parsing the HTML document into a BeautifulSoup object
soup = BeautifulSoup(ht_doc, 'html.parser')

# iterating through the whitespace-stripped strings of the document
for string in soup.stripped_strings:
    print(repr(string))

Output:

'Geeks For Geeks'
'most viewed courses in GFG,its all free'
'Top 5 Popular Programming Languages'
'Java'
'c/c++'
'Python'
'Javascript'
'Ruby'
'according to an online survey.'
'Programming Languages'
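
A common use of .stripped_strings is to collect the visible text of a page into a single clean line. The sketch below assumes a small made-up document (doc) with indentation and padding around the text. The " | " separator is an arbitrary choice for illustration.

```python
from bs4 import BeautifulSoup

# doc is a hypothetical, deliberately padded document for illustration
doc = "<div>\n  <h1> Title </h1>\n  <p> Body text </p>\n</div>"
soup = BeautifulSoup(doc, "html.parser")

raw = list(soup.strings)             # includes whitespace-only strings
clean = list(soup.stripped_strings)  # whitespace-only strings are skipped

print(clean)

# join the cleaned strings into one line of text
joined = " | ".join(clean)
print(joined)
```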
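
Navigating Up with BeautifulSoup:

Continuing the "family tree" analogy, every tag and every string has a parent: the tag that contains it.

Example 1: .parent

.parent is an attribute used to retrieve the parent of an element.
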
Syntax : (BeautifulSoup Variable).parent

Code:

Python3

from bs4 import BeautifulSoup

soup = BeautifulSoup(ht_doc, 'html.parser')

# assigning the title tag to a variable
Itag = soup.title

# printing the parent element of the title tag
print(Itag.parent)

# the parent of the top-level html tag is the BeautifulSoup object itself
htmlTag = soup.html
print(type(htmlTag.parent))

# the BeautifulSoup object has no parent
print(soup.parent)

Output:

<head><title>Geeks For Geeks</title></head>
<class 'bs4.BeautifulSoup'>
None
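
Strings have parents too, not only tags. The sketch below walks upward from the title's string on a small made-up document (doc is an assumption for illustration).

```python
from bs4 import BeautifulSoup

# doc is a hypothetical document for illustration
doc = "<html><head><title>Demo</title></head></html>"
soup = BeautifulSoup(doc, "html.parser")

title_text = soup.title.string

# the parent of a string is the tag that contains it
print(title_text.parent.name)

# the parent of the title tag is head
print(soup.title.parent.name)

# the parent of the top-level tag is the BeautifulSoup object
print(soup.html.parent is soup)

# the BeautifulSoup object itself has no parent
print(soup.parent)
```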

Example 2: .parents

To iterate over all of an element's parents, use the .parents attribute:

Syntax :(BeautifulSoup Variable).parents

Python3

# parsing the HTML document into a BeautifulSoup object
soup = BeautifulSoup(ht_doc, 'html.parser')

# assigning the first a tag to the link variable
link = soup.a
print(link)

# iterating through the parents of link
for parent in link.parents:

    # defensive check in case a parent is None
    if parent is None:
        print(parent)
    else:
        print(parent.name)

Output:

<a>Java</a>
body
html
[document]
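
One practical use of .parents is to build the path from the document root down to an element. This is a rough sketch on a hypothetical nested document (doc is made up for illustration). The " > " separator is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# doc is a hypothetical nested document for illustration
doc = "<html><body><div><p><a>link</a></p></div></body></html>"
soup = BeautifulSoup(doc, "html.parser")

# .parents walks from the nearest parent up to the BeautifulSoup object
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)

# reverse the ancestors to read the path from the root down to the tag
path = " > ".join(reversed(ancestor_names)) + " > a"
print(path)
```

The last ancestor is always the BeautifulSoup object, whose name is "[document]".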

Navigating Sideways with BeautifulSoup

.next_sibling and .previous_sibling are attributes used to navigate between page elements that are on the same level of the parse tree.

Syntax:

(BeautifulSoup Variable).(tag attribute).next_sibling
(BeautifulSoup Variable).(tag attribute).previous_sibling

Code:

Python3

from bs4 import BeautifulSoup

# the a, b and c tags are inferred from the .b and .c lookups below
sibling_soup = BeautifulSoup(
    "<a><b>Geeks For Geeks</b>"
    "<c>The Biggest Online Tutorials Library, It's all Free</c></a>",
    'html.parser')

# to retrieve next sibling of b tag
print(sibling_soup.b.next_sibling)

# for retrieving previous sibling of c tag
print(sibling_soup.c.previous_sibling)

Output:

<c>The Biggest Online Tutorials Library, It's all Free</c>
<b>Geeks For Geeks</b>
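
In real documents, tags are usually separated by line breaks, so .next_sibling often returns a whitespace string rather than the next tag. The sketch below shows this on a made-up two-link document (doc is an assumption for illustration) and uses find_next_sibling to skip ahead to the next tag with a given name.

```python
from bs4 import BeautifulSoup

# doc is a hypothetical document with a newline between the two links
doc = "<p><a>first</a>\n<a>second</a></p>"
soup = BeautifulSoup(doc, "html.parser")
first_link = soup.a

# the immediate sibling is the newline string, not the second <a>
print(repr(first_link.next_sibling))

# find_next_sibling skips ahead to the next sibling tag named "a"
next_link = first_link.find_next_sibling("a")
print(next_link)

# the first child of <p> has no previous sibling
print(first_link.previous_sibling)
```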