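Creating Treemaps in Python with Squarify
Data visualization is a powerful technique for analyzing large datasets through graphical representation. Python offers several modules for this; the most widely used are Matplotlib, Seaborn, and Plotly. There is also a module named Squarify, which is used mainly for plotting treemaps.
When to use Squarify?
The question here is when to use Squarify rather than why, because Python already has two or three data visualization modules that handle most tasks. Squarify is the best fit when you need to plot a treemap. A treemap displays hierarchical data as a set of nested squares and rectangles.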
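Squarify is a good choice for:
- Plotting a large number of data points.
- Cases where a bar chart cannot display a large dataset effectively; a treemap can, and this is where Squarify comes in.
- Showing the proportion between each part and the whole, with a label on each part.
- Showing how a measure is distributed across each category level in a hierarchy.
- Showing attributes through size and color coding.
- Spotting patterns, outliers, the most important contributors, and anomalies.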
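Plotting a treemap with Squarify
A treemap is an appropriate visualization when the dataset is organized hierarchically, in a tree layout with a root, branches, and nodes. It lets us show information about a large amount of data very efficiently in a limited space.
We will now plot a treemap with Squarify. Install the module with pip: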
pip install squarify
Import the necessary modules.
Python3
import squarify
import matplotlib.pyplot as plt
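Plot
plot is the method used to create a treemap with Squarify. It takes sizes as its first argument and supports a number of other options, which we will go through one by one. By default, the plot method draws on a 100×100 square.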
Python3
# draw five tiles sized 1 to 5, all filled with the same color
squarify.plot(sizes=[1, 2, 3, 4, 5], color="yellow")
plt.show()
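Color
To make the plot more attractive, we will change its colors. There are two ways to do so:
- A list of colors
- A color palette
Method 1: Pass a list of color names. Its length may or may not match the length of the data. If the color list is shorter than the data, the colors are repeated from the start of the list.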
Python3
data = [300, 400, 120, 590, 600, 760]
# one named color per tile, in the same order as data
colors = ["red", "black", "green", "violet", "yellow", "blue"]
squarify.plot(sizes=data, color=colors)
plt.axis("off")  # hide the axis ticks and frame
plt.show()
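The repetition described in Method 1 can also be made explicit before plotting. The following is a minimal sketch in plain Python; `repeat_colors` is a hypothetical helper introduced here for illustration, not part of Squarify or Matplotlib.

```python
from itertools import cycle, islice


def repeat_colors(colors, n_tiles):
    """Return exactly n_tiles colors, cycling through `colors` as needed."""
    if not colors:
        raise ValueError("colors must not be empty")
    return list(islice(cycle(colors), n_tiles))


data = [300, 400, 120, 590, 600, 760]
# three colors for six tiles: the list wraps around once
tile_colors = repeat_colors(["red", "black", "green"], len(data))
print(tile_colors)
```

The resulting list has one entry per tile and can be passed as color=tile_colors, which makes the color assignment visible in the code instead of relying on implicit repetition.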
Method 2: Import the Seaborn module and use its color_palette function.
Syntax: seaborn.color_palette(palette, n_colors)
# n_colors should be an integer
# palette can be any name from this list:
“””
‘Accent’, ‘Accent_r’, ‘Blues’, ‘Blues_r’, ‘BrBG’, ‘BrBG_r’, ‘BuGn’, ‘BuGn_r’, ‘BuPu’, ‘BuPu_r’, ‘CMRmap’, ‘CMRmap_r’, ‘Dark2’, ‘Dark2_r’, ‘GnBu’, ‘GnBu_r’, ‘Greens’, ‘Greens_r’, ‘Greys’, ‘Greys_r’, ‘OrRd’, ‘OrRd_r’, ‘Oranges’, ‘Oranges_r’, ‘PRGn’, ‘PRGn_r’, ‘Paired’, ‘Paired_r’, ‘Pastel1’, ‘Pastel1_r’, ‘Pastel2’, ‘Pastel2_r’, ‘PiYG’, ‘PiYG_r’, ‘PuBu’, ‘PuBuGn’, ‘PuBuGn_r’, ‘PuBu_r’, ‘PuOr’, ‘PuOr_r’, ‘PuRd’, ‘PuRd_r’, ‘Purples’, ‘Purples_r’, ‘RdBu’, ‘RdBu_r’, ‘RdGy’, ‘RdGy_r’, ‘RdPu’, ‘RdPu_r’, ‘RdYlBu’, ‘RdYlBu_r’, ‘RdYlGn’, ‘RdYlGn_r’, ‘Reds’, ‘Reds_r’, ‘Set1’, ‘Set1_r’, ‘Set2’, ‘Set2_r’, ‘Set3’, ‘Set3_r’, ‘Spectral’, ‘Spectral_r’, ‘Wistia’, ‘Wistia_r’, ‘YlGn’, ‘YlGnBu’, ‘YlGnBu_r’, ‘YlGn_r’, ‘YlOrBr’, ‘YlOrBr_r’, ‘YlOrRd’, ‘YlOrRd_r’, ‘afmhot’, ‘afmhot_r’, ‘autumn’, ‘autumn_r’, ‘binary’, ‘binary_r’, ‘bone’, ‘bone_r’, ‘brg’, ‘brg_r’, ‘bwr’, ‘bwr_r’, ‘cividis’, ‘cividis_r’, ‘cool’, ‘cool_r’, ‘coolwarm’, ‘coolwarm_r’, ‘copper’, ‘copper_r’, ‘crest’, ‘crest_r’, ‘cubehelix’, ‘cubehelix_r’, ‘flag’, ‘flag_r’, ‘flare’, ‘flare_r’, ‘gist_earth’, ‘gist_earth_r’, ‘gist_gray’, ‘gist_gray_r’, ‘gist_heat’, ‘gist_heat_r’, ‘gist_ncar’, ‘gist_ncar_r’, ‘gist_rainbow’, ‘gist_rainbow_r’, ‘gist_stern’, ‘gist_stern_r’, ‘gist_yarg’, ‘gist_yarg_r’, ‘gnuplot’, ‘gnuplot2’, ‘gnuplot2_r’, ‘gnuplot_r’, ‘gray’, ‘gray_r’, ‘hot’, ‘hot_r’, ‘hsv’, ‘hsv_r’, ‘icefire’, ‘icefire_r’, ‘inferno’, ‘inferno_r’, ‘jet’, ‘jet_r’, ‘magma’, ‘magma_r’, ‘mako’, ‘mako_r’, ‘nipy_spectral’, ‘nipy_spectral_r’, ‘ocean’, ‘ocean_r’, ‘pink’, ‘pink_r’, ‘plasma’, ‘plasma_r’, ‘prism’, ‘prism_r’, ‘rainbow’, ‘rainbow_r’, ‘rocket’, ‘rocket_r’, ‘seismic’, ‘seismic_r’, ‘spring’, ‘spring_r’, ‘summer’, ‘summer_r’, ‘tab10’, ‘tab10_r’, ‘tab20’, ‘tab20_r’, ‘tab20b’, ‘tab20b_r’, ‘tab20c’, ‘tab20c_r’, ‘terrain’, ‘terrain_r’, ‘turbo’, ‘turbo_r’, ‘twilight’, ‘twilight_r’, ‘twilight_shifted’, ‘twilight_shifted_r’, ‘viridis’, ‘viridis_r’, ‘vlag’, ‘vlag_r’, ‘winter’, ‘winter_r’
“””
Python3
import seaborn as sb

data = [300, 400, 120, 590, 600, 760]
# sample one color per tile from the "Spectral" palette
palette = sb.color_palette("Spectral", len(data))
squarify.plot(sizes=data, color=palette)
plt.axis("off")
plt.show()
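Alpha
The alpha parameter changes the opacity of the tiles. It takes an integer or float in the range 0 to 1. Values near 1 make the tiles more opaque, while values near 0 make them more transparent.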
Python3
data = [300, 400, 720, 213]
colors = ["red", "black", "green", "violet"]
# alpha=0.8: tiles are mostly opaque
squarify.plot(sizes=data, color=colors, alpha=0.8)
plt.axis("off")
plt.show()
Here we use a lower alpha value.
Python3
data = [300, 400, 720, 213]
colors = ["red", "black", "green", "violet"]
# alpha=0.3: tiles are mostly transparent
squarify.plot(sizes=data, color=colors, alpha=0.3)
plt.axis("off")
plt.show()
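To get a feel for what the two alpha values do to a color, the standard "source over" blending formula can be worked through by hand. The sketch below assumes a white background (Matplotlib's default figure color); `blend_over_white` is a hypothetical helper written for this illustration and is not a Squarify or Matplotlib function.

```python
def blend_over_white(rgb, alpha):
    """Composite an RGB color (components in 0..1) over a white background."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # each channel is a weighted mix of the tile color and white (1.0)
    return tuple(round(alpha * c + (1.0 - alpha) * 1.0, 3) for c in rgb)


red = (1.0, 0.0, 0.0)
print(blend_over_white(red, 0.8))  # strong red, slightly lightened
print(blend_over_white(red, 0.3))  # washed-out, pinkish red
```

The lower the alpha, the more of the background shows through each tile, which is why the second plot looks paler than the first.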
Scaling the plot
Scaling changes the range of the plot, which is 100×100 by default. Use norm_x to set the range of the x-axis and norm_y to set the range of the y-axis.
Python3
data = [100, 20, 50, 1000]
colors = ["red", "yellow", "blue", "green"]
# default range: 100 on both axes
squarify.plot(sizes=data, color=colors)
plt.show()
Scaling along both axes.
Python3
data = [100, 20, 50, 1000]
colors = ["red", "yellow", "blue", "green"]
# x-axis runs to 1000, y-axis runs to 10
squarify.plot(sizes=data, norm_x=1000, norm_y=10, color=colors)
plt.show()
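The arithmetic behind the scaling can be sketched in plain Python: the raw sizes are rescaled so that the tile areas add up to the canvas area norm_x × norm_y. The helper `tile_areas` below is hypothetical and written for illustration; it shows the idea under that assumption rather than reproducing Squarify's internals.

```python
def tile_areas(sizes, norm_x=100, norm_y=100):
    """Rescale raw sizes so that they sum to the canvas area norm_x * norm_y."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("sizes must have a positive sum")
    scale = (norm_x * norm_y) / total
    return [size * scale for size in sizes]


data = [100, 20, 50, 1000]
default_areas = tile_areas(data)                      # 100 x 100 canvas
wide_areas = tile_areas(data, norm_x=1000, norm_y=10)  # 1000 x 10 canvas

print(default_areas)
print(wide_areas)
```

Both canvases in this example cover the same total area (10,000 square units), so each tile keeps the same area and the same share of the whole. What changes between the two plots is the axis range and therefore the shape of the tiles.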
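Labels
A treemap without labels is just a set of meaningless boxes. Labels give meaning to the divisions of the treemap and state what each tile represents. You can increase the font size of the labels by passing the extra parameter text_kwargs.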
Python3
episode_data = [1004, 720, 366, 360, 80]
anime_names = ["One Piece", "Naruto", "Bleach",
               "Gintama", "Attack On Titan"]
# each label is drawn inside the tile with the matching size
squarify.plot(sizes=episode_data, label=anime_names)
plt.axis("off")
plt.show()
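Labels can carry more than a name. As a hedged sketch, the code below builds labels that also show each value and its share of the total, then passes text_kwargs to enlarge the font. `build_labels` and `draw_labeled_treemap` are hypothetical helpers introduced for this example; the drawing function imports Matplotlib and Squarify only when it is called.

```python
def build_labels(names, values):
    """Combine each name with its value and its percentage of the total."""
    total = sum(values)
    return [
        f"{name}\n{value} ({value / total:.1%})"
        for name, value in zip(names, values)
    ]


def draw_labeled_treemap(names, values, fontsize=14):
    """Draw a treemap with the richer labels and a larger label font."""
    import matplotlib.pyplot as plt  # imported lazily; needed only for drawing
    import squarify

    ax = squarify.plot(
        sizes=values,
        label=build_labels(names, values),
        text_kwargs={"fontsize": fontsize},
    )
    ax.axis("off")
    return ax


episode_data = [1004, 720, 366, 360, 80]
anime_names = ["One Piece", "Naruto", "Bleach", "Gintama", "Attack On Titan"]
labels = build_labels(anime_names, episode_data)
print(labels[0])
```

Calling draw_labeled_treemap(anime_names, episode_data) followed by plt.show() would render the plot; the fontsize default of 14 is an arbitrary choice for this sketch.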
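Padding
The pad parameter adds a small gap between the tiles of the treemap so that they are easier to tell apart. The example below passes an integer, but Squarify appears to treat pad as an on/off flag, so any truthy value (True, 1, 2) should produce the same spacing.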
Python3
# reuses episode_data and anime_names from the previous example
squarify.plot(sizes=episode_data, label=anime_names, pad=2)
plt.axis("off")
plt.show()
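One way to picture padding is as a small inset applied to each tile's rectangle. The sketch below assumes a fixed inset of one unit per side and skips tiles that are too small to shrink; `inset_rectangle` is a hypothetical helper, and Squarify's own implementation may differ in detail.

```python
def inset_rectangle(rect, gap=1.0):
    """Return a copy of rect shrunk by `gap` on each side where there is room."""
    padded = dict(rect)
    if padded["dx"] > 2 * gap:  # wide enough to lose a gap on both sides
        padded["x"] += gap
        padded["dx"] -= 2 * gap
    if padded["dy"] > 2 * gap:  # tall enough to lose a gap on both sides
        padded["y"] += gap
        padded["dy"] -= 2 * gap
    return padded


tile = {"x": 0.0, "y": 0.0, "dx": 40.0, "dy": 25.0}
print(inset_rectangle(tile))
```

Because every tile shrinks toward its own center, neighboring tiles end up separated by a visible gap while the overall layout stays the same.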
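Building a treemap on a real dataset with Squarify
We will now see how to build a treemap on a real dataset. You can download the dataset from https://www.kaggle.com/hamdallak/the-world-of-pokemons. In the code below, we select the top 20 Pokemon by Total and build a treemap of their primary types.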
Python3
# import required modules
import pandas as pd
import squarify
import matplotlib.pyplot as plt
import seaborn as sb

# read the dataset; read_csv already returns a DataFrame
df = pd.read_csv("pokemons dataset.csv")

# keep three columns, sort by Total (descending), take the top 20 rows
top20_pokemon = (
    df.loc[:, ["Name", "Total", "Primary Type"]]
    .sort_values(by="Total", ascending=False)
    .head(20)
)

# count how many of the top 20 fall under each primary type (computed once)
type_counts = top20_pokemon["Primary Type"].value_counts()

# create the figure; the axis values are not needed, so hide them
plt.figure(figsize=(12, 6))
plt.axis("off")

axis = squarify.plot(
    sizes=type_counts.values,
    label=type_counts.index,
    color=sb.color_palette("tab20", len(type_counts)),
    pad=1,
    text_kwargs={"fontsize": 18},
)
axis.set_title("Primary Types of the Top 20 Pokemon", fontsize=24)
plt.show()
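The data preparation step (sort by Total, keep the top rows, count primary types) can be sketched without pandas to make the logic explicit. The rows below are placeholder values invented for illustration and are NOT taken from the Kaggle dataset; `top_type_counts` is a hypothetical helper.

```python
from collections import Counter


def top_type_counts(rows, n=20):
    """Count primary types among the n rows with the highest total.

    rows is an iterable of (name, total, primary_type) tuples.
    """
    top = sorted(rows, key=lambda row: row[1], reverse=True)[:n]
    return Counter(primary_type for _, _, primary_type in top)


# Placeholder rows for illustration only; not real dataset values.
rows = [
    ("A", 700, "Dragon"),
    ("B", 680, "Psychic"),
    ("C", 670, "Dragon"),
    ("D", 300, "Bug"),
]
counts = top_type_counts(rows, n=3)
print(counts.most_common())
```

The keys of the counter would serve as the treemap labels and the counts as the sizes, which mirrors what value_counts provides in the pandas version.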