Data Visualization in Python Using Matplotlib and Seaborn
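Data visualization offers several advantages:
- Makes complex data easier to represent
- Highlights well-performing and poorly performing areas
- Explores relationships between data points
- Identifies patterns even in large data sets
When building a visualization, keep the following in mind:
- Use shapes, colors, and sizes appropriately
- Plots and graphs drawn on a coordinate system are easier to read
- Knowing which plot suits which data type makes the information clearer
- Labels, titles, legends, and pointers help convey the message to a wider audience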
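Many Python libraries can be used to build visualizations, including matplotlib, vispy, bokeh, seaborn, pygal, folium, plotly, cufflinks, and networkx. Of these, matplotlib and seaborn are very widely used for basic to intermediate visualizations.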
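Matplotlib is a Python visualization library for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002. Some of matplotlib's benefits and features:
- It is fast and efficient because it is built on numpy, and plots are easy to construct
- It has seen many improvements from the open-source community since its inception, and so offers advanced features
- Its well-maintained, high-quality graphical output attracts a large user base
- Both basic and advanced charts can be built with little effort
- Its large community makes troubleshooting and debugging easier for users and developers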
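Seaborn, which is built on top of matplotlib, adds:
- Built-in themes that improve the look of plots
- Statistical functions that help extract insights from data
- Better aesthetics and built-in plot types
- Helpful documentation with effective examples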
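Plots fall into two broad categories:
- Univariate plots (involve only one variable)
- Bivariate plots (involve more than one variable)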
Syntax: seaborn.boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, ax=None, **kwargs)
x, y, hue: Inputs for plotting long-form data.
data: Dataset for plotting. If x and y are absent, this is interpreted as wide-form.
color: Color for all of the elements.
Returns: It returns the Axes object with the plot drawn onto it.
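For a horizontal box plot:
- The minimum is shown at the far left of the chart, at the end of the left whisker
- The first quartile, Q1, is the left edge of the box
- The median is shown as a line inside the box
- The third quartile, Q3, is the right edge of the box
- The maximum is shown at the far right of the chart, at the end of the right whisker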
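The quantities a box plot draws can be computed directly. The following is a minimal sketch, assuming the common 1.5 × IQR whisker convention (the `whis=1.5` default in the signature above); the helper name `box_plot_stats` and the sample values are illustrative, not part of seaborn.

```python
import numpy as np


def box_plot_stats(values, whis=1.5):
    """Return the quantities a box plot draws for a 1-D sample."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences mark how far the whiskers are allowed to reach.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers end at the most extreme observations inside the fences;
    # anything beyond them is drawn as an individual outlier point.
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "whisker_low": inside.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside.max(),
        "outliers": np.sort(outliers),
    }


sample = [1, 2, 3, 4, 5, 30]  # made-up values with one extreme point
stats = box_plot_stats(sample)
print(stats)
```

With this convention the whiskers show the minimum and maximum only when no observation lies beyond the fences; here the value 30 is reported as an outlier instead.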
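# import required modules
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumption: the diabetes data is available locally as a CSV file with
# 'Outcome' and 'BloodPressure' columns; the file name is illustrative
diabetes = pd.read_csv('diabetes.csv')

# Box plot and violin plot for Outcome vs BloodPressure
_, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
# box plot illustration
sns.boxplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[0])
# violin plot illustration
sns.violinplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[1])
plt.show()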
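# Box plot for all the numerical variables
sns.set(rc={'figure.figsize': (16, 5)})
# multiple box plot illustration
# (assumed call, missing in the original: one box per numeric column)
sns.boxplot(data=diabetes.select_dtypes(include='number'))

# Histograms of selected features
features = ['BloodPressure', 'SkinThickness']
diabetes[features].hist(figsize=(10, 4))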
# import required modules
import matplotlib.pyplot as plt
import seaborn as sns

# create a figure with two side-by-side axes
_, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
# illustrate count plots
sns.countplot(x='Outcome', data=diabetes, ax=axes[0])
sns.countplot(x='BloodPressure', data=diabetes, ax=axes[1])
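# Finding and plotting the correlation for
# the independent variables
# import required module
import seaborn as sns

# adjust plot
sns.set(rc={'figure.figsize': (14, 5)})
# assign data
ind_var = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM',
           'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
# illustrate heat map.
# Assumption: `housing` is a pandas DataFrame containing the columns above;
# the name is illustrative because the start of the original call is missing
sns.heatmap(housing[ind_var].corr(),
            cmap=sns.cubehelix_palette(20, light=0.95, dark=0.15))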
# import required module
import matplotlib.pyplot as plt

# Creating dataset
cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]

# Creating plot
fig = plt.figure(figsize=(10, 7))
plt.pie(data, labels=cars)

# Show plot
plt.show()
# Import required modules
import matplotlib.pyplot as plt
import numpy as np

# Creating dataset
cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]

# Creating explode data (offset of each wedge as a fraction of the radius)
explode = (0.1, 0.0, 0.2, 0.3, 0.0, 0.0)

# Creating color parameters
colors = ("orange", "cyan", "brown", "grey", "indigo", "beige")

# Wedge properties
wp = {'linewidth': 1, 'edgecolor': "green"}

# Creating autopct arguments
def func(pct, allvalues):
    # Convert the percentage back to an absolute value; round before
    # casting so floating-point error cannot drop the value by one
    absolute = int(round(pct / 100. * np.sum(allvalues)))
    return "{:.1f}%\n({:d} g)".format(pct, absolute)

# Creating plot
fig, ax = plt.subplots(figsize=(10, 7))
wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data),
                                  explode=explode, labels=cars,
                                  shadow=True, colors=colors, startangle=90,
                                  wedgeprops=wp)

# Adding legend
ax.legend(wedges, cars, title="Cars", loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=8, weight="bold")
ax.set_title("Customizing pie chart")

# Show plot
plt.show()
# Import required modules
import matplotlib.pyplot as plt
import numpy as np

# Assign axes
x = np.linspace(0, 5.5, 10)
y = 10 * np.exp(-x)

# Assign errors regarding each axis
xerr = np.random.random_sample(10)
yerr = np.random.random_sample(10)

# Adjust plot
fig, ax = plt.subplots()
ax.errorbar(x, y, xerr=xerr, yerr=yerr, fmt='-o')

# Assign labels
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Line plot with error bars')

# Illustrate error bars
plt.show()
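A scatter plot (or scatter graph) is a bivariate plot that is built much like a line graph. A line graph uses a line on the x-y axes to plot a continuous function, whereas a scatter plot uses points to represent individual data items. Scatter plots are very useful for seeing whether two variables are correlated. They can be 2-dimensional or 3-dimensional.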
Syntax: seaborn.scatterplot(x=None, y=None, hue=None, style=None, size=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None, estimator=None, ci=95, n_boot=1000, alpha='auto', x_jitter=None, y_jitter=None, legend='brief', ax=None, **kwargs)
x, y: Input data variables that should be numeric.
data: Dataframe where each column is a variable and each row is an observation.
size: Grouping variable that will produce points with different sizes.
style: Grouping variable that will produce points with different markers.
palette: Colors to use for the different levels of the hue variable.
markers: Object determining how to draw the markers for different levels.
alpha: Proportional opacity of the points.
Returns: This method returns the Axes object with the plot drawn onto it.
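Advantages of a scatter plot:
- Shows the correlation between variables
- Works well for large data sets
- Makes clusters in the data easier to find
- Represents each data point individually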
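As a rough illustration of how a scatter plot exposes correlation, the sketch below plots synthetic data with matplotlib and reports the Pearson coefficient. The data, seed, and styling are assumptions made for the example; the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (illustrative only): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(0, 2.0, size=200)

fig, ax = plt.subplots(figsize=(6, 4))
points = ax.scatter(x, y, s=15, alpha=0.6)  # one marker per observation
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot of two related variables")

# An upward-sloping cloud of points corresponds to a positive correlation.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.2f}")
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.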
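Bins are the building blocks of a histogram: each bin covers a range of values and holds the data points that fall within it. As a widely accepted rule of thumb, the number of bins is usually kept between 5 and 20, but the right choice depends entirely on the data.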
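To see what binning does before anything is drawn, the counts can be computed with `numpy.histogram`. This is a small sketch on made-up values; plotting functions such as `DataFrame.hist` and `matplotlib.pyplot.hist` accept a comparable `bins` argument.

```python
import numpy as np

# Illustrative sample: ten small integers.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Ask for 4 equal-width bins spanning the data range.
counts, edges = np.histogram(values, bins=4)

# Each bin includes its left edge; only the last bin also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```

Changing `bins` changes only how the same observations are grouped, which is why too few bins hide structure and too many produce a noisy histogram.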
Syntax: seaborn.countplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, dodge=True, ax=None, **kwargs)
Parameters:
- x, y: (optional) Names of variables in data, or vector data. Inputs for plotting long-form data.
- hue: (optional) Column name used for color encoding.
- data: (optional) DataFrame, array, or list of arrays. Dataset for plotting. If x and y are absent, this is interpreted as wide-form; otherwise it is expected to be long-form.
- order, hue_order: (optional) Lists of strings. Order in which to plot the categorical levels; otherwise the levels are inferred from the data objects.
- orient: (optional) "v" or "h". Orientation of the plot (vertical or horizontal). This is usually inferred from the dtype of the input variables, but can be set explicitly when the categorical variable is numeric or when plotting wide-form data.
- color: (optional) matplotlib color. Color for all of the elements, or seed for a gradient palette.
- palette: (optional) Palette name, list, or dict. Colors to use for the different levels of the hue variable. Should be something that color_palette() can interpret, or a dictionary mapping hue levels to matplotlib colors.
- saturation: (optional) Float. Proportion of the original saturation at which to draw colors. Large patches often look better with slightly desaturated colors; set this to 1 if the plot colors should match the input color spec exactly.
- dodge: (optional) Bool. When hue nesting is used, whether elements should be shifted along the categorical axis.
- ax: (optional) matplotlib Axes to draw the plot onto; otherwise the current Axes is used.
- kwargs: Key-value mappings. Other keyword arguments are passed through to matplotlib.axes.Axes.bar().
Returns: The Axes object with the plot drawn onto it.
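A count plot simply shows the number of occurrences of an item based on some category. In Python, a count plot can be created with the seaborn library. Seaborn is a Python module built on top of matplotlib that is used for drawing visually appealing statistical plots.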
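The tally behind a count plot can be reproduced with the standard library. The sketch below uses a hypothetical categorical column; the values are made up for illustration.

```python
from collections import Counter

# Illustrative categorical column (hypothetical values).
outcome = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0]

# A count plot draws one bar per category; the bar height is this tally.
counts = Counter(outcome)
for category in sorted(counts):
    print(category, counts[category])
```

A count plot works best when the variable has few distinct values, since each distinct value gets its own bar.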
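Correlation can be computed between two variables, or between one variable and many others. It can be positive, negative, or zero (no correlation), and its value ranges from -1 to 1. Understanding correlation can have a significant impact on the model-building stage and on interpreting model output.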
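The range from -1 to 1 can be checked on small hand-made series. This is a minimal sketch using `numpy.corrcoef`; the series are constructed for illustration so that the three cases (positive, negative, zero) come out exactly.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

y_pos = 2.0 * x + 1.0                          # rises with x
y_neg = -3.0 * x                               # falls as x rises
y_none = np.array([2.0, 1.0, 3.0, 1.0, 2.0])   # no linear relation to x

# np.corrcoef treats each row as one variable and returns the full matrix.
corr = np.corrcoef([x, y_pos, y_neg, y_none])
print(np.round(corr, 2))
```

The result is a symmetric matrix with ones on the diagonal, which is the kind of table a correlation heat map displays as colors.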
Syntax: seaborn.heatmap(data, *, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None, **kwargs)
Parameters:
- data: Rectangular (2D) dataset that can be coerced into an ndarray. If a pandas DataFrame is provided, its index and column information is used to label the rows and columns.
- vmin, vmax: (optional) Floats. Values to anchor the colormap; otherwise they are inferred from the data and other keyword arguments.
- cmap: (optional) matplotlib colormap name or object, or list of colors. Mapping from data values to color space.
- center: (optional) Float. Value at which to center the colormap when plotting divergent data.
- annot: (optional) Bool or rectangular dataset. If True, write the data value in each cell.
- fmt: (optional) String formatting code to use when adding annotations.
- linewidths, linecolor: (optional) Width and color of the lines that divide the cells.
- cbar: (optional) Bool. Whether to draw a colorbar.
- square: (optional) Bool. If True, set the Axes aspect to "equal" so each cell is square.
- xticklabels, yticklabels: (optional) "auto", bool, list-like, or int. If an integer n, only every n-th label is plotted.
- mask: (optional) Bool array or DataFrame. Data is not shown in cells where mask is True.
- ax: (optional) matplotlib Axes to draw the plot onto; otherwise the current Axes is used.
- kwargs: Other keyword arguments are passed through to matplotlib.axes.Axes.pcolormesh().
Returns: The Axes object with the heatmap drawn onto it.
# import required module
import seaborn as sns
import numpy as np
# assign data
data = np.random.randn(50, 20)
# illustrate heat map
ax = sns.heatmap(data, xticklabels=2, yticklabels=False)
Syntax: matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None, autopct=None, shadow=False)
data: Array of values to plot. The fractional area of each wedge is data/sum(data). In recent Matplotlib versions the values are normalized by default; passing normalize=False with sum(data) < 1 uses the values directly as fractions and leaves an empty wedge of size 1 - sum(data).
explode: Optional sequence giving, for each wedge, the fraction of the radius by which to offset it.
labels: Sequence of strings that sets the label of each wedge.
colors: Sequence of colors used for the wedges.
autopct: Format string or function used to label each wedge with its percentage value.
shadow: Whether to draw a shadow beneath the pie.
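The relation between the input values and the wedge sizes can be worked out by hand. The sketch below reuses the car counts from the pie chart examples and computes each wedge's fraction, angle, and a one-decimal percentage label of the kind `autopct="%1.1f%%"` produces; it is an illustration of the arithmetic, not matplotlib's internal code.

```python
import numpy as np

cars = ["AUDI", "BMW", "FORD", "TESLA", "JAGUAR", "MERCEDES"]
data = np.array([23, 17, 35, 29, 12, 41])

fractions = data / data.sum()   # share of the full circle per wedge
angles = 360.0 * fractions      # wedge angle in degrees
labels = [f"{100 * f:.1f}%" for f in fractions]

for car, angle, label in zip(cars, angles, labels):
    print(f"{car:<9}{label:>7}{angle:8.1f} deg")
```

Because the fractions always sum to 1, a pie chart shows shares of a whole and hides the absolute totals unless they are added to the labels.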
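A pie chart offers the following advantages:
- Easier visual summarization of large data sets
- The effect and size of different classes are easy to understand
- Percentage points represent the classes in the data
Error bars offer the following advantages:
- Deviation of data points from a threshold can be captured easily
- Deviations across a large set of data points are easy to capture
- They describe the underlying data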
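Error bar lengths usually come from the spread of repeated measurements. The following is a minimal sketch, assuming three conditions with four made-up measurements each and using the sample standard deviation as the bar half-length; other choices, such as the standard error, are equally common. The `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: 3 conditions, 4 repeated measurements each (made up).
measurements = np.array([
    [9.8, 10.1, 10.3, 9.8],
    [6.9, 7.4, 7.1, 7.0],
    [4.2, 3.8, 4.1, 3.9],
])
x = np.arange(1, 4)

means = measurements.mean(axis=1)
# Sample standard deviation (ddof=1) as the error bar half-length.
stds = measurements.std(axis=1, ddof=1)

fig, ax = plt.subplots()
container = ax.errorbar(x, means, yerr=stds, fmt="-o", capsize=4)
ax.set_xlabel("condition")
ax.set_ylabel("mean measurement")
ax.set_title("Means with standard-deviation error bars")
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.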
