📅  Last modified: 2023-12-03 15:03:28.205000             🧑  Author: Mango
The hist() function in Pandas DataFrame provides an easy way to visualize the distribution of data in a DataFrame. It generates histograms of the columns of a DataFrame, showing the count of observations that fall within each bin.
DataFrame.hist(column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, **kwargs)
column: The column name(s) of the DataFrame to plot. If not specified, all numeric columns are plotted.
by: A column (or columns) to group the data by. A separate histogram is drawn for each group.
grid: Whether to show axis grid lines. Default: True.
xlabelsize: The font size of the x-axis tick labels. Default: None.
xrot: The rotation of the x-axis tick labels, in degrees. Default: None.
ylabelsize: The font size of the y-axis tick labels. Default: None.
yrot: The rotation of the y-axis tick labels, in degrees. Default: None.
ax: The Matplotlib Axes object to draw the histogram on. Default: None (creates a new figure).
sharex: Whether subplots share the x-axis. Default: False.
sharey: Whether subplots share the y-axis. Default: False.
figsize: The size of the figure as a (width, height) tuple in inches. Default: None, which falls back to matplotlib's rcParams["figure.figsize"] (6.4 by 4.8 inches unless reconfigured).
layout: A (rows, columns) tuple giving the shape of the subplot grid. Default: None.
bins: The number of histogram bins, or a sequence of bin edges. Default: 10.
**kwargs: Other keyword arguments passed to the underlying matplotlib hist() function.
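As a rough illustration of how several of these parameters combine, the sketch below lays four histograms out in a single row with a shared y-axis. The DataFrame name `demo`, the seed, and the chosen values are assumptions made for this example only, and the Agg backend is selected just so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical sample data: four numeric columns of random values.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(500, 4)), columns=["A", "B", "C", "D"])

axes = demo.hist(
    bins=15,          # number of bins per histogram
    layout=(1, 4),    # one row, four columns of subplots
    sharey=True,      # same y-axis scale across subplots
    grid=False,       # hide grid lines
    xlabelsize=8,     # smaller x tick labels
    xrot=45,          # rotate x tick labels by 45 degrees
    figsize=(12, 3),  # figure width and height in inches
)
layout_shape = axes.shape  # the returned array follows the requested layout
```

Sharing the y-axis makes the bar heights directly comparable across columns, at the cost of hiding detail in columns with far fewer observations per bin.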
Returns a matplotlib.axes.Axes or an np.ndarray of Axes: The Axes object(s) the histograms are drawn on, one per plotted column or group. Unlike matplotlib's own hist(), the method does not return the bin values or the list of bar artists.
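Because the return value is made of ordinary Matplotlib Axes objects, it can be used to adjust the plots after they are drawn. The following is a minimal sketch under that assumption; `demo`, the axis labels, and the figure title are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical sample data.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(500, 4)), columns=["A", "B", "C", "D"])

axes = demo.hist(bins=10)   # array of Axes, one per column
flat_axes = axes.ravel()    # flatten the grid to iterate over it

for ax in flat_axes:
    ax.set_xlabel("value")  # label every subplot's x-axis
    ax.set_ylabel("count")  # label every subplot's y-axis

fig = flat_axes[0].figure   # all subplots belong to the same figure
fig.suptitle("Column distributions")
titles = {ax.get_title() for ax in flat_axes}  # each subplot is titled by its column
```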
Let's create a DataFrame first:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(1000, 4), columns=['A', 'B', 'C', 'D'])
Now, we can visualize the distribution of each column using simple syntax:
df.hist(bins=20, figsize=(10,8))
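The call above plots raw counts. Extra keyword arguments are forwarded to matplotlib, so a variant that normalizes each histogram and styles the bars might look like the sketch below. The data, colors, and the name `demo` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical sample data with two numeric columns.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(1000, 2)), columns=["A", "B"])

# density, color, edgecolor and alpha are matplotlib hist() options
# passed through **kwargs.
axes = demo.hist(
    bins=20,
    density=True,        # scale bars so each histogram has total area 1
    color="steelblue",
    edgecolor="black",
    alpha=0.8,
    figsize=(10, 4),
)

# Total bar area per subplot (height times width, summed over all bars).
areas = [
    sum(p.get_height() * p.get_width() for p in ax.patches)
    for ax in axes.ravel()
    if len(ax.patches) > 0
]
```

With density=True the y-axis shows a probability density rather than a count, which helps when comparing columns that have different numbers of non-missing values.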
We can also plot histograms for specific column(s) by providing the column names:
df[['A', 'B']].hist(bins=20, figsize=(10,8))
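An alternative to selecting the columns first is the column parameter described earlier. The sketch below assumes a small DataFrame named `demo` (a name used only for this example) and checks that both spellings plot the same columns.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical sample data.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(1000, 4)), columns=["A", "B", "C", "D"])

# Select columns through the `column` parameter...
axes_param = demo.hist(column=["A", "B"], bins=20, figsize=(10, 8))
# ...or by indexing the DataFrame before calling hist().
axes_index = demo[["A", "B"]].hist(bins=20, figsize=(10, 8))

titles_param = sorted(ax.get_title() for ax in axes_param.ravel())
titles_index = sorted(ax.get_title() for ax in axes_index.ravel())
```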
If we have a categorical column in our DataFrame, we can plot the histograms for each category using the by parameter:
df['E'] = np.random.choice(['X', 'Y'], size=(1000,))
df.hist(column='A', by='E', bins=20, figsize=(10,8))
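To see what the by parameter produces, the sketch below inspects the returned Axes: each subplot is titled with its category, and its bars add up to that category's row count. The names `demo` and `bar_totals` and the sample data are assumptions for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical sample data: one numeric column and one categorical column.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "A": rng.normal(size=200),
    "E": rng.choice(["X", "Y"], size=200),
})

axes = demo.hist(column="A", by="E", bins=20, figsize=(10, 8))

# Map each subplot title (the category) to the sum of its bar heights.
bar_totals = {
    ax.get_title(): int(round(sum(p.get_height() for p in ax.patches)))
    for ax in np.asarray(axes).ravel()
    if ax.get_title()
}
group_sizes = demo["E"].value_counts().to_dict()
```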
The hist() function can also be used on grouped data:
g = df.groupby('E')
g.hist(column='A', bins=20, figsize=(10,8))
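When the groups need to be compared directly, one option is to loop over the groups and draw them on a single shared Axes with common bin edges. This is a sketch of that alternative using matplotlib's own hist(), not something the groupby call above does for you. The names `demo` and `bin_edges` and the styling are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: one numeric column and one categorical column.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "A": rng.normal(size=300),
    "E": rng.choice(["X", "Y"], size=300),
})

# Compute the bin edges once from all values so both groups share them.
bin_edges = np.histogram_bin_edges(demo["A"], bins=20)

fig, ax = plt.subplots(figsize=(10, 8))
for name, group in demo.groupby("E"):
    # Semi-transparent bars keep overlapping groups readable.
    ax.hist(group["A"], bins=bin_edges, alpha=0.5, label=f"E = {name}")

ax.set_xlabel("A")
ax.set_ylabel("count")
ax.legend()
```

Using one set of bin edges for every group matters here: if each group picked its own bins, bars of different widths would overlap and the heights would not be comparable.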
In this tutorial, we have learned about the hist() function in Pandas DataFrame and how it can be used to visualize the distribution of data in a DataFrame. We have seen how to plot histograms for specific columns, for category-wise data, and for grouped data.