📅  Last modified: 2023-12-03 15:03:28.439000             🧑  Author: Mango
Pandas is a powerful data manipulation tool for Python. It provides a wide range of functions that can be used to transform, filter, and aggregate data. One common task in data preprocessing is to normalize the data, which means scaling data to a common range. In this tutorial, we will demonstrate how to normalize a Pandas dataframe in Python.
Normalization rescales data so that columns measured on different scales become comparable. Two common methods are Min-Max normalization, which maps values into a fixed range such as 0 to 1 (or -1 to 1), and Z-score normalization, which centers values around 0 with a standard deviation of 1.
Min-Max normalization: This method scales the data to a range of 0 to 1. The formula to calculate Min-Max normalization is:
X' = (X - Xmin) / (Xmax - Xmin)
Z-score normalization: This method scales the data to have a mean of 0 and a standard deviation of 1. The formula to calculate Z-score normalization is:
X' = (X - mean(X)) / std(X)
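Both formulas can be checked by hand on a tiny list before moving to Pandas. The following is a minimal sketch in plain Python; the list `values` is an illustrative assumption, and `pstdev` (population standard deviation) is one possible choice for std(X), since the formula above does not say which variant is meant.

```python
from statistics import mean, pstdev

# Illustrative values only (an assumption for this sketch).
values = [10, 20, 30, 40]

# Min-Max: subtract the minimum, then divide by the range (max - min).
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Z-score: subtract the mean, then divide by the standard deviation.
# pstdev is the population standard deviation (divides by n).
mu, sigma = mean(values), pstdev(values)
z_score = [(v - mu) / sigma for v in values]

print([round(v, 3) for v in min_max])  # [0.0, 0.333, 0.667, 1.0]
print([round(v, 3) for v in z_score])  # [-1.342, -0.447, 0.447, 1.342]
```

The smallest value maps to 0 and the largest to 1 under Min-Max, while the Z-scores are symmetric around 0 because the input is evenly spaced.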
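In Pandas, we can use the apply function to apply a normalization function to a dataframe. By default, apply passes each column to the function as a Series, so every column is normalized independently. Let's start by creating a sample dataframe: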
import pandas as pd
# sample dataframe
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [0, 5, 10, 15],
    'C': [23, 26, 29, 32]
})
print(df)
Output:
    A   B   C
0  10   0  23
1  20   5  26
2  30  10  29
3  40  15  32
We can now create a normalization function for our dataframe. Let's use the Min-Max normalization method:
# min-max normalization function
def min_max_normalize(x):
    return (x - x.min()) / (x.max() - x.min())
# apply min-max normalization to dataframe
df_normalized = df.apply(min_max_normalize)
print(df_normalized)
Output:
          A         B         C
0  0.000000  0.000000  0.000000
1  0.333333  0.333333  0.333333
2  0.666667  0.666667  0.666667
3  1.000000  1.000000  1.000000
We can see that the values in the dataframe are now scaled to a range of 0 to 1.
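The same result can also be obtained without apply, because Pandas arithmetic broadcasts column-wise statistics across rows. The sketch below is one possible variant, not the only way to do it; the names `col_min`, `col_range`, `df_01`, and `df_pm1` are illustrative. It also shows how the 0 to 1 result can be shifted to the -1 to 1 range mentioned earlier.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [0, 5, 10, 15],
    'C': [23, 26, 29, 32]
})

# df.min() and df.max() return one value per column (a Series indexed by
# column name), so the arithmetic below lines up column by column.
col_min = df.min()
col_range = df.max() - df.min()

df_01 = (df - col_min) / col_range  # each column scaled to [0, 1]
df_pm1 = 2 * df_01 - 1              # same data shifted to [-1, 1]

print(df_pm1)
```

Note that a column whose values are all identical has a range of 0, so the division would produce NaN for that column; such columns need separate handling.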
Similarly, we can create a normalization function for the Z-score normalization method:
# z-score normalization function
def z_score_normalize(x):
    return (x - x.mean()) / x.std()
# apply z-score normalization to dataframe
df_normalized = df.apply(z_score_normalize)
print(df_normalized)
Output:
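          A         B         C
0 -1.161895 -1.161895 -1.161895
1 -0.387298 -0.387298 -0.387298
2  0.387298  0.387298  0.387298
3  1.161895  1.161895  1.161895
We can see that each column is now scaled to have a mean of 0 and a standard deviation of 1. Note that the Pandas std method computes the sample standard deviation (ddof=1) by default, so the values differ slightly from those obtained with the population standard deviation.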
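Which standard deviation is used changes the resulting Z-scores. The sketch below adds a hypothetical `ddof` parameter to the normalization function (an assumption for illustration, not part of the function defined earlier) to compare the Pandas default, `ddof=1`, with the population variant, `ddof=0`. Extra keyword arguments given to apply are forwarded to the function.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [0, 5, 10, 15],
    'C': [23, 26, 29, 32]
})

# Hypothetical variant of the z-score function with a configurable ddof.
def z_score_normalize(x, ddof=1):
    return (x - x.mean()) / x.std(ddof=ddof)

sample = df.apply(z_score_normalize)              # sample std (ddof=1)
population = df.apply(z_score_normalize, ddof=0)  # population std (ddof=0)

# Each result has unit standard deviation under its own definition.
print(sample.std().round(6).tolist())              # [1.0, 1.0, 1.0]
print(population.std(ddof=0).round(6).tolist())    # [1.0, 1.0, 1.0]

# First row of column A under each definition.
print(round(sample.loc[0, 'A'], 6))      # -1.161895
print(round(population.loc[0, 'A'], 6))  # -1.341641
```

Either choice is defensible; what matters is applying the same definition consistently, for example when comparing against results produced by another library.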
In this tutorial, we demonstrated how to normalize a Pandas dataframe in Python. We used the apply function to run a normalization function on each column of a dataframe, and we covered two common methods: Min-Max normalization and Z-score normalization. Normalization is an important preprocessing step because it puts columns with different units or magnitudes on a comparable scale before analysis.