
Last modified: 2023-12-03 15:17:58.524000 | Author: Mango

Pandas - normalize = True

Introduction

When working with data in pandas, we often pre-process it to make it fit for analysis. One such pre-processing step is normalization: rescaling the values of a feature so that they fall within a specific range, or converting counts into proportions. This prevents features with large magnitudes from dominating the analysis.

What is normalize=True in pandas?

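In pandas, normalize is not a general-purpose scaling option. It is a parameter of a few counting functions, such as Series.value_counts, DataFrame.value_counts, and pd.crosstab. When set to True, these functions return relative frequencies (proportions between 0 and 1 that sum to 1) instead of raw counts. pd.crosstab also accepts normalize='index' or normalize='columns' to compute proportions within each row or each column.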
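
As a minimal sketch of normalize=True on a function that accepts it, the snippet below calls Series.value_counts on a small, made-up Series of city names (the data is an assumption for illustration only). It compares the raw counts with the relative frequencies.

```python
import pandas as pd

# Hypothetical sample data: city names with different frequencies
cities = pd.Series(['New York', 'New York', 'New York',
                    'Boston', 'Boston', 'Atlanta'])

# Absolute counts per city
counts = cities.value_counts()

# Relative frequencies: each count divided by the total number of rows
shares = cities.value_counts(normalize=True)

print(counts)
print(shares)
```

The proportions in shares add up to 1, which is what distinguishes this kind of normalization from rescaling a numeric column to a range.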

Syntax

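The normalize parameter appears in the following signatures. DataFrame.sum and the other aggregation methods do not accept it.

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
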
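
pd.crosstab is one pandas function that does take a normalize argument. The sketch below is illustrative only: the weather column is a hypothetical addition, not part of the article's dataset. It shows the raw counts, the per-row proportions from normalize='index', and the overall proportions from normalize=True.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'New York', 'Boston', 'Boston', 'Atlanta', 'Atlanta'],
    # Hypothetical categorical column, added only for this illustration
    'weather': ['sunny', 'rainy', 'rainy', 'rainy', 'sunny', 'sunny'],
})

# Raw counts of each (city, weather) combination
counts = pd.crosstab(df['city'], df['weather'])

# Proportions within each row: every row sums to 1
row_shares = pd.crosstab(df['city'], df['weather'], normalize='index')

# Proportions of the grand total: the whole table sums to 1
all_shares = pd.crosstab(df['city'], df['weather'], normalize=True)

print(counts)
print(row_shares)
print(all_shares)
```

Use normalize='index' when the question is "what share of each city's days were sunny?", and normalize=True when each cell should be a share of all observations.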
Example
import pandas as pd

data = {'city': ['New York', 'New York', 'Boston', 'Boston', 'Atlanta', 'Atlanta'],
        'temperature': [25, 28, 21, 25, 30, 35],
        'humidity': [60, 65, 70, 75, 80, 85]}

df = pd.DataFrame(data)
print("The dataframe before normalization:")
print(df)

# Min-max normalization: rescale each column to the interval [0, 1]
features = df[['temperature', 'humidity']]
normalized_df = (features - features.min()) / (features.max() - features.min())

print("\nThe dataframe after min-max normalization (manual calculation):")
print(normalized_df)

# Row-wise proportions: divide each value by its row total so every row sums to 1.
# No normalize argument is used here; this is the same kind of division that
# normalize='index' performs on a pd.crosstab table of counts.
normalized_df = features.div(features.sum(axis=1), axis=0)
normalized_df['city'] = df['city']

print("\nThe dataframe after row-wise normalization (each row sums to 1):")
print(normalized_df)

Output:

The dataframe before normalization:
       city  temperature  humidity
0  New York           25        60
1  New York           28        65
2    Boston           21        70
3    Boston           25        75
4   Atlanta           30        80
5   Atlanta           35        85

The dataframe after min-max normalization (manual calculation):
   temperature  humidity
0     0.285714       0.0
1     0.500000       0.2
2     0.000000       0.4
3     0.285714       0.6
4     0.642857       0.8
5     1.000000       1.0

The dataframe after row-wise normalization (each row sums to 1):
   temperature  humidity      city
0     0.294118  0.705882  New York
1     0.301075  0.698925  New York
2     0.230769  0.769231    Boston
3     0.250000  0.750000    Boston
4     0.272727  0.727273   Atlanta
5     0.291667  0.708333   Atlanta

Conclusion

Normalizing data is an essential pre-processing step in data analysis. In pandas, normalize=True is available in counting functions such as value_counts and crosstab, where it returns relative frequencies instead of raw counts. To rescale numeric columns to the interval [0, 1], or to convert each row to proportions, compute the result directly with arithmetic such as min-max scaling or div, as in the example above.