Last modified: 2023-12-03 15:33:23.810000             Author: Mango

Pandas Rolling - Python

Pandas Rolling refers to the rolling() method of pandas Series and DataFrame objects, which performs rolling (moving) window calculations. It is widely used in time series analysis and data preprocessing, for example to smooth noisy data, compute moving averages, and highlight trends.

How it Works

Suppose we have a time series of daily closing prices for a stock and want to compute the 3-day moving average of those prices.

Here's how we can do it using Pandas Rolling:

import pandas as pd

# Load the data into a Pandas DataFrame
data = {'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06'],
        'close': [100, 110, 105, 120, 115, 130]}
df = pd.DataFrame(data)

# Convert the 'date' column to datetime format
df['date'] = pd.to_datetime(df['date'])

# Set the 'date' column as the index
df = df.set_index('date')

# Compute the 3-day moving average (each window covers 3 consecutive rows)
rolling_avg = df.rolling(window=3).mean()

print(rolling_avg)

Output:

                 close
date                  
2021-01-01         NaN
2021-01-02         NaN
2021-01-03  105.000000
2021-01-04  111.666667
2021-01-05  113.333333
2021-01-06  121.666667

As we can see, the rolling_avg DataFrame contains the 3-day moving average of the closing prices. The first 2 rows are NaN because, with window=3 and the default min_periods, a window needs 3 data points before it produces a value. For the same reason, a window larger than the number of rows (for example window=10 on these 6 rows) would yield NaN everywhere.
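How many leading rows come out as NaN depends on min_periods, which for an integer window defaults to the window size. As a minimal sketch, using the same six sample prices and a 3-row window chosen here purely for illustration, the following compares the default behavior with min_periods=1. The variable names strict and relaxed are made up for this example.

```python
import pandas as pd

# Same six sample closing prices, indexed by consecutive calendar days.
close = pd.Series(
    [100, 110, 105, 120, 115, 130],
    index=pd.date_range("2021-01-01", periods=6, freq="D"),
    name="close",
)

# Default: min_periods equals the window size, so the first 2 rows are NaN.
strict = close.rolling(window=3).mean()

# min_periods=1: a value is produced as soon as one observation is available,
# so early rows are averages over fewer than 3 points.
relaxed = close.rolling(window=3, min_periods=1).mean()

comparison = pd.DataFrame({"close": close, "strict": strict, "relaxed": relaxed})
print(comparison)
```

Lowering min_periods removes the leading NaN values, at the cost of early averages that are based on fewer observations than the rest.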

Parameters

The rolling method has several parameters that control the window size, how many observations a window needs, and how the observations are weighted. Here are some of the important ones:

  • window: the size of the rolling window (required; there is no default). An integer means a fixed number of rows, while an offset string such as '3D' means a time span and requires a datetime-like index or on column.
  • min_periods: the minimum number of observations in the window required to have a value (default is None, which means the window size for an integer window and 1 for an offset-based window)
  • center: whether the result is labeled at the center of the window instead of its right edge (default is False)
  • win_type: the type of window function used to weight the observations (e.g., 'boxcar', 'triang', 'gaussian', etc.; requires SciPy). Default is None, which weights all points equally.
  • on: for a DataFrame, a column label or index level to use instead of the DataFrame's index when computing the window (default is None, which uses the index)
  • axis: the axis to roll along (default is 0, which rolls over rows); this keyword is deprecated in recent pandas versions
  • closed: which endpoints of the window are included: 'right', 'left', 'both', or 'neither' (default is None, which behaves as 'right')

Note that rolling does not fill missing values itself. To fill gaps, call a method such as ffill(), bfill(), or fillna() before or after the rolling calculation.
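
A few of these parameters can be sketched as follows. This is only an illustration under stated assumptions: the sample frame, its values, and the variable names (centered, by_time, by_rows) are hypothetical, and one calendar day is deliberately left out so that a row-based window and a time-based window give different results.

```python
import pandas as pd

# Hypothetical sample data with a gap: 2021-01-04 is missing.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-05", "2021-01-06"]
        ),
        "close": [100.0, 110.0, 105.0, 120.0, 130.0],
    }
)

# Fixed window of 3 rows: the calendar gap is ignored.
by_rows = df["close"].rolling(window=3).mean()

# center=True labels each result at the middle row of its window,
# so the NaN values appear at both ends instead of only at the start.
centered = df["close"].rolling(window=3, center=True).mean()

# An offset string such as "3D" makes the window span 3 calendar days,
# and on="date" tells rolling to read the timestamps from that column.
# For offset-based windows, min_periods defaults to 1.
by_time = df.rolling(window="3D", on="date").mean()["close"]

print(
    pd.DataFrame(
        {"date": df["date"], "by_rows": by_rows, "centered": centered, "by_time": by_time}
    )
)
```

On the last two rows, by_rows averages three observations while by_time averages only the two that fall within the preceding three days, which is the practical difference between an integer window and an offset window.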

Conclusion

In conclusion, Pandas Rolling is a powerful tool for time series analysis and data preprocessing. It allows us to perform rolling window calculations such as moving averages, which help smooth data and reveal trends. By adjusting parameters such as window, min_periods, center, and win_type, we can control the window size, the number of observations a window requires, and how those observations are weighted.