
📅  Last modified: 2023-12-03 14:45:02.559000             🧑  Author: Mango

Pandas Filter Non-NaN - Python

In data analysis and manipulation, it is common to deal with missing values, which pandas represents as NaN (Not a Number). Filtering for non-NaN values is a frequent operation: it removes the rows or columns that contain missing data. In this guide, we will explore different methods to filter non-NaN values using pandas in Python.

Data Setup

To begin, let's first create a pandas dataframe with some missing values.

import pandas as pd
import numpy as np

data = {
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan]
}

df = pd.DataFrame(data)

The resulting dataframe df looks like this:

|   | A   | B   | C    |
|---|-----|-----|------|
| 0 | 1   | foo | NaN  |
| 1 | 2   | NaN | qux  |
| 2 | NaN | bar | quux |
| 3 | 4   | NaN | quuz |
| 4 | NaN | baz | NaN  |
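Before filtering, it can help to count where the missing values are. The following is a minimal sketch that rebuilds the sample dataframe and uses isna() to count NaN values; the variable names nan_per_column and rows_with_nan are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample dataframe from the setup above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Number of NaN values in each column (2 in each of A, B, and C).
nan_per_column = df.isna().sum()

# Number of rows that contain at least one NaN (all 5 rows here).
rows_with_nan = df.isna().any(axis=1).sum()

print(nan_per_column)
print(rows_with_nan)
```

Counts like these show in advance how much data a given filter would discard.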

Filtering non-NaN Rows

To keep only the rows that have no NaN in any column, you can use the dropna() method, which by default drops every row containing at least one NaN.

filtered_rows = df.dropna()

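The filtered_rows dataframe will contain only the rows with no NaN values. In the sample dataframe every row has at least one NaN (rows 0 and 4 in column C, rows 1 and 3 in column B, rows 2 and 4 in column A), so here filtered_rows is empty: it keeps the columns A, B, and C but has 0 rows.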
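The dropna() method also accepts how and thresh parameters, which relax the default "drop on any NaN" rule. The following is a minimal sketch on the sample dataframe; the variable names are illustrative, and the threshold of 2 is an arbitrary choice for this example.

```python
import numpy as np
import pandas as pd

# Rebuild the sample dataframe from the setup above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Default (how='any'): drop a row if it has at least one NaN.
strict_rows = df.dropna()

# how='all': drop a row only if every value in it is NaN.
loose_rows = df.dropna(how='all')

# thresh=2: keep rows that have at least two non-NaN values.
mostly_complete_rows = df.dropna(thresh=2)

print(len(strict_rows))                  # 0 rows survive
print(len(loose_rows))                   # all 5 rows survive
print(list(mostly_complete_rows.index))  # [0, 1, 2, 3]
```

Row 4 is dropped by thresh=2 because only its B value is present.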

Filtering non-NaN Columns

To keep only the columns that have no NaN in any row, you can use the dropna() method with the axis parameter set to 1.

filtered_columns = df.dropna(axis=1)

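The filtered_columns dataframe will contain only the columns with no NaN values. In the sample dataframe each of the columns A, B, and C contains at least one NaN, so here filtered_columns has no columns left: it keeps the five-row index (0 to 4) but has shape (5, 0).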
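To see column filtering keep something, the following sketch adds a hypothetical fully populated column D, which is not part of the original example, and then drops the columns that contain NaN. It also shows an equivalent boolean-mask form built from notna().

```python
import numpy as np
import pandas as pd

# Rebuild the sample dataframe from the setup above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Every original column has a NaN, so nothing survives.
no_nan_columns = df.dropna(axis=1)

# Hypothetical column D with no missing values (an assumption for illustration).
df_with_d = df.assign(D=[10, 20, 30, 40, 50])

# Only D is free of NaN, so only D is kept.
complete_columns = df_with_d.dropna(axis=1)

# Equivalent selection with a boolean mask over the columns.
same_columns = df_with_d.loc[:, df_with_d.notna().all()]

print(no_nan_columns.shape)            # (5, 0)
print(list(complete_columns.columns))  # ['D']
```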

Filtering non-NaN Values in Specific Columns

If you want to keep only the rows where specific columns are non-NaN, you can use the notnull() method along with boolean indexing.

filtered_values = df[df['A'].notnull() & df['B'].notnull()]

The filtered_values dataframe will contain only the rows where both columns A and B have non-NaN values:

|   | A | B   | C   |
|---|---|-----|-----|
| 0 | 1 | foo | NaN |
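The same selection can be written with the subset parameter of dropna(), which restricts the NaN check to the listed columns. The following is a minimal sketch comparing the two forms; it uses notna(), of which notnull() is an alias, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample dataframe from the setup above.
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, np.nan],
    'B': ['foo', np.nan, 'bar', np.nan, 'baz'],
    'C': [np.nan, 'qux', 'quux', 'quuz', np.nan],
})

# Boolean-mask form: keep rows where both A and B are present.
mask_rows = df[df['A'].notna() & df['B'].notna()]

# dropna form: only look at columns A and B when deciding what to drop.
subset_rows = df.dropna(subset=['A', 'B'])

print(list(subset_rows.index))        # [0]
print(mask_rows.equals(subset_rows))  # True
```

The mask form is more flexible, since it can be combined with other conditions; the subset form is shorter when the only condition is "not missing".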

Conclusion

Filtering non-NaN values is an important step in data preprocessing for further analysis or model training. In this guide, we explored different methods to filter non-NaN values in pandas dataframes: dropping rows that contain NaN, dropping columns that contain NaN, and keeping rows where specific columns are non-NaN. Using these techniques, you can control how missing data is handled before it reaches your analysis.

Make sure to refer to the Pandas documentation for additional information and more advanced techniques.